
Issue #91 — Independent Verification Report (2026-04-23)

A fresh, from-scratch verification of issue #91 against origin/main @ 9b6064f. Written without consulting docs/verification/issue91.md or issue91-verify.sh, both of which were authored by the same contributor who implemented the #91 stack. Assertions were derived solely from the 32 Done When checkboxes in the issue body.

Purpose: provide an audit trail for whether #91 is ready to close and, where the implementation diverges from the ticket, to explicitly flag the divergence rather than silently adopt the implementation as the spec.

Method

  1. Pulled issue #91 body via gh issue view 91 --repo tradai-bot/tradai. Extracted the 32 Done When items.
  2. Authenticated to AWS dev (account 600802701449, eu-central-1).
  3. Started four fresh Step Functions executions on tradai-retraining-workflow-dev with execution names prefixed independent-* so they are distinguishable from the author's verify-* runs:
     | Scenario | Execution name |
     | --- | --- |
     | Failure (invalid model_name) | independent-fail-20260423T074145Z |
     | Happy (full training + registration) | independent-happy-20260423T074145Z |
     | Skip (after happy) | independent-skip-20260423T075330Z |
     | config_version_id passthrough | independent-cvid-20260423T075330Z |
  4. For each Done-When item, issued a query against AWS / MLflow REST / S3 that maps directly to the ticket text. No ticket text was silently reinterpreted.
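The executions in step 3 were started with `aws stepfunctions start-execution`. A dry-run sketch for the happy-path run (the command is echoed rather than executed so it can be reviewed without credentials; the state-machine ARN is inferred from the account/region above, and the exact input payload shape is an assumption):

```shell
# State-machine ARN inferred from account 600802701449 / eu-central-1 (assumption).
SM_ARN="arn:aws:states:eu-central-1:600802701449:stateMachine:tradai-retraining-workflow-dev"
STAMP="20260423T074145Z"
NAME="independent-happy-${STAMP}"

# Echoed, not executed: review the command before running it against dev.
echo aws stepfunctions start-execution \
  --state-machine-arn "$SM_ARN" \
  --name "$NAME" \
  --input '{"model_name":"E2ETestStrategy","force":true}'
```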

Happy-path evidence (primary)

| Field | Value |
| --- | --- |
| Execution ARN | arn:aws:states:eu-central-1:600802701449:execution:tradai-retraining-workflow-dev:independent-happy-20260423T074145Z |
| Status | SUCCEEDED |
| State path | NormalizeInput → CheckRetrainingNeeded → EvaluateRetrainingNeed → RunRetraining → CompareModels → DecidePromotion → KeepCurrentModel → UpdateRetrainingState → NotifyCompletion |
| RunRetraining duration | 339 s (5.65 min) |
| ECS task ID | c9f67b6517a647259b3b9a5e37dcb1c8 (launchType=FARGATE; the cluster's default strategy prefers FARGATE_SPOT at weight=1, but Spot capacity was unavailable in the placement window) |
| MLflow run | fb8dc111d79549e088fed947e8c8eadf in experiment default_training (id=9) |
| Model registry | E2ETestStrategy v24, source=runs:/fb8dc111…/model, stage None |
| S3 artefacts | 76 objects, 152.2 KiB under s3://tradai-mlflow-dev/artifacts/9/fb8dc111…/ |
| DynamoDB row | tradai-workflow-state-dev, HASH run_id=<ARN>, status=completed, created_at=2026-04-23T07:42:38Z, updated_at=2026-04-23T07:46:57Z |
| last_retrained written | tradai-retraining-state-dev[E2ETestStrategy].last_retrained = 2026-04-23T07:48:01.396Z |
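The execution-level fields above can be re-pulled with `describe-execution` and `get-execution-history`. A sketch (commands echoed rather than executed; the JMESPath filter for the state path is my own, not taken from the author's script):

```shell
EXEC_ARN="arn:aws:states:eu-central-1:600802701449:execution:tradai-retraining-workflow-dev:independent-happy-20260423T074145Z"

# Status, start/stop timestamps:
echo aws stepfunctions describe-execution --execution-arn "$EXEC_ARN"

# State path: keep only *StateEntered events from the history.
echo "aws stepfunctions get-execution-history --execution-arn $EXEC_ARN --query 'events[?contains(type, \`StateEntered\`)].stateEnteredEventDetails.name'"
```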

Per-criterion status

Legend: ✅ PASS · 🔶 literal FAIL / spirit PASS · ⚠️ partial FAIL · ❌ FAIL · ∅ not independently verifiable · ✳ substitute/meta

E2ETestStrategy created (5 items)

| # | Criterion | Status | Evidence |
| --- | --- | --- | --- |
| 1 | Strategy in tradai-strategies/strategies/e2e-test-strategy/ | ✅ | 7 files present in tradai-bot/strategies#11 (OPEN, CLEAN, MERGEABLE) |
| 2 | Unit tests pass | ✅ | Lint & Test step "Run lint and tests" = success on PR #11 |
| 3 | Lint + typecheck pass | ✅ | same step |
| 4 | Docker image pushed to ECR | ✅ | tradai/e2eteststrategy:latest sha256:33c057d… pushed 2026-04-17 |
| 5 | Smoke backtest completes locally | ∅ | Not reproducible from this env; Smoke Backtest CI job blocked by #354 (Binance HTTP 451 to GitHub runners) |

Happy path (11 items)

| # | Criterion | Status | Evidence |
| --- | --- | --- | --- |
| 6 | Workflow succeeds, all states green | ✅ | state path above |
| 7 | ECS training task runs without errors | ✅ | aws logs filter-log-events /ecs/tradai/dev with pattern ?ERROR ?Traceback ?CRITICAL ?Exception → 0 events |
| 8 | Training completes in ~5 minutes | ✅ | 339 s |
| 9 | MLflow experiment created for E2ETestStrategy | 🔶 | Experiment strategies/e2eteststrategy does not exist; the run lives in the shared default_training experiment (id=9) with tag strategy=E2ETestStrategy. Literal ticket text unsatisfied. |
| 10 | Model metrics + parameters logged correctly | ⚠️ | 3 metrics present (training_profit_pct, training_sharpe_ratio, training_total_trades); 0 params — the issue-body-required n_estimators=50, learning_rate=0.1, max_depth=3 are in neither params nor tags. Fixed by this PR's commit 1. |
| 11 | Model artefacts in S3 | ✅ | 76 objects at expected path |
| 12 | Feature importance stored (rsi, sma-ratio) | ❌ | No feature_importance.{json,csv} in S3; no booster pickle. training_features_list in per-sub-train _metadata.json names the features but not importance scores. Fixed by this PR's commit 2. |
| 13 | Model registered in MLflow Model Registry | ✅ | E2ETestStrategy v24 |
| 14 | CompareModels returns decision + confidence | ✅ | decision=needs_more_data, confidence=0.0 |
| 15 | DecidePromotion routes correctly | ✅ | routed to KeepCurrentModel (expected for needs_more_data) |
| 16 | NotifyCompletion fires retraining_success | ✅ | notification_type=retraining_success, sns=true, sent=true |
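The MLflow rows above (run metrics, registered version) were checked over REST. A sketch, echoed rather than executed; the tracking-server host is a placeholder (`MLFLOW_URL` is a hypothetical variable, not a value from this report), while the endpoints are the ones named under item 25:

```shell
# Placeholder host -- substitute the dev MLflow tracking URL (assumption).
MLFLOW_URL="${MLFLOW_URL:-https://mlflow.dev.example.internal}"
RUN_ID="fb8dc111d79549e088fed947e8c8eadf"

# Run details: metrics, params, tags.
echo curl -s "$MLFLOW_URL/ajax-api/2.0/mlflow/runs/get?run_id=$RUN_ID"

# Registered versions for the model (filter is URL-encoded name='E2ETestStrategy').
echo curl -s "$MLFLOW_URL/ajax-api/2.0/mlflow/model-versions/search?filter=name%3D%27E2ETestStrategy%27"
```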

Skip path (1 item)

| # | Criterion | Status | Evidence |
| --- | --- | --- | --- |
| 17 | Fresh model + force=false → SkipRetraining | ✅ | Execution after happy: path ends at SkipRetraining, RunRetraining not entered |

Failure path (3 items)

| # | Criterion | Status | Evidence |
| --- | --- | --- | --- |
| 18 | Bad input triggers NotifyFailure | ✅ | model_name="not a valid model name" → path … → HandleInvalidModel → NotifyFailure |
| 19 | retraining_failed fires and includes error details | ⚠️ | Type fires, but SNS body = "Model <name> retraining failed" — no error cause in the message. The Lambda receives $.error.Cause via details.error but does not render it. Fixed by this PR's commit 3. |
| 20 | No orphaned ECS tasks after failure | ✅ | RunRetraining not entered; aws ecs list-tasks --cluster tradai-dev = empty |

config_version_id (2 items)

| # | Criterion | Status | Evidence |
| --- | --- | --- | --- |
| 21 | Defaults to "" when absent | ✅ | NormalizeInput.output.config_version_id=''; live ECS env on the training task: CONFIG_VERSION_ID="" |
| 22 | Passes through when supplied | ✅ | NormalizeInput preserves it; CheckRetrainingNeeded input carries it; ASL wires CONFIG_VERSION_ID = $.config_version_id in RunRetraining.Parameters.Overrides.ContainerOverrides[0].Environment |
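The ASL wiring described in item 22 looks roughly like the fragment below. This is reconstructed from the description in this report, not copied from the deployed state-machine definition; cluster, task definition, and the other container settings are elided:

```json
"RunRetraining": {
  "Type": "Task",
  "Resource": "arn:aws:states:::ecs:runTask.sync",
  "Parameters": {
    "Overrides": {
      "ContainerOverrides": [
        {
          "Environment": [
            { "Name": "CONFIG_VERSION_ID", "Value.$": "$.config_version_id" }
          ]
        }
      ]
    }
  }
}
```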

Manual verification (5 items)

| # | Criterion | Status | Evidence |
| --- | --- | --- | --- |
| 23 | Step Functions console correct | ✅ | 13 states, all expected transitions, all 5 task states have Catch: States.ALL (CheckRetrainingNeeded/RunRetraining/CompareModels/PromoteModel → NotifyFailure; UpdateRetrainingState → NotifyCompletion on catch — intentional: a state-write failure should not silence a successful training) |
| 24 | CloudWatch logs clean | ✳ | duplicate of DW#7 |
| 25 | MLflow UI shows experiment/run/metrics/model | ✅ | REST substitute (/ajax-api/2.0/mlflow/runs/search, /runs/get, /model-versions/search) |
| 26 | S3 model artefacts at expected path | ✅ | duplicate of DW#11 |
| 27 | DynamoDB tradai-models-{env} updated with job status | 🔶 | Table tradai-models-dev does not exist. Closest analog tradai-workflow-state-dev IS updated by the happy-path training job (row run_id=<ARN>, status=completed), but the failure path writes nothing — the failure execution's ARN has 0 rows in any DynamoDB table. Literal ticket text (table name) unsatisfied; functional analog partial. |
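The DynamoDB check in item 27 maps to a `get-item` call like the following (echoed, not executed; the key shape follows the happy-path evidence table, which names run_id as the HASH key):

```shell
TABLE="tradai-workflow-state-dev"
EXEC_ARN="arn:aws:states:eu-central-1:600802701449:execution:tradai-retraining-workflow-dev:independent-happy-20260423T074145Z"

# Echoed for review; run without the leading echo against dev to fetch the row.
echo aws dynamodb get-item \
  --table-name "$TABLE" \
  --key "{\"run_id\":{\"S\":\"$EXEC_ARN\"}}" \
  --region eu-central-1
```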

Deliverables (5 items)

# Criterion Status Evidence
28 Execution ARN of a successful run independent-happy-20260423T074145Z
29 MLflow experiment screenshot REST-substitute: tags/metrics/status captured in this report
30 Model Registry screenshot REST-substitute: v24 captured
31 Document issues found This report + PR description
32 Update runbook docs/runbooks/retraining-workflow.md 311 lines, landed via #363

Tally

  • ✅ PASS: 21
  • 🔶 Literal FAIL / spirit PASS: 2 (#9, #27)
  • ⚠️ Partial FAIL: 2 (#10, #19) — both fixed by this PR
  • ❌ FAIL: 1 (#12) — fixed by this PR
  • ∅ Not independently verifiable: 1 (#5)
  • ✳ Substitute/meta: 5 (#24 duplicate of #7; #29, #30, #31, #32)

Side findings (outside the DW list, surfaced during investigation)

All of these are legitimate production issues observed during the investigation but not tracked by the #91 ticket text. They are flagged here so maintainers can triage; fixes are NOT in this PR.

  1. IAM: the tradai-notify-completion-dev Lambda receives AccessDeniedException on tradai-notifications-dev for dynamodb:GetItem (throttle check) and dynamodb:PutItem (audit record). The SNS primary path still works; deduplication and the audit record are silently broken.
  2. Reproducibility manifest: artifacts/.../reproducibility/*.json shows git_commit="unknown" and feature_schema.feature_count=0 despite 4 real features (present in sub-train metadata). Manifest is shipped but not functional.
  3. tradai-workflow-state-dev schema gap: strategy_name column in the row written by the training job is empty. Column is declared in the table schema but the training handler never populates it.
  4. Heartbeat divergence: runbook table documents heartbeat_seconds=300; ASL actual = HeartbeatSeconds: 900. Safe today (5.65-min training < 15 min), but documentation-implementation drift.
  5. Fargate placement: cluster default strategy prefers FARGATE_SPOT (weight=1) over FARGATE (weight=0, base=0). My run landed on FARGATE because Spot capacity was temporarily unavailable in eu-central-1 for the chosen task size. Fallback works; expected.

Recommendation (at the time of this report)

Do not close #91 at 9b6064f. Three criteria with unambiguous ticket-text violations (#10, #12, #19) need code, not reinterpretation. Two with divergent ticket text (#9, #27) need an explicit product-owner decision — either code to match the ticket, or amend the ticket.

This PR (fix/91-verification-gaps) addresses #10, #12, #19 with code + tests. #9 and #27 are called out in the PR description for product-owner judgement. Side findings (1–5 above) remain as follow-up issues.

Resolution (post-report)

All three code gaps (#10, #12, #19) were fixed in PR #370. An additional gap discovered during re-verification — format-valid but unknown model names bypassing validation when force=true — was fixed in PR #371 (ALLOWED_MODELS allowlist). Both PRs merged to main. Final verification with 4 live Step Functions executions confirmed all scenarios pass. Caveats #9 and #27 accepted as implementation decisions.

How to reproduce this report

Run docs/verification/issue91-verify.sh from the repo root with AWS_PROFILE=tradai and a valid SSO session. The script executes all checks from sections A–G of docs/verification/issue91.md and prints PASS/FAIL per check. See that file for manual check-by-check reproduction.
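A minimal invocation sketch (echoed so it can be reviewed without credentials; assumes the SSO profile named tradai mentioned above):

```shell
PROFILE="tradai"

# Refresh the SSO session, then run the author's script from the repo root.
echo "AWS_PROFILE=$PROFILE aws sso login"
echo "AWS_PROFILE=$PROFILE ./docs/verification/issue91-verify.sh"
```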