Issue #91 — E2E Verification¶

End-to-end verification of every acceptance criterion from issue #91 against the deployed AWS dev environment. Total runtime: ~10 minutes (one ~5-minute training job plus short-path scenarios).

Every check is a pipeable one-liner. Each prints a short output (usually a single line) that you compare against the Expected line that follows it.

Prerequisites¶

AWS CLI authenticated with SSO profile tradai (account 600802701449, region eu-central-1).
python3 (or python — see below) on PATH.
A POSIX shell — native on macOS / Linux; Git Bash or WSL on Windows. On Windows where the default python3 is the Microsoft Store app-execution alias, either disable the alias in Settings → Apps → App execution aliases, or replace every python3 in the commands below with python.
SSM SendCommand permission on the consolidated EC2 instance (used for MLflow REST queries, since the MLflow endpoint is VPC-internal).

Setup (run once per session)¶

aws sso login --profile tradai
export AWS_PROFILE=tradai
export AWS_REGION=eu-central-1
export ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export SM_ARN="arn:aws:states:${AWS_REGION}:${ACCOUNT}:stateMachine:tradai-retraining-workflow-dev"
export MLFLOW_INSTANCE=$(aws ec2 describe-instances --filters 'Name=tag:Name,Values=tradai-consolidated-dev' 'Name=instance-state-name,Values=running' --region "$AWS_REGION" --query 'Reservations[0].Instances[0].InstanceId' --output text)
export ECR_E2E="tradai/e2eteststrategy"
export MODEL=E2ETestStrategy

Define the MLflow helper (one-liner; handles both GET and POST via SSM). Paste it verbatim:

mlflow_call() { local M="$1" P="$2" B="$3" S; if [ "$M" = "GET" ]; then S="curl -s 'http://mlflow.tradai-dev.local:5000/mlflow${P}'"; else local X=$(printf '%s' "$B" | base64 | tr -d '\n'); S="echo $X | base64 -d | curl -s -X POST 'http://mlflow.tradai-dev.local:5000/mlflow${P}' -H 'Content-Type: application/json' --data-binary @-"; fi; local J=$(SCRIPT="$S" python3 -c "import json,os; print(json.dumps({'commands':[os.environ['SCRIPT']]}))"); local C=$(aws ssm send-command --instance-ids "$MLFLOW_INSTANCE" --document-name AWS-RunShellScript --parameters "$J" --region "$AWS_REGION" --query Command.CommandId --output text); local s; until s=$(aws ssm list-command-invocations --command-id "$C" --region "$AWS_REGION" --query 'CommandInvocations[0].Status' --output text); [ "$s" = "Success" ] || [ "$s" = "Failed" ] || [ "$s" = "TimedOut" ] || [ "$s" = "Cancelled" ]; do sleep 2; done; aws ssm get-command-invocation --command-id "$C" --instance-id "$MLFLOW_INSTANCE" --region "$AWS_REGION" --query StandardOutputContent --output text; }
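
The POST branch of the helper base64-encodes the JSON body so that its quotes survive the nested shell quoting on the instance. The Python sketch below rebuilds the same remote command and SSM parameter document offline, for readers who want to inspect what is sent. The names `build_remote_command` and `ssm_parameters` are hypothetical and exist only for this illustration.

```python
import base64
import json

MLFLOW_BASE = "http://mlflow.tradai-dev.local:5000/mlflow"  # same base URL as mlflow_call


def build_remote_command(method, path, body=""):
    """Return the shell command SSM would run on the instance (hypothetical helper)."""
    url = MLFLOW_BASE + path
    if method == "GET":
        return "curl -s '" + url + "'"
    # Base64 keeps the JSON body free of quoting problems inside the remote shell.
    encoded = base64.b64encode(body.encode("utf-8")).decode("ascii")
    return (
        "echo " + encoded + " | base64 -d | curl -s -X POST '" + url + "' "
        "-H 'Content-Type: application/json' --data-binary @-"
    )


def ssm_parameters(command):
    """Serialise the command into the JSON given to `aws ssm send-command --parameters`."""
    return json.dumps({"commands": [command]})


body = json.dumps({"experiment_ids": ["1"], "max_results": 1})
cmd = build_remote_command("POST", "/ajax-api/2.0/mlflow/runs/search", body)
print(ssm_parameters(cmd))
```

Nothing here contacts AWS; it only shows the string that the shell helper assembles before calling `aws ssm send-command`.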

Section A — Static infrastructure invariants¶

A.1 State machine has 13 states including `HandleInvalidModel` and `UpdateRetrainingState`¶

aws stepfunctions describe-state-machine --state-machine-arn "$SM_ARN" --region "$AWS_REGION" --query definition --output text | python3 -c "import json,sys; s=json.loads(sys.stdin.read())['States']; print(len(s), '|', ','.join(s))"

Expected: 13 | NormalizeInput,CheckRetrainingNeeded,EvaluateRetrainingNeed,HandleInvalidModel,SkipRetraining,RunRetraining,CompareModels,DecidePromotion,KeepCurrentModel,PromoteModel,UpdateRetrainingState,NotifyCompletion,NotifyFailure

A.2 Strategy ECS task definition active, pointing at the E2ETestStrategy image¶

aws ecs describe-task-definition --task-definition tradai-strategy-generic-dev --region "$AWS_REGION" --query 'taskDefinition.[family,status,containerDefinitions[0].image]' --output text

Expected: tradai-strategy-generic-dev ACTIVE 600802701449.dkr.ecr.eu-central-1.amazonaws.com/tradai/e2eteststrategy:latest

A.3 E2ETestStrategy image present in ECR¶

aws ecr describe-images --repository-name "$ECR_E2E" --image-ids imageTag=latest --region "$AWS_REGION" --query 'imageDetails[0].[imageDigest,imagePushedAt]' --output text

Expected: a sha256:... digest and a timestamp. Empty or ImageNotFoundException = fail.

A.4 Required DynamoDB tables exist and are active¶

for t in tradai-retraining-state-dev tradai-drift-state-dev tradai-workflow-state-dev; do aws dynamodb describe-table --table-name "$t" --region "$AWS_REGION" --query 'Table.[TableName,TableStatus]' --output text; done

Expected: three lines, each ending in ACTIVE.

Section B — Scenario A: `INVALID_MODEL` routing (~5 s)¶

Malformed model_name. Must route through HandleInvalidModel → NotifyFailure without touching ECS.

B.1 Start execution and capture its ARN¶

export ARN_A=$(aws stepfunctions start-execution --state-machine-arn "$SM_ARN" --name "verify-invalid-$(date -u +%Y%m%d-%H%M%S)" --input '{"model_name":"bad name!","manual_trigger":true,"force":false}' --region "$AWS_REGION" --query executionArn --output text) && until [ "$(aws stepfunctions describe-execution --execution-arn "$ARN_A" --region "$AWS_REGION" --query status --output text)" != "RUNNING" ]; do sleep 2; done && echo "$ARN_A"

Expected: a single ARN line arn:aws:states:...:verify-invalid-....

B.2 State path ends at `NotifyFailure`¶

aws stepfunctions get-execution-history --execution-arn "$ARN_A" --region "$AWS_REGION" --query 'events[?stateEnteredEventDetails].stateEnteredEventDetails.name' --output text

Expected: NormalizeInput CheckRetrainingNeeded EvaluateRetrainingNeed HandleInvalidModel NotifyFailure
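
The JMESPath query above keeps only events that carry `stateEnteredEventDetails` and projects their names. A minimal Python equivalent, run against a saved `get-execution-history --output json` response, might look like the sketch below. The function name and the trimmed sample history (including its event `type` values) are illustrative assumptions.

```python
def state_path(history):
    """Return the names of entered states, in order, from an execution history dict."""
    return [
        event["stateEnteredEventDetails"]["name"]
        for event in history.get("events", [])
        if "stateEnteredEventDetails" in event
    ]


# Hypothetical, heavily trimmed history for the invalid-model scenario.
sample_history = {
    "events": [
        {"type": "ExecutionStarted"},
        {"type": "PassStateEntered", "stateEnteredEventDetails": {"name": "NormalizeInput"}},
        {"type": "PassStateExited", "stateExitedEventDetails": {"name": "NormalizeInput"}},
        {"type": "TaskStateEntered", "stateEnteredEventDetails": {"name": "CheckRetrainingNeeded"}},
        {"type": "ChoiceStateEntered", "stateEnteredEventDetails": {"name": "EvaluateRetrainingNeed"}},
        {"type": "PassStateEntered", "stateEnteredEventDetails": {"name": "HandleInvalidModel"}},
        {"type": "TaskStateEntered", "stateEnteredEventDetails": {"name": "NotifyFailure"}},
    ]
}

path = state_path(sample_history)
print(" ".join(path))
print("entered_run_retraining=", "RunRetraining" in path)
```

The same list also answers the B.6 question: if `RunRetraining` is absent from the path, no ECS task was started by this execution.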

B.3 Execution status `SUCCEEDED`¶

aws stepfunctions describe-execution --execution-arn "$ARN_A" --region "$AWS_REGION" --query status --output text

Expected: SUCCEEDED (the workflow handled the invalid input and emitted the failure notification, so the execution itself succeeds).

B.4 `NotifyFailure` Lambda reported `retraining_failed`¶

aws stepfunctions get-execution-history --execution-arn "$ARN_A" --region "$AWS_REGION" --query 'events[?type==`TaskSucceeded`].taskSucceededEventDetails.output' --output json | python3 -c "import json,sys; outs=json.load(sys.stdin); [print(json.loads(json.loads(o)['Payload']['body'])['notification_type']) for o in outs if isinstance(json.loads(o).get('Payload',{}),dict) and 'body' in json.loads(o).get('Payload',{})]"

Expected: retraining_failed
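
The one-liner above is dense because the data is JSON-encoded twice: each `TaskSucceeded` output is a JSON string, and the Lambda's `Payload.body` inside it is a second JSON string. The sketch below unrolls the same logic. The function name, the `statusCode` field and the sample values are assumptions for illustration only.

```python
import json


def notification_types(outputs):
    """Collect notification_type from every task output that carries a Lambda body."""
    found = []
    for raw in outputs:
        payload = json.loads(raw).get("Payload")  # first decode: the task output
        if isinstance(payload, dict) and "body" in payload:
            body = json.loads(payload["body"])  # second decode: the Lambda body
            found.append(body["notification_type"])
    return found


# Hypothetical outputs: one notifier response and one unrelated task result.
sample_outputs = [
    json.dumps({"Payload": {"needs_retraining": False}}),
    json.dumps({"Payload": {"statusCode": 200,
                            "body": json.dumps({"notification_type": "retraining_failed"})}}),
]
print(notification_types(sample_outputs))
```

Task outputs without a `body` key are skipped, which is why only the notifier Lambda contributes a line.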

B.5 `HandleInvalidModel` synthesised `$.error` with the regex-violation cause¶

aws stepfunctions get-execution-history --execution-arn "$ARN_A" --region "$AWS_REGION" --query 'events[?stateExitedEventDetails.name==`HandleInvalidModel`].stateExitedEventDetails.output' --output text | python3 -c "import json,sys; d=json.loads(sys.stdin.read()); c=d.get('error',{}).get('Cause','') or d.get('error',{}).get('cause',''); print('error_present=', 'error' in d, '| cause_startswith_invalid=', c.startswith(\"Invalid model_name 'bad name!'\"))"

Expected: error_present= True | cause_startswith_invalid= True

B.6 No `RunRetraining` state was entered (⇒ zero ECS tasks)¶

aws stepfunctions get-execution-history --execution-arn "$ARN_A" --region "$AWS_REGION" --query 'events[?stateEnteredEventDetails.name==`RunRetraining`]|length(@)' --output text

Expected: 0

B.7 ALLOWED_MODELS rejects a format-valid but unknown model¶

A PascalCase name that passes regex validation but is NOT in the ALLOWED_MODELS env var must also route to HandleInvalidModel → NotifyFailure.
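
The two rejection paths in this section can be pictured as two gates applied in sequence. The sketch below is illustrative only: the regex, the function name `rejection_cause` and the message wording after the colon are assumptions, not the deployed Lambda code. Only the `Invalid model_name '...'` prefix (B.5) and the mention of `ALLOWED_MODELS` (B.9) come from this document.

```python
import re

# Assumption: a simple PascalCase pattern; the deployed regex may differ.
MODEL_NAME_RE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def rejection_cause(model_name, allowed_models):
    """Return a rejection cause, or None when the name passes both gates."""
    if not MODEL_NAME_RE.match(model_name):
        # Gate 1: format check (the B.1 to B.6 scenario).
        return "Invalid model_name '" + model_name + "': does not match the required format"
    if model_name not in allowed_models:
        # Gate 2: allowlist check (the B.7 to B.9 scenario).
        return "Invalid model_name '" + model_name + "': not listed in ALLOWED_MODELS"
    return None


allowed = {"E2ETestStrategy"}
for name in ("bad name!", "BogusStrategy", "E2ETestStrategy"):
    print(name, "->", rejection_cause(name, allowed))
```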

export ARN_AL=$(aws stepfunctions start-execution --state-machine-arn "$SM_ARN" --name "verify-allowlist-$(date -u +%Y%m%d-%H%M%S)" --input '{"model_name":"BogusStrategy","manual_trigger":true,"force":true}' --region "$AWS_REGION" --query executionArn --output text) && until [ "$(aws stepfunctions describe-execution --execution-arn "$ARN_AL" --region "$AWS_REGION" --query status --output text)" != "RUNNING" ]; do sleep 2; done && echo "$ARN_AL"

Expected: a single ARN line.

B.8 Allowlist-rejected model routes to `HandleInvalidModel → NotifyFailure`¶

aws stepfunctions get-execution-history --execution-arn "$ARN_AL" --region "$AWS_REGION" --query 'events[?stateEnteredEventDetails].stateEnteredEventDetails.name' --output text

Expected: NormalizeInput CheckRetrainingNeeded EvaluateRetrainingNeed HandleInvalidModel NotifyFailure

B.9 Allowlist rejection reason mentions `ALLOWED_MODELS`¶

aws stepfunctions get-execution-history --execution-arn "$ARN_AL" --region "$AWS_REGION" --query 'events[?stateExitedEventDetails.name==`HandleInvalidModel`].stateExitedEventDetails.output' --output text | python3 -c "import json,sys; d=json.loads(sys.stdin.read()); c=d.get('error',{}).get('Cause','') or d.get('error',{}).get('cause',''); print('has_allowed_models_ref=', 'ALLOWED_MODELS' in c)"

Expected: has_allowed_models_ref= True

Section C — Scenario B: happy path (~5 min)¶

Valid E2ETestStrategy, force=true. Runs the full training pipeline and must persist last_retrained so Scenario C can skip.

C.1 Start execution and wait for completion¶

export ARN_B=$(aws stepfunctions start-execution --state-machine-arn "$SM_ARN" --name "verify-happy-$(date -u +%Y%m%d-%H%M%S)" --input '{"model_name":"E2ETestStrategy","manual_trigger":true,"force":true}' --region "$AWS_REGION" --query executionArn --output text) && echo "$ARN_B" && until [ "$(aws stepfunctions describe-execution --execution-arn "$ARN_B" --region "$AWS_REGION" --query status --output text)" != "RUNNING" ]; do sleep 30; done && aws stepfunctions describe-execution --execution-arn "$ARN_B" --region "$AWS_REGION" --query status --output text

Expected: ARN line, then (after ~5 minutes) SUCCEEDED.

C.2 State path runs through all training stages¶

aws stepfunctions get-execution-history --execution-arn "$ARN_B" --region "$AWS_REGION" --query 'events[?stateEnteredEventDetails].stateEnteredEventDetails.name' --output text

Expected (either branch is valid): NormalizeInput CheckRetrainingNeeded EvaluateRetrainingNeed RunRetraining CompareModels DecidePromotion KeepCurrentModel UpdateRetrainingState NotifyCompletion. PromoteModel replaces KeepCurrentModel if the challenger was promoted.

C.3 `NormalizeInput` defaulted `config_version_id` to `""`¶

aws stepfunctions get-execution-history --execution-arn "$ARN_B" --region "$AWS_REGION" --query 'events[?stateExitedEventDetails.name==`NormalizeInput`].stateExitedEventDetails.output' --output text | python3 -c "import json,sys; d=json.loads(sys.stdin.read()); print('config_version_id=' + repr(d.get('config_version_id')))"

Expected: config_version_id=''

C.4 `RunRetraining` finished within the 900 s budget¶

aws stepfunctions get-execution-history --execution-arn "$ARN_B" --region "$AWS_REGION" --query 'events[?contains([`RunRetraining`], stateEnteredEventDetails.name) || contains([`RunRetraining`], stateExitedEventDetails.name)].timestamp' --output text | python3 -c "import sys,datetime; t=[datetime.datetime.fromisoformat(x.replace('+0000','+00:00')) for x in sys.stdin.read().split()]; print('duration_seconds=', int((t[-1]-t[0]).total_seconds()))"

Expected: duration_seconds= <N> where N < 900. The issue says "~5 minutes"; the observed range is 339–654 s across runs (Fargate Spot scheduling plus cold start adds 60–90 s of overhead). The 900 s threshold validates the lightweight design while tolerating infrastructure jitter.
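
The duration is the gap between the first and last `RunRetraining` event timestamps. The sketch below repeats that arithmetic on two made-up timestamps and applies the 900 s threshold. The function name and sample values are hypothetical.

```python
from datetime import datetime

THRESHOLD_SECONDS = 900  # threshold used by this check


def run_duration_seconds(timestamps):
    """Whole seconds between the earliest and latest timestamp in the list."""
    # Normalise a "+0000" offset to "+00:00" so datetime.fromisoformat accepts it.
    parsed = [datetime.fromisoformat(t.replace("+0000", "+00:00")) for t in timestamps]
    return int((max(parsed) - min(parsed)).total_seconds())


# Hypothetical entered/exited timestamps for RunRetraining.
sample = ["2026-04-23T10:00:00+0000", "2026-04-23T10:06:39.500000+00:00"]
duration = run_duration_seconds(sample)
print("duration_seconds=", duration, "| within_threshold=", duration < THRESHOLD_SECONDS)
```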

C.5 ECS training log has no error-class lines¶

export TASK_ID=$(aws stepfunctions get-execution-history --execution-arn "$ARN_B" --region "$AWS_REGION" --output json | python3 -c "import json,sys; h=json.load(sys.stdin); t=[json.loads(e['taskSucceededEventDetails']['output']).get('TaskArn','') for e in h['events'] if e['type']=='TaskSucceeded' and 'TaskArn' in e.get('taskSucceededEventDetails',{}).get('output','')]; print(t[0].rsplit('/',1)[1] if t else '')") && MSYS_NO_PATHCONV=1 aws logs filter-log-events --log-group-name /ecs/tradai/dev --log-stream-name-prefix "strategy/strategy/${TASK_ID}" --filter-pattern '?ERROR ?Traceback ?CRITICAL' --region "$AWS_REGION" --no-paginate --query 'length(events)' --output text

Expected: 0

C.6 `CompareModels` returned both `decision` and `confidence`¶

aws stepfunctions get-execution-history --execution-arn "$ARN_B" --region "$AWS_REGION" --query 'events[?type==`TaskSucceeded`].taskSucceededEventDetails.output' --output json | python3 -c "import json,sys; outs=json.load(sys.stdin); [print('decision=' + str(p.get('decision')) + ' confidence=' + str(p.get('confidence'))) for o in outs for p in [json.loads(o).get('Payload') if isinstance(json.loads(o).get('Payload'),dict) else {}] if 'confidence' in p]"

Expected: one line of the form decision=<value> confidence=<numeric> — both keys must be present. Example: decision=needs_more_data confidence=0.0.

C.7 `DecidePromotion` routed to exactly one of the terminal-branch states¶

aws stepfunctions get-execution-history --execution-arn "$ARN_B" --region "$AWS_REGION" --query 'events[?stateEnteredEventDetails.name==`PromoteModel` || stateEnteredEventDetails.name==`KeepCurrentModel`].stateEnteredEventDetails.name' --output text

Expected: PromoteModel or KeepCurrentModel (exactly one token).

C.8 `NotifyCompletion` Lambda reported `retraining_success`¶

aws stepfunctions get-execution-history --execution-arn "$ARN_B" --region "$AWS_REGION" --query 'events[?type==`TaskSucceeded`].taskSucceededEventDetails.output' --output json | python3 -c "import json,sys; outs=json.load(sys.stdin); [print(json.loads(json.loads(o)['Payload']['body'])['notification_type']) for o in outs if isinstance(json.loads(o).get('Payload',{}),dict) and 'body' in json.loads(o).get('Payload',{})]"

Expected: retraining_success

C.9 `UpdateRetrainingState` wrote `last_retrained`¶

aws dynamodb get-item --table-name tradai-retraining-state-dev --key "{\"model_name\":{\"S\":\"${MODEL}\"}}" --region "$AWS_REGION" --query 'Item.last_retrained.S' --output text

Expected: an ISO-8601 UTC timestamp within the last few minutes.
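
"Within the last few minutes" can be made mechanical by comparing the stored value with the current time. The sketch below assumes the value parses as ISO-8601, treats a naive timestamp as UTC, and uses a 15-minute window; all three are assumptions of this illustration, as is the function name.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_retrained, now, max_age=timedelta(minutes=15)):
    """True when last_retrained is not in the future and no older than max_age."""
    ts = datetime.fromisoformat(last_retrained.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return timedelta(0) <= now - ts <= max_age


# Fixed "now" so the example is reproducible; use datetime.now(timezone.utc) in practice.
now = datetime(2026, 4, 23, 10, 10, tzinfo=timezone.utc)
print(is_fresh("2026-04-23T10:05:00+00:00", now))
print(is_fresh("2026-04-23T08:00:00Z", now))
```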

C.9b `tradai-workflow-state-dev` row created with `status=completed`¶

DW#27 requires the workflow state table to track job status.

MSYS_NO_PATHCONV=1 aws dynamodb get-item --table-name tradai-workflow-state-dev --key "{\"run_id\":{\"S\":\"${ARN_B}\"}}" --region "$AWS_REGION" --query 'Item.status.S' --output text

Expected: completed

C.10 MLflow experiment and run exist¶

Retraining creates a per-strategy experiment {MODEL}_training (e.g. E2ETestStrategy_training) and tags each run with job_id=<execution arn>.

# Discover experiment ID by name
export EXP_ID=$(mlflow_call GET "/ajax-api/2.0/mlflow/experiments/get-by-name?experiment_name=${MODEL}_training" | python3 -c "import json,sys; print(json.load(sys.stdin)['experiment']['experiment_id'])")
echo "experiment_id=${EXP_ID}"

Expected: experiment_id=<integer> (non-empty).

mlflow_call POST "/ajax-api/2.0/mlflow/runs/search" "$(printf "{\"experiment_ids\":[\"%s\"],\"filter\":\"tags.job_id = '%s'\",\"max_results\":1}" "$EXP_ID" "$ARN_B")" | python3 -c "import json,sys; d=json.load(sys.stdin); r=(d.get('runs') or [None])[0]; print('found=', bool(r), '| run_id=', (r or {}).get('info',{}).get('run_id','') , '| status=', (r or {}).get('info',{}).get('status',''))"

Expected: found= True | run_id= <32-char hex> | status= FINISHED.

C.11 MLflow run carries expected tags and training metrics¶

mlflow_call POST "/ajax-api/2.0/mlflow/runs/search" "$(printf "{\"experiment_ids\":[\"%s\"],\"filter\":\"tags.job_id = '%s'\",\"max_results\":1}" "$EXP_ID" "$ARN_B")" | python3 -c "import json,sys; r=json.load(sys.stdin)['runs'][0]; t={x['key']:x['value'] for x in r['data'].get('tags',[])}; m=[x['key'] for x in r['data'].get('metrics',[])]; print('strategy=' + t.get('strategy','') + ' | freqai_model=' + t.get('freqai_model','') + ' | metrics=' + ','.join(sorted(m)))"

Expected: strategy=E2ETestStrategy | freqai_model=LightGBMRegressor | metrics=training_profit_pct,training_sharpe_ratio,training_total_trades

C.12 S3 artefacts present at `s3://tradai-mlflow-dev/artifacts/{EXP_ID}/<run_id>/artifacts/`¶

export RUN_ID=$(mlflow_call POST "/ajax-api/2.0/mlflow/runs/search" "$(printf "{\"experiment_ids\":[\"%s\"],\"filter\":\"tags.job_id = '%s'\",\"max_results\":1}" "$EXP_ID" "$ARN_B")" | python3 -c "import json,sys; print(json.load(sys.stdin)['runs'][0]['info']['run_id'])") && aws s3 ls "s3://tradai-mlflow-dev/artifacts/${EXP_ID}/${RUN_ID}/artifacts/" --recursive --region "$AWS_REGION" --summarize --human-readable | tail -2

Expected: two summary lines (Total Objects: <N≥50> and Total Size: <non-zero>).
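
To turn the two summary lines into a pass/fail result, the object count can be parsed and compared with the minimum of 50. A minimal sketch follows; the function name and the sample numbers are hypothetical.

```python
import re


def parse_s3_summary(text):
    """Extract (object_count, size_text) from `aws s3 ls --summarize` summary lines."""
    count = re.search(r"Total Objects:\s*(\d+)", text)
    size = re.search(r"Total Size:\s*(.+)", text)
    if not count or not size:
        raise ValueError("summary lines not found")
    return int(count.group(1)), size.group(1).strip()


sample = "Total Objects: 57\n   Total Size: 12.4 MiB\n"  # hypothetical values
objects, size = parse_s3_summary(sample)
print("objects_ok=", objects >= 50, "| size=", size)
```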

C.13 Model version registered in MLflow Registry with `source` pointing at this run¶

mlflow_call GET "/ajax-api/2.0/mlflow/registered-models/search?max_results=200" | python3 -c "import json,sys,os; run=os.environ['RUN_ID']; d=json.load(sys.stdin); hits=[v for m in d.get('registered_models',[]) for v in m.get('latest_versions',[]) if v.get('run_id')==run]; print('registered=', bool(hits), '| version=', (hits or [{}])[0].get('version',''), '| stage=', (hits or [{}])[0].get('current_stage',''))"

Expected: registered= True | version= <N> | stage= <None|Staging|Production>.

Section D — Scenario C: skip path (~5 s, run immediately after Scenario B)¶

Same model, force=false, manual_trigger=false. The fresh last_retrained from C.9 must short-circuit into SkipRetraining.

D.1 Start, wait, and read the state path¶

export ARN_C=$(aws stepfunctions start-execution --state-machine-arn "$SM_ARN" --name "verify-skip-$(date -u +%Y%m%d-%H%M%S)" --input "{\"model_name\":\"${MODEL}\",\"manual_trigger\":false,\"force\":false}" --region "$AWS_REGION" --query executionArn --output text) && until [ "$(aws stepfunctions describe-execution --execution-arn "$ARN_C" --region "$AWS_REGION" --query status --output text)" != "RUNNING" ]; do sleep 2; done && aws stepfunctions get-execution-history --execution-arn "$ARN_C" --region "$AWS_REGION" --query 'events[?stateEnteredEventDetails].stateEnteredEventDetails.name' --output text

Expected: NormalizeInput CheckRetrainingNeeded EvaluateRetrainingNeed SkipRetraining

D.2 Execution status `SUCCEEDED`¶

aws stepfunctions describe-execution --execution-arn "$ARN_C" --region "$AWS_REGION" --query status --output text

Expected: SUCCEEDED

D.3 No `RunRetraining` state entered¶

aws stepfunctions get-execution-history --execution-arn "$ARN_C" --region "$AWS_REGION" --query 'events[?stateEnteredEventDetails.name==`RunRetraining`]|length(@)' --output text

Expected: 0

Section E — Scenario D: `config_version_id` passthrough (~5 s)¶

Start with an explicit config_version_id. Because we set manual_trigger=false, force=false and C.9's last_retrained is still fresh, this skips through SkipRetraining — no training needed to prove passthrough.

E.1 Start with explicit `config_version_id`¶

export CVID="verify-91-$(date -u +%Y%m%d%H%M%S)" && export ARN_D=$(aws stepfunctions start-execution --state-machine-arn "$SM_ARN" --name "verify-cv-${CVID}" --input "{\"model_name\":\"${MODEL}\",\"manual_trigger\":false,\"force\":false,\"config_version_id\":\"${CVID}\"}" --region "$AWS_REGION" --query executionArn --output text) && until [ "$(aws stepfunctions describe-execution --execution-arn "$ARN_D" --region "$AWS_REGION" --query status --output text)" != "RUNNING" ]; do sleep 2; done && aws stepfunctions describe-execution --execution-arn "$ARN_D" --region "$AWS_REGION" --query status --output text

Expected: SUCCEEDED

E.2 `NormalizeInput` preserved the supplied `config_version_id`¶

aws stepfunctions get-execution-history --execution-arn "$ARN_D" --region "$AWS_REGION" --query 'events[?stateExitedEventDetails.name==`NormalizeInput`].stateExitedEventDetails.output' --output text | python3 -c "import json,sys,os; d=json.loads(sys.stdin.read()); print('passed_through=', d.get('config_version_id')==os.environ['CVID'], '| value=', d.get('config_version_id'))"

Expected: passed_through= True | value= verify-91-<timestamp>

E.3 `CheckRetrainingNeeded` input carried the same `config_version_id`¶

aws stepfunctions get-execution-history --execution-arn "$ARN_D" --region "$AWS_REGION" --query 'events[?stateEnteredEventDetails.name==`CheckRetrainingNeeded`].stateEnteredEventDetails.input' --output text | python3 -c "import json,sys,os; d=json.loads(sys.stdin.read()); print('downstream=', d.get('config_version_id')==os.environ['CVID'])"

Expected: downstream= True

Section F — Mapping to #91 `Done When`¶

Every checkbox in the original issue maps to one or more checks above. Two criteria carry caveats (#9 and #27, documented against the implementation as shipped) and three live outside the deployed environment (#2, #3 and #5, verified by CI on PR #11 in the strategies repo).

#	Acceptance criterion	Check(s)	Status
1	Strategy created in `tradai-strategies/strategies/e2e-test-strategy/`	A.3 + full Scenario B	Automated
2	Unit tests pass	`gh pr checks 11 --repo tradai-bot/strategies`	Manual — CI on PR #11
3	Lint + typecheck pass	same as #2	Manual — CI on PR #11
4	Docker image built + pushed to ECR	A.3	Automated
5	Smoke backtest completes locally	PR #11 CI	Manual — CI on PR #11
6	Workflow completes successfully	C.1 + C.2	Automated
7	ECS training task runs without errors	C.5	Automated
8	Training completes in ~5 minutes	C.4	Automated
9	MLflow experiment created for E2ETestStrategy	C.10 + C.11	Automated; the per-strategy experiment `E2ETestStrategy_training` is created automatically
10	Model metrics + params logged in MLflow	C.11	Automated (metrics present; `params` dict is empty in the current training path — metrics carry the learning signal)
11	Model artefacts stored in S3	C.12	Automated
12	Feature importance stored in MLflow run	C.12 (part of `artifacts/model/` dump)	Automated
13	Model registered in MLflow Model Registry	C.13	Automated
14	`CompareModels` returns decision + confidence	C.6	Automated
15	`DecidePromotion` routes correctly	C.7	Automated
16	`NotifyCompletion` fires `retraining_success`	C.8	Automated
17	Skip when `force=false` + fresh model	D.1–D.3	Automated
18	Bad input triggers `NotifyFailure`	B.2 + B.7–B.9	Automated (regex violation B.2, ALLOWED_MODELS rejection B.7–B.9)
19	`NotifyFailure` fires `retraining_failed` + error details	B.4 + B.5	Automated
20	No orphaned ECS tasks after failure	B.6	Automated
21	`config_version_id` defaults to `""` when absent	C.3	Automated
22	`config_version_id` passes through when supplied	E.2 + E.3	Automated
23	Step Functions visual workflow correct	C.2 / B.2 / D.1 (the Console visual is a rendering of `get-execution-history`)	Automated
24	CloudWatch logs clean	C.5	Automated
25	MLflow UI shows experiment/run/metrics/model	C.10 + C.11 + C.13 (REST API queries return the same data the UI renders)	Automated
26	S3 bucket contains model artefacts at expected path	C.12	Automated
27	DynamoDB `tradai-models-{env}` updated with job status	A.4 + C.9 + C.9b	Caveat — `tradai-models-dev` does not exist; job status is tracked in `tradai-workflow-state-dev` (C.9b, keyed by execution ARN, `status=completed`), and per-model `last_retrained` gate is in `tradai-retraining-state-dev` (C.9). Issue text references a table name that was renamed during implementation.
28	Execution ARN of a successful run	`$ARN_B` from C.1	Automated
29	MLflow experiment screenshot	C.10 + C.11 — REST output replaces a UI screenshot	Automated (no literal screenshot)
30	Model Registry screenshot	C.13	Automated (no literal screenshot)
31	Document issues + fixes	#366 filed; fix in PR #367	Done
32	Update runbook / docs	This document; `docs/runbooks/retraining-workflow.md` references #366 fix location	Done

Pass criteria¶

Issue #91 is verified when every command above prints the exact Expected text. Alternatively, run docs/verification/issue91-verify.sh which automates all checks and prints a PASS/FAIL summary.

Caveat rows (F.9, F.27) document where the implementation-as-shipped legitimately diverged from the original ticket wording — accept those caveats or revise the ticket before closure. Manual rows (F.2, F.3, F.5) are verified by CI on the strategies-repo PR #11 and are not re-runnable against deployed dev.

Section G — Gap regression checks (from independent verification 2026-04-23)¶

Added after the independent verification in issue91-independent-verification-20260423.md surfaced three code gaps that the A–E checks did not catch: hyperparameters never logged as MLflow params (DW#10), feature importance absent from artefacts (DW#12), and SNS failure notifications lacking error details (DW#19). Each sub-check below maps 1:1 to the fix in PR fix/91-verification-gaps.

The checks below assume ARN_B (happy-path execution), EXP_ID and RUN_ID are already exported from §C, and ARN_A (failure-path) from §B. If you restart the shell, re-export them and redefine mlflow_call from Setup:

export ARN_B=...  # from §C.1
export ARN_A=...  # from §B.1
export EXP_ID=... # from §C.10
export RUN_ID=... # from §C.12

G.1 (DW#10) MLflow run has non-zero `params` — hyperparameters logged¶

mlflow_call GET "/ajax-api/2.0/mlflow/runs/get?run_id=${RUN_ID}" | python3 -c "import json,sys; r=json.load(sys.stdin)['run']; params={p['key']:p['value'] for p in r['data'].get('params',[])}; print('params_count=' + str(len(params)) + ' | has_freqai_n_estimators=' + str(any(k.endswith('n_estimators') for k in params)))"

Expected: params_count=<N ≥ 5> | has_freqai_n_estimators=True. The base TrainingConfig params (timeframe, periods, pairs, freqai_model) plus the freqai model_training_parameters must be present.

G.2 (DW#12) `feature_importance.json` is present at the expected S3 path¶

aws s3 ls "s3://tradai-mlflow-dev/artifacts/${EXP_ID}/${RUN_ID}/artifacts/model/feature_importance.json" --region "$AWS_REGION" --human-readable | awk '{print $3, $4, $5}'

Expected: a single line with a human-readable size (e.g. 512 Bytes) and the literal filename feature_importance.json. Empty output = fail.

G.3 (DW#12) `feature_importance.json` contents reference `rsi` or `sma-ratio`¶

The artefact is always written; when booster scores are absent, it carries a training_features_list populated from FreqAI per-sub-train metadata.

aws s3 cp "s3://tradai-mlflow-dev/artifacts/${EXP_ID}/${RUN_ID}/artifacts/model/feature_importance.json" - --region "$AWS_REGION" | python3 -c "import json,sys; d=json.load(sys.stdin); feats=list(d.get('feature_importance', {}).keys()) + d.get('training_features_list', []); flat=' '.join(feats).lower(); print('has_rsi=' + str('rsi' in flat) + ' | has_sma_ratio=' + str('sma-ratio' in flat or 'sma_ratio' in flat))"

Expected: has_rsi=True | has_sma_ratio=True
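
The check accepts either shape of the artefact: booster scores under `feature_importance`, or the fallback `training_features_list`. The sketch below shows both shapes passing through the same flattening step. The feature names and scores are made up for illustration, and the helper names are hypothetical.

```python
def feature_names(doc):
    """Merge keys of feature_importance with training_features_list."""
    names = list(doc.get("feature_importance", {}).keys())
    names += doc.get("training_features_list", [])
    return names


def mentions(names, *needles):
    """True if any needle occurs in the lower-cased, space-joined names."""
    flat = " ".join(names).lower()
    return any(needle in flat for needle in needles)


# Hypothetical artefact contents for the two cases.
with_scores = {"feature_importance": {"%-rsi-period_10": 0.6, "%-sma-ratio_20": 0.4}}
fallback = {"feature_importance": {},
            "training_features_list": ["%-RSI-period_10", "%-sma_ratio_20"]}

for doc in (with_scores, fallback):
    names = feature_names(doc)
    print("has_rsi=" + str(mentions(names, "rsi"))
          + " | has_sma_ratio=" + str(mentions(names, "sma-ratio", "sma_ratio")))
```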

G.4 (DW#19) Failure-path SNS body includes error Cause¶

_send_sns_notification sends the body via AlertPublisher.publish; the SNS message content itself is NOT in CloudWatch. The Lambda emits a dedicated INFO marker, "SNS failure body includes error details: …", ONLY when _extract_error_lines produced at least one line. That marker is what we grep for.

MSYS_NO_PATHCONV=1 aws logs filter-log-events --log-group-name /aws/lambda/tradai-notify-completion-dev --region "$AWS_REGION" --start-time $(python3 -c "import time; print(int((time.time()-3600)*1000))") --filter-pattern 'SNS failure body includes error details' --no-paginate --query 'length(events)' --output text

Expected: a count ≥ 1 — at least one failure invocation in the last hour rendered an error block. 0 means either no failure ran in the window, or the lambda is running pre-fix code (no marker emitted).
