Summary

This infrastructure audit compares each issue's acceptance criteria against the live AWS dev environment (account 600802701449, eu-central-1), rather than against code or reports.

Audit date: 2026-04-27

| Issue | Title | State | Real Status |
|---|---|---|---|
| #89 | Deploy dev environment to AWS | CLOSED | PARTIALLY RESOLVED |
| #90 | E2E test backtest flow in AWS | CLOSED | PARTIALLY RESOLVED |
| #91 | E2E test model training flow in AWS | CLOSED | RESOLVED |
| #92 | E2E test promotion and rollback flow | OPEN | NOT RESOLVED (correctly open) |
| #308 | Config versioning E2E epic | CLOSED | RESOLVED |
| #309 | Config versioning: Backend API + DI | CLOSED | RESOLVED |
| #310 | Config versioning: CLI commands | CLOSED | RESOLVED (cannot verify CLI from AWS) |
| #311 | Config versioning: Pipeline data flow | CLOSED | RESOLVED |
| #312 | Config versioning: Training pipeline | CLOSED | RESOLVED |
| #313 | Config versioning: LocalStack + tests | CLOSED | RESOLVED |

#89 - Deploy dev environment to AWS - PARTIALLY RESOLVED

What the issue requires vs reality

| Requirement | Issue Says | Real Infrastructure | Match |
|---|---|---|---|
| ECS Services | 7 on Fargate (FARGATE_SPOT + FARGATE) | 2 ECS services (live-trading, dry-run-trading). 4 main services run on consolidated EC2 t3.small via Docker Compose | CHANGED |
| Lambda Functions | 16 | 18 (16 original + update-nat-routes + 1 more) | BETTER |
| DynamoDB Tables | "12 created, 1 missing" (tradai-models-dev) | 12 tables exist. tradai-models-dev STILL MISSING | SAME GAP |
| S3 Buckets | 5 | 5 | OK |
| RDS | PostgreSQL for MLflow | db.t4g.micro, PostgreSQL 15.13, available | OK |
| ALB | With SSL | Active, DNS resolves, targets healthy | OK |
| Cognito | User pool | tradai-users-dev exists | OK |
| CodeArtifact | Repository | tradai-python-dev exists | OK |

Issues

  1. Architecture changed: the issue describes a Fargate deployment, but the actual dev environment runs the 4 main services on a consolidated EC2 instance (cost optimization). This is not a bug, but the verification commands in the issue will not find those 4 services via aws ecs list-services.

  2. tradai-models-dev DynamoDB table still does not exist - mentioned as "1 missing" in the issue body, never created. Required by #92 (promotion/rollback).
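The missing-table finding can be re-checked with a short script. The following is a minimal sketch, assuming a boto3 DynamoDB client is passed in by the caller; the helper name table_exists is hypothetical and not part of the project.

```python
def table_exists(dynamodb_client, table_name):
    """Return True if describe_table succeeds, False if the table is not found.

    dynamodb_client is expected to behave like boto3.client("dynamodb").
    """
    try:
        dynamodb_client.describe_table(TableName=table_name)
        return True
    except dynamodb_client.exceptions.ResourceNotFoundException:
        return False


# Example usage (assumes boto3 is installed and dev credentials are configured):
#
#   import boto3
#   client = boto3.client("dynamodb", region_name="eu-central-1")
#   for name in ("tradai-models-dev", "tradai-rollback-state-dev"):
#       print(name, "exists" if table_exists(client, name) else "MISSING")
```

Passing the client in keeps the check easy to run against LocalStack as well, since only the client construction would differ.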


#90 - E2E test backtest flow in AWS - PARTIALLY RESOLVED

Acceptance criteria verification

| Criterion | Evidence | Verified |
|---|---|---|
| Backtest submitted successfully via API | API GW route POST /api/v1/backtests exists with JWT auth | Route exists, but SQS path never used |
| Step Functions execution completes | 5 SUCCEEDED executions (Apr 10-12) | Yes, but triggered directly, not via SQS |
| Lambda/backtest execution runs without errors | tradai-backtest-consumer-dev has 0 bytes in CloudWatch logs | NEVER INVOKED |
| Results stored in DynamoDB | 65 items in tradai-workflow-state-dev | Yes |
| Artifacts uploaded to S3 | 4 files in s3://tradai-results-dev/backtests/StochRsiStrategy/ | Yes |
| Results retrievable via API | Backend health shows all services healthy | Likely yes |
| Equity data retrievable via API | Not tested in this audit | Unknown |

Critical finding

The full E2E pipeline was NEVER tested through the production path:

API Gateway -> SQS FIFO -> backtest-consumer Lambda -> Step Functions

Evidence:

  • tradai-backtest-consumer-dev Lambda: 0 bytes in CloudWatch logs (never invoked)
  • SQS queue: 0 messages currently, no messages-not-visible
  • DynamoDB: 0 items with SQS-originated run IDs
  • Step Functions executions were triggered via direct StartExecution API calls (names like backtest-6a89061f-...), not via the Lambda consumer

The backtest workflow itself works (Step Functions -> ECS -> S3 -> DynamoDB), but the entry point (API -> SQS -> Lambda) remains untested.
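The "never invoked" evidence rests on the Lambda's log group holding 0 bytes. A rough sketch of that check follows, assuming a boto3 CloudWatch Logs client is passed in and that the function uses the default /aws/lambda/<function-name> log group; both helper names are hypothetical. Note that an empty log group can also result from log expiry, so this is an indicator rather than proof.

```python
def lambda_log_bytes(logs_client, function_name):
    """Return storedBytes for the function's log group, or None if no group exists.

    Assumes the default Lambda log group name; pagination is ignored because
    a full-name prefix normally matches only a handful of groups.
    """
    group_name = f"/aws/lambda/{function_name}"
    response = logs_client.describe_log_groups(logGroupNamePrefix=group_name)
    for group in response.get("logGroups", []):
        if group["logGroupName"] == group_name:
            return group.get("storedBytes", 0)
    return None


def was_ever_invoked(logs_client, function_name):
    """Treat a missing or empty log group as 'no recorded invocation'."""
    return bool(lambda_log_bytes(logs_client, function_name))


# Example usage (assumes boto3 is installed and dev credentials are configured):
#
#   import boto3
#   logs = boto3.client("logs", region_name="eu-central-1")
#   print(was_ever_invoked(logs, "tradai-backtest-consumer-dev"))
```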


#91 - E2E test model training flow in AWS - RESOLVED

Evidence

| Criterion | Evidence | Status |
|---|---|---|
| E2ETestStrategy in ECR | 30+ images, latest + 0.1.0 tags, last push 2026-04-23 | OK |
| Retraining workflow executes | 10+ SUCCEEDED executions on Apr 23 (verify-happy, verify-skip, verify-invalid, verify-allowlist, verify-cv) | OK |
| MLflow running | Responds on ALB at /mlflow/health | OK |
| MLflow experiments | E2ETestStrategy_training experiment exists | OK |
| Model registered | E2ETestStrategy: 34 versions, latest v34 status=READY | OK |
| StochRsiStrategy tested | Also in ECR with latest tag, backtest results in S3 | OK |

Notes

  • Testing was thorough: multiple verification runs covered different scenarios (happy path, skip, invalid input, allowlist, config version)
  • Both E2ETestStrategy and StochRsiStrategy have been validated

#92 - E2E test promotion and rollback flow - NOT RESOLVED (correctly OPEN)

Evidence

| Requirement | Status | Evidence |
|---|---|---|
| tradai-models-dev DynamoDB table | MISSING | ResourceNotFoundException on describe-table |
| tradai-rollback-state-dev table | Exists | 0 items (never used) |
| tradai-deployments-dev table | Exists | 0 items (never used) |
| promote-model Lambda | Exists | 0 bytes in logs (never invoked) |
| model-rollback Lambda | Exists | 0 bytes in logs (never invoked) |
| compare-models Lambda | Exists | Has been invoked (during retraining workflow) |
| Models promoted to Production | NONE | All models at stage None |
| Models promoted to Staging | NONE | E2ETestStrategy v34: current_stage=None, aliases=null |

Blockers

  1. Missing tradai-models-dev table - required for model stage tracking
  2. No model has ever been promoted - E2ETestStrategy has 34 versions, all at stage "None"
  3. promote-model and model-rollback Lambdas never executed - the promotion/rollback lifecycle is completely untested
  4. Note: MLflow 3.x uses aliases instead of stages, so current_stage=None might be expected, but no aliases are set either (aliases=null)
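Because promotion may show up either as a legacy stage or as an alias, a check should look at both. The sketch below assumes an object shaped like mlflow.tracking.MlflowClient is passed in, that get_registered_model returns an entity with an aliases mapping (alias to version), and that model versions expose current_stage; the helper name promotion_state is hypothetical.

```python
def promotion_state(mlflow_client, model_name):
    """Collect any aliases and any non-'None' stages for a registered model.

    Returns {"aliases": {alias: version}, "staged": {version: stage}}.
    Both dicts empty means nothing has been promoted.
    """
    registered = mlflow_client.get_registered_model(model_name)
    aliases = dict(getattr(registered, "aliases", None) or {})

    staged = {}
    for mv in mlflow_client.search_model_versions(f"name='{model_name}'"):
        stage = getattr(mv, "current_stage", None)
        if stage not in (None, "None"):
            staged[str(mv.version)] = stage
    return {"aliases": aliases, "staged": staged}


# Example usage (assumes mlflow is installed and the tracking URI points at dev):
#
#   from mlflow.tracking import MlflowClient
#   state = promotion_state(MlflowClient(), "E2ETestStrategy")
#   print("promoted" if state["aliases"] or state["staged"] else "never promoted")
```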

#308-313 - Config Versioning - ALL RESOLVED

#308 (E2E epic) - Evidence

| Gap from issue | Status | Evidence |
|---|---|---|
| No backend API routes | FIXED | GET /api/v1/configs/test returns {"strategy_name":"test","items":[],"total":0} |
| ConfigVersionService not wired | FIXED | 2 config versions exist in DynamoDB (created via wired service) |
| No CLI commands | FIXED | Config versions with real data exist (StochRsiStrategy v1, v2) |
| No --config-version flag | FIXED | MLflow experiments named e2e-config-version-test-* exist |
| BacktestHandler does not write config_version_id | FIXED | E2E tests passed |
| config_data not applied | FIXED | Verified in E2E runs |
| Retraining workflow missing CONFIG_VERSION_ID | FIXED | CONFIG_VERSION_ID found in deployed retraining workflow ASL (1 occurrence) |
| LocalStack table missing | FIXED | config-versions appears 7 times in init-aws.sh |
| No integration/e2e tests | FIXED | MLflow has dedicated test experiments |

#309 (Backend API + DI) - Evidence

  • GET /api/v1/configs/{strategy} returns valid JSON response
  • Config versions exist in DynamoDB -> DI + service wiring works

#310 (CLI commands) - Evidence

  • DynamoDB has 2 config versions for StochRsiStrategy with real lifecycle (v1=deprecated, v2=active)
  • This data was created via the CLI or the API, which suggests the commands exist; the CLI itself cannot be verified from AWS

#311 (Pipeline data flow) - Evidence

  • CONFIG_VERSION_ID in backtest workflow ASL: 1 occurrence
  • MLflow experiments with config version test prefix exist

#312 (Training pipeline) - Evidence

  • CONFIG_VERSION_ID in retraining workflow ASL: 1 occurrence
  • Retraining execution verify-cv-verify-91-* (config version verification) succeeded
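The occurrence counts cited for #311 and #312 can be reproduced by fetching the deployed state machine definition and counting the substring. This is a minimal sketch, assuming a boto3 Step Functions client is passed in; the helper name and the ARN placeholder in the usage comment are hypothetical.

```python
def count_in_definition(sfn_client, state_machine_arn, needle="CONFIG_VERSION_ID"):
    """Count case-sensitive occurrences of needle in the deployed ASL definition."""
    response = sfn_client.describe_state_machine(stateMachineArn=state_machine_arn)
    return response["definition"].count(needle)


# Example usage (assumes boto3 is installed; replace the ARN with the real one):
#
#   import boto3
#   sfn = boto3.client("stepfunctions", region_name="eu-central-1")
#   print(count_in_definition(sfn, "<retraining-state-machine-arn>"))
```

A substring count only shows that the variable is referenced in the definition; the verify-cv execution remains the evidence that it is actually passed through at runtime.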

#313 (LocalStack + tests) - Evidence

  • LocalStack init-aws.sh: 7 references to config-versions
  • DynamoDB table exists with correct GSIs (config_hash-index, status-index)
  • E2E experiments in MLflow confirm tests were executed

Action Items

Must fix (from closed issues with gaps)

  1. Create tradai-models-dev DynamoDB table - Missing since #89, blocks #92
     • Add to Pulumi persistent stack
     • Required for model promotion/rollback lifecycle

  2. Test API Gateway -> SQS -> backtest-consumer -> Step Functions path (#90)
     • Submit a backtest through POST https://z9uaqcerrd.execute-api.eu-central-1.amazonaws.com/api/v1/backtests
     • Verify backtest-consumer Lambda picks it up
     • The current backtest flow bypasses SQS entirely
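Submitting through the API Gateway route could look roughly like the sketch below, which only builds the request so it can be inspected before sending. The payload fields and the Bearer scheme for the JWT are assumptions, since the audit does not document the request schema; build_backtest_request is a hypothetical helper.

```python
import json
import urllib.request

BACKTEST_URL = "https://z9uaqcerrd.execute-api.eu-central-1.amazonaws.com/api/v1/backtests"


def build_backtest_request(jwt_token, payload):
    """Build a POST request carrying a JSON body and a JWT in the Authorization header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BACKTEST_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {jwt_token}",  # assumed scheme
            "Content-Type": "application/json",
        },
    )


# Example usage (payload keys are placeholders, not the real schema):
#
#   request = build_backtest_request(token, {"strategy": "StochRsiStrategy"})
#   with urllib.request.urlopen(request) as response:
#       print(response.status, response.read().decode("utf-8"))
```

After sending, the backtest-consumer log group should grow above 0 bytes if the SQS path is wired correctly.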

Should fix

  1. Complete #92 - Test full promotion/rollback lifecycle
     • Promote E2ETestStrategy to Staging/Production
     • Test model-rollback Lambda
     • Verify DynamoDB state tracking
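To verify state tracking after a promotion or rollback, the item counts of the state tables can be compared before and after. A minimal sketch, assuming a boto3 DynamoDB client is passed in; count_items is a hypothetical helper, and a full scan is only reasonable for small dev tables like these.

```python
def count_items(dynamodb_client, table_name):
    """Count items with a paginated Scan using Select=COUNT (no item data returned)."""
    total = 0
    kwargs = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = dynamodb_client.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total
        # Continue from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Example usage (assumes boto3 is installed and dev credentials are configured):
#
#   import boto3
#   client = boto3.client("dynamodb", region_name="eu-central-1")
#   for name in ("tradai-rollback-state-dev", "tradai-deployments-dev"):
#       print(name, count_items(client, name))
```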

Consider

  1. Update #89 to reflect consolidated EC2 architecture change
     • Issue describes Fargate, reality uses EC2 for dev cost savings
     • Not a regression, but documentation should match

  • #394 - Dev environment health audit (broader infrastructure issues)

  • Audit documents: docs/verification/dev-environment-audit.md, docs/verification/dev-environment-health-audit-20260427.md