Summary¶
Infrastructure audit comparing issue acceptance criteria against the real AWS dev environment (account 600802701449, eu-central-1) rather than against code or reports.
Audit date: 2026-04-27
| Issue | Title | State | Real Status |
|---|---|---|---|
| #89 | Deploy dev environment to AWS | CLOSED | PARTIALLY RESOLVED |
| #90 | E2E test backtest flow in AWS | CLOSED | PARTIALLY RESOLVED |
| #91 | E2E test model training flow in AWS | CLOSED | RESOLVED |
| #92 | E2E test promotion and rollback flow | OPEN | NOT RESOLVED (correctly open) |
| #308 | Config versioning E2E epic | CLOSED | RESOLVED |
| #309 | Config versioning: Backend API + DI | CLOSED | RESOLVED |
| #310 | Config versioning: CLI commands | CLOSED | RESOLVED (cannot verify CLI from AWS) |
| #311 | Config versioning: Pipeline data flow | CLOSED | RESOLVED |
| #312 | Config versioning: Training pipeline | CLOSED | RESOLVED |
| #313 | Config versioning: LocalStack + tests | CLOSED | RESOLVED |
#89 - Deploy dev environment to AWS - PARTIALLY RESOLVED¶
What the issue requires vs reality¶
| Requirement | Issue Says | Real Infrastructure | Match |
|---|---|---|---|
| ECS Services | 7 on Fargate (FARGATE_SPOT + FARGATE) | 2 ECS services (live-trading, dry-run-trading). 4 main services run on consolidated EC2 t3.small via Docker Compose | CHANGED |
| Lambda Functions | 16 | 18 (16 original + update-nat-routes + 1 more) | BETTER |
| DynamoDB Tables | "12 created, 1 missing" (tradai-models-dev) | 12 tables exist. tradai-models-dev STILL MISSING | SAME GAP |
| S3 Buckets | 5 | 5 | OK |
| RDS | PostgreSQL for MLflow | db.t4g.micro, PostgreSQL 15.13, available | OK |
| ALB | With SSL | Active, DNS resolves, targets healthy | OK |
| Cognito | User pool | tradai-users-dev exists | OK |
| CodeArtifact | Repository | tradai-python-dev exists | OK |
Issues¶
- Architecture changed: the issue describes a Fargate deployment, but the actual dev environment runs on a consolidated EC2 instance (cost optimization). Not a bug, but the issue's verification commands won't find the 4 main services via `aws ecs list-services`.
- `tradai-models-dev` DynamoDB table still does not exist - mentioned as "1 missing" in the issue body, never created. Required by #92 (promotion/rollback).
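The missing-table gap can be re-checked with a small script. The following is a minimal sketch, assuming a boto3-style DynamoDB client is passed in; the helper names `table_exists` and `missing_tables` are illustrative and not part of the audited codebase.

```python
def table_exists(dynamodb_client, table_name):
    """Return True if describe_table succeeds, False if the table is not found."""
    try:
        dynamodb_client.describe_table(TableName=table_name)
    except dynamodb_client.exceptions.ResourceNotFoundException:
        # Same error the audit observed for tradai-models-dev.
        return False
    return True


def missing_tables(dynamodb_client, expected_names):
    """Return the expected table names that do not exist, in input order."""
    return [name for name in expected_names if not table_exists(dynamodb_client, name)]
```

With boto3 installed and credentials for the dev account, the client would typically come from `boto3.client("dynamodb", region_name="eu-central-1")`, and `missing_tables(client, ["tradai-models-dev"])` would return that name for as long as the table is absent.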
#90 - E2E test backtest flow in AWS - PARTIALLY RESOLVED¶
Acceptance criteria verification¶
| Criterion | Evidence | Verified |
|---|---|---|
| Backtest submitted successfully via API | API GW route POST /api/v1/backtests exists with JWT auth | Route exists, but SQS path never used |
| Step Functions execution completes | 5 SUCCEEDED executions (Apr 10-12) | Yes, but triggered directly, not via SQS |
| Lambda/backtest execution runs without errors | tradai-backtest-consumer-dev has 0 bytes in CloudWatch logs | NEVER INVOKED |
| Results stored in DynamoDB | 65 items in tradai-workflow-state-dev | Yes |
| Artifacts uploaded to S3 | 4 files in s3://tradai-results-dev/backtests/StochRsiStrategy/ | Yes |
| Results retrievable via API | Backend health shows all services healthy | Likely yes |
| Equity data retrievable via API | Not tested in this audit | Unknown |
Critical finding¶
The full E2E pipeline was NEVER tested through the production path (API Gateway -> SQS -> backtest-consumer Lambda -> Step Functions).
Evidence:
- `tradai-backtest-consumer-dev` Lambda: 0 bytes in CloudWatch logs (never invoked)
- SQS queue: 0 messages currently, no messages-not-visible
- DynamoDB: 0 items with SQS-originated run IDs
- Step Functions executions were triggered via direct StartExecution API calls (names like backtest-6a89061f-...), not via the Lambda consumer
The backtest workflow itself works (Step Functions -> ECS -> S3 -> DynamoDB), but the entry point (API -> SQS -> Lambda) remains untested.
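The "never invoked" evidence can be gathered programmatically. This is a rough sketch, assuming boto3-style CloudWatch Logs and SQS clients and the default `/aws/lambda/<function-name>` log group naming; the function names and the queue URL argument are placeholders, not values taken from the environment.

```python
def lambda_log_bytes(logs_client, function_name):
    """Return storedBytes for the function's default log group, or None if no group exists."""
    group_name = f"/aws/lambda/{function_name}"  # assumed default naming
    response = logs_client.describe_log_groups(logGroupNamePrefix=group_name)
    for group in response.get("logGroups", []):
        if group["logGroupName"] == group_name:
            return group.get("storedBytes", 0)
    return None


def queue_depth(sqs_client, queue_url):
    """Return (visible, in_flight) approximate message counts for a queue."""
    attrs = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",
            "ApproximateNumberOfMessagesNotVisible",
        ],
    )["Attributes"]
    return (
        int(attrs["ApproximateNumberOfMessages"]),
        int(attrs["ApproximateNumberOfMessagesNotVisible"]),
    )
```

A result of `0` (or `None`) from `lambda_log_bytes` for `tradai-backtest-consumer-dev`, together with a `(0, 0)` queue depth, is consistent with the consumer path never having been exercised, though neither value proves it on its own.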
#91 - E2E test model training flow in AWS - RESOLVED¶
Evidence¶
| Criterion | Evidence | Status |
|---|---|---|
| E2ETestStrategy in ECR | 30+ images, latest + 0.1.0 tags, last push 2026-04-23 | OK |
| Retraining workflow executes | 10+ SUCCEEDED executions on Apr 23 (verify-happy, verify-skip, verify-invalid, verify-allowlist, verify-cv) | OK |
| MLflow running | Responds on ALB at /mlflow/health | OK |
| MLflow experiments | E2ETestStrategy_training experiment exists | OK |
| Model registered | E2ETestStrategy: 34 versions, latest v34 status=READY | OK |
| StochRsiStrategy tested | Also in ECR with latest tag, backtest results in S3 | OK |
Notes¶
- Very thorough testing was done: multiple verification runs with different scenarios (happy path, skip, invalid input, allowlist, config version)
- Both `E2ETestStrategy` and `StochRsiStrategy` strategies have been validated
#92 - E2E test promotion and rollback flow - NOT RESOLVED (correctly OPEN)¶
Evidence¶
| Requirement | Status | Evidence |
|---|---|---|
| tradai-models-dev DynamoDB table | MISSING | ResourceNotFoundException on describe-table |
| tradai-rollback-state-dev table | Exists | 0 items (never used) |
| tradai-deployments-dev table | Exists | 0 items (never used) |
| promote-model Lambda | Exists | 0 bytes in logs (never invoked) |
| model-rollback Lambda | Exists | 0 bytes in logs (never invoked) |
| compare-models Lambda | Exists | Has been invoked (during retraining workflow) |
| Models promoted to Production | NONE | All models at stage None |
| Models promoted to Staging | NONE | E2ETestStrategy v34: current_stage=None, aliases=null |
Blockers¶
- Missing `tradai-models-dev` table - required for model stage tracking
- No model has ever been promoted - E2ETestStrategy has 34 versions, all at stage "None"
- promote-model and model-rollback Lambdas never executed - the promotion/rollback lifecycle is completely untested
- Note: MLflow 3.x uses aliases instead of stages, so `current_stage=None` might be expected, but no aliases are set either (`aliases=null`)
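Because a model may be promoted through either a stage or an alias, a check for "never promoted" has to look at both fields. The sketch below assumes model-version records have already been exported as plain dictionaries with `version`, `current_stage`, and `aliases` keys (the field names shown in the evidence table); how those records are fetched from MLflow is left out, and `promoted_versions` is a hypothetical helper.

```python
def promoted_versions(records):
    """Return the versions that carry a real stage or at least one alias."""
    promoted = []
    for record in records:
        stage = record.get("current_stage")
        aliases = record.get("aliases") or []  # treat null/None as "no aliases"
        has_stage = stage not in (None, "", "None")  # "None" is the unstaged label
        if has_stage or aliases:
            promoted.append(record["version"])
    return promoted
```

An empty result for all 34 `E2ETestStrategy` versions would match the audit finding that nothing has been promoted under either mechanism.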
#308-313 - Config Versioning - ALL RESOLVED¶
#308 (E2E epic) - Evidence¶
| Gap from issue | Status | Evidence |
|---|---|---|
| No backend API routes | FIXED | GET /api/v1/configs/test returns {"strategy_name":"test","items":[],"total":0} |
| ConfigVersionService not wired | FIXED | 2 config versions exist in DynamoDB (created via wired service) |
| No CLI commands | FIXED | Config versions with real data exist (StochRsiStrategy v1, v2) |
| No --config-version flag | FIXED | MLflow experiments named e2e-config-version-test-* exist |
| BacktestHandler does not write config_version_id | FIXED | E2E tests passed |
| config_data not applied | FIXED | Verified in E2E runs |
| Retraining workflow missing CONFIG_VERSION_ID | FIXED | CONFIG_VERSION_ID found in deployed retraining workflow ASL (1 occurrence) |
| LocalStack table missing | FIXED | config-versions appears 7 times in init-aws.sh |
| No integration/e2e tests | FIXED | MLflow has dedicated test experiments |
#309 (Backend API + DI) - Evidence¶
- `GET /api/v1/configs/{strategy}` returns valid JSON response
- Config versions exist in DynamoDB -> DI + service wiring works
#310 (CLI commands) - Evidence¶
- DynamoDB has 2 config versions for StochRsiStrategy with real lifecycle (v1=deprecated, v2=active)
- This data was created via CLI/API -> commands exist
#311 (Pipeline data flow) - Evidence¶
- `CONFIG_VERSION_ID` in backtest workflow ASL: 1 occurrence
- MLflow experiments with config version test prefix exist
#312 (Training pipeline) - Evidence¶
- `CONFIG_VERSION_ID` in retraining workflow ASL: 1 occurrence
- Retraining execution `verify-cv-verify-91-*` (config version verification) succeeded
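The occurrence counts in the workflow definitions can be reproduced with a few lines. This is a sketch assuming a boto3-style Step Functions client; the state machine ARN is supplied by the caller, and the function names are illustrative.

```python
def count_token(definition, token="CONFIG_VERSION_ID"):
    """Count non-overlapping occurrences of token in an ASL definition string."""
    return definition.count(token)


def workflow_token_count(sfn_client, state_machine_arn, token="CONFIG_VERSION_ID"):
    """Fetch the deployed definition and count how often token appears in it."""
    response = sfn_client.describe_state_machine(stateMachineArn=state_machine_arn)
    return count_token(response["definition"], token)
```

A count of at least 1 only shows that the identifier is present in the deployed definition; it does not by itself show that the value is propagated correctly at runtime.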
#313 (LocalStack + tests) - Evidence¶
- LocalStack `init-aws.sh`: 7 references to config-versions
- DynamoDB table exists with correct GSIs (`config_hash-index`, `status-index`)
- E2E experiments in MLflow confirm tests were executed
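The GSI check can be expressed as a pure function over a `describe_table` response. A minimal sketch, assuming the response has the standard `Table.GlobalSecondaryIndexes[].IndexName` shape; `missing_gsis` is a hypothetical helper name.

```python
def missing_gsis(describe_table_response, required_indexes):
    """Return the required index names absent from a describe_table response."""
    table = describe_table_response.get("Table", {})
    present = {gsi["IndexName"] for gsi in table.get("GlobalSecondaryIndexes", [])}
    return [name for name in required_indexes if name not in present]
```

Calling it with `["config_hash-index", "status-index"]` against the config versions table should return an empty list if both indexes are in place.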
Action Items¶
Must fix (from closed issues with gaps)¶
- Create `tradai-models-dev` DynamoDB table
    - Missing since #89, blocks #92
    - Add to Pulumi persistent stack
    - Required for model promotion/rollback lifecycle
- Test API Gateway -> SQS -> backtest-consumer -> Step Functions path (#90)
    - Submit a backtest through `POST https://z9uaqcerrd.execute-api.eu-central-1.amazonaws.com/api/v1/backtests`
    - Verify backtest-consumer Lambda picks it up
    - The current backtest flow bypasses SQS entirely
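To exercise the production entry point, a backtest has to be submitted through the API Gateway route with a JWT. The sketch below only builds the request and leaves sending to the caller. It assumes the authorizer reads a `Bearer` token from the `Authorization` header and that the body is JSON; the payload fields are not documented in this audit, so any payload passed in is hypothetical.

```python
import json
import urllib.request

BACKTESTS_URL = "https://z9uaqcerrd.execute-api.eu-central-1.amazonaws.com/api/v1/backtests"


def build_backtest_request(jwt_token, payload, url=BACKTESTS_URL):
    """Build (but do not send) a POST request for the backtests route."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Assumption: the API Gateway JWT authorizer expects a Bearer token here.
            "Authorization": f"Bearer {jwt_token}",
            "Content-Type": "application/json",
        },
    )
```

The request can then be sent with `urllib.request.urlopen(request)`. After submission, the consumer Lambda's log group and the SQS queue are the places to look for evidence that the message actually travelled through the queue.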
Should fix¶
- Complete #92 - Test full promotion/rollback lifecycle
    - Promote E2ETestStrategy to Staging/Production
    - Test model-rollback Lambda
    - Verify DynamoDB state tracking
Consider¶
- Update #89 to reflect consolidated EC2 architecture change
    - Issue describes Fargate, reality uses EC2 for dev cost savings
    - Not a regression, but documentation should match
Related¶
- #394 - Dev environment health audit (broader infrastructure issues)
- Audit documents: `docs/verification/dev-environment-audit.md`, `docs/verification/dev-environment-health-audit-20260427.md`