TradAI Dev Environment — Full Component Map
Generated: 2026-04-27
Purpose: Complete audit of all AWS components in the dev environment
Stack Architecture (deploy order)
PERSISTENT → FOUNDATION → COMPUTE → EDGE
Region: eu-central-1, AWS Profile: tradai
1. PERSISTENT Stack (permanent data resources)
S3 Buckets (5)
| Bucket | Name | Versioning | Lifecycle |
| configs | tradai-configs-dev | Yes | None |
| results | tradai-results-dev | Yes | Glacier at 30 days |
| arcticdb | tradai-arcticdb-dev | Yes | None |
| logs | tradai-logs-dev | No | Delete at 90 days |
| mlflow | tradai-mlflow-dev | Yes | None |
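This is a minimal sketch of the two lifecycle rules in the table above. It writes them in the `LifecycleConfiguration` payload shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule IDs are hypothetical, and the Pulumi stack may express the same rules differently. The code only builds the payload and makes no AWS calls.

```python
from typing import Optional

# Day counts and storage class come from the bucket table; rule IDs are hypothetical.
RULES = {
    "tradai-results-dev": {
        "ID": "results-to-glacier",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix: apply to every object
        "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
    },
    "tradai-logs-dev": {
        "ID": "logs-expire",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Expiration": {"Days": 90},  # objects are deleted after 90 days
    },
}


def lifecycle_configuration(bucket: str) -> Optional[dict]:
    """Return a LifecycleConfiguration payload for `bucket`, or None if it has no rules."""
    rule = RULES.get(bucket)
    if rule is None:
        return None
    return {"Rules": [rule]}


if __name__ == "__main__":
    import json

    print(json.dumps(lifecycle_configuration("tradai-results-dev"), indent=2))
```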
DynamoDB Tables (12)
| Table | Purpose |
| tradai-workflow-state-dev | Backtest job tracking (GSI) |
| tradai-health-state-dev | Service health check state |
| tradai-trading-state-dev | Live trading heartbeats |
| tradai-deployments-dev | Deployment tracking |
| tradai-drift-state-dev | Model drift detection |
| tradai-retraining-state-dev | Retraining job tracking |
| tradai-rollback-state-dev | Model rollback history |
| tradai-shadow-test-state-dev | Shadow testing |
| tradai-notifications-dev | Notification state |
| tradai-idempotency-dev | Idempotency keys |
| tradai-infra-drift-state-dev | Pulumi drift detection |
| tradai-config-versions-dev | Config version registry |
ECR Repositories (24)
Services (6): tradai/backend, tradai/data-collection, tradai/strategy-service, tradai/mlflow, tradai/live-trading, tradai/dry-run-trading
Lambdas (18): tradai/lambda-base, lambda-backtest-consumer, lambda-sqs-consumer, lambda-orphan-scanner, lambda-health-check, lambda-trading-heartbeat-check, lambda-drift-monitor, lambda-retraining-scheduler, lambda-validate-strategy, lambda-data-collection-proxy, lambda-notify-completion, lambda-check-retraining-needed, lambda-compare-models, lambda-promote-model, lambda-model-rollback, lambda-cleanup-resources, lambda-update-status, lambda-pulumi-drift-detector
Cognito
- User Pool: tradai-users-dev
- MFA: Required (TOTP)
- Password: Min 12 chars, upper/lower/numbers/symbols
- Clients: Public (PKCE) + M2M (Client Credentials)
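The password policy above can be checked client-side before sign-up with a small validator. This is a sketch only. Cognito enforces the policy server-side, and `string.punctuation` is an assumed approximation of the special characters Cognito accepts, whose exact list is defined in the Cognito documentation.

```python
import string
from typing import List

MIN_LENGTH = 12  # minimum length of the tradai-users-dev password policy

# Assumption: string.punctuation approximates Cognito's accepted symbol set.
SYMBOLS = set(string.punctuation)


def policy_violations(password: str) -> List[str]:
    """Return the policy rules `password` breaks; an empty list means it passes."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append(f"shorter than {MIN_LENGTH} characters")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c in SYMBOLS for c in password):
        problems.append("no symbol")
    return problems
```

For example, `policy_violations("Tradai-Dev-2026")` returns `[]`.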
CodeArtifact
- Domain: tradai-dev
- Repository: tradai-dev
CloudTrail
- Audit logs delivered to the tradai-logs-dev bucket
2. FOUNDATION Stack (networking & data infrastructure)
VPC & Networking
| Resource | Details |
| VPC CIDR | 10.0.0.0/16 |
| AZs | eu-central-1a, eu-central-1b |
| Public Subnets | 10.0.1.0/24, 10.0.2.0/24 |
| Private Subnets | 10.0.11.0/24, 10.0.12.0/24 |
| Database Subnets | 10.0.21.0/24, 10.0.22.0/24 |
| NAT | t4g.nano instance (~$3/mo) |
| VPC Endpoints | S3, DynamoDB, ECR, CloudWatch, Secrets Manager, STS |
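The subnet plan can be sanity-checked with the standard `ipaddress` module. This sketch confirms that every subnet sits inside the VPC CIDR and that no two subnets overlap. It also reports usable addresses after the 5 that AWS reserves in every subnet. The subnet names and the "-a"/"-b" to AZ mapping are hypothetical labels, not names from the stack.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List

VPC = ipaddress.ip_network("10.0.0.0/16")

# Hypothetical names; assumes the first CIDR of each tier is in eu-central-1a.
SUBNETS: Dict[str, str] = {
    "public-a": "10.0.1.0/24",
    "public-b": "10.0.2.0/24",
    "private-a": "10.0.11.0/24",
    "private-b": "10.0.12.0/24",
    "database-a": "10.0.21.0/24",
    "database-b": "10.0.22.0/24",
}

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def layout_errors(vpc: ipaddress.IPv4Network, subnets: Dict[str, str]) -> List[str]:
    """List subnets outside the VPC and every overlapping pair."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    errors = []
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            errors.append(f"{name} ({net}) is outside {vpc}")
    for (name_a, a), (name_b, b) in combinations(nets.items(), 2):
        if a.overlaps(b):
            errors.append(f"{name_a} overlaps {name_b}")
    return errors


def usable_ips(cidr: str) -> int:
    """Addresses left for ENIs (tasks, Lambdas, RDS) after AWS reservations."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET
```

With the CIDRs above, `layout_errors(VPC, SUBNETS)` is empty, and each /24 leaves 251 usable addresses.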
Security Groups
ALB, ECS, Lambda, RDS, NAT, VPC Link, VPC Endpoints, EC2 Consolidated
RDS PostgreSQL
| Setting | Value |
| Engine | PostgreSQL 15.13 |
| Instance | db.t4g.micro |
| Database | mlflow |
| Storage | 20 GB |
| Multi-AZ | No |
| Backup | 7 days |
| SSL | Forced |
SQS
| Queue | Type | Details |
| tradai-backtest-queue-dev.fifo | FIFO | Content-based dedup, 15 min visibility timeout, 4d retention |
| tradai-backtest-dlq-dev.fifo | FIFO DLQ | Redrive after 3 receives (maxReceiveCount 3), 14d retention |
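With content-based deduplication, SQS derives the deduplication ID from a SHA-256 hash of the message body and ignores attributes. Within the 5-minute FIFO deduplication interval, a repeated body is accepted but not delivered again. The toy in-memory model below illustrates both that behavior and the redrive threshold. It is not an SQS client, and the class and function names are hypothetical.

```python
import hashlib
from typing import Dict

DEDUP_WINDOW_SECONDS = 5 * 60  # SQS FIFO deduplication interval
MAX_RECEIVE_COUNT = 3  # redrive threshold listed for the DLQ


def dedup_id(body: str) -> str:
    """Content-based deduplication ID: SHA-256 of the message body only."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class FifoDedupModel:
    """Hypothetical in-memory model of FIFO deduplication (no AWS calls)."""

    def __init__(self) -> None:
        self._first_seen: Dict[str, float] = {}  # dedup ID -> time first delivered

    def send(self, body: str, now: float) -> bool:
        """Return True if the message would be delivered, False if deduplicated."""
        key = dedup_id(body)
        first = self._first_seen.get(key)
        if first is not None and now - first < DEDUP_WINDOW_SECONDS:
            return False
        self._first_seen[key] = now
        return True


def goes_to_dlq(receive_count: int) -> bool:
    """SQS moves a message to the DLQ once its receive count exceeds maxReceiveCount."""
    return receive_count > MAX_RECEIVE_COUNT
```

One practical consequence: two byte-identical backtest submissions within 5 minutes collapse into a single job, so clients that intend to run the same request twice need some distinguishing field in the body.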
SNS Topics
| Topic | Purpose |
| tradai-alerts-dev | Service health alerts |
| tradai-registration-dev | Strategy registration events |
3. COMPUTE Stack (services & compute)
ECS Cluster
- Name: tradai-dev
- Container Insights: Enabled
- Log Group: /aws/ecs/tradai-dev (30d retention)
ECS Services
| Service | Port | CPU (units) / Mem (MiB) | Desired count | Health Check | Notes |
| backend-api | 8000 | 512/1024 | 1 | GET /api/v1/health | API gateway |
| data-collection | 8002 | 256/512 | 1 | GET /api/v1/health | OHLCV sync |
| strategy-service | 8003 | 512/1024 | 1 | GET /api/v1/health | Strategy mgmt |
| mlflow | 5000 | 512/1024 | 1 | GET /health | ML tracking |
| strategy-container | — | 1024/2048 | 0 | — | On-demand (backtests) |
| live-trading | 8004 | 1024/2048 | 0 | GET /api/v1/health | Live trading |
| dry-run-trading | 8005 | 1024/2048 | 0 | GET /api/v1/health | Paper trading (Spot) |
Dev mode: a single consolidated EC2 t3.small instance runs backend-api, data-collection, mlflow, and strategy-service instead of Fargate (~$15/mo vs ~$37/mo).
Service Discovery
Namespace: tradai-dev.local
- backend-api.tradai-dev.local:8000
- data-collection.tradai-dev.local:8002
- strategy-service.tradai-dev.local:8003
- mlflow.tradai-dev.local:5000
ALB (Application Load Balancer)
- Type: Internet-facing (public subnets)
- Routing:
  - /mlflow/* → mlflow:5000
  - /api/v1/live/* → live-trading:8004
  - /api/v1/dry-run/* → dry-run-trading:8005
  - /api/v1/* → backend-api:8000 (catch-all)
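The following is a minimal model of the ALB routing, assuming the rules are prioritized in the order listed. ALB evaluates listener rules by priority and applies the first match. `fnmatchcase` stands in for ALB path-pattern matching: both treat `*` as any run of characters, including `/`. Unlike ALB, fnmatch also understands `[...]` classes, but none of these patterns use them.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# (path pattern, target) in assumed priority order; the catch-all must be last.
RULES: List[Tuple[str, str]] = [
    ("/mlflow/*", "mlflow:5000"),
    ("/api/v1/live/*", "live-trading:8004"),
    ("/api/v1/dry-run/*", "dry-run-trading:8005"),
    ("/api/v1/*", "backend-api:8000"),
]


def resolve(path: str) -> Optional[str]:
    """Return the target of the first matching rule, or None (listener default action)."""
    for pattern, target in RULES:
        if fnmatchcase(path, pattern):
            return target
    return None
```

Under this model, `/api/v1/live` without a trailing slash does not match `/api/v1/live/*` and falls through to backend-api. If the catch-all were given a higher priority, it would also shadow the live and dry-run routes.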
Lambda Functions
Required (always deployed; triggered by SQS or EventBridge schedules)
| Lambda | Trigger | Mem/Timeout | VPC | Purpose |
| backtest-consumer | SQS | 256MB/30s | Yes | SQS → ECS/StepFunctions |
| health-check | EventBridge rate(2 min) | 256MB/60s | Yes | Ping all 4 services |
| orphan-scanner | EventBridge rate(5 min) | 128MB/60s | Yes | Find orphaned ECS tasks |
| trading-heartbeat-check | EventBridge rate(5 min) | 256MB/60s | Yes | Check trading heartbeats |
| drift-monitor | EventBridge rate(12 hours) | 512MB/120s | Yes | PSI model drift analysis |
| retraining-scheduler | EventBridge rate(6 hours) | 256MB/60s | Yes | Schedule retraining |
| pulumi-drift-detector | EventBridge rate(6 hours) | 512MB/300s | No | Infrastructure drift |
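The health-check Lambda's core loop can be sketched as follows. It builds each service's URL from its service discovery name and the health path in the ECS services table, then treats anything other than HTTP 200 as unhealthy. Several details are assumptions: plain HTTP, a 5 s timeout, and the reading that the "4 services" are the four registered in tradai-dev.local. Writing results to CloudWatch and DynamoDB is omitted. The fetcher is injectable so the logic can be exercised outside the VPC.

```python
import urllib.request
from typing import Callable, Dict, Tuple

NAMESPACE = "tradai-dev.local"

# service -> (port, health path), copied from the ECS services table.
SERVICES: Dict[str, Tuple[int, str]] = {
    "backend-api": (8000, "/api/v1/health"),
    "data-collection": (8002, "/api/v1/health"),
    "strategy-service": (8003, "/api/v1/health"),
    "mlflow": (5000, "/health"),
}


def health_url(service: str) -> str:
    """Build the in-VPC health URL from the service discovery name (HTTP assumed)."""
    port, path = SERVICES[service]
    return f"http://{service}.{NAMESPACE}:{port}{path}"


def http_status(url: str, timeout: float = 5.0) -> int:
    """Fetch `url` and return its status; non-2xx responses raise HTTPError."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def check_all(fetch: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Ping every service; healthy means HTTP 200 within the timeout."""
    results: Dict[str, bool] = {}
    for service in SERVICES:
        try:
            results[service] = fetch(health_url(service)) == 200
        except OSError:  # HTTPError/URLError, DNS failure, refused connection, timeout
            results[service] = False
    return results
```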
Optional (Step Functions)
| Lambda | Purpose |
| validate-strategy | Validate strategy config |
| data-collection-proxy | Proxy to data-collection |
| update-status | Update DynamoDB state |
| notify-completion | SNS notifications |
| check-retraining-needed | Evaluate retraining need |
| compare-models | Champion vs challenger |
| promote-model | MLflow stage transition |
| model-rollback | Performance-triggered rollback |
| cleanup-resources | Stop ECS tasks on error |
| sqs-consumer | Retraining queue consumer |
Step Functions Workflows
Backtest Workflow (tradai-backtest-workflow-dev, 2h timeout)
ValidateStrategy → EnsureData → UpdateStatus(RUNNING) → RunBacktest(ECS)
  on success: HandleSuccess → NotifyCompletion
  on error: CleanupResources → NotifyFailure
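The backtest workflow can be expressed in Amazon States Language, built here as a Python dict. This is a sketch, not the deployed definition. The `arn:aws:states:::lambda:invoke` and `arn:aws:states:::ecs:runTask.sync` integration ARNs are real Step Functions patterns. Everything else is hypothetical: the Lambda function names, the task definition name, the choice of data-collection-proxy for EnsureData, HandleSuccess as a Pass state, and the final Fail state. ECS network configuration is omitted.

```python
import json
from typing import Optional, Set

LAMBDA_INVOKE = "arn:aws:states:::lambda:invoke"
ECS_RUN_TASK_SYNC = "arn:aws:states:::ecs:runTask.sync"  # waits until the task stops

# Any error jumps to cleanup; the error is stored under $.error so the input survives.
CATCH_ALL = [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "CleanupResources"}]


def lambda_task(function_name: str, next_state: Optional[str], guarded: bool = True) -> dict:
    """A Task state that invokes a Lambda and passes its return value on."""
    state = {
        "Type": "Task",
        "Resource": LAMBDA_INVOKE,
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "OutputPath": "$.Payload",  # unwrap the Lambda response
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    if guarded:
        state["Catch"] = CATCH_ALL
    return state


BACKTEST_WORKFLOW = {
    "Comment": "tradai-backtest-workflow-dev (sketch)",
    "StartAt": "ValidateStrategy",
    "TimeoutSeconds": 2 * 60 * 60,  # 2h workflow timeout
    "States": {
        "ValidateStrategy": lambda_task("tradai-validate-strategy-dev", "EnsureData"),
        "EnsureData": lambda_task("tradai-data-collection-proxy-dev", "UpdateStatusRunning"),
        "UpdateStatusRunning": lambda_task("tradai-update-status-dev", "RunBacktest"),
        "RunBacktest": {
            "Type": "Task",
            "Resource": ECS_RUN_TASK_SYNC,
            "Parameters": {"Cluster": "tradai-dev", "TaskDefinition": "strategy-container"},
            "ResultPath": "$.ecs",  # keep the input, attach the ECS result
            "Catch": CATCH_ALL,
            "Next": "HandleSuccess",
        },
        "HandleSuccess": {"Type": "Pass", "Next": "NotifyCompletion"},
        "NotifyCompletion": lambda_task("tradai-notify-completion-dev", None, guarded=False),
        "CleanupResources": lambda_task("tradai-cleanup-resources-dev", "NotifyFailure", guarded=False),
        "NotifyFailure": lambda_task("tradai-notify-completion-dev", "BacktestFailed", guarded=False),
        "BacktestFailed": {"Type": "Fail", "Error": "BacktestFailed"},
    },
}


def dangling_targets(definition: dict) -> Set[str]:
    """Return StartAt/Next/Catch targets that name no existing state."""
    names = set(definition["States"])
    targets = {definition["StartAt"]}
    for state in definition["States"].values():
        if "Next" in state:
            targets.add(state["Next"])
        for catcher in state.get("Catch", []):
            targets.add(catcher["Next"])
    return targets - names


if __name__ == "__main__":
    print(json.dumps(BACKTEST_WORKFLOW, indent=2))
```

Ending the failure branch in a Fail state makes a failed backtest show up as a failed execution, which the Step Functions alarms can then catch.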
Retraining Workflow (tradai-retraining-workflow-dev, 3h timeout)
CheckRetrainingNeeded → RunRetraining(ECS) → CompareModels
  → PromoteModel → NotifyCompletion
  → ModelRollback → NotifyFailure
IAM Roles
- ECS Execution Role: ECR pull, CloudWatch Logs, Secrets Manager
- ECS Task Role: DynamoDB, S3, Secrets Manager, CloudWatch, SNS, CodeArtifact, ECS control
- Lambda Role: ECS control, DynamoDB, SNS, SQS, Secrets Manager, S3, Step Functions
- Consolidated EC2 Role: ECR, CloudWatch, DynamoDB, S3, Secrets Manager (dev/staging only)
4. EDGE Stack (API Gateway, monitoring)
API Gateway (HTTP API)
- VPC Link to ALB
- JWT auth (Cognito)
- CORS: all origins (*) allowed in dev
- Throttling: 100 req/s default, 10 req/s on POST /api/v1/backtests
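API Gateway throttles with a token bucket: the rate limit refills tokens per second, and the burst limit caps the bucket size. Requests that find the bucket empty get 429 Too Many Requests. This map lists only rates. The burst of 20 used below is a hypothetical value for illustration, and the class models the algorithm, not API Gateway itself.

```python
class TokenBucket:
    """Token bucket: `rate` tokens/s refill, at most `burst` tokens stored."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Consume one token if available; False corresponds to an HTTP 429."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For POST /api/v1/backtests at 10 req/s with the assumed burst of 20, 25 simultaneous submissions let 20 through. Half a second later, another 5 are allowed.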
Routes
| Method | Path | Backend | Auth |
| GET | /api/v1/health | backend-api | No |
| POST | /api/v1/backtests | SQS direct | Yes |
| GET | /api/v1/backtests | backend-api | Yes |
| GET | /api/v1/backtests/{job_id} | backend-api | Yes |
| GET | /api/v1/backtests/{job_id}/equity | backend-api | Yes |
| GET | /api/v1/backtests/{job_id}/report-data | backend-api | Yes |
| POST | /api/v1/backtests/{job_id}/cancel | backend-api | Yes |
| POST | /api/v1/data/sync | backend-api | Yes |
| GET | /api/v1/data/symbols | backend-api | Yes |
| GET | /api/v1/data/freshness | backend-api | Yes |
| GET | /api/v1/data/ohlcv | backend-api | Yes |
| GET | /api/v1/strategies | backend-api | Yes |
| POST | /api/v1/strategies | backend-api | Yes |
| GET | /api/v1/strategies/{id} | backend-api | Yes |
| POST | /api/v1/strategies/{name}/stage | backend-api | Yes |
| POST | /api/v1/strategies/{name}/promote | backend-api | Yes |
| GET | /api/v1/catalog/strategies | backend-api | Yes |
| GET | /api/v1/catalog/strategies/{name} | backend-api | Yes |
| GET | /api/v1/catalog/strategies/{name}/compare | backend-api | Yes |
| GET | /api/v1/experiments | backend-api | Yes |
| GET | /api/v1/experiments/{id} | backend-api | Yes |
| GET | /api/v1/runs/{id} | backend-api | Yes |
| GET | /api/v1/runs/detail/{id} | backend-api | Yes |
| GET | /api/v1/runs/{id}/metrics | backend-api | Yes |
| GET | /api/v1/runs/{id}/metrics/history | backend-api | Yes |
| GET | /api/v1/models/{name}/versions | backend-api | Yes |
| POST | /api/v1/models/{name}/rollback | backend-api | Yes |
| ANY | /mlflow/{proxy+} | mlflow | Yes |
WAF
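- WebACL is created but NOT associated (bug: parsing the $default stage ARN). AWS WAF can be associated with API Gateway REST API stages but not with HTTP APIs, so fixing the ARN parsing alone is unlikely to make the association succeed.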
CloudWatch
- Alarms: RDS CPU/disk, API 4xx/5xx, Lambda errors, ECS tasks, Step Functions
- Composite alarm → SNS alerts
- Dashboard with all metrics
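A composite alarm is driven by an `AlarmRule` expression over child alarms, written with functions such as `ALARM(...)` combined with `AND`, `OR`, and `NOT`. This sketch assumes the composite fires when any child alarm is in ALARM state. The child alarm names in the usage example are hypothetical.

```python
from typing import Iterable


def any_alarm_rule(alarm_names: Iterable[str]) -> str:
    """Build an AlarmRule that is in ALARM when any child alarm is."""
    names = list(alarm_names)
    if not names:
        raise ValueError("a composite alarm needs at least one child alarm")
    return " OR ".join(f'ALARM("{name}")' for name in names)
```

For example, `any_alarm_rule(["tradai-rds-cpu-dev", "tradai-api-5xx-dev"])` returns `ALARM("tradai-rds-cpu-dev") OR ALARM("tradai-api-5xx-dev")`.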
End-to-End Data Flow
User → API Gateway (JWT) → POST /api/v1/data/sync
  → backend-api:8000 → data-collection:8002 → CCXT (Binance) → ArcticDB (S3)
User → API Gateway → POST /api/v1/backtests → SQS FIFO
  → Lambda (backtest-consumer) → Step Functions
  → ValidateStrategy → EnsureData → RunBacktest (ECS strategy-container)
  → Freqtrade backtest → S3 results + MLflow experiments
  → DynamoDB workflow_state (PENDING → RUNNING → COMPLETED)
User → API Gateway → GET /api/v1/backtests/{job_id} → status + results
Monitoring:
Lambda health-check (2min) → ping services → CloudWatch + DynamoDB
Lambda orphan-scanner (5min) → find orphaned ECS tasks
Lambda drift-monitor (12h) → PSI analysis → SNS alerts
Lambda retraining-scheduler (6h) → evaluate + launch retraining
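The drift-monitor performs PSI analysis. The Population Stability Index bins a baseline ("expected") distribution and a current ("actual") one, then sums (a_i − e_i) · ln(a_i / e_i) over the bins, where e_i and a_i are the proportions in bin i. The sketch below uses 10 equal-width bins over the baseline range and a small floor for empty bins. Both are assumptions, since the Lambda's actual binning is not documented here. The thresholds often quoted in practice (below 0.1 stable, 0.1 to 0.25 moderate shift, above 0.25 significant) are rules of thumb, not the drift-monitor's configured values.

```python
import math
from bisect import bisect_right
from typing import List, Sequence

EPS = 1e-6  # assumed floor for empty bins so the logarithm stays finite


def bin_proportions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of `values` in each bin defined by the sorted interior `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), EPS) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index of `actual` against the `expected` baseline."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins
    edges = [lo + width * i for i in range(1, n_bins)]  # equal-width bins over the baseline
    e = bin_proportions(expected, edges)
    a = bin_proportions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Identical distributions give exactly 0, and values past the baseline range land in the outermost bins, so a large shift shows up as a large PSI.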
Expected Artifacts
| Artifact | Storage | Producer |
| OHLCV market data | S3 tradai-arcticdb-dev | data-collection |
| Strategy configs | S3 tradai-configs-dev | strategy-service |
| Backtest results | S3 tradai-results-dev | strategy-container (ECS) |
| MLflow experiments | RDS PostgreSQL + S3 tradai-mlflow-dev | strategy-container → MLflow |
| Workflow state | DynamoDB workflow_state | Lambda update-status, backend |
| Health metrics | CloudWatch + DynamoDB health_state | Lambda health-check |
| Drift metrics | DynamoDB drift_state + CloudWatch | Lambda drift-monitor |
| Alerts | SNS → Email | Lambda monitors |
| ALB/audit logs | S3 tradai-logs-dev | ALB, CloudTrail |
| Container logs | CloudWatch Logs | ECS, Lambda |
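Both Lambda update-status and the backend write workflow state, so it helps to validate status changes. The sketch below encodes PENDING → RUNNING → COMPLETED from the data flow. FAILED and CANCELLED are assumptions suggested by the failure branch and the /cancel endpoint, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet

# PENDING -> RUNNING -> COMPLETED is documented; FAILED and CANCELLED are assumed.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "PENDING": frozenset({"RUNNING", "FAILED", "CANCELLED"}),
    "RUNNING": frozenset({"COMPLETED", "FAILED", "CANCELLED"}),
    "COMPLETED": frozenset(),  # terminal
    "FAILED": frozenset(),  # terminal
    "CANCELLED": frozenset(),  # terminal
}


def is_allowed(current: str, new: str) -> bool:
    """True if a job may move from `current` to `new`; unknown states allow nothing."""
    return new in TRANSITIONS.get(current, frozenset())


def advance(current: str, new: str) -> str:
    """Return `new` if the move is legal, else raise ValueError."""
    if not is_allowed(current, new):
        raise ValueError(f"illegal status transition {current} -> {new}")
    return new
```

In DynamoDB, the same guard maps onto a conditional `UpdateItem` whose `ConditionExpression` checks the current status. That way two writers cannot both move a job out of the same state.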
Service Endpoints (Backend API)
Backtest Management
- POST /api/v1/backtests: submit backtest job
- GET /api/v1/backtests: list backtests (paginated)
- GET /api/v1/backtests/{job_id}: get status
- POST /api/v1/backtests/{job_id}/cancel: cancel
- GET /api/v1/backtests/{job_id}/equity: equity curve
- GET /api/v1/backtests/{job_id}/report-data: full report
Data Operations
- POST /api/v1/sync: sync OHLCV data
- POST /api/v1/sync/incremental: incremental sync
- GET /api/v1/freshness: data freshness
- GET /api/v1/symbols: available symbols
- GET /api/v1/ohlcv: export OHLCV
- POST /api/v1/export: export with POST
Strategy Catalog
- GET /api/v1/catalog/strategies: list strategies
- GET /api/v1/catalog/strategies/{name}: details
- GET /api/v1/catalog/leaderboard: leaderboard
Model Management
- POST /api/v1/promote-model: promote
- POST /api/v1/demote-model: demote
- GET /api/v1/model-promotion-history: history
- POST /api/v1/compare-models: compare
Strategy Service Endpoints (internal, :8003)
- GET/POST /api/v1/configs: strategy configs
- POST /api/v1/strategies/register: register strategy
- POST /api/v1/hyperopt/run: hyperopt
- POST /api/v1/challenges/create: A/B testing
Data Collection Endpoints (internal, :8002)
- POST /api/v1/sync: sync OHLCV
- POST /api/v1/sync/incremental: incremental sync
- GET /api/v1/freshness: data freshness
- GET /api/v1/symbols: available symbols
- GET /api/v1/ohlcv: export data