TradAI Dev Environment — Full Component Map

Generated: 2026-04-27
Purpose: Complete audit of all AWS components in the dev environment

Stack Architecture (deploy order)

PERSISTENT → FOUNDATION → COMPUTE → EDGE

Region: eu-central-1, AWS Profile: tradai


1. PERSISTENT Stack (permanent data resources)

S3 Buckets (5)

| Bucket | Name | Versioning | Lifecycle |
|---|---|---|---|
| configs | tradai-configs-dev | Yes | None |
| results | tradai-results-dev | Yes | Glacier at 30 days |
| arcticdb | tradai-arcticdb-dev | Yes | None |
| logs | tradai-logs-dev | No | Delete at 90 days |
| mlflow | tradai-mlflow-dev | Yes | None |
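
The two lifecycle rules in the table could be written in the shape that boto3's `put_bucket_lifecycle_configuration` accepts. This is a minimal sketch, not the stack's actual definition: the rule IDs, the empty prefix filter (apply to every object), and the choice of the `GLACIER` storage class are assumptions.

```python
from typing import Optional


def lifecycle_configuration(bucket: str) -> Optional[dict]:
    """Return a LifecycleConfiguration dict, or None if the table lists no rule."""
    if bucket == "tradai-results-dev":
        rule = {
            "ID": "results-to-glacier",  # hypothetical rule ID
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # assumption: rule covers every object
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }
    elif bucket == "tradai-logs-dev":
        rule = {
            "ID": "expire-logs",  # hypothetical rule ID
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # assumption: rule covers every object
            "Expiration": {"Days": 90},
        }
    else:
        # configs, arcticdb and mlflow have no lifecycle rule in the table.
        return None
    return {"Rules": [rule]}
```

The returned dict would be passed as the `LifecycleConfiguration` argument together with `Bucket=<name>`.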

DynamoDB Tables (12)

| Table | Purpose |
|---|---|
| tradai-workflow-state-dev | Backtest job tracking (GSI) |
| tradai-health-state-dev | Service health check state |
| tradai-trading-state-dev | Live trading heartbeats |
| tradai-deployments-dev | Deployment tracking |
| tradai-drift-state-dev | Model drift detection |
| tradai-retraining-state-dev | Retraining job tracking |
| tradai-rollback-state-dev | Model rollback history |
| tradai-shadow-test-state-dev | Shadow testing |
| tradai-notifications-dev | Notification state |
| tradai-idempotency-dev | Idempotency keys |
| tradai-infra-drift-state-dev | Pulumi drift detection |
| tradai-config-versions-dev | Config version registry |

ECR Repositories (24)

Services (6): tradai/backend, tradai/data-collection, tradai/strategy-service, tradai/mlflow, tradai/live-trading, tradai/dry-run-trading

Lambdas (18): tradai/lambda-base, lambda-backtest-consumer, lambda-sqs-consumer, lambda-orphan-scanner, lambda-health-check, lambda-trading-heartbeat-check, lambda-drift-monitor, lambda-retraining-scheduler, lambda-validate-strategy, lambda-data-collection-proxy, lambda-notify-completion, lambda-check-retraining-needed, lambda-compare-models, lambda-promote-model, lambda-model-rollback, lambda-cleanup-resources, lambda-update-status, lambda-pulumi-drift-detector

Cognito

  • User Pool: tradai-users-dev
  • MFA: Required (TOTP)
  • Password: Min 12 chars, upper/lower/numbers/symbols
  • Clients: Public (PKCE) + M2M (Client Credentials)
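
A client could pre-check a candidate password against the policy above before calling Cognito. This is a rough sketch: Cognito defines its own set of accepted symbols, and `string.punctuation` is used here only as an approximation of it.

```python
import string

MIN_LENGTH = 12  # from the pool's password policy


def meets_password_policy(password: str) -> bool:
    """Check length plus one upper, lower, digit and symbol character."""
    return (
        len(password) >= MIN_LENGTH
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        # Assumption: string.punctuation approximates Cognito's symbol set.
        and any(c in string.punctuation for c in password)
    )
```

Cognito remains the authority; a check like this only avoids an obviously doomed sign-up request.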

CodeArtifact

  • Domain: tradai-dev
  • Repository: tradai-dev

CloudTrail

  • Audit log to tradai-logs-dev bucket

2. FOUNDATION Stack (networking & data infrastructure)

VPC & Networking

| Resource | Details |
|---|---|
| VPC CIDR | 10.0.0.0/16 |
| AZs | eu-central-1a, eu-central-1b |
| Public Subnets | 10.0.1.0/24, 10.0.2.0/24 |
| Private Subnets | 10.0.11.0/24, 10.0.12.0/24 |
| Database Subnets | 10.0.21.0/24, 10.0.22.0/24 |
| NAT | t4g.nano instance (~$3/mo) |
| VPC Endpoints | S3, DynamoDB, ECR, CloudWatch, Secrets Manager, STS |
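
The subnet plan can be sanity-checked with the standard library: every subnet should sit inside the VPC CIDR and no two subnets should overlap. The sketch below uses only the CIDRs listed in the table; the function name is illustrative.

```python
import ipaddress
from itertools import combinations

VPC = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "public": ["10.0.1.0/24", "10.0.2.0/24"],
    "private": ["10.0.11.0/24", "10.0.12.0/24"],
    "database": ["10.0.21.0/24", "10.0.22.0/24"],
}


def check_layout() -> tuple:
    """Return (all subnets inside the VPC, no two subnets overlap)."""
    nets = [ipaddress.ip_network(c) for tier in SUBNETS.values() for c in tier]
    inside = all(n.subnet_of(VPC) for n in nets)
    disjoint = not any(a.overlaps(b) for a, b in combinations(nets, 2))
    return inside, disjoint
```

Each /24 holds 256 addresses, of which AWS reserves five per subnet.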

Security Groups

ALB, ECS, Lambda, RDS, NAT, VPC Link, VPC Endpoints, EC2 Consolidated

RDS PostgreSQL

| Setting | Value |
|---|---|
| Engine | PostgreSQL 15.13 |
| Instance | db.t4g.micro |
| Database | mlflow |
| Storage | 20 GB |
| Multi-AZ | No |
| Backup | 7 days |
| SSL | Forced |

SQS

| Queue | Type | Details |
|---|---|---|
| tradai-backtest-queue-dev.fifo | FIFO | Content-based dedup, 15min visibility, 4d retention |
| tradai-backtest-dlq-dev.fifo | FIFO DLQ | 3 max receives, 14d retention |
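
SQS expects these settings as string attributes expressed in seconds, so the durations in the table need converting. The sketch below shows that arithmetic in the attribute shape used by the SQS `CreateQueue` API. The DLQ ARN is passed in by the caller, and the account ID in the usage example is a placeholder.

```python
import json

MINUTE = 60
DAY = 24 * 60 * 60


def main_queue_attributes(dlq_arn: str) -> dict:
    """Attributes for tradai-backtest-queue-dev.fifo, values in seconds."""
    return {
        "FifoQueue": "true",
        "ContentBasedDeduplication": "true",
        "VisibilityTimeout": str(15 * MINUTE),   # 15 min
        "MessageRetentionPeriod": str(4 * DAY),  # 4 days
        # After 3 failed receives a message moves to the DLQ.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": 3}
        ),
    }


def dlq_attributes() -> dict:
    """Attributes for tradai-backtest-dlq-dev.fifo."""
    return {"FifoQueue": "true", "MessageRetentionPeriod": str(14 * DAY)}


# Usage with a placeholder account ID (not the real one):
example = main_queue_attributes(
    "arn:aws:sqs:eu-central-1:123456789012:tradai-backtest-dlq-dev.fifo"
)
```

The 15-minute visibility timeout works out to 900 seconds, the 4-day retention to 345600 seconds, and the 14-day DLQ retention to 1209600 seconds.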

SNS Topics

| Topic | Purpose |
|---|---|
| tradai-alerts-dev | Service health alerts |
| tradai-registration-dev | Strategy registration events |

3. COMPUTE Stack (services & compute)

ECS Cluster

  • Name: tradai-dev
  • Container Insights: Enabled
  • Log Group: /aws/ecs/tradai-dev (30d retention)

ECS Services

| Service | Port | CPU/Mem | Desired | Health Check | Notes |
|---|---|---|---|---|---|
| backend-api | 8000 | 512/1024 | 1 | GET /api/v1/health | API gateway |
| data-collection | 8002 | 256/512 | 1 | GET /api/v1/health | OHLCV sync |
| strategy-service | 8003 | 512/1024 | 1 | GET /api/v1/health | Strategy mgmt |
| mlflow | 5000 | 512/1024 | 1 | GET /health | ML tracking |
| strategy-container | | 1024/2048 | 0 | | On-demand (backtests) |
| live-trading | 8004 | 1024/2048 | 0 | GET /api/v1/health | Live trading |
| dry-run-trading | 8005 | 1024/2048 | 0 | GET /api/v1/health | Paper trading (Spot) |

Dev mode: Consolidated EC2 t3.small runs backend-api, data-collection, mlflow, strategy-service (~$15/mo vs ~$37/mo Fargate)

Service Discovery

Namespace: tradai-dev.local

  • backend-api.tradai-dev.local:8000
  • data-collection.tradai-dev.local:8002
  • strategy-service.tradai-dev.local:8003
  • mlflow.tradai-dev.local:5000
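
Inside the VPC, an internal address is the service name, the namespace, and the port. The sketch below builds the health-check URL for each discoverable service from the ports and health paths listed in the ECS Services section. Plain HTTP between services is an assumption.

```python
NAMESPACE = "tradai-dev.local"
SERVICES = {
    "backend-api": 8000,
    "data-collection": 8002,
    "strategy-service": 8003,
    "mlflow": 5000,
}
# mlflow exposes /health; the other services use /api/v1/health.
HEALTH_PATHS = {"mlflow": "/health"}


def health_url(service: str) -> str:
    """Build the internal health-check URL (assumes plain HTTP in the VPC)."""
    port = SERVICES[service]
    path = HEALTH_PATHS.get(service, "/api/v1/health")
    return f"http://{service}.{NAMESPACE}:{port}{path}"
```

These names resolve only from inside the VPC, for example from an ECS task or a VPC-attached Lambda.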

ALB (Application Load Balancer)

  • Type: Internet-facing (public subnets)
  • Routing:
      • /mlflow/* → mlflow:5000
      • /api/v1/live/* → live-trading:8004
      • /api/v1/dry-run/* → dry-run-trading:8005
      • /api/v1/* → backend-api:8000 (catch-all)
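
Because `/api/v1/*` also matches the live and dry-run paths, rule order matters: ALB listener rules are evaluated by priority, so the specific prefixes have to come before the catch-all. The sketch below simplifies the wildcard patterns to prefix matches. The request paths in the usage lines are hypothetical.

```python
from typing import Optional

# Ordered like listener rule priorities: most specific first, catch-all last.
RULES = [
    ("/mlflow/", "mlflow:5000"),
    ("/api/v1/live/", "live-trading:8004"),
    ("/api/v1/dry-run/", "dry-run-trading:8005"),
    ("/api/v1/", "backend-api:8000"),
]


def route(path: str) -> Optional[str]:
    """Return the first matching target, or None when no rule matches."""
    for prefix, target in RULES:
        if path.startswith(prefix):
            return target
    return None


# Hypothetical request paths:
live_target = route("/api/v1/live/positions")
default_target = route("/api/v1/backtests")
```

If the catch-all were listed first, every `/api/v1/live/*` request would land on backend-api instead of live-trading.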

Lambda Functions

Required (always running)

| Lambda | Trigger | Mem/Timeout | VPC | Purpose |
|---|---|---|---|---|
| backtest-consumer | SQS | 256MB/30s | Yes | SQS → ECS/StepFunctions |
| health-check | EventBridge rate(2 min) | 256MB/60s | Yes | Ping all 4 services |
| orphan-scanner | EventBridge rate(5 min) | 128MB/60s | Yes | Find orphaned ECS tasks |
| trading-heartbeat-check | EventBridge rate(5 min) | 256MB/60s | Yes | Check trading heartbeats |
| drift-monitor | EventBridge rate(12 hours) | 512MB/120s | Yes | PSI model drift analysis |
| retraining-scheduler | EventBridge rate(6 hours) | 256MB/60s | Yes | Schedule retraining |
| pulumi-drift-detector | EventBridge rate(6 hours) | 512MB/300s | No | Infrastructure drift |
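
The drift-monitor row refers to PSI (Population Stability Index), which compares a baseline distribution with a current one over shared bins: PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the expected and actual bin proportions. The sketch below is a generic implementation. The binning, the epsilon floor for empty bins, and any alert threshold used by the actual Lambda are not documented here and are assumptions.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index over pre-binned counts."""
    if len(expected) != len(actual):
        raise ValueError("both inputs must use the same bins")
    e_total, a_total = sum(expected), sum(actual)
    total = 0.0
    for e, a in zip(expected, actual):
        # Floor proportions at eps so empty bins do not produce log(0).
        e_p = max(e / e_total, eps)
        a_p = max(a / a_total, eps)
        total += (a_p - e_p) * math.log(a_p / e_p)
    return total
```

Identical distributions give 0. A common rule of thumb treats values above roughly 0.25 as a significant shift, though the threshold is a tuning choice.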

Optional (Step Functions)

| Lambda | Purpose |
|---|---|
| validate-strategy | Validate strategy config |
| data-collection-proxy | Proxy to data-collection |
| update-status | Update DynamoDB state |
| notify-completion | SNS notifications |
| check-retraining-needed | Evaluate retraining need |
| compare-models | Champion vs challenger |
| promote-model | MLflow stage transition |
| model-rollback | Performance-triggered rollback |
| cleanup-resources | Stop ECS tasks on error |
| sqs-consumer | Retraining queue consumer |

Step Functions Workflows

Backtest Workflow (tradai-backtest-workflow-dev, 2h timeout)

ValidateStrategy → EnsureData → UpdateStatus(RUNNING) → RunBacktest(ECS)
                                                              ↓ (error)
                                                        CleanupResources → NotifyFailure
    → HandleSuccess → NotifyCompletion
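
The branching in the diagram above can be traced with a small simulation: the first three states run in order, and the outcome of the ECS task decides which tail follows. This is an illustration of the control flow only, not the state machine definition. It assumes, as the diagram suggests, that only RunBacktest has an error branch.

```python
from typing import Callable, List


def run_backtest_workflow(run_backtest: Callable[[], None]) -> List[str]:
    """Return the sequence of states visited for one execution."""
    trail = ["ValidateStrategy", "EnsureData", "UpdateStatus(RUNNING)", "RunBacktest"]
    try:
        run_backtest()  # stands in for the ECS task
    except Exception:
        # Error branch: stop leftover tasks, then report the failure.
        trail += ["CleanupResources", "NotifyFailure"]
    else:
        trail += ["HandleSuccess", "NotifyCompletion"]
    return trail
```

Passing a callable that raises exercises the cleanup path; a callable that returns normally ends in NotifyCompletion.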

Retraining Workflow (tradai-retraining-workflow-dev, 3h timeout)

CheckRetrainingNeeded → RunRetraining(ECS) → CompareModels
    → PromoteModel → NotifyCompletion
    → ModelRollback → NotifyFailure

IAM Roles

  • ECS Execution Role: ECR pull, CloudWatch Logs, Secrets Manager
  • ECS Task Role: DynamoDB, S3, Secrets Manager, CloudWatch, SNS, CodeArtifact, ECS control
  • Lambda Role: ECS control, DynamoDB, SNS, SQS, Secrets Manager, S3, Step Functions
  • Consolidated EC2 Role: ECR, CloudWatch, DynamoDB, S3, Secrets Manager (dev/staging only)

4. EDGE Stack (API Gateway, monitoring)

API Gateway (HTTP API)

  • VPC Link to ALB
  • JWT auth (Cognito)
  • CORS: * (dev)
  • Throttling: 100 req/s default, 10 req/s on POST /backtests
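
API Gateway throttling follows a token-bucket model: tokens refill at the configured rate, and each request spends one. The sketch below illustrates how the two limits would behave. The burst sizes are not stated in this document, so setting burst equal to rate is an assumption, and the route key uses the full path from the Routes table.

```python
DEFAULT_RATE = 100.0  # req/s
ROUTE_RATES = {"POST /api/v1/backtests": 10.0}  # stricter per-route limit


def rate_for(route_key: str) -> float:
    """Return the steady-state rate for a route, falling back to the default."""
    return ROUTE_RATES.get(route_key, DEFAULT_RATE)


class TokenBucket:
    def __init__(self, rate: float, burst: float, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # API Gateway would answer 429 Too Many Requests
```

The clock is passed in explicitly so the behaviour can be checked without sleeping.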

Routes

| Method | Path | Backend | Auth |
|---|---|---|---|
| GET | /api/v1/health | backend-api | No |
| POST | /api/v1/backtests | SQS direct | Yes |
| GET | /api/v1/backtests | backend-api | Yes |
| GET | /api/v1/backtests/{job_id} | backend-api | Yes |
| GET | /api/v1/backtests/{job_id}/equity | backend-api | Yes |
| GET | /api/v1/backtests/{job_id}/report-data | backend-api | Yes |
| POST | /api/v1/backtests/{job_id}/cancel | backend-api | Yes |
| POST | /api/v1/data/sync | backend-api | Yes |
| GET | /api/v1/data/symbols | backend-api | Yes |
| GET | /api/v1/data/freshness | backend-api | Yes |
| GET | /api/v1/data/ohlcv | backend-api | Yes |
| GET | /api/v1/strategies | backend-api | Yes |
| POST | /api/v1/strategies | backend-api | Yes |
| GET | /api/v1/strategies/{id} | backend-api | Yes |
| POST | /api/v1/strategies/{name}/stage | backend-api | Yes |
| POST | /api/v1/strategies/{name}/promote | backend-api | Yes |
| GET | /api/v1/catalog/strategies | backend-api | Yes |
| GET | /api/v1/catalog/strategies/{name} | backend-api | Yes |
| GET | /api/v1/catalog/strategies/{name}/compare | backend-api | Yes |
| GET | /api/v1/experiments | backend-api | Yes |
| GET | /api/v1/experiments/{id} | backend-api | Yes |
| GET | /api/v1/runs/{id} | backend-api | Yes |
| GET | /api/v1/runs/detail/{id} | backend-api | Yes |
| GET | /api/v1/runs/{id}/metrics | backend-api | Yes |
| GET | /api/v1/runs/{id}/metrics/history | backend-api | Yes |
| GET | /api/v1/models/{name}/versions | backend-api | Yes |
| POST | /api/v1/models/{name}/rollback | backend-api | Yes |
| ANY | /mlflow/{proxy+} | mlflow | Yes |
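
Every route except the health check requires a Cognito-issued JWT. The sketch below shows how a client might build such a request with the standard library. The base URL is a placeholder, and sending the token as a `Bearer` value in the `Authorization` header is an assumption about how the authorizer is configured.

```python
import urllib.request
from typing import Optional

API_BASE = "https://api.example.invalid"  # placeholder, not the real endpoint


def build_request(method: str, path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a request; attach the JWT only when the route needs auth."""
    headers = {"Accept": "application/json"}
    if token:
        # Assumption: the JWT authorizer reads the Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(API_BASE + path, method=method, headers=headers)
```

The request object would then be sent with `urllib.request.urlopen`. A missing or invalid token on a protected route should be rejected by the gateway before the request reaches the ALB.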

WAF

  • WebACL created but NOT associated (bug: $default stage ARN parsing)

CloudWatch

  • Alarms: RDS CPU/disk, API 4xx/5xx, Lambda errors, ECS tasks, Step Functions
  • Composite alarm → SNS alerts
  • Dashboard with all metrics

End-to-End Data Flow

User → API Gateway (JWT) → POST /api/v1/data/sync
    → backend:8000 → data-collection:8002 → CCXT(Binance) → ArcticDB(S3)

User → API Gateway → POST /api/v1/backtests → SQS FIFO
    → Lambda(backtest-consumer) → Step Functions
    → ValidateStrategy → EnsureData → RunBacktest(ECS strategy-container)
    → Freqtrade backtest → S3 results + MLflow experiments
    → DynamoDB workflow_state (PENDING→RUNNING→COMPLETED)

User → API Gateway → GET /api/v1/backtests/{job_id} → status + results

Monitoring:
    Lambda health-check (2min) → ping services → CloudWatch + DynamoDB
    Lambda orphan-scanner (5min) → find orphaned ECS tasks
    Lambda drift-monitor (12h) → PSI analysis → SNS alerts
    Lambda retraining-scheduler (6h) → evaluate + launch retraining
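
Because a backtest is queued rather than run inline, a client has to poll `GET /api/v1/backtests/{job_id}` until the workflow state reaches COMPLETED. The sketch below shows such a loop with the HTTP call abstracted behind a callable. The polling interval and attempt limit are arbitrary, and only the three states named in the flow above are handled; a real client would also need to stop on whatever failure states the API reports.

```python
import time
from typing import Callable


def wait_for_backtest(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until COMPLETED or until max_polls is reached; return the last status."""
    status = fetch_status()
    for _ in range(max_polls - 1):
        if status == "COMPLETED":
            break
        sleep(interval_s)
        status = fetch_status()
    return status
```

Injecting `sleep` keeps the loop testable without real delays.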

Expected Artifacts

| Artifact | Storage | Producer |
|---|---|---|
| OHLCV market data | S3 tradai-arcticdb-dev | data-collection |
| Strategy configs | S3 tradai-configs-dev | strategy-service |
| Backtest results | S3 tradai-results-dev | strategy-container (ECS) |
| MLflow experiments | RDS PostgreSQL + S3 tradai-mlflow-dev | strategy-container → MLflow |
| Workflow state | DynamoDB workflow_state | Lambda update-status, backend |
| Health metrics | CloudWatch + DynamoDB health_state | Lambda health-check |
| Drift metrics | DynamoDB drift_state + CloudWatch | Lambda drift-monitor |
| Alerts | SNS → Email | Lambda monitors |
| ALB/audit logs | S3 tradai-logs-dev | ALB, CloudTrail |
| Container logs | CloudWatch Logs | ECS, Lambda |

Service Endpoints (Backend API)

Backtest Management

  • POST /api/v1/backtests — submit backtest job
  • GET /api/v1/backtests — list backtests (paginated)
  • GET /api/v1/backtests/{job_id} — get status
  • POST /api/v1/backtests/{job_id}/cancel — cancel
  • GET /api/v1/backtests/{job_id}/equity — equity curve
  • GET /api/v1/backtests/{job_id}/report-data — full report

Data Operations

  • POST /api/v1/sync — sync OHLCV data
  • POST /api/v1/sync/incremental — incremental sync
  • GET /api/v1/freshness — data freshness
  • GET /api/v1/symbols — available symbols
  • GET /api/v1/ohlcv — export OHLCV
  • POST /api/v1/export — export with POST

Strategy Catalog

  • GET /api/v1/catalog/strategies — list strategies
  • GET /api/v1/catalog/strategies/{name} — details
  • GET /api/v1/catalog/leaderboard — leaderboard

Model Management

  • POST /api/v1/promote-model — promote
  • POST /api/v1/demote-model — demote
  • GET /api/v1/model-promotion-history — history
  • POST /api/v1/compare-models — compare

Strategy Service Endpoints (internal, :8003)

  • GET/POST /api/v1/configs — strategy configs
  • POST /api/v1/strategies/register — register strategy
  • POST /api/v1/hyperopt/run — hyperopt
  • POST /api/v1/challenges/create — A/B testing

Data Collection Endpoints (internal, :8002)

  • POST /api/v1/sync — sync OHLCV
  • POST /api/v1/sync/incremental — incremental sync
  • GET /api/v1/freshness — data freshness
  • GET /api/v1/symbols — available symbols
  • GET /api/v1/ohlcv — export data