TradAI Final Architecture - Cost Analysis

Version: 9.2 | Date: 2025-12-09
Cost Summary

Before vs After Optimization

| Metric | Original (v5.1) | Corrected Baseline | Optimized (v8.0) |
|---|---|---|---|
| Documented Cost | $109/mo | $220/mo | $78/mo |
| Accuracy | Low (missing components) | High | High |
| Savings | - | Baseline | 64% |
Detailed Cost Breakdown

1. Compute - Long-Running Services

| Service | vCPU | Memory | Hours/Month | Rate | Monthly Cost |
|---|---|---|---|---|---|
| Backend API | 0.5 | 1 GB | 730 | $0.020/hr | $14.60 |
| Data Collection | 0.25 | 512 MB | 730 | $0.010/hr | $7.30 |
| MLflow | 0.5 | 1 GB | 730 | $0.020/hr | $14.60 |
| Subtotal | | | | | $36.50 |
2. Compute - On-Demand Tasks

Assumptions: 50 backtests/month, 20 data syncs/month

| Task | vCPU | Memory | Avg Duration | Runs/Month | Monthly Cost |
|---|---|---|---|---|---|
| Strategy Service | 0.5 | 1 GB | 10 min | 50 | $1.67 |
| Strategy Container* | 1 | 2 GB | 30 min | 50 | $5.00 |
| Data Collection Task | 0.5 | 1 GB | 15 min | 20 | $1.00 |
| Subtotal (On-Demand) | | | | | $7.67 |
| With Fargate Spot (70% savings) | | | | | $1.92 |
*Strategy Container uses Fargate Spot for cost optimization
3. Networking

| Component | Configuration | Baseline Cost | Optimized Cost |
|---|---|---|---|
| NAT Gateway (2 AZs) | Standard | $64.80 | - |
| NAT Instance (t4g.nano) | 1 instance | - | $6.13 |
| ALB | 1 LB, minimal traffic | $17.00 | $17.00 |
| VPC Endpoints (ECR, Secrets) | 4 interface endpoints | $28.00 | $0* |
| Subtotal | | $109.80 | $23.13 |
*VPC Endpoints removed - using NAT Instance for all AWS API calls
4. Storage & Database

| Component | Configuration | Baseline Cost | Optimized Cost |
|---|---|---|---|
| RDS PostgreSQL | db.t4g.micro, Multi-AZ | $36.00 | $18.00* |
| S3 Storage | 100 GB | $2.50 | $1.00** |
| S3 Requests | 100K requests | $0.50 | $0.50 |
| ECR | 50 GB images | $5.00 | $5.00 |
| DynamoDB | On-demand, ~100 ops/day | $2.00 | $2.00 |
| Subtotal | | $46.00 | $26.50 |

*Single-AZ for dev environment
**With lifecycle policies (delete temp after 7 days)
5. Serverless

| Component | Configuration | Monthly Cost |
|---|---|---|
| AWS API Gateway | HTTP API, ~50K requests | $3.50 |
| Lambda (8 functions) | ~10K invocations | $0.00 (free tier) |
| Step Functions | ~500 transitions | $0.50 |
| SQS (FIFO) | ~1K messages | $0.40 |
| Subtotal | | $4.40 |
6. Observability

| Component | Configuration | Baseline Cost | Optimized Cost |
|---|---|---|---|
| CloudWatch Logs | 10 GB, 30-day retention | $10.00 | $5.00* |
| CloudWatch Metrics | Custom metrics | $3.00 | $3.00 |
| CloudWatch Alarms | 10 alarms | $1.00 | $1.00 |
| CloudWatch Dashboard | 1 dashboard | $3.00 | $3.00 |
| Subtotal | | $17.00 | $12.00 |
*7-day retention instead of 30-day
7. Security

| Component | Configuration | Monthly Cost |
|---|---|---|
| Secrets Manager | 5 secrets | $2.00 |
| CloudTrail | 1 trail | $2.00 |
| Cognito | < 50K MAU | $0.00 (free tier) |
| WAF | Web ACL + managed rules | $5.00 |
| Subtotal | | $9.00 |
8. Live Trading (v9.1)

Note: Live trading costs are per-strategy and added on top of the base platform.

| Component | Specification | Per Strategy/Month |
|---|---|---|
| ECS Fargate (24/7) | 0.5 vCPU, 1GB | $15.33 |
| CloudWatch Logs | ~1 GB/month | $0.50 |
| DynamoDB | ~1M reads (heartbeats) | $0.25 |
| Secrets Manager | 2 secrets (exchange keys) | $0.80 |
| EventBridge | 8,640 invocations/month | $0.01 |
| Lambda (health-check) | 8,640 × 128MB × 1s | $0.11 |
| SNS | ~100 alerts | $0.01 |
| Infrastructure Subtotal | | $17.01 |
| Reserve (exchange fees, data feeds) | | +$29.00 |
| Total per Strategy | | ~$46/month |
Example: 3 Live Strategies

| Item | Monthly Cost |
|---|---|
| Base Platform (Phases 1-5) | $78-99 |
| Pascal Strategy (live) | $46 |
| Momentum Strategy (dry-run) | $46 |
| ML Trend Strategy (live) | $46 |
| Platform Total | $216-237 |
Total Cost Comparison

| Category | Baseline | Optimized | Savings |
|---|---|---|---|
| Long-Running Services | $36.50 | $36.50 | $0 |
| On-Demand Tasks | $7.67 | $1.92 | $5.75 |
| Networking | $109.80 | $23.13 | $86.67 |
| Storage & Database | $46.00 | $26.50 | $19.50 |
| Serverless | $4.40 | $4.40 | $0 |
| Observability | $17.00 | $12.00 | $5.00 |
| Security | $9.00 | $9.00 | $0 |
| BASE PLATFORM | $230.37 | $113.45 | $116.92 |
| With Reserved RDS (1yr) | | $99.45 | $130.92 |
| Final Target (Backtesting only) | | ~$78-99/mo | 57-66% |
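The BASE PLATFORM row is the column-wise sum of the seven categories. A minimal sketch that recomputes it from the per-category figures, using `Decimal` so the cents add up exactly (the dictionary and function names are illustrative, not part of the platform):

```python
from decimal import Decimal

# (baseline, optimized) monthly cost in USD per category, copied from the table.
CATEGORIES = {
    "Long-Running Services": ("36.50", "36.50"),
    "On-Demand Tasks": ("7.67", "1.92"),
    "Networking": ("109.80", "23.13"),
    "Storage & Database": ("46.00", "26.50"),
    "Serverless": ("4.40", "4.40"),
    "Observability": ("17.00", "12.00"),
    "Security": ("9.00", "9.00"),
}


def platform_totals(categories):
    """Return (baseline, optimized, savings) as Decimal values."""
    baseline = sum(Decimal(b) for b, _ in categories.values())
    optimized = sum(Decimal(o) for _, o in categories.values())
    return baseline, optimized, baseline - optimized


baseline, optimized, savings = platform_totals(CATEGORIES)
print(f"Baseline ${baseline}, optimized ${optimized}, savings ${savings}")
```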
Monthly Cost Summary (With Live Trading - v9.1)

| Component | Cost |
|---|---|
| Base Platform (optimized) | $78-99 |
| Live Strategy #1 | $46 |
| Live Strategy #2 | $46 |
| Live Strategy #3 | $46 |
| TOTAL (3 strategies) | $216-237 |
Live trading costs scale linearly with number of strategies.
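The linear scaling can be sketched as a small helper. The line items and the $78-99 base range come from the tables in this document; the function names are hypothetical, and the rounded $46 per strategy is used for the platform total, as in the tables:

```python
# Per-strategy infrastructure line items (USD/month) from the Live Trading table.
INFRA_PER_STRATEGY = [15.33, 0.50, 0.25, 0.80, 0.01, 0.11, 0.01]
RESERVE_PER_STRATEGY = 29.00      # exchange fees, data feeds
BASE_PLATFORM_RANGE = (78, 99)    # optimized base platform, low/high estimate


def per_strategy_cost():
    """Infrastructure subtotal plus reserve for one live strategy."""
    return round(sum(INFRA_PER_STRATEGY) + RESERVE_PER_STRATEGY, 2)


def platform_total(n_strategies, per_strategy=46):
    """Return the (low, high) monthly total for n live strategies."""
    low, high = BASE_PLATFORM_RANGE
    extra = n_strategies * per_strategy
    return low + extra, high + extra


print(per_strategy_cost())   # unrounded per-strategy figure
print(platform_total(3))     # range for three strategies
```

Calling `platform_total(0)` returns the backtesting-only range, which makes the base platform the fixed part of the bill.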
Cost by Category (Optimized)

┌─────────────────────────────────────────────────────────────┐
│  Monthly Cost: $113.45                                      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Long-Running Services   ████████████████████  $36.50 (32%) │
│  Storage & Database      ███████████           $26.50 (23%) │
│  Networking              █████████             $23.13 (20%) │
│  Observability           █████                 $12.00 (11%) │
│  Security                ███                   $9.00 (8%)   │
│  Serverless              ██                    $4.40 (4%)   │
│  On-Demand Tasks         █                     $1.92 (2%)   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Optimization Details

Optimization 1: NAT Instance vs NAT Gateway

NAT Gateway (2 AZs):
├─ Hourly charge: $0.045/hr × 730 hrs × 2 = $65.70
├─ Data processing: $0.045/GB × ~100 GB = $4.50
└─ Total: ~$70/month
NAT Instance (t4g.nano):
├─ Instance: $0.0042/hr × 730 hrs = $3.07
├─ EIP: $0 (attached to instance)
├─ Data: Included
└─ Total: ~$3/month + some overhead = $6.13
SAVINGS: $64/month (91% reduction)
Trade-offs:
- Lower bandwidth (5 Gbps vs 45 Gbps) - acceptable for our workload
- Self-managed (patching, monitoring) - minimal effort with ASG
- Single point of failure - mitigated with ASG health checks
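The comparison above can be reproduced with a few lines of arithmetic. This is a sketch using only the rates quoted in this section; the $6.13 NAT Instance figure is taken as given, and the function names are illustrative:

```python
HOURS_PER_MONTH = 730


def nat_gateway_monthly(azs=2, hourly=0.045, gb_processed=100, per_gb=0.045):
    """Hourly charge per AZ plus per-GB data processing, in USD/month."""
    return azs * hourly * HOURS_PER_MONTH + gb_processed * per_gb


def savings(baseline, optimized):
    """Return (absolute savings, percentage reduction)."""
    saved = baseline - optimized
    return saved, saved / baseline * 100


gateway = nat_gateway_monthly()   # 65.70 hourly + 4.50 data processing
instance = 6.13                   # NAT Instance total from this section
saved, pct = savings(gateway, instance)
print(f"NAT Gateway ${gateway:.2f}, NAT Instance ${instance:.2f}, "
      f"saves ${saved:.2f} ({pct:.0f}%)")
```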
Optimization 2: Remove VPC Endpoints

VPC Interface Endpoints:
├─ ECR API: $7/mo × 2 AZs = $14
├─ ECR DKR: $7/mo × 2 AZs = $14
├─ Secrets Manager: $7/mo × 2 AZs = $14 (optional)
└─ Total: $28-42/month
Alternative: Route through NAT Instance
├─ Additional NAT traffic: ~10 GB/month
├─ Cost: $0.045/GB × 10 GB = $0.45/month
└─ Total: ~$0.45/month
SAVINGS: $28-42/month
Trade-offs:
- Slightly higher latency for AWS API calls (~10ms)
- Traffic goes through internet (still encrypted)
- Acceptable for non-high-frequency operations
Optimization 3: RDS Single-AZ

RDS Multi-AZ:
├─ Primary: $18/month
├─ Standby: $18/month
└─ Total: $36/month
RDS Single-AZ:
└─ Total: $18/month
SAVINGS: $18/month (50% reduction)
Trade-offs:
- No automatic failover
- 15-30 minute recovery on failure
- Acceptable for dev/staging environments
Production recommendation:
- Keep Multi-AZ for production
- Use Single-AZ for dev/staging
Optimization 4: Fargate Spot

Fargate On-Demand (Strategy Tasks):
├─ 50 runs × 30 min × $0.04/hr = $10/month
└─ Total: ~$10/month
Fargate Spot:
├─ 70% discount on Fargate pricing
├─ 50 runs × 30 min × $0.012/hr = $3/month
└─ Total: ~$3/month
SAVINGS: $7/month (70% reduction)
Trade-offs:
- Tasks may be interrupted (2-minute warning)
- Mitigation: Checkpoint progress to S3
- Acceptable for batch processing (backtests)
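The checkpoint mitigation could look roughly like the sketch below. It assumes the interruption reaches the container as SIGTERM and that a backtest can be split into resumable steps. The class, its methods, and the checkpoint file layout are hypothetical, and the checkpoint is written to a local file here; a real task would also copy it to S3.

```python
import json
import signal
from pathlib import Path


class CheckpointedBacktest:
    """Runs work in steps and records progress so a restart can resume."""

    def __init__(self, total_steps, checkpoint_path):
        self.total_steps = total_steps
        self.checkpoint_path = Path(checkpoint_path)
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Signal handlers should do as little as possible: only set a flag.
        self.stop_requested = True

    def load_step(self):
        """Return the next step to run, or 0 when no checkpoint exists."""
        if self.checkpoint_path.exists():
            return json.loads(self.checkpoint_path.read_text())["next_step"]
        return 0

    def save_step(self, next_step):
        # Assumption: production code would upload this file to S3 as well.
        self.checkpoint_path.write_text(json.dumps({"next_step": next_step}))

    def run(self, process_step):
        """Process steps until done or a stop is requested; return next step."""
        step = self.load_step()
        while step < self.total_steps and not self.stop_requested:
            process_step(step)
            step += 1
            self.save_step(step)
        return step


def install_sigterm_handler(job):
    """Route SIGTERM (the interruption warning) to the job's stop flag."""
    signal.signal(signal.SIGTERM, job.request_stop)
```

Saving after every step keeps the handler trivial: when the stop flag is set, the loop exits after the step in progress, and the last saved checkpoint is already current.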
Optimization 5: CloudWatch Logs Retention

30-day retention:
├─ 10 GB × $0.50/GB ingestion = $5
├─ 10 GB × $0.03/GB storage × 30 days = $9
└─ Total: ~$14/month
7-day retention:
├─ 10 GB × $0.50/GB ingestion = $5
├─ 10 GB × $0.03/GB storage × 7 days = $2.10
└─ Total: ~$7/month
SAVINGS: $7/month (50% reduction)
Trade-offs:
- Shorter debugging window
- Mitigation: Export important logs to S3 for long-term storage
Optimization 6: S3 Lifecycle Policies

Without lifecycle:
├─ Temp files accumulate
├─ 100 GB × $0.023/GB = $2.30
└─ Growing over time
With lifecycle:
├─ Temp files deleted after 7 days
├─ Results archived to Glacier after 30 days
├─ ~40 GB average = $1.00
└─ Stable cost
SAVINGS: $1.30/month + prevents cost growth
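One way to express the two lifecycle rules is as the configuration dictionary that boto3's `put_bucket_lifecycle_configuration` accepts. This is a sketch: the `temp/` and `results/` prefixes and the rule IDs are assumptions, since this document does not describe the bucket layout.

```python
def build_lifecycle_configuration(temp_prefix="temp/", results_prefix="results/"):
    """Build the lifecycle rules: expire temp files, archive results."""
    return {
        "Rules": [
            {
                "ID": "expire-temp-files",          # hypothetical rule name
                "Filter": {"Prefix": temp_prefix},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            },
            {
                "ID": "archive-results",            # hypothetical rule name
                "Filter": {"Prefix": results_prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            },
        ]
    }


def apply_lifecycle(s3_client, bucket):
    """Apply the rules using a boto3 S3 client supplied by the caller."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_lifecycle_configuration(),
    )
```

With boto3 installed, `apply_lifecycle(boto3.client("s3"), "my-bucket")` would apply both rules, where the bucket name is a placeholder.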
Optimization 7: Reserved Capacity (Optional)

RDS On-Demand:
└─ db.t4g.micro: $18/month
RDS Reserved (1-year, no upfront):
└─ db.t4g.micro: $12/month
SAVINGS: $6/month (33% reduction)
Requirements:
- 1-year commitment
- Wait 1 month to verify stable usage
Cost Scaling Analysis

Cost at Different Usage Levels

| Backtests/Month | 10 | 50 | 100 | 200 |
|---|---|---|---|---|
| On-Demand Tasks | $0.40 | $1.92 | $3.84 | $7.68 |
| Step Functions | $0.10 | $0.50 | $1.00 | $2.00 |
| S3 (results) | $0.50 | $1.00 | $2.00 | $4.00 |
| Variable Cost | $1.00 | $3.42 | $6.84 | $13.68 |
| Fixed Cost | $95.03 | $95.03 | $95.03 | $95.03 |
| Total | $96.03 | $98.45 | $101.87 | $108.71 |
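The table is close to a fixed-plus-linear model. A minimal sketch, assuming the variable cost scales proportionally from the 50-backtest column (the function and constant names are illustrative):

```python
FIXED_MONTHLY = 95.03              # fixed cost row from the table
VARIABLE_AT_50_BACKTESTS = 3.42    # On-Demand + Step Functions + S3 at 50 runs


def estimate_monthly_cost(backtests):
    """Estimate total monthly cost in USD for a given number of backtests."""
    per_backtest = VARIABLE_AT_50_BACKTESTS / 50   # about $0.07 per backtest
    return round(FIXED_MONTHLY + per_backtest * backtests, 2)


for n in (50, 100, 200):
    print(n, estimate_monthly_cost(n))
```

This matches the 50, 100, and 200 columns. The 10-backtest column in the table comes out slightly higher than the linear estimate because its S3 and on-demand entries do not shrink proportionally.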
Break-Even Analysis

Current architecture supports 50-200 backtests/month at ~$100/month
At 500+ backtests/month:
- Consider provisioned Step Functions
- Consider reserved Fargate capacity
- Estimated cost: ~$150/month
At 1000+ backtests/month:
- Consider dedicated EC2 instances for backtesting
- Consider Kubernetes (EKS) for orchestration
- Estimated cost: ~$300/month
Cost Monitoring

Budget Alerts

Budget Configuration:
├─ Monthly Budget: $100
├─ Alert 1: 50% ($50) - Email notification
├─ Alert 2: 80% ($80) - Email + Slack notification
├─ Alert 3: 100% ($100) - Email + Slack + PagerDuty
└─ Forecast Alert: 110% projected - Email notification
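The alert ladder maps naturally onto the `Budget` and `NotificationsWithSubscribers` arguments of the AWS Budgets `create_budget` call. The sketch below only builds those structures. The budget name, email address, and SNS topic are placeholders, and routing Slack and PagerDuty through a single SNS topic is an assumption, not something this document specifies.

```python
BUDGET = {
    "BudgetName": "tradai-monthly",                  # hypothetical name
    "BudgetLimit": {"Amount": "100", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
}


def build_budget_notifications(email, sns_topic_arn):
    """Return the four alerts as NotificationsWithSubscribers entries."""

    def notification(kind, threshold, subscribers):
        return {
            "Notification": {
                "NotificationType": kind,            # "ACTUAL" or "FORECASTED"
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": float(threshold),       # percent of the budget
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": subscribers,
        }

    email_sub = {"SubscriptionType": "EMAIL", "Address": email}
    # Assumption: Slack and PagerDuty are fanned out from one SNS topic.
    sns_sub = {"SubscriptionType": "SNS", "Address": sns_topic_arn}
    return [
        notification("ACTUAL", 50, [email_sub]),
        notification("ACTUAL", 80, [email_sub, sns_sub]),
        notification("ACTUAL", 100, [email_sub, sns_sub]),
        notification("FORECASTED", 110, [email_sub]),
    ]
```

With boto3, these would be passed as `boto3.client("budgets").create_budget(AccountId=..., Budget=BUDGET, NotificationsWithSubscribers=...)`.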
Required Tags:
├─ Application: tradai
├─ Environment: production | staging | dev
├─ Service: backend-api | data-collection | mlflow | strategy
├─ Owner: team-name
└─ CostCenter: trading-platform
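Cost allocation only works if every resource carries these tags, so a small validation step helps, for example in CI or a deployment script. A sketch, with the allowed values taken from the list above and `Owner` left free-form:

```python
REQUIRED_TAGS = ("Application", "Environment", "Service", "Owner", "CostCenter")

# Allowed values for the tags this document constrains.
ALLOWED_VALUES = {
    "Application": {"tradai"},
    "Environment": {"production", "staging", "dev"},
    "Service": {"backend-api", "data-collection", "mlflow", "strategy"},
    "CostCenter": {"trading-platform"},
}


def validate_tags(tags):
    """Return a list of problems; an empty list means the tags are valid."""
    problems = []
    for key in REQUIRED_TAGS:
        value = tags.get(key)
        if not value:
            problems.append(f"missing tag: {key}")
        elif key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]:
            problems.append(f"unexpected value for {key}: {value}")
    return problems
```

For example, `validate_tags({"Application": "tradai", "Environment": "prod"})` reports the unexpected `Environment` value and the three missing tags.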
Cost Explorer Queries

Monthly by Service:
- Filter: Application = tradai
- Group by: Service tag
- Time: Last 3 months
Daily Trend:
- Filter: Application = tradai
- Group by: Day
- Time: Last 30 days
Anomaly Detection:
- Enable AWS Cost Anomaly Detection
- Threshold: 20% above normal
- Alert: Email + Slack
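The first two queries can also be run programmatically. The sketch below builds keyword arguments for the Cost Explorer `get_cost_and_usage` call. The `UnblendedCost` metric and the helper names are assumptions, and "last 3 months" is read as the three most recent complete calendar months.

```python
from datetime import date, timedelta

APPLICATION_FILTER = {"Tags": {"Key": "Application", "Values": ["tradai"]}}


def first_of_month(d, months_back=0):
    """First day of the month that lies months_back months before d."""
    index = d.year * 12 + (d.month - 1) - months_back
    return date(index // 12, index % 12 + 1, 1)


def monthly_by_service_query(today):
    """Last three complete months, grouped by the Service tag."""
    return {
        "TimePeriod": {
            "Start": first_of_month(today, 3).isoformat(),
            "End": first_of_month(today).isoformat(),   # end date is exclusive
        },
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": APPLICATION_FILTER,
        "GroupBy": [{"Type": "TAG", "Key": "Service"}],
    }


def daily_trend_query(today):
    """Last 30 days at daily granularity."""
    return {
        "TimePeriod": {
            "Start": (today - timedelta(days=30)).isoformat(),
            "End": today.isoformat(),
        },
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": APPLICATION_FILTER,
    }
```

With boto3, a query would be issued as `boto3.client("ce").get_cost_and_usage(**monthly_by_service_query(date.today()))`.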
Future Cost Optimizations (Phase 2)

| Optimization | Potential Savings | Effort | Priority |
|---|---|---|---|
| Graviton instances for ECS | 20% on compute | Medium | High |
| Spot instances for MLflow | $5/month | Low | Medium |
| S3 Intelligent Tiering | $0.50/month | Low | Low |
| Reserved Fargate | 30% on compute | Low | Medium |
| Right-sizing after 1 month | 10-20% | Medium | High |
Summary

| Metric | Value |
|---|---|
| Final Monthly Cost | $78-99 |
| Cost per Backtest | $0.07 |
| Fixed Costs | $95/month |
| Variable Costs | $0.07/backtest |
| Savings vs Baseline | 57-66% |
Next Steps

- Review 08-IMPLEMENTATION-ROADMAP.md for deployment plan
- Set up budget alerts in AWS Budgets
- Apply cost allocation tags to all resources