TradAI Final Architecture - Cost Analysis¶
Version: 9.2.3 | Date: 2026-03-28
TL;DR: Base platform $122-128/month (reserved vs on-demand RDS). Per live strategy +$61/month. Key optimizations: NAT Instance over Gateway, VPC endpoint removal, Fargate Spot, single-AZ RDS, reserved RDS.
Cost Breakdown (Read This First)
- Detailed sum: $127.70/month (all optimizations applied, on-demand RDS)
- With Reserved RDS (1-year no-upfront): $121.70/month (recommended)
- If VPC endpoints are kept deployed: $187-193/month (add ~$65.70 for 9 interface endpoints)
The optimized column assumes VPC endpoints are removed and traffic is routed through the NAT instance. The $6 reserved RDS saving requires a 1-year commitment (see Optimization 7). Strategy Service is classified as an always-on service (port 8003, desired_count: 1).
Cost Summary¶
Before vs After Optimization¶
| Metric | Original (v5.1) | Corrected Baseline | Optimized (v8.0) |
|---|---|---|---|
| Documented Cost | $109/mo | $244/mo | $122-128/mo |
| Accuracy | Low (missing components) | High | High |
| Savings | - | Baseline | 48-50% |
Detailed Cost Breakdown¶
1. Compute - Long-Running Services¶
| Service | vCPU | Memory | Hours/Month | Rate | Monthly Cost |
|---|---|---|---|---|---|
| Backend API | 0.5 | 1 GB | 730 | $0.020/hr | $14.60 |
| Data Collection | 0.25 | 512 MB | 730 | $0.010/hr | $7.30 |
| MLflow | 0.5 | 1 GB | 730 | $0.020/hr | $14.60 |
| Strategy Service | 0.5 | 1 GB | 730 | $0.020/hr | $14.60 |
| Subtotal | | | | | $51.10 |
Note: Strategy Service (strategy-service, port 8003, desired_count: 1) is an always-on ECS service that handles strategy API requests. It is distinct from strategy-container (no port, desired_count: 0), which is the on-demand backtest runner.
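The subtotal above is simply hourly rate times 730 hours, summed over the four always-on services. A minimal sketch of that arithmetic follows; the dictionary name `LONG_RUNNING_RATES` and the helper names are illustrative and not taken from the infrastructure code, and the rates are the rounded figures from the table rather than exact AWS list prices.

```python
HOURS_PER_MONTH = 730

# Hourly rates as quoted in the table above (rounded, illustrative).
LONG_RUNNING_RATES = {
    "backend-api": 0.020,
    "data-collection": 0.010,
    "mlflow": 0.020,
    "strategy-service": 0.020,
}


def monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Monthly cost of one always-on service at a flat hourly rate."""
    return round(hourly_rate * hours, 2)


def subtotal(rates):
    """Sum of monthly costs across all always-on services."""
    return round(sum(monthly_cost(rate) for rate in rates.values()), 2)


if __name__ == "__main__":
    for name, rate in LONG_RUNNING_RATES.items():
        print(f"{name}: ${monthly_cost(rate):.2f}")
    print(f"Subtotal: ${subtotal(LONG_RUNNING_RATES):.2f}")
```

Changing a task size only requires updating its hourly rate in the dictionary.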
2. Compute - On-Demand Tasks¶
Assumptions: 50 backtests/month, 20 data syncs/month
| Task | vCPU | Memory | Avg Duration | Runs/Month | Monthly Cost |
|---|---|---|---|---|---|
| Strategy Container* | 1 | 2 GB | 30 min | 50 | $5.00 |
| Data Collection Task | 0.5 | 1 GB | 15 min | 20 | $1.00 |
| Subtotal (On-Demand) | | | | | $6.00 |
| With Fargate Spot (70% savings) | | | | | $1.50 |
*Strategy Container (strategy-container, desired_count: 0) uses Fargate Spot for cost optimization
3. Networking¶
| Component | Configuration | Baseline Cost | Optimized Cost |
|---|---|---|---|
| NAT Gateway (2 AZs) | Standard | $70.20 | - |
| NAT Instance (t4g.nano) | 1 instance | - | $6.13 |
| ALB | 1 LB, minimal traffic | $17.00 | $17.00 |
| VPC Endpoints (ECR, Secrets, etc.) | 9 interface endpoints | $65.70 | $0* |
| Subtotal | | $115.20 | $23.13 |
*VPC Endpoints removed - using NAT Instance for all AWS API calls
4. Storage & Database¶
| Component | Configuration | Baseline Cost | Optimized Cost |
|---|---|---|---|
| RDS PostgreSQL | db.t4g.micro, Multi-AZ | $36.00 | $18.00* |
| S3 Storage | 100 GB | $2.50 | $1.00** |
| S3 Requests | 100K requests | $0.50 | $0.50 |
| ECR | 50 GB images | $5.00 | $5.00 |
| DynamoDB | On-demand, ~100 ops/day | $2.00 | $2.00 |
| Subtotal | | $46.00 | $26.50 |
*Single-AZ for dev environment
**With lifecycle policies (delete temp after 7 days)
5. Serverless¶
| Component | Configuration | Monthly Cost |
|---|---|---|
| AWS API Gateway | HTTP API, ~50K requests | $3.50 |
| Lambda (18 functions: 7 required + 10 optional + 1 NAT utility) | ~10K invocations | $0.00 (free tier) |
| Step Functions | ~500 transitions | $0.50 |
| SQS (FIFO) | ~1K messages | $0.40 |
| Subtotal | | $4.40 |
6. Observability¶
| Component | Configuration | Baseline Cost | Optimized Cost |
|---|---|---|---|
| CloudWatch Logs | 10 GB ingested/month | $5.30 | $5.07* |
| CloudWatch Metrics | Custom metrics | $3.00 | $3.00 |
| CloudWatch Alarms | 10 alarms | $1.00 | $1.00 |
| CloudWatch Dashboard | 1 dashboard | $3.00 | $3.00 |
| Subtotal | | $12.30 | $12.07 |
*7-day retention instead of 30-day (storage savings minimal -- ingestion is the dominant cost)
7. Security¶
| Component | Configuration | Monthly Cost |
|---|---|---|
| Secrets Manager | 5 secrets | $2.00 |
| CloudTrail | 1 trail | $2.00 |
| Cognito | < 50K MAU | $0.00 (free tier) |
| WAF | Web ACL + managed rules | $5.00 |
| Subtotal | | $9.00 |
8. Live Trading (v9.1)¶
Note: Live trading costs are per-strategy and added on top of the base platform.
| Component | Specification | Per Strategy/Month |
|---|---|---|
| ECS Fargate (24/7) | 1 vCPU, 2GB | $30.66 |
| CloudWatch Logs | ~1 GB/month | $0.50 |
| DynamoDB | ~1M reads (heartbeats) | $0.25 |
| Secrets Manager | 2 secrets (exchange keys) | $0.80 |
| EventBridge | 8,640 invocations/month | $0.01 |
| Lambda (heartbeat-check) | 8,640 × 256MB × 1s | $0.22 |
| SNS | ~100 alerts | $0.01 |
| Infrastructure Subtotal | | $32.45 |
| Reserve (exchange fees, data feeds) | | +$29.00 |
| Total per Strategy | | ~$61/month |
Example: 3 Live Strategies
| Item | Monthly Cost |
|---|---|
| Base Platform (Phases 1-5) | $122-128 |
| Pascal Strategy (live) | $61 |
| Momentum Strategy (dry-run) | $61 |
| ML Trend Strategy (live) | $61 |
| Platform Total | $305-311 |
Total Cost Comparison¶
Monthly Cost Summary (Base Platform)¶
| Category | Baseline | Optimized | Savings |
|---|---|---|---|
| Long-Running Services | $51.10 | $51.10 | $0 |
| On-Demand Tasks | $6.00 | $1.50 | $4.50 |
| Networking | $115.20 | $23.13 | $92.07 |
| Storage & Database | $46.00 | $26.50 | $19.50 |
| Serverless | $4.40 | $4.40 | $0 |
| Observability | $12.30 | $12.07 | $0.23 |
| Security | $9.00 | $9.00 | $0 |
| BASE PLATFORM | $244.00 | $127.70 | $116.30 |
| With Reserved RDS (1yr, -$6) | | $121.70 | $122.30 |
Note: The optimized column includes VPC endpoint removal (~$66 savings already applied in networking). If VPC endpoints remain deployed, add ~$66 to the optimized total. Strategy Service is an always-on ECS service (port 8003, desired_count: 1), now correctly listed under Long-Running Services.
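To make the roll-up easy to re-check when a category changes, the summary table can be reproduced in a few lines. This is a sketch only: the category figures are copied from the table above, and the names `CATEGORIES`, `totals`, and `savings_percent` are hypothetical.

```python
# (baseline, optimized) monthly cost per category, copied from the summary table.
CATEGORIES = {
    "Long-Running Services": (51.10, 51.10),
    "On-Demand Tasks": (6.00, 1.50),
    "Networking": (115.20, 23.13),
    "Storage & Database": (46.00, 26.50),
    "Serverless": (4.40, 4.40),
    "Observability": (12.30, 12.07),
    "Security": (9.00, 9.00),
}
RESERVED_RDS_DISCOUNT = 6.00  # 1-year, no-upfront reservation


def totals(categories):
    """Return (baseline_total, optimized_total) across all categories."""
    baseline = round(sum(b for b, _ in categories.values()), 2)
    optimized = round(sum(o for _, o in categories.values()), 2)
    return baseline, optimized


def savings_percent(baseline, optimized):
    """Percentage saved relative to the baseline, to one decimal place."""
    return round(100 * (baseline - optimized) / baseline, 1)


if __name__ == "__main__":
    base, opt = totals(CATEGORIES)
    reserved = round(opt - RESERVED_RDS_DISCOUNT, 2)
    print(base, opt, savings_percent(base, opt))
    print(reserved, savings_percent(base, reserved))
```

The two percentages it prints correspond to the 48-50% range quoted throughout this document.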
Monthly Cost Summary (With Live Trading - v9.1)¶
| Component | Cost |
|---|---|
| Base Platform (optimized) | $122-128 |
| Live Strategy #1 | $61 |
| Live Strategy #2 | $61 |
| Live Strategy #3 | $61 |
| TOTAL (3 strategies) | $305-311 |
Live trading costs scale linearly with number of strategies.
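Because each strategy adds a flat amount, the platform total for any number of strategies is a one-line calculation. The sketch below uses the base-platform and per-strategy figures from this document; the function and constant names are illustrative.

```python
BASE_PLATFORM_RESERVED = 121.70   # optimized base with reserved RDS
BASE_PLATFORM_ON_DEMAND = 127.70  # optimized base with on-demand RDS
PER_STRATEGY = 61.00              # infrastructure plus reserve, rounded


def platform_total(n_strategies):
    """Return the (low, high) monthly total for n live strategies."""
    extra = n_strategies * PER_STRATEGY
    return (
        round(BASE_PLATFORM_RESERVED + extra, 2),
        round(BASE_PLATFORM_ON_DEMAND + extra, 2),
    )


if __name__ == "__main__":
    for n in range(0, 4):
        low, high = platform_total(n)
        print(f"{n} strategies: ${low:.2f}-${high:.2f}")
```

For three strategies this gives $304.70-$310.70, which the tables round to $305-311.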
Cost by Category (Optimized)¶
pie title Monthly Cost Breakdown ($127.70)
"Long-Running Services" : 51.10
"Storage & Database" : 26.50
"Networking" : 23.13
"Observability" : 12.07
"Security" : 9.00
"Serverless" : 4.40
"On-Demand Tasks" : 1.50
Optimization Details¶
Optimization 1: NAT Instance vs NAT Gateway¶
NAT Gateway (2 AZs):
├─ Hourly charge: $0.045/hr × 730 hrs × 2 = $65.70
├─ Data processing: $0.045/GB × ~100 GB = $4.50
└─ Total: ~$70/month
NAT Instance (t4g.nano):
├─ Instance: $0.0042/hr × 730 hrs = $3.07
├─ EIP: $0 (attached to instance)
├─ Data: Included
└─ Total: $3.07 instance + ~$3.06 overhead = $6.13/month
SAVINGS: ~$64/month (91% reduction)
Trade-offs:
- Lower bandwidth (5 Gbps vs 45 Gbps) - acceptable for our workload
- Self-managed (patching, monitoring) - minimal effort with ASG
- Single point of failure - mitigated with ASG health checks
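The comparison above can be reproduced with the quoted rates. In this sketch the $6.13 NAT instance total is taken from the document as given (instance cost plus overhead), and all function names are illustrative.

```python
HOURS_PER_MONTH = 730
NAT_INSTANCE_TOTAL = 6.13  # instance + overhead, as stated above


def nat_gateway_monthly(az_count=2, hourly=0.045, per_gb=0.045, gb_processed=100):
    """Hourly charge per AZ plus per-GB data processing."""
    return round(az_count * hourly * HOURS_PER_MONTH + per_gb * gb_processed, 2)


def nat_instance_compute_monthly(hourly=0.0042):
    """Compute-only cost of a t4g.nano at the quoted hourly rate."""
    return round(hourly * HOURS_PER_MONTH, 2)


def nat_savings():
    """Return (dollars saved, percent saved) from switching to the NAT instance."""
    gateway = nat_gateway_monthly()
    saved = round(gateway - NAT_INSTANCE_TOTAL, 2)
    return saved, round(100 * saved / gateway)


if __name__ == "__main__":
    print(nat_gateway_monthly(), nat_instance_compute_monthly(), nat_savings())
```

Passing a different `gb_processed` shows how the gap widens as outbound traffic grows, since the instance total here does not include a per-GB term.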
Optimization 2: VPC Endpoint Removal (NOT Applied)¶
VPC Interface Endpoints (currently deployed -- see 03-VPC-NETWORKING.md):
├─ ECR API: ~$7.30/mo/AZ × 2 AZs = $14.60
├─ ECR DKR: ~$7.30/mo/AZ × 2 AZs = $14.60
├─ STS: ~$7.30/mo/AZ × 2 AZs = $14.60
├─ Secrets Manager: ~$7.30/mo/AZ × 2 AZs = $14.60
├─ CloudWatch Logs: ~$7.30/mo/AZ × 2 AZs = $14.60
├─ SSM: ~$7.30/mo/AZ × 2 AZs = $14.60
├─ SSM Messages: ~$7.30/mo/AZ × 2 AZs = $14.60
├─ EC2 Messages: ~$7.30/mo/AZ × 2 AZs = $14.60
├─ SQS: ~$7.30/mo/AZ × 2 AZs = $14.60
└─ Total: ~$65.70/month as budgeted (9 endpoints × $7.30, i.e. one AZ each); 9 × $14.60 = ~$131.40/month if every endpoint spans 2 AZs
NOTE: This optimization was NOT applied. The 9 interface endpoints
ARE deployed (per 03-VPC-NETWORKING.md Section 6.2). The cost tables
above assume endpoint removal ($0 in Networking optimized column),
but actual spend includes ~$65.70/month for these endpoints.
Potential savings if endpoints are removed:
├─ Route all traffic through NAT Instance instead
├─ Additional NAT traffic: ~10 GB/month = $0.45/month
└─ SAVINGS: ~$65/month
Trade-offs of removal:
- Slightly higher latency for AWS API calls (~10ms)
- Traffic goes through internet (still encrypted via TLS)
- Acceptable for non-high-frequency operations
- SSM Session Manager would require NAT for access
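Interface endpoint cost is driven by the number of endpoints and the number of AZs each one is attached to. The helper below is a sketch using the ~$7.30 per endpoint per AZ figure from this section; the function name is hypothetical, and per-GB endpoint data charges are ignored.

```python
def interface_endpoint_monthly(endpoint_count, az_count, per_az_monthly=7.30):
    """Fixed monthly charge for interface endpoints (data charges excluded)."""
    return round(endpoint_count * az_count * per_az_monthly, 2)


if __name__ == "__main__":
    for azs in (1, 2):
        cost = interface_endpoint_monthly(9, azs)
        print(f"9 endpoints x {azs} AZ(s): ${cost:.2f}/month")
```

With 9 endpoints this yields $65.70 for one AZ each and $131.40 for two, so the AZ count of the deployed endpoints is worth confirming before relying on the savings figure.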
Optimization 3: RDS Single-AZ¶
RDS Multi-AZ:
├─ Primary: $18/month
├─ Standby: $18/month
└─ Total: $36/month
RDS Single-AZ:
└─ Total: $18/month
SAVINGS: $18/month (50% reduction)
Trade-offs:
- No automatic failover
- 15-30 minute recovery on failure
- Acceptable for dev/staging environments
Production recommendation:
- Keep Multi-AZ for production
- Use Single-AZ for dev/staging
Optimization 4: Fargate Spot¶
Fargate On-Demand (Strategy Tasks):
├─ 50 runs × 30 min × $0.04/hr = $10/month
└─ Total: ~$10/month
Fargate Spot:
├─ 70% discount on Fargate pricing
├─ 50 runs × 30 min × $0.012/hr = $3/month
└─ Total: ~$3/month
SAVINGS: $7/month (70% reduction)
Trade-offs:
- Tasks may be interrupted (2-minute warning)
- Mitigation: Checkpoint progress to S3
- Acceptable for batch processing (backtests)
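The checkpoint mitigation can be sketched as a loop that watches for a stop request and records where to resume. This is an assumption-laden illustration, not the strategy-container implementation: the class name, the JSON checkpoint format, and the local file path are hypothetical, and a real task would upload the checkpoint to S3 rather than leave it on local disk.

```python
import json
import signal
from pathlib import Path


class CheckpointingBacktest:
    """Runs numbered steps and saves the next step when asked to stop."""

    def __init__(self, checkpoint_path: Path, total_steps: int) -> None:
        self.checkpoint_path = checkpoint_path
        self.total_steps = total_steps
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None) -> None:
        # Signature matches a signal handler; it only sets a flag.
        self.stop_requested = True

    def load_next_step(self) -> int:
        # Resume from a previous checkpoint if one exists.
        if self.checkpoint_path.exists():
            return json.loads(self.checkpoint_path.read_text())["next_step"]
        return 0

    def run(self, process_step) -> bool:
        """Return True if all steps finished, False if interrupted."""
        step = self.load_next_step()
        while step < self.total_steps:
            if self.stop_requested:
                # Hypothetical: upload this file to S3 before exiting.
                self.checkpoint_path.write_text(json.dumps({"next_step": step}))
                return False
            process_step(step)
            step += 1
        return True


def install_sigterm_handler(job: CheckpointingBacktest) -> None:
    """Fargate sends SIGTERM when a Spot task is being reclaimed."""
    signal.signal(signal.SIGTERM, job.request_stop)
```

The handler only sets a flag, so the checkpoint is written between steps and never mid-step; each step therefore needs to finish well inside the 2-minute warning window.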
Optimization 5: CloudWatch Logs Retention¶
CloudWatch Logs pricing:
├─ Ingestion: $0.50/GB (one-time, on ingest)
├─ Storage: $0.03/GB-month (NOT per-day — this is a monthly rate)
30-day retention:
├─ 10 GB × $0.50/GB ingestion = $5.00
├─ 10 GB × $0.03/GB-month storage = $0.30
└─ Total: ~$5.30/month
7-day retention:
├─ 10 GB × $0.50/GB ingestion = $5.00
├─ ~2.3 GB avg stored × $0.03/GB-month = $0.07
└─ Total: ~$5.07/month
SAVINGS: $0.23/month (negligible — ingestion dominates CW Logs cost)
Note: Reducing retention saves very little money. To meaningfully cut
CW Logs costs, reduce log VOLUME (structured logging, sampling) or
export to S3 ($0.023/GB-month) for long-term storage.
Trade-offs:
- Shorter debugging window
- Mitigation: Export important logs to S3 for long-term storage
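The retention comparison follows from a simple steady-state model: ingestion is charged once per GB, and the average stored volume is roughly the monthly ingest scaled by retention days over 30. The sketch below encodes that model with the rates quoted above; the function name is illustrative and compression of stored logs is ignored.

```python
INGEST_PER_GB = 0.50          # charged once, on ingest
STORAGE_PER_GB_MONTH = 0.03   # charged on the average stored volume


def cloudwatch_logs_monthly(gb_ingested, retention_days):
    """Approximate monthly CloudWatch Logs cost at steady state."""
    avg_stored_gb = gb_ingested * retention_days / 30
    ingestion = gb_ingested * INGEST_PER_GB
    storage = avg_stored_gb * STORAGE_PER_GB_MONTH
    return round(ingestion + storage, 2)


if __name__ == "__main__":
    print(cloudwatch_logs_monthly(10, 30))  # 30-day retention
    print(cloudwatch_logs_monthly(10, 7))   # 7-day retention
    print(cloudwatch_logs_monthly(5, 30))   # halving volume instead
```

Halving log volume at 30-day retention saves far more than shortening retention does, which is the point made in the note above.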
Optimization 6: S3 Lifecycle Policies¶
Without lifecycle:
├─ Temp files accumulate
├─ 100 GB × $0.023/GB = $2.30
└─ Growing over time
With lifecycle:
├─ Temp files deleted after 7 days
├─ Results archived to Glacier after 30 days
├─ ~40 GB average = $1.00
└─ Stable cost
SAVINGS: $1.30/month + prevents cost growth
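The two lifecycle rules described above can be expressed as an S3 lifecycle configuration. The sketch below builds the configuration as a plain dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`; the `temp/` and `results/` prefixes and the rule IDs are assumptions, since the actual bucket layout is defined in infra/persistent/modules/s3.py.

```python
def lifecycle_configuration(temp_prefix="temp/", results_prefix="results/"):
    """Build lifecycle rules: expire temp files, archive results to Glacier."""
    return {
        "Rules": [
            {
                "ID": "expire-temp-files",  # hypothetical rule name
                "Filter": {"Prefix": temp_prefix},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            },
            {
                "ID": "archive-results",  # hypothetical rule name
                "Filter": {"Prefix": results_prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            },
        ]
    }


if __name__ == "__main__":
    import json

    print(json.dumps(lifecycle_configuration(), indent=2))
```

With boto3 available, the result could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=lifecycle_configuration())`.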
Optimization 7: Reserved Capacity (Optional)¶
RDS On-Demand:
└─ db.t4g.micro: $18/month
RDS Reserved (1-year, no upfront):
└─ db.t4g.micro: $12/month
SAVINGS: $6/month (33% reduction)
Requirements:
- 1-year commitment
- Wait 1 month to verify stable usage
Cost Scaling Analysis¶
Cost at Different Usage Levels¶
| Backtests/Month | 10 | 50 | 100 | 200 |
|---|---|---|---|---|
| On-Demand Tasks | $0.30 | $1.50 | $3.00 | $6.00 |
| Step Functions | $0.10 | $0.50 | $1.00 | $2.00 |
| S3 (results) | $0.50 | $1.00 | $2.00 | $4.00 |
| Variable Cost | $0.90 | $3.00 | $6.00 | $12.00 |
| Fixed Cost | $124.70 | $124.70 | $124.70 | $124.70 |
| Total | $125.60 | $127.70 | $130.70 | $136.70 |
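The scaling table can be reproduced with per-backtest rates inferred from its rows: about $0.03 for on-demand tasks, $0.01 for Step Functions, and $0.02 for S3 results. The $0.50 floor on S3 is an assumption made here to match the 10-backtest column; the function names are illustrative.

```python
FIXED_COST = 124.70  # optimized base platform minus the variable items


def variable_cost(backtests):
    """Variable monthly cost, using rates inferred from the table above."""
    on_demand = 0.03 * backtests
    step_functions = 0.01 * backtests
    s3_results = max(0.50, 0.02 * backtests)  # assumed floor at low volume
    return round(on_demand + step_functions + s3_results, 2)


def total_cost(backtests):
    """Fixed plus variable monthly cost for a given backtest volume."""
    return round(FIXED_COST + variable_cost(backtests), 2)


if __name__ == "__main__":
    for n in (10, 50, 100, 200):
        print(n, variable_cost(n), total_cost(n))
```

Above the S3 floor, the variable cost works out to $0.06 per backtest, which is the figure used in the summary.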
Break-Even Analysis¶
Current architecture supports 50-200 backtests/month at ~$128-137/month
At 500+ backtests/month:
- Consider provisioned Step Functions
- Consider a Compute Savings Plan covering Fargate (Fargate has no reserved-capacity offering)
- Estimated cost: ~$165/month
At 1000+ backtests/month:
- Consider dedicated EC2 instances for backtesting
- Consider Kubernetes (EKS) for orchestration
- Estimated cost: ~$315/month
Cost Monitoring¶
Budget Alerts¶
Budget Configuration:
├─ Monthly Budget: $135
├─ Alert 1: 50% ($68) - Email notification
├─ Alert 2: 80% ($108) - Email + Slack notification
├─ Alert 3: 100% ($135) - Email + Slack + PagerDuty
└─ Forecast Alert: 110% projected - Email notification
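The alert ladder above maps onto the request shape used by the AWS Budgets `create_budget` API. The sketch below only builds the request dictionary, so it runs without AWS credentials. The budget name, email address, and SNS topic are placeholders, and routing Slack and PagerDuty through a single SNS topic is an assumption about how those integrations would be wired.

```python
def notification(kind, threshold_pct, subscribers):
    """One entry of NotificationsWithSubscribers."""
    return {
        "Notification": {
            "NotificationType": kind,  # "ACTUAL" or "FORECASTED"
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": float(threshold_pct),
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": subscribers,
    }


def budget_request(account_id, limit_usd, email, sns_topic_arn):
    """Build keyword arguments for the Budgets create_budget call."""
    mail = {"SubscriptionType": "EMAIL", "Address": email}
    topic = {"SubscriptionType": "SNS", "Address": sns_topic_arn}
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "tradai-monthly",  # hypothetical name
            "BudgetLimit": {"Amount": str(limit_usd), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            notification("ACTUAL", 50, [mail]),
            notification("ACTUAL", 80, [mail, topic]),
            notification("ACTUAL", 100, [mail, topic]),
            notification("FORECASTED", 110, [mail]),
        ],
    }
```

With boto3 installed, the result could be passed as `boto3.client("budgets").create_budget(**budget_request(...))`.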
Cost Allocation Tags¶
Required Tags:
├─ Application: tradai
├─ Environment: production | staging | dev
├─ Service: backend-api | data-collection | mlflow | strategy
├─ Owner: team-name
└─ CostCenter: trading-platform
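A small check run in CI or a pre-deploy hook can catch resources that are missing these tags before they show up as unallocated spend. The sketch below validates a tag dictionary against the list above; treating Owner as free-form is an assumption, and the names `REQUIRED_TAGS` and `tag_problems` are hypothetical.

```python
# Allowed values per required tag; None means any non-empty value is accepted.
REQUIRED_TAGS = {
    "Application": {"tradai"},
    "Environment": {"production", "staging", "dev"},
    "Service": {"backend-api", "data-collection", "mlflow", "strategy"},
    "Owner": None,
    "CostCenter": {"trading-platform"},
}


def tag_problems(tags):
    """Return a list of human-readable problems; empty means compliant."""
    problems = []
    for key, allowed in REQUIRED_TAGS.items():
        value = tags.get(key)
        if not value:
            problems.append(f"missing tag: {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid value for {key}: {value}")
    return problems


if __name__ == "__main__":
    print(tag_problems({"Application": "tradai", "Environment": "qa"}))
```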
Cost Explorer Queries¶
Monthly by Service:
- Filter: Application = tradai
- Group by: Service tag
- Time: Last 3 months
Daily Trend:
- Filter: Application = tradai
- Group by: Day
- Time: Last 30 days
Anomaly Detection:
- Enable AWS Cost Anomaly Detection
- Threshold: 20% above normal
- Alert: Email + Slack
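The "Monthly by Service" query can also be issued programmatically through the Cost Explorer `get_cost_and_usage` API. The sketch below builds only the request parameters, so it runs without credentials. Interpreting "last 3 months" as the last three complete calendar months, and using the `UnblendedCost` metric, are assumptions.

```python
from datetime import date


def first_of_month(d, months_back=0):
    """First day of the month that is `months_back` months before d."""
    month_index = d.year * 12 + (d.month - 1) - months_back
    return date(month_index // 12, month_index % 12 + 1, 1)


def monthly_by_service_request(today):
    """Parameters for get_cost_and_usage, grouped by the Service tag."""
    return {
        "TimePeriod": {
            "Start": first_of_month(today, 3).isoformat(),
            "End": first_of_month(today).isoformat(),  # End is exclusive
        },
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Tags": {"Key": "Application", "Values": ["tradai"]}},
        "GroupBy": [{"Type": "TAG", "Key": "Service"}],
    }


if __name__ == "__main__":
    print(monthly_by_service_request(date.today()))
```

With boto3 installed, the parameters could be passed as `boto3.client("ce").get_cost_and_usage(**monthly_by_service_request(date.today()))`. Tags must be activated as cost allocation tags before they appear in results.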
Future Cost Optimizations (Phase 2)¶
| Optimization | Potential Savings | Effort | Priority |
|---|---|---|---|
| Graviton instances for ECS | 20% on compute | Medium | High |
| Spot instances for MLflow | $5/month | Low | Medium |
| S3 Intelligent Tiering | $0.50/month | Low | Low |
| Compute Savings Plan (Fargate) | 30% on compute | Low | Medium |
| Right-sizing after 1 month | 10-20% | Medium | High |
Summary¶
Base platform: $122-128/month
After optimizations (NAT instance, VPC endpoint removal, single-AZ RDS, Fargate Spot, S3 lifecycle, reserved capacity), the base platform runs at $121.70/month (with reserved RDS) to $127.70/month (on-demand RDS) -- a 48-50% reduction from the $244/month corrected baseline. Variable cost is only $0.06 per backtest. Strategy Service is correctly classified as an always-on service (port 8003, desired_count: 1).
Per live strategy: ~$61/month
Each live trading strategy adds ~$61/month (ECS Fargate 24/7 at 1 vCPU/2GB + monitoring + exchange fee reserve). Costs scale linearly: 3 live strategies bring the total to $305-311/month.
| Metric | Value |
|---|---|
| Final Monthly Cost | $122-128 |
| Cost per Backtest | $0.06 |
| Fixed Costs | $125/month |
| Variable Costs | $0.06/backtest |
| Savings vs Baseline | 48-50% |
Next Steps¶
- Review 09-PULUMI-CODE.md for infrastructure deployment code
- Set up budget alerts in AWS Budgets
- Apply cost allocation tags to all resources
See Also¶
- Canonical Config -- all infrastructure configuration values
- Services -- ECS service and Lambda function details
- Deployment Pipeline -- how infrastructure is deployed
Changelog¶
| Version | Date | Changes |
|---|---|---|
| 9.2.3 | 2026-03-28 | Moved Strategy Service from On-Demand Tasks to Long-Running Services (port 8003, desired_count: 1 is always-on); distinguished from strategy-container (no port, desired_count: 0, on-demand backtest runner); corrected Lambda count 17->18; updated all totals ($107-114 -> $122-128) |
| 9.2.2 | 2026-03-28 | Cost reconciliation: fixed conflicting totals ($78-99 -> $107-114), Lambda memory (128->256MB per lambda_funcs.py), NAT Gateway ($64.80->$70.20 matching optimization detail), CW Logs formula (per-day -> per-GB-month), per-strategy cost ($46 -> $61 for 1vCPU/2GB), added TL;DR/cost cascade/changelog |
| 9.2 | 2025-12-09 | Initial cost analysis with 7 optimizations |
Dependencies¶
| If This Changes | Update This Doc |
|---|---|
| infra/shared/tradai_infra_shared/config.py SERVICES dict | Service compute costs (Section 1-2) |
| infra/foundation/modules/nat_instance.py NAT config | Networking costs (Section 3) |
| infra/persistent/modules/s3.py, infra/foundation/modules/rds.py | Storage & Database costs (Section 4) |
| infra/compute/modules/lambda_funcs.py LAMBDA_CONFIGS | Serverless costs (Section 5) |
| AWS pricing changes | All cost estimates |