
TradAI Final Architecture - Cost Analysis

Version: 9.2.3 | Date: 2026-03-28

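TL;DR: Base platform $122-128/month (reserved vs on-demand RDS), assuming VPC endpoints are removed; with the 9 interface endpoints still deployed it is ~$187-193/month. Per live strategy: +$61/month. Key optimizations: NAT Instance over Gateway, VPC endpoint removal (planned, not yet applied), Fargate Spot, single-AZ RDS, reserved RDS.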

Cost Breakdown (Read This First)

  • Detailed sum: $127.70/month (all optimizations applied, on-demand RDS)
  • With Reserved RDS (1-year, no upfront): $121.70/month (recommended)
  • If VPC endpoints stay deployed: $187-193/month (add ~$65.70 for 9 interface endpoints)

The optimized column assumes VPC endpoints are removed and traffic is routed through the NAT instance. The $6 reserved RDS savings requires a 1-year commitment (Optimization 7). Strategy Service (port 8003, desired_count: 1) is an always-on service and is counted under Long-Running Services.


Cost Summary

Before vs After Optimization

| Metric | Original (v5.1) | Corrected Baseline | Optimized (v8.0) |
|---|---|---|---|
| Documented Cost | $109/mo | $282/mo | $122-128/mo |
| Accuracy | Low (missing components) | High | High |
| Savings | - | Baseline | 55-57% |

Detailed Cost Breakdown

1. Compute - Long-Running Services

| Service | vCPU | Memory | Hours/Month | Rate | Monthly Cost |
|---|---|---|---|---|---|
| Backend API | 0.5 | 1 GB | 730 | $0.020/hr | $14.60 |
| Data Collection | 0.25 | 512 MB | 730 | $0.010/hr | $7.30 |
| MLflow | 0.5 | 1 GB | 730 | $0.020/hr | $14.60 |
| Strategy Service | 0.5 | 1 GB | 730 | $0.020/hr | $14.60 |
| Subtotal | | | | | $51.10 |

Note: Strategy Service (strategy-service, port 8003, desired_count: 1) is an always-on ECS service that handles strategy API requests. It is distinct from strategy-container (no port, desired_count: 0), which is the on-demand backtest runner.
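The Section 1 subtotal can be recomputed from the per-task hourly rates. This is a minimal sketch assuming 730 billable hours per month (24/7); the `SERVICES` mapping below is a hypothetical stand-in for the table, not the actual `SERVICES` dict in `config.py`.

```python
from decimal import Decimal

HOURS_PER_MONTH = Decimal("730")  # 24/7 operation, as assumed in the table

# Hypothetical stand-in for the table above: service name -> hourly task rate (USD)
SERVICES = {
    "backend-api": Decimal("0.020"),
    "data-collection": Decimal("0.010"),
    "mlflow": Decimal("0.020"),
    "strategy-service": Decimal("0.020"),
}


def monthly_cost(hourly_rate: Decimal, hours: Decimal = HOURS_PER_MONTH) -> Decimal:
    """Monthly cost of one always-on task, rounded to cents."""
    return (hourly_rate * hours).quantize(Decimal("0.01"))


def long_running_subtotal(services: dict[str, Decimal]) -> Decimal:
    """Sum of monthly costs for all always-on (desired_count >= 1) services."""
    return sum((monthly_cost(rate) for rate in services.values()), Decimal("0"))


if __name__ == "__main__":
    for name, rate in SERVICES.items():
        print(f"{name:18} ${monthly_cost(rate)}")
    print(f"{'subtotal':18} ${long_running_subtotal(SERVICES)}")
```

Using `Decimal` keeps cent-level totals exact, which matters when reconciling against billing exports.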

2. Compute - On-Demand Tasks

Assumptions: 50 backtests/month, 20 data syncs/month

| Task | vCPU | Memory | Avg Duration | Runs/Month | Monthly Cost |
|---|---|---|---|---|---|
| Strategy Container* | 1 | 2 GB | 30 min | 50 | $5.00 |
| Data Collection Task | 0.5 | 1 GB | 15 min | 20 | $1.00 |
| Subtotal (On-Demand) | | | | | $6.00 |
| With Fargate Spot (70% savings) | | | | | $1.50 |

*Strategy Container (strategy-container, desired_count: 0) uses Fargate Spot for cost optimization

3. Networking

| Component | Configuration | Baseline Cost | Optimized Cost |
|---|---|---|---|
| NAT Gateway (2 AZs) | Standard | $70.20 | - |
| NAT Instance (t4g.nano) | 1 instance | - | $6.13 |
| ALB | 1 LB, minimal traffic | $17.00 | $17.00 |
| VPC Endpoints (ECR, Secrets, etc.) | 9 interface endpoints | $65.70 | $0* |
| Subtotal | | $152.90 | $23.13 |

*Assumes VPC endpoints are removed and all AWS API calls go through the NAT instance; this is not yet applied (see Optimization 2).

4. Storage & Database

| Component | Configuration | Baseline Cost | Optimized Cost |
|---|---|---|---|
| RDS PostgreSQL | db.t4g.micro, Multi-AZ | $36.00 | $18.00* |
| S3 Storage | 100 GB | $2.50 | $1.00** |
| S3 Requests | 100K requests | $0.50 | $0.50 |
| ECR | 50 GB images | $5.00 | $5.00 |
| DynamoDB | On-demand, ~100 ops/day | $2.00 | $2.00 |
| Subtotal | | $46.00 | $26.50 |

*Single-AZ for dev environment
**With lifecycle policies (delete temp files after 7 days)

5. Serverless

| Component | Configuration | Monthly Cost |
|---|---|---|
| AWS API Gateway | HTTP API, ~50K requests | $3.50 |
| Lambda (18 functions: 7 required + 10 optional + 1 NAT utility) | ~10K invocations | $0.00 (free tier) |
| Step Functions | ~500 transitions | $0.50 |
| SQS (FIFO) | ~1K messages | $0.40 |
| Subtotal | | $4.40 |

6. Observability

| Component | Configuration | Baseline Cost | Optimized Cost |
|---|---|---|---|
| CloudWatch Logs | 10 GB ingested/month | $5.30 | $5.07* |
| CloudWatch Metrics | Custom metrics | $3.00 | $3.00 |
| CloudWatch Alarms | 10 alarms | $1.00 | $1.00 |
| CloudWatch Dashboard | 1 dashboard | $3.00 | $3.00 |
| Subtotal | | $12.30 | $12.07 |

*7-day retention instead of 30-day (storage savings minimal -- ingestion is the dominant cost)

7. Security

| Component | Configuration | Monthly Cost |
|---|---|---|
| Secrets Manager | 5 secrets | $2.00 |
| CloudTrail | 1 trail | $2.00 |
| Cognito | < 50K MAU | $0.00 (free tier) |
| WAF | Web ACL + managed rules | $5.00 |
| Subtotal | | $9.00 |

8. Live Trading (v9.1)

Note: Live trading costs are per-strategy and added on top of the base platform.

| Component | Specification | Per Strategy/Month |
|---|---|---|
| ECS Fargate (24/7) | 1 vCPU, 2 GB | $30.66 |
| CloudWatch Logs | ~1 GB/month | $0.50 |
| DynamoDB | ~1M reads (heartbeats) | $0.25 |
| Secrets Manager | 2 secrets (exchange keys) | $0.80 |
| EventBridge | 8,640 invocations/month | $0.01 |
| Lambda (heartbeat-check) | 8,640 × 256 MB × 1 s = 2,160 GB-s | $0.04 |
| SNS | ~100 alerts | $0.01 |
| Infrastructure Subtotal | | $32.27 |
| Reserve (exchange fees, data feeds) | | +$29.00 |
| Total per Strategy | | ~$61/month |
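The Lambda line item can be sanity-checked with a GB-second calculation. This sketch assumes x86 list prices of $0.0000166667 per GB-second and $0.20 per million requests (us-east-1, ignoring the free tier); verify current pricing for your region and architecture. 8,640 invocations corresponds to one heartbeat every 5 minutes over a 30-day month.

```python
GB_SECOND_PRICE = 0.0000166667    # USD per GB-second, x86 (assumed us-east-1 list price)
REQUEST_PRICE = 0.20 / 1_000_000  # USD per request (assumed list price)


def lambda_monthly_cost(invocations: int, memory_mb: int, duration_s: float) -> float:
    """Compute + request charges for one Lambda function, ignoring the free tier."""
    gb_seconds = invocations * (memory_mb / 1024) * duration_s
    return gb_seconds * GB_SECOND_PRICE + invocations * REQUEST_PRICE


# One heartbeat check every 5 minutes for 30 days: 12 * 24 * 30 = 8,640 invocations
heartbeat_invocations = 12 * 24 * 30
heartbeat_cost = lambda_monthly_cost(heartbeat_invocations, memory_mb=256, duration_s=1.0)
print(f"heartbeat-check: {heartbeat_invocations} invocations, ${heartbeat_cost:.2f}/month")
```

At 256 MB and 1 s the function consumes 2,160 GB-s per month, well inside the Lambda free tier, so the effective cost is likely zero.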

Example: 3 Live Strategies

| Item | Monthly Cost |
|---|---|
| Base Platform (Phases 1-5) | $122-128 |
| Pascal Strategy (live) | $61 |
| Momentum Strategy (dry-run) | $61 |
| ML Trend Strategy (live) | $61 |
| Platform Total | $305-311 |

Total Cost Comparison

Monthly Cost Summary (Base Platform)

| Category | Baseline | Optimized | Savings |
|---|---|---|---|
| Long-Running Services | $51.10 | $51.10 | $0 |
| On-Demand Tasks | $6.00 | $1.50 | $4.50 |
| Networking | $152.90 | $23.13 | $129.77 |
| Storage & Database | $46.00 | $26.50 | $19.50 |
| Serverless | $4.40 | $4.40 | $0 |
| Observability | $12.30 | $12.07 | $0.23 |
| Security | $9.00 | $9.00 | $0 |
| BASE PLATFORM | $281.70 | $127.70 | $154.00 |
| With Reserved RDS (1yr, -$6) | - | $121.70 | $160.00 |

Note: The optimized column assumes VPC endpoint removal (~$66 already deducted in Networking). If the VPC endpoints remain deployed, add ~$66 to the optimized total. Strategy Service is an always-on ECS service (port 8003, desired_count: 1) and is listed under Long-Running Services.
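The two adjustments in the note (reserved RDS and whether the VPC endpoints are still deployed) can be applied to the optimized category totals with a small helper. This is a sketch using the figures from the table above; the function name is hypothetical.

```python
from decimal import Decimal as D

# Optimized monthly cost per category, from the summary table (USD)
OPTIMIZED = {
    "Long-Running Services": D("51.10"),
    "On-Demand Tasks": D("1.50"),
    "Networking": D("23.13"),
    "Storage & Database": D("26.50"),
    "Serverless": D("4.40"),
    "Observability": D("12.07"),
    "Security": D("9.00"),
}

RESERVED_RDS_DISCOUNT = D("6.00")   # Optimization 7, 1-year no-upfront
VPC_ENDPOINTS_MONTHLY = D("65.70")  # 9 interface endpoints, as used in this doc


def base_platform_total(reserved_rds: bool = False, endpoints_deployed: bool = False) -> D:
    """Optimized base-platform total with optional RDS and VPC endpoint adjustments."""
    total = sum(OPTIMIZED.values(), D("0"))
    if reserved_rds:
        total -= RESERVED_RDS_DISCOUNT
    if endpoints_deployed:
        total += VPC_ENDPOINTS_MONTHLY
    return total


for reserved in (False, True):
    for endpoints in (False, True):
        print(f"reserved_rds={reserved!s:5} endpoints={endpoints!s:5} "
              f"${base_platform_total(reserved, endpoints)}")
```

The four combinations span $121.70 (reserved RDS, endpoints removed) to $193.40 (on-demand RDS, endpoints deployed).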

Monthly Cost Summary (With Live Trading - v9.1)

| Component | Cost |
|---|---|
| Base Platform (optimized) | $122-128 |
| Live Strategy #1 | $61 |
| Live Strategy #2 | $61 |
| Live Strategy #3 | $61 |
| TOTAL (3 strategies) | $305-311 |

Live trading costs scale linearly with number of strategies.
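Because each strategy adds a fixed ~$61/month, the platform total is a simple linear function of the strategy count. A minimal sketch using the ranges from this section (the helper name is hypothetical):

```python
PER_STRATEGY = 61        # ~USD/month per live or dry-run strategy (Section 8)
BASE_RANGE = (122, 128)  # base platform, reserved vs on-demand RDS


def platform_total(n_strategies: int, base_range=BASE_RANGE, per_strategy=PER_STRATEGY):
    """Return the (low, high) monthly total for n strategies; cost scales linearly."""
    if n_strategies < 0:
        raise ValueError("n_strategies must be non-negative")
    extra = n_strategies * per_strategy
    return base_range[0] + extra, base_range[1] + extra


print(platform_total(3))
```

Dry-run strategies are counted at the full per-strategy rate, matching the 3-strategy example above.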

Cost by Category (Optimized)

pie title Monthly Cost Breakdown ($127.70)
    "Long-Running Services" : 51.10
    "Storage & Database" : 26.50
    "Networking" : 23.13
    "Observability" : 12.07
    "Security" : 9.00
    "Serverless" : 4.40
    "On-Demand Tasks" : 1.50

Optimization Details

Optimization 1: NAT Instance vs NAT Gateway

NAT Gateway (2 AZs):
├─ Hourly charge: $0.045/hr × 730 hrs × 2 = $65.70
├─ Data processing: $0.045/GB × ~100 GB = $4.50
└─ Total: ~$70/month

NAT Instance (t4g.nano):
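├─ Instance: $0.0042/hr × 730 hrs = $3.07
├─ Public IPv4 (EIP): $0.005/hr × 730 hrs = $3.65 (AWS charges for public IPv4 addresses even when attached, since Feb 2024)
├─ Data processing: no per-GB fee (standard data transfer charges still apply)
└─ Total: budgeted at $6.13; instance + public IPv4 alone come to ~$6.72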

SAVINGS: ~$64/month (91% reduction)

Trade-offs:
- Lower bandwidth (t4g.nano bursts up to 5 Gbps with a much lower sustained baseline, vs up to 100 Gbps for NAT Gateway) - acceptable for our workload
- Self-managed (patching, monitoring) - minimal effort with ASG
- Single point of failure - mitigated with ASG health checks
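The comparison above reduces to two small formulas. This sketch uses the rates quoted in this section as defaults; `public_ipv4_hourly=0.005` reflects AWS's per-address public IPv4 charge, and the default of 0 reproduces the instance-only line. Function names are illustrative.

```python
HOURS = 730  # hours per month


def nat_gateway_monthly(azs: int, gb_processed: float,
                        hourly: float = 0.045, per_gb: float = 0.045) -> float:
    """NAT Gateway: hourly charge per gateway plus per-GB data processing."""
    return azs * hourly * HOURS + gb_processed * per_gb


def nat_instance_monthly(hourly: float = 0.0042, public_ipv4_hourly: float = 0.0) -> float:
    """NAT instance: EC2 hourly charge (no per-GB processing fee) plus optional public IPv4."""
    return (hourly + public_ipv4_hourly) * HOURS


gateway = nat_gateway_monthly(azs=2, gb_processed=100)
instance = nat_instance_monthly(public_ipv4_hourly=0.005)
print(f"gateway ${gateway:.2f}, instance ${instance:.2f}, savings ${gateway - instance:.2f}")
```

Both options still pay standard data transfer out to the internet, so only the processing fee differs with traffic volume.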

Optimization 2: VPC Endpoint Removal (NOT Applied)

VPC Interface Endpoints (currently deployed -- see 03-VPC-NETWORKING.md):
├─ ECR API:          ~$7.30/mo/AZ × 2 AZs = $14.60
├─ ECR DKR:          ~$7.30/mo/AZ × 2 AZs = $14.60
├─ STS:              ~$7.30/mo/AZ × 2 AZs = $14.60
├─ Secrets Manager:  ~$7.30/mo/AZ × 2 AZs = $14.60
├─ CloudWatch Logs:  ~$7.30/mo/AZ × 2 AZs = $14.60
├─ SSM:              ~$7.30/mo/AZ × 2 AZs = $14.60
├─ SSM Messages:     ~$7.30/mo/AZ × 2 AZs = $14.60
├─ EC2 Messages:     ~$7.30/mo/AZ × 2 AZs = $14.60
├─ SQS:              ~$7.30/mo/AZ × 2 AZs = $14.60
└─ Total: 9 × $14.60 = ~$131.40/month across 2 AZs (the ~$65.70 figure used elsewhere in this doc corresponds to one AZ per endpoint)

NOTE: This optimization was NOT applied. The 9 interface endpoints
ARE deployed (per 03-VPC-NETWORKING.md Section 6.2). The cost tables
above assume endpoint removal ($0 in Networking optimized column),
but actual spend includes ~$65.70/month for these endpoints.

Potential savings if endpoints are removed:
├─ Route all traffic through NAT Instance instead
├─ Additional NAT traffic: ~10 GB/month = $0.45/month
└─ SAVINGS: ~$65/month

Trade-offs of removal:
- Slightly higher latency for AWS API calls (~10ms)
- Traffic goes through internet (still encrypted via TLS)
- Acceptable for non-high-frequency operations
- SSM Session Manager would require NAT for access
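Interface endpoint cost depends on how many AZs each endpoint spans, which is why the per-AZ count matters. This sketch assumes us-east-1 list prices of $0.01 per endpoint per AZ per hour and $0.01/GB processed; the helper name is hypothetical.

```python
HOURS = 730
ENDPOINT_AZ_HOURLY = 0.01  # USD per interface endpoint per AZ per hour (assumed list price)


def interface_endpoints_monthly(endpoints: int, azs: int, gb_processed: float = 0.0,
                                per_gb: float = 0.01) -> float:
    """Hourly charge for each endpoint in each AZ, plus per-GB data processing."""
    return endpoints * azs * ENDPOINT_AZ_HOURLY * HOURS + gb_processed * per_gb


for azs in (1, 2):
    print(f"9 endpoints x {azs} AZ(s): ${interface_endpoints_monthly(9, azs):.2f}/month")
```

Checking the deployed subnet count per endpoint against this formula tells you which total applies.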

Optimization 3: RDS Single-AZ

RDS Multi-AZ:
├─ Primary: $18/month
├─ Standby: $18/month
└─ Total: $36/month

RDS Single-AZ:
└─ Total: $18/month

SAVINGS: $18/month (50% reduction)

Trade-offs:
- No automatic failover
- 15-30 minute recovery on failure
- Acceptable for dev/staging environments

Production recommendation:
- Keep Multi-AZ for production
- Use Single-AZ for dev/staging

Optimization 4: Fargate Spot

Fargate On-Demand (Strategy Tasks):
├─ 50 runs × 0.5 hr × $0.04/hr = $1.00/month
└─ Total: ~$1.00/month

Fargate Spot:
├─ Up to 70% discount on Fargate pricing
├─ 50 runs × 0.5 hr × $0.012/hr = $0.30/month
└─ Total: ~$0.30/month

SAVINGS: ~$0.70/month (70% reduction)

Trade-offs:
- Tasks may be interrupted (2-minute warning)
- Mitigation: Checkpoint progress to S3
- Acceptable for batch processing (backtests)
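The Spot calculation is task-hours times the hourly rate, then the discount. A minimal sketch with the inputs from this optimization; the 70% discount is AWS's advertised maximum, and the real Spot discount varies.

```python
def batch_cost(runs: int, minutes_per_run: float, hourly_rate: float) -> float:
    """Task-hours multiplied by the per-task hourly rate."""
    return runs * (minutes_per_run / 60) * hourly_rate


def spot_cost(on_demand: float, discount: float = 0.70) -> float:
    """Apply a Fargate Spot discount (assumed 70%, the advertised maximum)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return on_demand * (1 - discount)


od = batch_cost(runs=50, minutes_per_run=30, hourly_rate=0.04)
spot = spot_cost(od)
print(f"on-demand ${od:.2f}, spot ${spot:.2f}, savings ${od - spot:.2f}")
```

At 25 task-hours per month the absolute savings are small; the discount matters more as backtest volume grows.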

Optimization 5: CloudWatch Logs Retention

CloudWatch Logs pricing:
├─ Ingestion: $0.50/GB (one-time, on ingest)
└─ Storage: $0.03/GB-month (a monthly rate, not per-day)

30-day retention:
├─ 10 GB × $0.50/GB ingestion = $5.00
├─ 10 GB × $0.03/GB-month storage = $0.30
└─ Total: ~$5.30/month

7-day retention:
├─ 10 GB × $0.50/GB ingestion = $5.00
├─ ~2.3 GB avg stored × $0.03/GB-month = $0.07
└─ Total: ~$5.07/month

SAVINGS: $0.23/month (negligible; ingestion dominates CW Logs cost)

Note: Reducing retention saves very little money. To meaningfully cut
CW Logs costs, reduce log VOLUME (structured logging, sampling) or
export to S3 ($0.023/GB-month) for long-term storage.

Trade-offs:
- Shorter debugging window
- Mitigation: Export important logs to S3 for long-term storage
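The retention math follows from a steady-state assumption: with a constant ingest rate, the average stored volume is the ingest rate times the retention period. A sketch using the prices above; AWS bills stored logs after compression, so the storage term here may be an overestimate.

```python
INGEST_PER_GB = 0.50          # USD per GB ingested
STORAGE_PER_GB_MONTH = 0.03   # USD per GB-month stored


def cw_logs_monthly(ingest_gb_per_month: float, retention_days: int,
                    days_per_month: float = 30) -> float:
    """Steady-state monthly cost: ingestion plus the average volume held in storage."""
    stored_gb = ingest_gb_per_month * retention_days / days_per_month
    return ingest_gb_per_month * INGEST_PER_GB + stored_gb * STORAGE_PER_GB_MONTH


for days in (30, 7):
    print(f"{days}-day retention: ${cw_logs_monthly(10, days):.2f}/month")
```

The ingestion term does not depend on retention at all, which is why cutting log volume is the more effective lever. Retention itself can be set per log group, for example with boto3's `logs.put_retention_policy(logGroupName=..., retentionInDays=7)`.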

Optimization 6: S3 Lifecycle Policies

Without lifecycle:
├─ Temp files accumulate
├─ 100 GB × $0.023/GB = $2.30
└─ Growing over time

With lifecycle:
├─ Temp files deleted after 7 days
├─ Results archived to Glacier after 30 days
├─ ~40 GB average = $1.00
└─ Stable cost

SAVINGS: $1.30/month + prevents cost growth
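The two lifecycle rules above map onto an S3 `LifecycleConfiguration`. This is a sketch that only builds the configuration; the `tmp/` and `results/` prefixes are hypothetical and should be replaced with the bucket's actual layout.

```python
def lifecycle_rules(temp_prefix: str = "tmp/", results_prefix: str = "results/") -> dict:
    """Build an S3 LifecycleConfiguration for Optimization 6 (prefixes are hypothetical)."""
    return {
        "Rules": [
            {
                "ID": "expire-temp-after-7-days",
                "Filter": {"Prefix": temp_prefix},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            },
            {
                "ID": "archive-results-after-30-days",
                "Filter": {"Prefix": results_prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            },
        ]
    }


print(lifecycle_rules())
```

The dict can be passed as `LifecycleConfiguration=` to boto3's `put_bucket_lifecycle_configuration`. Glacier Flexible Retrieval has a 90-day minimum storage charge and per-object transition fees, so archiving many small result files may cost more than it saves.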

Optimization 7: Reserved Capacity (Optional)

RDS On-Demand:
└─ db.t4g.micro: $18/month

RDS Reserved (1-year, no upfront):
└─ db.t4g.micro: $12/month

SAVINGS: $6/month (33% reduction)

Requirements:
- 1-year commitment
- Wait 1 month to verify stable usage

Cost Scaling Analysis

Cost at Different Usage Levels

| Backtests/Month | 10 | 50 | 100 | 200 |
|---|---|---|---|---|
| On-Demand Tasks | $0.30 | $1.50 | $3.00 | $6.00 |
| Step Functions | $0.10 | $0.50 | $1.00 | $2.00 |
| S3 (results) | $0.50 | $1.00 | $2.00 | $4.00 |
| Variable Cost | $0.90 | $3.00 | $6.00 | $12.00 |
| Fixed Cost | $124.70 | $124.70 | $124.70 | $124.70 |
| Total | $125.60 | $127.70 | $130.70 | $136.70 |
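From 50 backtests upward the table follows a linear model: $124.70 fixed plus $0.06 per backtest ($0.03 on-demand tasks + $0.01 Step Functions + $0.02 S3). A sketch of that model; at 10 backtests it gives $125.30, slightly below the table's $125.60 because the table's S3 figure does not scale down linearly.

```python
FIXED = 124.70                # USD/month, from the table above
VARIABLE_PER_BACKTEST = 0.06  # on-demand tasks + Step Functions + S3 results


def monthly_total(backtests: int) -> float:
    """Linear cost model: fixed platform cost plus per-backtest variable cost."""
    if backtests < 0:
        raise ValueError("backtests must be non-negative")
    return FIXED + VARIABLE_PER_BACKTEST * backtests


for n in (50, 100, 200):
    print(f"{n:4} backtests: ${monthly_total(n):.2f}")
```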

Break-Even Analysis

Current architecture supports 50-200 backtests/month at ~$128-137/month

At 500+ backtests/month:
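- Reassess Step Functions spend (Standard workflows bill per state transition; there is no provisioned-capacity option)
- Consider a Compute Savings Plan for Fargate (Fargate has no reserved-instance option)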
- Estimated cost: ~$165/month

At 1000+ backtests/month:
- Consider dedicated EC2 instances for backtesting
- Consider Kubernetes (EKS) for orchestration
- Estimated cost: ~$315/month

Cost Monitoring

Budget Alerts

Budget Configuration:
├─ Monthly Budget: $135
├─ Alert 1: 50% ($68) - Email notification
├─ Alert 2: 80% ($108) - Email + Slack notification
├─ Alert 3: 100% ($135) - Email + Slack + PagerDuty
└─ Forecast Alert: 110% projected - Email notification
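The budget above can be expressed as the keyword arguments for boto3's `budgets.create_budget`. This sketch only builds the request and wires email subscribers; the budget name and addresses are hypothetical, and Slack/PagerDuty would typically subscribe through an SNS topic (subscriber type `SNS`), which is an assumption about your setup.

```python
def budget_request(account_id: str, limit_usd: float, emails: list[str]) -> dict:
    """Keyword arguments for budgets.create_budget (not called here)."""
    def notification(kind: str, threshold: float) -> dict:
        return {
            "Notification": {
                "NotificationType": kind,           # "ACTUAL" or "FORECASTED"
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": threshold,             # percent of BudgetLimit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": e} for e in emails],
        }

    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "tradai-monthly",  # hypothetical name
            "BudgetLimit": {"Amount": f"{limit_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            notification("ACTUAL", 50.0),
            notification("ACTUAL", 80.0),
            notification("ACTUAL", 100.0),
            notification("FORECASTED", 110.0),
        ],
    }


req = budget_request("123456789012", 135, ["ops@example.com"])
```

Usage: `boto3.client("budgets").create_budget(**req)`.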

Cost Allocation Tags

Required Tags:
├─ Application: tradai
├─ Environment: production | staging | dev
├─ Service: backend-api | data-collection | mlflow | strategy
├─ Owner: team-name
└─ CostCenter: trading-platform
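A tag check like the following can catch untagged or mis-tagged resources before they skew cost reports. It is a hypothetical helper built from the required tags above; `Owner` is only checked for presence because its value is team-specific.

```python
REQUIRED_TAGS = {"Application", "Environment", "Service", "Owner", "CostCenter"}
ALLOWED_VALUES = {
    "Application": {"tradai"},
    "Environment": {"production", "staging", "dev"},
    "Service": {"backend-api", "data-collection", "mlflow", "strategy"},
    "CostCenter": {"trading-platform"},
}


def tag_violations(tags: dict[str, str]) -> list[str]:
    """Return problems with a resource's tags; an empty list means compliant."""
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - set(tags))]
    for key, allowed in ALLOWED_VALUES.items():
        value = tags.get(key)
        if value is not None and value not in allowed:
            problems.append(f"invalid {key}={value!r}")
    return problems


print(tag_violations({"Application": "tradai", "Environment": "prod"}))
```

User-defined tags also have to be activated as cost allocation tags in the Billing console before they appear in Cost Explorer.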

Cost Explorer Queries

Monthly by Service:
- Filter: Application = tradai
- Group by: Service tag
- Time: Last 3 months

Daily Trend:
- Filter: Application = tradai
- Granularity: Daily
- Time: Last 30 days

Anomaly Detection:
- Enable AWS Cost Anomaly Detection
- Threshold: 20% above normal
- Alert: Email + Slack
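The first two queries map onto Cost Explorer's `get_cost_and_usage` request. This sketch builds the requests only; interpreting "last 3 months" as the three most recent complete months is an assumption, and Cost Explorer treats `End` as exclusive.

```python
from datetime import date, timedelta

TRADAI_FILTER = {"Tags": {"Key": "Application", "Values": ["tradai"]}}


def monthly_by_service(today: date) -> dict:
    """Last 3 complete months, grouped by the Service tag."""
    end = today.replace(day=1)
    start = end
    for _ in range(3):
        start = (start - timedelta(days=1)).replace(day=1)  # step back one month
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": TRADAI_FILTER,
        "GroupBy": [{"Type": "TAG", "Key": "Service"}],
    }


def daily_trend(today: date) -> dict:
    """Last 30 days at daily granularity."""
    return {
        "TimePeriod": {"Start": (today - timedelta(days=30)).isoformat(), "End": today.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "Filter": TRADAI_FILTER,
    }


print(monthly_by_service(date(2026, 3, 28))["TimePeriod"])
```

Usage: `boto3.client("ce").get_cost_and_usage(**monthly_by_service(date.today()))`.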

Future Cost Optimizations (Phase 2)

| Optimization | Potential Savings | Effort | Priority |
|---|---|---|---|
| Graviton instances for ECS | 20% on compute | Medium | High |
| Spot instances for MLflow | $5/month | Low | Medium |
| S3 Intelligent Tiering | $0.50/month | Low | Low |
| Compute Savings Plan (Fargate) | 30% on compute | Low | Medium |
| Right-sizing after 1 month | 10-20% | Medium | High |

Summary

Base platform: $122-128/month

After optimizations (NAT instance, VPC endpoint removal (planned, not yet applied), single-AZ RDS, Fargate Spot, S3 lifecycle, reserved RDS), the base platform runs at $121.70/month (reserved RDS) to $127.70/month (on-demand RDS), a 55-57% reduction from the $281.70/month corrected baseline. While the 9 VPC interface endpoints remain deployed, add ~$65.70/month. Variable cost is only $0.06 per backtest.

Per live strategy: ~$61/month

Each live trading strategy adds ~$61/month (ECS Fargate 24/7 at 1 vCPU/2GB + monitoring + exchange fee reserve). Costs scale linearly: 3 live strategies bring the total to $305-311/month.

| Metric | Value |
|---|---|
| Final Monthly Cost | $122-128 |
| Cost per Backtest | $0.06 |
| Fixed Costs | $125/month |
| Variable Costs | $0.06/backtest |
| Savings vs Baseline | 55-57% |

Next Steps

  1. Review 09-PULUMI-CODE.md for infrastructure deployment code
  2. Set up budget alerts in AWS Budgets
  3. Apply cost allocation tags to all resources and activate them in the Billing console


Changelog

| Version | Date | Changes |
|---|---|---|
| 9.2.3 | 2026-03-28 | Moved Strategy Service from On-Demand Tasks to Long-Running Services (port 8003, desired_count: 1 is always-on); distinguished from strategy-container (no port, desired_count: 0, on-demand backtest runner); corrected Lambda count 17->18; updated all totals ($107-114 -> $122-128) |
| 9.2.2 | 2026-03-28 | Cost reconciliation: fixed conflicting totals ($78-99 -> $107-114), Lambda memory (128->256MB per lambda_funcs.py), NAT Gateway ($64.80->$70.20 matching optimization detail), CW Logs formula (per-day -> per-GB-month), per-strategy cost ($46 -> $61 for 1vCPU/2GB), added TL;DR/cost cascade/changelog |
| 9.2 | 2025-12-09 | Initial cost analysis with 7 optimizations |

Dependencies

| If This Changes | Update This Doc |
|---|---|
| infra/shared/tradai_infra_shared/config.py SERVICES dict | Service compute costs (Sections 1-2) |
| infra/foundation/modules/nat_instance.py NAT config | Networking costs (Section 3) |
| infra/persistent/modules/s3.py, infra/foundation/modules/rds.py | Storage & Database costs (Section 4) |
| infra/compute/modules/lambda_funcs.py LAMBDA_CONFIGS | Serverless costs (Section 5) |
| AWS pricing changes | All cost estimates |