Production Deployment Checklist¶

Comprehensive checklist for deploying TradAI to production.

Pre-Deployment¶

Code Quality¶

All tests pass: just test
Type checking passes: just typecheck
Linting passes: just lint
New code has 80%+ test coverage (the target; the CI gate fails only below 60%)
No # type: ignore comments without justification
No hardcoded secrets or credentials
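The last two Code Quality items can be partly automated before review. The sketch below uses hypothetical helper names (scan_for_aws_keys, find_unjustified_type_ignores), assumes GNU grep, and assumes a project convention where a justified "# type: ignore" carries a trailing explanatory comment. The key scan only recognises the AWS access key ID shape ("AKIA" plus 16 uppercase letters or digits), so it complements rather than replaces a dedicated secret scanner.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deployment scan helpers (sketch; assumes GNU grep).

# scan_for_aws_keys DIR
# Fails if any text file under DIR contains something shaped like an AWS access key ID.
scan_for_aws_keys() {
  local dir="$1"
  grep -rEnI --exclude-dir=.git 'AKIA[0-9A-Z]{16}' "$dir"
  case $? in
    0) echo "ERROR: possible hardcoded AWS access key (see matches above)" >&2; return 1 ;;
    1) return 0 ;;                                    # grep found nothing
    *) echo "ERROR: scan of $dir failed" >&2; return 2 ;;
  esac
}

# find_unjustified_type_ignores DIR
# Fails on Python lines ending in a bare "# type: ignore" or "# type: ignore[code]".
# Assumed convention: a justified ignore has a trailing comment, e.g.
#   value = lib.call()  # type: ignore[attr-defined]  # upstream stubs lack this
find_unjustified_type_ignores() {
  local dir="$1"
  grep -rEn --include='*.py' '#[[:space:]]*type:[[:space:]]*ignore(\[[^]]*\])?[[:space:]]*$' "$dir"
  case $? in
    0) echo "ERROR: '# type: ignore' without a justification comment" >&2; return 1 ;;
    1) return 0 ;;
    *) echo "ERROR: scan of $dir failed" >&2; return 2 ;;
  esac
}
```

Both functions distinguish "found something" (grep exit 1 inverted to failure) from "grep itself failed" (exit 2), so a mistyped directory does not silently pass.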

Strategy Validation¶

Preflight validation passes
Backtests run successfully on a sample of production data
Sanity checks pass (drawdown limits, trade frequency)
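A drawdown-limit sanity check reduces to tracking the running equity peak and the largest fractional decline from it. The sketch below assumes, hypothetically, that a backtest exports its equity curve as one positive value per line; the function names and the example limit are illustrative, not TradAI's actual tooling.

```bash
# Hypothetical helpers for the drawdown sanity check (sketch).

# max_drawdown FILE: print the largest peak-to-trough decline as a fraction of the peak.
max_drawdown() {
  awk '
    NR == 1 { peak = $1; mdd = 0 }
    {
      if ($1 > peak) peak = $1     # track the running equity peak
      dd = (peak - $1) / peak      # drawdown from that peak
      if (dd > mdd) mdd = dd
    }
    END { printf "%.4f\n", mdd }
  ' "$1"
}

# check_drawdown_limit FILE LIMIT: fail if the max drawdown exceeds LIMIT (0.20 means 20%).
check_drawdown_limit() {
  local file="$1" limit="$2" mdd
  mdd=$(max_drawdown "$file") || return 1
  if awk -v m="$mdd" -v l="$limit" 'BEGIN { exit !(m + 0 > l + 0) }'; then
    echo "FAIL: max drawdown $mdd exceeds limit $limit" >&2
    return 1
  fi
  echo "OK: max drawdown $mdd within limit $limit"
}
```

For an equity curve of 100, 120, 90, 130 the peak before the trough is 120, so max_drawdown prints 0.2500 and a 0.20 limit fails.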
Model drift monitoring configured

Documentation¶

README updated for new features
API documentation updated
Environment variables documented
Runbook updated for new failure modes

Infrastructure¶

AWS Resources¶

VPC and subnets configured
Security groups reviewed (least privilege)
IAM roles created with minimal permissions
S3 buckets created with encryption enabled
DynamoDB tables provisioned
Secrets Manager secrets created

ECS Configuration¶

Task definitions reviewed
Container resource limits set (CPU, memory)
Health checks configured
Auto-scaling policies defined
Service discovery registered

Database¶

RDS/Aurora instance sized appropriately
Backup retention configured
Multi-AZ enabled for production
Connection pooling configured
Performance Insights enabled

Security¶

Authentication¶

Cognito User Pool configured (if applicable)
API Gateway authentication enabled
JWT validation configured
Token expiration settings reviewed

Secrets Management¶

All secrets in AWS Secrets Manager
Rotation policies configured
Secrets injected from AWS Secrets Manager into container environment variables at runtime
No secrets in code, config files, or .env files
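Rotation can be spot-checked per secret with the AWS CLI: describe-secret reports a RotationEnabled field, which is absent (printed as None in text output) when rotation was never configured. This is a sketch assuming AWS CLI credentials with secretsmanager:DescribeSecret; the function name is hypothetical.

```bash
# check_rotation_enabled SECRET_ID: succeed only if Secrets Manager reports rotation as enabled.
check_rotation_enabled() {
  local enabled
  enabled=$(aws secretsmanager describe-secret \
    --secret-id "$1" \
    --query 'RotationEnabled' \
    --output text) || return 1
  # Text output prints booleans as True/False and a missing field as None.
  if [ "$enabled" != "True" ]; then
    echo "FAIL: rotation not enabled for $1 (RotationEnabled=$enabled)" >&2
    return 1
  fi
  echo "OK: rotation enabled for $1"
}

# Usage, with the exchange secret name from the production environment variables:
#   check_rotation_enabled tradai/exchange-credentials
```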

Network Security¶

Services in private subnets
ALB in public subnets across at least two Availability Zones (if needed)
Security groups restrict ingress
VPC flow logs enabled
WAF rules configured (if applicable)

Monitoring¶

CloudWatch¶

Alerting¶

SNS topics configured
Alert notifications set up (email, Slack, PagerDuty)
On-call rotation defined
Escalation policy documented
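One way to confirm the notification path end to end is to publish a clearly labelled test message to the alert topic and check that every subscriber receives it. The sketch below uses the real aws sns publish command; the function name and the topic ARN in the usage line are hypothetical, and how the message renders in Slack or PagerDuty depends on how those subscriptions are wired.

```bash
# send_test_alert TOPIC_ARN: publish a labelled test message to an SNS alert topic.
send_test_alert() {
  local topic_arn="$1" message_id
  message_id=$(aws sns publish \
    --topic-arn "$topic_arn" \
    --subject "[TEST] TradAI alert pipeline check" \
    --message "Test alert sent $(date -u +%Y-%m-%dT%H:%M:%SZ). No action required." \
    --query 'MessageId' --output text) || return 1
  echo "Published test alert, MessageId=$message_id"
}

# Usage (hypothetical ARN):
#   send_test_alert arn:aws:sns:eu-central-1:123456789012:tradai-alerts
```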

MLflow¶

Experiment tracking configured
Model registry set up
Artifact storage configured (S3)
Authentication enabled

Data¶

ArcticDB¶

S3 bucket configured
Library created
Data backfill completed
Data quality validation passed

Exchange Connectivity¶

API keys configured in Secrets Manager
Rate limits configured
Fallback exchange configured (if applicable)

Deployment¶

CI/CD Pipeline¶

GitHub Actions configured
Docker images built and pushed to ECR
Deployment workflow tested
Rollback procedure documented and tested

Pulumi Infrastructure¶

# Preview all stacks (shown for dev; substitute the target environment)
just infra-preview dev

# Deploy all stacks
just infra-bootstrap dev

# Verify outputs
just infra-outputs compute dev

Post-Deployment Verification¶

Rollback Plan¶

Automated Rollback¶

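Configure ECS to roll back failed deployments automatically by enabling the deployment circuit breaker:

# ECS service deployment configuration (set on the service, not the task definition)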
deployment_circuit_breaker:
  enable: true
  rollback: true
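After deploying, it is worth confirming that the running service actually has both flags set. ECS reports them under deploymentConfiguration.deploymentCircuitBreaker in describe-services output; the sketch below reads each flag separately. The function name is hypothetical; the cluster and service names come from the manual rollback example.

```bash
# verify_circuit_breaker CLUSTER SERVICE: confirm enable and rollback are both true.
verify_circuit_breaker() {
  local cluster="$1" service="$2" base enable rollback
  base='services[0].deploymentConfiguration.deploymentCircuitBreaker'
  enable=$(aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query "${base}.enable" --output text) || return 1
  rollback=$(aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query "${base}.rollback" --output text) || return 1
  if [ "$enable" = "True" ] && [ "$rollback" = "True" ]; then
    echo "OK: circuit breaker enabled with rollback on $service"
  else
    echo "FAIL: circuit breaker enable=$enable rollback=$rollback on $service" >&2
    return 1
  fi
}

# Usage:
#   verify_circuit_breaker tradai-prod tradai-strategy-service-prod
```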

Manual Rollback¶

# Revert to the previous task definition revision (replace PREVIOUS_VERSION with its revision number)
aws ecs update-service \
  --cluster tradai-prod \
  --service tradai-strategy-service-prod \
  --task-definition tradai-strategy-service-prod:PREVIOUS_VERSION

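# Or roll back through Pulumi. pulumi up deploys the program that is checked out,
# so check out the last known-good commit first, then snapshot the current state:
cd infra/compute && source ../.env
pulumi stack export --stack prod > backup.json
# Force replacement of the strategy service from the checked-out program
pulumi up --stack prod --target-replace urn:pulumi:prod::tradai::aws:ecs/service:Service::strategy-service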
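The task-definition rollback with aws ecs update-service can also be scripted so the previous revision is derived from what is currently running. This sketch uses hypothetical helper names and assumes the previous deployment used the immediately preceding revision in the family, which is not guaranteed if revisions were registered without being deployed (aws ecs list-task-definitions --family-prefix with --sort DESC lists them for confirmation).

```bash
# previous_task_definition ARN: ".../task-definition/FAMILY:REV" -> "FAMILY:(REV-1)".
previous_task_definition() {
  local family_rev="${1##*/}"          # strip everything up to the last "/": FAMILY:REV
  local family="${family_rev%:*}"      # FAMILY
  local rev="${family_rev##*:}"        # REV
  if ! [[ "$rev" =~ ^[0-9]+$ ]] || [ "$rev" -le 1 ]; then
    echo "ERROR: no earlier revision derivable from $1" >&2
    return 1
  fi
  echo "${family}:$((rev - 1))"
}

# rollback_service CLUSTER SERVICE: point the service at the previous revision and wait.
rollback_service() {
  local cluster="$1" service="$2" current previous
  current=$(aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query 'services[0].taskDefinition' --output text) || return 1
  previous=$(previous_task_definition "$current") || return 1
  aws ecs update-service --cluster "$cluster" --service "$service" \
    --task-definition "$previous" >/dev/null || return 1
  # Polls until the service is stable; the waiter gives up after about 10 minutes.
  aws ecs wait services-stable --cluster "$cluster" --services "$service"
}

# Usage:
#   rollback_service tradai-prod tradai-strategy-service-prod
```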

Post-Deployment¶

Smoke Tests¶
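A minimal smoke test polls each service's health endpoint until it answers or a retry budget runs out. This is a sketch: the function name is hypothetical, and it assumes each service exposes an HTTP health URL (the path in the usage line is illustrative, not a documented TradAI endpoint).

```bash
# smoke_check URL [ATTEMPTS] [DELAY_SECONDS]
# Succeeds as soon as URL answers with a non-error HTTP status (below 400).
smoke_check() {
  local url="$1" attempts="${2:-5}" delay="${3:-5}" i
  for ((i = 1; i <= attempts; i++)); do
    # -f: exit non-zero on HTTP >= 400; -sS: no progress bar, but still print errors;
    # --max-time: cap each attempt so a hung endpoint cannot stall the check.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "OK: $url (attempt $i/$attempts)"
      return 0
    fi
    if ((i < attempts)); then sleep "$delay"; fi
  done
  echo "FAIL: $url did not return a successful response after $attempts attempts" >&2
  return 1
}

# Usage (hypothetical URL):
#   smoke_check https://strategy.your-domain.com/health
```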

Monitoring Check¶

Logs appearing in CloudWatch
Metrics being recorded
No error spikes in dashboards
Performance within expected bounds

Documentation Update¶

Deployment notes recorded
Known issues documented
Changelog updated

Production Environment Variables¶

Ensure these are set in production:

# AWS
AWS_REGION=eu-central-1
# No static access keys: credentials come from the ECS task's IAM role

# MLflow
MLFLOW_TRACKING_URI=https://mlflow.your-domain.com
MLFLOW_TRACKING_USERNAME=<from-secrets-manager>
MLFLOW_TRACKING_PASSWORD=<from-secrets-manager>

# Services
BACKEND_EXECUTOR_MODE=stepfunctions
STRATEGY_SERVICE_S3_CONFIG_BUCKET=tradai-configs-prod
STRATEGY_SERVICE_EXCHANGE_SECRET_NAME=tradai/exchange-credentials

# Data
DATA_COLLECTION_ARCTIC_S3_BUCKET=tradai-arcticdb-prod
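A container entrypoint or deploy script can refuse to start when any of these variables is unset, empty, or still holds a literal placeholder such as <from-secrets-manager>. This is a sketch with hypothetical names (REQUIRED_PROD_VARS, require_env); it prints only variable names, never values, so secrets do not leak into logs.

```bash
# Hypothetical startup check for the production variables (sketch; bash).
REQUIRED_PROD_VARS=(
  AWS_REGION
  MLFLOW_TRACKING_URI MLFLOW_TRACKING_USERNAME MLFLOW_TRACKING_PASSWORD
  BACKEND_EXECUTOR_MODE STRATEGY_SERVICE_S3_CONFIG_BUCKET STRATEGY_SERVICE_EXCHANGE_SECRET_NAME
  DATA_COLLECTION_ARCTIC_S3_BUCKET
)

# require_env VAR...: fail if any variable is unset, empty, or still a "<...>" placeholder.
require_env() {
  local name value status=0
  for name in "$@"; do
    value="${!name-}"                # indirect lookup of the variable named by $name
    if [ -z "$value" ]; then
      echo "MISSING: $name" >&2
      status=1
    elif [[ "$value" == '<'*'>' ]]; then
      echo "PLACEHOLDER: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   require_env "${REQUIRED_PROD_VARS[@]}" || exit 1
```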
