Pulumi Module Reference
Complete reference for all 28 infrastructure modules in /infra/modules/.
Module Inventory
Phase 0: State Management
| Module | File | Task ID | Description |
|---|---|---|---|
| Pulumi Backend | pulumi_backend.py | - | S3 bucket for Pulumi state, IAM roles for CI/CD |
Phase 1: Foundation
| Module | File | Task ID | Description |
|---|---|---|---|
| VPC Network | vpc.py | IF002 | VPC, 6 subnets (2 AZs, 3 tiers), IGW, route tables |
| VPC Endpoints | vpc_endpoints.py | SEC002 | Gateway endpoints for S3 and DynamoDB |
| VPC Flow Logs | vpc_flow_logs.py | SEC004 | Flow logs to CloudWatch for audit |
| Network ACLs | nacl.py | SEC005 | Stateless firewall rules per subnet tier |
| S3 Buckets | s3.py | IS001 | 5 buckets: configs, results, arcticdb, logs, mlflow |
| CloudTrail | cloudtrail.py | SEC003 | Audit logging to S3 and CloudWatch |
| DynamoDB Tables | dynamodb.py | IS003 | 8 tables for workflow/health/trading state |
| SNS Topics | sns.py | MN001, SR016 | Alert notifications and registration events |
| Security Groups | security_groups.py | IF004 | 5 SGs: ALB, ECS, Lambda, RDS, NAT |
| NAT Instance | nat_instance.py | IF003 | t4g.nano NAT with ASG for HA |
| RDS Database | rds.py | IS002 | PostgreSQL (v15.4) for MLflow |
| Secret Rotation | secret_rotation.py | SEC006 | RDS secret rotation (30-day schedule) |
| ECR Repositories | ecr.py | IS004 | 12 repos: 4 services + 8 Lambda images |
| CodeArtifact | codeartifact.py | SR003 | Private Python package repository |
Phase 2: Compute
| Module | File | Task ID | Description |
|---|---|---|---|
| IAM Roles | iam.py | IC001 | ECS execution role + task role |
| ECS Cluster | ecs.py | IC001, BE007 | Fargate cluster, strategy task definition |
| ALB | alb.py | IC002 | Application Load Balancer, listeners, target groups |
| SQS Queues | sqs.py | IO001 | Backtest queue + DLQ |
| Cognito Auth | cognito.py | DK005 | User pool with MFA, M2M client |
| ECS Services | ecs_services.py | IC003 | 4 services: backend, strategy, data, mlflow |
| Lambda Functions | lambda_funcs.py | IC004 | 8 container-image Lambdas |
| API Gateway | api_gateway.py | IC005 | HTTP API with 11 routes, Cognito auth |
| WAF | waf.py | SEC001 | WebACL with rate limiting |
Phase 3: Orchestration
| Module | File | Task ID | Description |
|---|---|---|---|
| Step Functions | step_functions.py | IO002, BE008 | Backtest workflow state machine |
Phase 4: Monitoring
| Module | File | Task ID | Description |
|---|---|---|---|
| CloudWatch Alarms | cloudwatch_alarms.py | MN003, INF007 | Composite alarm, heartbeat detection, Lambda errors |
| CloudWatch Dashboard | cloudwatch_dashboard.py | OB001 | Trading platform metrics dashboard |
Dependency Graph
Each line maps a module to the modules drawn below it in the dependency chain (upstream -> downstream):

```text
pulumi_backend   -> vpc, s3, dynamodb
vpc              -> vpc_endpoints, nacl, vpc_flow_logs, security_groups
s3               -> cloudtrail
dynamodb         -> step_functions
security_groups  -> nat_instance, rds, alb, sns
rds              -> secret_rotation
sns              -> cloudwatch_alarms, step_functions
alb              -> iam, cognito, api_gateway
iam              -> ecs
cognito          -> api_gateway
ecs              -> ecs_services, lambda_funcs, api_gateway
api_gateway      -> waf
ecs_services     -> step_functions
lambda_funcs     -> step_functions
```
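The graph can also be turned into an explicit deployment order with a topological sort. The sketch below is illustrative only and uses Python's standard `graphlib`. Its `EDGES` map is a transcription of the dependency graph above, and some edges in that graph are hard to read, so treat them as an assumption rather than the authoritative wiring in `/infra/modules/`. Pulumi already orders resources from the Outputs they consume, so this sort is not needed to deploy. It only makes the module order visible, for example when reviewing changes or splitting stacks.

```python
from graphlib import TopologicalSorter

# Edges as read from the dependency graph: upstream module -> downstream modules.
# Transcribed by hand; treat as an assumption, not the source of truth.
EDGES = {
    "pulumi_backend": ["vpc", "s3", "dynamodb"],
    "vpc": ["vpc_endpoints", "nacl", "vpc_flow_logs", "security_groups"],
    "s3": ["cloudtrail"],
    "dynamodb": ["step_functions"],
    "security_groups": ["nat_instance", "rds", "alb", "sns"],
    "rds": ["secret_rotation"],
    "sns": ["cloudwatch_alarms", "step_functions"],
    "alb": ["iam", "cognito", "api_gateway"],
    "iam": ["ecs"],
    "cognito": ["api_gateway"],
    "ecs": ["ecs_services", "lambda_funcs", "api_gateway"],
    "api_gateway": ["waf"],
    "ecs_services": ["step_functions"],
    "lambda_funcs": ["step_functions"],
}


def deploy_order(edges):
    """Return module names so that every upstream module precedes its dependents."""
    # TopologicalSorter expects node -> set of predecessors, so invert the map.
    graph = {}
    for upstream, downstreams in edges.items():
        graph.setdefault(upstream, set())
        for downstream in downstreams:
            graph.setdefault(downstream, set()).add(upstream)
    return list(TopologicalSorter(graph).static_order())
```

`pulumi_backend` is the only node without predecessors, so it always comes first. Modules on independent branches (for example `cloudtrail` and `nacl`) may appear in any relative order.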
Module Details
vpc.py (IF002)
Creates:
- VPC with CIDR 10.0.0.0/16
- 6 subnets across 2 AZs (public, private, database tiers)
- Internet Gateway
- Route tables per tier
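One way to lay out six subnets (3 tiers x 2 AZs) inside 10.0.0.0/16 is sketched below with the standard `ipaddress` module. The actual prefix lengths and ordering used by `vpc.py` are not stated here. The /24 size and the AZ labels `az-1`/`az-2` are assumptions for illustration. AWS reserves five addresses in every subnet, so each /24 leaves 251 usable IPs.

```python
import ipaddress

TIERS = ("public", "private", "database")
AZS = ("az-1", "az-2")  # hypothetical AZ labels


def plan_subnets(vpc_cidr="10.0.0.0/16", tiers=TIERS, azs=AZS, new_prefix=24):
    """Assign one non-overlapping subnet per (tier, AZ) pair, in order."""
    # subnets() yields consecutive, non-overlapping blocks of the requested size.
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    return {(tier, az): str(next(blocks)) for tier in tiers for az in azs}
```

With the defaults, the public tier gets 10.0.0.0/24 and 10.0.1.0/24, and the database tier ends at 10.0.5.0/24. That leaves the rest of the /16 free for later growth.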
Outputs:

```python
vpc_id: str
public_subnet_ids: list[str]
private_subnet_ids: list[str]
database_subnet_ids: list[str]
private_route_table_id: str
database_route_table_id: str
```
Usage:
security_groups.py (IF004)
Creates 5 security groups:
| SG | Ingress | Egress | Purpose |
|---|---|---|---|
| ALB | 80, 443 from 0.0.0.0/0 | All | Load balancer |
| ECS | From ALB SG | All | Container traffic |
| Lambda | None | All | Function networking |
| RDS | 5432 from ECS SG | All | Database access |
| NAT | From private CIDR | All | Outbound internet |
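The table above forms a chain: internet to ALB, ALB to ECS, ECS to RDS. The sketch below models it as plain data so the chain can be checked hop by hop. Security groups are stateful, so return traffic for an allowed connection is permitted automatically, and the sketch therefore models only ingress for new connections. The source labels (`sg:alb`, `private-cidr`) are hypothetical. Treating rules without a listed port as "any port" is an assumption.

```python
# Ingress rules from the security-group table: target SG -> [(source, port)].
# A port of None means "any port"; the table lists no port for these rules,
# so treating them as all-ports is an assumption.
INGRESS = {
    "alb": [("0.0.0.0/0", 80), ("0.0.0.0/0", 443)],
    "ecs": [("sg:alb", None)],
    "lambda": [],  # no inbound rules
    "rds": [("sg:ecs", 5432)],
    "nat": [("private-cidr", None)],
}


def allows(target, source, port):
    """Return True if the `target` SG admits new inbound traffic from `source` on `port`."""
    for allowed_source, allowed_port in INGRESS.get(target, []):
        if allowed_source == source and allowed_port in (None, port):
            return True
    return False


def path_is_open(hops):
    """Check a chain of (source, target, port) hops; every hop must be allowed."""
    return all(allows(target, source, port) for source, target, port in hops)
```

For example, the internet can reach the ALB on 443 but not the ECS tasks directly. Because RDS only trusts the ECS security group, a Lambda function in the Lambda SG cannot open a connection on 5432.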
Outputs:
s3.py (IS001)
Creates 5 buckets:
| Bucket | Purpose | Lifecycle |
|---|---|---|
| configs | Strategy configurations | None |
| results | Backtest results | 90-day expiration |
| arcticdb | Time-series data | None |
| logs | Application logs | 30-day expiration |
| mlflow | MLflow artifacts | None |
Features:
- AES-256 encryption
- Versioning enabled
- Public access blocked
- Lifecycle policies
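S3 lifecycle expiration works like this: S3 adds the rule's day count to the object's creation time, then rounds up to midnight UTC of the following day. The sketch below computes that eligibility time for the buckets in the table. It is a local illustration, not a call into S3, and S3 may carry out the actual deletion some time after an object becomes eligible. Because these buckets are versioned, expiring the current version only adds a delete marker. Older versions are kept unless a separate noncurrent-version rule exists.

```python
from datetime import datetime, timedelta, timezone

# Expiration rules from the bucket table (None = no lifecycle expiration).
EXPIRATION_DAYS = {"configs": None, "results": 90, "arcticdb": None, "logs": 30, "mlflow": None}


def expiration_eligible_at(bucket, created):
    """Earliest UTC time an object in `bucket` becomes eligible for expiration."""
    days = EXPIRATION_DAYS[bucket]
    if days is None:
        return None
    due = created.astimezone(timezone.utc) + timedelta(days=days)
    # S3 rounds the computed time up to midnight UTC of the next day.
    return datetime(due.year, due.month, due.day, tzinfo=timezone.utc) + timedelta(days=1)
```

For example, a result written on 2024-01-15 at 10:30 UTC becomes eligible at 2024-04-15 00:00 UTC.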
dynamodb.py (IS003)
Creates 8 tables:
| Table | Primary Key | Purpose |
|---|---|---|
| workflow-state | job_id | Backtest job tracking |
| idempotency | idempotency_key | Request deduplication |
| health-state | service_id | Service health tracking |
| trading-state | strategy_id | Live trading state |
| deployments | deployment_id | Deployment tracking |
| drift-state | model_id | Model drift tracking |
| retraining-state | model_id | Retraining job state |
| rollback-state | model_id | Model rollback history |
Features:
- On-demand billing (pay per request)
- Point-in-time recovery enabled
- TTL configured where appropriate
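The `idempotency` table and TTL work together in a claim-or-reject pattern, sketched below with an in-memory store. Against DynamoDB itself, the same check would typically be a conditional `PutItem` whose `ConditionExpression` requires `attribute_not_exists(idempotency_key)` or an expired TTL value. The `expires_at` attribute name and the 24-hour window are hypothetical. DynamoDB TTL stores a Unix epoch in seconds and deletes expired items lazily, so readers should also treat items past their TTL as absent. The sketch does this explicitly.

```python
import time


class IdempotencyStore:
    """In-memory stand-in for the idempotency table (illustration only)."""

    def __init__(self, ttl_seconds=24 * 3600):  # hypothetical retention window
        self.ttl_seconds = ttl_seconds
        self._items = {}  # idempotency_key -> {"expires_at": epoch seconds}

    def claim(self, idempotency_key, now=None):
        """Return True the first time a key is seen, False for a live duplicate."""
        now = int(time.time()) if now is None else int(now)
        item = self._items.get(idempotency_key)
        # TTL deletion is lazy, so an expired item may still exist: ignore it.
        if item is not None and item["expires_at"] > now:
            return False
        self._items[idempotency_key] = {"expires_at": now + self.ttl_seconds}
        return True
```

A retried `POST /api/v1/backtests` carrying the same key is rejected while the first claim is still live. After the TTL passes, the key can be reused.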
ecs.py (IC001, BE007)
Creates:
- ECS cluster with FARGATE and FARGATE_SPOT capacity providers
- Generic strategy task definition
- CloudWatch log group

Strategy Task Definition:
- Image: overridden at runtime via ECSBacktestExecutor
- CPU: 512 units (0.5 vCPU, configurable)
- Memory: 1024 MiB (configurable)
- Runs on Fargate Spot for cost savings
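CPU and memory are configurable, but Fargate accepts only certain CPU/memory pairs. A task definition with any other pair is rejected. The sketch below encodes the pairs for the 256 to 4096 CPU-unit range (1024 units = 1 vCPU). Larger sizes exist on newer platform versions and are deliberately left out. The helper name is hypothetical. Also keep in mind that Fargate Spot can reclaim a task with a two-minute warning (SIGTERM), so long backtests should tolerate interruption.

```python
# Fargate CPU units -> allowed memory sizes in MiB (256-4096 CPU range only).
FARGATE_MEMORY = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),    # 1-4 GB
    1024: list(range(2048, 8192 + 1, 1024)),   # 2-8 GB
    2048: list(range(4096, 16384 + 1, 1024)),  # 4-16 GB
    4096: list(range(8192, 30720 + 1, 1024)),  # 8-30 GB
}


def is_valid_fargate_size(cpu, memory):
    """True if (cpu, memory) is an accepted Fargate task size in this range."""
    return memory in FARGATE_MEMORY.get(cpu, [])
```

The default 512/1024 pair is valid. Raising memory alone to 8192 MiB would also require raising CPU to at least 1024.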
lambda_funcs.py (IC004)
Creates 8 container-image Lambdas:
| Function | Schedule | Purpose |
|---|---|---|
| health-check | rate(5 minutes) | Service health monitoring |
| heartbeat-check | rate(1 minute) | Trading heartbeat detection |
| orphan-scanner | rate(15 minutes) | Orphaned ECS task cleanup |
| drift-monitor | rate(1 day) | Model drift detection |
| retraining-scheduler | rate(7 days) | Retraining triggers |
| sqs-consumer | SQS trigger | Backtest queue processing |
| validate-strategy | On-demand | Strategy validation |
| data-proxy | On-demand | Data collection proxy |
Features:
- VPC placement in private subnets
- Environment variables from config
- Container images from ECR
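The scheduled functions use EventBridge `rate()` expressions. In these expressions the unit must be singular when the value is 1 (`rate(1 minute)`) and plural otherwise (`rate(5 minutes)`). The sketch below parses such an expression, enforces that rule, and converts the schedule into invocations per day, which is useful when estimating cost. The function names are hypothetical, and only the minute/hour/day units are handled.

```python
import re

_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def rate_period_seconds(expression):
    """Parse an EventBridge rate() expression into its period in seconds."""
    match = _RATE.match(expression)
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    singular = unit.rstrip("s")
    # Singular unit only when value == 1, plural otherwise.
    if (value == 1) != (unit == singular):
        raise ValueError(f"singular/plural unit mismatch in {expression!r}")
    return value * _UNIT_SECONDS[singular]


def invocations_per_day(expression):
    """How many times per day a rate() schedule fires."""
    return 86400 / rate_period_seconds(expression)
```

By this count, `heartbeat-check` at `rate(1 minute)` runs 1440 times a day and `health-check` runs 288 times. `retraining-scheduler` fires about once a week.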
api_gateway.py (IC005)
Creates:
- HTTP API Gateway
- 11 routes with ALB integration
- Cognito JWT authorizer
- Optional custom domain
Routes:
| Method | Path | Auth | Target |
|---|---|---|---|
| GET | /health | No | Backend |
| POST | /api/v1/backtests | Yes | Backend |
| GET | /api/v1/backtests | Yes | Backend |
| GET | /api/v1/backtests/{id} | Yes | Backend |
| GET | /api/v1/strategies | Yes | Strategy Service |
| POST | /api/v1/strategies/* | Yes | Strategy Service |
| GET | /api/v1/data/* | Yes | Data Collection |
| POST | /api/v1/hyperopt | Yes | Strategy Service |
| GET | /api/v1/models/* | Yes | Strategy Service |
| POST | /api/v1/models/* | Yes | Strategy Service |
| GET | /api/v1/catalog/* | Yes | Backend |
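The route table above can be checked with a small resolver, sketched below, that maps a method and path to its target service and auth requirement. Two notations need interpreting. `{id}` is treated as exactly one path segment. `*` is treated as "one or more remaining segments", which is roughly what a greedy `{proxy+}` variable does in an HTTP API. API Gateway itself selects the most specific matching route. This sketch takes the first match instead, which gives the same result here because no two routes overlap.

```python
import re

# (method, path, requires_auth, target) rows from the routes table.
ROUTES = [
    ("GET", "/health", False, "Backend"),
    ("POST", "/api/v1/backtests", True, "Backend"),
    ("GET", "/api/v1/backtests", True, "Backend"),
    ("GET", "/api/v1/backtests/{id}", True, "Backend"),
    ("GET", "/api/v1/strategies", True, "Strategy Service"),
    ("POST", "/api/v1/strategies/*", True, "Strategy Service"),
    ("GET", "/api/v1/data/*", True, "Data Collection"),
    ("POST", "/api/v1/hyperopt", True, "Strategy Service"),
    ("GET", "/api/v1/models/*", True, "Strategy Service"),
    ("POST", "/api/v1/models/*", True, "Strategy Service"),
    ("GET", "/api/v1/catalog/*", True, "Backend"),
]


def _compile(path):
    """Turn a table path into an anchored regex."""
    parts = []
    for segment in path.strip("/").split("/"):
        if segment == "*":
            parts.append(".+")      # greedy: the rest of the path
        elif segment.startswith("{") and segment.endswith("}"):
            parts.append("[^/]+")   # exactly one path segment
        else:
            parts.append(re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "$")


_COMPILED = [(m, _compile(p), auth, target) for m, p, auth, target in ROUTES]


def resolve(method, path):
    """Return (target, requires_auth) for a request, or None if no route matches."""
    for route_method, pattern, auth, target in _COMPILED:
        if route_method == method and pattern.match(path):
            return target, auth
    return None
```

Only `/health` resolves without the Cognito authorizer. Every other route requires a JWT.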
step_functions.py (IO002, BE008)
Creates:
- Backtest workflow state machine
- IAM execution role

Workflow States:
1. ValidateStrategy - Lambda validation
2. DecideExecutionMode - Choice state
3. RunBacktest - ECS task
4. Notify - Success/failure handling

Type: STANDARD. Express workflows are limited to 5 minutes per execution, while Standard workflows can run for up to one year, so backtests lasting 2+ hours require Standard.
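The four workflow states could be expressed in Amazon States Language as sketched below. This is a hypothetical skeleton, not the definition in `step_functions.py`. The `execution_mode` input field, the `"ecs"` value, the `Default` branch, and all names/ARNs are placeholders, and the ECS network configuration is omitted. The service integration ARNs (`lambda:invoke`, `ecs:runTask.sync`) are the standard ones. The `Catch` shows one way failures can be routed to `Notify`. A small check flags transitions to states that do not exist, a common mistake when editing ASL by hand.

```python
import json

# Hypothetical ASL skeleton; field names, values and ARNs are placeholders.
DEFINITION = {
    "Comment": "Backtest workflow (illustrative skeleton)",
    "StartAt": "ValidateStrategy",
    "States": {
        "ValidateStrategy": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "validate-strategy", "Payload.$": "$"},
            "ResultPath": "$.validation",
            "Next": "DecideExecutionMode",
        },
        "DecideExecutionMode": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.execution_mode", "StringEquals": "ecs", "Next": "RunBacktest"}
            ],
            "Default": "Notify",
        },
        "RunBacktest": {
            "Type": "Task",
            # .sync waits for the ECS task to stop (network configuration omitted).
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {"Cluster": "CLUSTER_ARN", "TaskDefinition": "TASK_DEF_ARN"},
            "Catch": [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "Notify"}],
            "Next": "Notify",
        },
        "Notify": {"Type": "Pass", "End": True},
    },
}

# The state machine resource takes the definition as a JSON string.
DEFINITION_JSON = json.dumps(DEFINITION, indent=2)


def dangling_transitions(definition):
    """List (state, target) pairs whose target state does not exist."""
    states = definition["States"]
    missing = []
    if definition["StartAt"] not in states:
        missing.append(("StartAt", definition["StartAt"]))
    for name, state in states.items():
        targets = [state.get("Next"), state.get("Default")]
        targets += [rule["Next"] for rule in state.get("Choices", [])]
        targets += [catch["Next"] for catch in state.get("Catch", [])]
        missing += [(name, t) for t in targets if t is not None and t not in states]
    return missing
```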
cloudwatch_alarms.py (MN003, INF007)
Creates:
- Composite alarm for service health
- Stale heartbeat alarm
- Lambda error alarms (per function)
- EventBridge rules for Lambda schedules

Configurable Thresholds:

```bash
pulumi config set alarm_latency_threshold 5000
pulumi config set alarm_min_strategies 1
pulumi config set alarm_stale_threshold 1
```
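A CloudWatch composite alarm is driven by an `AlarmRule` expression over its child alarms, such as `ALARM("a") OR ALARM("b")`. The helper below builds that string. It is a sketch: the child alarm names are hypothetical, and names are assumed not to contain double quotes. In `pulumi_aws` the resulting string would be passed as the `alarm_rule` argument of `aws.cloudwatch.CompositeAlarm`.

```python
def composite_alarm_rule(child_alarms, operator="OR"):
    """Combine child alarm names into a composite-alarm AlarmRule expression."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    if not child_alarms:
        raise ValueError("a composite alarm needs at least one child alarm")
    # Names are assumed not to contain double quotes.
    return f" {operator} ".join(f'ALARM("{name}")' for name in child_alarms)


# Hypothetical child alarm names; the real ones live in cloudwatch_alarms.py.
RULE = composite_alarm_rule(["backend-latency-high", "heartbeat-stale", "health-check-errors"])
```

With `OR`, the composite alarm goes to ALARM as soon as any child does. `AND` would require all of them at once.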
Environment-Specific Behavior
| Resource | Dev | Staging | Prod |
|---|---|---|---|
| RDS Instance | db.t4g.micro | db.t4g.small | db.t4g.small |
| RDS Multi-AZ | No | No | Yes |
| NAT | NAT instance | NAT instance | Managed NAT Gateway |
| ECS Replicas | 1 | 1 | 2+ |
| Log Retention | 7 days | 30 days | 90 days |
| Fargate Spot | Yes | Yes | No (for live trading) |
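The per-environment table can be kept as a single typed lookup, so that every module reads the same values. The sketch below assumes stacks are named `dev`, `staging` and `prod`. Inside a Pulumi program the key would most likely come from `pulumi.get_stack()`, which is left out here so the snippet runs standalone. The field names are hypothetical, and "2+" replicas is encoded as a minimum of 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSettings:
    rds_instance_class: str
    rds_multi_az: bool
    nat_mode: str           # "instance" (NAT instance) or "gateway" (managed NAT Gateway)
    ecs_min_replicas: int   # prod lists "2+", encoded here as a minimum of 2
    log_retention_days: int
    fargate_spot: bool      # prod: disabled for live trading


ENV_SETTINGS = {
    "dev": EnvSettings("db.t4g.micro", False, "instance", 1, 7, True),
    "staging": EnvSettings("db.t4g.small", False, "instance", 1, 30, True),
    "prod": EnvSettings("db.t4g.small", True, "gateway", 2, 90, False),
}


def settings_for(stack):
    """Look up settings for a stack name (assumed to be dev, staging or prod)."""
    try:
        return ENV_SETTINGS[stack]
    except KeyError:
        raise ValueError(f"unknown stack {stack!r}; expected one of {sorted(ENV_SETTINGS)}") from None
```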
Outputs Quick Reference

```bash
# Core
pulumi stack output vpc_id
pulumi stack output ecs_cluster_name
pulumi stack output api_gateway_endpoint

# Database
pulumi stack output rds_endpoint
pulumi stack output rds_secret_arn

# Storage
pulumi stack output s3_bucket_ids
pulumi stack output ecr_repository_urls

# Auth
pulumi stack output cognito_user_pool_id
pulumi stack output cognito_user_pool_client_id

# Monitoring
pulumi stack output composite_alarm_arn
pulumi stack output dashboard_url
```
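Scripts that need several of these values can fetch all outputs in one call. Running `pulumi stack output --json` without a property name prints every output as a single JSON object. The helper below is a minimal sketch with a hypothetical name. It wraps that command and accepts an injectable runner so it can be exercised without the Pulumi CLI. Outputs marked secret print as `[secret]` unless `--show-secrets` is added.

```python
import json
import subprocess


def stack_outputs(stack=None, runner=subprocess.run):
    """Return all stack outputs as a dict via `pulumi stack output --json`."""
    cmd = ["pulumi", "stack", "output", "--json"]
    if stack:
        cmd += ["--stack", stack]
    # check=True raises CalledProcessError if the CLI exits non-zero.
    result = runner(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `stack_outputs("dev")["vpc_id"]` returns the same value as `pulumi stack output vpc_id` run against the dev stack.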