Pulumi Module Reference
Complete reference for all infrastructure modules across the four stack-specific directories:
- infra/persistent/modules/ -- Data-bearing resources (never destroyed)
- infra/foundation/modules/ -- Networking and supporting infrastructure
- infra/compute/modules/ -- Compute resources (ECS, Lambda, IAM)
- infra/edge/modules/ -- API Gateway, WAF, monitoring
Module Inventory
Persistent Stack (infra/persistent/modules/)
| Module | File | Task ID | Description |
|---|---|---|---|
| Pulumi Backend | pulumi_backend.py | - | S3 bucket for Pulumi state, IAM roles for CI/CD |
| S3 Buckets | s3.py | IS001 | 5 buckets: configs, results, arcticdb, logs, mlflow |
| DynamoDB Tables | dynamodb.py | IS003 | 12 tables for workflow/health/trading/drift/config state |
| ECR Repositories | ecr.py | IS004 | 24 repos: 6 services + 18 Lambda images |
| Cognito Auth | cognito.py | DK005 | User pool with MFA, M2M client |
| CodeArtifact | codeartifact.py | SR003 | Private Python package repository |
| CloudTrail | cloudtrail.py | SEC003 | Audit logging to S3 and CloudWatch |
Foundation Stack (infra/foundation/modules/)
| Module | File | Task ID | Description |
|---|---|---|---|
| VPC Network | vpc.py | IF002 | VPC, 6 subnets (2 AZs, 3 tiers), IGW, route tables |
| Security Groups | security_groups.py | IF004 | 5 SGs: ALB, ECS, Lambda, RDS, NAT |
| NAT Instance | nat_instance.py | IF003 | t4g.nano NAT with ASG for HA |
| Network ACLs | nacl.py | SEC005 | Stateless firewall rules per subnet tier |
| VPC Endpoints | vpc_endpoints.py | SEC002 | Gateway endpoints (S3, DynamoDB) + Interface endpoints (ECR, STS, Secrets Manager, CloudWatch Logs, SSM, SQS) |
| VPC Flow Logs | vpc_flow_logs.py | SEC004 | Flow logs to CloudWatch for audit |
| RDS Database | rds.py | IS002 | PostgreSQL (v15.13) for MLflow |
| SQS Queues | sqs.py | IO001 | Backtest queue + DLQ |
| SNS Topics | sns.py | MN001, SR016 | Alert notifications and registration events |
| Secret Rotation | secret_rotation.py | SEC006 | RDS secret rotation (30-day schedule) |
Compute Stack (infra/compute/modules/)
| Module | File | Task ID | Description |
|---|---|---|---|
| IAM Roles | iam.py | IC001 | ECS execution role + task role |
| ECS Cluster | ecs.py | IC001, BE007 | Fargate cluster, strategy task definition |
| ECS Services | ecs_services.py | IC003 | 7 ECS services + Service Discovery |
| ALB | alb.py | IC002 | Application Load Balancer, listeners, target groups |
| Lambda Functions | lambda_funcs.py | IC004 | 17 container-image Lambdas + EventBridge schedules |
| Step Functions | step_functions.py | IO002, BE008 | Backtest workflow state machine |
| EC2 Consolidated | ec2_consolidated.py | - | Consolidated EC2 for dev/staging |
| EC2 Userdata | ec2_userdata.py | - | EC2 instance bootstrap script |
Edge Stack (infra/edge/modules/)
| Module | File | Task ID | Description |
|---|---|---|---|
| API Gateway | api_gateway.py | IC005 | HTTP API with 11 routes, Cognito auth |
| WAF | waf.py | SEC001 | WebACL with rate limiting |
| CloudWatch Alarms | cloudwatch_alarms.py | MN003, INF007 | Composite alarm, heartbeat detection, Lambda errors |
| CloudWatch Dashboard | cloudwatch_dashboard.py | OB001 | Trading platform metrics dashboard |
Dependency Graph
graph TD
subgraph persistent["Persistent Stack (infra/persistent)"]
pulumi_backend[pulumi_backend]
s3[s3]
dynamodb[dynamodb]
ecr[ecr]
cognito[cognito]
codeartifact[codeartifact]
cloudtrail[cloudtrail]
end
subgraph foundation["Foundation Stack (infra/foundation)"]
vpc[vpc]
vpc_flow_logs[vpc_flow_logs]
nacl[nacl]
security_groups[security_groups]
vpc_endpoints[vpc_endpoints]
nat_instance[nat_instance]
rds[rds]
sns[sns]
sqs[sqs]
secret_rotation[secret_rotation]
end
subgraph compute["Compute Stack (infra/compute)"]
iam[iam]
ecs[ecs]
alb[alb]
ecs_services[ecs_services]
lambda_funcs[lambda_funcs]
step_functions[step_functions]
ec2_consolidated[ec2_consolidated]
end
subgraph edge["Edge Stack (infra/edge)"]
api_gateway[api_gateway]
waf[waf]
cw_alarms[cloudwatch_alarms]
cw_dashboard[cloudwatch_dashboard]
end
%% Persistent: cloudtrail logs to S3
s3 --> cloudtrail
%% Foundation internal dependencies
vpc --> vpc_flow_logs
vpc --> nacl
vpc --> security_groups
vpc --> vpc_endpoints
vpc --> nat_instance
vpc --> rds
security_groups --> nat_instance
security_groups --> vpc_endpoints
security_groups --> rds
rds --> secret_rotation
sns --> secret_rotation
%% Compute depends on persistent + foundation via StackReference
ecr -.-> ecs_services
s3 -.-> ecs_services
dynamodb -.-> lambda_funcs
cognito -.-> api_gateway
vpc -.-> alb
vpc -.-> ecs_services
vpc -.-> lambda_funcs
security_groups -.-> alb
security_groups -.-> ecs_services
security_groups -.-> lambda_funcs
sqs -.-> lambda_funcs
sns -.-> lambda_funcs
rds -.-> ecs_services
%% Compute internal dependencies
iam --> ecs
iam --> ecs_services
iam --> lambda_funcs
ecs --> ecs_services
ecs --> lambda_funcs
alb --> ecs_services
lambda_funcs --> step_functions
step_functions --> ec2_consolidated
ecs_services --> ec2_consolidated
alb --> ec2_consolidated
%% Edge depends on compute via StackReference
alb -.-> api_gateway
api_gateway --> waf
sns -.-> cw_alarms
step_functions -.-> cw_alarms
Legend: Solid arrows (-->) are intra-stack dependencies. Dashed arrows (-.->) are cross-stack references via pulumi.StackReference.
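The cross-stack edges imply an order in which the four stacks can be deployed. The following is a minimal sketch, using only the Python standard library, that derives one valid order. `STACK_DEPS` is an illustrative condensation of the dashed edges in the graph, not a file or variable in the repository.

```python
from graphlib import TopologicalSorter

# Stack-level dependencies condensed from the dashed (cross-stack) edges.
# Key: stack; value: the stacks it reads from via StackReference.
# This mapping is an assumption drawn from the diagram, not project code.
STACK_DEPS = {
    "persistent": set(),
    "foundation": set(),
    "compute": {"persistent", "foundation"},
    "edge": {"persistent", "foundation", "compute"},
}


def deploy_order(deps):
    """Return stacks ordered so each comes after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = deploy_order(STACK_DEPS)
print(order)
```

Tearing stacks down would follow the reverse order, with the persistent stack left in place because it holds data-bearing resources.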
Module Details
vpc.py (IF002)
Creates:
- VPC with CIDR 10.0.0.0/16
- 6 subnets across 2 AZs (public, private, database)
- Internet Gateway
- Route tables per tier
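To illustrate how six subnets can be carved from the 10.0.0.0/16 range, here is a small sketch using Python's `ipaddress` module. The /24 subnet size and the allocation order are assumptions for illustration; the actual values in vpc.py may differ.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
TIERS = ["public", "private", "database"]
AZ_COUNT = 2
SUBNET_PREFIX = 24  # assumption: the real module may use a different size


def plan_subnets(vpc=VPC_CIDR, tiers=TIERS, az_count=AZ_COUNT, prefix=SUBNET_PREFIX):
    """Assign one subnet per (tier, AZ), taking consecutive blocks from the VPC range."""
    blocks = vpc.subnets(new_prefix=prefix)  # generator of non-overlapping subnets
    plan = {}
    for tier in tiers:
        plan[tier] = [str(next(blocks)) for _ in range(az_count)]
    return plan


plan = plan_subnets()
for tier, cidrs in plan.items():
    print(tier, cidrs)
```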
Outputs:
vpc_id: str
public_subnet_ids: list[str]
private_subnet_ids: list[str]
database_subnet_ids: list[str]
private_route_table_id: str
database_route_table_id: str
Usage:
security_groups.py (IF004)
Creates 5 security groups:
| SG | Ingress | Egress | Purpose |
|---|---|---|---|
| ALB | 80, 443 from 0.0.0.0/0 | All | Load balancer |
| ECS | From ALB SG | All | Container traffic |
| Lambda | None | All | Function networking |
| RDS | 5432 from ECS SG | All | Database access |
| NAT | From private CIDR | All | Outbound internet |
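The ingress column can be read as a small allow-list. The sketch below models it as plain data so the intent (for example, that only the ECS group reaches PostgreSQL) can be checked. Rows without a port are treated as "all ports", which is an assumption, and `private-cidr` is a placeholder rather than a real value from the module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    group: str    # security group the rule belongs to
    source: str   # CIDR or the name of another security group
    ports: tuple  # allowed destination ports; empty tuple means "all ports" (assumption)


# Ingress rules transcribed from the table; the Lambda SG has none.
# "private-cidr" is a placeholder for the private subnet range.
INGRESS = [
    IngressRule("alb", "0.0.0.0/0", (80, 443)),
    IngressRule("ecs", "alb", ()),
    IngressRule("rds", "ecs", (5432,)),
    IngressRule("nat", "private-cidr", ()),
]


def allowed(group, source, port):
    """True if any ingress rule on `group` admits `source` on `port`."""
    return any(
        r.group == group and r.source == source and (not r.ports or port in r.ports)
        for r in INGRESS
    )


print(allowed("rds", "ecs", 5432))  # True
print(allowed("rds", "alb", 5432))  # False
```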
Outputs:
s3.py (IS001)
Creates 5 buckets:
| Bucket | Purpose | Lifecycle |
|---|---|---|
| configs | Strategy configurations | None |
| results | Backtest results | 90-day expiration |
| arcticdb | Time-series data | None |
| logs | Application logs | 30-day expiration |
| mlflow | MLflow artifacts | None |
Features:
- AES-256 encryption
- Versioning enabled
- Public access blocked
- Lifecycle policies
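As a rough illustration of the lifecycle column, the sketch below computes when an object becomes eligible for expiration. The `EXPIRATION_DAYS` mapping is transcribed from the table; the helper name is hypothetical. S3 applies expiration asynchronously, so actual removal can lag the computed date.

```python
from datetime import date, timedelta

# Expiration in days per bucket, from the table; None means no lifecycle rule.
EXPIRATION_DAYS = {
    "configs": None,
    "results": 90,
    "arcticdb": None,
    "logs": 30,
    "mlflow": None,
}


def expires_on(bucket, created):
    """Return the date an object becomes eligible for expiration, or None if it is kept."""
    days = EXPIRATION_DAYS[bucket]
    return None if days is None else created + timedelta(days=days)


print(expires_on("logs", date(2024, 1, 1)))     # 2024-01-31
print(expires_on("results", date(2024, 1, 1)))  # 2024-03-31
print(expires_on("configs", date(2024, 1, 1)))  # None
```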
dynamodb.py (IS003)
Creates 12 tables:
| Table | Primary Key | Purpose |
|---|---|---|
| workflow-state | job_id | Backtest job tracking |
| idempotency | idempotency_key | Request deduplication |
| health-state | service_id | Service health tracking |
| trading-state | strategy_id | Live trading state |
| deployments | deployment_id | Deployment tracking |
| drift-state | model_id | Model drift tracking |
| retraining-state | model_id | Retraining job state |
| rollback-state | model_id | Model rollback history |
| notifications | notification_id | Notification state |
| shadow-test-state | test_id | Shadow (A/B) test tracking |
| infra-drift-state | resource_id | Infrastructure drift detection |
| config-versions | config_id | Versioned configuration |
Features:
- On-demand billing (pay per request)
- Point-in-time recovery enabled
- TTL configured where appropriate
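DynamoDB TTL works on a numeric attribute holding an expiry time in epoch seconds. The sketch below shows how an item for the idempotency table might carry such an attribute. The attribute name `expires_at` and the 24-hour retention are assumptions, not values taken from dynamodb.py.

```python
import time

IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60  # assumption: 24-hour retention


def idempotency_item(key, now=None, ttl_seconds=IDEMPOTENCY_TTL_SECONDS):
    """Build an item for the idempotency table with a TTL attribute in epoch seconds."""
    now = int(time.time()) if now is None else int(now)
    return {
        "idempotency_key": key,           # primary key from the table above
        "created_at": now,
        "expires_at": now + ttl_seconds,  # hypothetical TTL attribute name
    }


def is_expired(item, now=None):
    """Check expiry on read, since TTL deletion is not immediate."""
    now = int(time.time()) if now is None else int(now)
    return item["expires_at"] <= now


item = idempotency_item("req-1", now=1_700_000_000)
print(item["expires_at"])  # 1700086400
```

Checking expiry on read is a common precaution because expired items can remain visible for some time before DynamoDB removes them.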
ecs.py (IC001, BE007)
Creates:
- ECS cluster with Fargate + Fargate Spot capacity
- Generic strategy task definition
- CloudWatch log group
Strategy Task Definition:
- Image: Overridden at runtime via ECSBacktestExecutor
- CPU: 512 units (0.5 vCPU), configurable
- Memory: 1024 MiB, configurable
- Uses Fargate Spot for cost savings
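Because CPU and memory are configurable, it can help to check an override before deploying, as Fargate only accepts certain CPU/memory pairs. The sketch below covers the three smallest CPU sizes. The function name is hypothetical and the table is deliberately partial.

```python
# Memory sizes (MiB) Fargate accepts for the three smallest CPU sizes (CPU units).
# Larger CPU sizes exist but are left out of this sketch.
FARGATE_MEMORY_MIB = {
    256: [512, 1024, 2048],
    512: [1024, 2048, 3072, 4096],
    1024: [2048, 3072, 4096, 5120, 6144, 7168, 8192],
}


def validate_task_size(cpu=512, memory=1024):
    """Return (cpu, memory) if Fargate accepts the pair, else raise ValueError."""
    allowed = FARGATE_MEMORY_MIB.get(cpu)
    if allowed is None:
        raise ValueError(f"CPU size {cpu} is not covered by this sketch")
    if memory not in allowed:
        raise ValueError(
            f"memory {memory} MiB is not valid with cpu {cpu}; choose one of {allowed}"
        )
    return cpu, memory


print(validate_task_size())  # (512, 1024)
```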
lambda_funcs.py (IC004)
Creates 17 container-image Lambdas in the compute stack:
| Function | Schedule | Purpose |
|---|---|---|
| backtest-consumer | SQS trigger | Consume backtest requests, launch ECS tasks |
| sqs-consumer | SQS trigger | Consume retraining messages, launch ECS tasks |
| health-check | rate(5 minutes) | Service health monitoring |
| trading-heartbeat-check | rate(5 minutes) | Trading heartbeat detection |
| orphan-scanner | rate(15 minutes) | Orphaned ECS task cleanup |
| drift-monitor | rate(1 day) | Model drift detection |
| retraining-scheduler | rate(7 days) | Retraining triggers |
| pulumi-drift-detector | rate(1 day) | Infrastructure drift detection |
| validate-strategy | On-demand | Strategy validation |
| data-collection-proxy | On-demand | Data collection proxy |
| update-status | On-demand | Job status updates in DynamoDB |
| cleanup-resources | On-demand | Stop orphaned ECS tasks on failure |
| notify-completion | On-demand | Send pipeline completion notifications |
| check-retraining-needed | On-demand | Evaluate drift state for retraining |
| compare-models | On-demand | Compare champion vs challenger models |
| promote-model | On-demand | Promote challenger to Production in MLflow |
| model-rollback | On-demand | Roll back model to previous version |
Note: update-nat-routes is an inline runtime Lambda in the foundation stack (infra/foundation/modules/nat_instance.py), not a container-image Lambda in compute. It is triggered by an ASG Lifecycle Hook to update the private route table on NAT replacement.
Features:
- VPC placement in private subnets
- Environment variables from config
- Container images from ECR
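The schedule column uses EventBridge rate expressions. The following sketch converts one into a `timedelta`, which can be handy when reasoning about how stale a heartbeat may be. The helper is illustrative and not part of lambda_funcs.py.

```python
import re
from datetime import timedelta

_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def rate_to_timedelta(expression):
    """Convert an EventBridge rate expression such as 'rate(5 minutes)' to a timedelta."""
    match = _RATE.match(expression)
    if not match:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        raise ValueError("rate value must be a positive integer")
    # EventBridge expects the singular unit for 1 and the plural otherwise.
    if (value == 1) != (not unit.endswith("s")):
        raise ValueError(f"unit does not agree with value: {expression!r}")
    key = unit if unit.endswith("s") else unit + "s"
    return timedelta(**{key: value})


print(rate_to_timedelta("rate(5 minutes)"))  # 0:05:00
print(rate_to_timedelta("rate(7 days)"))     # 7 days, 0:00:00
```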
api_gateway.py (IC005)
Creates:
- HTTP API Gateway
- 11 routes with ALB integration
- Cognito JWT authorizer
- Optional custom domain
Routes:
| Method | Path | Auth | Target |
|---|---|---|---|
| GET | /health | No | Backend |
| POST | /api/v1/backtests | Yes | Backend |
| GET | /api/v1/backtests | Yes | Backend |
| GET | /api/v1/backtests/{id} | Yes | Backend |
| GET | /api/v1/strategies | Yes | Strategy Service |
| POST | /api/v1/strategies/* | Yes | Strategy Service |
| GET | /api/v1/data/* | Yes | Data Collection |
| POST | /api/v1/hyperopt | Yes | Strategy Service |
| GET | /api/v1/models/* | Yes | Strategy Service |
| POST | /api/v1/models/* | Yes | Strategy Service |
| GET | /api/v1/catalog/* | Yes | Backend |
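The route table can be expressed as data and queried, for example in a test that asserts only `/health` is unauthenticated. The sketch below is one way to do that. The matching semantics (`{id}` matches one path segment, `*` matches the remainder) are an assumption about how the patterns are meant to be read, not a description of API Gateway's own matcher.

```python
import re

# (method, path pattern, auth required, target), transcribed from the table above.
ROUTES = [
    ("GET", "/health", False, "Backend"),
    ("POST", "/api/v1/backtests", True, "Backend"),
    ("GET", "/api/v1/backtests", True, "Backend"),
    ("GET", "/api/v1/backtests/{id}", True, "Backend"),
    ("GET", "/api/v1/strategies", True, "Strategy Service"),
    ("POST", "/api/v1/strategies/*", True, "Strategy Service"),
    ("GET", "/api/v1/data/*", True, "Data Collection"),
    ("POST", "/api/v1/hyperopt", True, "Strategy Service"),
    ("GET", "/api/v1/models/*", True, "Strategy Service"),
    ("POST", "/api/v1/models/*", True, "Strategy Service"),
    ("GET", "/api/v1/catalog/*", True, "Backend"),
]


def _to_regex(pattern):
    """Assumed semantics: {name} matches one segment, * matches the rest of the path."""
    pieces = re.split(r"(\{[^}]+\}|\*)", pattern)
    body = "".join(
        "[^/]+" if p.startswith("{") else ".+" if p == "*" else re.escape(p)
        for p in pieces
    )
    return re.compile(f"^{body}$")


def resolve(method, path):
    """Return (target, auth_required) for the first matching route, or None."""
    for route_method, pattern, auth, target in ROUTES:
        if route_method == method and _to_regex(pattern).match(path):
            return target, auth
    return None


print(resolve("GET", "/health"))                   # ('Backend', False)
print(resolve("GET", "/api/v1/backtests/abc123"))  # ('Backend', True)
```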
step_functions.py (IO002, BE008)
Creates:
- Backtest workflow state machine
- IAM execution role
Workflow States: The backtest workflow has 12 states and the retraining workflow has 10. See 06-STEP-FUNCTIONS.md for the complete state reference, including ValidateStrategy, EvaluateValidation, EnsureData, UpdateStatusRunning, RunBacktest, HandleTimeout, CleanupResources, HandleSuccess, and the error paths.
Type: STANDARD (supports 2+ hour executions)
cloudwatch_alarms.py (MN003, INF007)
Creates:
- Composite alarm for service health
- Stale heartbeat alarm
- Lambda error alarms (per function)
- EventBridge rules for Lambda schedules
Configurable Thresholds:
pulumi config set alarm_latency_threshold 5000
pulumi config set alarm_min_strategies 1
pulumi config set alarm_stale_threshold 1
Environment-Specific Behavior
| Resource | Dev | Staging | Prod |
|---|---|---|---|
| RDS Instance | db.t4g.micro | db.t4g.micro | db.t4g.small |
| RDS Multi-AZ | No | No | Yes |
| NAT | NAT instance | NAT instance | NAT Gateway |
| ECS Replicas | 1 | 1 | 2+ |
| Log Retention | 7 days | 30 days | 90 days |
| Fargate Spot | Yes | Yes | No (for live) |
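One way to encode a matrix like this is a per-environment lookup with an explicit failure for unknown names. The sketch below transcribes the table. The key names and the `settings_for` helper are hypothetical, and "2+" is represented as a minimum of 2 replicas.

```python
# Per-environment settings transcribed from the table above.
# Key names are illustrative; they are not the project's config keys.
ENV_SETTINGS = {
    "dev": {
        "rds_instance": "db.t4g.micro", "rds_multi_az": False, "nat": "instance",
        "ecs_min_replicas": 1, "log_retention_days": 7, "fargate_spot": True,
    },
    "staging": {
        "rds_instance": "db.t4g.micro", "rds_multi_az": False, "nat": "instance",
        "ecs_min_replicas": 1, "log_retention_days": 30, "fargate_spot": True,
    },
    "prod": {
        "rds_instance": "db.t4g.small", "rds_multi_az": True, "nat": "gateway",
        "ecs_min_replicas": 2, "log_retention_days": 90, "fargate_spot": False,
    },
}


def settings_for(env):
    """Return a copy of the settings for `env`, rejecting unknown environment names."""
    try:
        return dict(ENV_SETTINGS[env])
    except KeyError:
        raise ValueError(f"unknown environment {env!r}; expected one of {sorted(ENV_SETTINGS)}")


print(settings_for("prod")["rds_instance"])  # db.t4g.small
```

Failing loudly on an unknown name avoids silently applying dev-sized defaults to a mistyped production stack.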
Outputs Quick Reference
# Core
pulumi stack output vpc_id
pulumi stack output ecs_cluster_name
pulumi stack output api_gateway_endpoint
# Database
pulumi stack output rds_endpoint
pulumi stack output rds_secret_arn
# Storage
pulumi stack output s3_bucket_ids
pulumi stack output ecr_repository_urls
# Auth
pulumi stack output cognito_user_pool_id
pulumi stack output cognito_user_pool_client_id
# Monitoring
pulumi stack output composite_alarm_arn
pulumi stack output dashboard_url
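When several outputs are needed in a script, they can be fetched in one call with `pulumi stack output --json`. The sketch below wraps that call. The function names are hypothetical, the stack name is supplied by the caller, and `read_outputs` needs the Pulumi CLI on PATH and a logged-in backend.

```python
import json
import subprocess


def parse_outputs(raw):
    """Parse the JSON printed by `pulumi stack output --json` into a dict."""
    outputs = json.loads(raw)
    if not isinstance(outputs, dict):
        raise ValueError("expected a JSON object of stack outputs")
    return outputs


def read_outputs(stack):
    """Run the Pulumi CLI for `stack` and return its outputs as a dict."""
    result = subprocess.run(
        ["pulumi", "stack", "output", "--json", "--stack", stack],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_outputs(result.stdout)


# Parsing a canned response; the values here are placeholders, not real outputs.
sample = '{"vpc_id": "vpc-EXAMPLE", "ecs_cluster_name": "example-cluster"}'
print(parse_outputs(sample)["vpc_id"])  # vpc-EXAMPLE
```

Since the outputs are spread across four stacks, a given name resolves only against the stack that exports it.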
See Also
- Pulumi Code -- full infrastructure-as-code walkthrough
- Canonical Config -- all infrastructure configuration values