
Pulumi Module Reference

Complete reference for all infrastructure modules across the 4 stack-specific directories:

  • infra/persistent/modules/ -- Data-bearing resources (never destroyed)
  • infra/foundation/modules/ -- Networking and supporting infrastructure
  • infra/compute/modules/ -- Compute resources (ECS, Lambda, IAM)
  • infra/edge/modules/ -- API Gateway, WAF, monitoring

Module Inventory

Persistent Stack (infra/persistent/modules/)

| Module | File | Task ID | Description |
| --- | --- | --- | --- |
| Pulumi Backend | pulumi_backend.py | - | S3 bucket for Pulumi state, IAM roles for CI/CD |
| S3 Buckets | s3.py | IS001 | 5 buckets: configs, results, arcticdb, logs, mlflow |
| DynamoDB Tables | dynamodb.py | IS003 | 12 tables for workflow/health/trading/drift/config state |
| ECR Repositories | ecr.py | IS004 | 24 repos: 6 services + 18 Lambda images |
| Cognito Auth | cognito.py | DK005 | User pool with MFA, M2M client |
| CodeArtifact | codeartifact.py | SR003 | Private Python package repository |
| CloudTrail | cloudtrail.py | SEC003 | Audit logging to S3 and CloudWatch |

Foundation Stack (infra/foundation/modules/)

| Module | File | Task ID | Description |
| --- | --- | --- | --- |
| VPC Network | vpc.py | IF002 | VPC, 6 subnets (2 AZs, 3 tiers), IGW, route tables |
| Security Groups | security_groups.py | IF004 | 5 SGs: ALB, ECS, Lambda, RDS, NAT |
| NAT Instance | nat_instance.py | IF003 | t4g.nano NAT with ASG for HA |
| Network ACLs | nacl.py | SEC005 | Stateless firewall rules per subnet tier |
| VPC Endpoints | vpc_endpoints.py | SEC002 | Gateway endpoints (S3, DynamoDB) + Interface endpoints (ECR, STS, Secrets Manager, CloudWatch Logs, SSM, SQS) |
| VPC Flow Logs | vpc_flow_logs.py | SEC004 | Flow logs to CloudWatch for audit |
| RDS Database | rds.py | IS002 | PostgreSQL (v15.13) for MLflow |
| SQS Queues | sqs.py | IO001 | Backtest queue + DLQ |
| SNS Topics | sns.py | MN001, SR016 | Alert notifications and registration events |
| Secret Rotation | secret_rotation.py | SEC006 | RDS secret rotation (30-day schedule) |

Compute Stack (infra/compute/modules/)

| Module | File | Task ID | Description |
| --- | --- | --- | --- |
| IAM Roles | iam.py | IC001 | ECS execution role + task role |
| ECS Cluster | ecs.py | IC001, BE007 | Fargate cluster, strategy task definition |
| ECS Services | ecs_services.py | IC003 | 7 ECS services + Service Discovery |
| ALB | alb.py | IC002 | Application Load Balancer, listeners, target groups |
| Lambda Functions | lambda_funcs.py | IC004 | 17 container-image Lambdas + EventBridge schedules |
| Step Functions | step_functions.py | IO002, BE008 | Backtest workflow state machine |
| EC2 Consolidated | ec2_consolidated.py | - | Consolidated EC2 for dev/staging |
| EC2 Userdata | ec2_userdata.py | - | EC2 instance bootstrap script |

Edge Stack (infra/edge/modules/)

| Module | File | Task ID | Description |
| --- | --- | --- | --- |
| API Gateway | api_gateway.py | IC005 | HTTP API with 11 routes, Cognito auth |
| WAF | waf.py | SEC001 | WebACL with rate limiting |
| CloudWatch Alarms | cloudwatch_alarms.py | MN003, INF007 | Composite alarm, heartbeat detection, Lambda errors |
| CloudWatch Dashboard | cloudwatch_dashboard.py | OB001 | Trading platform metrics dashboard |

Dependency Graph

graph TD
    subgraph persistent["Persistent Stack (infra/persistent)"]
        pulumi_backend[pulumi_backend]
        s3[s3]
        dynamodb[dynamodb]
        ecr[ecr]
        cognito[cognito]
        codeartifact[codeartifact]
        cloudtrail[cloudtrail]
    end

    subgraph foundation["Foundation Stack (infra/foundation)"]
        vpc[vpc]
        vpc_flow_logs[vpc_flow_logs]
        nacl[nacl]
        security_groups[security_groups]
        vpc_endpoints[vpc_endpoints]
        nat_instance[nat_instance]
        rds[rds]
        sns[sns]
        sqs[sqs]
        secret_rotation[secret_rotation]
    end

    subgraph compute["Compute Stack (infra/compute)"]
        iam[iam]
        ecs[ecs]
        alb[alb]
        ecs_services[ecs_services]
        lambda_funcs[lambda_funcs]
        step_functions[step_functions]
        ec2_consolidated[ec2_consolidated]
    end

    subgraph edge["Edge Stack (infra/edge)"]
        api_gateway[api_gateway]
        waf[waf]
        cw_alarms[cloudwatch_alarms]
        cw_dashboard[cloudwatch_dashboard]
    end

    %% Persistent: cloudtrail logs to S3
    s3 --> cloudtrail

    %% Foundation internal dependencies
    vpc --> vpc_flow_logs
    vpc --> nacl
    vpc --> security_groups
    vpc --> vpc_endpoints
    vpc --> nat_instance
    vpc --> rds
    security_groups --> nat_instance
    security_groups --> vpc_endpoints
    security_groups --> rds
    rds --> secret_rotation
    sns --> secret_rotation

    %% Compute and edge depend on persistent + foundation via StackReference
    ecr -.-> ecs_services
    s3 -.-> ecs_services
    dynamodb -.-> lambda_funcs
    cognito -.-> api_gateway
    vpc -.-> alb
    vpc -.-> ecs_services
    vpc -.-> lambda_funcs
    security_groups -.-> alb
    security_groups -.-> ecs_services
    security_groups -.-> lambda_funcs
    sqs -.-> lambda_funcs
    sns -.-> lambda_funcs
    rds -.-> ecs_services

    %% Compute internal dependencies
    iam --> ecs
    iam --> ecs_services
    iam --> lambda_funcs
    ecs --> ecs_services
    ecs --> lambda_funcs
    alb --> ecs_services
    lambda_funcs --> step_functions
    step_functions --> ec2_consolidated
    ecs_services --> ec2_consolidated
    alb --> ec2_consolidated

    %% Edge depends on compute and foundation via StackReference
    alb -.-> api_gateway
    api_gateway --> waf
    sns -.-> cw_alarms
    step_functions -.-> cw_alarms

Legend: Solid arrows (-->) are intra-stack dependencies. Dashed arrows (-.->) are cross-stack references via pulumi.StackReference.
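
The dashed edges imply an order in which the four stacks have to be deployed. As a rough sketch (not part of the Pulumi code itself), that order can be derived from the cross-stack edges with the standard-library `graphlib` module. The `STACK_OF` mapping and the function name are illustrative assumptions. The edge list is copied from the diagram above.

```python
from graphlib import TopologicalSorter

# Module -> owning stack, taken from the subgraphs in the diagram.
STACK_OF = {
    "ecr": "persistent", "s3": "persistent",
    "dynamodb": "persistent", "cognito": "persistent",
    "vpc": "foundation", "security_groups": "foundation",
    "sqs": "foundation", "sns": "foundation", "rds": "foundation",
    "alb": "compute", "ecs_services": "compute",
    "lambda_funcs": "compute", "step_functions": "compute",
    "api_gateway": "edge", "cw_alarms": "edge",
}

# Dashed (cross-stack) edges from the diagram, as (producer, consumer).
CROSS_STACK_EDGES = [
    ("ecr", "ecs_services"), ("s3", "ecs_services"),
    ("dynamodb", "lambda_funcs"), ("cognito", "api_gateway"),
    ("vpc", "alb"), ("vpc", "ecs_services"), ("vpc", "lambda_funcs"),
    ("security_groups", "alb"), ("security_groups", "ecs_services"),
    ("security_groups", "lambda_funcs"), ("sqs", "lambda_funcs"),
    ("sns", "lambda_funcs"), ("rds", "ecs_services"),
    ("alb", "api_gateway"), ("sns", "cw_alarms"),
    ("step_functions", "cw_alarms"),
]


def stack_deploy_order(edges, stack_of):
    """Return stacks ordered so that every producer precedes its consumers."""
    deps = {}
    for producer, consumer in edges:
        # The consumer's stack depends on the producer's stack.
        deps.setdefault(stack_of[consumer], set()).add(stack_of[producer])
    return list(TopologicalSorter(deps).static_order())


deploy_order = stack_deploy_order(CROSS_STACK_EDGES, STACK_OF)
```

Under these edges, `persistent` and `foundation` come first (in either order), followed by `compute`, then `edge`. Teardown would run in the reverse order.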


Module Details

vpc.py (IF002)

Creates:

  • VPC with CIDR 10.0.0.0/16
  • 6 subnets across 2 AZs (public, private, database)
  • Internet Gateway
  • Route tables per tier

Outputs:

vpc_id: str
public_subnet_ids: list[str]
private_subnet_ids: list[str]
database_subnet_ids: list[str]
private_route_table_id: str
database_route_table_id: str

Usage:

import pulumi

# VpcNetwork is defined in infra/foundation/modules/vpc.py
vpc = VpcNetwork()
pulumi.export("vpc_id", vpc.vpc.id)
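
To illustrate how six subnets can be carved out of the 10.0.0.0/16 range, here is a minimal sketch using the standard-library `ipaddress` module. The /24 subnet size, the allocation order, and the `plan_subnets` name are assumptions for illustration. The actual CIDR blocks are whatever vpc.py defines.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
TIERS = ["public", "private", "database"]
AZ_COUNT = 2


def plan_subnets(vpc_cidr, tiers, az_count, new_prefix=24):
    """Assign consecutive blocks of the VPC range to each tier, one per AZ.

    The /24 size is an assumption; it is not stated in this reference.
    """
    blocks = vpc_cidr.subnets(new_prefix=new_prefix)
    return {tier: [str(next(blocks)) for _ in range(az_count)] for tier in tiers}


subnet_plan = plan_subnets(VPC_CIDR, TIERS, AZ_COUNT)
for tier, cidrs in subnet_plan.items():
    print(tier, cidrs)
```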


security_groups.py (IF004)

Creates 5 security groups:

| SG | Ingress | Egress | Purpose |
| --- | --- | --- | --- |
| ALB | 80, 443 from 0.0.0.0/0 | All | Load balancer |
| ECS | From ALB SG | All | Container traffic |
| Lambda | None | All | Function networking |
| RDS | 5432 from ECS SG | All | Database access |
| NAT | From private CIDR | All | Outbound internet |

Outputs:

alb_sg_id: str
ecs_sg_id: str
lambda_sg_id: str
rds_sg_id: str
nat_sg_id: str
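
The ingress rules in the table can be read as a small lookup. The sketch below models them as plain data to make the allowed paths explicit. It is not the module's implementation. Where the table lists no port (ECS, NAT), the sketch treats the rule as "any port", which is an assumption, and the source labels are illustrative.

```python
# Ingress rules per security group, transcribed from the table above.
# ports=None means "any port" (an assumption where the table lists none).
INGRESS = {
    "alb": [{"source": "0.0.0.0/0", "ports": {80, 443}}],
    "ecs": [{"source": "alb", "ports": None}],
    "lambda": [],  # no inbound traffic at all
    "rds": [{"source": "ecs", "ports": {5432}}],
    "nat": [{"source": "private-cidr", "ports": None}],
}


def ingress_allowed(target, source, port):
    """Return True if `source` may reach `target` on `port` under INGRESS."""
    for rule in INGRESS[target]:
        port_ok = rule["ports"] is None or port in rule["ports"]
        if rule["source"] == source and port_ok:
            return True
    return False


print(ingress_allowed("rds", "ecs", 5432))     # ECS tasks reach PostgreSQL
print(ingress_allowed("rds", "lambda", 5432))  # Lambda SG has no rule on RDS
```

Read this way, Lambda functions have no direct path to RDS in the table as written, since only the ECS SG is listed as a source on port 5432.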


s3.py (IS001)

Creates 5 buckets:

| Bucket | Purpose | Lifecycle |
| --- | --- | --- |
| configs | Strategy configurations | None |
| results | Backtest results | 90-day expiration |
| arcticdb | Time-series data | None |
| logs | Application logs | 30-day expiration |
| mlflow | MLflow artifacts | None |

Features:

  • AES-256 encryption
  • Versioning enabled
  • Public access blocked
  • Lifecycle policies
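
As a quick way to reason about the lifecycle column, the following sketch estimates when an object becomes eligible for expiration. The helper name is hypothetical, and the result is approximate: S3 evaluates lifecycle rules asynchronously, so actual deletion can lag the computed date.

```python
from datetime import date, timedelta

# Expiration in days per bucket, from the table above (None = no lifecycle rule).
EXPIRATION_DAYS = {
    "configs": None,
    "results": 90,
    "arcticdb": None,
    "logs": 30,
    "mlflow": None,
}


def expires_on(bucket, created):
    """Approximate date an object created on `created` becomes eligible to expire."""
    days = EXPIRATION_DAYS[bucket]
    return None if days is None else created + timedelta(days=days)


print(expires_on("logs", date(2024, 1, 1)))
print(expires_on("configs", date(2024, 1, 1)))
```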


dynamodb.py (IS003)

Creates 12 tables:

| Table | Primary Key | Purpose |
| --- | --- | --- |
| workflow-state | job_id | Backtest job tracking |
| idempotency | idempotency_key | Request deduplication |
| health-state | service_id | Service health tracking |
| trading-state | strategy_id | Live trading state |
| deployments | deployment_id | Deployment tracking |
| drift-state | model_id | Model drift tracking |
| retraining-state | model_id | Retraining job state |
| rollback-state | model_id | Model rollback history |
| notifications | notification_id | Notification state |
| shadow-test-state | test_id | A/B test shadow testing |
| infra-drift-state | resource_id | Infrastructure drift detection |
| config-versions | config_id | Versioned configuration |

Features:

  • On-demand billing (pay per request)
  • Point-in-time recovery enabled
  • TTL configured where appropriate
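
The primary-key column and the TTL feature can be illustrated with two small helpers. This is a sketch only. It assumes every key is a string attribute, and both helper names are hypothetical. The one firm fact is that DynamoDB TTL expects a Number attribute holding epoch seconds.

```python
from datetime import datetime, timedelta, timezone

# Table -> primary key attribute, from the table above.
PRIMARY_KEYS = {
    "workflow-state": "job_id",
    "idempotency": "idempotency_key",
    "health-state": "service_id",
    "trading-state": "strategy_id",
    "deployments": "deployment_id",
    "drift-state": "model_id",
    "retraining-state": "model_id",
    "rollback-state": "model_id",
    "notifications": "notification_id",
    "shadow-test-state": "test_id",
    "infra-drift-state": "resource_id",
    "config-versions": "config_id",
}


def item_key(table, value):
    """Build a low-level DynamoDB key (string type "S" is an assumption)."""
    return {PRIMARY_KEYS[table]: {"S": str(value)}}


def ttl_epoch(now, days):
    """Epoch-seconds value for a TTL attribute that expires `days` from `now`."""
    return int((now + timedelta(days=days)).timestamp())


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(item_key("workflow-state", "job-42"))
print(ttl_epoch(now, 7))
```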


ecs.py (IC001, BE007)

Creates:

  • ECS cluster with Fargate + Fargate Spot capacity
  • Generic strategy task definition
  • CloudWatch log group

Strategy Task Definition:

  • Image: Overridden at runtime via ECSBacktestExecutor
  • CPU: 512 CPU units, i.e. 0.5 vCPU (configurable)
  • Memory: 1024 MiB (configurable)
  • Uses Fargate Spot for cost savings
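
Because CPU and memory are configurable, it may help to check a pair against the combinations Fargate accepts before deploying. The sketch below covers the common sizes up to 4 vCPU and is not exhaustive, since larger sizes exist. The function name is hypothetical.

```python
# Memory values (MiB) that Fargate accepts for each CPU setting (CPU units).
# Covers sizes up to 4 vCPU only; larger task sizes are omitted from this sketch.
FARGATE_MEMORY_MIB = {
    256: (512, 1024, 2048),
    512: tuple(range(1024, 4096 + 1, 1024)),
    1024: tuple(range(2048, 8192 + 1, 1024)),
    2048: tuple(range(4096, 16384 + 1, 1024)),
    4096: tuple(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu, memory_mib):
    """Return True if (cpu, memory_mib) is an accepted Fargate task size."""
    return memory_mib in FARGATE_MEMORY_MIB.get(cpu, ())


print(is_valid_fargate_size(512, 1024))  # the default above
print(is_valid_fargate_size(512, 512))   # too little memory for 0.5 vCPU
```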


lambda_funcs.py (IC004)

Creates 17 container-image Lambdas in the compute stack:

| Function | Trigger | Purpose |
| --- | --- | --- |
| backtest-consumer | SQS trigger | Consume backtest requests, launch ECS tasks |
| sqs-consumer | SQS trigger | Consume retraining messages, launch ECS tasks |
| health-check | rate(5 minutes) | Service health monitoring |
| trading-heartbeat-check | rate(5 minutes) | Trading heartbeat detection |
| orphan-scanner | rate(15 minutes) | Orphaned ECS task cleanup |
| drift-monitor | rate(1 day) | Model drift detection |
| retraining-scheduler | rate(7 days) | Retraining triggers |
| pulumi-drift-detector | rate(1 day) | Infrastructure drift detection |
| validate-strategy | On-demand | Strategy validation |
| data-collection-proxy | On-demand | Data collection proxy |
| update-status | On-demand | Job status updates in DynamoDB |
| cleanup-resources | On-demand | Stop orphaned ECS tasks on failure |
| notify-completion | On-demand | Send pipeline completion notifications |
| check-retraining-needed | On-demand | Evaluate drift state for retraining |
| compare-models | On-demand | Compare champion vs challenger models |
| promote-model | On-demand | Promote challenger to Production in MLflow |
| model-rollback | On-demand | Roll back model to previous version |

Note: update-nat-routes is an inline runtime Lambda in the foundation stack (infra/foundation/modules/nat_instance.py), not a container-image Lambda in compute. It is triggered by an ASG Lifecycle Hook to update the private route table on NAT replacement.

Features:

  • VPC placement in private subnets
  • Environment variables from config
  • Container images from ECR
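
The scheduled functions use EventBridge `rate(...)` expressions. As an illustration of what those strings mean, this sketch converts one into a `timedelta`, which makes it easy to estimate invocation counts. The parser is a simplified assumption covering only the minute, hour, and day units seen in the table.

```python
import re
from datetime import timedelta

# Matches expressions such as "rate(5 minutes)" or "rate(1 day)".
_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def parse_rate(expression):
    """Convert an EventBridge rate expression into a timedelta."""
    match = _RATE.match(expression)
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value = int(match.group(1))
    unit = match.group(2).rstrip("s") + "s"  # normalise to the plural keyword
    return timedelta(**{unit: value})


# Invocations per day for the orphan-scanner schedule.
runs_per_day = timedelta(days=1) // parse_rate("rate(15 minutes)")
print(runs_per_day)
```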


api_gateway.py (IC005)

Creates:

  • HTTP API Gateway
  • 11 routes with ALB integration
  • Cognito JWT authorizer
  • Optional custom domain

Routes:

| Method | Path | Auth | Target |
| --- | --- | --- | --- |
| GET | /health | No | Backend |
| POST | /api/v1/backtests | Yes | Backend |
| GET | /api/v1/backtests | Yes | Backend |
| GET | /api/v1/backtests/{id} | Yes | Backend |
| GET | /api/v1/strategies | Yes | Strategy Service |
| POST | /api/v1/strategies/* | Yes | Strategy Service |
| GET | /api/v1/data/* | Yes | Data Collection |
| POST | /api/v1/hyperopt | Yes | Strategy Service |
| GET | /api/v1/models/* | Yes | Strategy Service |
| POST | /api/v1/models/* | Yes | Strategy Service |
| GET | /api/v1/catalog/* | Yes | Backend |
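
To see which backend a request would reach, the route table can be treated as an ordered list of patterns. The sketch below is an illustration of the table and not of API Gateway's own matching algorithm. It assumes `{id}` matches one path segment and `*` matches one or more trailing segments.

```python
import re

# (method, path pattern, auth required, target), from the table above.
ROUTES = [
    ("GET", "/health", False, "Backend"),
    ("POST", "/api/v1/backtests", True, "Backend"),
    ("GET", "/api/v1/backtests", True, "Backend"),
    ("GET", "/api/v1/backtests/{id}", True, "Backend"),
    ("GET", "/api/v1/strategies", True, "Strategy Service"),
    ("POST", "/api/v1/strategies/*", True, "Strategy Service"),
    ("GET", "/api/v1/data/*", True, "Data Collection"),
    ("POST", "/api/v1/hyperopt", True, "Strategy Service"),
    ("GET", "/api/v1/models/*", True, "Strategy Service"),
    ("POST", "/api/v1/models/*", True, "Strategy Service"),
    ("GET", "/api/v1/catalog/*", True, "Backend"),
]


def _compile(pattern):
    """Turn a route pattern into a regex ({x} = one segment, * = the rest)."""
    parts = []
    for segment in pattern.strip("/").split("/"):
        if segment == "*":
            parts.append(".+")
        elif segment.startswith("{") and segment.endswith("}"):
            parts.append("[^/]+")
        else:
            parts.append(re.escape(segment))
    return re.compile("^/" + "/".join(parts) + "$")


def resolve(method, path):
    """Return (auth_required, target) for the first matching route, else None."""
    for route_method, pattern, auth, target in ROUTES:
        if route_method == method and _compile(pattern).match(path):
            return auth, target
    return None


print(resolve("GET", "/api/v1/backtests/123"))
print(resolve("GET", "/health"))
```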

step_functions.py (IO002, BE008)

Creates:

  • Backtest workflow state machine
  • IAM execution role

Workflow States: 12 states in the backtest workflow, 10 in the retraining workflow. See 06-STEP-FUNCTIONS.md for the complete state reference, including ValidateStrategy, EvaluateValidation, EnsureData, UpdateStatusRunning, RunBacktest, HandleTimeout, CleanupResources, HandleSuccess, and the error paths.

Type: STANDARD (supports 2+ hour executions; Express workflows are limited to 5 minutes)


cloudwatch_alarms.py (MN003, INF007)

Creates:

  • Composite alarm for service health
  • Stale heartbeat alarm
  • Lambda error alarms (per function)
  • EventBridge rules for Lambda schedules

Configurable Thresholds:

pulumi config set alarm_latency_threshold 5000
pulumi config set alarm_min_strategies 1
pulumi config set alarm_stale_threshold 1
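
A CloudWatch composite alarm is driven by a rule expression over its child alarms. As a hedged illustration of how such an expression is formed, the sketch below joins child alarm names with OR. The alarm names are hypothetical, and the rule this module actually builds may combine its alarms differently.

```python
def composite_rule(alarm_names):
    """Build a rule that fires when any of the named child alarms is in ALARM."""
    if not alarm_names:
        raise ValueError("a composite alarm needs at least one child alarm")
    return " OR ".join(f'ALARM("{name}")' for name in alarm_names)


# Hypothetical child alarm names, for illustration only.
rule = composite_rule(["stale-heartbeat", "health-check-errors"])
print(rule)
```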


Environment-Specific Behavior

| Resource | Dev | Staging | Prod |
| --- | --- | --- | --- |
| RDS Instance | db.t4g.micro | db.t4g.micro | db.t4g.small |
| RDS Multi-AZ | No | No | Yes |
| NAT Gateway | Instance | Instance | NAT Gateway |
| ECS Replicas | 1 | 1 | 2+ |
| Log Retention | 7 days | 30 days | 90 days |
| Fargate Spot | Yes | Yes | No (for live) |
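
One way to picture the matrix above is as a per-environment settings lookup. The sketch below is an illustration of the table and does not show how the modules read their configuration. The key names and the `settings_for` helper are assumptions.

```python
# Per-environment settings transcribed from the table above.
ENVIRONMENTS = {
    "dev": {
        "rds_instance": "db.t4g.micro", "rds_multi_az": False,
        "nat": "instance", "min_ecs_replicas": 1,
        "log_retention_days": 7, "fargate_spot": True,
    },
    "staging": {
        "rds_instance": "db.t4g.micro", "rds_multi_az": False,
        "nat": "instance", "min_ecs_replicas": 1,
        "log_retention_days": 30, "fargate_spot": True,
    },
    "prod": {
        "rds_instance": "db.t4g.small", "rds_multi_az": True,
        "nat": "gateway", "min_ecs_replicas": 2,  # table says "2+"
        "log_retention_days": 90,
        "fargate_spot": False,  # table says "No (for live)"
    },
}


def settings_for(environment):
    """Return the settings for an environment, failing loudly on a typo."""
    try:
        return ENVIRONMENTS[environment]
    except KeyError:
        known = ", ".join(sorted(ENVIRONMENTS))
        raise ValueError(f"unknown environment {environment!r}; expected one of: {known}")


print(settings_for("prod")["rds_instance"])
```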

Outputs Quick Reference

# Core
pulumi stack output vpc_id
pulumi stack output ecs_cluster_name
pulumi stack output api_gateway_endpoint

# Database
pulumi stack output rds_endpoint
pulumi stack output rds_secret_arn

# Storage
pulumi stack output s3_bucket_ids
pulumi stack output ecr_repository_urls

# Auth
pulumi stack output cognito_user_pool_id
pulumi stack output cognito_user_pool_client_id

# Monitoring
pulumi stack output composite_alarm_arn
pulumi stack output dashboard_url
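
Since the modules are split across four stacks, each output has to be read from the stack that exports it. The sketch below builds the matching `pulumi stack output` invocation. The output-to-stack mapping is inferred from the module inventory, and the `<stack>-<environment>` naming is a hypothetical convention, so adjust both to the real project.

```python
import json
import subprocess

# Output -> stack directory, inferred from the module inventory (an assumption).
OUTPUT_STACK = {
    "vpc_id": "foundation",
    "rds_endpoint": "foundation",
    "rds_secret_arn": "foundation",
    "ecs_cluster_name": "compute",
    "api_gateway_endpoint": "edge",
    "composite_alarm_arn": "edge",
    "dashboard_url": "edge",
    "s3_bucket_ids": "persistent",
    "ecr_repository_urls": "persistent",
    "cognito_user_pool_id": "persistent",
    "cognito_user_pool_client_id": "persistent",
}


def output_command(name, environment):
    """Build the CLI arguments; the stack naming scheme is hypothetical."""
    stack = f"{OUTPUT_STACK[name]}-{environment}"
    return ["pulumi", "stack", "output", name, "--stack", stack, "--json"]


def read_output(name, environment):
    """Run the command from the owning stack's directory and parse the JSON."""
    result = subprocess.run(
        output_command(name, environment),
        cwd=f"infra/{OUTPUT_STACK[name]}",
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


print(" ".join(output_command("vpc_id", "dev")))
```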

See Also