Pulumi Operations Guide

Quick reference for daily operations and adding infrastructure. For setup/deployment basics, see 08-pulumi-deployment.md.

Daily Commands

just infra-preview dev               # See what would change (all 4 stacks)
just infra-preview-persistent dev    # Preview persistent stack
just infra-preview-foundation dev    # Preview foundation stack
just infra-bootstrap dev             # Deploy all stacks in order
just infra-up-persistent dev         # Deploy persistent stack (data resources)
just infra-up-foundation dev         # Deploy foundation stack (networking)
just infra-outputs persistent dev    # Persistent outputs (S3, DDB, ECR, Cognito)
just infra-outputs foundation dev    # Foundation outputs (VPC, RDS, SQS)
just infra-outputs compute dev       # Compute outputs (ALB DNS, cluster, etc.)
just infra-down-soft dev             # Destroy edge+compute only (data preserved)

Service Access

Consolidated EC2 (Dev/Staging)

# Find instance ID
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=tradai-consolidated-dev" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' --output text \
  --profile tradai --region eu-central-1

# SSM Session Manager (interactive shell — no SSH keys needed)
aws ssm start-session --target <instance-id> --profile tradai --region eu-central-1

# Quick command via SSM (non-interactive)
aws ssm send-command --instance-ids <instance-id> \
  --document-name "AWS-RunShellScript" \
  --parameters 'commands=["sudo docker ps"]' \
  --profile tradai --region eu-central-1
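
The non-interactive form above returns immediately with a command ID rather than the command's output. A minimal sketch of collecting that output follows, assuming a simple command string without embedded double quotes. `ssm_run` is a hypothetical helper name, not something defined in this repo; the profile and region are the ones used throughout this guide.

```shell
#!/usr/bin/env bash
# Sketch: run one command on an instance through SSM and print its stdout.
# ssm_run is a hypothetical helper name (an assumption, not part of the repo).
ssm_run() {
  local instance_id="$1" cmd="$2" command_id
  # send-command is asynchronous: it only returns a command ID
  command_id=$(aws ssm send-command \
    --instance-ids "$instance_id" \
    --document-name "AWS-RunShellScript" \
    --parameters "commands=[\"$cmd\"]" \
    --query 'Command.CommandId' --output text \
    --profile tradai --region eu-central-1) || return 1
  # Wait for the command to finish (the waiter exits non-zero on failure or timeout)
  aws ssm wait command-executed \
    --command-id "$command_id" --instance-id "$instance_id" \
    --profile tradai --region eu-central-1
  # Print whatever standard output was captured, even if the wait failed
  aws ssm get-command-invocation \
    --command-id "$command_id" --instance-id "$instance_id" \
    --query 'StandardOutputContent' --output text \
    --profile tradai --region eu-central-1
}
```

Example call: `ssm_run <instance-id> "sudo docker ps"`.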

On the EC2 Instance

sudo docker ps                                   # Running containers
sudo docker logs backend-api --tail 50 -f        # Service logs
sudo docker stats --no-stream                    # Resource usage
cd /opt/tradai && sudo docker-compose restart     # Restart all services
sudo docker-compose restart backend-api           # Restart one service
sudo docker-compose pull && sudo docker-compose up -d  # Pull latest images

ECS Fargate (Production)

just ecs-status                     # Service status
just ecs-events backend             # Recent events
just ecs-force-deploy backend       # Force redeployment
just ecs-force-deploy-all           # Redeploy all services

Logs

# CloudWatch (consolidated EC2 containers)
aws logs tail /tradai/consolidated/containers --since 30m \
  --profile tradai --region eu-central-1

# Follow logs in real time
aws logs tail /tradai/consolidated/containers --follow \
  --profile tradai --region eu-central-1

Service Endpoints

# Get ALB DNS name
just infra-outputs compute dev | jq -r '.alb_dns_name'

# Health checks
curl -s http://<ALB_DNS>/api/health | jq .
curl -s http://<ALB_DNS>/strategy/health | jq .
curl -s http://<ALB_DNS>/data/health | jq .
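
The three health checks above can be wrapped in one loop that reports a pass or fail per service. This is a sketch only: `check_health` is a hypothetical helper name, and it assumes each endpoint answers with a 2xx status when healthy.

```shell
#!/usr/bin/env bash
# Sketch: probe the three health routes listed above and summarize the result.
# check_health is a hypothetical helper name (an assumption, not part of the repo).
check_health() {
  local alb_dns="$1" failed=0 path
  for path in api strategy data; do
    # -f makes curl exit non-zero on HTTP 4xx/5xx; --max-time bounds each request
    if curl -sf --max-time 5 -o /dev/null "http://${alb_dns}/${path}/health"; then
      echo "OK   /${path}/health"
    else
      echo "FAIL /${path}/health"
      failed=1
    fi
  done
  # Exit status is 1 if any endpoint failed, 0 otherwise
  return "$failed"
}
```

Example call: `check_health "$(just infra-outputs compute dev | jq -r '.alb_dns_name')"`.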

Image Management

just service-ecr-login              # Login to ECR
just service-push backend           # Push one service image
just service-push-all               # Push all service images
just lambda-check-images            # Verify Lambda images in ECR
just ami-list                       # List available custom AMIs
just asg-refresh                    # Trigger EC2 instance refresh
just asg-status                     # Check ASG health

CI/CD Pipeline

PR opened → pulumi preview (comment on PR)
Merge to main → deploy dev → deploy staging
Manual trigger → deploy prod (requires approval)

Required secrets: PULUMI_CONFIG_PASSPHRASE, S3_PULUMI_BACKEND_URL, AWS credentials.
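
When reproducing a pipeline step locally, it can help to confirm that the required values are exported before invoking Pulumi. The sketch below assumes the secrets are supplied as environment variables with the names listed above. `require_env` is a hypothetical helper, and the AWS credentials are left out because this guide does not name their variables.

```shell
#!/usr/bin/env bash
# Sketch: fail early if a required variable is not exported or is empty.
# require_env is a hypothetical helper name (an assumption, not part of the repo).
require_env() {
  local name missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what child processes such as pulumi receive
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example call: `require_env PULUMI_CONFIG_PASSPHRASE S3_PULUMI_BACKEND_URL && pulumi login "$S3_PULUMI_BACKEND_URL"`.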

Adding Infrastructure (Quant Trader)

You DON'T need infra changes for:

  • Deploying new strategies (use strategy-service)
  • Running backtests (existing ECS handles this)
  • Viewing MLflow experiments
  • Using existing data sources

You DO need to request infra for:

  • New exchange API credentials -> Secrets Manager entry
  • Custom storage (S3/DynamoDB) -> New bucket/table
  • More compute for ML training -> Adjust ECS task resources
  • New external integrations -> Network/IAM changes

How to Request

  1. Create an issue stating what you need, why you need it (the trading capability it enables), and a size estimate
  2. The platform team checks whether an existing resource already covers the need
  3. If approved, open a PR against infra/ and deploy through CI

Quick Troubleshooting

Error                           Fix
no Pulumi.yaml                  cd infra/foundation/ (or compute/edge)
passphrase must be set          source infra/.env or set PULUMI_CONFIG_PASSPHRASE
stack not found                 pulumi stack init dev
State lock stuck                pulumi cancel (safe)
SSM not connecting              Check instance is running + SSM agent installed
S3/dnf 403 on EC2               VPC endpoint policy needs AllowSystemPackageBuckets
Docker containers not starting  SSM into instance, check sudo docker-compose logs
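
For the "SSM not connecting" case, both checks can be made from a workstation without logging in to the instance. The sketch below compares what EC2 reports with what SSM reports. `ssm_diagnose` is a hypothetical helper name. A ping status of `None` is taken to mean the agent has not registered with SSM, which may point to a missing agent, missing instance-profile permissions, or no network path to the SSM endpoints.

```shell
#!/usr/bin/env bash
# Sketch: compare the EC2 instance state with the SSM agent's ping status.
# ssm_diagnose is a hypothetical helper name (an assumption, not part of the repo).
ssm_diagnose() {
  local instance_id="$1" state ping
  # EC2 view: is the instance actually running?
  state=$(aws ec2 describe-instances --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].State.Name' --output text \
    --profile tradai --region eu-central-1)
  # SSM view: has the agent registered, and is it reporting in?
  ping=$(aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$instance_id" \
    --query 'InstanceInformationList[0].PingStatus' --output text \
    --profile tradai --region eu-central-1)
  echo "ec2_state=$state ssm_ping=$ping"
  # Succeed only when both views look healthy
  [ "$state" = "running" ] && [ "$ping" = "Online" ]
}
```

Example call: `ssm_diagnose <instance-id>`.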

Don't Do This

  • Don't create AWS resources via Console (causes drift)
  • Don't hardcode resource names (use config.py helpers)
  • Don't store secrets in code (use Secrets Manager)
  • Don't run pulumi up --yes without reviewing preview
  • Don't deploy compute before foundation (images won't exist)
  • Don't SSH into EC2 (use SSM Session Manager instead)
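
One way to honor the rule about reviewing the preview is to gate the apply step behind an explicit confirmation. This is a sketch rather than the project's workflow: `reviewed_up` is a hypothetical wrapper that does not exist in the justfile, and it assumes it is run from a directory containing a Pulumi.yaml.

```shell
#!/usr/bin/env bash
# Sketch: show the diff, ask for confirmation, and only then apply.
# reviewed_up is a hypothetical wrapper (an assumption, not part of the repo).
reviewed_up() {
  local stack="$1" answer
  # Show the detailed diff first; stop if the preview itself fails
  pulumi preview --stack "$stack" --diff || return 1
  printf 'Apply these changes to stack "%s"? [y/N] ' "$stack"
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "Aborted; nothing was applied."
    return 1
  fi
  # --yes is used here only because the preview was just reviewed
  pulumi up --stack "$stack" --yes
}
```

Example call: `cd infra/foundation/ && reviewed_up dev`. Running `pulumi up` without `--yes` gives a similar interactive prompt; the wrapper mainly adds the `--diff` view before it.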