Pulumi Operations Guide
Quick reference for daily operations and adding infrastructure. For setup/deployment basics, see 08-pulumi-deployment.md.
Daily Commands
just infra-preview dev # See what would change (all 4 stacks)
just infra-preview-persistent dev # Preview persistent stack
just infra-preview-foundation dev # Preview foundation stack
just infra-bootstrap dev # Deploy all stacks in order
just infra-up-persistent dev # Deploy persistent stack (data resources)
just infra-up-foundation dev # Deploy foundation stack (networking)
just infra-outputs persistent dev # Persistent outputs (S3, DDB, ECR, Cognito)
just infra-outputs foundation dev # Foundation outputs (VPC, RDS, SQS)
just infra-outputs compute dev # Compute outputs (ALB DNS, cluster, etc.)
just infra-down-soft dev # Destroy edge+compute only (data preserved)
Service Access
Consolidated EC2 (Dev/Staging)
# Find instance ID
aws ec2 describe-instances \
--filters "Name=tag:Name,Values=tradai-consolidated-dev" "Name=instance-state-name,Values=running" \
--query 'Reservations[].Instances[].InstanceId' --output text \
--profile tradai --region eu-central-1
# SSM Session Manager (interactive shell — no SSH keys needed)
aws ssm start-session --target <instance-id> --profile tradai --region eu-central-1
# Quick command via SSM (non-interactive)
aws ssm send-command --instance-ids <instance-id> \
--document-name "AWS-RunShellScript" \
--parameters 'commands=["sudo docker ps"]' \
--profile tradai --region eu-central-1
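The lookup and the session can be chained so the instance ID never has to be copied by hand. The following is a minimal sketch, not part of the repo's justfile: the function names are hypothetical, and it assumes the Name tag follows the pattern `tradai-consolidated-<env>` (only the `dev` value appears above).

```shell
# Hypothetical helpers; profile and region are copied from the commands above.
# ASSUMPTION: the Name tag is tradai-consolidated-<env> for every environment.

# Print the IDs of running consolidated instances for an environment (e.g. dev).
find_instance() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=tradai-consolidated-$1" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' --output text \
    --profile tradai --region eu-central-1
}

# Keep only the first ID and fail if the lookup returned nothing,
# so an empty --target is never passed to SSM.
first_instance_id() {
  id=$(printf '%s\n' "$1" | tr -s ' \t' '\n' | sed -n '1p')
  case "$id" in
    i-*) printf '%s\n' "$id" ;;
    *)   echo "no running instance found" >&2; return 1 ;;
  esac
}

# Open an interactive SSM shell on the instance for the given environment.
ssm_shell() {
  target=$(first_instance_id "$(find_instance "$1")") || return 1
  aws ssm start-session --target "$target" --profile tradai --region eu-central-1
}
```

After sourcing the snippet, `ssm_shell dev` would open a session, or exit with an error if no running instance matches the tag.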
On the EC2 Instance
sudo docker ps # Running containers
sudo docker logs backend-api --tail 50 -f # Service logs
sudo docker stats --no-stream # Resource usage
cd /opt/tradai && sudo docker-compose restart # Restart all services
sudo docker-compose restart backend-api # Restart one service
sudo docker-compose pull && sudo docker-compose up -d # Pull latest images
ECS Fargate (Production)
just ecs-status # Service status
just ecs-events backend # Recent events
just ecs-force-deploy backend # Force redeployment
just ecs-force-deploy-all # Redeploy all services
Logs
# CloudWatch (consolidated EC2 containers)
aws logs tail /tradai/consolidated/containers --since 30m \
--profile tradai --region eu-central-1
# Follow logs in real time
aws logs tail /tradai/consolidated/containers --follow \
--profile tradai --region eu-central-1
Service Endpoints
# Get ALB DNS name
just infra-outputs compute dev | jq -r '.alb_dns_name'
# Health checks
curl -s http://<ALB_DNS>/api/health | jq .
curl -s http://<ALB_DNS>/strategy/health | jq .
curl -s http://<ALB_DNS>/data/health | jq .
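The three health checks can be wrapped in a loop that reports a single pass/fail status, which may be handy after a deploy. This is a sketch only: the function names are hypothetical, the paths are the ones listed above, and the 5-second timeout is an arbitrary choice.

```shell
# Hypothetical helpers; the three paths are the health endpoints listed above.

# Print one health URL per service for the given ALB DNS name.
health_urls() {
  for path in api strategy data; do
    printf 'http://%s/%s/health\n' "$1" "$path"
  done
}

# Probe every endpoint; return non-zero if any check fails.
check_health() {
  failed=0
  for url in $(health_urls "$1"); do
    # -f makes curl exit non-zero on HTTP error statuses (4xx/5xx).
    if curl -fsS --max-time 5 "$url" >/dev/null; then
      echo "ok   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}
```

One possible invocation, reusing the output lookup from above: `check_health "$(just infra-outputs compute dev | jq -r '.alb_dns_name')"`.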
Image Management
just service-ecr-login # Login to ECR
just service-push backend # Push one service image
just service-push-all # Push all service images
just lambda-check-images # Verify Lambda images in ECR
just ami-list # List available custom AMIs
just asg-refresh # Trigger EC2 instance refresh
just asg-status # Check ASG health
CI/CD Pipeline
PR opened → pulumi preview (comment on PR)
↓
Merge to main → deploy dev → deploy staging
↓
Manual trigger → deploy prod (requires approval)
Required secrets: PULUMI_CONFIG_PASSPHRASE, S3_PULUMI_BACKEND_URL, AWS credentials.
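A pipeline step or local shell can fail fast when one of these secrets is missing, rather than partway through a Pulumi run. The check below is a hedged sketch with hypothetical function names. It covers only the two named variables, because this guide does not say how the AWS credentials are supplied.

```shell
# Hypothetical preflight check; the two variable names come from the list above.
# AWS credentials are NOT checked here (their delivery mechanism is not documented).

# Print the names of required variables that are unset or empty.
missing_secrets() {
  for name in PULUMI_CONFIG_PASSPHRASE S3_PULUMI_BACKEND_URL; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Fail early, before any pulumi command runs, if something is missing.
require_secrets() {
  missing=$(missing_secrets)
  [ -z "$missing" ] && return 0
  echo "missing required secrets:" $missing >&2
  return 1
}
```

Calling `require_secrets || exit 1` at the top of a deploy script would stop the run with the missing names printed to stderr.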
Adding Infrastructure (Quant Trader)
You DON'T need infra changes for:
- Deploying new strategies (use strategy-service)
- Running backtests (existing ECS handles this)
- Viewing MLflow experiments
- Using existing data sources
You DO need to request infra for:
- New exchange API credentials -> Secrets Manager entry
- Custom storage (S3/DynamoDB) -> New bucket/table
- More compute for ML training -> Adjust ECS task resources
- New external integrations -> Network/IAM changes
How to Request
- Create issue with: what you need, why (trading capability), size estimate
- Platform team reviews for existing alternatives
- If approved: PR to infra/, deploy through CI
Quick Troubleshooting
| Error | Fix |
|---|---|
| no Pulumi.yaml | cd infra/foundation/ (or compute/edge) |
| passphrase must be set | source infra/.env or set PULUMI_CONFIG_PASSPHRASE |
| stack not found | pulumi stack init dev |
| State lock stuck | pulumi cancel (safe) |
| SSM not connecting | Check instance is running + SSM agent installed |
| S3/dnf 403 on EC2 | VPC endpoint policy needs AllowSystemPackageBuckets |
| Docker containers not starting | SSM into instance, check sudo docker-compose logs |
Don't Do This
- Don't create AWS resources via Console (causes drift)
- Don't hardcode resource names (use config.py helpers)
- Don't store secrets in code (use Secrets Manager)
- Don't run pulumi up --yes without reviewing preview
- Don't deploy compute before foundation (images won't exist)
- Don't SSH into EC2 (use SSM Session Manager instead)
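The "review the preview first" rule can be enforced with a small wrapper for manual runs. This is a sketch under assumptions, not the project's tooling: the function names are hypothetical, and it assumes the command runs from a stack directory that contains Pulumi.yaml. The `just infra-preview` and `just infra-up-*` recipes remain the documented path.

```shell
# Hypothetical wrapper; assumes the current directory contains Pulumi.yaml.

# Read one answer from stdin; succeed only on an explicit "yes".
confirmed() {
  printf '%s [yes/no] ' "$1" >&2
  read -r answer || return 1
  [ "$answer" = "yes" ]
}

# Show the preview, then apply only after an explicit confirmation.
reviewed_up() {
  pulumi preview --stack "$1" --diff || return 1
  confirmed "Apply these changes to $1?" || { echo "aborted" >&2; return 1; }
  pulumi up --stack "$1" --yes
}
```

Here `reviewed_up dev` prints the diff and waits for a typed `yes` before applying. Anything else, including an empty answer, aborts.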