TradAI CI/CD Pipeline Architecture
Version: 1.0.0 | Date: 2026-03-28 | Status: CURRENT
Source: .github/workflows/, justfile
TL;DR: GitHub Actions is the sole CI/CD system. 11 workflows with path-based change detection and a dynamic test matrix. Tag-based deployment gates: v* tags trigger docker-build, Lambda deploy, and library publish, all gated behind CI success via workflow_run. 4-stack Pulumi deployment via just infra-bootstrap. Manual workflow_dispatch is available for emergency bypasses.
1. Pipeline Overview
flowchart LR
subgraph Triggers
PR[Pull Request]
Push[Push to main]
Tag["Tag v*"]
Manual[workflow_dispatch]
Schedule[Weekly / Sunday]
end
subgraph CI["CI Gate (ci.yml)"]
Changes[Detect Changes]
Lint[Lint & Format]
Type[Type Check]
Test[Test Matrix]
Security[Security Scan]
Perf[Performance Tests]
Contract[Contract Tests]
end
subgraph Deploy["Deployment Workflows"]
Docker[Docker Build & Push]
Lambda[Deploy Lambdas]
Publish[Publish Libraries]
Infra[Deploy Infrastructure]
Docs[Deploy Documentation]
end
subgraph Targets
ECR[Amazon ECR]
ECS[ECS Services]
LambdaFn[Lambda Functions]
CA[CodeArtifact]
CF[Cloudflare Pages]
Pulumi[Pulumi Stacks]
end
PR --> CI
Push --> CI
Tag --> CI
Schedule --> CI
CI -->|workflow_run + v* tag| Docker
CI -->|workflow_run + v* tag| Lambda
CI -->|workflow_run + v* tag| Publish
Manual --> Infra
Manual --> Lambda
Push -->|docs paths| Docs
Docker --> ECR --> ECS
Lambda --> ECR --> LambdaFn
Publish --> CA
Infra --> Pulumi
Docs --> CF
2. GitHub Actions Workflows
11 workflow files in .github/workflows/:
| Workflow | File | Trigger | Purpose | Timeout |
|---|---|---|---|---|
| CI | ci.yml | push (main), PR, tag v*, weekly, dispatch | Orchestrator: change detection, lint, typecheck, test matrix, security, perf, contract | varies |
| Lint | _lint.yml | workflow_call (reusable) | Ruff check + format (called by CI) | 5 min |
| Test Package | _test.yml | workflow_call (reusable) | Per-package pytest with coverage (called by CI matrix) | 20 min |
| Docker Build & Push | docker-build.yml | workflow_run (CI success) | Build 4 service images, push to ECR, redeploy ECS | -- |
| Deploy Lambdas | deploy-lambdas.yml | workflow_run (CI success), dispatch | 5-stage: version, wheel, base image, individual lambdas, update functions | -- |
| Publish Libraries | publish-libs.yml | workflow_run (CI success) | Build tradai-strategy wheel, publish to CodeArtifact | -- |
| Deploy Infrastructure | deploy-infra.yml | PR (infra/**), dispatch | Validate, preview (PR), deploy (manual) via pulumi-ci.sh | 60 min |
| Deploy Documentation | docs.yml | push (main, docs paths), dispatch | Build MkDocs, deploy to Cloudflare Pages | -- |
| Devcontainer CI | devcontainer-ci.yml | weekly (Sunday 02:00), dispatch | Full test suite inside devcontainer image | 30 min |
| Devcontainer Prebuild | devcontainer-prebuild.yml | push (main, .devcontainer/**), weekly | Build and push devcontainer image to GHCR | 30 min |
| Docs Freshness | docs-freshness.yml | scheduled, dispatch | Check documentation freshness against codebase | -- |
3. CI Gate Structure
The CI workflow (ci.yml) is the quality gate. Deployment workflows only fire when CI passes on a v* tag.
sequenceDiagram
participant Dev as Developer
participant GH as GitHub
participant CI as CI Workflow
participant Deploy as Deployment Workflows
Dev->>GH: Push PR
GH->>CI: Trigger CI (PR)
CI->>CI: Detect Changes (dorny/paths-filter)
CI->>CI: Build Test Matrix
par Parallel Jobs
CI->>CI: Lint & Format (Ruff)
CI->>CI: Type Check (MyPy, 7 packages)
CI->>CI: Security Scan (pip-audit + Bandit)
CI->>CI: Performance Tests (PR only)
CI->>CI: Contract Tests (PR + schedule)
end
CI->>CI: Test Matrix (up to 8 packages, parallel)
CI-->>Dev: Status checks on PR
Dev->>GH: Merge + Tag v1.2.3
GH->>CI: Trigger CI (tag)
CI->>CI: Full matrix (all 8 packages)
CI-->>GH: CI completed (success)
GH->>Deploy: workflow_run event
par Tag Deployments
Deploy->>Deploy: Docker Build & Push (4 services)
Deploy->>Deploy: Deploy Lambdas (17 Dockerfile-based functions)
Deploy->>Deploy: Publish Libraries (CodeArtifact)
end
Deploy->>Deploy: Redeploy ECS Services
Tag-Based Gating
Deployment workflows use workflow_run with a condition: github.event.workflow_run.conclusion == 'success' && startsWith(github.event.workflow_run.head_branch, 'v'). This ensures only successful CI runs on version tags trigger deployments.
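The gate can be restated as a small shell predicate. This is a minimal sketch of the condition above, not code from the repository: should_deploy is a hypothetical name, and its two arguments stand in for the conclusion and head_branch fields of the workflow_run event payload.

```shell
# Sketch of the deploy gate; mirrors the workflow_run condition quoted above.
# should_deploy is a hypothetical helper, not part of the repository.
should_deploy() (
  conclusion="$1"    # github.event.workflow_run.conclusion
  head_branch="$2"   # github.event.workflow_run.head_branch

  # CI must have finished successfully.
  [ "$conclusion" = "success" ] || exit 1

  # Same prefix test as startsWith(head_branch, 'v').
  case "$head_branch" in
    v*) exit 0 ;;
    *)  exit 1 ;;
  esac
)
```

Because the check is a plain prefix match, any ref whose name starts with v would satisfy it (for example a branch called verify-fix), so the sketch is only as strict as the expression it mirrors.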
4. Change Detection and Test Matrix
The CI workflow uses path-based change detection (dorny/paths-filter@v3) to avoid running all 8 package test suites on every PR. Each filter includes the package's own source plus the specific tradai-common submodules it imports. Heavy consumers (backend, strategy-service, cli) watch all of tradai-common/**; light consumers (data, strategy, data-collection) watch only their specific deps.
8 filters: deps (workspace config -- triggers all), common, data, strategy, backend, data-collection, strategy-service, cli.
On PRs, only affected packages run. On push/schedule/dispatch, all 8 matrix entries run (7 packages at 60% coverage threshold, cli at 45%, plus integration tests at 0%). Integration tests run when common changes or 2+ packages are affected.
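The selection rules above can be sketched as a shell function. This is an illustration under stated assumptions rather than the logic in ci.yml: select_matrix and ALL_ENTRIES are hypothetical names, the entry names are taken from the filter list, and the eight entries are assumed to be the seven package filters plus one integration entry.

```shell
# Sketch of the matrix selection rules. Entry names are assumed from the filter list.
ALL_ENTRIES="common data strategy backend data-collection strategy-service cli integration"

select_matrix() (
  event="$1"; shift     # pull_request, push, schedule, workflow_dispatch, ...
  changed=" $* "        # names of the filters that matched, space-separated

  # Non-PR events and workspace-level (deps) changes run every entry.
  case "$event" in
    pull_request) ;;
    *) echo "$ALL_ENTRIES"; exit 0 ;;
  esac
  case "$changed" in
    *" deps "*) echo "$ALL_ENTRIES"; exit 0 ;;
  esac

  # On PRs, keep only the packages whose filter matched.
  selected=""; count=0
  for entry in $ALL_ENTRIES; do
    [ "$entry" = integration ] && continue
    case "$changed" in
      *" $entry "*) selected="$selected $entry"; count=$((count + 1)) ;;
    esac
  done

  # Integration tests: common changed, or two or more packages affected.
  case "$changed" in
    *" common "*) selected="$selected integration" ;;
    *) [ "$count" -ge 2 ] && selected="$selected integration" ;;
  esac
  echo "${selected# }"
)
```

For example, select_matrix pull_request data cli selects data, cli, and integration, while a PR touching only data selects data alone.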
5. Deployment Flows
5.1 Lambda Deployment (5-Stage Pipeline)
The deploy-lambdas.yml workflow builds and deploys the 17 Dockerfile-based Lambda functions as container images.
flowchart TB
subgraph Stage1["Stage 1: Version"]
V[Calculate Version<br/>tag or manual-YYYYMMDDHHMMSS]
end
subgraph Stage2["Stage 2: Wheel"]
W[Build tradai-common wheel<br/>uv build libs/tradai-common]
end
subgraph Stage3["Stage 3: Base Image"]
B[Build lambda-base<br/>lambdas/base/Dockerfile]
end
subgraph Stage4["Stage 4: Individual Lambdas"]
direction LR
L1[backtest-consumer]
L2[drift-monitor]
L3[health-check]
L4[update-status]
LN["... 13 more"]
end
subgraph Stage5["Stage 5: Update Functions"]
U[aws lambda update-function-code<br/>for each function]
end
V --> Stage2
Stage2 --> Stage3
Stage3 --> Stage4
Stage4 --> Stage5
17 Lambda functions are auto-discovered from lambdas/*/Dockerfile (backtest-consumer, drift-monitor, health-check, update-status, sqs-consumer, validate-strategy, and 11 more -- plus the shared base image). The 18th Lambda (update-nat-routes) is an inline Python handler deployed directly via Pulumi (no Dockerfile), so it is not part of the lambda-bootstrap pipeline.
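Auto-discovery of this kind can be sketched with a glob over the Dockerfiles. The helper name discover_lambdas is hypothetical, and the actual workflow may implement discovery differently; the sketch only assumes the lambdas/*/Dockerfile layout described above, with base treated as the shared image rather than a function.

```shell
# Sketch: list Lambda names by scanning <root>/*/Dockerfile.
# discover_lambdas is a hypothetical helper, not taken from the repository.
discover_lambdas() (
  root="${1:-lambdas}"
  for dockerfile in "$root"/*/Dockerfile; do
    [ -f "$dockerfile" ] || continue          # glob matched nothing
    name="$(basename "$(dirname "$dockerfile")")"
    [ "$name" = "base" ] && continue          # shared base image, built in Stage 3
    echo "$name"
  done
)
```

Directories without a Dockerfile are skipped, which is consistent with update-nat-routes staying outside the pipeline.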
Local equivalent:
just lambda-bootstrap # Full pipeline: wheel -> base -> all lambdas -> ECR push
just lambda-build-all # Build only (no push)
just lambda-push-all # Push pre-built images to ECR
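Stage 1 of the pipeline derives the image version either from the tag or as manual-YYYYMMDDHHMMSS. A minimal sketch of that rule follows; calc_version is a hypothetical name, and using UTC for the timestamp is an assumption, since the document does not state a timezone.

```shell
# Sketch of Stage 1: use the tag as the version, else a manual timestamp.
# calc_version is hypothetical; the UTC clock is an assumption.
calc_version() (
  ref_name="$1"   # tag name on tag-triggered runs, empty on manual dispatch
  case "$ref_name" in
    v*) echo "$ref_name" ;;
    *)  echo "manual-$(date -u +%Y%m%d%H%M%S)" ;;
  esac
)
```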
5.2 Service Deployment (Docker Build & Push)
The docker-build.yml workflow builds 4 service images and redeploys ECS.
flowchart LR
subgraph Build["Parallel Builds"]
B1[backend<br/>services/backend/Dockerfile]
B2[data-collection<br/>services/data-collection/Dockerfile]
B3[strategy-service<br/>services/strategy-service/Dockerfile]
B4[mlflow<br/>services/mlflow/Dockerfile]
end
subgraph Push["ECR Push"]
ECR["ECR Registry<br/>:version + :latest tags"]
end
subgraph Redeploy["ECS Redeploy"]
ECS["aws ecs update-service<br/>--force-new-deployment"]
end
Build --> Push --> Redeploy
ECS service names follow the pattern tradai-{service}-{env}, with backend-api (not backend) matching infra/config.py.
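The naming rule, including the backend exception, can be written as a short mapping. This is a sketch only: ecs_service_name is a hypothetical helper, and the authoritative names live in infra/config.py.

```shell
# Sketch: map a built service image to its ECS service name.
# ecs_service_name is hypothetical; infra/config.py remains the source of truth.
ecs_service_name() (
  service="$1"; env="$2"
  case "$service" in
    backend) service="backend-api" ;;   # the one exception to the pattern
  esac
  echo "tradai-${service}-${env}"
)
```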
Local equivalent:
just docker-build # Build all 4 service images (linux/amd64)
just service-push-all # Build + push all to ECR
just ecs-force-deploy-all # Force ECS redeployment (strategy-service, dry-run-trading, live-trading)
CI vs Local Redeployment Targets
The docker-build.yml CI workflow redeploys backend-api, data-collection, strategy-service, and mlflow (the 4 services it builds). The just ecs-force-deploy-all command targets only strategy-service, dry-run-trading, and live-trading (the trading services). To redeploy all services locally, use just ecs-force-deploy <service> for each service individually.
5.3 Infrastructure Deployment (4-Stack Pulumi)
The deploy-infra.yml workflow manages 4 Pulumi stacks deployed in strict order.
flowchart TB
subgraph Validate
V[Lint + Unit Tests<br/>all 4 stacks]
end
subgraph PR["PR: Preview"]
P1["Preview dev"]
P2["Preview staging"]
P3["Preview prod"]
PS["Post PR Summary<br/>(create/update/delete counts)"]
end
subgraph Manual["Manual: Deploy"]
D1["persistent<br/>S3, DynamoDB, ECR, Cognito, CodeArtifact"]
D2["foundation<br/>VPC, RDS, SQS, SNS"]
D3["Lambda Bootstrap<br/>(just lambda-bootstrap)"]
D4["compute<br/>ALB, ECS, Lambda, Step Functions"]
D5["edge<br/>API Gateway, WAF, CloudWatch"]
end
Validate --> PR
Validate --> Manual
P1 --> PS
P2 --> PS
P3 --> PS
D1 --> D2 --> D3 --> D4 --> D5
Deployment Order
The compute stack has a pre-flight check that verifies all Lambda images exist in ECR before deploying. If images are missing, it fails fast with instructions to run just lambda-bootstrap first.
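The fail-fast idea can be approximated in shell with aws ecr describe-images. The real check lives inside the compute stack and is not shown in this document, so treat the following as a sketch: preflight_lambda_images is a hypothetical name, and the ECR repository names are passed in by the caller because the naming scheme is not stated here.

```shell
# Sketch of a pre-flight check: fail fast if any Lambda image tag is missing in ECR.
# preflight_lambda_images is hypothetical; repository names come from the caller.
preflight_lambda_images() (
  tag="$1"; shift
  missing=""
  for repo in "$@"; do
    # describe-images exits non-zero when the tag does not exist in the repository.
    aws ecr describe-images --repository-name "$repo" \
      --image-ids imageTag="$tag" >/dev/null 2>&1 || missing="$missing $repo"
  done
  if [ -n "$missing" ]; then
    echo "Missing Lambda images for tag $tag:$missing" >&2
    echo "Run: just lambda-bootstrap" >&2
    exit 1
  fi
)
```

Collecting every missing repository before failing gives one complete error message instead of one failure per run.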
Local equivalent:
just infra-bootstrap dev # Full: persistent -> foundation -> lambda-bootstrap -> service-push -> compute -> edge
just infra-up-foundation dev # Single stack
just infra-preview dev # Preview all 4 stacks
just infra-recover foundation dev # Recovery: cancel + refresh + preview drift
Deploy script: infra/pulumi-ci.sh handles layer iteration, backend login, stack selection, and pulumi preview/pulumi up for any combination of layers and environments.
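One way to honor the strict stack order for an arbitrary combination of layers is to filter a fixed ordering by the requested set. The sketch below illustrates that idea only; order_layers and STACK_ORDER are hypothetical names, and infra/pulumi-ci.sh may handle ordering differently.

```shell
# Sketch: return the requested layers in deployment order, whatever order they were given in.
# order_layers and STACK_ORDER are hypothetical, not taken from infra/pulumi-ci.sh.
STACK_ORDER="persistent foundation compute edge"

order_layers() (
  requested=" $* "
  ordered=""
  for layer in $STACK_ORDER; do
    case "$requested" in
      *" $layer "*) ordered="$ordered $layer" ;;
    esac
  done
  echo "${ordered# }"
)
```

For example, order_layers edge persistent yields persistent edge. Unknown layer names are dropped silently in this sketch; a real script would more likely reject them.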
5.4 Library Publishing
The publish-libs.yml workflow publishes tradai-strategy to AWS CodeArtifact for use by the separate tradai-strategies repository.
# CI pipeline steps:
uv build libs/tradai-strategy --out-dir dist # Build wheel
twine upload --repository-url $CODEARTIFACT_URL ... # Publish to CodeArtifact
Local equivalent:
just codeartifact-login dev # Configure pip/twine auth (12h expiry)
just publish-all-libs dev # Build + publish tradai-strategy
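The upload step depends on a CodeArtifact token and repository endpoint. The following sketch shows one way to obtain both with the AWS CLI and hand them to twine through its environment variables. publish_wheel is a hypothetical helper, and the domain and repository are parameters because this document does not name them; the real workflow may authenticate differently.

```shell
# Sketch: publish built wheels to CodeArtifact.
# publish_wheel is hypothetical; domain and repository names are supplied by the caller.
publish_wheel() (
  domain="$1"; repo="$2"; dist_dir="${3:-dist}"

  # Short-lived auth token for the domain.
  token="$(aws codeartifact get-authorization-token --domain "$domain" \
    --query authorizationToken --output text)" || exit 1

  # PyPI-format endpoint of the target repository.
  url="$(aws codeartifact get-repository-endpoint --domain "$domain" \
    --repository "$repo" --format pypi \
    --query repositoryEndpoint --output text)" || exit 1

  # CodeArtifact expects the literal username "aws" with the token as password.
  TWINE_USERNAME=aws TWINE_PASSWORD="$token" \
    twine upload --repository-url "$url" "$dist_dir"/*.whl
)
```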
5.5 Documentation Deployment
Triggered by pushes to docs/**, mkdocs.yml, lib/service READMEs, and architecture reports. Builds MkDocs with Material theme and deploys to Cloudflare Pages.
6. Emergency Procedures
Manual Dispatch Bypass
The workflow_dispatch trigger on deploy-lambdas.yml and deploy-infra.yml bypasses the CI gate. Use only for emergency hotfixes when CI is broken or blocking a critical deploy.
- Emergency Lambda deploy: Actions > Deploy Lambdas > Run workflow (select env). Or locally: just lambda-bootstrap.
- Emergency ECS hotfix: just ecs-force-deploy-all (or single service: just ecs-force-deploy strategy-service).
- Rollback: Check out the previous tag, then rebuild, push, and redeploy: git checkout v1.2.2 && just docker-build && just service-push-all && just ecs-force-deploy-all. Because ecs-force-deploy-all targets only the trading services (see 5.2), redeploy any other affected service with just ecs-force-deploy <service>. For Lambdas: just lambda-bootstrap.
- Infrastructure recovery: just infra-recover foundation dev (cancels pending ops, refreshes state, previews drift). deploy-infra.yml also prints recovery instructions on failure.
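For the rollback step, the previous tag can be found by version-sorting the tag list. A minimal sketch, assuming tags follow the vMAJOR.MINOR.PATCH form used in the examples: previous_tag is a hypothetical helper that reads tag names on stdin, and it relies on sort -V, which is available in GNU coreutils but is not part of POSIX.

```shell
# Sketch: print the tag that precedes the given one in version order.
# previous_tag is hypothetical; reads one tag per line on stdin.
previous_tag() (
  current="$1"
  sort -V | awk -v cur="$current" '$0 == cur { print prev; exit } { prev = $0 }'
)
```

Typical use would be git tag --list 'v*' | previous_tag v1.2.3, which prints v1.2.2 when that tag exists. The output is empty if the given tag is the oldest one or is not in the list.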
7. Changelog
| Date | Change | Author |
|---|---|---|
| 2026-03-28 | Initial document | Architecture team |
8. Dependencies
This document relates to:
- 02-ARCHITECTURE-OVERVIEW.md -- System architecture and service topology
- 04-SECURITY.md -- Security controls and secrets management
- 05-SERVICES.md -- Service definitions (backend, data-collection, strategy-service, mlflow)
- .github/workflows/ -- All 11 workflow definitions
- justfile -- Local development and deployment commands
- infra/pulumi-ci.sh -- Pulumi deployment script (4-stack orchestration)
- .github/actions/setup-workspace/ -- Shared composite action for CI setup