TradAI CI/CD Pipeline Architecture

Version: 1.0.0 | Date: 2026-03-28 | Status: CURRENT | Source: .github/workflows/, justfile

TL;DR: GitHub Actions (sole CI/CD). 11 workflows with path-based change detection and dynamic test matrix. Tag-based deployment gates: v* tags trigger docker-build, Lambda deploy, and library publish -- all gated behind CI success via workflow_run. 4-stack Pulumi deployment via just infra-bootstrap. Manual workflow_dispatch available for emergency bypasses.


1. Pipeline Overview

flowchart LR
    subgraph Triggers
        PR[Pull Request]
        Push[Push to main]
        Tag["Tag v*"]
        Manual[workflow_dispatch]
        Schedule[Weekly / Sunday]
    end

    subgraph CI["CI Gate (ci.yml)"]
        Changes[Detect Changes]
        Lint[Lint & Format]
        Type[Type Check]
        Test[Test Matrix]
        Security[Security Scan]
        Perf[Performance Tests]
        Contract[Contract Tests]
    end

    subgraph Deploy["Deployment Workflows"]
        Docker[Docker Build & Push]
        Lambda[Deploy Lambdas]
        Publish[Publish Libraries]
        Infra[Deploy Infrastructure]
        Docs[Deploy Documentation]
    end

    subgraph Targets
        ECR[Amazon ECR]
        ECS[ECS Services]
        LambdaFn[Lambda Functions]
        CA[CodeArtifact]
        CF[Cloudflare Pages]
        Pulumi[Pulumi Stacks]
    end

    PR --> CI
    Push --> CI
    Tag --> CI
    Schedule --> CI

    CI -->|workflow_run + v* tag| Docker
    CI -->|workflow_run + v* tag| Lambda
    CI -->|workflow_run + v* tag| Publish
    Manual --> Infra
    Manual --> Lambda
    Push -->|docs paths| Docs

    Docker --> ECR --> ECS
    Lambda --> ECR --> LambdaFn
    Publish --> CA
    Infra --> Pulumi
    Docs --> CF

2. GitHub Actions Workflows

11 workflow files in .github/workflows/:

| Workflow | File | Trigger | Purpose | Timeout |
|---|---|---|---|---|
| CI | ci.yml | push (main), PR, tag v*, weekly, dispatch | Orchestrator: change detection, lint, typecheck, test matrix, security, perf, contract | varies |
| Lint | _lint.yml | workflow_call (reusable) | Ruff check + format (called by CI) | 5 min |
| Test Package | _test.yml | workflow_call (reusable) | Per-package pytest with coverage (called by CI matrix) | 20 min |
| Docker Build & Push | docker-build.yml | workflow_run (CI success) | Build 4 service images, push to ECR, redeploy ECS | -- |
| Deploy Lambdas | deploy-lambdas.yml | workflow_run (CI success), dispatch | 5-stage: version, wheel, base image, individual lambdas, update functions | -- |
| Publish Libraries | publish-libs.yml | workflow_run (CI success) | Build tradai-strategy wheel, publish to CodeArtifact | -- |
| Deploy Infrastructure | deploy-infra.yml | PR (infra/**), dispatch | Validate, preview (PR), deploy (manual) via pulumi-ci.sh | 60 min |
| Deploy Documentation | docs.yml | push (main, docs paths), dispatch | Build MkDocs, deploy to Cloudflare Pages | -- |
| Devcontainer CI | devcontainer-ci.yml | weekly (Sunday 02:00), dispatch | Full test suite inside devcontainer image | 30 min |
| Devcontainer Prebuild | devcontainer-prebuild.yml | push (main, .devcontainer/**), weekly | Build and push devcontainer image to GHCR | 30 min |
| Docs Freshness | docs-freshness.yml | scheduled, dispatch | Check documentation freshness against codebase | -- |

3. CI Gate Structure

The CI workflow (ci.yml) is the quality gate. Deployment workflows only fire when CI passes on a v* tag.

sequenceDiagram
    participant Dev as Developer
    participant GH as GitHub
    participant CI as CI Workflow
    participant Deploy as Deployment Workflows

    Dev->>GH: Push PR
    GH->>CI: Trigger CI (PR)
    CI->>CI: Detect Changes (dorny/paths-filter)
    CI->>CI: Build Test Matrix
    par Parallel Jobs
        CI->>CI: Lint & Format (Ruff)
        CI->>CI: Type Check (MyPy, 7 packages)
        CI->>CI: Security Scan (pip-audit + Bandit)
        CI->>CI: Performance Tests (PR only)
        CI->>CI: Contract Tests (PR + schedule)
    end
    CI->>CI: Test Matrix (up to 8 packages, parallel)
    CI-->>Dev: Status checks on PR

    Dev->>GH: Merge + Tag v1.2.3
    GH->>CI: Trigger CI (tag)
    CI->>CI: Full matrix (all 8 packages)
    CI-->>GH: CI completed (success)

    GH->>Deploy: workflow_run event
    par Tag Deployments
        Deploy->>Deploy: Docker Build & Push (4 services)
        Deploy->>Deploy: Deploy Lambdas (17 Dockerfile-based functions)
        Deploy->>Deploy: Publish Libraries (CodeArtifact)
    end
    Deploy->>Deploy: Redeploy ECS Services

Tag-Based Gating

Deployment workflows use workflow_run with a condition: github.event.workflow_run.conclusion == 'success' && startsWith(github.event.workflow_run.head_branch, 'v'). This ensures only successful CI runs on version tags trigger deployments.
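
The gate condition above can be modelled in a few lines of Python to make its behaviour easy to reason about. This is only a sketch: the `WorkflowRun` class and `should_deploy` function are hypothetical names, not code from the repository, and it assumes that for tag-triggered runs `head_branch` carries the tag name. GitHub expression string comparisons and `startsWith` are case-insensitive, so the sketch lowercases both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowRun:
    """Hypothetical subset of the workflow_run payload that the gate inspects."""

    conclusion: str   # e.g. "success", "failure", "cancelled"
    head_branch: str  # assumed to hold the tag name for tag-triggered runs


def should_deploy(run: WorkflowRun) -> bool:
    """Mirror of: conclusion == 'success' && startsWith(head_branch, 'v').

    Both checks are lowercased because GitHub expressions compare
    strings without regard to case.
    """
    succeeded = run.conclusion.lower() == "success"
    looks_like_version_ref = run.head_branch.lower().startswith("v")
    return succeeded and looks_like_version_ref
```

Under this model, `should_deploy(WorkflowRun("success", "v1.2.3"))` is true, while a failed run on the same tag or a successful run on `main` is rejected. Because the check is a plain prefix match, any ref whose name starts with `v` would also pass it.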


4. Change Detection and Test Matrix

The CI workflow uses path-based change detection (dorny/paths-filter@v3) to avoid running all 8 package test suites on every PR. Each filter includes the package's own source plus the specific tradai-common submodules it imports. Heavy consumers (backend, strategy-service, cli) watch all of tradai-common/**; light consumers (data, strategy, data-collection) watch only their specific deps.

8 filters: deps (workspace config -- triggers all), common, data, strategy, backend, data-collection, strategy-service, cli.

On PRs, only affected packages run. On push/schedule/dispatch, all 8 matrix entries run: the 7 packages (60% coverage threshold, except cli at 45%) plus integration tests (0% threshold). On PRs, integration tests are added when common changes or 2+ packages are affected.
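
The selection rules above can be sketched as a small function. This is an illustration, not the logic in ci.yml: `build_matrix` and the entry format are hypothetical, and it assumes the 8 matrix entries are the 7 packages (cli being the one with the 45% threshold) plus one integration entry. The `changed` argument stands for the set of filter names that matched, so cross-package dependencies on tradai-common are assumed to be resolved by the filters already.

```python
PACKAGES = [
    "common", "data", "strategy", "backend",
    "data-collection", "strategy-service", "cli",
]

# Coverage thresholds in percent, as described in the text above.
COVERAGE = {name: 60 for name in PACKAGES}
COVERAGE["cli"] = 45
COVERAGE["integration"] = 0


def build_matrix(event: str, changed: set[str]) -> list[dict]:
    """Return the test matrix entries for one CI run (hypothetical model)."""
    if event != "pull_request" or "deps" in changed:
        # push / schedule / dispatch, or a workspace-config change: run everything.
        selected = PACKAGES + ["integration"]
    else:
        selected = [p for p in PACKAGES if p in changed]
        # Integration tests join when common changed or 2+ packages are affected.
        if "common" in changed or len(selected) >= 2:
            selected.append("integration")
    return [{"package": p, "min_coverage": COVERAGE[p]} for p in selected]
```

For example, a PR touching only the data filter yields a single entry, while a PR touching backend and cli yields three entries because the integration suite joins in.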


5. Deployment Flows

5.1 Lambda Deployment (5-Stage Pipeline)

The deploy-lambdas.yml workflow builds and deploys the 17 Dockerfile-based Lambda functions as container images.

flowchart TB
    subgraph Stage1["Stage 1: Version"]
        V[Calculate Version<br/>tag or manual-YYYYMMDDHHMMSS]
    end

    subgraph Stage2["Stage 2: Wheel"]
        W[Build tradai-common wheel<br/>uv build libs/tradai-common]
    end

    subgraph Stage3["Stage 3: Base Image"]
        B[Build lambda-base<br/>lambdas/base/Dockerfile]
    end

    subgraph Stage4["Stage 4: Individual Lambdas"]
        direction LR
        L1[backtest-consumer]
        L2[drift-monitor]
        L3[health-check]
        L4[update-status]
        LN["... 13 more"]
    end

    subgraph Stage5["Stage 5: Update Functions"]
        U[aws lambda update-function-code<br/>for each function]
    end

    V --> Stage2
    Stage2 --> Stage3
    Stage3 --> Stage4
    Stage4 --> Stage5

17 Lambda functions are auto-discovered from lambdas/*/Dockerfile (backtest-consumer, drift-monitor, health-check, update-status, sqs-consumer, validate-strategy, and 11 more), alongside the shared base image. The 18th Lambda (update-nat-routes) is an inline Python handler deployed directly via Pulumi (no Dockerfile), so it is not part of the lambda-bootstrap pipeline.
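
The auto-discovery rule can be sketched as a directory scan. The function name `discover_lambdas` is hypothetical and the actual workflow may implement this in shell; the sketch only assumes what the text states, namely that each function lives in a directory under lambdas/ containing a Dockerfile and that base is the shared image rather than a function.

```python
from pathlib import Path


def discover_lambdas(root: Path) -> list[str]:
    """List function names found as <root>/<name>/Dockerfile, excluding 'base'.

    Directories without a Dockerfile (such as an inline handler that is
    deployed another way) are not matched by the glob and are skipped.
    """
    names = sorted(p.parent.name for p in root.glob("*/Dockerfile"))
    return [name for name in names if name != "base"]
```

Calling `discover_lambdas(Path("lambdas"))` from the repository root would return the function names in sorted order, which is convenient for building a deterministic build matrix.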

Local equivalent:

just lambda-bootstrap          # Full pipeline: wheel -> base -> all lambdas -> ECR push
just lambda-build-all          # Build only (no push)
just lambda-push-all           # Push pre-built images to ECR

5.2 Service Deployment (Docker Build & Push)

The docker-build.yml workflow builds 4 service images and redeploys ECS.

flowchart LR
    subgraph Build["Parallel Builds"]
        B1[backend<br/>services/backend/Dockerfile]
        B2[data-collection<br/>services/data-collection/Dockerfile]
        B3[strategy-service<br/>services/strategy-service/Dockerfile]
        B4[mlflow<br/>services/mlflow/Dockerfile]
    end

    subgraph Push["ECR Push"]
        ECR["ECR Registry<br/>:version + :latest tags"]
    end

    subgraph Redeploy["ECS Redeploy"]
        ECS["aws ecs update-service<br/>--force-new-deployment"]
    end

    Build --> Push --> Redeploy

ECS service names follow the pattern tradai-{service}-{env}. The backend service is named backend-api (not backend), matching infra/config.py.
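
A minimal sketch of the naming rule follows. The helper `ecs_service_name` and the override table are hypothetical and are not taken from infra/config.py; they only encode the pattern and the backend-api exception described above.

```python
# Image/service directories whose ECS service name differs from the directory name.
SERVICE_NAME_OVERRIDES = {"backend": "backend-api"}


def ecs_service_name(service: str, env: str) -> str:
    """Build an ECS service name following tradai-{service}-{env}."""
    resolved = SERVICE_NAME_OVERRIDES.get(service, service)
    return f"tradai-{resolved}-{env}"
```

With this model, the backend image built from services/backend/Dockerfile maps to tradai-backend-api-dev in the dev environment, while mlflow maps to tradai-mlflow-dev unchanged.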

Local equivalent:

just docker-build              # Build all 4 service images (linux/amd64)
just service-push-all          # Build + push all to ECR
just ecs-force-deploy-all      # Force ECS redeployment (strategy-service, dry-run-trading, live-trading)

CI vs Local Redeployment Targets

The docker-build.yml CI workflow redeploys backend-api, data-collection, strategy-service, and mlflow (the 4 services it builds). The just ecs-force-deploy-all command targets only strategy-service, dry-run-trading, and live-trading (the trading services). To redeploy all services locally, use just ecs-force-deploy <service> for each service individually.
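
The gap between the two target lists can be made explicit with a short sketch. The function `commands_for_remaining` is hypothetical, and the exact service argument that the ecs-force-deploy recipe expects (for example backend-api versus backend) is an assumption; check the justfile before relying on it.

```python
CI_REDEPLOY_TARGETS = ["backend-api", "data-collection", "strategy-service", "mlflow"]
LOCAL_FORCE_DEPLOY_ALL = ["strategy-service", "dry-run-trading", "live-trading"]


def commands_for_remaining(wanted: list[str], covered: list[str]) -> list[str]:
    """Return one `just ecs-force-deploy <service>` command per uncovered service."""
    return [f"just ecs-force-deploy {s}" for s in wanted if s not in covered]


# Services that CI redeploys but `just ecs-force-deploy-all` does not touch.
EXTRA_COMMANDS = commands_for_remaining(CI_REDEPLOY_TARGETS, LOCAL_FORCE_DEPLOY_ALL)
```

Here `EXTRA_COMMANDS` lists the individual commands needed for backend-api, data-collection, and mlflow after running just ecs-force-deploy-all locally.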

5.3 Infrastructure Deployment (4-Stack Pulumi)

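The deploy-infra.yml workflow manages 4 Pulumi stacks deployed in strict order: persistent, foundation, compute, edge. Lambda images are bootstrapped between foundation and compute.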

flowchart TB
    subgraph Validate
        V[Lint + Unit Tests<br/>all 4 stacks]
    end

    subgraph PR["PR: Preview"]
        P1["Preview dev"]
        P2["Preview staging"]
        P3["Preview prod"]
        PS["Post PR Summary<br/>(create/update/delete counts)"]
    end

    subgraph Manual["Manual: Deploy"]
        D1["persistent<br/>S3, DynamoDB, ECR, Cognito, CodeArtifact"]
        D2["foundation<br/>VPC, RDS, SQS, SNS"]
        D3["Lambda Bootstrap<br/>(just lambda-bootstrap)"]
        D4["compute<br/>ALB, ECS, Lambda, Step Functions"]
        D5["edge<br/>API Gateway, WAF, CloudWatch"]
    end

    Validate --> PR
    Validate --> Manual

    P1 --> PS
    P2 --> PS
    P3 --> PS

    D1 --> D2 --> D3 --> D4 --> D5

Deployment Order

The compute stack has a pre-flight check that verifies all Lambda images exist in ECR before deploying. If images are missing, it fails fast with instructions to run just lambda-bootstrap first.
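
The fail-fast behaviour described above can be sketched as follows. The function names are hypothetical and the real check in the compute stack may differ; the ECR lookup is injected as a callable so the logic stays independent of any AWS client (in practice it could wrap something like a boto3 ECR `describe_images` call).

```python
from typing import Callable, Iterable


def find_missing_images(
    repos: Iterable[str],
    tag: str,
    image_exists: Callable[[str, str], bool],
) -> list[str]:
    """Return the repositories that have no image with the given tag."""
    return [repo for repo in repos if not image_exists(repo, tag)]


def preflight(
    repos: Iterable[str],
    tag: str,
    image_exists: Callable[[str, str], bool],
) -> None:
    """Raise before any deployment starts if a Lambda image is missing."""
    missing = find_missing_images(repos, tag, image_exists)
    if missing:
        raise RuntimeError(
            "Missing Lambda images in ECR: "
            + ", ".join(missing)
            + ". Run `just lambda-bootstrap` first."
        )
```

Checking every image up front, rather than failing on the first missing one, lets the error message list all gaps in a single run.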

Local equivalent:

just infra-bootstrap dev       # Full: persistent -> foundation -> lambda-bootstrap -> service-push -> compute -> edge
just infra-up-foundation dev   # Single stack
just infra-preview dev         # Preview all 4 stacks
just infra-recover foundation dev  # Recovery: cancel + refresh + preview drift

Deploy script: infra/pulumi-ci.sh handles layer iteration, backend login, stack selection, and pulumi preview/pulumi up for any combination of layers and environments.
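
The layer iteration can be pictured with a short sketch. This is not the contents of infra/pulumi-ci.sh (a shell script); `plan` is a hypothetical helper that only illustrates how an arbitrary selection of layers and environments can be normalised into the strict stack order described in this section.

```python
STACK_ORDER = ["persistent", "foundation", "compute", "edge"]


def plan(layers: list[str], envs: list[str]) -> list[tuple[str, str]]:
    """Return (env, layer) steps with layers sorted into deployment order."""
    unknown = sorted(set(layers) - set(STACK_ORDER))
    if unknown:
        raise ValueError(f"Unknown layers: {', '.join(unknown)}")
    ordered = [layer for layer in STACK_ORDER if layer in layers]
    return [(env, layer) for env in envs for layer in ordered]
```

For instance, requesting edge and persistent for dev yields persistent first and edge second, regardless of the order in which the layers were passed.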

5.4 Library Publishing

The publish-libs.yml workflow publishes tradai-strategy to AWS CodeArtifact for use by the separate tradai-strategies repository.

# CI pipeline steps:
uv build libs/tradai-strategy --out-dir dist       # Build wheel
twine upload --repository-url $CODEARTIFACT_URL ... # Publish to CodeArtifact

Local equivalent:

just codeartifact-login dev    # Configure pip/twine auth (12h expiry)
just publish-all-libs dev      # Build + publish tradai-strategy

5.5 Documentation Deployment

Triggered by pushes to main that touch docs/**, mkdocs.yml, lib/service READMEs, or architecture reports, and by manual dispatch. Builds MkDocs with the Material theme and deploys to Cloudflare Pages.


6. Emergency Procedures

Manual Dispatch Bypass

The workflow_dispatch trigger on deploy-lambdas.yml and deploy-infra.yml bypasses the CI gate. Use only for emergency hotfixes when CI is broken or blocking a critical deploy.

  • Emergency Lambda deploy: Actions > Deploy Lambdas > Run workflow (select env). Or locally: just lambda-bootstrap.
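  • Emergency ECS hotfix: just ecs-force-deploy-all redeploys the trading services (strategy-service, dry-run-trading, live-trading). For a single service, use just ecs-force-deploy <service>, e.g. just ecs-force-deploy strategy-service.
  • Rollback: Check out the previous tag, then rebuild, push, and redeploy: git checkout v1.2.2 && just docker-build && just service-push-all && just ecs-force-deploy-all. Because ecs-force-deploy-all covers only the trading services, redeploy the remaining services individually with just ecs-force-deploy <service>. For Lambdas: just lambda-bootstrap.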
  • Infrastructure recovery: just infra-recover foundation dev (cancels pending ops, refreshes state, previews drift). deploy-infra.yml also prints recovery instructions on failure.

7. Changelog

| Date | Change | Author |
|---|---|---|
| 2026-03-28 | Initial document | Architecture team |

8. Dependencies

This document relates to:

  • 02-ARCHITECTURE-OVERVIEW.md -- System architecture and service topology
  • 04-SECURITY.md -- Security controls and secrets management
  • 05-SERVICES.md -- Service definitions (backend, data-collection, strategy-service, mlflow)
  • .github/workflows/ -- All 11 workflow definitions
  • justfile -- Local development and deployment commands
  • infra/pulumi-ci.sh -- Pulumi deployment script (4-stack orchestration)
  • .github/actions/setup-workspace/ -- Shared composite action for CI setup