
CI/CD Pipeline Guide

Complete guide covering the CI/CD pipelines for both the tradai-uv platform (this repo) and the tradai-strategies repository. All CI/CD runs on GitHub Actions.


Pipeline Overview

Platform Pipeline (tradai-uv)

graph TD
    T1["Push to main / PR"] --> CI1["CI (ci.yml)<br/>changes → lint, typecheck,<br/>test, security, perf (PR)"]
    T2["Tag v*"] --> CI2["CI (ci.yml)<br/>lint → typecheck → test → security-scan"]
    CI2 -->|"workflow_run<br/>(CI success)"| DB["docker-build<br/>(4 services)"]
    CI2 -->|"workflow_run<br/>(CI success)"| DL["deploy-lambdas<br/>(auto-discover)"]
    CI2 -->|"workflow_run<br/>(CI success)"| PL["publish-libs<br/>(CodeArtifact)"]

Strategy Pipeline (tradai-strategies)

┌─────────────────────────────────────────────────────────────────────┐
│                        Pull Request                                 │
├─────────────────────────────────────────────────────────────────────┤
│  lint  →  typecheck  →  unit-tests  →  smoke-backtest              │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                     Main Branch Merge                               │
├─────────────────────────────────────────────────────────────────────┤
│  lint  →  typecheck  →  unit-tests  →  smoke-backtest              │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│                     Tag Release (*-v*)                              │
├─────────────────────────────────────────────────────────────────────┤
│  lint → typecheck → tests → full-backtest → docker → ECR → MLflow  │
└─────────────────────────────────────────────────────────────────────┘

GitHub Actions Workflows

All workflow files live in .github/workflows/. Reusable workflows are prefixed with _.

1. CI (ci.yml)

Triggers: Push to main, tags v*, pull requests, weekly schedule, manual dispatch

Path filtering: Uses dorny/paths-filter@v3 to build a dynamic test matrix. Each package filter lists its own source plus the tradai-common submodules it imports. This avoids the "common changed so test everything" cascade.

graph LR
    changes --> MB["matrix-builder"] --> test["test (dynamic matrix)"]
    changes --> lint["lint (_lint.yml reusable)"]
    changes --> typecheck
    changes --> sec["security-scan<br/>(pip-audit + Bandit SAST)"]
    changes --> perf["performance-test<br/>(PR only)"]
    changes --> contract["contract-tests<br/>(PR + schedule)"]
    changes --> integ["integration tests<br/>(schedule/dispatch only)"]

| Job | Description | Duration |
|---|---|---|
| changes | Detect changed paths per package | ~15 sec |
| matrix-builder | Build dynamic test matrix from changes | ~5 sec |
| lint | Ruff linter + formatter (reusable _lint.yml) | ~30 sec |
| typecheck | MyPy type checking (all packages) | ~1 min |
| test | Pytest with 60% coverage per package (matrix) | ~3 min |
| security-scan | pip-audit + Bandit SAST | ~30 sec |
| performance-test | Benchmark tests (PR only) | ~2 min |
| contract-tests | API contract tests with Docker services | ~8 min |
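
The matrix-builder step can be pictured as turning the list of changed packages into the JSON that a job matrix consumes. The sketch below is an assumption about its general shape, not the actual ci.yml logic: the function name `build_matrix`, the `package` key, and the package names in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn the names of changed packages into a JSON object
# that could be fed to `matrix: ${{ fromJson(...) }}`.
# Assumes package names contain no characters that need JSON escaping.
build_matrix() {
  items=""
  for pkg in "$@"; do
    # Append a quoted name, comma-separated after the first entry.
    items="${items:+$items,}\"$pkg\""
  done
  printf '{"package":[%s]}\n' "$items"
}
```

Calling `build_matrix tradai-common tradai-strategy` prints `{"package":["tradai-common","tradai-strategy"]}`. With no arguments the list is empty, and a job using an empty matrix would likely need an `if:` guard so it is skipped rather than failing.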

Workflow Coordination (CI Gate)

Deployment workflows use the workflow_run trigger to gate on CI success:

| Workflow | Trigger | Condition |
|---|---|---|
| Docker Build | CI success + tag v* | workflow_run completed |
| Publish Libs | CI success + tag v* | workflow_run completed |
| Deploy Lambdas | CI success + tag v* | workflow_run completed |

This prevents deploying code that failed CI checks.
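
The gate condition itself is small enough to express as a shell predicate. This is a minimal sketch, assuming the deployment workflows pass in the values of `github.event.workflow_run.conclusion` and `github.event.workflow_run.head_branch`; the function name `should_deploy` is hypothetical and the real workflows may express the same check as an `if:` expression instead.

```shell
#!/bin/sh
# Sketch of the CI gate: deploy only when CI concluded successfully for a v* tag.
# $1: conclusion of the triggering CI run (e.g. "success", "failure")
# $2: ref name of the triggering run (the tag name for tag-triggered runs)
should_deploy() {
  [ "$1" = "success" ] || return 1
  case "$2" in
    v*) return 0 ;;
    *)  return 1 ;;
  esac
}
```

For example, `should_deploy success v1.2.0` succeeds, while `should_deploy failure v1.2.0` and `should_deploy success main` both return non-zero.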

sequenceDiagram
    actor Dev as Developer
    participant GH as GitHub
    participant CI as CI Workflow<br/>(ci.yml)
    participant Docker as Docker Build<br/>(docker-build.yml)
    participant Lambdas as Deploy Lambdas<br/>(deploy-lambdas.yml)
    participant Publish as Publish Libs<br/>(publish-libs.yml)

    Dev->>GH: git push origin v1.2.0 (tag)
    GH->>CI: Trigger on tag push (v*)

    CI->>CI: changes (path detection)
    CI->>CI: lint (Ruff)
    CI->>CI: typecheck (mypy)
    CI->>CI: test (pytest matrix)
    CI->>CI: security-scan (pip-audit + Bandit)

    CI-->>GH: CI completed (success)

    par workflow_run triggers (parallel)
        GH->>Docker: workflow_run: CI success + tag v*
        Docker->>Docker: Build 4 service images
        Docker->>Docker: Push to ECR
        Docker->>Docker: Force redeploy ECS services
    and
        GH->>Lambdas: workflow_run: CI success + tag v*
        Lambdas->>Lambdas: Build tradai-common wheel
        Lambdas->>Lambdas: Build base image
        Lambdas->>Lambdas: Discover + build all lambdas
        Lambdas->>Lambdas: Update Lambda functions
    and
        GH->>Publish: workflow_run: CI success + tag v*
        Publish->>Publish: Build tradai-strategy wheel
        Publish->>Publish: Publish to CodeArtifact
    end

2. Docker Build (docker-build.yml)

Triggers: workflow_run on CI (tag v* only)

Builds and pushes Docker images for all services to ECR, then redeploys ECS:

| Service | Image Name | Dockerfile |
|---|---|---|
| backend | tradai/backend | services/backend/Dockerfile |
| data-collection | tradai/data-collection | services/data-collection/Dockerfile |
| strategy-service | tradai/strategy-service | services/strategy-service/Dockerfile |
| mlflow | tradai/mlflow | services/mlflow/Dockerfile |

Images are tagged with the version (e.g., v1.2.3) and latest. After push, the workflow force-redeploys ECS services (backend-api, data-collection, strategy-service, mlflow).

Version extraction: Uses github.event.workflow_run.head_branch (not GITHUB_REF_NAME) because workflow_run context does not have the tag in GITHUB_REF_NAME.
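
A defensive version of that extraction can reject anything that does not look like a version tag, so an image is never tagged `main` by accident. The following is a sketch under the assumption that the workflow passes `github.event.workflow_run.head_branch` as the first argument; `extract_version` is a hypothetical name, not a function from the repository.

```shell
#!/bin/sh
# Sketch: derive the release version from the ref that triggered CI.
# $1 is expected to hold the head_branch value of the triggering run, which
# carries the tag name for tag-triggered runs. Anything that does not start
# with "v" followed by a digit is rejected.
extract_version() {
  case "$1" in
    v[0-9]*) printf '%s\n' "$1" ;;
    *) echo "not a version tag: $1" >&2; return 1 ;;
  esac
}
```

With this check, `extract_version v1.2.3` prints `v1.2.3`, and `extract_version main` fails with an error message instead of silently producing a wrong image tag.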

3. Deploy Lambdas (deploy-lambdas.yml)

Triggers: workflow_run on CI (tag v*), manual dispatch

Dynamically discovers and deploys all Lambda functions:

version ──→ build-wheel ──→ build-base ──→ discover ──→ build-lambdas ──→ update-functions
   │                            │              │              │
   │                            │              │              └─ Matrix: all lambdas
   │                            │              └─ Finds lambdas/*/Dockerfile
   │                            └─ Base image with tradai-common
   └─ Calculates version ONCE (prevents race condition)

Lambda Discovery: Automatically finds all directories in lambdas/ with a Dockerfile (excluding base/). No config update needed when adding new lambdas.
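
The discovery rule can be reproduced locally with a short shell function. This is an illustrative sketch of the rule as described above, not the code from deploy-lambdas.yml; the name `discover_lambdas` is hypothetical.

```shell
#!/bin/sh
# Sketch of lambda discovery: print the name of every directory under the
# given root (default: lambdas) that contains a Dockerfile, skipping base/.
discover_lambdas() {
  root="${1:-lambdas}"
  for dockerfile in "$root"/*/Dockerfile; do
    # If the glob matched nothing it stays literal; skip anything that is not a file.
    [ -f "$dockerfile" ] || continue
    name=$(basename "$(dirname "$dockerfile")")
    if [ "$name" != "base" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

Running `discover_lambdas` from the repository root prints one lambda name per line, which is a quick way to confirm that a new directory will be picked up before tagging a release.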

Manual Dispatch: Select environment (dev, staging, prod) for emergency deploys that bypass the CI gate.

4. Publish Libraries (publish-libs.yml)

Triggers: workflow_run on CI (tag v*)

Publishes the tradai-strategy library wheel to AWS CodeArtifact (tradai-python-dev repository) for consumption by the strategies repository:

# The strategies repo consumes this via:
pip install tradai-strategy --index-url https://...codeartifact.../pypi/tradai-python-dev/simple/

5. Deploy Infrastructure (deploy-infra.yml)

Triggers: Pull requests (paths: infra/**), manual dispatch

| Input | Options | Description |
|---|---|---|
| stack | dev, staging, prod | Target environment |
| command | preview, up | Pulumi command |

PR behavior: Runs pulumi preview on all stacks (dev, staging, prod) in parallel via matrix, then posts a summary comment on the PR showing create/update/delete/replace counts per stack.

Manual deployment: Select stack and command from the Actions UI. Uses infra/pulumi-ci.sh to orchestrate all Pulumi layers.

6. Other Workflows

| Workflow | File | Purpose |
|---|---|---|
| Deploy Documentation | docs.yml | Build MkDocs and deploy to Cloudflare Pages on main push |
| Devcontainer Prebuild | devcontainer-prebuild.yml | Build and push devcontainer image to GHCR weekly + on .devcontainer/ changes |
| Devcontainer CI | devcontainer-ci.yml | Weekly test suite run inside devcontainer to catch environment drift |
| Docs Freshness | docs-freshness.yml | Documentation freshness checks (schedule) |

Strategy CI/CD Pipeline (tradai-strategies)

Note: The CI/CD workflows below run in the tradai-strategies repository, not in the platform repo.

The proprietary strategies repository has its own CI/CD pipeline. This section documents the expected workflow for that separate repo.

Pipeline Stages

Stage 1: Lint

lint:
  script:
    - just lint $STRATEGY_NAME

Checks: Ruff formatting, linting, import sorting.

Failure fix: just fmt $STRATEGY_NAME

Stage 2: Type Check

typecheck:
  script:
    - just typecheck $STRATEGY_NAME

Checks: mypy strict type checking, Protocol compliance.

Failure fix: uv run mypy $STRATEGY_NAME --show-error-codes

Stage 3: Unit Tests

test:
  script:
    - just test $STRATEGY_NAME

Requirements: All tests pass, coverage >= 80%.

Stage 4: Smoke Backtest

smoke-backtest:
  script:
    - just backtest-smoke $STRATEGY_NAME

30-day timerange, single pair (BTC/USDT:USDT), basic validation, ~5 minutes.

Stage 5: Full Backtest (Release Only)

full-backtest:
  script:
    - just backtest-full $STRATEGY_NAME --timerange 20240101-20241201

12-month timerange, all configured pairs, full metrics validation:

| Metric | Threshold |
|---|---|
| Sharpe Ratio | >= 1.0 |
| Profit Factor | >= 1.2 |
| Max Drawdown | <= 20% |
| Total Trades | >= 50 |
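
The threshold check can be sketched as a small shell function that takes the four metrics as arguments. How `just backtest-full` actually reports and validates metrics is not documented here, so the function name `check_metrics`, its argument order, and its output format are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: validate full-backtest metrics against the release thresholds.
# Arguments: sharpe_ratio profit_factor max_drawdown_percent total_trades
# Prints one line per violated threshold and returns non-zero if any fail.
check_metrics() {
  awk -v sharpe="$1" -v pf="$2" -v dd="$3" -v trades="$4" 'BEGIN {
    failed = 0
    if (sharpe + 0 < 1.0) { print "FAIL sharpe_ratio " sharpe " < 1.0"; failed = 1 }
    if (pf + 0 < 1.2)     { print "FAIL profit_factor " pf " < 1.2";    failed = 1 }
    if (dd + 0 > 20)      { print "FAIL max_drawdown " dd "% > 20%";    failed = 1 }
    if (trades + 0 < 50)  { print "FAIL total_trades " trades " < 50";  failed = 1 }
    exit failed
  }'
}
```

For example, `check_metrics 1.5 1.3 15 80` passes silently, while `check_metrics 1.0 1.2 25 50` reports only the drawdown violation because the other three values sit exactly on their thresholds.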

Stage 6: Docker Build & ECR Push

The strategy is packaged as a Docker image using Freqtrade as the base:

FROM freqtradeorg/freqtrade:stable

COPY . /app
RUN pip install /app
COPY configs /freqtrade/user_data/configs

ENTRYPOINT ["freqtrade", "trade"]

Built and pushed to ECR with version and latest tags.

Stage 7: MLflow Registration

mlflow-register:
  script:
    - just mlflow-register $STRATEGY_NAME $TAG

Registers model version with backtest metrics, Docker image URI, and strategy metadata.

Strategy GitHub Actions Workflow

The tradai-strategies repo uses this workflow (in its own .github/workflows/strategy-ci.yml):

name: Strategy CI/CD

on:
  pull_request:
    paths:
      - 'strategies/**'
  push:
    branches: [main]
    tags:
      - '*-v*'

env:
  AWS_REGION: eu-central-1

jobs:
  detect-strategies:
    runs-on: ubuntu-latest
    outputs:
      strategies: ${{ steps.detect.outputs.strategies }}
    steps:
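      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history; a shallow clone cannot diff against an older commit
      - id: detect
        run: |
          # Diff base: the PR base commit for pull requests, the previous head for pushes.
          # Caveat: github.event.before is all zeros for a newly pushed tag.
          BASE="${{ github.event.pull_request.base.sha || github.event.before }}"
          STRATEGIES=$(git diff --name-only "$BASE" ${{ github.sha }} \
            | grep '^strategies/' | cut -d'/' -f2 | sort -u \
            | jq -R -s -c 'split("\n")[:-1]')
          echo "strategies=$STRATEGIES" >> "$GITHUB_OUTPUT"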

  lint-test:
    needs: detect-strategies
    runs-on: ubuntu-latest
    strategy:
      matrix:
        strategy: ${{ fromJson(needs.detect-strategies.outputs.strategies) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install uv
      - run: cd strategies/${{ matrix.strategy }} && uv sync
      - run: just lint ${{ matrix.strategy }}
      - run: just typecheck ${{ matrix.strategy }}
      - run: just test ${{ matrix.strategy }}

  smoke-backtest:
    needs: lint-test
    runs-on: ubuntu-latest
    strategy:
      matrix:
        strategy: ${{ fromJson(needs.detect-strategies.outputs.strategies) }}
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
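      - name: CodeArtifact Login
        run: |
          # CodeArtifact hosts have the form <domain>-<account-id>.d.codeartifact.<region>.amazonaws.com.
          # Assumes the tradai domain is owned by the account these credentials belong to.
          ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
          export CODEARTIFACT_TOKEN=$(aws codeartifact get-authorization-token \
            --domain tradai --query authorizationToken --output text)
          pip config set global.extra-index-url \
            "https://aws:${CODEARTIFACT_TOKEN}@tradai-${ACCOUNT_ID}.d.codeartifact.${AWS_REGION}.amazonaws.com/pypi/tradai-python-dev/simple/"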
      - run: just backtest-smoke ${{ matrix.strategy }}

  release:
    if: startsWith(github.ref, 'refs/tags/')
    needs: smoke-backtest
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Extract strategy from tag
        id: extract
        run: |
          TAG=${GITHUB_REF#refs/tags/}
          STRATEGY=$(echo $TAG | sed 's/-v[0-9].*//')
          VERSION=$(echo $TAG | sed 's/.*-v//')
          echo "strategy=$STRATEGY" >> $GITHUB_OUTPUT
          echo "version=$VERSION" >> $GITHUB_OUTPUT
          echo "tag=$TAG" >> $GITHUB_OUTPUT
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - run: just backtest-full ${{ steps.extract.outputs.strategy }}
      - name: Build Docker
        run: |
          docker build \
            -t ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:${{ steps.extract.outputs.tag }} \
            -t ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:latest \
            strategies/${{ steps.extract.outputs.strategy }}
      - name: Push to ECR
        run: |
          aws ecr get-login-password | docker login --username AWS --password-stdin ${{ secrets.ECR_REGISTRY }}
          docker push ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:${{ steps.extract.outputs.tag }}
          docker push ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:latest
      - run: just mlflow-register ${{ steps.extract.outputs.strategy }} ${{ steps.extract.outputs.tag }}

Required Secrets

GitHub Actions Secrets (tradai-uv)

Configure in GitHub Settings, then Secrets and variables, then Actions:

| Secret | Description | Used by |
|---|---|---|
| AWS_ACCESS_KEY_ID | AWS access key | All deployment workflows |
| AWS_SECRET_ACCESS_KEY | AWS secret key | All deployment workflows |
| AWS_REGION | AWS region (e.g., eu-central-1) | All deployment workflows |
| AWS_ACCOUNT_ID | AWS account ID | deploy-lambdas |
| PULUMI_CONFIG_PASSPHRASE | Pulumi encryption passphrase | deploy-infra |
| S3_PULUMI_BACKEND_URL | Pulumi state backend (e.g., s3://tradai-pulumi-state) | deploy-infra |
| CLOUDFLARE_API_TOKEN | Cloudflare Pages API token | docs |
| CLOUDFLARE_ACCOUNT_ID | Cloudflare account ID | docs |

GitHub Actions Secrets (tradai-strategies)

| Secret | Description | Used by |
|---|---|---|
| AWS_ACCESS_KEY_ID | ECR/CodeArtifact access | smoke-backtest, release |
| AWS_SECRET_ACCESS_KEY | ECR/CodeArtifact access | smoke-backtest, release |
| ECR_REGISTRY | ECR registry URL | release |
| MLFLOW_TRACKING_URI | MLflow server URL | release |

Versioning Convention

Tag Format

Platform (tradai-uv): v<semver> (e.g., v1.0.0, v0.0.0-test)

Strategies (tradai-strategies): <strategy-slug>-v<semver> (e.g., momentum-v1.0.0, trend-following-v2.1.3, ml-rsi-v1.0.0-rc1)
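
Because strategy slugs may themselves contain hyphens, splitting a strategy tag needs to anchor on the `-v<digit>` boundary rather than on the first hyphen. The sketch below illustrates one way to do that with POSIX parameter expansion; `parse_strategy_tag` is a hypothetical helper, not part of either repository.

```shell
#!/bin/sh
# Sketch: split a strategy release tag of the form <strategy-slug>-v<semver>
# into its slug and version. Prints "<slug> <version>" or fails on other input.
parse_strategy_tag() {
  tag="$1"
  case "$tag" in
    ?*-v[0-9]*) ;;
    *) echo "unexpected tag format: $tag" >&2; return 1 ;;
  esac
  strategy=${tag%-v[0-9]*}      # drop the shortest trailing "-v<digit>..." part
  version=${tag#"$strategy"-v}  # whatever follows "<slug>-v"
  printf '%s %s\n' "$strategy" "$version"
}
```

For example, `parse_strategy_tag ml-rsi-v1.0.0-rc1` prints `ml-rsi 1.0.0-rc1`, and a platform-style tag such as `v1.0.0` is rejected because it has no slug.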

Version Incrementing

| Change Type | Example | Version Bump |
|---|---|---|
| Bug fix | Fix signal calculation | 1.0.0 → 1.0.1 |
| New indicator | Add MACD | 1.0.1 → 1.1.0 |
| Breaking change | New config format | 1.1.0 → 2.0.0 |
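
The bump rules follow standard semantic versioning and can be sketched in a few lines of shell. This helper is purely illustrative (neither repository is known to ship a `bump_version` command) and it assumes a plain MAJOR.MINOR.PATCH input without a pre-release suffix.

```shell
#!/bin/sh
# Sketch: bump a plain MAJOR.MINOR.PATCH version (no pre-release suffix).
# Usage: bump_version <version> <major|minor|patch>
bump_version() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}
  patch=${rest#*.}
  case "$2" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown part: $2" >&2; return 1 ;;
  esac
  printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}
```

The three rows of the table correspond to `bump_version 1.0.0 patch`, `bump_version 1.0.1 minor`, and `bump_version 1.1.0 major`.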

CodeArtifact Integration

Setup

# Login (12-hour token)
just codeartifact-login

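# Or manually. The endpoint host has the form
# <domain>-<account-id>.d.codeartifact.<region>.amazonaws.com
# (assumes the tradai domain is owned by your current AWS account)
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

export CODEARTIFACT_TOKEN=$(aws codeartifact get-authorization-token \
  --domain tradai \
  --query authorizationToken \
  --output text)

pip config set global.extra-index-url \
  "https://aws:${CODEARTIFACT_TOKEN}@tradai-${AWS_ACCOUNT_ID}.d.codeartifact.eu-central-1.amazonaws.com/pypi/tradai-python-dev/simple/"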

Publishing tradai-strategy

When the tradai-strategy package is updated in tradai-uv:

cd libs/tradai-strategy
uv build
twine upload --repository codeartifact dist/*

In CI, this is handled by publish-libs.yml which publishes to the tradai-python-dev CodeArtifact repository using twine upload --skip-existing.

Consuming in Strategies

# strategies/my-strategy/pyproject.toml
[project]
dependencies = [
    "tradai-strategy>=1.0.0",  # From CodeArtifact
    "freqtrade>=2025.6",
]

Adding New Services

  1. Create service directory: services/<name>/
  2. Add Dockerfile: services/<name>/Dockerfile
  3. Update docker-build.yml matrix:
matrix:
  include:
    # ... existing services
    - service: new-service
      dockerfile: services/new-service/Dockerfile
      image_name: tradai/new-service

Adding New Lambdas

Lambdas are auto-discovered. Just create:

lambdas/
└── new-lambda/
    ├── Dockerfile
    └── handler.py

The next tag release will automatically build and deploy it.


Manual Commands

Local Development

# Full CI locally
just check                    # lint + typecheck + test

# Individual stages
just lint
just typecheck
just test
just fmt                      # Auto-fix formatting

Strategy Development (in tradai-strategies)

# Full CI locally
just ci momentum-strategy

# Individual stages
just lint momentum-strategy
just typecheck momentum-strategy
just test momentum-strategy
just backtest-smoke momentum-strategy

Release Process (tradai-uv)

# 1. Ensure main is up to date
git checkout main && git pull

# 2. Tag and push (push only this tag, not every local tag)
git tag v1.1.0
git push origin v1.1.0

Release Process (tradai-strategies)

# 1. Ensure main is up to date
git checkout main && git pull

# 2. Update version in pyproject.toml
# version = "1.0.0" -> "1.1.0"

# 3. Commit
git add .
git commit -m "chore(momentum-strategy): bump version to 1.1.0"
git push

# 4. Create and push tag (push only this tag, not every local tag)
git tag momentum-strategy-v1.1.0
git push origin momentum-strategy-v1.1.0

E2E Testing the Deployment Pipeline

Push a version tag to trigger the full chain: CI, then Docker Build, Deploy Lambdas, and Publish Libraries.

Why run E2E tests

The three deployment workflows (docker-build, deploy-lambdas, publish-libs) only trigger on v* tags, not on PRs or pushes to main. Without periodic E2E testing, bugs in deployment workflows go undetected until a real release.

Examples of bugs caught by E2E testing (issue #93):

  • Version extraction bug: Workflows used GITHUB_REF_NAME which returns main instead of the tag in workflow_run context. Docker images were tagged main instead of the version.
  • Non-existent environment: Workflows targeted prod resources that had never been provisioned. publish-libs failed with ResourceNotFoundException.

How to run

git tag v0.0.0-test
git push origin v0.0.0-test

CI runs first. When it passes, three workflows trigger automatically via workflow_run.

What to check

Open the Actions tab and verify each workflow:

publish-libs.yml:

| Step | What to look for | Failure means |
|---|---|---|
| Get CodeArtifact repository endpoint | tradai-python-dev resolved | CodeArtifact repo missing or IAM issue |
| Build tradai-strategy wheel | Wheel built in dist/ | Build error in libs/tradai-strategy |
| Publish to CodeArtifact | twine upload exit 0 | Auth token or network error |

deploy-lambdas.yml:

| Step | What to look for | Failure means |
|---|---|---|
| Calculate Version | VERSION=v0.0.0-test, environment: dev | Version extraction broken |
| Build tradai-common wheel | Wheel artifact uploaded | Build error in libs/tradai-common |
| Build Lambda Base Image | Image pushed to ECR | Dockerfile or ECR permissions |
| Build lambdas (matrix) | All lambda images pushed | Individual lambda Dockerfile issue |
| Update Lambda Functions | "Updated successfully" or "Function does not exist (skipping)" | IAM permissions for lambda:UpdateFunctionCode |

docker-build.yml:

| Step | What to look for | Failure means |
|---|---|---|
| Extract version from tag | VERSION=v0.0.0-test (not main) | Version extraction regression |
| Build and push Docker image | All 4 service images pushed | Dockerfile or ECR issue |
| Force new deployment (ECS) | "Service not found" is OK in dev | In prod: ECS name mismatch |

What gets created in AWS

| Resource | What happens | Cleanup needed? |
|---|---|---|
| Docker images in ECR | Tagged as v0.0.0-test + latest | No (latest is overwritten on next deploy) |
| Lambda images in ECR | Same as above | No |
| Lambda function code | update-function-code points to new image | No (next deploy overwrites) |
| CodeArtifact wheel | Published with --skip-existing | No (idempotent) |
| ECS service | force-new-deployment | No (in dev this is a no-op) |

Cleanup

Delete the test git tag only:

git push origin --delete v0.0.0-test
git tag -d v0.0.0-test

Cost

These workflows only trigger on v* tags. PRs only run CI with smart change detection. A full E2E test tag run takes approximately 15 minutes of GitHub Actions compute.


Troubleshooting

Pipeline Failures

| Stage | Common Issue | Resolution |
|---|---|---|
| Lint | Formatting issues | just fmt (platform) or just fmt $STRATEGY (strategies) |
| Typecheck | Missing types | Add type annotations, run uv run mypy --show-error-codes |
| Test | Fixture missing | Check conftest.py |
| Backtest | Data missing | Sync data first |
| Docker | Build fails | Check Dockerfile, verify build context |
| ECR | Push denied | Check AWS credentials and IAM permissions |
| MLflow | Register fails | Check MLflow connection |
| Lambda discovery | No lambdas found | Ensure lambdas/<name>/Dockerfile exists |

CodeArtifact Token Expired

# Re-login
just codeartifact-login

# CI: Token is auto-refreshed in pipeline

ECR Push Permission Denied

# Check AWS credentials
aws sts get-caller-identity

# Verify ECR permissions
aws ecr describe-repositories --repository-names tradai/$SERVICE

# Required IAM permissions for pushing images
# ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability,
# ecr:InitiateLayerUpload, ecr:UploadLayerPart, ecr:CompleteLayerUpload, ecr:PutImage

Cache Issues

# GitHub Actions uses UV cache at ~/.cache/uv
# Cache key includes uv.lock hash -- changing uv.lock invalidates cache automatically

Pulumi State Lock

# If Pulumi reports state lock:
pulumi cancel  # Cancel stuck operation
# Or manually unlock in S3

VERSION=main instead of tag

In workflow_run context, GITHUB_REF_NAME returns main, not the tag. Workflows must use github.event.workflow_run.head_branch to extract the version. If you see images tagged main instead of a version, check this extraction logic.

Failed Lambda Discovery

# Check lambdas directory structure
ls -la lambdas/

# Each lambda needs:
# lambdas/<name>/Dockerfile
# lambdas/<name>/handler.py (or similar entry point)
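
To spot the offending directory quickly, the inverse of the discovery rule can be scripted: list every lambda directory that has no Dockerfile. This is a hedged sketch for local diagnosis only; `list_undiscoverable_lambdas` is a hypothetical name and the check assumes the layout described above.

```shell
#!/bin/sh
# Sketch: list lambda directories that discovery would skip because they
# have no Dockerfile. base/ is ignored since discovery excludes it anyway.
list_undiscoverable_lambdas() {
  root="${1:-lambdas}"
  for dir in "$root"/*/; do
    # If the glob matched nothing it stays literal; skip non-directories.
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    if [ "$name" != "base" ] && [ ! -f "$dir/Dockerfile" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

An empty result from `list_undiscoverable_lambdas` means every directory under lambdas/ (other than base/) has a Dockerfile, so a "No lambdas found" failure would have a different cause.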