CI/CD Pipeline Guide¶

This guide covers the CI/CD pipelines for both the tradai-uv platform (this repo) and the tradai-strategies repository. All CI/CD runs on GitHub Actions.

Pipeline Overview¶

Platform Pipeline (tradai-uv)¶

graph TD
    T1["Push to main / PR"] --> CI1["CI (ci.yml)<br/>changes → lint, typecheck,<br/>test, security, perf (PR)"]
    T2["Tag v*"] --> CI2["CI (ci.yml)<br/>lint → typecheck → test → security-scan"]
    CI2 -->|"workflow_run<br/>(CI success)"| DB["docker-build<br/>(4 services)"]
    CI2 -->|"workflow_run<br/>(CI success)"| DL["deploy-lambdas<br/>(auto-discover)"]
    CI2 -->|"workflow_run<br/>(CI success)"| PL["publish-libs<br/>(CodeArtifact)"]

Strategy Pipeline (tradai-strategies)¶

┌─────────────────────────────────────────────────────────────────────┐
│                        Pull Request                                 │
├─────────────────────────────────────────────────────────────────────┤
│  lint  →  typecheck  →  unit-tests  →  smoke-backtest              │
└─────────────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────────────┐
│                     Main Branch Merge                               │
├─────────────────────────────────────────────────────────────────────┤
│  lint  →  typecheck  →  unit-tests  →  smoke-backtest              │
└─────────────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────────────┐
│                     Tag Release (*-v*)                              │
├─────────────────────────────────────────────────────────────────────┤
│  lint → typecheck → tests → full-backtest → docker → ECR → MLflow  │
└─────────────────────────────────────────────────────────────────────┘

GitHub Actions Workflows¶

All workflow files live in .github/workflows/. Reusable workflows are prefixed with _.

1. CI (`ci.yml`)¶

Triggers: Push to main, tags v*, pull requests, weekly schedule, manual dispatch

Path filtering: Uses dorny/paths-filter@v3 to build a dynamic test matrix. Each package filter lists its own source plus the tradai-common submodules it imports. This avoids the "common changed so test everything" cascade.
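As a rough illustration of the matrix-building step, the shell sketch below turns a list of changed package names into JSON that a job's `strategy.matrix` could consume via `fromJson`. The function name `build_matrix` and the `package` key are assumptions made for this example; the actual matrix-builder job in ci.yml may be structured differently.

```shell
# Hypothetical helper: emit a JSON matrix from package names passed as arguments.
build_matrix() {
  json=""
  for pkg in "$@"; do
    # Append a comma separator before every entry except the first.
    json="${json}${json:+,}\"${pkg}\""
  done
  printf '{"package":[%s]}\n' "$json"
}

# Example: only these two packages were flagged by the change filter.
build_matrix tradai-common tradai-strategy
```

An empty matrix makes a GitHub Actions job fail, so a real workflow would typically skip the test job when no package changed.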

graph LR
    changes --> MB["matrix-builder"] --> test["test (dynamic matrix)"]
    changes --> lint["lint (_lint.yml reusable)"]
    changes --> typecheck
    changes --> sec["security-scan<br/>(pip-audit + Bandit SAST)"]
    changes --> perf["performance-test<br/>(PR only)"]
    changes --> contract["contract-tests<br/>(PR + schedule)"]
    changes --> integ["integration tests<br/>(schedule/dispatch only)"]

Job	Description	Duration
`changes`	Detect changed paths per package	~15 sec
`matrix-builder`	Build dynamic test matrix from changes	~5 sec
`lint`	Ruff linter + formatter (reusable `_lint.yml`)	~30 sec
`typecheck`	MyPy type checking (all packages)	~1 min
`test`	Pytest with 60% coverage per package (matrix)	~3 min
`security-scan`	pip-audit + Bandit SAST	~30 sec
`performance-test`	Benchmark tests (PR only)	~2 min
`contract-tests`	API contract tests with Docker services	~8 min

Workflow Coordination (CI Gate)¶

Deployment workflows use the workflow_run trigger to gate on CI success:

Workflow	Trigger	Condition
Docker Build	CI success + tag `v*`	`workflow_run` completed
Publish Libs	CI success + tag `v*`	`workflow_run` completed
Deploy Lambdas	CI success + tag `v*`	`workflow_run` completed

This prevents deploying code that failed CI checks.
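The gate condition can be sketched as a small shell check. This is a minimal illustration, not the workflows' actual logic: the function name `should_deploy` is hypothetical, and it assumes the CI conclusion and triggering ref name are passed in from `github.event.workflow_run.conclusion` and `github.event.workflow_run.head_branch`.

```shell
# Hypothetical gate: succeed only when CI passed and the triggering ref looks like a v* tag.
# $1 = CI conclusion, $2 = triggering ref name
should_deploy() {
  [ "$1" = "success" ] || return 1
  case "$2" in
    v*) return 0 ;;   # note: a branch named e.g. "vnext" would also match this pattern
    *)  return 1 ;;
  esac
}

should_deploy success v1.2.0 && echo "deploy"
should_deploy failure v1.2.0 || echo "skip: CI failed"
should_deploy success main   || echo "skip: not a version tag"
```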

sequenceDiagram
    actor Dev as Developer
    participant GH as GitHub
    participant CI as CI Workflow<br/>(ci.yml)
    participant Docker as Docker Build<br/>(docker-build.yml)
    participant Lambdas as Deploy Lambdas<br/>(deploy-lambdas.yml)
    participant Publish as Publish Libs<br/>(publish-libs.yml)

    Dev->>GH: git push origin v1.2.0 (tag)
    GH->>CI: Trigger on tag push (v*)

    CI->>CI: changes (path detection)
    CI->>CI: lint (Ruff)
    CI->>CI: typecheck (mypy)
    CI->>CI: test (pytest matrix)
    CI->>CI: security-scan (pip-audit + Bandit)

    CI-->>GH: CI completed (success)

    par workflow_run triggers (parallel)
        GH->>Docker: workflow_run: CI success + tag v*
        Docker->>Docker: Build 4 service images
        Docker->>Docker: Push to ECR
        Docker->>Docker: Force redeploy ECS services
    and
        GH->>Lambdas: workflow_run: CI success + tag v*
        Lambdas->>Lambdas: Build tradai-common wheel
        Lambdas->>Lambdas: Build base image
        Lambdas->>Lambdas: Discover + build all lambdas
        Lambdas->>Lambdas: Update Lambda functions
    and
        GH->>Publish: workflow_run: CI success + tag v*
        Publish->>Publish: Build tradai-strategy wheel
        Publish->>Publish: Publish to CodeArtifact
    end

2. Docker Build (`docker-build.yml`)¶

Triggers: workflow_run on CI (tag v* only)

Builds and pushes Docker images for all services to ECR, then redeploys ECS:

Service	Image Name	Dockerfile
backend	`tradai/backend`	`services/backend/Dockerfile`
data-collection	`tradai/data-collection`	`services/data-collection/Dockerfile`
strategy-service	`tradai/strategy-service`	`services/strategy-service/Dockerfile`
mlflow	`tradai/mlflow`	`services/mlflow/Dockerfile`

Images are tagged with the version (e.g., v1.2.3) and latest. After push, the workflow force-redeploys ECS services (backend-api, data-collection, strategy-service, mlflow).

Version extraction: Uses github.event.workflow_run.head_branch (not GITHUB_REF_NAME) because, in a workflow_run context, GITHUB_REF_NAME resolves to the default branch (main) rather than the tag that triggered CI.
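A defensive version of this extraction might reject anything that does not look like a version tag before tagging images. The sketch below assumes the ref name has already been read from `github.event.workflow_run.head_branch`; the helper name `image_tags` is made up for illustration.

```shell
# Hypothetical helper: validate the ref name and print the image tags to push.
# $1 = ref name, $2 = image repository (e.g. tradai/backend)
image_tags() {
  case "$1" in
    v[0-9]*) ;;   # looks like a version tag, continue
    *) echo "not a version tag: $1" >&2; return 1 ;;
  esac
  printf '%s:%s\n' "$2" "$1"
  printf '%s:latest\n' "$2"
}

image_tags v1.2.3 tradai/backend
```

Failing early on a value such as `main` keeps a broken extraction from silently publishing an image tagged `main`.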

3. Deploy Lambdas (`deploy-lambdas.yml`)¶

Triggers: workflow_run on CI (tag v*), manual dispatch

Dynamically discovers and deploys all Lambda functions:

version ──→ build-wheel ──→ build-base ──→ discover ──→ build-lambdas ──→ update-functions
   │                            │              │              │
   │                            │              │              └─ Matrix: all lambdas
   │                            │              └─ Finds lambdas/*/Dockerfile
   │                            └─ Base image with tradai-common
   └─ Calculates version ONCE (prevents race condition)

Lambda Discovery: Automatically finds all directories in lambdas/ with a Dockerfile (excluding base/). No config update needed when adding new lambdas.
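The discovery rule can be approximated with a plain shell loop. This is a sketch of the idea only; the function name `discover_lambdas` is hypothetical and the real workflow step may differ (for example, by emitting JSON for a matrix).

```shell
# Hypothetical helper: list every directory under the given root that contains
# a Dockerfile, skipping the shared base image directory.
discover_lambdas() {
  root="${1:-lambdas}"
  for dockerfile in "$root"/*/Dockerfile; do
    # If the glob matched nothing, the literal pattern is returned; skip it.
    [ -f "$dockerfile" ] || continue
    name="$(basename "$(dirname "$dockerfile")")"
    if [ "$name" = "base" ]; then
      continue
    fi
    printf '%s\n' "$name"
  done
}

# Prints one lambda name per line (nothing if the directory is missing or empty).
discover_lambdas lambdas
```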

Manual Dispatch: Select environment (dev, staging, prod) for emergency deploys that bypass the CI gate.

4. Publish Libraries (`publish-libs.yml`)¶

Triggers: workflow_run on CI (tag v*)

Publishes the tradai-strategy library wheel to AWS CodeArtifact (tradai-python-dev repository) for consumption by the strategies repository:

# The strategies repo consumes this via:
pip install tradai-strategy --index-url https://...codeartifact.../pypi/tradai-python-dev/simple/

5. Deploy Infrastructure (`deploy-infra.yml`)¶

Triggers: Pull requests (paths: infra/**), manual dispatch

Input	Options	Description
`stack`	`dev`, `staging`, `prod`	Target environment
`command`	`preview`, `up`	Pulumi command

PR behavior: Runs pulumi preview on all stacks (dev, staging, prod) in parallel via matrix, then posts a summary comment on the PR showing create/update/delete/replace counts per stack.

Manual deployment: Select stack and command from the Actions UI. Uses infra/pulumi-ci.sh to orchestrate all Pulumi layers.

6. Other Workflows¶

Workflow	File	Purpose
Deploy Documentation	`docs.yml`	Build MkDocs and deploy to Cloudflare Pages on `main` push
Devcontainer Prebuild	`devcontainer-prebuild.yml`	Build and push devcontainer image to GHCR weekly + on `.devcontainer/` changes
Devcontainer CI	`devcontainer-ci.yml`	Weekly test suite run inside devcontainer to catch environment drift
Docs Freshness	`docs-freshness.yml`	Documentation freshness checks (schedule)

Strategy CI/CD Pipeline (tradai-strategies)¶

Note: The CI/CD workflows below run in the tradai-strategies repository, not in the platform repo.

The proprietary strategies repository has its own CI/CD pipeline. This section documents the expected workflow for that separate repo.

Pipeline Stages¶

Stage 1: Lint¶

lint:
  script:
    - just lint $STRATEGY_NAME

Checks: Ruff formatting, linting, import sorting.

Failure fix: just fmt $STRATEGY_NAME

Stage 2: Type Check¶

typecheck:
  script:
    - just typecheck $STRATEGY_NAME

Checks: mypy strict type checking, Protocol compliance.

Failure fix: uv run mypy $STRATEGY_NAME --show-error-codes

Stage 3: Unit Tests¶

test:
  script:
    - just test $STRATEGY_NAME

Requirements: All tests pass, coverage >= 80%.

Stage 4: Smoke Backtest¶

smoke-backtest:
  script:
    - just backtest-smoke $STRATEGY_NAME

Runs a 30-day timerange on a single pair (BTC/USDT:USDT) with basic validation; takes about 5 minutes.

Stage 5: Full Backtest (Release Only)¶

full-backtest:
  script:
    - just backtest-full $STRATEGY_NAME --timerange 20240101-20241201

Full-year timerange (note that the example above, 20240101-20241201, spans 11 months), all configured pairs, full metrics validation:

Metric	Threshold
Sharpe Ratio	>= 1.0
Profit Factor	>= 1.2
Max Drawdown	<= 20%
Total Trades	>= 50
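The thresholds above could be enforced with a small gate like the one below. It is only a sketch: the function name `passes_thresholds` and the argument order are assumptions, and how `just backtest-full` actually reports or checks metrics is not specified here.

```shell
# Hypothetical gate over the release thresholds.
# Arguments: Sharpe ratio, profit factor, max drawdown (percent), total trades.
passes_thresholds() {
  awk -v sharpe="$1" -v pf="$2" -v dd="$3" -v trades="$4" 'BEGIN {
    # "+ 0" forces numeric comparison of the passed-in values.
    ok = (sharpe + 0 >= 1.0) && (pf + 0 >= 1.2) && (dd + 0 <= 20) && (trades + 0 >= 50)
    exit(ok ? 0 : 1)
  }'
}

if passes_thresholds 1.4 1.6 12.5 87; then
  echo "release gate: pass"
else
  echo "release gate: fail"
fi
```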

Stage 6: Docker Build & ECR Push¶

The strategy is packaged as a Docker image using Freqtrade as the base:

FROM freqtradeorg/freqtrade:stable

COPY . /app
RUN pip install /app
COPY configs /freqtrade/user_data/configs

ENTRYPOINT ["freqtrade", "trade"]

Built and pushed to ECR with version and latest tags.

Stage 7: MLflow Registration¶

mlflow-register:
  script:
    - just mlflow-register $STRATEGY_NAME $TAG

Registers model version with backtest metrics, Docker image URI, and strategy metadata.

Strategy GitHub Actions Workflow¶

The tradai-strategies repo uses this workflow (in its own .github/workflows/strategy-ci.yml):

name: Strategy CI/CD

on:
  pull_request:
    paths:
      - 'strategies/**'
  push:
    branches: [main]
    tags:
      - '*-v*'

env:
  AWS_REGION: eu-central-1

jobs:
  detect-strategies:
    runs-on: ubuntu-latest
    outputs:
      strategies: ${{ steps.detect.outputs.strategies }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so `git diff` can resolve the previous commit
      - id: detect
        run: |
          STRATEGIES=$(git diff --name-only ${{ github.event.before }} ${{ github.sha }} \
            | grep '^strategies/' | cut -d'/' -f2 | sort -u \
            | jq -R -s -c 'split("\n")[:-1]')
          echo "strategies=$STRATEGIES" >> $GITHUB_OUTPUT

  lint-test:
    needs: detect-strategies
    runs-on: ubuntu-latest
    strategy:
      matrix:
        strategy: ${{ fromJson(needs.detect-strategies.outputs.strategies) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install uv
      - run: cd strategies/${{ matrix.strategy }} && uv sync
      - run: just lint ${{ matrix.strategy }}
      - run: just typecheck ${{ matrix.strategy }}
      - run: just test ${{ matrix.strategy }}

  smoke-backtest:
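    needs: [detect-strategies, lint-test]  # the matrix reads detect-strategies outputs, so it must be a direct dependency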
    runs-on: ubuntu-latest
    strategy:
      matrix:
        strategy: ${{ fromJson(needs.detect-strategies.outputs.strategies) }}
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - name: CodeArtifact Login
        run: |
          export CODEARTIFACT_TOKEN=$(aws codeartifact get-authorization-token \
            --domain tradai --query authorizationToken --output text)
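          # CodeArtifact hosts take the form <domain>-<account-id>.d.codeartifact.<region>.amazonaws.com.
          # This assumes the tradai domain is owned by the account of the configured credentials.
          ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
          pip config set global.extra-index-url \
            "https://aws:${CODEARTIFACT_TOKEN}@tradai-${ACCOUNT_ID}.d.codeartifact.${AWS_REGION}.amazonaws.com/pypi/tradai-python-dev/simple/"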
      - run: just backtest-smoke ${{ matrix.strategy }}

  release:
    if: startsWith(github.ref, 'refs/tags/')
    needs: smoke-backtest
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Extract strategy from tag
        id: extract
        run: |
          TAG=${GITHUB_REF#refs/tags/}
          STRATEGY=$(echo $TAG | sed 's/-v[0-9].*//')
          VERSION=$(echo $TAG | sed 's/.*-v//')
          echo "strategy=$STRATEGY" >> $GITHUB_OUTPUT
          echo "version=$VERSION" >> $GITHUB_OUTPUT
          echo "tag=$TAG" >> $GITHUB_OUTPUT
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - run: just backtest-full ${{ steps.extract.outputs.strategy }}
      - name: Build Docker
        run: |
          docker build \
            -t ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:${{ steps.extract.outputs.tag }} \
            -t ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:latest \
            strategies/${{ steps.extract.outputs.strategy }}
      - name: Push to ECR
        run: |
          aws ecr get-login-password | docker login --username AWS --password-stdin ${{ secrets.ECR_REGISTRY }}
          docker push ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:${{ steps.extract.outputs.tag }}
          docker push ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:latest
      - run: just mlflow-register ${{ steps.extract.outputs.strategy }} ${{ steps.extract.outputs.tag }}

Required Secrets¶

GitHub Actions Secrets (tradai-uv)¶

Configure these in the repository under Settings > Secrets and variables > Actions:

Secret	Description	Used by
`AWS_ACCESS_KEY_ID`	AWS access key	All deployment workflows
`AWS_SECRET_ACCESS_KEY`	AWS secret key	All deployment workflows
`AWS_REGION`	AWS region (e.g., `eu-central-1`)	All deployment workflows
`AWS_ACCOUNT_ID`	AWS account ID	deploy-lambdas
`PULUMI_CONFIG_PASSPHRASE`	Pulumi encryption passphrase	deploy-infra
`S3_PULUMI_BACKEND_URL`	Pulumi state backend (e.g., `s3://tradai-pulumi-state`)	deploy-infra
`CLOUDFLARE_API_TOKEN`	Cloudflare Pages API token	docs
`CLOUDFLARE_ACCOUNT_ID`	Cloudflare account ID	docs

GitHub Actions Secrets (tradai-strategies)¶

Secret	Description	Used by
`AWS_ACCESS_KEY_ID`	ECR/CodeArtifact access	smoke-backtest, release
`AWS_SECRET_ACCESS_KEY`	ECR/CodeArtifact access	smoke-backtest, release
`ECR_REGISTRY`	ECR registry URL	release
`MLFLOW_TRACKING_URI`	MLflow server URL	release

Versioning Convention¶

Tag Format¶

Platform (tradai-uv): v<semver> (e.g., v1.0.0, v0.0.0-test)

Strategies (tradai-strategies): <strategy-slug>-v<semver> (e.g., momentum-v1.0.0, trend-following-v2.1.3, ml-rsi-v1.0.0-rc1)
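Splitting a strategy tag into its slug and version can be done with POSIX parameter expansion, as sketched below. The helper name `parse_strategy_tag` is hypothetical; the workflow shown earlier in this guide uses `sed` for the same purpose.

```shell
# Hypothetical parser for tags shaped like <strategy-slug>-v<semver>.
# Prints "<slug> <version>", or returns 1 if the tag does not match.
parse_strategy_tag() {
  tag="$1"
  slug="${tag%-v[0-9]*}"    # drop the trailing "-v<version>" part
  version="${tag##*-v}"     # keep everything after the last "-v"
  if [ "$slug" = "$tag" ] || [ -z "$slug" ]; then
    echo "unexpected tag format: $tag" >&2
    return 1
  fi
  printf '%s %s\n' "$slug" "$version"
}

parse_strategy_tag momentum-v1.0.0
parse_strategy_tag ml-rsi-v1.0.0-rc1
```

A platform-style tag such as `v1.0.0` has no slug and is rejected, which helps keep the two tag conventions from being mixed up.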

Version Incrementing¶

Change Type	Example	Version Bump
Bug fix	Fix signal calculation	1.0.0 → 1.0.1
New indicator	Add MACD	1.0.1 → 1.1.0
Breaking change	New config format	1.1.0 → 2.0.0
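The bump rules in the table can be expressed as a short helper. This is an illustrative sketch with a made-up function name (`bump_version`); it handles plain MAJOR.MINOR.PATCH versions only and does not understand pre-release suffixes such as `-rc1`.

```shell
# Hypothetical helper: bump a MAJOR.MINOR.PATCH version.
# $1 = current version, $2 = one of: major, minor, patch
bump_version() {
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$2" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown bump type: $2" >&2; return 1 ;;
  esac
  printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

bump_version 1.0.0 patch   # bug fix
bump_version 1.0.1 minor   # new indicator
bump_version 1.1.0 major   # breaking change
```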

CodeArtifact Integration¶

Setup¶

# Login (12-hour token)
just codeartifact-login

# Or manually
export CODEARTIFACT_TOKEN=$(aws codeartifact get-authorization-token \
  --domain tradai \
  --query authorizationToken \
  --output text)

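# CodeArtifact hosts take the form <domain>-<account-id>.d.codeartifact.<region>.amazonaws.com.
# This assumes the tradai domain is owned by the account of the current credentials.
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

pip config set global.extra-index-url \
  "https://aws:${CODEARTIFACT_TOKEN}@tradai-${ACCOUNT_ID}.d.codeartifact.eu-central-1.amazonaws.com/pypi/tradai-python-dev/simple/"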

Publishing tradai-strategy¶

When the tradai-strategy package is updated in tradai-uv, it can be built and uploaded manually:

cd libs/tradai-strategy
uv build
twine upload --repository codeartifact dist/*

In CI, this is handled by publish-libs.yml which publishes to the tradai-python-dev CodeArtifact repository using twine upload --skip-existing.

Consuming in Strategies¶

# strategies/my-strategy/pyproject.toml
[project]
dependencies = [
    "tradai-strategy>=1.0.0",  # From CodeArtifact
    "freqtrade>=2026.4",
]

Adding New Services¶

1. Create service directory: services/<name>/
2. Add Dockerfile: services/<name>/Dockerfile
3. Update docker-build.yml matrix:

matrix:
  include:
    # ... existing services
    - service: new-service
      dockerfile: services/new-service/Dockerfile
      image_name: tradai/new-service

Adding New Lambdas¶

Lambdas are auto-discovered. Just create:

lambdas/
└── new-lambda/
    ├── Dockerfile
    └── handler.py

The next tag release will automatically build and deploy it.

Manual Commands¶

Local Development¶

# Full CI locally
just check                    # lint + typecheck + test

# Individual stages
just lint
just typecheck
just test
just fmt                      # Auto-fix formatting

Strategy Development (in tradai-strategies)¶

# Full CI locally
just ci momentum-strategy

# Individual stages
just lint momentum-strategy
just typecheck momentum-strategy
just test momentum-strategy
just backtest-smoke momentum-strategy

Release Process (tradai-uv)¶

# 1. Ensure main is up to date
git checkout main && git pull

# 2. Tag and push (push only the new tag, not every local tag)
git tag v1.1.0
git push origin v1.1.0

Release Process (tradai-strategies)¶

# 1. Ensure main is up to date
git checkout main && git pull

# 2. Update version in pyproject.toml
# version = "1.0.0" -> "1.1.0"

# 3. Commit
git add .
git commit -m "chore(momentum-strategy): bump version to 1.1.0"
git push

# 4. Create and push tag (push only the new tag, not every local tag)
git tag momentum-strategy-v1.1.0
git push origin momentum-strategy-v1.1.0

E2E Testing the Deployment Pipeline¶

Push a version tag to trigger the full chain: CI, then Docker Build, Deploy Lambdas, and Publish Libraries.

Why run E2E tests¶

The three deployment workflows (docker-build, deploy-lambdas, publish-libs) only trigger on v* tags, not on PRs or pushes to main. Without periodic E2E testing, bugs in deployment workflows go undetected until a real release.

Examples of bugs caught by E2E testing (issue #93):

Version extraction bug: Workflows used GITHUB_REF_NAME which returns main instead of the tag in workflow_run context. Docker images were tagged main instead of the version.
Non-existent environment: Workflows targeted prod resources that had never been provisioned. publish-libs failed with ResourceNotFoundException.

How to run¶

git tag v0.0.0-test
git push origin v0.0.0-test

CI runs first. When it passes, three workflows trigger automatically via workflow_run.

What to check¶

Open the Actions tab and verify each workflow:

publish-libs.yml:

Step	What to look for	Failure means
Get CodeArtifact repository endpoint	`tradai-python-dev` resolved	CodeArtifact repo missing or IAM issue
Build tradai-strategy wheel	Wheel built in `dist/`	Build error in `libs/tradai-strategy`
Publish to CodeArtifact	`twine upload` exit 0	Auth token or network error

deploy-lambdas.yml:

Step	What to look for	Failure means
Calculate Version	`VERSION=v0.0.0-test, environment: dev`	Version extraction broken
Build tradai-common wheel	Wheel artifact uploaded	Build error in `libs/tradai-common`
Build Lambda Base Image	Image pushed to ECR	Dockerfile or ECR permissions
Build lambdas (matrix)	All lambda images pushed	Individual lambda Dockerfile issue
Update Lambda Functions	`Updated successfully` or `Function does not exist (skipping)`	IAM permissions for `lambda:UpdateFunctionCode`

docker-build.yml:

Step	What to look for	Failure means
Extract version from tag	`VERSION=v0.0.0-test` (not `main`)	Version extraction regression
Build and push Docker image	All 4 service images pushed	Dockerfile or ECR issue
Force new deployment (ECS)	`Service not found` is OK in dev	In prod: ECS name mismatch

What gets created in AWS¶

Resource	What happens	Cleanup needed?
Docker images in ECR	Tagged as `v0.0.0-test` + `latest`	No -- `latest` overwrites on next deploy
Lambda images in ECR	Same as above	No
Lambda function code	`update-function-code` points to new image	No -- next deploy overwrites
CodeArtifact wheel	Published with `--skip-existing`	No -- idempotent
ECS service	`force-new-deployment`	No -- in dev this is a no-op

Cleanup¶

Delete the test git tag only:

git push origin --delete v0.0.0-test
git tag -d v0.0.0-test

Cost¶

The deployment workflows only trigger on v* tags; PRs run CI only, and path-based change detection limits which packages are tested. A full E2E test tag run takes approximately 15 minutes of GitHub Actions compute.

Troubleshooting¶

Pipeline Failures¶

Stage	Common Issue	Resolution
Lint	Formatting issues	`just fmt` (platform) or `just fmt $STRATEGY` (strategies)
Typecheck	Missing types	Add type annotations, run `uv run mypy --show-error-codes`
Test	Fixture missing	Check `conftest.py`
Backtest	Data missing	Sync data first
Docker	Build fails	Check Dockerfile, verify build context
ECR	Push denied	Check AWS credentials and IAM permissions
MLflow	Register fails	Check MLflow connection
Lambda discovery	No lambdas found	Ensure `lambdas/<name>/Dockerfile` exists

CodeArtifact Token Expired¶

# Re-login
just codeartifact-login

# CI: Token is auto-refreshed in pipeline

ECR Push Permission Denied¶

# Check AWS credentials
aws sts get-caller-identity

# Verify ECR permissions
aws ecr describe-repositories --repository-names tradai/$SERVICE

# Required IAM permissions for pushing images
# ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:InitiateLayerUpload,
# ecr:UploadLayerPart, ecr:CompleteLayerUpload, ecr:PutImage

Cache Issues¶

# GitHub Actions uses UV cache at ~/.cache/uv
# Cache key includes uv.lock hash -- changing uv.lock invalidates cache automatically
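To illustrate why editing uv.lock invalidates the cache, the sketch below derives a key from the lockfile's content. The `cache_key` function and the `uv-` prefix are made up for this example, and `cksum` stands in for whatever hashing the workflow actually uses (GitHub Actions workflows commonly rely on `hashFiles()` or an action's built-in caching).

```shell
# Hypothetical helper: derive a cache key from a lockfile's content.
cache_key() {
  # cksum prints "<checksum> <byte count>"; keep only the checksum.
  printf 'uv-%s\n' "$(cksum < "$1" | cut -d' ' -f1)"
}

lock="$(mktemp)"
echo 'version = 1' > "$lock"
cache_key "$lock"
echo 'version = 2' > "$lock"
cache_key "$lock"   # different content yields a different key
rm -f "$lock"
```

Because the key changes whenever the lockfile changes, a stale cache entry is simply never looked up again rather than being explicitly deleted.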

Pulumi State Lock¶

# If Pulumi reports state lock:
pulumi cancel  # Cancel stuck operation
# Or manually unlock in S3

VERSION=main instead of tag¶

In workflow_run context, GITHUB_REF_NAME returns main, not the tag. Workflows must use github.event.workflow_run.head_branch to extract the version. If you see images tagged main instead of a version, check this extraction logic.

Failed Lambda Discovery¶

# Check lambdas directory structure
ls -la lambdas/

# Each lambda needs:
# lambdas/<name>/Dockerfile
# lambdas/<name>/handler.py (or similar entry point)

Strategy Lifecycle - Full development workflow
Strategy Repo Guide - Repository setup
Pulumi Deployment - Infrastructure (ECR, CodeArtifact)
Pulumi Operations - Day-to-day infrastructure operations