CI/CD Pipeline Guide¶
Complete guide covering the CI/CD pipelines for both the tradai-uv platform (this repo) and the tradai-strategies repository. All CI/CD runs on GitHub Actions.
Pipeline Overview¶
Platform Pipeline (tradai-uv)¶
graph TD
T1["Push to main / PR"] --> CI1["CI (ci.yml)<br/>changes → lint, typecheck,<br/>test, security, perf (PR)"]
T2["Tag v*"] --> CI2["CI (ci.yml)<br/>lint → typecheck → test → security-scan"]
CI2 -->|"workflow_run<br/>(CI success)"| DB["docker-build<br/>(4 services)"]
CI2 -->|"workflow_run<br/>(CI success)"| DL["deploy-lambdas<br/>(auto-discover)"]
CI2 -->|"workflow_run<br/>(CI success)"| PL["publish-libs<br/>(CodeArtifact)"]
Strategy Pipeline (tradai-strategies)¶
┌─────────────────────────────────────────────────────────────────────┐
│  Pull Request                                                       │
├─────────────────────────────────────────────────────────────────────┤
│  lint → typecheck → unit-tests → smoke-backtest                     │
└─────────────────────────────────────────────────────────────────────┘
                                   ↓
┌─────────────────────────────────────────────────────────────────────┐
│  Main Branch Merge                                                  │
├─────────────────────────────────────────────────────────────────────┤
│  lint → typecheck → unit-tests → smoke-backtest                     │
└─────────────────────────────────────────────────────────────────────┘
                                   ↓
┌─────────────────────────────────────────────────────────────────────┐
│  Tag Release (*-v*)                                                 │
├─────────────────────────────────────────────────────────────────────┤
│  lint → typecheck → tests → full-backtest → docker → ECR → MLflow   │
└─────────────────────────────────────────────────────────────────────┘
GitHub Actions Workflows¶
All workflow files live in .github/workflows/. Reusable workflows are prefixed with _.
1. CI (ci.yml)¶
Triggers: Push to main, tags v*, pull requests, weekly schedule, manual dispatch
Path filtering: Uses dorny/paths-filter@v3 to build a dynamic test matrix. Each package filter lists its own source plus the tradai-common submodules it imports. This avoids the "common changed so test everything" cascade.
graph LR
changes --> MB["matrix-builder"] --> test["test (dynamic matrix)"]
changes --> lint["lint (_lint.yml reusable)"]
changes --> typecheck
changes --> sec["security-scan<br/>(pip-audit + Bandit SAST)"]
changes --> perf["performance-test<br/>(PR only)"]
changes --> contract["contract-tests<br/>(PR + schedule)"]
changes --> integ["integration tests<br/>(schedule/dispatch only)"]
| Job | Description | Duration |
|---|---|---|
| changes | Detect changed paths per package | ~15 sec |
| matrix-builder | Build dynamic test matrix from changes | ~5 sec |
| lint | Ruff linter + formatter (reusable _lint.yml) | ~30 sec |
| typecheck | MyPy type checking (all packages) | ~1 min |
| test | Pytest with 60% coverage per package (matrix) | ~3 min |
| security-scan | pip-audit + Bandit SAST | ~30 sec |
| performance-test | Benchmark tests (PR only) | ~2 min |
| contract-tests | API contract tests with Docker services | ~8 min |
Workflow Coordination (CI Gate)¶
Deployment workflows use the workflow_run trigger to gate on CI success:
| Workflow | Trigger | Condition |
|---|---|---|
| Docker Build | CI success + tag v* | workflow_run completed |
| Publish Libs | CI success + tag v* | workflow_run completed |
| Deploy Lambdas | CI success + tag v* | workflow_run completed |
This prevents deploying code that failed CI checks.
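As a rough illustration of the gate, the condition each deployment workflow applies can be sketched as a small shell function. The name `should_deploy` and its two arguments are hypothetical; in a real workflow the values would come from `github.event.workflow_run.conclusion` and `github.event.workflow_run.head_branch`.

```shell
# Hypothetical helper: decide whether a deployment workflow should proceed.
# $1 = CI conclusion (e.g. "success", "failure")
# $2 = ref that triggered CI (the tag name for tag pushes)
should_deploy() {
  local conclusion="$1" head_ref="$2"
  [ "$conclusion" = "success" ] || return 1   # CI must have passed
  case "$head_ref" in
    v*) return 0 ;;                           # only v* tags deploy
    *)  return 1 ;;
  esac
}

if should_deploy "success" "v1.2.0"; then
  echo "deploy"
else
  echo "skip"
fi
```

A failed CI run or a push to main both return a non-zero status, so neither would reach the deploy steps.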
sequenceDiagram
actor Dev as Developer
participant GH as GitHub
participant CI as CI Workflow<br/>(ci.yml)
participant Docker as Docker Build<br/>(docker-build.yml)
participant Lambdas as Deploy Lambdas<br/>(deploy-lambdas.yml)
participant Publish as Publish Libs<br/>(publish-libs.yml)
Dev->>GH: git push origin v1.2.0 (tag)
GH->>CI: Trigger on tag push (v*)
CI->>CI: changes (path detection)
CI->>CI: lint (Ruff)
CI->>CI: typecheck (mypy)
CI->>CI: test (pytest matrix)
CI->>CI: security-scan (pip-audit + Bandit)
CI-->>GH: CI completed (success)
par workflow_run triggers (parallel)
GH->>Docker: workflow_run: CI success + tag v*
Docker->>Docker: Build 4 service images
Docker->>Docker: Push to ECR
Docker->>Docker: Force redeploy ECS services
and
GH->>Lambdas: workflow_run: CI success + tag v*
Lambdas->>Lambdas: Build tradai-common wheel
Lambdas->>Lambdas: Build base image
Lambdas->>Lambdas: Discover + build all lambdas
Lambdas->>Lambdas: Update Lambda functions
and
GH->>Publish: workflow_run: CI success + tag v*
Publish->>Publish: Build tradai-strategy wheel
Publish->>Publish: Publish to CodeArtifact
end
2. Docker Build (docker-build.yml)¶
Triggers: workflow_run on CI (tag v* only)
Builds and pushes Docker images for all services to ECR, then redeploys ECS:
| Service | Image Name | Dockerfile |
|---|---|---|
| backend | tradai/backend | services/backend/Dockerfile |
| data-collection | tradai/data-collection | services/data-collection/Dockerfile |
| strategy-service | tradai/strategy-service | services/strategy-service/Dockerfile |
| mlflow | tradai/mlflow | services/mlflow/Dockerfile |
Images are tagged with the version (e.g., v1.2.3) and latest. After push, the workflow force-redeploys ECS services (backend-api, data-collection, strategy-service, mlflow).
Version extraction: Uses github.event.workflow_run.head_branch (not GITHUB_REF_NAME), because in a workflow_run context GITHUB_REF_NAME resolves to main rather than the tag.
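A minimal sketch of a defensive version-extraction step follows. The function name `extract_version` is an assumption for illustration, not the code in docker-build.yml; it takes the value of `github.event.workflow_run.head_branch` and refuses anything that does not look like a `v*` tag, so a regression shows up as a failed step rather than an image tagged main.

```shell
# Hypothetical helper: validate and print the release version.
# $1 = value of github.event.workflow_run.head_branch
extract_version() {
  local head_branch="$1"
  case "$head_branch" in
    v*) printf '%s\n' "$head_branch" ;;
    *)
      echo "not a version tag: '$head_branch'" >&2
      return 1
      ;;
  esac
}

# Example call with a literal value; a workflow step would pass the
# head_branch value in through an environment variable instead.
extract_version "v1.2.3"
```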
3. Deploy Lambdas (deploy-lambdas.yml)¶
Triggers: workflow_run on CI (tag v*), manual dispatch
Dynamically discovers and deploys all Lambda functions:
version ──→ build-wheel ──→ build-base ──→ discover ──→ build-lambdas ──→ update-functions
   │                            │             │               │
   │                            │             │               └─ Matrix: all lambdas
   │                            │             └─ Finds lambdas/*/Dockerfile
   │                            └─ Base image with tradai-common
   └─ Calculates version ONCE (prevents race condition)
Lambda Discovery: Automatically finds all directories in lambdas/ with a Dockerfile (excluding base/). No config update needed when adding new lambdas.
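The discovery rule can be sketched in a few lines of shell. This is an illustration under stated assumptions, not the logic in deploy-lambdas.yml: the function name `discover_lambdas` is hypothetical, and it treats every direct subdirectory that contains a Dockerfile as a lambda, skipping base/.

```shell
# Hypothetical sketch: list lambda names under a root directory.
# A lambda is any direct subdirectory containing a Dockerfile, except base/.
discover_lambdas() {
  local root="$1" dir name
  for dir in "$root"/*/; do
    [ -f "${dir}Dockerfile" ] || continue   # skip dirs without a Dockerfile
    name="$(basename "$dir")"
    [ "$name" = "base" ] && continue        # base image is built separately
    printf '%s\n' "$name"
  done
}

# Demo against a throwaway directory tree.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/base" "$demo_root/alpha" "$demo_root/beta" "$demo_root/notes"
touch "$demo_root/base/Dockerfile" "$demo_root/alpha/Dockerfile" "$demo_root/beta/Dockerfile"
discover_lambdas "$demo_root"
rm -rf "$demo_root"
```

In the demo tree, notes/ has no Dockerfile and base/ is excluded, so only alpha and beta are listed.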
Manual Dispatch: Select environment (dev, staging, prod) for emergency deploys that bypass the CI gate.
4. Publish Libraries (publish-libs.yml)¶
Triggers: workflow_run on CI (tag v*)
Publishes the tradai-strategy library wheel to AWS CodeArtifact (tradai-python-dev repository) for consumption by the strategies repository:
# The strategies repo consumes this via:
pip install tradai-strategy --index-url https://...codeartifact.../pypi/tradai-python-dev/simple/
5. Deploy Infrastructure (deploy-infra.yml)¶
Triggers: Pull requests (paths: infra/**), manual dispatch
| Input | Options | Description |
|---|---|---|
| stack | dev, staging, prod | Target environment |
| command | preview, up | Pulumi command |
PR behavior: Runs pulumi preview on all stacks (dev, staging, prod) in parallel via matrix, then posts a summary comment on the PR showing create/update/delete/replace counts per stack.
Manual deployment: Select stack and command from the Actions UI. Uses infra/pulumi-ci.sh to orchestrate all Pulumi layers.
6. Other Workflows¶
| Workflow | File | Purpose |
|---|---|---|
| Deploy Documentation | docs.yml | Build MkDocs and deploy to Cloudflare Pages on main push |
| Devcontainer Prebuild | devcontainer-prebuild.yml | Build and push devcontainer image to GHCR weekly + on .devcontainer/ changes |
| Devcontainer CI | devcontainer-ci.yml | Weekly test suite run inside devcontainer to catch environment drift |
| Docs Freshness | docs-freshness.yml | Documentation freshness checks (schedule) |
Strategy CI/CD Pipeline (tradai-strategies)¶
Note: The CI/CD workflows below run in the tradai-strategies repository, not in the platform repo.
The proprietary strategies repository has its own CI/CD pipeline. This section documents the expected workflow for that separate repo.
Pipeline Stages¶
Stage 1: Lint¶
Checks: Ruff formatting, linting, import sorting.
Failure fix: just fmt $STRATEGY_NAME
Stage 2: Type Check¶
Checks: mypy strict type checking, Protocol compliance.
Failure fix: uv run mypy $STRATEGY_NAME --show-error-codes
Stage 3: Unit Tests¶
Requirements: All tests pass, coverage >= 80%.
Stage 4: Smoke Backtest¶
30-day timerange, single pair (BTC/USDT:USDT), basic validation, ~5 minutes.
Stage 5: Full Backtest (Release Only)¶
12-month timerange, all configured pairs, full metrics validation:
| Metric | Threshold |
|---|---|
| Sharpe Ratio | >= 1.0 |
| Profit Factor | >= 1.2 |
| Max Drawdown | <= 20% |
| Total Trades | >= 50 |
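One way such a gate could be expressed is a shell function that compares four metric values with the thresholds in the table. This is a sketch only: the function name `check_backtest_metrics` and its argument order are assumptions, and how the real pipeline obtains the metrics is not shown in this guide. awk handles the floating-point comparisons.

```shell
# Hypothetical gate: fail when any backtest metric misses its threshold.
# Args: sharpe_ratio profit_factor max_drawdown_pct total_trades
check_backtest_metrics() {
  awk -v sharpe="$1" -v pf="$2" -v dd="$3" -v trades="$4" 'BEGIN {
    ok = 1
    if (sharpe + 0 < 1.0) { print "FAIL sharpe_ratio " sharpe " < 1.0";  ok = 0 }
    if (pf + 0 < 1.2)     { print "FAIL profit_factor " pf " < 1.2";     ok = 0 }
    if (dd + 0 > 20)      { print "FAIL max_drawdown " dd "% > 20%";     ok = 0 }
    if (trades + 0 < 50)  { print "FAIL total_trades " trades " < 50";   ok = 0 }
    if (ok) print "PASS"
    exit (ok ? 0 : 1)
  }'
}

# Example values that satisfy every threshold.
check_backtest_metrics 1.35 1.42 12.5 87
```

The function exits non-zero on any failed threshold, so a CI step calling it would stop the release before the Docker build.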
Stage 6: Docker Build & ECR Push¶
The strategy is packaged as a Docker image using Freqtrade as the base:
FROM freqtradeorg/freqtrade:stable
COPY . /app
RUN pip install /app
COPY configs /freqtrade/user_data/configs
ENTRYPOINT ["freqtrade", "trade"]
Built and pushed to ECR with version and latest tags.
Stage 7: MLflow Registration¶
Registers model version with backtest metrics, Docker image URI, and strategy metadata.
Strategy GitHub Actions Workflow¶
The tradai-strategies repo uses this workflow (in its own .github/workflows/strategy-ci.yml):
name: Strategy CI/CD

on:
  pull_request:
    paths:
      - 'strategies/**'
  push:
    branches: [main]
    tags:
      - '*-v*'

env:
  AWS_REGION: eu-central-1

jobs:
  detect-strategies:
    runs-on: ubuntu-latest
    outputs:
      strategies: ${{ steps.detect.outputs.strategies }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so both commits passed to git diff exist locally
      - id: detect
        run: |
          STRATEGIES=$(git diff --name-only ${{ github.event.before }} ${{ github.sha }} \
            | grep '^strategies/' | cut -d'/' -f2 | sort -u \
            | jq -R -s -c 'split("\n")[:-1]')
          echo "strategies=$STRATEGIES" >> $GITHUB_OUTPUT

  lint-test:
    needs: detect-strategies
    runs-on: ubuntu-latest
    strategy:
      matrix:
        strategy: ${{ fromJson(needs.detect-strategies.outputs.strategies) }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install uv
      - run: cd strategies/${{ matrix.strategy }} && uv sync
      - run: just lint ${{ matrix.strategy }}
      - run: just typecheck ${{ matrix.strategy }}
      - run: just test ${{ matrix.strategy }}

  smoke-backtest:
    # detect-strategies must be a direct dependency for its outputs to appear in the needs context
    needs: [detect-strategies, lint-test]
    runs-on: ubuntu-latest
    strategy:
      matrix:
        strategy: ${{ fromJson(needs.detect-strategies.outputs.strategies) }}
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - name: CodeArtifact Login
        run: |
          export CODEARTIFACT_TOKEN=$(aws codeartifact get-authorization-token \
            --domain tradai --query authorizationToken --output text)
          pip config set global.extra-index-url \
            "https://aws:${CODEARTIFACT_TOKEN}@tradai-${AWS_REGION}.d.codeartifact.${AWS_REGION}.amazonaws.com/pypi/tradai-python-dev/simple/"
      - run: just backtest-smoke ${{ matrix.strategy }}

  release:
    if: startsWith(github.ref, 'refs/tags/')
    needs: smoke-backtest
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Extract strategy from tag
        id: extract
        run: |
          # Tag format: <strategy-slug>-v<semver>
          TAG=${GITHUB_REF#refs/tags/}
          STRATEGY=$(echo "$TAG" | sed 's/-v[0-9].*//')
          VERSION=$(echo "$TAG" | sed 's/.*-v//')
          echo "strategy=$STRATEGY" >> $GITHUB_OUTPUT
          echo "version=$VERSION" >> $GITHUB_OUTPUT
          echo "tag=$TAG" >> $GITHUB_OUTPUT
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}
      - run: just backtest-full ${{ steps.extract.outputs.strategy }}
      - name: Build Docker
        run: |
          docker build \
            -t ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:${{ steps.extract.outputs.tag }} \
            -t ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:latest \
            strategies/${{ steps.extract.outputs.strategy }}
      - name: Push to ECR
        run: |
          aws ecr get-login-password | docker login --username AWS --password-stdin ${{ secrets.ECR_REGISTRY }}
          docker push ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:${{ steps.extract.outputs.tag }}
          docker push ${{ secrets.ECR_REGISTRY }}/${{ steps.extract.outputs.strategy }}:latest
      - run: just mlflow-register ${{ steps.extract.outputs.strategy }} ${{ steps.extract.outputs.tag }}
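The text pipeline in the detect step can be tried locally without a repository. The sketch below reads changed paths on stdin and prints the JSON array that would feed the job matrix. It is a variant, not the workflow's own code: awk replaces jq so the example has no extra dependency, and the function name `changed_strategies_json` is hypothetical.

```shell
# Sketch of the detect step's text pipeline, using awk in place of jq.
# Reads changed file paths on stdin, prints a JSON array of strategy directories.
changed_strategies_json() {
  grep '^strategies/' | cut -d'/' -f2 | sort -u | awk '
    BEGIN { printf "[" }
    { printf "%s\"%s\"", (NR > 1 ? "," : ""), $0 }
    END { print "]" }'
}

# Sample change set: two strategies touched, plus one unrelated file.
printf '%s\n' \
  "strategies/momentum/strategy.py" \
  "strategies/momentum/tests/test_signals.py" \
  "strategies/ml-rsi/pyproject.toml" \
  "README.md" | changed_strategies_json
```

For the sample input this prints `["ml-rsi","momentum"]`. When no path starts with strategies/, it prints `[]`.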
Required Secrets¶
GitHub Actions Secrets (tradai-uv)¶
Configure in GitHub Settings, then Secrets and variables, then Actions:
| Secret | Description | Used by |
|---|---|---|
| AWS_ACCESS_KEY_ID | AWS access key | All deployment workflows |
| AWS_SECRET_ACCESS_KEY | AWS secret key | All deployment workflows |
| AWS_REGION | AWS region (e.g., eu-central-1) | All deployment workflows |
| AWS_ACCOUNT_ID | AWS account ID | deploy-lambdas |
| PULUMI_CONFIG_PASSPHRASE | Pulumi encryption passphrase | deploy-infra |
| S3_PULUMI_BACKEND_URL | Pulumi state backend (e.g., s3://tradai-pulumi-state) | deploy-infra |
| CLOUDFLARE_API_TOKEN | Cloudflare Pages API token | docs |
| CLOUDFLARE_ACCOUNT_ID | Cloudflare account ID | docs |
GitHub Actions Secrets (tradai-strategies)¶
| Secret | Description | Used by |
|---|---|---|
| AWS_ACCESS_KEY_ID | ECR/CodeArtifact access | smoke-backtest, release |
| AWS_SECRET_ACCESS_KEY | ECR/CodeArtifact access | smoke-backtest, release |
| ECR_REGISTRY | ECR registry URL | release |
| MLFLOW_TRACKING_URI | MLflow server URL | release |
Versioning Convention¶
Tag Format¶
Platform (tradai-uv): v<semver> (e.g., v1.0.0, v0.0.0-test)
Strategies (tradai-strategies): <strategy-slug>-v<semver> (e.g., momentum-v1.0.0, trend-following-v2.1.3, ml-rsi-v1.0.0-rc1)
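To make the strategy tag convention concrete, here is a small sketch that splits a tag into its slug and version using shell parameter expansion. The function name `parse_strategy_tag` is hypothetical. For the example tags above it yields the same split as the sed expressions in the strategy workflow.

```shell
# Hypothetical helper for tags of the form <strategy-slug>-v<semver>.
# Prints "<slug> <version>", or fails if the tag does not match.
parse_strategy_tag() {
  local tag="$1"
  case "$tag" in
    *-v[0-9]*) ;;
    *)
      echo "unexpected tag format: $tag" >&2
      return 1
      ;;
  esac
  # Strip the "-v<digit>..." suffix for the slug, and everything up to "-v" for the version.
  printf '%s %s\n' "${tag%-v[0-9]*}" "${tag##*-v}"
}

parse_strategy_tag "momentum-v1.0.0"
parse_strategy_tag "trend-following-v2.1.3"
parse_strategy_tag "ml-rsi-v1.0.0-rc1"
```

A platform-style tag such as v1.0.0 has no slug and is rejected, which keeps the two conventions from being mixed up.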
Version Incrementing¶
| Change Type | Example | Version Bump |
|---|---|---|
| Bug fix | Fix signal calculation | 1.0.0 → 1.0.1 |
| New indicator | Add MACD | 1.0.1 → 1.1.0 |
| Breaking change | New config format | 1.1.0 → 2.0.0 |
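The bump rules in the table can be sketched as a shell function. This is an illustration only: `bump_version` is a hypothetical helper, not a recipe from either repository, and it handles plain MAJOR.MINOR.PATCH versions without pre-release suffixes.

```shell
# Hypothetical helper: bump a MAJOR.MINOR.PATCH version.
# Usage: bump_version <version> <major|minor|patch>
bump_version() {
  local version="$1" part="$2" major minor patch
  major="${version%%.*}"                         # text before the first dot
  patch="${version##*.}"                         # text after the last dot
  minor="${version#*.}"; minor="${minor%%.*}"    # text between the two dots
  case "$part" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *)
      echo "unknown part: $part" >&2
      return 1
      ;;
  esac
  printf '%s.%s.%s\n' "$major" "$minor" "$patch"
}

bump_version 1.0.0 patch   # bug fix
bump_version 1.0.1 minor   # new indicator
bump_version 1.1.0 major   # breaking change
```

The three calls reproduce the rows of the table in order.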
CodeArtifact Integration¶
Setup¶
# Login (12-hour token)
just codeartifact-login
# Or manually
export CODEARTIFACT_TOKEN=$(aws codeartifact get-authorization-token \
--domain tradai \
--query authorizationToken \
--output text)
pip config set global.extra-index-url \
"https://aws:${CODEARTIFACT_TOKEN}@tradai-eu-central-1.d.codeartifact.eu-central-1.amazonaws.com/pypi/tradai-python-dev/simple/"
Publishing tradai-strategy¶
When the tradai-strategy package is updated in tradai-uv, a new wheel must be published. In CI, this is handled by publish-libs.yml, which publishes to the tradai-python-dev CodeArtifact repository using twine upload --skip-existing.
Consuming in Strategies¶
# strategies/my-strategy/pyproject.toml
[project]
dependencies = [
    "tradai-strategy>=1.0.0",  # From CodeArtifact
    "freqtrade>=2025.6",
]
Adding New Services¶
- Create service directory: services/<name>/
- Add Dockerfile: services/<name>/Dockerfile
- Update docker-build.yml matrix:
matrix:
  include:
    # ... existing services
    - service: new-service
      dockerfile: services/new-service/Dockerfile
      image_name: tradai/new-service
Adding New Lambdas¶
Lambdas are auto-discovered. Just create a lambdas/<name>/ directory containing a Dockerfile and the handler code (for example, handler.py).
The next tag release will automatically build and deploy it.
Manual Commands¶
Local Development¶
# Full CI locally
just check # lint + typecheck + test
# Individual stages
just lint
just typecheck
just test
just fmt # Auto-fix formatting
Strategy Development (in tradai-strategies)¶
# Full CI locally
just ci momentum-strategy
# Individual stages
just lint momentum-strategy
just typecheck momentum-strategy
just test momentum-strategy
just backtest-smoke momentum-strategy
Release Process (tradai-uv)¶
# 1. Ensure main is up to date
git checkout main && git pull
# 2. Tag and push
git tag v1.1.0
git push origin --tags
Release Process (tradai-strategies)¶
# 1. Ensure main is up to date
git checkout main && git pull
# 2. Update version in pyproject.toml
# version = "1.0.0" -> "1.1.0"
# 3. Commit
git add .
git commit -m "chore(momentum-strategy): bump version to 1.1.0"
git push
# 4. Create and push tag
git tag momentum-strategy-v1.1.0
git push --tags
E2E Testing the Deployment Pipeline¶
Push a version tag to trigger the full chain: CI, then Docker Build, Deploy Lambdas, and Publish Libraries.
Why run E2E tests¶
The three deployment workflows (docker-build, deploy-lambdas, publish-libs) only trigger on v* tags, not on PRs or pushes to main. Without periodic E2E testing, bugs in deployment workflows go undetected until a real release.
Examples of bugs caught by E2E testing (issue #93):
- Version extraction bug: Workflows used GITHUB_REF_NAME, which returns main instead of the tag in a workflow_run context. Docker images were tagged main instead of the version.
- Non-existent environment: Workflows targeted prod resources that had never been provisioned. publish-libs failed with ResourceNotFoundException.
How to run¶
Create and push a test tag (for example, v0.0.0-test) from main. CI runs first. When it passes, the three deployment workflows trigger automatically via workflow_run.
What to check¶
Open the Actions tab and verify each workflow:
publish-libs.yml:
| Step | What to look for | Failure means |
|---|---|---|
| Get CodeArtifact repository endpoint | tradai-python-dev resolved | CodeArtifact repo missing or IAM issue |
| Build tradai-strategy wheel | Wheel built in dist/ | Build error in libs/tradai-strategy |
| Publish to CodeArtifact | twine upload exit 0 | Auth token or network error |
deploy-lambdas.yml:
| Step | What to look for | Failure means |
|---|---|---|
| Calculate Version | VERSION=v0.0.0-test, environment: dev | Version extraction broken |
| Build tradai-common wheel | Wheel artifact uploaded | Build error in libs/tradai-common |
| Build Lambda Base Image | Image pushed to ECR | Dockerfile or ECR permissions |
| Build lambdas (matrix) | All lambda images pushed | Individual lambda Dockerfile issue |
| Update Lambda Functions | Updated successfully or Function does not exist (skipping) | IAM permissions for lambda:UpdateFunctionCode |
docker-build.yml:
| Step | What to look for | Failure means |
|---|---|---|
| Extract version from tag | VERSION=v0.0.0-test (not main) | Version extraction regression |
| Build and push Docker image | All 4 service images pushed | Dockerfile or ECR issue |
| Force new deployment (ECS) | Service not found is OK in dev | In prod: ECS name mismatch |
What gets created in AWS¶
| Resource | What happens | Cleanup needed? |
|---|---|---|
| Docker images in ECR | Tagged as v0.0.0-test + latest | No -- latest overwrites on next deploy |
| Lambda images in ECR | Same as above | No |
| Lambda function code | update-function-code points to new image | No -- next deploy overwrites |
| CodeArtifact wheel | Published with --skip-existing | No -- idempotent |
| ECS service | force-new-deployment | No -- in dev this is a no-op |
Cleanup¶
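Delete the test git tag only, both locally (git tag -d v0.0.0-test) and on the remote (git push origin --delete v0.0.0-test). The AWS resources listed above need no cleanup.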
Cost¶
These workflows only trigger on v* tags. PRs only run CI with smart change detection. A full E2E test tag run takes approximately 15 minutes of GitHub Actions compute.
Troubleshooting¶
Pipeline Failures¶
| Stage | Common Issue | Resolution |
|---|---|---|
| Lint | Formatting issues | just fmt (platform) or just fmt $STRATEGY (strategies) |
| Typecheck | Missing types | Add type annotations, run uv run mypy --show-error-codes |
| Test | Fixture missing | Check conftest.py |
| Backtest | Data missing | Sync data first |
| Docker | Build fails | Check Dockerfile, verify build context |
| ECR | Push denied | Check AWS credentials and IAM permissions |
| MLflow | Register fails | Check MLflow connection |
| Lambda discovery | No lambdas found | Ensure lambdas/<name>/Dockerfile exists |
CodeArtifact Token Expired¶
Tokens expire after 12 hours. Run just codeartifact-login again, or repeat the manual steps under CodeArtifact Integration, to obtain a fresh token.
ECR Push Permission Denied¶
# Check AWS credentials
aws sts get-caller-identity
# Verify ECR permissions
aws ecr describe-repositories --repository-names tradai/$SERVICE
# Required IAM permissions
# ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:PutImage
Cache Issues¶
# GitHub Actions uses UV cache at ~/.cache/uv
# Cache key includes uv.lock hash -- changing uv.lock invalidates cache automatically
Pulumi State Lock¶
VERSION=main instead of tag¶
In workflow_run context, GITHUB_REF_NAME returns main, not the tag. Workflows must use github.event.workflow_run.head_branch to extract the version. If you see images tagged main instead of a version, check this extraction logic.
Failed Lambda Discovery¶
# Check lambdas directory structure
ls -la lambdas/
# Each lambda needs:
# lambdas/<name>/Dockerfile
# lambdas/<name>/handler.py (or similar entry point)
Related Documentation¶
- Strategy Lifecycle - Full development workflow
- Strategy Repo Guide - Repository setup
- Pulumi Deployment - Infrastructure (ECR, CodeArtifact)
- Pulumi Operations - Day-to-day infrastructure operations