
TradAI Testing Strategy

Version: 1.0.0 | Date: 2026-03-28 | Status: CURRENT | Source: tests/, pyproject.toml, justfile, .devcontainer/


1. TL;DR

18 test suite categories, 14 pytest markers, 80% coverage minimum. Parallel execution via pytest-xdist. LocalStack for AWS mocking. Devcontainer for reproducible environments with weekly CI validation.


2. Test Pyramid

graph TB
    subgraph pyramid["Test Pyramid"]
        direction TB
        PERF["Performance / Load<br/><i>Locust, pytest-benchmark</i><br/>minutes"]
        E2E["E2E<br/><i>Full workflows, Docker services</i><br/>30-120s"]
        CONTRACT["Contract<br/><i>API schema, backward compat</i><br/>5-15s"]
        INTEGRATION["Integration<br/><i>Service-to-service, LocalStack</i><br/>5-30s"]
        UNIT["Unit<br/><i>Isolated, fast, many</i><br/>&lt;1s each"]
    end

    PERF --> E2E
    E2E --> CONTRACT
    CONTRACT --> INTEGRATION
    INTEGRATION --> UNIT

    style UNIT fill:#4CAF50,color:#fff
    style INTEGRATION fill:#8BC34A,color:#fff
    style CONTRACT fill:#FFC107,color:#000
    style E2E fill:#FF9800,color:#fff
    style PERF fill:#F44336,color:#fff

Guiding principle: Most tests should be unit tests (fast, isolated). Integration and E2E tests validate cross-boundary behavior. Performance and load tests run on demand or in scheduled CI.


3. Test Suite Categories

Cross-cutting suites live under tests/ at the repository root. Package-level tests, including the FreqAI suite, live under libs/*/tests/ and services/*/tests/.

| # | Category | Directory | Purpose | Marker | Services? | Typical Duration |
|---|----------|-----------|---------|--------|-----------|------------------|
| 1 | Unit | tests/unit/ | Isolated logic, no external deps | unit | No | <1s per test |
| 2 | Integration | tests/integration/ | Service-to-service, AWS via LocalStack | integration | Yes | 5-30s |
| 3 | Smoke | tests/smoke/ | Quick sanity after deployment | smoke | Yes | 2-5s |
| 4 | Contract | tests/contract/ | API schema validation, backward compat | contract | Yes | 5-15s |
| 5 | Financial | tests/financial/ | P&L precision, rounding, currency calcs | financial | No | <5s |
| 6 | Backtest Validation | tests/backtest_validation/ | Look-ahead bias, reproducibility | backtest_validation | No | 5-30s |
| 7 | E2E | tests/e2e/ | Full user workflows end-to-end | e2e | Yes | 30-120s |
| 8 | Security | tests/security/ | Injection, XSS, path traversal, input validation | security | Yes | 5-15s |
| 9 | Data Quality | tests/data_quality/ | Completeness, accuracy, consistency | data_quality | No | 5-15s |
| 10 | Resilience | tests/resilience/ | Circuit breaker, retry, timeout behavior | resilience | Yes | 10-30s |
| 11 | Concurrency | tests/concurrency/ | Thread safety, race conditions | concurrency | Yes | 10-30s |
| 12 | Regression | tests/regression/ | Previously fixed bugs | regression | No | <10s |
| 13 | Performance | tests/performance/ | Duration and memory regression tests | performance | No | 10-60s |
| 14 | Slow | tests/slow/ | Real Freqtrade backtest execution | slow | No | 1-5 min |
| 15 | Load | tests/load/ | Locust-based throughput/latency tests | N/A | Yes | 1-10 min |
| 16 | Fixtures | tests/fixtures/ | Shared test data (strategies, configs) | N/A | N/A | N/A |
| 17 | Lambda Unit | tests/unit/lambdas/ | Lambda handler unit tests | unit | No | <5s |
| 18 | FreqAI | libs/tradai-strategy/tests/freqai/ | ML model tests (LightGBM, XGBoost, CatBoost) | N/A | No | 10-60s |

4. Pytest Markers

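Registered in pyproject.toml under [tool.pytest.ini_options]. All markers are enforced via --strict-markers, so applying a marker that is not registered there raises an error instead of being silently accepted.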

| Marker | Description | Default CI Behavior |
|--------|-------------|---------------------|
| unit | Isolated, no external dependencies | Included |
| integration | Requires running services or LocalStack | Excluded from fast runs |
| slow | May take >1s; real backtests require RUN_SLOW_TESTS=1 | Excluded |
| smoke | Quick sanity checks, run pre-deployment | Included |
| financial | P&L calculations, decimal precision | Included |
| backtest_validation | Look-ahead bias, reproducibility checks | Included |
| e2e | Complete user workflows | Excluded from fast runs |
| contract | API schema validation, backward compatibility | Included |
| security | Injection, XSS, input validation | Included |
| data_quality | Data completeness, accuracy, consistency | Included |
| resilience | Circuit breaker, retry, timeout | Included |
| concurrency | Thread safety, race conditions | Included |
| regression | Previously fixed bugs | Included |
| performance | Duration and memory regression benchmarks | Excluded from fast runs |

Devcontainer CI exclusions

The weekly devcontainer CI job (devcontainer-ci.yml) runs with -m "not slow and not e2e and not performance and not integration" to keep the validation under 30 minutes.
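The -m expression above is a conjunction of "not <marker>" terms. As a minimal sketch, assuming a hypothetical helper named exclusion_expression that is not part of the repository, the same string can be assembled from a list of marker names:

```python
def exclusion_expression(excluded_markers: list[str]) -> str:
    """Build a pytest -m expression that deselects every listed marker."""
    return " and ".join(f"not {marker}" for marker in excluded_markers)


# Markers excluded by the weekly devcontainer CI job, as listed in the text.
DEVCONTAINER_EXCLUDED = ["slow", "e2e", "performance", "integration"]
expression = exclusion_expression(DEVCONTAINER_EXCLUDED)
```

The resulting string is what would be passed to pytest after -m; keeping the list in one place makes it harder for the fast-run and devcontainer exclusions to drift apart.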


5. Quality Gates

| Command | What It Runs | When to Use |
|---------|--------------|-------------|
| just check | Lint + typecheck + env-access check + LocalStack tests | Before every PR / push |
| just check-quick | Lint + typecheck + env-access check (no tests) | Quick iteration during development |
| just check-full | Lint + typecheck + all tests + strategy validation | Pre-release, comprehensive validation |
| just test | All tests (parallel, no coverage) | General test run |
| just test-fast | Excludes integration, e2e, slow, performance | Rapid feedback loop |
| just test-strict | All tests with --cov and HTML/XML reports | CI coverage enforcement |
| just test-cov | All tests with coverage (no fail threshold) | Check coverage without gating |
| just test-package <pkg> | Tests for a single workspace package | Focused package work |
| just test-suite <suite> | Single suite by name (e.g., financial, security) | Targeted validation |
| just test-integration | Integration tests only (marker-filtered) | After service changes |
| just test-slow | Real Freqtrade backtests (RUN_SLOW_TESTS=1) | Backtest validation |
| just test-lambdas | Lambda handler unit tests | After Lambda changes |
| just test-freqai | FreqAI ML model tests | After ML lib changes |
| just test-durations | Profile slowest tests (top 25 at >1s) | Performance investigation |

Slow test gate

just test-slow sets RUN_SLOW_TESTS=1 automatically. When invoking pytest directly, set the variable yourself, for example inline: RUN_SLOW_TESTS=1 uv run pytest tests/slow/ -v -m slow --no-cov. These tests require Freqtrade to be installed and OHLCV data files to be present in user_data/data/.
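The gate itself can be pictured as a small environment check. The sketch below is an assumption about how such a check might look, not the repository's actual skip_slow_tests implementation; the function name slow_tests_enabled is hypothetical, and treating only the exact value "1" as enabled is also an assumption.

```python
import os
from collections.abc import Mapping


def slow_tests_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when RUN_SLOW_TESTS is set to exactly "1"."""
    return env.get("RUN_SLOW_TESTS") == "1"
```

A conftest could feed the result into pytest.mark.skipif(not slow_tests_enabled(), reason="RUN_SLOW_TESTS=1 not set") so that slow tests are skipped rather than failed when the variable is missing.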


6. Test Infrastructure

6.1 LocalStack

LocalStack provides local AWS service mocking for integration and unit tests that interact with AWS APIs.

Mocked services: S3, DynamoDB, SQS, SNS, ECR, Lambda

Start command:

just up-localstack
# or directly:
docker compose --profile localstack up -d

Environment variables: LOCALSTACK_URL (default http://localhost:4566), AWS_ENDPOINT_URL (set by just check to redirect boto3), AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY (testing), AWS_REGION (us-east-1).
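As an illustration of how these variables fit together, the following sketch collects them into keyword arguments accepted by boto3.client(). The helper name localstack_client_kwargs is hypothetical, and letting AWS_ENDPOINT_URL take precedence over LOCALSTACK_URL is an assumption rather than documented behavior.

```python
import os
from collections.abc import Mapping


def localstack_client_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect keyword arguments for boto3.client() that target LocalStack."""
    endpoint = env.get("AWS_ENDPOINT_URL") or env.get(
        "LOCALSTACK_URL", "http://localhost:4566"
    )
    return {
        "endpoint_url": endpoint,
        # Dummy credentials; LocalStack does not validate them.
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID", "testing"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY", "testing"),
        "region_name": env.get("AWS_REGION", "us-east-1"),
    }
```

With boto3 installed, a test could then create a client with boto3.client("s3", **localstack_client_kwargs()).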

The root conftest.py provides session-scoped fixtures that auto-start LocalStack and create test buckets (e.g., tradai-configs-test). Cleanup runs only on pytest-xdist worker gw0 to avoid race conditions during bucket teardown.
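One way to restrict teardown to a single worker is to read the PYTEST_XDIST_WORKER environment variable that pytest-xdist sets in each worker process. This is a sketch under that assumption; the helper name is_cleanup_worker is hypothetical and the repository's conftest.py may detect the worker differently.

```python
import os
from collections.abc import Mapping


def is_cleanup_worker(env: Mapping[str, str] = os.environ) -> bool:
    """Return True on worker gw0, or when pytest-xdist is not in use."""
    # pytest-xdist sets PYTEST_XDIST_WORKER to "gw0", "gw1", ... in each
    # worker process; the variable is absent in a non-distributed run.
    return env.get("PYTEST_XDIST_WORKER", "gw0") == "gw0"
```

Defaulting to "gw0" when the variable is absent means a plain, non-parallel pytest run still performs the cleanup.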

6.2 Devcontainer

The devcontainer provides a fully reproducible development environment with all tools pre-installed.

Configuration: .devcontainer/devcontainer.json

Base image: ghcr.io/astral-sh/uv:python3.11-bookworm-slim

What's included:

| Component | Version / Details |
|-----------|-------------------|
| Python | 3.11 (from base image) |
| UV | From base image |
| Just | 1.42.4 |
| Pulumi | 3.210.0 |
| Docker CLI + Compose | From Docker official repo |
| PostgreSQL client | For DB debugging |
| OpenMP (libgomp1) | Required by LightGBM/XGBoost |
| Git + LFS | Version control |

Pre-built images: The devcontainer-prebuild.yml workflow builds and pushes a cached image to GHCR on every push to main that touches .devcontainer/ files, plus weekly rebuilds (Sunday 00:00 UTC) for base-image security updates. Local builds pull cached layers via cacheFrom.

Weekly CI validation: The devcontainer-ci.yml workflow runs every Sunday at 02:00 UTC. It builds the devcontainer image, syncs dependencies, and runs the full test suite (excluding slow, e2e, performance, integration) inside the container. This catches environment drift between the devcontainer and bare-runner CI.

Forwarded ports: 8000 (Backend), 8002 (Data Collection), 8003 (Strategy), 5433 (PostgreSQL), 5001 (MLflow), 4566 (LocalStack), 6379 (Redis).

6.3 Git Worktree Testing

When running tests in a git worktree (e.g., .claude/worktrees/<name>), Python imports may resolve to the main repo instead of the worktree's modified source files due to UV's editable install .pth files.

Fix: Prepend worktree source paths to PYTHONPATH:

wt-test() {
    local WT="$(pwd)"
    PYTHONPATH="$WT/libs/tradai-common/src:$WT/libs/tradai-data/src:..." \
    uv run pytest "$@"
}

Full documentation: docs/ai/worktree-testing.md

Worktree caveat

just check and just test do not set PYTHONPATH -- they run against main repo source. Use the wt-test helper or the one-liner from docs/ai/worktree-testing.md instead.
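The PYTHONPATH value can also be derived instead of typed by hand. The sketch below assumes every library follows the libs/<name>/src layout seen in the two paths of the wt-test helper; whether the entries elided there all match that pattern is not confirmed by this document, and the function name worktree_pythonpath is hypothetical.

```python
import os
from pathlib import Path


def worktree_pythonpath(worktree_root: Path) -> str:
    """Join every libs/*/src directory under the worktree into a PYTHONPATH value."""
    src_dirs = sorted(p for p in worktree_root.glob("libs/*/src") if p.is_dir())
    return os.pathsep.join(str(p) for p in src_dirs)
```

The returned string can be exported as PYTHONPATH before calling uv run pytest from inside the worktree.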


7. Coverage Configuration

Coverage is configured in pyproject.toml under [tool.coverage.run] and [tool.coverage.report].

Target: 80% minimum (enforced in just test-strict via --cov-fail-under)

Source directories:

[tool.coverage.run]
source = ["libs", "services", "cli"]
omit = ["*/tests/*", "*/__pycache__/*", "*/.venv/*"]

Reports generated:

| Format | Output | Purpose |
|--------|--------|---------|
| Terminal | --cov-report=term-missing | Quick review of uncovered lines |
| HTML | --cov-report=html | htmlcov/ directory for detailed browsing |
| XML | --cov-report=xml | CI integration (e.g., Codecov upload) |

Excluded lines:

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "if __name__ == .__main__.:",
    "raise AssertionError",
    "raise NotImplementedError",
    "if TYPE_CHECKING:",
]
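Entries in exclude_lines are regular expressions, which explains the dots around __main__: each dot matches any single character, so one pattern covers both quote styles. A small check with Python's re module illustrates this (the sample lines are made up for the demonstration):

```python
import re

# Pattern copied from exclude_lines; each "." matches any single character,
# so it covers both single and double quotes around __main__.
MAIN_GUARD = re.compile(r"if __name__ == .__main__.:")

double_quoted = 'if __name__ == "__main__":'
single_quoted = "if __name__ == '__main__':"
unrelated = "if name == main:"

matches = [
    bool(MAIN_GUARD.search(line))
    for line in (double_quoted, single_quoted, unrelated)
]
```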

Note: Most just test commands pass --no-cov to disable coverage for faster feedback. Use just test-cov or just test-strict explicitly when coverage data is needed.
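When only the XML report is at hand, the 80% target can be checked after the fact. The sketch below assumes the Cobertura-style layout that coverage.py writes, where the root coverage element carries a line-rate attribute between 0 and 1; the function names and the sample report string are illustrative only.

```python
import xml.etree.ElementTree as ET


def coverage_percent(xml_text: str) -> float:
    """Return overall line coverage (0-100) from a Cobertura-style report."""
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"]) * 100.0


def meets_threshold(xml_text: str, minimum: float = 80.0) -> bool:
    """Mirror the effect of --cov-fail-under for an already generated report."""
    return coverage_percent(xml_text) >= minimum


# Made-up report fragment for demonstration purposes.
sample_report = '<coverage line-rate="0.8125" branch-rate="0"></coverage>'
```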


8. Specialized Test Types

8.1 Financial Tests

Directory: tests/financial/ | Marker: financial

Financial domain tests validate monetary precision and correctness:

  • P&L calculations using Decimal types (not float) for exact arithmetic
  • Fee computations with configurable rate precision
  • Trade entry/exit price validation
  • Leverage and position sizing accuracy

The root conftest.py provides sample_trade (with Decimal prices, quantities, fees), sample_ohlcv_df, invalid_ohlcv_df (high < low), and ohlcv_with_gaps (missing candles) fixtures.
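The preference for Decimal over float can be seen in a few lines. This is a minimal sketch: the net_pnl function, its fee model (the rate charged on both entry and exit notional), and the sample numbers are assumptions made for illustration, not the project's actual calculation.

```python
from decimal import ROUND_HALF_EVEN, Decimal


def net_pnl(entry: Decimal, exit_: Decimal, quantity: Decimal, fee_rate: Decimal) -> Decimal:
    """Net P&L of a long trade, charging fee_rate on entry and exit notional."""
    gross = (exit_ - entry) * quantity
    fees = (entry + exit_) * quantity * fee_rate
    return gross - fees


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_is_exact = (0.1 + 0.2) == 0.3
decimal_is_exact = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

pnl = net_pnl(Decimal("100.00"), Decimal("101.00"), Decimal("2"), Decimal("0.001"))
rounded = pnl.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
```

Constructing Decimal values from strings rather than floats matters here: Decimal(0.1) would inherit the float's representation error.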

8.2 Backtest Validation

Directory: tests/backtest_validation/ | Marker: backtest_validation

Validates backtest integrity to prevent common quantitative research pitfalls:

  • Look-ahead bias detection -- ensures strategies only use data available at decision time
  • Reproducibility -- same config + data must produce identical results across runs
  • Strategy compliance -- validates that strategy configs match expected schemas
  • Result sanity -- checks profit factor, Sharpe ratio, drawdown within realistic bounds
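One common way to detect look-ahead bias is a truncation check: an indicator's value at time t must not change when later data is removed. The sketch below shows that idea with plain lists; it is not the project's test code, and the names has_lookahead_bias, trailing_mean, and centered_mean are hypothetical.

```python
from typing import Callable, Sequence

Indicator = Callable[[Sequence[float]], list[float]]


def has_lookahead_bias(indicator: Indicator, prices: Sequence[float], tol: float = 1e-12) -> bool:
    """Return True if any value changes once future prices are hidden."""
    full = indicator(prices)
    for t in range(len(prices)):
        truncated = indicator(prices[: t + 1])
        if abs(truncated[t] - full[t]) > tol:
            return True
    return False


def trailing_mean(prices: Sequence[float], window: int = 3) -> list[float]:
    """Mean of the current and previous candles only (no future data)."""
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(prices: Sequence[float]) -> list[float]:
    """Mean of previous, current, and NEXT candle (peeks one step ahead)."""
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - 1) : i + 2]
        out.append(sum(chunk) / len(chunk))
    return out
```

The trailing average passes the check, while the centered average fails it because each value depends on the following candle.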

8.3 Performance Regression Tests

Directory: tests/performance/ | Marker: performance

Uses pytest-benchmark and a custom Timer context manager to detect execution-time and memory regressions.

Baseline management: Baseline metrics are stored in tests/performance/baselines.json. The assert_no_regression fixture compares current measurements against baselines with a configurable threshold (default: 20% degradation tolerance).

Memory tracking: Uses memory-profiler (dev dependency) for memory usage assertions.
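The timing and threshold logic can be sketched as follows. The Timer class here is a guess at what such a context manager might look like, and is_regression is a hypothetical helper showing the 20% tolerance rule; neither is taken from the repository.

```python
import time


class Timer:
    """Context manager that records elapsed wall-clock seconds."""

    def __enter__(self) -> "Timer":
        self.elapsed = 0.0
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Returning None lets any exception from the timed block propagate.
        self.elapsed = time.perf_counter() - self._start


def is_regression(current: float, baseline: float, tolerance: float = 0.20) -> bool:
    """True when current exceeds baseline by more than the tolerance fraction."""
    return current > baseline * (1.0 + tolerance)
```

With a baseline of 1.0s, a measurement of 1.19s stays within tolerance while 1.21s counts as a regression.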

8.4 Load Testing

Directory: tests/load/ | Framework: Locust

Locust-based load tests simulate realistic traffic patterns against all three services.

Available user classes:

| Class | Target | Source |
|-------|--------|--------|
| BackendUser | Backend API (:8000) | tests/load/services/backend.py |
| DataCollectionUser | Data Collection (:8002) | tests/load/services/data_collection.py |
| StrategyServiceUser | Strategy Service (:8003) | tests/load/services/strategy_service.py |
| BacktestWorkflowUser | E2E backtest workflow | tests/load/scenarios/backtest_workflow.py |

Commands:

just load-test                          # Web UI at localhost:8089
just load-test-ci 10 2 1m              # Headless: 10 users, 2/s spawn, 1 min
just load-test-all                      # All services sequentially
just load-test-backend 5 30s           # Backend only
just load-test-data-collection 5 30s   # Data collection only
just load-test-strategy 5 30s          # Strategy service only

Reports are written to reports/load-test-*.html.

8.5 Security Tests

Directory: tests/security/ | Marker: security

The root conftest.py provides injection payload fixtures:

  • sql_injection_payloads -- 5 SQL injection vectors
  • xss_payloads -- 5 XSS vectors
  • path_traversal_payloads -- 4 path traversal vectors

Tests validate that all API endpoints reject malicious input without exposing internal errors.
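To illustrate what a payload-driven check might assert, the sketch below validates that a user-supplied path stays inside a base directory. Both the stays_inside helper and the example payloads are made up for this illustration; they are not the contents of path_traversal_payloads or the services' actual validation code.

```python
from pathlib import Path


def stays_inside(base_dir: Path, user_supplied: str) -> bool:
    """Return True if user_supplied resolves to a location under base_dir."""
    base = base_dir.resolve()
    # An absolute user_supplied path replaces base entirely when joined,
    # so it is caught by the containment check as well.
    candidate = (base / user_supplied).resolve()
    return candidate == base or base in candidate.parents


# Illustrative payloads only (hypothetical, not the fixture contents).
EXAMPLE_PAYLOADS = ["../../etc/passwd", "/etc/passwd", "configs/../../secrets.env"]
```

A parametrized test would loop over the payload fixture and assert that every entry is rejected, while a benign relative path such as strategies/example.json is accepted.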


9. Root Conftest Fixtures

The root tests/conftest.py provides shared fixtures across all test categories.

Key session-scoped fixtures

docker_compose_up auto-starts Docker services; wait_for_services polls health endpoints for up to 120s; localstack_up starts LocalStack via --profile localstack; localstack_s3_config_bucket (autouse) creates the S3 config bucket.
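The polling behavior described for wait_for_services can be pictured as a deadline loop. This is a generic sketch rather than the fixture's real code: the wait_until name, the injected is_healthy callable, and the one-second default interval are assumptions, with only the 120s limit taken from the text.

```python
import time
from typing import Callable


def wait_until(
    is_healthy: Callable[[], bool],
    timeout_s: float = 120.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll is_healthy until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_healthy():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Using time.monotonic() keeps the deadline immune to system clock adjustments, and passing the health check in as a callable makes the loop testable without running services.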

HTTP client fixtures: backend_client, data_collection_client, strategy_service_client (sync httpx.Client), async_backend_client (async).

Skip conditions: skip_slow_tests (needs RUN_SLOW_TESTS=1), skip_if_no_services (Docker down), skip_if_ci (CI=true), skip_if_no_localstack (health endpoint unreachable).

Data generator: generate_ohlcv_data() produces synthetic OHLCV DataFrames with configurable symbol, periods, frequency, and base price. Uses numpy with seed 42 for reproducibility.
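The reproducibility idea behind a fixed seed can be shown without third-party dependencies. Note that this sketch deliberately swaps numpy for the standard library's random.Random and returns plain dicts instead of a DataFrame; the generate_candles name, the price model, and the value ranges are all assumptions for illustration.

```python
import random


def generate_candles(
    periods: int = 5, base_price: float = 100.0, seed: int = 42
) -> list[dict[str, float]]:
    """Produce a seeded random walk of OHLCV candles with high >= low."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    candles = []
    price = base_price
    for _ in range(periods):
        open_ = price
        close = open_ * (1.0 + rng.uniform(-0.01, 0.01))
        # Wicks extend beyond the candle body, never inside it.
        high = max(open_, close) * (1.0 + rng.uniform(0.0, 0.005))
        low = min(open_, close) * (1.0 - rng.uniform(0.0, 0.005))
        volume = rng.uniform(100.0, 1000.0)
        candles.append(
            {"open": open_, "high": high, "low": low, "close": close, "volume": volume}
        )
        price = close
    return candles
```

Two calls with the same seed return identical data, which is the property that lets tests assert on exact values.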


10. Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0.0 | 2026-03-28 | Initial document |

11. Dependencies

| If This Changes | Update This Doc |
|-----------------|-----------------|
| New test suite added to tests/ | Section 3 (Test Suite Categories) |
| New pytest marker in pyproject.toml | Section 4 (Pytest Markers) |
| New just test-* command in justfile | Section 5 (Quality Gates) |
| Devcontainer base image or tooling changes | Section 6.2 (Devcontainer) |
| Coverage config changes in pyproject.toml | Section 7 (Coverage Configuration) |
| New Locust user class added | Section 8.4 (Load Testing) |
| Root conftest.py fixture changes | Section 9 (Root Conftest Fixtures) |