TradAI Testing Strategy
Version: 1.0.0 | Date: 2026-03-28 | Status: CURRENT | Source: tests/, pyproject.toml, justfile, .devcontainer/
1. TL;DR
18 test suite categories, 14 pytest markers, 80% coverage minimum. Parallel execution via pytest-xdist. LocalStack for AWS mocking. Devcontainer for reproducible environments with weekly CI validation.
2. Test Pyramid
graph TB
subgraph pyramid["Test Pyramid"]
direction TB
PERF["Performance / Load<br/><i>Locust, pytest-benchmark</i><br/>minutes"]
E2E["E2E<br/><i>Full workflows, Docker services</i><br/>30-120s"]
CONTRACT["Contract<br/><i>API schema, backward compat</i><br/>5-15s"]
INTEGRATION["Integration<br/><i>Service-to-service, LocalStack</i><br/>5-30s"]
UNIT["Unit<br/><i>Isolated, fast, many</i><br/><1s each"]
end
PERF --> E2E
E2E --> CONTRACT
CONTRACT --> INTEGRATION
INTEGRATION --> UNIT
style UNIT fill:#4CAF50,color:#fff
style INTEGRATION fill:#8BC34A,color:#fff
style CONTRACT fill:#FFC107,color:#000
style E2E fill:#FF9800,color:#fff
style PERF fill:#F44336,color:#fff
Guiding principle: Most tests should be unit tests (fast, isolated). Integration and E2E tests validate cross-boundary behavior. Performance and load tests run on demand or in scheduled CI.
3. Test Suite Categories
Cross-cutting suites live under tests/ at the repository root. Package-level tests live under libs/*/tests/ and services/*/tests/ (for example, the FreqAI suite in libs/tradai-strategy/tests/freqai/).
| # | Category | Directory | Purpose | Marker | Services? | Typical Duration |
|---|---|---|---|---|---|---|
| 1 | Unit | tests/unit/ | Isolated logic, no external deps | unit | No | <1s per test |
| 2 | Integration | tests/integration/ | Service-to-service, AWS via LocalStack | integration | Yes | 5-30s |
| 3 | Smoke | tests/smoke/ | Quick sanity after deployment | smoke | Yes | 2-5s |
| 4 | Contract | tests/contract/ | API schema validation, backward compat | contract | Yes | 5-15s |
| 5 | Financial | tests/financial/ | P&L precision, rounding, currency calcs | financial | No | <5s |
| 6 | Backtest Validation | tests/backtest_validation/ | Look-ahead bias, reproducibility | backtest_validation | No | 5-30s |
| 7 | E2E | tests/e2e/ | Full user workflows end-to-end | e2e | Yes | 30-120s |
| 8 | Security | tests/security/ | Injection, XSS, path traversal, input validation | security | Yes | 5-15s |
| 9 | Data Quality | tests/data_quality/ | Completeness, accuracy, consistency | data_quality | No | 5-15s |
| 10 | Resilience | tests/resilience/ | Circuit breaker, retry, timeout behavior | resilience | Yes | 10-30s |
| 11 | Concurrency | tests/concurrency/ | Thread safety, race conditions | concurrency | Yes | 10-30s |
| 12 | Regression | tests/regression/ | Previously fixed bugs | regression | No | <10s |
| 13 | Performance | tests/performance/ | Duration and memory regression tests | performance | No | 10-60s |
| 14 | Slow | tests/slow/ | Real Freqtrade backtest execution | slow | No | 1-5 min |
| 15 | Load | tests/load/ | Locust-based throughput/latency tests | N/A | Yes | 1-10 min |
| 16 | Fixtures | tests/fixtures/ | Shared test data (strategies, configs) | N/A | N/A | N/A |
| 17 | Lambda Unit | tests/unit/lambdas/ | Lambda handler unit tests | unit | No | <5s |
| 18 | FreqAI | libs/tradai-strategy/tests/freqai/ | ML model tests (LightGBM, XGBoost, CatBoost) | N/A | No | 10-60s |
4. Pytest Markers
Registered in pyproject.toml under [tool.pytest.ini_options]. All markers are enforced via --strict-markers.
| Marker | Description | Default CI Behavior |
|---|---|---|
| unit | Isolated, no external dependencies | Included |
| integration | Requires running services or LocalStack | Excluded from fast runs |
| slow | May take >1s; real backtests require RUN_SLOW_TESTS=1 | Excluded |
| smoke | Quick sanity checks, run pre-deployment | Included |
| financial | P&L calculations, decimal precision | Included |
| backtest_validation | Look-ahead bias, reproducibility checks | Included |
| e2e | Complete user workflows | Excluded from fast runs |
| contract | API schema validation, backward compatibility | Included |
| security | Injection, XSS, input validation | Included |
| data_quality | Data completeness, accuracy, consistency | Included |
| resilience | Circuit breaker, retry, timeout | Included |
| concurrency | Thread safety, race conditions | Included |
| regression | Previously fixed bugs | Included |
| performance | Duration and memory regression benchmarks | Excluded from fast runs |
Devcontainer CI exclusions
The weekly devcontainer CI job (devcontainer-ci.yml) runs with -m "not slow and not e2e and not performance and not integration" to keep the validation under 30 minutes.
5. Quality Gates
| Command | What It Runs | When to Use |
|---|---|---|
| just check | Lint + typecheck + env-access check + LocalStack tests | Before every PR / push |
| just check-quick | Lint + typecheck + env-access check (no tests) | Quick iteration during development |
| just check-full | Lint + typecheck + all tests + strategy validation | Pre-release, comprehensive validation |
| just test | All tests (parallel, no coverage) | General test run |
| just test-fast | Excludes integration, e2e, slow, performance | Rapid feedback loop |
| just test-strict | All tests with --cov and HTML/XML reports | CI coverage enforcement |
| just test-cov | All tests with coverage (no fail threshold) | Check coverage without gating |
| just test-package <pkg> | Tests for a single workspace package | Focused package work |
| just test-suite <suite> | Single suite by name (e.g., financial, security) | Targeted validation |
| just test-integration | Integration tests only (marker-filtered) | After service changes |
| just test-slow | Real Freqtrade backtests (RUN_SLOW_TESTS=1) | Backtest validation |
| just test-lambdas | Lambda handler unit tests | After Lambda changes |
| just test-freqai | FreqAI ML model tests | After ML lib changes |
| just test-durations | Profile slowest tests (top 25 at >1s) | Performance investigation |
Slow test gate
just test-slow sets RUN_SLOW_TESTS=1 automatically. When running pytest manually, set the variable yourself, for example inline: RUN_SLOW_TESTS=1 uv run pytest tests/slow/ -v -m slow --no-cov. These tests require Freqtrade to be installed and OHLCV data files to be present in user_data/data/.
6. Test Infrastructure
6.1 LocalStack
LocalStack provides local AWS service mocking for integration and unit tests that interact with AWS APIs.
Mocked services: S3, DynamoDB, SQS, SNS, ECR, Lambda
Start command:
Environment variables: LOCALSTACK_URL (default http://localhost:4566), AWS_ENDPOINT_URL (set by just check to redirect boto3), AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY (testing), AWS_REGION (us-east-1).
The root conftest.py provides session-scoped fixtures that auto-start LocalStack and create test buckets (e.g., tradai-configs-test). Cleanup runs only on pytest-xdist worker gw0 to avoid race conditions during bucket teardown.
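The gw0-only cleanup described above can be sketched as a small guard. pytest-xdist exposes the worker name through the PYTEST_XDIST_WORKER environment variable, which is unset when tests run without xdist. The function name `is_cleanup_worker` is hypothetical and is not taken from the project's conftest.py; treat this as a minimal illustration rather than the actual fixture code.

```python
import os
from typing import Mapping, Optional


def is_cleanup_worker(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if this process should tear down shared LocalStack buckets.

    Hypothetical helper: only the first xdist worker (gw0), or a plain
    non-xdist run, performs teardown, so workers do not race each other.
    """
    env = os.environ if env is None else env
    worker = env.get("PYTEST_XDIST_WORKER")  # e.g. "gw0", "gw1"; absent without xdist
    return worker is None or worker == "gw0"


# Explicit mappings keep the example independent of the real environment.
print(is_cleanup_worker({"PYTEST_XDIST_WORKER": "gw0"}))  # True
print(is_cleanup_worker({"PYTEST_XDIST_WORKER": "gw3"}))  # False
print(is_cleanup_worker({}))                              # True
```

A session-scoped fixture could call such a guard in its teardown phase and skip bucket deletion on every worker except gw0.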
6.2 Devcontainer
The devcontainer provides a fully reproducible development environment with all tools pre-installed.
Configuration: .devcontainer/devcontainer.json
Base image: ghcr.io/astral-sh/uv:python3.11-bookworm-slim
What's included:
| Component | Version / Details |
|---|---|
| Python | 3.11 (from base image) |
| UV | From base image |
| Just | 1.42.4 |
| Pulumi | 3.210.0 |
| Docker CLI + Compose | From Docker official repo |
| PostgreSQL client | For DB debugging |
| OpenMP (libgomp1) | Required by LightGBM/XGBoost |
| Git + LFS | Version control |
Pre-built images: The devcontainer-prebuild.yml workflow builds and pushes a cached image to GHCR on every push to main that touches .devcontainer/ files, plus weekly rebuilds (Sunday 00:00 UTC) for base-image security updates. Local builds pull cached layers via cacheFrom.
Weekly CI validation: The devcontainer-ci.yml workflow runs every Sunday at 02:00 UTC. It builds the devcontainer image, syncs dependencies, and runs the full test suite (excluding slow, e2e, performance, integration) inside the container. This catches environment drift between the devcontainer and bare-runner CI.
Forwarded ports: 8000 (Backend), 8002 (Data Collection), 8003 (Strategy), 5433 (PostgreSQL), 5001 (MLflow), 4566 (LocalStack), 6379 (Redis).
6.3 Git Worktree Testing
When running tests in a git worktree (e.g., .claude/worktrees/<name>), Python imports may resolve to the main repo instead of the worktree's modified source files due to UV's editable install .pth files.
Fix: Prepend worktree source paths to PYTHONPATH:
wt-test() {
local WT="$(pwd)"
PYTHONPATH="$WT/libs/tradai-common/src:$WT/libs/tradai-data/src:..." \
uv run pytest "$@"
}
Full documentation: docs/ai/worktree-testing.md
Worktree caveat
just check and just test do not set PYTHONPATH -- they run against main repo source. Use the wt-test helper or the one-liner from docs/ai/worktree-testing.md instead.
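To confirm that PYTHONPATH is taking effect, it can help to check where a module would actually be imported from. The sketch below uses only the standard library; `resolves_inside` is a hypothetical helper, not part of the repository, and the demonstration uses the stdlib `json` module so that it runs anywhere.

```python
import importlib.util
import json
from pathlib import Path


def resolves_inside(module_name: str, root: Path) -> bool:
    """Return True if module_name would be imported from a file under root."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None or spec.origin in ("built-in", "frozen"):
        return False  # not importable, or has no file on disk
    return Path(spec.origin).resolve().is_relative_to(Path(root).resolve())


# Demonstration with a stdlib module: json lives under the stdlib directory.
stdlib_root = Path(json.__file__).resolve().parent.parent
print(resolves_inside("json", stdlib_root))                  # True
print(resolves_inside("json", stdlib_root / "no_such_dir"))  # False
```

Inside a worktree, calling the helper with one of the workspace packages and `Path.cwd()` as the root would return False if the import still resolves to the main repository checkout.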
7. Coverage Configuration
Coverage is configured in pyproject.toml under [tool.coverage.run] and [tool.coverage.report].
Target: 80% minimum (enforced in just test-strict via --cov-fail-under)
Source directories:
[tool.coverage.run]
source = ["libs", "services", "cli"]
omit = ["*/tests/*", "*/__pycache__/*", "*/.venv/*"]
Reports generated:
| Format | Output | Purpose |
|---|---|---|
| Terminal | --cov-report=term-missing | Quick review of uncovered lines |
| HTML | --cov-report=html | htmlcov/ directory for detailed browsing |
| XML | --cov-report=xml | CI integration (e.g., Codecov upload) |
Excluded lines:
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"def __repr__",
"if __name__ == .__main__.:",
"raise AssertionError",
"raise NotImplementedError",
"if TYPE_CHECKING:",
]
Note: Most just test commands pass --no-cov to disable coverage for faster feedback. Use just test-cov or just test-strict explicitly when coverage data is needed.
8. Specialized Test Types
8.1 Financial Tests
Directory: tests/financial/ | Marker: financial
Financial domain tests validate monetary precision and correctness:
- P&L calculations using Decimal types (not float) for exact arithmetic
- Fee computations with configurable rate precision
- Trade entry/exit price validation
- Leverage and position sizing accuracy
The root conftest.py provides sample_trade (with Decimal prices, quantities, fees), sample_ohlcv_df, invalid_ohlcv_df (high < low), and ohlcv_with_gaps (missing candles) fixtures.
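The reason for preferring Decimal over float can be illustrated with a short sketch. The `compute_pnl` function, its fee model (a flat rate charged on both legs of a long trade), and the 8-decimal rounding are assumptions made for illustration; they are not the project's actual P&L implementation.

```python
from decimal import Decimal, ROUND_HALF_UP


def compute_pnl(entry_price: Decimal, exit_price: Decimal,
                quantity: Decimal, fee_rate: Decimal) -> Decimal:
    """Hypothetical helper: net P&L of a long trade, fee charged on both legs."""
    gross = (exit_price - entry_price) * quantity
    fees = (entry_price + exit_price) * quantity * fee_rate
    # Round to 8 decimal places (an assumed precision, common for crypto assets).
    return (gross - fees).quantize(Decimal("0.00000001"), rounding=ROUND_HALF_UP)


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(float_sum == 0.3)                 # False
print(decimal_sum == Decimal("0.3"))    # True

pnl = compute_pnl(Decimal("100.10"), Decimal("100.30"),
                  Decimal("3"), Decimal("0.001"))
print(pnl)  # -0.00120000 (gross 0.60 minus fees 0.6012)
```

Constructing Decimal values from strings rather than floats matters here: `Decimal(0.1)` would carry the float's representation error into the calculation.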
8.2 Backtest Validation
Directory: tests/backtest_validation/ | Marker: backtest_validation
Validates backtest integrity to prevent common quantitative research pitfalls:
- Look-ahead bias detection -- ensures strategies only use data available at decision time
- Reproducibility -- same config + data must produce identical results across runs
- Strategy compliance -- validates that strategy configs match expected schemas
- Result sanity -- checks profit factor, Sharpe ratio, drawdown within realistic bounds
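One common way to detect look-ahead bias is a truncation check: an indicator's value at time t must not change when the data after t is removed. The sketch below shows that idea in plain Python; the function names are hypothetical and this is not necessarily how the suite in tests/backtest_validation/ implements it.

```python
from typing import Callable, List, Sequence

Indicator = Callable[[Sequence[float]], List[float]]


def trailing_mean(prices: Sequence[float], window: int = 3) -> List[float]:
    """Mean of the current and previous bars only (no future data)."""
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(prices: Sequence[float], window: int = 3) -> List[float]:
    """Mean centered on the current bar, which peeks at the next bar."""
    half = window // 2
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def has_lookahead_bias(indicator: Indicator, prices: Sequence[float]) -> bool:
    """Return True if any value changes once later bars are hidden."""
    full = indicator(prices)
    for t in range(1, len(prices) + 1):
        truncated = indicator(prices[:t])
        if truncated[t - 1] != full[t - 1]:
            return True
    return False


prices = [1.0, 2.0, 4.0, 8.0, 16.0]
print(has_lookahead_bias(trailing_mean, prices))  # False
print(has_lookahead_bias(centered_mean, prices))  # True
```

The check recomputes the indicator once per bar, so it is quadratic in the series length; that is usually acceptable for short validation fixtures.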
8.3 Performance Regression Tests
Directory: tests/performance/ | Marker: performance
Uses pytest-benchmark and a custom Timer context manager to detect execution-time and memory regressions.
Baseline management: Baseline metrics are stored in tests/performance/baselines.json. The assert_no_regression fixture compares current measurements against baselines with a configurable threshold (default: 20% degradation tolerance).
Memory tracking: Uses memory-profiler (dev dependency) for memory usage assertions.
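The timing and threshold logic can be sketched as follows. The `Timer` name matches the one mentioned above, but this implementation and the `exceeds_baseline` helper are assumptions for illustration; the real Timer and assert_no_regression fixture may differ in interface and behavior.

```python
import time


class Timer:
    """Minimal context manager that records wall-clock duration in seconds."""

    def __enter__(self) -> "Timer":
        self.elapsed = 0.0
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.elapsed = time.perf_counter() - self._start
        return False  # never swallow exceptions raised inside the block


def exceeds_baseline(current: float, baseline: float, threshold: float = 0.20) -> bool:
    """Return True if current is more than threshold (20% by default) above baseline."""
    return current > baseline * (1 + threshold)


with Timer() as timer:
    sum(range(100_000))  # stand-in for the code under measurement

print(exceeds_baseline(1.19, baseline=1.0))  # False: within 20% tolerance
print(exceeds_baseline(1.21, baseline=1.0))  # True: regression
```

In a test, the measured `timer.elapsed` would be compared against the stored value for that benchmark in tests/performance/baselines.json.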
8.4 Load Testing
Directory: tests/load/ | Framework: Locust
Locust-based load tests simulate realistic traffic patterns against all three services.
Available user classes:
| Class | Target | Source |
|---|---|---|
| BackendUser | Backend API (:8000) | tests/load/services/backend.py |
| DataCollectionUser | Data Collection (:8002) | tests/load/services/data_collection.py |
| StrategyServiceUser | Strategy Service (:8003) | tests/load/services/strategy_service.py |
| BacktestWorkflowUser | E2E backtest workflow | tests/load/scenarios/backtest_workflow.py |
Commands:
just load-test # Web UI at localhost:8089
just load-test-ci 10 2 1m # Headless: 10 users, 2/s spawn, 1 min
just load-test-all # All services sequentially
just load-test-backend 5 30s # Backend only
just load-test-data-collection 5 30s # Data collection only
just load-test-strategy 5 30s # Strategy service only
Reports are written to reports/load-test-*.html.
8.5 Security Tests
Directory: tests/security/ | Marker: security
The root conftest.py provides injection payload fixtures:
- sql_injection_payloads -- 5 SQL injection vectors
- xss_payloads -- 5 XSS vectors
- path_traversal_payloads -- 4 path traversal vectors
Tests validate that all API endpoints reject malicious input without exposing internal errors.
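As an illustration of the path traversal case, the sketch below pairs a few example payloads with a simple validator. Both the payload list and `is_safe_relative_path` are hypothetical; they are not the contents of the path_traversal_payloads fixture or the services' actual validation logic.

```python
from pathlib import PurePosixPath


def is_safe_relative_path(user_path: str) -> bool:
    """Accept only relative POSIX paths without parent-directory segments."""
    if "\x00" in user_path or "\\" in user_path:
        return False  # NUL bytes and Windows separators are rejected outright
    path = PurePosixPath(user_path)
    if path.is_absolute():
        return False
    return ".." not in path.parts


# Illustrative payloads only; the real fixture contents may differ.
example_payloads = [
    "../../etc/passwd",
    "/etc/passwd",
    "..\\..\\windows\\system32",
    "configs/../../secrets.env",
]

for payload in example_payloads:
    assert not is_safe_relative_path(payload)

print(is_safe_relative_path("configs/strategy.json"))  # True
```

This validator does not decode URL-encoded input, so an API layer would need to apply it after decoding.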
9. Root Conftest Hooks
The root tests/conftest.py provides shared fixtures across all test categories.
Key session-scoped fixtures
docker_compose_up auto-starts Docker services; wait_for_services polls health endpoints for up to 120s; localstack_up starts LocalStack via --profile localstack; localstack_s3_config_bucket (autouse) creates the S3 config bucket.
HTTP client fixtures: backend_client, data_collection_client, strategy_service_client (sync httpx.Client), async_backend_client (async).
Skip conditions: skip_slow_tests (needs RUN_SLOW_TESTS=1), skip_if_no_services (Docker down), skip_if_ci (CI=true), skip_if_no_localstack (health endpoint unreachable).
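The environment-based skip conditions reduce to simple predicates. The sketch below shows two of them as plain functions that take the environment as a mapping; the function names are hypothetical and the real conftest.py may express these checks differently.

```python
from typing import Mapping


def slow_tests_enabled(env: Mapping[str, str]) -> bool:
    """True only when RUN_SLOW_TESTS is set to exactly "1"."""
    return env.get("RUN_SLOW_TESTS") == "1"


def running_in_ci(env: Mapping[str, str]) -> bool:
    """True when CI=true (case-insensitive), as most CI providers set it."""
    return env.get("CI", "").lower() == "true"


print(slow_tests_enabled({"RUN_SLOW_TESTS": "1"}))  # True
print(slow_tests_enabled({}))                       # False
print(running_in_ci({"CI": "true"}))                # True
print(running_in_ci({"CI": "false"}))               # False
```

With pytest, such a predicate would typically feed a marker like `pytest.mark.skipif(not slow_tests_enabled(os.environ), reason="needs RUN_SLOW_TESTS=1")`.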
Data generator: generate_ohlcv_data() produces synthetic OHLCV DataFrames with configurable symbol, periods, frequency, and base price. Uses numpy with seed 42 for reproducibility.
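The value of a fixed seed is that every run sees identical data. The sketch below illustrates this with a standard-library stand-in: it uses `random.Random(42)` instead of numpy and returns a list of dicts instead of a DataFrame, so it is not the real generate_ohlcv_data(). The function name, start date, hourly spacing, and price-move ranges are all assumptions.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def generate_ohlcv_rows(periods: int = 5, base_price: float = 100.0,
                        seed: int = 42) -> List[Dict[str, object]]:
    """Hypothetical stdlib stand-in: seeded random-walk OHLCV candles."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary start time
    rows = []
    close = base_price
    for i in range(periods):
        open_ = close
        close = open_ * (1 + rng.uniform(-0.01, 0.01))
        high = max(open_, close) * (1 + rng.uniform(0, 0.005))
        low = min(open_, close) * (1 - rng.uniform(0, 0.005))
        rows.append({
            "timestamp": start + timedelta(hours=i),
            "open": open_, "high": high, "low": low, "close": close,
            "volume": rng.uniform(10, 100),
        })
    return rows


first = generate_ohlcv_rows()
second = generate_ohlcv_rows()
print(first == second)  # True: same seed, same candles
```

Each candle also satisfies high >= max(open, close) and low <= min(open, close), which is the invariant that the invalid_ohlcv_df fixture deliberately violates.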
10. Changelog
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-03-28 | Initial document |
11. Dependencies
| If This Changes | Update This Doc |
|---|---|
| New test suite added to tests/ | Section 3 (Test Suite Categories) |
| New pytest marker in pyproject.toml | Section 4 (Pytest Markers) |
| New just test-* command in justfile | Section 5 (Quality Gates) |
| Devcontainer base image or tooling changes | Section 6.2 (Devcontainer) |
| Coverage config changes in pyproject.toml | Section 7 (Coverage Configuration) |
| New Locust user class added | Section 8.4 (Load Testing) |
| Root conftest.py fixture changes | Section 9 (Root Conftest Hooks) |