TradAI Testing Strategy

Version: 1.0.0 | Date: 2026-03-28 | Status: CURRENT | Source: tests/, pyproject.toml, justfile, .devcontainer/

1. TL;DR

18 test suite categories, 14 pytest markers, 80% coverage minimum. Parallel execution via pytest-xdist. LocalStack for AWS mocking. Devcontainer for reproducible environments with weekly CI validation.

2. Test Pyramid

graph TB
    subgraph pyramid["Test Pyramid"]
        direction TB
        PERF["Performance / Load<br/><i>Locust, pytest-benchmark</i><br/>minutes"]
        E2E["E2E<br/><i>Full workflows, Docker services</i><br/>30-120s"]
        CONTRACT["Contract<br/><i>API schema, backward compat</i><br/>5-15s"]
        INTEGRATION["Integration<br/><i>Service-to-service, LocalStack</i><br/>5-30s"]
        UNIT["Unit<br/><i>Isolated, fast, many</i><br/>&lt;1s each"]
    end

    PERF --> E2E
    E2E --> CONTRACT
    CONTRACT --> INTEGRATION
    INTEGRATION --> UNIT

    style UNIT fill:#4CAF50,color:#fff
    style INTEGRATION fill:#8BC34A,color:#fff
    style CONTRACT fill:#FFC107,color:#000
    style E2E fill:#FF9800,color:#fff
    style PERF fill:#F44336,color:#fff

Guiding principle: Most tests should be unit tests (fast, isolated). Integration and E2E tests validate cross-boundary behavior. Performance and load tests run on demand or in scheduled CI.

3. Test Suite Categories

All suites live under tests/ at the repository root. Package-level tests live under libs/*/tests/ and services/*/tests/.

#	Category	Directory	Purpose	Marker	Services?	Typical Duration
1	Unit	`tests/unit/`	Isolated logic, no external deps	`unit`	No	<1s per test
2	Integration	`tests/integration/`	Service-to-service, AWS via LocalStack	`integration`	Yes	5-30s
3	Smoke	`tests/smoke/`	Quick sanity after deployment	`smoke`	Yes	2-5s
4	Contract	`tests/contract/`	API schema validation, backward compat	`contract`	Yes	5-15s
5	Financial	`tests/financial/`	P&L precision, rounding, currency calcs	`financial`	No	<5s
6	Backtest Validation	`tests/backtest_validation/`	Look-ahead bias, reproducibility	`backtest_validation`	No	5-30s
7	E2E	`tests/e2e/`	Full user workflows end-to-end	`e2e`	Yes	30-120s
8	Security	`tests/security/`	Injection, XSS, path traversal, input validation	`security`	Yes	5-15s
9	Data Quality	`tests/data_quality/`	Completeness, accuracy, consistency	`data_quality`	No	5-15s
10	Resilience	`tests/resilience/`	Circuit breaker, retry, timeout behavior	`resilience`	Yes	10-30s
11	Concurrency	`tests/concurrency/`	Thread safety, race conditions	`concurrency`	Yes	10-30s
12	Regression	`tests/regression/`	Previously fixed bugs	`regression`	No	<10s
13	Performance	`tests/performance/`	Duration and memory regression tests	`performance`	No	10-60s
14	Slow	`tests/slow/`	Real Freqtrade backtest execution	`slow`	No	1-5 min
15	Load	`tests/load/`	Locust-based throughput/latency tests	N/A	Yes	1-10 min
16	Fixtures	`tests/fixtures/`	Shared test data (strategies, configs)	N/A	N/A	N/A
17	Lambda Unit	`tests/unit/lambdas/`	Lambda handler unit tests	`unit`	No	<5s
18	FreqAI	`libs/tradai-strategy/tests/freqai/`	ML model tests (LightGBM, XGBoost, CatBoost)	N/A	No	10-60s

4. Pytest Markers

Registered in pyproject.toml under [tool.pytest.ini_options]. All markers are enforced via --strict-markers.

Marker	Description	Default CI Behavior
`unit`	Isolated, no external dependencies	Included
`integration`	Requires running services or LocalStack	Excluded from fast runs
`slow`	May take >1s; real backtests require `RUN_SLOW_TESTS=1`	Excluded
`smoke`	Quick sanity checks, run pre-deployment	Included
`financial`	P&L calculations, decimal precision	Included
`backtest_validation`	Look-ahead bias, reproducibility checks	Included
`e2e`	Complete user workflows	Excluded from fast runs
`contract`	API schema validation, backward compatibility	Included
`security`	Injection, XSS, input validation	Included
`data_quality`	Data completeness, accuracy, consistency	Included
`resilience`	Circuit breaker, retry, timeout	Included
`concurrency`	Thread safety, race conditions	Included
`regression`	Previously fixed bugs	Included
`performance`	Duration and memory regression benchmarks	Excluded from fast runs

Devcontainer CI exclusions

The weekly devcontainer CI job (devcontainer-ci.yml) runs with -m "not slow and not e2e and not performance and not integration" to keep the validation under 30 minutes.
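The exclusion filter above is an ordinary pytest marker expression. As a rough sketch of how such an expression is assembled and what it selects, the helpers below (hypothetical names, not taken from the justfile or the workflow) build the string and approximate the selection rule for the simple "not A and not B" case only:

```python
def build_marker_expression(excluded: list[str]) -> str:
    """Join excluded marker names into a string suitable for `pytest -m`."""
    return " and ".join(f"not {name}" for name in excluded)


def is_selected(test_markers: set[str], excluded: list[str]) -> bool:
    """Approximate the selection rule: a test runs only if it carries
    none of the excluded markers."""
    return not (test_markers & set(excluded))


# Markers the weekly devcontainer job leaves out, per the text above.
DEVCONTAINER_EXCLUDED = ["slow", "e2e", "performance", "integration"]

expr = build_marker_expression(DEVCONTAINER_EXCLUDED)
print(expr)
```

A test marked only `unit` passes this filter, while a test marked both `unit` and `slow` does not.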

5. Quality Gates

Command	What It Runs	When to Use
`just check`	Lint + typecheck + env-access check + LocalStack tests	Before every PR / push
`just check-quick`	Lint + typecheck + env-access check (no tests)	Quick iteration during development
`just check-full`	Lint + typecheck + all tests + strategy validation	Pre-release, comprehensive validation
`just test`	All tests (parallel, no coverage)	General test run
`just test-fast`	Excludes `integration`, `e2e`, `slow`, `performance`	Rapid feedback loop
`just test-strict`	All tests with `--cov` and HTML/XML reports	CI coverage enforcement
`just test-cov`	All tests with coverage (no fail threshold)	Check coverage without gating
`just test-package <pkg>`	Tests for a single workspace package	Focused package work
`just test-suite <suite>`	Single suite by name (e.g., `financial`, `security`)	Targeted validation
`just test-integration`	Integration tests only (marker-filtered)	After service changes
`just test-slow`	Real Freqtrade backtests (`RUN_SLOW_TESTS=1`)	Backtest validation
`just test-lambdas`	Lambda handler unit tests	After Lambda changes
`just test-freqai`	FreqAI ML model tests	After ML lib changes
`just test-durations`	Profile slowest tests (top 25 at >1s)	Performance investigation

Slow test gate

just test-slow sets RUN_SLOW_TESTS=1 automatically. When invoking pytest directly, set the variable yourself, for example by prefixing the command: RUN_SLOW_TESTS=1 uv run pytest tests/slow/ -v -m slow --no-cov. These tests require Freqtrade to be installed and OHLCV data files to be present in user_data/data/.

6. Test Infrastructure

6.1 LocalStack

LocalStack provides local AWS service mocking for integration and unit tests that interact with AWS APIs.

Mocked services: S3, DynamoDB, SQS, SNS, ECR, Lambda

Start command:

just up-localstack
# or directly:
docker compose --profile localstack up -d

Environment variables: LOCALSTACK_URL (default http://localhost:4566), AWS_ENDPOINT_URL (set by just check to redirect boto3), AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY (testing), AWS_REGION (us-east-1).
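To make the defaults above concrete, here is a minimal sketch of resolving them into keyword arguments of the kind a boto3 client accepts. The function name and the precedence of AWS_ENDPOINT_URL over LOCALSTACK_URL are assumptions for illustration, as is reading "(testing)" as the default credential value; the project's fixtures may resolve these differently.

```python
import os
from typing import Mapping


def localstack_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve LocalStack connection settings from environment variables.

    Assumption: AWS_ENDPOINT_URL wins over LOCALSTACK_URL when both are set.
    """
    endpoint = env.get("AWS_ENDPOINT_URL") or env.get(
        "LOCALSTACK_URL", "http://localhost:4566"
    )
    return {
        "endpoint_url": endpoint,
        "aws_access_key_id": env.get("AWS_ACCESS_KEY_ID", "testing"),
        "aws_secret_access_key": env.get("AWS_SECRET_ACCESS_KEY", "testing"),
        "region_name": env.get("AWS_REGION", "us-east-1"),
    }
```

With boto3 installed, the result could be passed as `boto3.client("s3", **localstack_settings())`.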

The root conftest.py provides session-scoped fixtures that auto-start LocalStack and create test buckets (e.g., tradai-configs-test). Cleanup runs only on pytest-xdist worker gw0 to avoid race conditions during bucket teardown.
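The gw0-only cleanup can be pictured with the sketch below. pytest-xdist sets the PYTEST_XDIST_WORKER environment variable ("gw0", "gw1", ...) inside each worker process. The helper name is hypothetical, and treating a non-parallel run as eligible for cleanup is an assumption rather than something the source states.

```python
import os
from typing import Mapping


def is_cleanup_worker(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when this process should tear down shared test buckets.

    Only one xdist worker (gw0) cleans up, so parallel workers never race
    to delete the same bucket. A run without xdist has no worker id and is
    assumed to clean up after itself.
    """
    worker = env.get("PYTEST_XDIST_WORKER")
    return worker is None or worker == "gw0"
```

A session-scoped fixture would call this after its `yield` and skip teardown when it returns False.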

6.2 Devcontainer

The devcontainer provides a fully reproducible development environment with all tools pre-installed.

Configuration: .devcontainer/devcontainer.json

Base image: ghcr.io/astral-sh/uv:python3.11-bookworm-slim

What's included:

Component	Version / Details
Python	3.11 (from base image)
UV	From base image
Just	1.42.4
Pulumi	3.210.0
Docker CLI + Compose	From Docker official repo
PostgreSQL client	For DB debugging
OpenMP (`libgomp1`)	Required by LightGBM/XGBoost
Git + LFS	Version control

Pre-built images: The devcontainer-prebuild.yml workflow builds and pushes a cached image to GHCR on every push to main that touches .devcontainer/ files, plus weekly rebuilds (Sunday 00:00 UTC) for base-image security updates. Local builds pull cached layers via cacheFrom.

Weekly CI validation: The devcontainer-ci.yml workflow runs every Sunday at 02:00 UTC. It builds the devcontainer image, syncs dependencies, and runs the full test suite (excluding slow, e2e, performance, integration) inside the container. This catches environment drift between the devcontainer and bare-runner CI.

Forwarded ports: 8000 (Backend), 8002 (Data Collection), 8003 (Strategy), 5433 (PostgreSQL), 5001 (MLflow), 4566 (LocalStack), 6379 (Redis).

6.3 Git Worktree Testing

When running tests in a git worktree (e.g., .claude/worktrees/<name>), Python imports may resolve to the main repo instead of the worktree's modified source files due to UV's editable install .pth files.

Fix: Prepend worktree source paths to PYTHONPATH:

wt-test() {
    local WT="$(pwd)"
    PYTHONPATH="$WT/libs/tradai-common/src:$WT/libs/tradai-data/src:..." \
    uv run pytest "$@"
}

Full documentation: docs/ai/worktree-testing.md

Worktree caveat

just check and just test do not set PYTHONPATH -- they run against main repo source. Use the wt-test helper or the one-liner from docs/ai/worktree-testing.md instead.
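For scripted runs, the same PYTHONPATH override can be sketched in Python. This version assumes every package that needs overriding lives under libs/*/src in the worktree; the shell helper above elides part of its path list, so the real set of directories may be larger. Both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping


def worktree_pythonpath(worktree: Path) -> str:
    """Join every libs/*/src directory in the worktree into one PYTHONPATH value."""
    src_dirs = sorted(p for p in worktree.glob("libs/*/src") if p.is_dir())
    return os.pathsep.join(str(p) for p in src_dirs)


def worktree_env(worktree: Path, base_env: Mapping[str, str]) -> dict[str, str]:
    """Copy base_env and prepend the worktree source paths to PYTHONPATH."""
    env = dict(base_env)
    parts = [worktree_pythonpath(worktree), base_env.get("PYTHONPATH", "")]
    env["PYTHONPATH"] = os.pathsep.join(p for p in parts if p)
    return env
```

The resulting mapping can be handed to `subprocess.run(["uv", "run", "pytest"], env=worktree_env(Path.cwd(), os.environ))` so that imports resolve to the worktree first.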

7. Coverage Configuration

Coverage is configured in pyproject.toml under [tool.coverage.run] and [tool.coverage.report].

Target: 80% minimum (enforced in just test-strict via --cov-fail-under)

Source directories:

[tool.coverage.run]
source = ["libs", "services", "cli"]
omit = ["*/tests/*", "*/__pycache__/*", "*/.venv/*"]

Reports generated:

Format	Output	Purpose
Terminal	`--cov-report=term-missing`	Quick review of uncovered lines
HTML	`--cov-report=html`	`htmlcov/` directory for detailed browsing
XML	`--cov-report=xml`	CI integration (e.g., Codecov upload)

Excluded lines:

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "if __name__ == .__main__.:",
    "raise AssertionError",
    "raise NotImplementedError",
    "if TYPE_CHECKING:",
]
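Each exclude_lines entry is a regular expression that coverage.py searches for within source lines, which is why the `__main__` pattern uses `.` in place of the quote characters: it matches either quoting style. The sketch below mimics that matching rule with the standard `re` module; it illustrates the rule and is not coverage.py's own implementation.

```python
import re

# Patterns copied from the [tool.coverage.report] section above.
EXCLUDE_PATTERNS = [
    r"pragma: no cover",
    r"def __repr__",
    r"if __name__ == .__main__.:",
    r"raise AssertionError",
    r"raise NotImplementedError",
    r"if TYPE_CHECKING:",
]


def is_excluded(line: str) -> bool:
    """Return True if any exclusion pattern is found anywhere in the line."""
    return any(re.search(pattern, line) for pattern in EXCLUDE_PATTERNS)


print(is_excluded('if __name__ == "__main__":'))  # double quotes
print(is_excluded("if __name__ == '__main__':"))  # single quotes
print(is_excluded("return total"))
```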

Note: Most just test commands pass --no-cov to disable coverage for faster feedback. Use just test-cov or just test-strict explicitly when coverage data is needed.

8. Specialized Test Types

8.1 Financial Tests

Directory: tests/financial/ | Marker: financial

Financial domain tests validate monetary precision and correctness:

P&L calculations using Decimal types (not float) for exact arithmetic
Fee computations with configurable rate precision
Trade entry/exit price validation
Leverage and position sizing accuracy

The root conftest.py provides sample_trade (with Decimal prices, quantities, fees), sample_ohlcv_df, invalid_ohlcv_df (high < low), and ohlcv_with_gaps (missing candles) fixtures.
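The reason for Decimal over float shows up in even the smallest calculation. The sketch below contrasts the two and computes a net P&L. The `realized_pnl` function, its fee model (a fee charged on both legs of a long trade), and the eight-decimal rounding are illustrative assumptions, not the project's actual formulas.

```python
from decimal import Decimal, ROUND_HALF_UP


def realized_pnl(entry: Decimal, exit_: Decimal, qty: Decimal, fee_rate: Decimal) -> Decimal:
    """Net P&L of a long trade, with a fee charged on entry and exit notional."""
    gross = (exit_ - entry) * qty
    fees = (entry + exit_) * qty * fee_rate
    return (gross - fees).quantize(Decimal("0.00000001"), rounding=ROUND_HALF_UP)


# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2
# Decimals built from strings keep the exact decimal values.
decimal_sum = Decimal("0.1") + Decimal("0.2")

pnl = realized_pnl(
    entry=Decimal("100.00"),
    exit_=Decimal("110.00"),
    qty=Decimal("0.5"),
    fee_rate=Decimal("0.001"),
)
print(float_sum == 0.3, decimal_sum == Decimal("0.3"), pnl)
```

Here the gross profit is 5.000 and the fees are 0.105, so the net result is exactly 4.895 with no representation error.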

8.2 Backtest Validation

Directory: tests/backtest_validation/ | Marker: backtest_validation

Validates backtest integrity to prevent common quantitative research pitfalls:

Look-ahead bias detection -- ensures strategies only use data available at decision time
Reproducibility -- same config + data must produce identical results across runs
Strategy compliance -- validates that strategy configs match expected schemas
Result sanity -- checks profit factor, Sharpe ratio, drawdown within realistic bounds
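One common way to detect look-ahead bias is to compute an indicator on a truncated history and again on the full history: if the values for the shared timestamps differ, the indicator must have read future data. The sketch below applies that idea to two toy indicators. All names are hypothetical, and the project's tests may use a different technique.

```python
from typing import Callable, Sequence

Indicator = Callable[[Sequence[float]], list[float]]


def trailing_mean(prices: Sequence[float], window: int = 3) -> list[float]:
    """Causal indicator: the value at index i uses prices up to and including i."""
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def centered_mean(prices: Sequence[float], window: int = 3) -> list[float]:
    """Leaky indicator: the value at index i also uses prices after i."""
    half = window // 2
    out = []
    for i in range(len(prices)):
        chunk = prices[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def has_lookahead(indicator: Indicator, prices: Sequence[float], cutoff: int) -> bool:
    """Return True if values before `cutoff` change once later prices are visible."""
    truncated = indicator(prices[:cutoff])
    full = indicator(prices)[:cutoff]
    return any(abs(a - b) > 1e-12 for a, b in zip(truncated, full))


prices = [10.0, 11.0, 12.0, 11.5, 13.0, 12.5, 14.0]
print(has_lookahead(trailing_mean, prices, cutoff=4))
print(has_lookahead(centered_mean, prices, cutoff=4))
```

The same truncate-and-compare check doubles as a reproducibility test when run twice on identical input.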

8.3 Performance Regression Tests

Directory: tests/performance/ | Marker: performance

Uses pytest-benchmark and a custom Timer context manager to detect execution-time and memory regressions.

Baseline management: Baseline metrics are stored in tests/performance/baselines.json. The assert_no_regression fixture compares current measurements against baselines with a configurable threshold (default: 20% degradation tolerance).
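As a minimal sketch of the baseline comparison, assuming a 20% tolerance means "fail when the measurement exceeds baseline × 1.20": the `Timer` class and `exceeds_baseline` function below are stand-ins written for illustration. The project's own Timer, the assert_no_regression fixture, and the layout of baselines.json are not shown in this document and may differ.

```python
import time


class Timer:
    """Context manager that records wall-clock time spent in its block."""

    def __enter__(self) -> "Timer":
        self.elapsed = 0.0
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.elapsed = time.perf_counter() - self._start
        return False  # never swallow exceptions raised inside the block


def exceeds_baseline(current: float, baseline: float, tolerance: float = 0.20) -> bool:
    """Return True when `current` is more than `tolerance` worse than `baseline`."""
    return current > baseline * (1 + tolerance)


with Timer() as timer:
    sum(range(100_000))

# Hypothetical baseline of 1.0 s for this operation.
print(exceeds_baseline(timer.elapsed, baseline=1.0))
```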

Memory tracking: Uses memory-profiler (dev dependency) for memory usage assertions.

8.4 Load Testing

Directory: tests/load/ | Framework: Locust

Locust-based load tests simulate realistic traffic patterns against all three services.

Available user classes:

Class	Target	Source
`BackendUser`	Backend API (`:8000`)	`tests/load/services/backend.py`
`DataCollectionUser`	Data Collection (`:8002`)	`tests/load/services/data_collection.py`
`StrategyServiceUser`	Strategy Service (`:8003`)	`tests/load/services/strategy_service.py`
`BacktestWorkflowUser`	E2E backtest workflow	`tests/load/scenarios/backtest_workflow.py`

Commands:

just load-test                          # Web UI at localhost:8089
just load-test-ci 10 2 1m              # Headless: 10 users, 2/s spawn, 1 min
just load-test-all                      # All services sequentially
just load-test-backend 5 30s           # Backend only
just load-test-data-collection 5 30s   # Data collection only
just load-test-strategy 5 30s          # Strategy service only

Reports are written to reports/load-test-*.html.

8.5 Security Tests

Directory: tests/security/ | Marker: security

The root conftest.py provides injection payload fixtures:

sql_injection_payloads -- 5 SQL injection vectors
xss_payloads -- 5 XSS vectors
path_traversal_payloads -- 4 path traversal vectors

Tests validate that all API endpoints reject malicious input without exposing internal errors.
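To illustrate the kind of check a path traversal test exercises, here is a hedged sketch of a validator that rejects any path resolving outside a base directory. The function name, the base directory, and the example inputs are hypothetical. They are not the project's fixtures or its actual validation code.

```python
from pathlib import Path


def is_within_base(base: Path, user_path: str) -> bool:
    """Return True only if user_path, resolved against base, stays inside base."""
    base_resolved = base.resolve()
    # Joining an absolute user_path discards base entirely, which the
    # containment check below then rejects.
    candidate = (base_resolved / user_path).resolve()
    return candidate == base_resolved or base_resolved in candidate.parents


base_dir = Path("/srv/data")  # illustrative base directory
print(is_within_base(base_dir, "configs/strategy.json"))
print(is_within_base(base_dir, "../../etc/passwd"))
print(is_within_base(base_dir, "/etc/passwd"))
```

A security test would feed each payload through the real endpoint and assert a rejection response without a stack trace or internal path in the body.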

9. Root Conftest Hooks

The root tests/conftest.py provides shared fixtures across all test categories.

Key session-scoped fixtures

docker_compose_up auto-starts Docker services; wait_for_services polls health endpoints for up to 120s; localstack_up starts LocalStack via --profile localstack; localstack_s3_config_bucket (autouse) creates the S3 config bucket.
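The polling behavior of wait_for_services can be sketched as a bounded retry loop. The function below is an illustration under assumptions: the name, the 2-second interval, and the injectable probe, clock, and sleep are not taken from the project's conftest. Only the 120-second ceiling comes from the text above.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout: float = 120.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In a fixture, `probe` would issue a GET against a service health endpoint and return True on a successful response. Injecting `clock` and `sleep` keeps the loop itself testable without real waiting.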

HTTP client fixtures: backend_client, data_collection_client, strategy_service_client (sync httpx.Client), async_backend_client (async).

Skip conditions: skip_slow_tests (needs RUN_SLOW_TESTS=1), skip_if_no_services (Docker down), skip_if_ci (CI=true), skip_if_no_localstack (health endpoint unreachable).

Data generator: generate_ohlcv_data() produces synthetic OHLCV DataFrames with configurable symbol, periods, frequency, and base price. Uses numpy with seed 42 for reproducibility.
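The value of a fixed seed is that every run sees identical candles. The sketch below shows the idea with the standard library's `random.Random` in place of numpy, and plain dicts in place of a DataFrame, to stay dependency-free. The function name, parameters, and price model are hypothetical and do not reproduce generate_ohlcv_data().

```python
import random


def generate_ohlcv_rows(periods: int = 5, base_price: float = 100.0, seed: int = 42) -> list[dict]:
    """Produce a seeded random walk of OHLCV candles as a list of dicts."""
    rng = random.Random(seed)  # private generator: no global state is touched
    rows = []
    price = base_price
    for _ in range(periods):
        open_ = price
        close = open_ * (1 + rng.uniform(-0.01, 0.01))
        # High and low are pushed outward so that low <= open, close <= high.
        high = max(open_, close) * (1 + rng.uniform(0, 0.005))
        low = min(open_, close) * (1 - rng.uniform(0, 0.005))
        rows.append(
            {"open": open_, "high": high, "low": low, "close": close,
             "volume": rng.uniform(1, 100)}
        )
        price = close  # the next candle opens at the previous close
    return rows


first = generate_ohlcv_rows()
second = generate_ohlcv_rows()
print(first == second)
```

Because the candles are valid by construction, fixtures such as invalid_ohlcv_df have to break the high/low invariant deliberately.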

10. Changelog

Version	Date	Changes
1.0.0	2026-03-28	Initial document

11. Dependencies

If This Changes	Update This Doc
New test suite added to `tests/`	Section 3 (Test Suite Categories)
New pytest marker in `pyproject.toml`	Section 4 (Pytest Markers)
New `just test-*` command in `justfile`	Section 5 (Quality Gates)
Devcontainer base image or tooling changes	Section 6.2 (Devcontainer)
Coverage config changes in `pyproject.toml`	Section 7 (Coverage Configuration)
New Locust user class added	Section 8.4 (Load Testing)
Root `conftest.py` fixture changes	Section 9 (Root Conftest Hooks)