tradai-common Design Document¶

Overview¶

tradai-common provides shared utilities, base classes, and AWS integrations for the TradAI platform. This library follows SOLID principles and applies DRY patterns throughout.

Architecture Decisions¶

What We Keep (Improved)¶

base_service.py - Base class for services with Hydra config loading
ADD: Docstrings, settings_class validation
KEEP: LoggerMixin composition pattern
base_settings.py - Pydantic settings with S3 support
ADD: Path validation, proper return types
KEEP: Strict mode, frozen entities
logger_mixin.py - Logging mixin for composition
KEEP: Already well-designed
common.py - Utility functions for time, quantization
KEEP: Excellent type hints
ADD: More validation where needed
cache.py - Caching mechanisms
KEEP: Already well-designed with Final types
mlflow_client.py - MLflow REST client
REMOVE: All test code (lines 530-550)
IMPROVE: Error handling

What We Fix (Security)¶

secrets_manager.py (rename from secrets_maneger.py)
FIX: Thread-safe with lock
FIX: Secret name from environment variables
FIX: Proper error handling
REMOVE: Test code
ecr_util.py - ECS job manager
FIX: All AWS config from environment variables
REMOVE: Hardcoded ARNs, subnets, security groups
REMOVE: Test code (lines 519-560)
KEEP: Async patterns, CloudWatch streaming
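
The thread-safety and environment-variable fixes listed for secrets_manager.py could look roughly like the sketch below. The class name `SecretCache`, the variable `TRADAI_SECRET_NAME`, and the injected `fetch` callable are assumptions made for illustration. The real module presumably wraps a boto3 Secrets Manager client rather than a plain callable.

```python
import os
import threading
from typing import Callable, Optional


class SecretCache:
    """Caches one secret value; safe to call from multiple threads."""

    def __init__(self, fetch: Callable[[str], str], env_var: str = "TRADAI_SECRET_NAME") -> None:
        self._fetch = fetch        # hypothetical stand-in for a Secrets Manager call
        self._env_var = env_var    # hypothetical environment variable name
        self._lock = threading.Lock()
        self._value: Optional[str] = None

    def get(self) -> str:
        # Holding the lock means the backend is queried at most once under contention.
        with self._lock:
            if self._value is None:
                name = os.environ.get(self._env_var)
                if not name:
                    raise RuntimeError(f"{self._env_var} is not set")
                self._value = self._fetch(name)
            return self._value
```

Injecting the fetch function keeps the cache testable without AWS credentials, which fits the unit-test guidance later in this document.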

What We Add (SOLID)¶

exceptions.py - Custom exception hierarchy
NEW: TradAIError base class
NEW: ValidationError, NotFoundError, ConfigurationError, etc.
repositories.py - Repository pattern ABCs
NEW: Repository[T] generic base
NEW: DataRepository for market data
NEW: Dependency Inversion principle
s3_utils.py - S3 path parsing (DRY)
NEW: S3Path frozen dataclass
NEW: Centralized parsing logic
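
As a rough illustration of how the exception hierarchy and the frozen S3Path could fit together, consider the sketch below. The `parse` classmethod and the error messages are assumptions. Only the class names `TradAIError`, `ValidationError`, and `S3Path` come from this document.

```python
from dataclasses import dataclass


class TradAIError(Exception):
    """Base class for platform errors."""


class ValidationError(TradAIError):
    """Raised when input fails validation."""


@dataclass(frozen=True)
class S3Path:
    bucket: str
    key: str

    @classmethod
    def parse(cls, uri: str) -> "S3Path":
        # Hypothetical parser: one place that understands the s3:// scheme
        prefix = "s3://"
        if not uri.startswith(prefix):
            raise ValidationError(f"Not an S3 URI: {uri!r}")
        bucket, _, key = uri[len(prefix):].partition("/")
        if not bucket:
            raise ValidationError(f"Missing bucket in S3 URI: {uri!r}")
        return cls(bucket=bucket, key=key)

    def __str__(self) -> str:
        return f"s3://{self.bucket}/{self.key}"
```

Callers can catch `TradAIError` to handle any platform error, or `ValidationError` when they only care about bad input.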

What We Skip (Too Complex/Refactor Later)¶

scope.py - Thread-unsafe, overly complex
SKIP: Will redesign with dependency injection later
REASON: 500+ lines, global state, tight coupling
log.py - 50+ methods, too complex
SKIP: Will simplify later if needed
REASON: Logging is already handled by logger_mixin
strategy_config_mixin.py - Service-specific logic
SKIP: Move to strategy service layer later
REASON: Violates Single Responsibility

Design Patterns Applied¶

1. Repository Pattern (Dependency Inversion)¶

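from abc import ABC, abstractmethod

import pandas as pd


# Abstract interface (parameters elided in this document)
class DataRepository(ABC):
    @abstractmethod
    def get_ohlcv(self, *args, **kwargs) -> pd.DataFrame:
        ...


# Concrete implementation
class ArcticDBDataRepository(DataRepository):
    def get_ohlcv(self, *args, **kwargs) -> pd.DataFrame:
        ...  # Implementation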
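
To show why depending on the abstraction is useful, here is a minimal sketch of a generic `Repository[T]` base with an in-memory implementation. The method names `get` and `save`, the class `InMemoryRepository`, and the helper `count_known` are assumptions for illustration, not the library's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, List, Optional, TypeVar

T = TypeVar("T")


class Repository(ABC, Generic[T]):
    @abstractmethod
    def get(self, key: str) -> Optional[T]: ...

    @abstractmethod
    def save(self, key: str, item: T) -> None: ...


class InMemoryRepository(Repository[T]):
    """Dictionary-backed implementation, handy as a test double."""

    def __init__(self) -> None:
        self._items: Dict[str, T] = {}

    def get(self, key: str) -> Optional[T]:
        return self._items.get(key)

    def save(self, key: str, item: T) -> None:
        self._items[key] = item


def count_known(repo: Repository[str], keys: List[str]) -> int:
    """Depends only on the abstraction, not on a storage backend."""
    return sum(1 for k in keys if repo.get(k) is not None)
```

Because `count_known` only sees `Repository`, swapping the in-memory class for a database-backed one requires no change to the caller.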

2. Frozen Entities (Immutability)¶

from pydantic import BaseModel, ConfigDict


class Strategy(BaseModel):
    # Pydantic v2 style, matching the pydantic>=2.0.0 requirement
    model_config = ConfigDict(frozen=True)  # Immutable

    name: str
    version: str

3. Mixin Pattern (Composition over Inheritance)¶

import logging


class LoggerMixin:
    @property
    def logger(self) -> logging.Logger:
        ...


class MyService(LoggerMixin):
    ...  # Gets logging for free via self.logger

3a. Settings Mixin Pattern (DRY Configuration)¶

Reusable Pydantic settings mixins for shared configuration fields across services:

# tradai/common/settings_mixins.py: the mixin defines shared fields with sensible defaults
from pydantic import Field

class ArcticSettingsMixin:
    arctic_s3_bucket: str = Field(default="", description="S3 bucket for ArcticDB")
    arctic_library: str = Field(default="futures", description="ArcticDB library name")
    arctic_s3_endpoint: str = Field(default="s3.us-east-1.amazonaws.com")
    arctic_region: str = Field(default="us-east-1")

# In a service: import the mixins, then inherit and optionally override
from tradai.common.settings import Settings
from tradai.common.settings_mixins import ArcticSettingsMixin, MLflowSettingsMixin

class DataCollectionSettings(ArcticSettingsMixin, Settings):
    arctic_s3_bucket: str = Field(..., description="Required for data-collection")  # Override: required

class StrategyServiceSettings(ArcticSettingsMixin, MLflowSettingsMixin, Settings):
    # Inherits all Arctic + MLflow fields with defaults
    pass

Available mixins:
- ArcticSettingsMixin: arctic_s3_bucket, arctic_library, arctic_s3_endpoint, arctic_region
- MLflowSettingsMixin: mlflow_tracking_uri, mlflow_username, mlflow_password, mlflow_verify_ssl

4. Thread-Safe Singleton with lru_cache¶

from functools import lru_cache

import boto3


@lru_cache(maxsize=1)
def get_secrets_client():
    # Cached after the first call, so every caller shares one client
    return boto3.client("secretsmanager")
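
The caching behaviour can be checked without AWS by swapping in a stand-in class, as in the sketch below (`FakeClient` and `get_client` are hypothetical names). One caveat from the Python documentation: the cache itself is thread-safe, but the wrapped function may still run more than once if a second thread calls it before the first call has finished and been cached. Code that needs strict once-only construction should add an explicit lock.

```python
from functools import lru_cache


class FakeClient:
    """Hypothetical stand-in for a boto3 client."""


@lru_cache(maxsize=1)
def get_client() -> FakeClient:
    return FakeClient()


first = get_client()
second = get_client()
same_instance = first is second        # cached: both names refer to one object

get_client.cache_clear()               # useful between tests
fresh_instance = get_client() is not first
```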

5. Factory Pattern (Already exists in source.py)¶

class SourceFactory:
    @staticmethod
    def create(kind: str) -> DataSource: ...

6. Handler Registry Pattern (entrypoint/base.py)¶

Enables dependency inversion for mode handlers, avoiding circular imports:

from tradai.common.entrypoint.base import StrategyEntrypoint
from tradai.common.entrypoint.settings import TradingMode
from tradai.common.protocols import ModeHandler

# Services register handlers using decorator
@StrategyEntrypoint.register_handler(TradingMode.BACKTEST)
class BacktestHandler:
    def __init__(self, entrypoint: StrategyEntrypoint) -> None:
        self._ep = entrypoint

    def run(self) -> int:
        # Execute backtest logic
        return 0

# Inside the entrypoint, the handler is resolved via registry lookup (no direct import)
handler_factory = StrategyEntrypoint.get_handler(TradingMode.BACKTEST)
handler = handler_factory(entrypoint)
exit_code = handler.run()

Benefits:
- Breaks circular dependencies (common → strategy-service)
- Services own their handlers
- Base class remains generic
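
One way the registry behind `register_handler` and `get_handler` could be implemented is sketched below. The class-level dictionary, the enum values, and the `LookupError` on a missing handler are assumptions. The actual code in entrypoint/base.py may differ.

```python
from enum import Enum
from typing import Callable, Dict, Type


class TradingMode(Enum):
    BACKTEST = "backtest"   # enum values are assumed for this sketch
    LIVE = "live"


class StrategyEntrypoint:
    _handlers: Dict[TradingMode, Type] = {}

    @classmethod
    def register_handler(cls, mode: TradingMode) -> Callable[[Type], Type]:
        def decorator(handler_cls: Type) -> Type:
            cls._handlers[mode] = handler_cls   # record the class at import time
            return handler_cls
        return decorator

    @classmethod
    def get_handler(cls, mode: TradingMode) -> Type:
        try:
            return cls._handlers[mode]
        except KeyError:
            raise LookupError(f"No handler registered for {mode.name}") from None


@StrategyEntrypoint.register_handler(TradingMode.BACKTEST)
class BacktestHandler:
    def __init__(self, entrypoint: StrategyEntrypoint) -> None:
        self._ep = entrypoint

    def run(self) -> int:
        return 0
```

The base class never imports `BacktestHandler`. The import direction points from the service to the common library only.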

Module Organization¶

tradai/common/
├── __init__.py              # Public API exports
├── exceptions.py            # Custom exception hierarchy
├── protocols.py             # Shared protocols (ModeHandler, etc.)
├── settings.py              # Settings base class (Pydantic)
├── settings_mixins.py       # Reusable settings mixins (ArcticSettingsMixin, MLflowSettingsMixin)
├── logger_mixin.py          # Logging mixin (composition pattern)
├── tracing.py               # OpenTelemetry tracing utilities
├── validators.py            # Shared field validators (S3 bucket names, etc.)
├── types.py                 # Type aliases
├── warmup_loader.py         # Model warmup loading
├── aws/                     # AWS integrations
│   ├── __init__.py          # Re-exports (get_dynamodb_resource, etc.)
│   ├── _client_factory.py   # Cached boto3 client/resource factory
│   ├── cloudwatch_logs.py   # CloudWatch log retrieval
│   ├── cloudwatch_metrics.py # CloudWatch metrics publishing
│   ├── dynamodb/            # DynamoDB adapters (sync & async)
│   ├── ecs_deploy.py        # ECS deployment helper
│   ├── s3_utils.py          # S3 path parsing, upload/download
│   ├── async_s3_utils.py    # Async S3 utilities
│   ├── state_repository.py  # DynamoDB state repository
│   ├── step_functions.py    # Step Functions client
│   └── constants.py         # AWS constants
├── clients/                 # External service clients
│   ├── data_collection_client.py  # Data collection REST client
│   └── mlflow_client.py          # MLflow REST client with circuit breaker
├── repositories/            # Generic repository patterns
│   ├── base.py              # DynamoDBRepository base class
│   ├── codecs.py            # DynamoDB codec system
│   ├── crud.py              # Generic CRUD repository
│   ├── deployment.py        # Deployment repository
│   ├── in_memory.py         # InMemoryStatefulRepository (A/B test, shadow test)
│   ├── pagination.py        # Paginated query support
│   ├── protocols.py         # Repository protocols
│   ├── query.py             # Query builder
│   └── trading_state.py     # Trading state repository
├── entities/                # Domain entities (Pydantic, frozen)
│   ├── __init__.py          # Barrel exports
│   ├── aws.py               # S3Path, AWSConfig, JobStatus
│   ├── backtest.py          # BacktestConfig, BacktestResult, BacktestJobStatus
│   ├── config_version.py    # ConfigVersion, ConfigVersionStatus
│   ├── deployment.py        # DeploymentRecord, DeploymentStatus, DeploymentDiff
│   ├── exchange.py          # ExchangeConfig, TradingMode, OperatingMode
│   ├── hyperopt.py          # OptunaHyperoptConfig, TrialResult, HyperparamSearchSpace
│   ├── identifiers.py       # Arn, ExperimentName, TableName
│   ├── mlflow.py            # ModelVersion, RegisteredModel, ExperimentRun
│   ├── pagination.py        # PaginatedResponse
│   ├── retraining.py        # RetrainingState, RetrainingDecision
│   ├── strategy.py          # Strategy
│   └── trading_state.py     # TradingState, StrategyPnL, LiveTrade
├── entrypoint/              # Strategy execution entrypoints
│   ├── base.py              # StrategyEntrypoint with handler registry
│   ├── settings.py          # TradingMode, EntrypointSettings
│   └── training/            # Training entrypoint (extracted from monolithic handler)
│       ├── __init__.py      # Public API
│       ├── handler.py       # TrainHandler (orchestrator only)
│       ├── manifest_builder.py  # Config manifest construction
│       ├── metadata.py          # Training metadata extraction
│       ├── mlflow_reporter.py   # MLflow experiment logging
│       ├── model_registrar.py   # Model registration
│       └── result_parser.py     # Backtest result parsing
├── resilience/              # Resilience patterns
│   ├── circuit_breaker.py   # CircuitBreaker with state machine
│   ├── execution.py         # Resilient execution helpers
│   ├── policy.py            # ResiliencePolicy (retry + CB)
│   └── retry.py             # Retry with exponential backoff
├── auth/                    # Authentication utilities
│   ├── __init__.py          # JWT validator, credential reloader
│   └── fastapi_deps.py      # Reusable FastAPI auth dependencies
├── ab_testing/              # A/B testing framework
│   ├── entities.py          # ABTestConfig, ABTestState, ABTestResult
│   ├── manager.py           # ABTestManager lifecycle
│   ├── repository.py        # DynamoDB + in-memory repositories
│   ├── statistical_tester.py # Hypothesis testing (lazy-loaded)
│   └── traffic_router.py    # Request routing to variants
├── alerting/                # SNS-based alert management
│   ├── entities.py          # Alert, AlertContext, AlertSeverity
│   ├── service.py           # AlertService (SNS publisher)
│   └── webhook.py           # SlackNotifier, DiscordNotifier
├── config/                  # Strategy configuration management
│   ├── catalog.py           # StrategyConfig schema
│   ├── loader.py            # StrategyConfigLoader (S3/MLflow)
│   ├── merge.py             # ConfigMergeService (OmegaConf)
│   ├── repository.py        # ConfigVersionRepository (DynamoDB)
│   ├── service.py           # ConfigVersionService
│   └── validators.py        # ConfigValidator (mode-specific)
├── drift/                   # Model drift detection
│   ├── detector.py          # DriftDetector (PSI-based, lazy-loaded)
│   └── entities.py          # DriftResult, DriftSeverity, DriftThresholds
├── fastapi/                 # FastAPI utilities
│   ├── app_factory.py       # create_app(), AppConfig, create_lifespan()
│   ├── dependencies.py      # validated_resource_name(), get_correlation_id()
│   ├── di_helpers.py        # Dependency injection helpers
│   ├── error_schemas.py     # ErrorResponse, ValidationErrorResponse
│   └── middleware.py        # RequestTracingMiddleware
├── features/                # Feature engineering
│   ├── lineage.py           # DataLineage tracking
│   └── schema.py            # FeatureSchema, compute_dataframe_hash()
├── freqtrade/               # Freqtrade integration
│   ├── cli.py               # FreqtradeCLIBuilder
│   ├── registry.py          # FreqAIModelRegistry
│   └── runner.py            # FreqtradeRunner, FreqtradeBacktester
├── health/                  # Health check service
│   ├── aws_checkers.py      # DynamoDBHealthChecker, S3HealthChecker, etc.
│   ├── base.py              # BaseHealthChecker
│   ├── checkers.py          # RedisHealthChecker, HTTPHealthChecker, etc.
│   ├── protocols.py         # HealthChecker protocol
│   ├── reporter.py          # HealthReporter (background heartbeat)
│   └── service.py           # HealthService (aggregator)
├── http_client/             # HTTP client with retries
│   └── client.py            # HttpClient, HttpClientConfig, HttpRetryConfig
├── lambda_/                 # Lambda handler framework
│   ├── backtest_processor.py # SQSJobProcessor, BatchResult
│   ├── context.py           # LambdaContext[T] DI container
│   ├── decorators.py        # @lambda_handler, @lambda_handler_with_di
│   ├── entities.py          # DriftState, HealthState, HeartbeatState
│   ├── response.py          # LambdaResponse builder
│   └── settings.py          # LambdaSettings + per-handler settings
├── mlflow/                  # MLflow integration
│   ├── adapter.py           # MLflowAdapter (REST + SDK)
│   ├── artifacts.py         # Artifact management
│   ├── backtest_logger.py   # BacktestMLflowLogger
│   ├── client.py            # MLflow REST client
│   ├── experiments.py       # Experiment management
│   ├── registry.py          # Model registry operations
│   ├── tags.py              # 40+ tag constants (TAG_STRATEGY_NAME, etc.)
│   └── utils.py             # URI conversion, parsing
├── model_comparison/        # Model comparison (statistical)
│   ├── comparator.py        # ModelComparator (champion/challenger)
│   └── entities.py          # ComparisonResult, PromotionDecision
├── validation/              # Deployment validation
│   └── entities.py          # DryRunValidationReport, GoLiveValidationReport
└── utils/                   # Utility functions
    ├── __init__.py          # Re-exports (git utils, etc.)
    ├── datetime.py          # Date/time conversion helpers
    ├── decimal.py           # Decimal precision utilities
    ├── git.py               # Git commit/branch/dirty checks
    ├── profit.py            # Profit calculation helpers
    └── seed.py              # Reproducible random seeding

Module Details¶

config/ — Strategy Configuration Management¶

Manages loading, merging, versioning, and validating strategy configurations across environments.

Key Classes:
- StrategyConfigLoader: Loads strategy configs from S3 or MLflow, resolving by strategy name + version
- ConfigMergeService: Merges multiple config layers using OmegaConf (base → environment → override)
- ConfigVersionService: Content-addressable config versioning with hash-based deduplication
- ConfigVersionRepository: DynamoDB persistence for config versions
- ConfigValidator: Mode-specific validation (LIVE requires all fields; DRY_RUN allows partial configs)
- StrategyConfig: Canonical Pydantic schema for strategy configuration

Usage:

from tradai.common.config import StrategyConfigLoader, ConfigMergeService
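
The layered merge (base → environment → override) and hash-based deduplication can be sketched with plain dictionaries. This is a standard-library stand-in, not the OmegaConf-based implementation. The function names `deep_merge` and `config_hash` and the choice of SHA-256 over canonical JSON are assumptions.

```python
import hashlib
import json
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` wins; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def config_hash(config: Dict[str, Any]) -> str:
    """Hash canonical JSON so that equal content yields the same version id."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


base = {"timeframe": "1h", "parameters": {"rsi": 14, "ema": 50}}
environment = {"parameters": {"ema": 100}}
override = {"stake_currency": "USDT"}

merged = deep_merge(deep_merge(base, environment), override)
```

Sorting keys before hashing is what makes the version id independent of key order, so two uploads of the same content deduplicate to one version.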

drift/ — Model Drift Detection¶

Detects distribution shifts between training and production data using Population Stability Index (PSI).

Key Classes:
- DriftDetector: PSI-based drift detection across feature distributions (lazy-loaded, requires numpy/scipy)
- DriftResult: Overall drift detection result with severity and per-feature breakdown
- DriftSeverity: Enum: NONE, MINOR, MODERATE, SEVERE
- DriftThresholds: Configurable PSI thresholds for severity classification
- FeatureDrift: Per-feature drift information with PSI value
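
For reference, PSI over binned proportions is the sum over bins of (actual − expected) × ln(actual / expected). The sketch below computes it with the standard library only. The function names, the fixed bin edges, and the small `eps` floor for empty bins are assumptions, and the real DriftDetector (which uses numpy/scipy) may bin differently.

```python
import math
from typing import List, Sequence


def bin_proportions(values: Sequence[float], edges: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Share of values per bin; `edges` are the interior cut points, sorted ascending."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(1 for edge in edges if v >= edge)  # number of cut points at or below v
        counts[idx] += 1
    total = len(values)
    # eps keeps empty bins from producing log(0) or division by zero
    return [max(c / total, eps) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], edges: Sequence[float]) -> float:
    """Population Stability Index between a reference sample and a new sample."""
    e = bin_proportions(expected, edges)
    a = bin_proportions(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a PSI of zero, and every term is non-negative, so the value grows as the two distributions move apart.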

features/ — Feature Engineering¶

Provides feature schema validation and data lineage tracking for ML pipelines.

Key Classes:
- FeatureSchema: Defines and validates expected feature columns, types, and ranges
- DataLineage: Tracks data transformation provenance (source → transformations → output)
- compute_dataframe_hash(): Deterministic hash of DataFrame contents for reproducibility checks

freqtrade/ — Freqtrade Integration¶

Wraps Freqtrade CLI and subprocess management for backtesting and model registry operations.

Key Classes:
- FreqtradeCLIBuilder: Fluent builder for Freqtrade CLI commands (backtesting, hyperopt, etc.)
- FreqtradeRunner: Subprocess management with stdout/stderr capture
- FreqtradeBacktester: Specialized runner for backtest execution
- FreqAIModelRegistry: Discovery and management of FreqAI model directories
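
A fluent builder for a backtesting command might look like the sketch below. The class name `BacktestCommandBuilder` and its method names are hypothetical and are not the actual FreqtradeCLIBuilder API. The flags `--config`, `--strategy`, and `--timerange` are standard Freqtrade options.

```python
from typing import List


class BacktestCommandBuilder:
    """Hypothetical fluent builder; each setter returns self so calls can be chained."""

    def __init__(self) -> None:
        self._args: List[str] = ["freqtrade", "backtesting"]

    def config(self, path: str) -> "BacktestCommandBuilder":
        self._args += ["--config", path]
        return self

    def strategy(self, name: str) -> "BacktestCommandBuilder":
        self._args += ["--strategy", name]
        return self

    def timerange(self, value: str) -> "BacktestCommandBuilder":
        self._args += ["--timerange", value]
        return self

    def build(self) -> List[str]:
        # A list (not a shell string) can go to subprocess.run without shell=True
        return list(self._args)


cmd = (
    BacktestCommandBuilder()
    .config("config.json")
    .strategy("MyStrategy")
    .timerange("20240101-20240201")
    .build()
)
```

Returning an argument list rather than a joined string avoids shell quoting problems when the command is handed to a subprocess.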

lambda_/ — Lambda Handler Framework¶

Dependency injection framework for AWS Lambda handlers with consistent setup, error handling, and response formatting.

Key Classes:
- LambdaSettings: Pydantic BaseSettings for automatic env var loading per handler
- LambdaContext[T]: Generic DI container providing typed settings, publishers, and repositories
- @lambda_handler: Decorator for consistent Lambda setup, error handling, and response formatting
- @lambda_handler_with_di: DI variant that injects LambdaContext into handler
- SQSJobProcessor: Batch SQS message processing with partial failure support
- LambdaResponse: Builder for consistent API Gateway-compatible response formatting

Per-handler settings: HealthCheckSettings, DriftMonitorSettings, PulumiDriftSettings, RetrainingSchedulerSettings, HeartbeatSettings, ModelManagementSettings, SQSTriggerSettings, DynamoDBSettings
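
The idea behind a handler decorator is that success and failure paths share one response shape. The sketch below is a simplified assumption of how that could work. The mapping of `ValueError` to status 400 and the response bodies are invented for illustration and do not describe the real `@lambda_handler`.

```python
import functools
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any], Any], Dict[str, Any]]


def lambda_handler(func: Handler) -> Handler:
    """Wrap a handler so every outcome becomes a statusCode/body response."""

    @functools.wraps(func)
    def wrapper(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        try:
            payload = func(event, context)
            return {"statusCode": 200, "body": json.dumps(payload)}
        except ValueError as exc:
            # Treat ValueError as a client error (an assumption of this sketch)
            return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
        except Exception:
            # Do not leak internals to the caller
            return {"statusCode": 500, "body": json.dumps({"error": "internal error"})}

    return wrapper


@lambda_handler
def health_check(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    if "service" not in event:
        raise ValueError("missing 'service'")
    return {"service": event["service"], "status": "ok"}
```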

health/ — Health Check Service¶

Aggregates health status from multiple dependencies with pre-built checkers and background heartbeat reporting.

Key Classes:
- HealthService: Aggregates results from multiple HealthChecker instances
- HealthResult: Overall health status with individual dependency check results
- BaseHealthChecker: Abstract base for implementing custom health checks
- HealthReporter: Background task that periodically reports health to DynamoDB

Pre-built Checkers:
- AWS: DynamoDBHealthChecker, S3HealthChecker, SQSHealthChecker, SNSHealthChecker
- Infrastructure: RedisHealthChecker, DatabaseHealthChecker, HTTPHealthChecker, MLflowHealthChecker
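
Aggregation across checkers can be sketched as follows. The names `CheckResult`, `aggregate`, and `overall_healthy` are hypothetical, and checkers are modelled as plain callables rather than HealthChecker instances. The point is that one failing or crashing dependency is reported without stopping the remaining checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CheckResult:
    name: str
    healthy: bool
    detail: str = ""


def aggregate(checkers: Dict[str, Callable[[], bool]]) -> List[CheckResult]:
    """Run every checker; one that raises is reported as unhealthy, not propagated."""
    results = []
    for name, check in checkers.items():
        try:
            results.append(CheckResult(name, bool(check())))
        except Exception as exc:
            results.append(CheckResult(name, False, detail=str(exc)))
    return results


def overall_healthy(results: List[CheckResult]) -> bool:
    """The service is healthy only if every dependency is."""
    return all(r.healthy for r in results)
```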

http_client/ — HTTP Client¶

HTTP client with configurable retry and circuit breaker integration.

Key Classes:
- HttpClient: Requests-based client with automatic retries, timeout, and header management
- HttpClientConfig: Configuration for base URL, timeout, headers
- HttpRetryConfig: Retry configuration (max attempts, backoff, retryable status codes)
- HttpResponse: Response wrapper with status, body, and headers
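
How the three retry settings (max attempts, backoff, retryable status codes) interact can be sketched as below. `RetryConfig`, `call_with_retry`, the default values, and the doubling schedule are assumptions, not the actual HttpRetryConfig fields. The `send` callable stands in for a real HTTP request.

```python
import time
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class RetryConfig:
    max_attempts: int = 3
    backoff_base: float = 0.5   # seconds before the first retry
    retryable_statuses: FrozenSet[int] = frozenset({429, 500, 502, 503, 504})

    def delay(self, attempt: int) -> float:
        """Exponential backoff: base, 2*base, 4*base, ... (attempt counts from 1)."""
        return self.backoff_base * (2 ** (attempt - 1))


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    config: RetryConfig,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    status, body = send()
    attempt = 1
    while status in config.retryable_statuses and attempt < config.max_attempts:
        sleep(config.delay(attempt))
        status, body = send()
        attempt += 1
    return status, body
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.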

alerting/ — Alert Management¶

Multi-channel alerting infrastructure supporting SNS, Slack, and Discord notifications.

Key Classes:
- AlertService: Publishes alerts to SNS topics with severity-based routing
- Alert: Alert model with message, severity, context, and timestamp
- AlertSeverity: Enum: INFO, WARNING, ERROR, CRITICAL
- SlackNotifier: Sends formatted alerts to Slack via webhook
- DiscordNotifier: Sends formatted alerts to Discord via webhook

fastapi/ — FastAPI Utilities¶

App factory, middleware, DI helpers, and standardized error responses for FastAPI services.

Key Classes:
- create_app(): Factory function that builds FastAPI app with middleware, exception handlers, and health endpoints
- AppConfig: Configuration for app title, version, CORS, middleware
- RequestTracingMiddleware: Injects correlation IDs into request context
- ErrorResponse: Standardized error response schema
- get_correlation_id(): FastAPI dependency for extracting/generating correlation IDs
- validated_resource_name(): DI dependency for validating path parameters

mlflow/ — MLflow Integration¶

Comprehensive MLflow adapter for experiment tracking, model registry, and artifact management.

Key Classes:
- MLflowAdapter: Unified interface for MLflow REST API and Python SDK operations
- BacktestMLflowLogger: Logs backtest results, metrics, and artifacts to MLflow experiments
- MLflowURIConverter: Converts between MLflow artifact URIs and S3 paths
- 40+ tag constants: Standardized tag keys (TAG_STRATEGY_NAME, TAG_GIT_COMMIT, TAG_SHARPE_RATIO, etc.) ensuring consistent metadata across all experiments

model_comparison/ — Model Comparison¶

Statistical comparison of champion vs challenger models with automated promotion decisions.

Key Classes:
- ModelComparator: Compares two model versions across multiple metrics with statistical significance
- ComparisonResult: Overall comparison result with per-metric breakdown and promotion recommendation
- PromotionDecision: Enum: PROMOTE, KEEP_CHAMPION, INCONCLUSIVE
- ModelCandidate: Model version metadata and performance metrics
- MetricComparison: Per-metric comparison with direction (higher/lower is better)

validation/ — Deployment Validation¶

Validation reports for dry-run and go-live deployment gates.

Key Classes (Dry-Run):
- DryRunValidationReport: Aggregates candle metrics, order metrics, position metrics, and heartbeat uptime
- DryRunCheckResult: Individual check with severity (INFO, WARNING, CRITICAL)
- HeartbeatMetrics, CandleMetrics, OrderMetrics, PositionMetrics: Domain-specific metric containers

Key Classes (Go-Live):
- GoLiveValidationReport: Infrastructure readiness checks before live trading promotion
- GoLiveCheckResult: Individual infrastructure check result
- AlarmCheckResult, LambdaCheckResult, SNSCheckResult: AWS resource-specific checks

ab_testing/ — A/B Testing Framework¶

Manages canary-style A/B tests with traffic routing, statistical testing, and lifecycle management.

Key Classes:
- ABTestManager: Orchestrates test lifecycle: create → route traffic → collect metrics → decide
- ABTestConfig: Test configuration (variants, traffic split, duration, success criteria)
- ABTestState: Current test state with metrics and stage
- CanaryStage: Enum for progressive rollout stages: 5%, 25%, 50%, 100%
- StatisticalTester: Hypothesis testing for A/B results (lazy-loaded, requires scipy)
- TrafficRouter: Routes incoming requests to champion or challenger variant
- ABTestRepository / DynamoDBABTestRepository: Persistence for test state
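
One common way to split traffic across the canary stages is deterministic hash bucketing, sketched below. Whether TrafficRouter actually routes this way is an assumption. The function `route` and the choice of SHA-256 are illustrative only.

```python
import hashlib

CANARY_STAGES = (5, 25, 50, 100)  # challenger traffic share in percent


def route(request_key: str, challenger_percent: int) -> str:
    """Deterministically map a key to 'champion' or 'challenger'."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100   # stable bucket in 0..99
    return "challenger" if bucket < challenger_percent else "champion"
```

Because the bucket depends only on the key, a request key that reaches the challenger at the 5% stage keeps reaching it at every later stage, which keeps per-key behaviour stable while the rollout widens.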

Entity Types Catalog¶

The entities/ directory contains 12 entity files with ~50 Pydantic models:

Entity File	Key Classes	Purpose
`backtest.py`	`BacktestConfig`, `BacktestResult`, `BacktestJobStatus`, `BacktestJobMessage`	Backtest lifecycle
`deployment.py`	`DeploymentRecord`, `DeploymentStatus`, `DeploymentDiff`	Deployment tracking
`exchange.py`	`ExchangeConfig`, `TradingMode`, `OperatingMode`	Exchange config
`hyperopt.py`	`OptunaHyperoptConfig`, `TrialResult`, `HyperparamSearchSpace`	Hyperparameter optimization
`mlflow.py`	`ModelVersion`, `RegisteredModel`, `ExperimentRun`	MLflow integration
`retraining.py`	`RetrainingState`, `RetrainingDecision`	Retraining workflows
`trading_state.py`	`TradingState`, `StrategyPnL`, `LiveTrade`	Live trading state
`identifiers.py`	`Arn`, `ExperimentName`, `TableName`	Type-safe AWS identifiers
`aws.py`	`S3Path`, `AWSConfig`, `JobStatus`	AWS infrastructure
`config_version.py`	`ConfigVersion`, `ConfigVersionStatus`	Config versioning
`pagination.py`	`PaginatedResponse`	Generic pagination
`strategy.py`	`Strategy`	Strategy entity

Trading Mode Safety¶

LIVE Mode Fail-Fast (entrypoint/trading.py)¶

LIVE trading requires valid configuration - empty or invalid configs cause immediate failure:

def _load_strategy_config(self) -> dict[str, Any]:
    is_live = self._settings.trading_mode == TradingMode.LIVE
    try:
        config_dict = loader.load_config(...)
        self._validate_config_for_mode(config_dict)
        return config_dict
    except Exception as e:
        if is_live:
            # LIVE mode: fail immediately - never trade with empty config
            raise TradingError(f"Failed to load strategy config for LIVE trading: {e}") from e
        # Non-LIVE modes such as DRY_RUN: warn and continue (for development/testing)
        self.logger.warning("Config load failed, using empty config: %s", e)
        return {}
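
The fail-fast rule can be exercised in isolation with a small stand-alone sketch. The free function `load_strategy_config`, the injected `load` callable, and the enum values are assumptions made so the example runs without the rest of the entrypoint.

```python
import logging
from enum import Enum
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


class TradingMode(Enum):
    LIVE = "live"
    DRY_RUN = "dry_run"


class TradingError(Exception):
    """Raised when trading cannot proceed safely."""


def load_strategy_config(mode: TradingMode, load: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    try:
        return load()
    except Exception as exc:
        if mode is TradingMode.LIVE:
            # Chain the original error so the root cause stays in the traceback
            raise TradingError(f"Failed to load strategy config for LIVE trading: {exc}") from exc
        logger.warning("Config load failed, using empty config: %s", exc)
        return {}


def broken_loader() -> Dict[str, Any]:
    raise FileNotFoundError("config.json")
```

With the same broken loader, DRY_RUN yields an empty config while LIVE raises, which is the asymmetry this section describes.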

Unified StrategyConfig (strategy_config_loader.py)¶

Single canonical schema for strategy configuration:

from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class StrategyConfig(BaseModel):
    model_config = ConfigDict(frozen=True, extra="allow")

    name: str = Field(..., min_length=1)
    version: str = Field(..., pattern=r"^\d+\.\d+\.\d+$")
    timeframe: str = Field(..., pattern=r"^\d+[mhd]$")
    pairs: list[str] = Field(default_factory=list)
    stake_currency: str = Field(default="USDT")
    exchange: str = Field(default="binance")
    parameters: dict[str, Any] = Field(default_factory=dict)
    buy_params: dict[str, Any] = Field(default_factory=dict)
    sell_params: dict[str, Any] = Field(default_factory=dict)
    freqai: dict[str, Any] | None = Field(default=None)

Services should import from tradai.common:

from tradai.common import StrategyConfig  # Canonical schema
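
To see what the `version` and `timeframe` patterns in the schema accept, they can be tried directly with the standard `re` module. The helper names below are hypothetical. The two patterns are copied verbatim from the StrategyConfig fields.

```python
import re

VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")   # copied from StrategyConfig.version
TIMEFRAME_PATTERN = re.compile(r"^\d+[mhd]$")      # copied from StrategyConfig.timeframe


def is_valid_version(value: str) -> bool:
    return VERSION_PATTERN.match(value) is not None


def is_valid_timeframe(value: str) -> bool:
    return TIMEFRAME_PATTERN.match(value) is not None
```

Under these patterns, "1.2.3" and "5m" pass, while "v1.2.3", "1.2", and a weekly timeframe such as "1w" are rejected.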

Security Improvements¶

No Hardcoded Credentials
All secrets from AWS Secrets Manager
Secret names from environment variables
No Hardcoded Infrastructure IDs
All AWS ARNs, subnets, security groups from env vars
Configuration factory pattern
Thread Safety
Thread locks for shared state
Immutable entities with Pydantic frozen=True
Input Validation
S3 path validation
File path sanitization
Pydantic validation throughout

Code Quality Standards¶

Type Hints: 100% coverage (strict mypy)
Docstrings: Google style for all public APIs
Test Coverage: 85%+ target
Absolute Imports: Enforced via Ruff
No Dead Code: Remove all test blocks from production files

Dependencies¶

Core:

pydantic>=2.0.0 (validation, immutability)
pydantic-settings>=2.0.0 (environment-based config)
boto3>=1.35.0 (AWS SDK)

Utilities:

redis>=5.0.0 (caching)
python-json-logger>=2.0.7 (structured logging)
PyYAML>=6.0.1 (config files)
requests>=2.32.0 (HTTP client)
omegaconf>=2.3.0 (Hydra compatibility)
mlflow-skinny>=2.18.0 (experiment tracking)
docker>=7.0.0 (ECS job manager)

Testing Strategy¶

Unit Tests (fast, isolated):
Mock all external dependencies
Test business logic only
Use pytest-mock
Integration Tests (slower, real services):
LocalStack for AWS services
Test actual AWS interactions
Test Structure:

tests/
├── unit/
│   ├── test_exceptions.py
│   ├── test_base_settings.py
│   ├── aws/
│   │   ├── test_secrets_manager.py
│   │   └── test_s3_utils.py
│   └── utils/
│       └── test_cache.py
└── integration/
    └── aws/
        └── test_ecr_util.py

Migration from Original¶

Removed (Security/Quality)¶

❌ secrets_maneger.py test code (lines 56-58)
❌ mlflow_client.py test code (lines 530-550)
❌ ecr_util.py test code (lines 519-560)
❌ ecr_util.py hardcoded ARNs (lines 16-26)
❌ Commented dead code in mlflow_client.py

Renamed (Correctness)¶

✅ secrets_maneger.py → secrets_manager.py

Skipped (Complexity)¶

⏭ scope.py (700+ lines, will redesign)
⏭ log.py (50+ methods, use logger_mixin instead)
⏭ strategy_config_mixin.py (move to services layer)

Added (Architecture)¶

✅ exceptions.py (custom exception hierarchy)
✅ repositories.py (repository ABCs)
✅ aws/s3_utils.py (DRY S3 parsing)
✅ types.py (type aliases)

Health & Risk (C3/H1)¶

Risk Controls (`entities/risk_limits.py`)¶

RiskLimits — platform-level risk limits (drawdown, open trades, leverage, action on breach)
validate_deployment_bounds() classmethod — shared pre-flight validation (CLI + backend)
RiskBreach / RiskCheckResult — structured breach reporting

Risk Monitor (`health/risk_monitor.py`)¶

Pure evaluator: receives metrics, returns RiskCheckResult (no I/O)
Fail-closed: tracks consecutive metric failures, triggers breach after threshold
Only accessed from single _heartbeat_loop thread — no lock needed
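
A pure, fail-closed evaluator along these lines could be sketched as follows. The constructor parameters, the single drawdown metric, and the `reason` strings are assumptions, and the real risk monitor evaluates more limits than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskCheckResult:
    breached: bool
    reason: str = ""


class RiskMonitor:
    """Pure evaluator: no I/O, only a failure counter as state."""

    def __init__(self, max_drawdown_pct: float, max_metric_failures: int = 3) -> None:
        self._max_drawdown_pct = max_drawdown_pct
        self._max_metric_failures = max_metric_failures
        self._consecutive_failures = 0

    def evaluate(self, drawdown_pct: Optional[float]) -> RiskCheckResult:
        if drawdown_pct is None:  # metrics could not be collected
            self._consecutive_failures += 1
            if self._consecutive_failures >= self._max_metric_failures:
                # Fail closed: persistent missing data is treated as a breach
                return RiskCheckResult(True, "metrics unavailable")
            return RiskCheckResult(False)
        self._consecutive_failures = 0  # any successful reading resets the counter
        if drawdown_pct > self._max_drawdown_pct:
            return RiskCheckResult(True, "drawdown limit exceeded")
        return RiskCheckResult(False)
```

The counter is plain instance state with no lock, which is only safe under the single-thread access described above.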

Metrics Collector (`health/metrics_collector.py`)¶

Transforms Freqtrade REST API responses → StrategyPnL + [LiveTrade]
Calls /profit and /status exactly once per collect_all() invocation
Handles ratio→percentage conversion (Freqtrade returns 0.0-1.0)

Integration (`health/reporter.py`)¶

HealthReporter._heartbeat_loop() orchestrates: collect → heartbeat → risk check → pause/resume
Metric snapshots persisted to DynamoDB via TradingStateRepository.update_metrics()
Risk breach actions: pause Freqtrade, update DynamoDB, send CRITICAL alert
_risk_breach_alerted flag prevents alert storms; recovery sends INFO alert

Success Criteria¶

[ ] All 4 security vulnerabilities fixed
[ ] 85%+ test coverage
[ ] 100% type hint coverage
[ ] No hardcoded credentials or infrastructure IDs
[ ] All public APIs documented
[ ] Thread-safe implementations
[ ] Absolute imports only