
tradai-common Design Document

Overview

tradai-common provides shared utilities, base classes, and AWS integrations for the TradAI platform. This library follows SOLID principles and applies DRY patterns throughout.

Architecture Decisions

What We Keep (Improved)

  1. base_service.py - Base class for services with Hydra config loading
     • ADD: Docstrings, settings_class validation
     • KEEP: LoggerMixin composition pattern

  2. base_settings.py - Pydantic settings with S3 support
     • ADD: Path validation, proper return types
     • KEEP: Strict mode, frozen entities

  3. logger_mixin.py - Logging mixin for composition
     • KEEP: Already well-designed

  4. common.py - Utility functions for time, quantization
     • KEEP: Excellent type hints
     • ADD: More validation where needed

  5. cache.py - Caching mechanisms
     • KEEP: Already well-designed with Final types

  6. mlflow_client.py - MLflow REST client
     • REMOVE: All test code (lines 530-550)
     • IMPROVE: Error handling

What We Fix (Security)

  1. secrets_manager.py (renamed from secrets_maneger.py)
     • FIX: Thread safety via a lock
     • FIX: Secret name from environment variables
     • FIX: Proper error handling
     • REMOVE: Test code

  2. ecr_util.py - ECS job manager
     • FIX: All AWS config from environment variables
     • REMOVE: Hardcoded ARNs, subnets, security groups
     • REMOVE: Test code (lines 519-560)
     • KEEP: Async patterns, CloudWatch streaming
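
The secrets_manager.py fixes listed above (lock-guarded access, secret name taken from the environment) can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: SecretCache, the fetch callable, and the TRADAI_SECRET_NAME variable are hypothetical names, and in production fetch would wrap a call to AWS Secrets Manager.

```python
import os
import threading
from typing import Callable


class SecretCache:
    """Caches secret values behind a lock (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # e.g. a wrapper around a Secrets Manager lookup
        self._lock = threading.Lock()
        self._values: dict[str, str] = {}

    def get(self, env_var: str) -> str:
        # The secret *name* comes from the environment, never from code.
        name = os.environ.get(env_var)
        if not name:
            raise KeyError(f"environment variable {env_var} is not set")
        with self._lock:  # only one thread fetches and populates the cache
            if name not in self._values:
                self._values[name] = self._fetch(name)
            return self._values[name]


# Usage with a fake fetcher standing in for AWS
calls: list[str] = []


def fake_fetch(name: str) -> str:
    calls.append(name)
    return f"value-of-{name}"


os.environ["TRADAI_SECRET_NAME"] = "demo/secret"  # hypothetical variable name
cache = SecretCache(fake_fetch)
first = cache.get("TRADAI_SECRET_NAME")
second = cache.get("TRADAI_SECRET_NAME")  # served from the cache
```

Injecting the fetch function keeps the sketch testable without AWS credentials; the second lookup never reaches the fetcher.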

What We Add (SOLID)

  1. exceptions.py - Custom exception hierarchy
     • NEW: TradAIError base class
     • NEW: ValidationError, NotFoundError, ConfigurationError, etc.

  2. repositories.py - Repository pattern ABCs
     • NEW: Repository[T] generic base
     • NEW: DataRepository for market data
     • NEW: Dependency Inversion principle

  3. s3_utils.py - S3 path parsing (DRY)
     • NEW: S3Path frozen dataclass
     • NEW: Centralized parsing logic
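
As an illustration of the "S3Path frozen dataclass" and "centralized parsing logic" items, a minimal sketch might look like the following. The field names (bucket, key) and the parse classmethod are assumptions made for this example; the actual S3Path in tradai-common may differ.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class S3Path:
    """Immutable bucket/key pair (field names are assumptions)."""

    bucket: str
    key: str

    @classmethod
    def parse(cls, uri: str) -> "S3Path":
        # Single place where s3:// URIs are split, so callers never re-implement it.
        prefix = "s3://"
        if not uri.startswith(prefix):
            raise ValueError(f"not an S3 URI: {uri!r}")
        bucket, _, key = uri[len(prefix):].partition("/")
        if not bucket:
            raise ValueError(f"missing bucket in {uri!r}")
        return cls(bucket=bucket, key=key)

    def __str__(self) -> str:
        return f"s3://{self.bucket}/{self.key}"


path = S3Path.parse("s3://my-bucket/data/ohlcv.parquet")

try:
    path.bucket = "other"  # type: ignore[misc]
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Because the dataclass is frozen, an S3Path can be shared across threads or used as a dictionary key without defensive copies.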

What We Skip (Too Complex/Refactor Later)

  1. scope.py - Thread-unsafe, overly complex
     • SKIP: Will redesign with dependency injection later
     • REASON: 500+ lines, global state, tight coupling

  2. log.py - 50+ methods, too complex
     • SKIP: Will simplify later if needed
     • REASON: Logging is already handled by logger_mixin

  3. strategy_config_mixin.py - Service-specific logic
     • SKIP: Move to strategy service layer later
     • REASON: Violates Single Responsibility

Design Patterns Applied

1. Repository Pattern (Dependency Inversion)

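from abc import ABC, abstractmethod
from typing import Any

import pandas as pd


# Abstract interface
class DataRepository(ABC):
    @abstractmethod
    def get_ohlcv(self, *args: Any, **kwargs: Any) -> pd.DataFrame:
        """Return OHLCV data (parameters elided in this document)."""


# Concrete implementation
class ArcticDBDataRepository(DataRepository):
    def get_ohlcv(self, *args: Any, **kwargs: Any) -> pd.DataFrame:
        raise NotImplementedError  # Implementation elided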
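
To show why the abstraction matters, here is a small, self-contained sketch of a generic Repository[T] with an in-memory implementation. The method names (get, save), InMemoryRepository, and latest_close are hypothetical and chosen only for this example; the point is that the caller depends on the abstract interface, so the storage backend can be swapped without touching it.

```python
from abc import ABC, abstractmethod
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class Repository(ABC, Generic[T]):
    """Abstract storage interface (method names are assumptions)."""

    @abstractmethod
    def get(self, key: str) -> Optional[T]: ...

    @abstractmethod
    def save(self, key: str, item: T) -> None: ...


class InMemoryRepository(Repository[T]):
    """Dictionary-backed implementation, handy for unit tests."""

    def __init__(self) -> None:
        self._items: dict[str, T] = {}

    def get(self, key: str) -> Optional[T]:
        return self._items.get(key)

    def save(self, key: str, item: T) -> None:
        self._items[key] = item


def latest_close(repo: Repository[float], symbol: str) -> float:
    # Depends only on the abstraction, not on a concrete backend.
    price = repo.get(symbol)
    if price is None:
        raise LookupError(symbol)
    return price


repo: Repository[float] = InMemoryRepository()
repo.save("BTC/USDT", 42000.0)
close = latest_close(repo, "BTC/USDT")
```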

2. Frozen Entities (Immutability)

from pydantic import BaseModel, ConfigDict


class Strategy(BaseModel):
    # Pydantic v2 style; `class Config` is deprecated in v2
    model_config = ConfigDict(frozen=True)  # Immutable

    name: str
    version: str

3. Mixin Pattern (Composition over Inheritance)

import logging


class LoggerMixin:
    @property
    def logger(self) -> logging.Logger:
        ...


class MyService(LoggerMixin):
    # Gets logging for free
    pass

3a. Settings Mixin Pattern (DRY Configuration)

Reusable Pydantic settings mixins for shared configuration fields across services:

# --- tradai/common/settings_mixins.py ---
from pydantic import Field


# Mixin defines shared fields with sensible defaults
class ArcticSettingsMixin:
    arctic_s3_bucket: str = Field(default="", description="S3 bucket for ArcticDB")
    arctic_library: str = Field(default="futures", description="ArcticDB library name")
    arctic_s3_endpoint: str = Field(default="s3.us-east-1.amazonaws.com")
    arctic_region: str = Field(default="us-east-1")


# --- in a service module ---
from pydantic import Field

from tradai.common.settings import Settings
from tradai.common.settings_mixins import ArcticSettingsMixin, MLflowSettingsMixin


# Services inherit and optionally override
class DataCollectionSettings(ArcticSettingsMixin, Settings):
    arctic_s3_bucket: str = Field(..., description="Required for data-collection")  # Override: required


class StrategyServiceSettings(ArcticSettingsMixin, MLflowSettingsMixin, Settings):
    # Inherits all Arctic + MLflow fields with defaults
    pass

Available mixins:

  • ArcticSettingsMixin: arctic_s3_bucket, arctic_library, arctic_s3_endpoint, arctic_region
  • MLflowSettingsMixin: mlflow_tracking_uri, mlflow_username, mlflow_password, mlflow_verify_ssl

4. Thread-Safe Singleton with lru_cache

from functools import lru_cache

import boto3


@lru_cache(maxsize=1)
def get_secrets_client():
    # Built on first call, then reused on every later call
    return boto3.client("secretsmanager")
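
The caching behaviour can be checked without AWS using a stand-in object. In this sketch FakeClient and get_client are hypothetical names. One caveat worth keeping in mind: the lru_cache data structure stays coherent under concurrent use, but the Python documentation notes that the wrapped function may still be called more than once if a second thread arrives before the first call has been cached, so the factory should be safe to run twice.

```python
from functools import lru_cache


class FakeClient:
    """Stand-in for a boto3 client so the sketch runs without AWS."""


@lru_cache(maxsize=1)
def get_client() -> FakeClient:
    return FakeClient()


a = get_client()
b = get_client()          # served from the cache: same object as `a`
get_client.cache_clear()  # e.g. in tests, to force a fresh client
c = get_client()          # a new instance after the cache was cleared
```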

5. Factory Pattern (Already exists in source.py)

class SourceFactory:
    @staticmethod
    def create(kind: str) -> DataSource: ...

6. Handler Registry Pattern (entrypoint/base.py)

Enables dependency inversion for mode handlers, avoiding circular imports:

from tradai.common.entrypoint.base import StrategyEntrypoint
from tradai.common.entrypoint.settings import TradingMode
from tradai.common.protocols import ModeHandler

# Services register handlers using decorator
@StrategyEntrypoint.register_handler(TradingMode.BACKTEST)
class BacktestHandler:
    def __init__(self, entrypoint: StrategyEntrypoint) -> None:
        self._ep = entrypoint

    def run(self) -> int:
        # Execute backtest logic
        return 0

# Handler is invoked via registry lookup (no direct import);
# `entrypoint` is the running StrategyEntrypoint instance
handler_factory = StrategyEntrypoint.get_handler(TradingMode.BACKTEST)
handler = handler_factory(entrypoint)
exit_code = handler.run()

Benefits:

  • Breaks circular dependencies (common → strategy-service)
  • Services own their handlers
  • Base class remains generic
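
How such a registry could work internally is sketched below. This is a simplified stand-in, not the code in entrypoint/base.py: the Entrypoint class, its _handlers dictionary, and the two-member TradingMode enum are assumptions made so the example is self-contained.

```python
from enum import Enum
from typing import Callable, Protocol


class TradingMode(Enum):
    BACKTEST = "backtest"
    LIVE = "live"


class ModeHandler(Protocol):
    def run(self) -> int: ...


class Entrypoint:
    # Maps each mode to a factory that builds a handler for an entrypoint.
    _handlers: dict[TradingMode, Callable[["Entrypoint"], ModeHandler]] = {}

    @classmethod
    def register_handler(cls, mode: TradingMode):
        def decorator(factory):
            cls._handlers[mode] = factory  # registration happens at import time
            return factory
        return decorator

    @classmethod
    def get_handler(cls, mode: TradingMode):
        try:
            return cls._handlers[mode]
        except KeyError:
            raise LookupError(f"no handler registered for {mode.name}") from None


@Entrypoint.register_handler(TradingMode.BACKTEST)
class BacktestHandler:
    def __init__(self, entrypoint: Entrypoint) -> None:
        self._ep = entrypoint

    def run(self) -> int:
        return 0


handler = Entrypoint.get_handler(TradingMode.BACKTEST)(Entrypoint())
exit_code = handler.run()
```

The base class never imports BacktestHandler; the service module that defines the handler triggers registration simply by being imported.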

Module Organization

tradai/common/
├── __init__.py              # Public API exports
├── exceptions.py            # Custom exception hierarchy
├── protocols.py             # Shared protocols (ModeHandler, etc.)
├── settings.py              # Settings base class (Pydantic)
├── settings_mixins.py       # Reusable settings mixins (ArcticSettingsMixin, MLflowSettingsMixin)
├── logger_mixin.py          # Logging mixin (composition pattern)
├── tracing.py               # OpenTelemetry tracing utilities
├── validators.py            # Shared field validators (S3 bucket names, etc.)
├── types.py                 # Type aliases
├── warmup_loader.py         # Model warmup loading
├── aws/                     # AWS integrations
│   ├── __init__.py          # Re-exports (get_dynamodb_resource, etc.)
│   ├── _client_factory.py   # Cached boto3 client/resource factory
│   ├── cloudwatch_logs.py   # CloudWatch log retrieval
│   ├── cloudwatch_metrics.py # CloudWatch metrics publishing
│   ├── dynamodb/            # DynamoDB adapters (sync & async)
│   ├── ecs_deploy.py        # ECS deployment helper
│   ├── s3_utils.py          # S3 path parsing, upload/download
│   ├── async_s3_utils.py    # Async S3 utilities
│   ├── state_repository.py  # DynamoDB state repository
│   ├── step_functions.py    # Step Functions client
│   └── constants.py         # AWS constants
├── clients/                 # External service clients
│   ├── data_collection_client.py  # Data collection REST client
│   └── mlflow_client.py          # MLflow REST client with circuit breaker
├── repositories/            # Generic repository patterns
│   ├── base.py              # DynamoDBRepository base class
│   ├── codecs.py            # DynamoDB codec system
│   ├── crud.py              # Generic CRUD repository
│   ├── deployment.py        # Deployment repository
│   ├── in_memory.py         # InMemoryStatefulRepository (A/B test, shadow test)
│   ├── pagination.py        # Paginated query support
│   ├── protocols.py         # Repository protocols
│   ├── query.py             # Query builder
│   └── trading_state.py     # Trading state repository
├── entities/                # Domain entities (Pydantic, frozen)
│   ├── __init__.py          # Barrel exports
│   ├── aws.py               # S3Path, AWSConfig, JobStatus
│   ├── backtest.py          # BacktestConfig, BacktestResult, BacktestJobStatus
│   ├── config_version.py    # ConfigVersion, ConfigVersionStatus
│   ├── deployment.py        # DeploymentRecord, DeploymentStatus, DeploymentDiff
│   ├── exchange.py          # ExchangeConfig, TradingMode, OperatingMode
│   ├── hyperopt.py          # OptunaHyperoptConfig, TrialResult, HyperparamSearchSpace
│   ├── identifiers.py       # Arn, ExperimentName, TableName
│   ├── mlflow.py            # ModelVersion, RegisteredModel, ExperimentRun
│   ├── pagination.py        # PaginatedResponse
│   ├── retraining.py        # RetrainingState, RetrainingDecision
│   ├── strategy.py          # Strategy
│   └── trading_state.py     # TradingState, StrategyPnL, LiveTrade
├── entrypoint/              # Strategy execution entrypoints
│   ├── base.py              # StrategyEntrypoint with handler registry
│   ├── settings.py          # TradingMode, EntrypointSettings
│   └── training/            # Training entrypoint (extracted from monolithic handler)
│       ├── __init__.py      # Public API
│       ├── handler.py       # TrainHandler (orchestrator only)
│       ├── manifest_builder.py  # Config manifest construction
│       ├── metadata.py          # Training metadata extraction
│       ├── mlflow_reporter.py   # MLflow experiment logging
│       ├── model_registrar.py   # Model registration
│       └── result_parser.py     # Backtest result parsing
├── resilience/              # Resilience patterns
│   ├── circuit_breaker.py   # CircuitBreaker with state machine
│   ├── execution.py         # Resilient execution helpers
│   ├── policy.py            # ResiliencePolicy (retry + CB)
│   └── retry.py             # Retry with exponential backoff
├── auth/                    # Authentication utilities
│   ├── __init__.py          # JWT validator, credential reloader
│   └── fastapi_deps.py      # Reusable FastAPI auth dependencies
├── ab_testing/              # A/B testing framework
│   ├── entities.py          # ABTestConfig, ABTestState, ABTestResult
│   ├── manager.py           # ABTestManager lifecycle
│   ├── repository.py        # DynamoDB + in-memory repositories
│   ├── statistical_tester.py # Hypothesis testing (lazy-loaded)
│   └── traffic_router.py    # Request routing to variants
├── alerting/                # SNS-based alert management
│   ├── entities.py          # Alert, AlertContext, AlertSeverity
│   ├── service.py           # AlertService (SNS publisher)
│   └── webhook.py           # SlackNotifier, DiscordNotifier
├── config/                  # Strategy configuration management
│   ├── catalog.py           # StrategyConfig schema
│   ├── loader.py            # StrategyConfigLoader (S3/MLflow)
│   ├── merge.py             # ConfigMergeService (OmegaConf)
│   ├── repository.py        # ConfigVersionRepository (DynamoDB)
│   ├── service.py           # ConfigVersionService
│   └── validators.py        # ConfigValidator (mode-specific)
├── drift/                   # Model drift detection
│   ├── detector.py          # DriftDetector (PSI-based, lazy-loaded)
│   └── entities.py          # DriftResult, DriftSeverity, DriftThresholds
├── fastapi/                 # FastAPI utilities
│   ├── app_factory.py       # create_app(), AppConfig, create_lifespan()
│   ├── dependencies.py      # validated_resource_name(), get_correlation_id()
│   ├── di_helpers.py        # Dependency injection helpers
│   ├── error_schemas.py     # ErrorResponse, ValidationErrorResponse
│   └── middleware.py        # RequestTracingMiddleware
├── features/                # Feature engineering
│   ├── lineage.py           # DataLineage tracking
│   └── schema.py            # FeatureSchema, compute_dataframe_hash()
├── freqtrade/               # Freqtrade integration
│   ├── cli.py               # FreqtradeCLIBuilder
│   ├── registry.py          # FreqAIModelRegistry
│   └── runner.py            # FreqtradeRunner, FreqtradeBacktester
├── health/                  # Health check service
│   ├── aws_checkers.py      # DynamoDBHealthChecker, S3HealthChecker, etc.
│   ├── base.py              # BaseHealthChecker
│   ├── checkers.py          # RedisHealthChecker, HTTPHealthChecker, etc.
│   ├── protocols.py         # HealthChecker protocol
│   ├── reporter.py          # HealthReporter (background heartbeat)
│   └── service.py           # HealthService (aggregator)
├── http_client/             # HTTP client with retries
│   └── client.py            # HttpClient, HttpClientConfig, HttpRetryConfig
├── lambda_/                 # Lambda handler framework
│   ├── backtest_processor.py # SQSJobProcessor, BatchResult
│   ├── context.py           # LambdaContext[T] DI container
│   ├── decorators.py        # @lambda_handler, @lambda_handler_with_di
│   ├── entities.py          # DriftState, HealthState, HeartbeatState
│   ├── response.py          # LambdaResponse builder
│   └── settings.py          # LambdaSettings + per-handler settings
├── mlflow/                  # MLflow integration
│   ├── adapter.py           # MLflowAdapter (REST + SDK)
│   ├── artifacts.py         # Artifact management
│   ├── backtest_logger.py   # BacktestMLflowLogger
│   ├── client.py            # MLflow REST client
│   ├── experiments.py       # Experiment management
│   ├── registry.py          # Model registry operations
│   ├── tags.py              # 40+ tag constants (TAG_STRATEGY_NAME, etc.)
│   └── utils.py             # URI conversion, parsing
├── model_comparison/        # Model comparison (statistical)
│   ├── comparator.py        # ModelComparator (champion/challenger)
│   └── entities.py          # ComparisonResult, PromotionDecision
├── validation/              # Deployment validation
│   └── entities.py          # DryRunValidationReport, GoLiveValidationReport
└── utils/                   # Utility functions
    ├── __init__.py          # Re-exports (git utils, etc.)
    ├── datetime.py          # Date/time conversion helpers
    ├── decimal.py           # Decimal precision utilities
    ├── git.py               # Git commit/branch/dirty checks
    ├── profit.py            # Profit calculation helpers
    └── seed.py              # Reproducible random seeding

Module Details

config/ — Strategy Configuration Management

Manages loading, merging, versioning, and validating strategy configurations across environments.

Key Classes:

  • StrategyConfigLoader - Loads strategy configs from S3 or MLflow, resolving by strategy name + version
  • ConfigMergeService - Merges multiple config layers using OmegaConf (base → environment → override)
  • ConfigVersionService - Content-addressable config versioning with hash-based deduplication
  • ConfigVersionRepository - DynamoDB persistence for config versions
  • ConfigValidator - Mode-specific validation (LIVE requires all fields; DRY_RUN allows partial configs)
  • StrategyConfig - Canonical Pydantic schema for strategy configuration

Usage:

from tradai.common.config import StrategyConfigLoader, ConfigMergeService
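
The "content-addressable versioning with hash-based deduplication" idea behind ConfigVersionService can be illustrated with the standard library alone. The sketch below is an assumption-laden stand-in: config_hash, VersionStore, the use of SHA-256, and canonical JSON serialization are choices made for this example, not confirmed details of the real service.

```python
import hashlib
import json
from typing import Any


def config_hash(config: dict[str, Any]) -> str:
    # Canonical form: sorted keys and fixed separators, so key order is irrelevant.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class VersionStore:
    """In-memory store keyed by content hash (hypothetical)."""

    def __init__(self) -> None:
        self._by_hash: dict[str, dict[str, Any]] = {}

    def put(self, config: dict[str, Any]) -> str:
        digest = config_hash(config)
        self._by_hash.setdefault(digest, config)  # identical content is stored once
        return digest

    def __len__(self) -> int:
        return len(self._by_hash)


store = VersionStore()
v1 = store.put({"name": "trend", "params": {"x": 1, "y": 2}})
v2 = store.put({"params": {"y": 2, "x": 1}, "name": "trend"})  # same content
v3 = store.put({"name": "trend", "params": {"x": 1, "y": 3}})  # changed value
```

Two configs that differ only in key order resolve to the same version identifier, while any change in a value yields a new one.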

drift/ — Model Drift Detection

Detects distribution shifts between training and production data using Population Stability Index (PSI).

Key Classes:

  • DriftDetector - PSI-based drift detection across feature distributions (lazy-loaded, requires numpy/scipy)
  • DriftResult - Overall drift detection result with severity and per-feature breakdown
  • DriftSeverity - Enum: NONE, MINOR, MODERATE, SEVERE
  • DriftThresholds - Configurable PSI thresholds for severity classification
  • FeatureDrift - Per-feature drift information with PSI value
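
For readers unfamiliar with PSI, the index over pre-binned proportions is the sum over bins of (actual - expected) * ln(actual / expected). The sketch below computes it in pure Python; the psi function name and the eps floor for empty bins are assumptions for this example, and the real DriftDetector (which uses numpy/scipy) may bin and smooth differently.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index over pre-binned proportions."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor avoids log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


training = [0.25, 0.25, 0.25, 0.25]
same = psi(training, [0.25, 0.25, 0.25, 0.25])      # identical distributions
shifted = psi(training, [0.10, 0.20, 0.30, 0.40])   # mass moved to upper bins
```

Every term is non-negative, so PSI is zero only when the two distributions match bin for bin and grows as they diverge.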

features/ — Feature Engineering

Provides feature schema validation and data lineage tracking for ML pipelines.

Key Classes:

  • FeatureSchema - Defines and validates expected feature columns, types, and ranges
  • DataLineage - Tracks data transformation provenance (source → transformations → output)
  • compute_dataframe_hash() - Deterministic hash of DataFrame contents for reproducibility checks

freqtrade/ — Freqtrade Integration

Wraps Freqtrade CLI and subprocess management for backtesting and model registry operations.

Key Classes:

  • FreqtradeCLIBuilder - Fluent builder for Freqtrade CLI commands (backtesting, hyperopt, etc.)
  • FreqtradeRunner - Subprocess management with stdout/stderr capture
  • FreqtradeBacktester - Specialized runner for backtest execution
  • FreqAIModelRegistry - Discovery and management of FreqAI model directories
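
The fluent-builder idea can be shown with a toy class that assembles an argument list. CLIBuilder and its method names are hypothetical and do not reflect the real FreqtradeCLIBuilder API; only the freqtrade flags used (--strategy, --timerange, --config) are taken from the Freqtrade CLI itself.

```python
class CLIBuilder:
    """Hypothetical fluent builder that produces an argv list."""

    def __init__(self, command: str) -> None:
        self._args: list[str] = ["freqtrade", command]

    def strategy(self, name: str) -> "CLIBuilder":
        self._args += ["--strategy", name]
        return self  # returning self is what makes the calls chainable

    def timerange(self, value: str) -> "CLIBuilder":
        self._args += ["--timerange", value]
        return self

    def config(self, path: str) -> "CLIBuilder":
        self._args += ["--config", path]
        return self

    def build(self) -> list[str]:
        return list(self._args)  # copy, so later builder calls do not mutate it


argv = (
    CLIBuilder("backtesting")
    .strategy("MyStrategy")
    .timerange("20240101-20240201")
    .config("config.json")
    .build()
)
```

Producing a list rather than a single string lets a runner pass it to subprocess.run without a shell, which sidesteps quoting problems.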

lambda_/ — Lambda Handler Framework

Dependency injection framework for AWS Lambda handlers with consistent setup, error handling, and response formatting.

Key Classes:

  • LambdaSettings - Pydantic BaseSettings for automatic env var loading per handler
  • LambdaContext[T] - Generic DI container providing typed settings, publishers, and repositories
  • @lambda_handler - Decorator for consistent Lambda setup, error handling, and response formatting
  • @lambda_handler_with_di - DI variant that injects LambdaContext into handler
  • SQSJobProcessor - Batch SQS message processing with partial failure support
  • LambdaResponse - Builder for consistent API Gateway-compatible response formatting

Per-handler settings: HealthCheckSettings, DriftMonitorSettings, PulumiDriftSettings, RetrainingSchedulerSettings, HeartbeatSettings, ModelManagementSettings, SQSTriggerSettings, DynamoDBSettings
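
A decorator of this kind might be structured roughly as below. The name handler_sketch is deliberately different from the library's @lambda_handler, and the error-to-status mapping (ValueError to 400, anything else to 500) is an assumption for illustration; only the statusCode/body response shape follows the API Gateway proxy convention.

```python
import functools
import json
from typing import Any, Callable

Handler = Callable[[dict[str, Any], Any], Any]


def handler_sketch(func: Handler) -> Handler:
    """Wraps a handler with uniform error handling and response formatting."""

    @functools.wraps(func)
    def wrapper(event: dict[str, Any], context: Any) -> dict[str, Any]:
        try:
            body, status = func(event, context), 200
        except ValueError as exc:  # treated here as a client error
            body, status = {"error": str(exc)}, 400
        except Exception:          # do not leak internals to the caller
            body, status = {"error": "internal error"}, 500
        return {"statusCode": status, "body": json.dumps(body)}

    return wrapper


@handler_sketch
def health(event: dict[str, Any], context: Any) -> dict[str, str]:
    if "service" not in event:
        raise ValueError("service is required")
    return {"service": event["service"], "status": "ok"}


ok = health({"service": "data-collection"}, None)
bad = health({}, None)
```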

health/ — Health Check Service

Aggregates health status from multiple dependencies with pre-built checkers and background heartbeat reporting.

Key Classes:

  • HealthService - Aggregates results from multiple HealthChecker instances
  • HealthResult - Overall health status with individual dependency check results
  • BaseHealthChecker - Abstract base for implementing custom health checks
  • HealthReporter - Background task that periodically reports health to DynamoDB

Pre-built Checkers:

  • AWS: DynamoDBHealthChecker, S3HealthChecker, SQSHealthChecker, SNSHealthChecker
  • Infrastructure: RedisHealthChecker, DatabaseHealthChecker, HTTPHealthChecker, MLflowHealthChecker
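
The aggregation step can be sketched as a function that runs every checker, records failures instead of propagating them, and reports overall health only when all checks pass. CheckResult and run_checks are hypothetical names, and modelling a checker as a zero-argument callable that raises on failure is an assumption; the real HealthChecker protocol may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CheckResult:
    name: str
    healthy: bool
    detail: str = ""


def run_checks(checks: dict[str, Callable[[], None]]) -> tuple[bool, list[CheckResult]]:
    results: list[CheckResult] = []
    for name, check in checks.items():
        try:
            check()  # a checker signals failure by raising
            results.append(CheckResult(name, True))
        except Exception as exc:
            # One failing dependency must not stop the remaining checks.
            results.append(CheckResult(name, False, str(exc)))
    return all(r.healthy for r in results), results


def redis_down() -> None:
    raise ConnectionError("connection refused")


healthy, results = run_checks({"s3": lambda: None, "redis": redis_down})
```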

http_client/ — HTTP Client

HTTP client with configurable retry and circuit breaker integration.

Key Classes:

  • HttpClient - Requests-based client with automatic retries, timeout, and header management
  • HttpClientConfig - Configuration for base URL, timeout, headers
  • HttpRetryConfig - Retry configuration (max attempts, backoff, retryable status codes)
  • HttpResponse - Response wrapper with status, body, and headers
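
Retry with exponential backoff, as configured through HttpRetryConfig, can be illustrated in a few lines. The retry function, its parameter names, and the choice of ConnectionError as the retryable failure are assumptions for this sketch; the sleep function is injected so the example runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # delay doubles each time
    raise AssertionError("unreachable")


attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays: list[float] = []
result = retry(flaky, sleep=delays.append)  # record delays instead of sleeping
```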

alerting/ — SNS-Based Alert Management

Multi-channel alerting infrastructure supporting SNS, Slack, and Discord notifications.

Key Classes:

  • AlertService - Publishes alerts to SNS topics with severity-based routing
  • Alert - Alert model with message, severity, context, and timestamp
  • AlertSeverity - Enum: INFO, WARNING, ERROR, CRITICAL
  • SlackNotifier - Sends formatted alerts to Slack via webhook
  • DiscordNotifier - Sends formatted alerts to Discord via webhook

fastapi/ — FastAPI Utilities

App factory, middleware, DI helpers, and standardized error responses for FastAPI services.

Key Classes:

  • create_app() - Factory function that builds FastAPI app with middleware, exception handlers, and health endpoints
  • AppConfig - Configuration for app title, version, CORS, middleware
  • RequestTracingMiddleware - Injects correlation IDs into request context
  • ErrorResponse - Standardized error response schema
  • get_correlation_id() - FastAPI dependency for extracting/generating correlation IDs
  • validated_resource_name() - DI dependency for validating path parameters

mlflow/ — MLflow Integration

Comprehensive MLflow adapter for experiment tracking, model registry, and artifact management.

Key Classes:

  • MLflowAdapter - Unified interface for MLflow REST API and Python SDK operations
  • BacktestMLflowLogger - Logs backtest results, metrics, and artifacts to MLflow experiments
  • MLflowURIConverter - Converts between MLflow artifact URIs and S3 paths
  • 40+ tag constants - Standardized tag keys (TAG_STRATEGY_NAME, TAG_GIT_COMMIT, TAG_SHARPE_RATIO, etc.) ensuring consistent metadata across all experiments

model_comparison/ — Model Comparison

Statistical comparison of champion vs challenger models with automated promotion decisions.

Key Classes:

  • ModelComparator - Compares two model versions across multiple metrics with statistical significance
  • ComparisonResult - Overall comparison result with per-metric breakdown and promotion recommendation
  • PromotionDecision - Enum: PROMOTE, KEEP_CHAMPION, INCONCLUSIVE
  • ModelCandidate - Model version metadata and performance metrics
  • MetricComparison - Per-metric comparison with direction (higher/lower is better)

validation/ — Deployment Validation

Validation reports for dry-run and go-live deployment gates.

Key Classes (Dry-Run):

  • DryRunValidationReport - Aggregates candle metrics, order metrics, position metrics, and heartbeat uptime
  • DryRunCheckResult - Individual check with severity (INFO, WARNING, CRITICAL)
  • HeartbeatMetrics, CandleMetrics, OrderMetrics, PositionMetrics - Domain-specific metric containers

Key Classes (Go-Live):

  • GoLiveValidationReport - Infrastructure readiness checks before live trading promotion
  • GoLiveCheckResult - Individual infrastructure check result
  • AlarmCheckResult, LambdaCheckResult, SNSCheckResult - AWS resource-specific checks

ab_testing/ — A/B Testing Framework

Manages canary-style A/B tests with traffic routing, statistical testing, and lifecycle management.

Key Classes:

  • ABTestManager - Orchestrates test lifecycle: create → route traffic → collect metrics → decide
  • ABTestConfig - Test configuration (variants, traffic split, duration, success criteria)
  • ABTestState - Current test state with metrics and stage
  • CanaryStage - Enum for progressive rollout stages: 5%, 25%, 50%, 100%
  • StatisticalTester - Hypothesis testing for A/B results (lazy-loaded, requires scipy)
  • TrafficRouter - Routes incoming requests to champion or challenger variant
  • ABTestRepository / DynamoDBABTestRepository - Persistence for test state
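
One common way to route a fixed share of traffic deterministically is to hash a request identifier into a bucket from 0 to 99 and compare it with the current stage percentage. The sketch below assumes that approach; the route function and the use of SHA-256 are illustrative choices, and the real TrafficRouter may route differently.

```python
import hashlib

CANARY_STAGES = (5, 25, 50, 100)  # percent of traffic, per the CanaryStage description


def route(request_id: str, challenger_percent: int) -> str:
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "challenger" if bucket < challenger_percent else "champion"


ids = [f"req-{i}" for i in range(10_000)]
share_at_25 = sum(route(i, 25) == "challenger" for i in ids) / len(ids)
```

Because the bucket depends only on the identifier, a request that reaches the challenger at 5% keeps reaching it at every later stage, which makes the rollout monotonic.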

Entity Types Catalog

The entities/ directory contains 12 entity files with ~50 Pydantic models:

| Entity File | Key Classes | Purpose |
|---|---|---|
| backtest.py | BacktestConfig, BacktestResult, BacktestJobStatus, BacktestJobMessage | Backtest lifecycle |
| deployment.py | DeploymentRecord, DeploymentStatus, DeploymentDiff | Deployment tracking |
| exchange.py | ExchangeConfig, TradingMode, OperatingMode | Exchange config |
| hyperopt.py | OptunaHyperoptConfig, TrialResult, HyperparamSearchSpace | Hyperparameter optimization |
| mlflow.py | ModelVersion, RegisteredModel, ExperimentRun | MLflow integration |
| retraining.py | RetrainingState, RetrainingDecision | Retraining workflows |
| trading_state.py | TradingState, StrategyPnL, LiveTrade | Live trading state |
| identifiers.py | Arn, ExperimentName, TableName | Type-safe AWS identifiers |
| aws.py | S3Path, AWSConfig, JobStatus | AWS infrastructure |
| config_version.py | ConfigVersion, ConfigVersionStatus | Config versioning |
| pagination.py | PaginatedResponse | Generic pagination |
| strategy.py | Strategy | Strategy entity |

Trading Mode Safety

LIVE Mode Fail-Fast (entrypoint/trading.py)

LIVE trading requires valid configuration - empty or invalid configs cause immediate failure:

def _load_strategy_config(self) -> dict[str, Any]:
    is_live = self._settings.trading_mode == TradingMode.LIVE
    try:
        config_dict = loader.load_config(...)
        self._validate_config_for_mode(config_dict)
        return config_dict
    except Exception as e:
        if is_live:
            # LIVE mode: fail immediately - never trade with empty config
            raise TradingError(f"Failed to load strategy config for LIVE trading: {e}") from e
        # DRY_RUN: warn and continue (for development/testing)
        self.logger.warning(f"Config load failed, using empty config: {e}")
        return {}

Unified StrategyConfig (strategy_config_loader.py)

Single canonical schema for strategy configuration:

from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class StrategyConfig(BaseModel):
    model_config = ConfigDict(frozen=True, extra="allow")

    name: str = Field(..., min_length=1)
    version: str = Field(..., pattern=r"^\d+\.\d+\.\d+$")  # e.g. "1.2.3"
    timeframe: str = Field(..., pattern=r"^\d+[mhd]$")  # e.g. "5m", "1h", "1d"
    pairs: list[str] = Field(default_factory=list)
    stake_currency: str = Field(default="USDT")
    exchange: str = Field(default="binance")
    parameters: dict[str, Any] = Field(default_factory=dict)
    buy_params: dict[str, Any] = Field(default_factory=dict)
    sell_params: dict[str, Any] = Field(default_factory=dict)
    freqai: dict[str, Any] | None = Field(default=None)

Services should import from tradai.common:

from tradai.common import StrategyConfig  # Canonical schema

Security Improvements

  1. No Hardcoded Credentials
     • All secrets from AWS Secrets Manager
     • Secret names from environment variables

  2. No Hardcoded Infrastructure IDs
     • All AWS ARNs, subnets, security groups from env vars
     • Configuration factory pattern

  3. Thread Safety
     • Thread locks for shared state
     • Immutable entities with Pydantic frozen=True

  4. Input Validation
     • S3 path validation
     • File path sanitization
     • Pydantic validation throughout

Code Quality Standards

  1. Type Hints: 100% coverage (strict mypy)
  2. Docstrings: Google style for all public APIs
  3. Test Coverage: 85%+ target
  4. Absolute Imports: Enforced via Ruff
  5. No Dead Code: Remove all test blocks from production files

Dependencies

Core:

  • pydantic>=2.0.0 (validation, immutability)
  • pydantic-settings>=2.0.0 (environment-based config)
  • boto3>=1.35.0 (AWS SDK)

Utilities:

  • redis>=5.0.0 (caching)
  • python-json-logger>=2.0.7 (structured logging)
  • PyYAML>=6.0.1 (config files)
  • requests>=2.32.0 (HTTP client)
  • omegaconf>=2.3.0 (Hydra compatibility)
  • mlflow-skinny>=2.18.0 (experiment tracking)
  • docker>=7.0.0 (ECS job manager)

Testing Strategy

  1. Unit Tests (fast, isolated):
     • Mock all external dependencies
     • Test business logic only
     • Use pytest-mock

  2. Integration Tests (slower, real services):
     • LocalStack for AWS services
     • Test actual AWS interactions

  3. Test Structure:

tests/
├── unit/
│   ├── test_exceptions.py
│   ├── test_base_settings.py
│   ├── aws/
│   │   ├── test_secrets_manager.py
│   │   └── test_s3_utils.py
│   └── utils/
│       └── test_cache.py
└── integration/
    └── aws/
        └── test_ecr_util.py

Migration from Original

Removed (Security/Quality)

  • ❌ secrets_maneger.py test code (lines 56-58)
  • ❌ mlflow_client.py test code (lines 530-550)
  • ❌ ecr_util.py test code (lines 519-560)
  • ❌ ecr_util.py hardcoded ARNs (lines 16-26)
  • ❌ Commented dead code in mlflow_client.py

Renamed (Correctness)

  • ✅ secrets_maneger.py → secrets_manager.py

Skipped (Complexity)

  • ⏭ scope.py (700+ lines, will redesign)
  • ⏭ log.py (50+ methods, use logger_mixin instead)
  • ⏭ strategy_config_mixin.py (move to services layer)

Added (Architecture)

  • ✅ exceptions.py (custom exception hierarchy)
  • ✅ repositories.py (repository ABCs)
  • ✅ aws/s3_utils.py (DRY S3 parsing)
  • ✅ types.py (type aliases)

Health & Risk (C3/H1)

Risk Controls (entities/risk_limits.py)

  • RiskLimits — platform-level risk limits (drawdown, open trades, leverage, action on breach)
  • validate_deployment_bounds() classmethod — shared pre-flight validation (CLI + backend)
  • RiskBreach / RiskCheckResult — structured breach reporting

Risk Monitor (health/risk_monitor.py)

  • Pure evaluator: receives metrics, returns RiskCheckResult (no I/O)
  • Fail-closed: tracks consecutive metric failures, triggers breach after threshold
  • Only accessed from single _heartbeat_loop thread — no lock needed
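
The "pure evaluator" and "fail-closed" properties can be illustrated with a small sketch. It is simplified on purpose: RiskMonitor here checks a single drawdown limit and returns a boolean rather than a RiskCheckResult, and the field names (max_drawdown_pct, max_metric_failures) are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RiskMonitor:
    max_drawdown_pct: float
    max_metric_failures: int = 3
    _failures: int = 0  # consecutive rounds in which metrics were unavailable

    def evaluate(self, drawdown_pct: Optional[float]) -> bool:
        """Return True when a breach should be raised. Performs no I/O."""
        if drawdown_pct is None:  # metrics could not be collected this round
            self._failures += 1
            # Fail closed: persistent blindness is treated as a breach.
            return self._failures >= self.max_metric_failures
        self._failures = 0  # a successful read resets the counter
        return drawdown_pct > self.max_drawdown_pct


monitor = RiskMonitor(max_drawdown_pct=10.0, max_metric_failures=2)
outcomes = [monitor.evaluate(v) for v in (4.0, None, None, 12.0, 3.0)]
```

The counter is plain mutable state without a lock, which matches the note above that the monitor is only touched from the single heartbeat thread.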

Metrics Collector (health/metrics_collector.py)

  • Transforms Freqtrade REST API responses → StrategyPnL + [LiveTrade]
  • Calls /profit and /status exactly once per collect_all() invocation
  • Handles ratio→percentage conversion (Freqtrade returns 0.0-1.0)

Integration (health/reporter.py)

  • HealthReporter._heartbeat_loop() orchestrates: collect → heartbeat → risk check → pause/resume
  • Metric snapshots persisted to DynamoDB via TradingStateRepository.update_metrics()
  • Risk breach actions: pause Freqtrade, update DynamoDB, send CRITICAL alert
  • _risk_breach_alerted flag prevents alert storms; recovery sends INFO alert

Success Criteria

  • [ ] All 4 security vulnerabilities fixed
  • [ ] 85%+ test coverage
  • [ ] 100% type hint coverage
  • [ ] No hardcoded credentials or infrastructure IDs
  • [ ] All public APIs documented
  • [ ] Thread-safe implementations
  • [ ] Absolute imports only