Configuration & Model Versioning
This page describes how strategy configurations and ML models are versioned, promoted, and managed across environments.
graph LR
subgraph Config["Strategy Config"]
C1["tradai.yaml"] --> C2["S3 Upload"]
C2 --> C3["DRAFT"]
C3 --> C4["ACTIVE"]
C4 --> C5["DEPRECATED"]
end
subgraph Model["ML Model"]
M1["Training"] --> M2["MLflow Registry"]
M2 --> M3["Unversioned"]
M3 --> M4["Staging"]
M4 --> M5["Production"]
M5 --> M6["Archived"]
end
C4 -.->|deployed together| M5

Strategy Configuration
tradai.yaml
Every strategy has a tradai.yaml at its root that defines how TradAI services interact with it:
strategy:
  name: "MyStrategy"
  version: "1.0.0"
  entry_point: "mystrategy.strategy:MyStrategy"
  category: "momentum"
  timeframe: "1h"
strategy_service:
  source:
    KIND: Binance
  adapter:
    KIND: AWS
    bucket_name: tradai-data
    library: ohlcv
  defaults:
    timerange: "20240101-20241201"
    symbols:
      - "BTC/USDT:USDT"
      - "ETH/USDT:USDT"
    stake_amount: 1000
    max_open_trades: 3
mlflow:
  tracking_uri: ${MLFLOW_TRACKING_URI:-http://localhost:5001}
  experiment_name: "strategies/mystrategy"
  auto_log_params: true
optimization:
  defaults:
    epochs: 100
    loss_function: sharpe
    spaces: [buy, sell]
  presets:
    quick:
      epochs: 50
      spaces: [buy]
    standard:
      epochs: 200
      spaces: [buy, sell]
    production:
      epochs: 1000
      spaces: [buy, sell, roi, stoploss, trailing]
      walk_forward: true
deployment:
  ecr:
    repository: "tradai/strategies/mystrategy"
  ecs:
    cpu: 512
    memory: 1024
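The `tracking_uri` value above uses the shell-style `${VAR:-default}` form. The page does not say which component expands it, so the following is only a minimal sketch of those semantics; `expand_placeholders` is a hypothetical helper, not part of TradAI.

```python
from __future__ import annotations

import os
import re

# Matches ${NAME} and ${NAME:-default}, mirroring shell parameter expansion.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_placeholders(value: str, env: dict[str, str] | None = None) -> str:
    """Expand ${VAR} / ${VAR:-default} in a config string (hypothetical helper)."""
    source = os.environ if env is None else env

    def _substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        current = source.get(name)
        if default is None:
            # Plain ${VAR}: the variable must exist.
            if current is None:
                raise KeyError(f"{name} is not set and has no default")
            return current
        # ${VAR:-default}: an unset or empty variable falls back to the default.
        return current if current else default

    return _PLACEHOLDER.sub(_substitute, value)
```

With `MLFLOW_TRACKING_URI` unset, the example value resolves to `http://localhost:5001`; with it set, the environment value is used instead.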
Config Storage
Configurations are stored in S3 and versioned in DynamoDB:
graph TD
YAML["tradai.yaml"] -->|upload| S3["S3 Bucket - tradai-configs"]
S3 -->|version| DDB["DynamoDB - config-versions"]
DDB -->|ACTIVE version| ECS["ECS Task"]
ENV[".env / Secrets Manager"] -->|merge| Merge["ConfigMergeService"]
S3 -->|base config| Merge
Merge --> ECS

| Component | Location | Purpose |
|---|---|---|
| `tradai.yaml` | Strategy repo root | Source of truth for strategy config |
| S3 bucket | tradai-configs-{env} | Persisted config versions |
| DynamoDB table | tradai-config-versions-{env} | Version registry with lifecycle tracking |
| Secrets Manager | AWS Secrets Manager | Exchange credentials, API keys |
Config Version Lifecycle
Each config version follows a strict lifecycle:
stateDiagram-v2
[*] --> DRAFT : create_version()
DRAFT --> ACTIVE : activate()
ACTIVE --> DEPRECATED : new version activated
DEPRECATED --> [*] : TTL auto-cleanup (90 days)
note right of DRAFT : Not yet validated
note right of ACTIVE : Currently deployed (one per strategy)
note right of DEPRECATED : Superseded, auto-deleted after 90d

Key rules:
- Only one ACTIVE version per strategy at any time
- Activating a new version automatically deprecates the previous one
- Deprecated versions have a 90-day TTL for auto-cleanup in DynamoDB
- Versions are content-addressable (SHA256 hash) -- duplicate configs are detected
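The rules above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the real `ConfigVersionService`: `InMemoryVersionStore`, `Version`, and the `expires_at` field are hypothetical names, and the real registry is backed by DynamoDB.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

DEPRECATED_TTL = timedelta(days=90)  # matches the 90-day cleanup window


@dataclass
class Version:
    config_id: str
    status: str = "DRAFT"
    superseded_by: str | None = None
    expires_at: datetime | None = None  # only set once DEPRECATED


class InMemoryVersionStore:
    """Toy stand-in for the version registry (illustration only)."""

    def __init__(self) -> None:
        self._versions: dict[str, dict[str, Version]] = {}

    def create_version(self, strategy: str, config_id: str) -> Version:
        version = Version(config_id)  # new versions always start as DRAFT
        self._versions.setdefault(strategy, {})[config_id] = version
        return version

    def activate(self, strategy: str, config_id: str, now: datetime | None = None) -> Version:
        now = now or datetime.now(timezone.utc)
        versions = self._versions[strategy]
        target = versions[config_id]
        if target.status != "DRAFT":
            raise ValueError(f"only DRAFT versions can be activated, got {target.status}")
        # Deprecate the current ACTIVE version so at most one stays ACTIVE.
        for other in versions.values():
            if other.status == "ACTIVE":
                other.status = "DEPRECATED"
                other.superseded_by = config_id
                other.expires_at = now + DEPRECATED_TTL
        target.status = "ACTIVE"
        return target

    def get_active(self, strategy: str) -> Version | None:
        active = [v for v in self._versions.get(strategy, {}).values() if v.status == "ACTIVE"]
        return active[0] if active else None
```

Activating a second version leaves exactly one ACTIVE entry and stamps the previous one with its successor and an expiry 90 days out.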
Config Version Entity
Each version is tracked with these fields:
| Field | Type | Description |
|---|---|---|
| `strategy_name` | string | Partition key (e.g., "PascalStrategy") |
| `config_id` | string | Sort key: `v{version}-{hash[:8]}` |
| `config_hash` | string | SHA256 of normalized config content |
| `config_data` | dict | Frozen config content |
| `status` | enum | DRAFT, ACTIVE, or DEPRECATED |
| `version_number` | int | Sequential version per strategy |
| `created_at` | datetime | When created |
| `deployed_at` | datetime | When activated (null if DRAFT) |
| `superseded_by` | string | `config_id` of newer version (if deprecated) |
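As an illustration of how a content-addressable `config_id` of the form `v{version}-{hash[:8]}` could be derived, here is a minimal sketch. The normalization step (JSON with sorted keys and compact separators) is an assumption; the page only says the hash is taken over "normalized config content".

```python
import hashlib
import json


def config_hash(config_data: dict) -> str:
    """SHA256 over a canonical JSON form (the normalization is an assumption)."""
    normalized = json.dumps(config_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def config_id(version_number: int, config_data: dict) -> str:
    """Build the sort key: version number plus the first 8 hex chars of the hash."""
    return f"v{version_number}-{config_hash(config_data)[:8]}"
```

Because keys are sorted before hashing, two configs that differ only in key order produce the same hash, which is what makes duplicate detection possible.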
Managing Config Versions
# Upload and create a new config version
tradai strategy stage MyStrategy --version 1
# View strategy config
tradai strategy list
# The config service handles versioning programmatically:
from tradai.common.config.service import ConfigVersionService
service = ConfigVersionService(table_name="tradai-config-versions-dev")
# Create a new version (starts as DRAFT)
version = service.create_version(
    strategy_name="MyStrategy",
    config_data={"timeframe": "1h", "symbols": ["BTC/USDT:USDT"]},
    description="Updated symbols list",
)
# Activate it (auto-deprecates previous ACTIVE version)
active = service.activate("MyStrategy", version.config_id)
# List all versions for a strategy
versions = service.list_versions("MyStrategy")
# Get the currently active version
current = service.get_active("MyStrategy")
Config Loading at Runtime
When a strategy container starts, configs are loaded and merged from multiple sources:
graph TD
S3["S3 Config - (base)"] --> Loader["StrategyConfigLoader"]
MLflow["MLflow Tags - (model params)"] --> Loader
ENV["Environment Vars - (overrides)"] --> Loader
Loader --> Merge["ConfigMergeService"]
Merge --> Validate["Validation"]
Validate -->|pass| Config["StrategyConfig - (runtime)"]
Validate -->|fail| Error["Startup Error"]

Priority order (highest wins):
- Environment variables
- MLflow model tags
- S3 stored config
- tradai.yaml defaults
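The priority order above amounts to a layered merge in which later layers overwrite earlier ones. The sketch below is not the actual `ConfigMergeService`; `deep_merge` and `resolve_config` are hypothetical helpers, and the choice to merge nested dicts recursively while replacing lists and scalars wholesale is an assumption.

```python
from __future__ import annotations

from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values from `override` win over `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # scalars and lists are replaced wholesale
    return merged


def resolve_config(
    yaml_defaults: dict[str, Any],
    s3_config: dict[str, Any],
    mlflow_tags: dict[str, Any],
    env_overrides: dict[str, Any],
) -> dict[str, Any]:
    """Apply layers from lowest to highest priority."""
    result: dict[str, Any] = {}
    for layer in (yaml_defaults, s3_config, mlflow_tags, env_overrides):
        result = deep_merge(result, layer)
    return result
```

Applying the layers lowest-first keeps the rule easy to audit: whatever is written last is what the container sees.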
Model Versioning (MLflow)
Model Lifecycle
Models are tracked in the MLflow Model Registry with four stages:
stateDiagram-v2
[*] --> None : register()
None --> Staging : stage()
Staging --> Production : promote()
Production --> Archived : new model promoted
Archived --> Production : rollback()
note right of None : Newly registered, not yet validated
note right of Staging : Under validation, dry-run testing
note right of Production : Live trading model (one per strategy)
note right of Archived : Previous production, available for rollback

| Stage | Description | Who sets it |
|---|---|---|
| None | Freshly registered, not yet reviewed | ModelRegistrar (automatic after training) |
| Staging | Under validation, dry-run testing | tradai strategy set-version or API |
| Production | Active production model | Promotion after validation passes |
| Archived | Previous production model, kept for rollback | Auto-archived when new model is promoted |
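The transitions in the state diagram can be written down as a small lookup table. This is a sketch of the policy as drawn on this page, with hypothetical helper names; it does not call MLflow.

```python
from __future__ import annotations

# Allowed stage transitions, copied from the state diagram.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "None": {"Staging"},
    "Staging": {"Production"},
    "Production": {"Archived"},
    "Archived": {"Production"},  # rollback
}


def can_transition(current: str, target: str) -> bool:
    return target in ALLOWED_TRANSITIONS.get(current, set())


def promote(stages: dict[int, str], version: int) -> dict[int, str]:
    """Move `version` to Production and archive the previous Production model."""
    if not can_transition(stages[version], "Production"):
        raise ValueError(f"cannot promote version {version} from {stages[version]}")
    updated = {v: ("Archived" if s == "Production" else s) for v, s in stages.items()}
    updated[version] = "Production"
    return updated
```

The same `promote` helper covers rollback, since an Archived version is allowed to return to Production.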
Model Registration
After a backtest or training run, models are automatically registered:
sequenceDiagram
participant ECS as ECS Task - (Freqtrade)
participant MLflow as MLflow - Registry
participant DDB as DynamoDB - State
ECS->>MLflow: Log metrics + params
ECS->>MLflow: Log model artifacts
ECS->>MLflow: Register model version
MLflow-->>ECS: Version number
ECS->>DDB: Update job status
ECS->>MLflow: Tag with git_commit, strategy_name

The ModelRegistrar handles this automatically:
from tradai.common.entrypoint.training.model_registrar import ModelRegistrar
registrar = ModelRegistrar(mlflow_adapter=adapter)
result = registrar.register(config=training_config, result=training_result)
# result.model_version is now set
Model Comparison
Before promoting a new model, the ModelComparator compares it against the current champion:
graph TD
Challenger["Challenger Model - (new version)"] --> Compare["ModelComparator"]
Champion["Champion Model - (current Production)"] --> Compare
Compare --> Decision{"Better?"}
Decision -->|"yes"| Promote["Promote to Production"]
Decision -->|"no"| Keep["Keep current champion"]
Decision -->|"insufficient data"| Manual["Manual review needed"]

Key comparison metrics:
| Metric | Weight | Required | Threshold |
|---|---|---|---|
| `total_profit` | High | Yes | Must be positive |
| `sharpe_ratio` | High | Yes | >= 0.5 |
| `max_drawdown` | Medium | Yes | <= 30% |
| `win_rate` | Medium | No | >= 40% |
| `total_trades` | Low | Yes | >= 10 |
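The thresholds in the table can be expressed as a simple gate that a challenger must clear before any champion comparison. This sketch is not the `ModelComparator` implementation: it ignores the weights column, treats a missing required metric as a failure, and assumes `max_drawdown` and `win_rate` are stored as fractions (0.30 for 30%).

```python
from __future__ import annotations

from typing import Callable, NamedTuple


class Rule(NamedTuple):
    metric: str
    passes: Callable[[float], bool]
    required: bool


# Thresholds copied from the table; percentages are assumed to be fractions.
RULES = [
    Rule("total_profit", lambda v: v > 0, True),
    Rule("sharpe_ratio", lambda v: v >= 0.5, True),
    Rule("max_drawdown", lambda v: v <= 0.30, True),
    Rule("win_rate", lambda v: v >= 0.40, False),
    Rule("total_trades", lambda v: v >= 10, True),
]


def failed_required_metrics(metrics: dict[str, float]) -> list[str]:
    """Return the required metrics that are missing or below threshold."""
    failed = []
    for rule in RULES:
        value = metrics.get(rule.metric)
        ok = value is not None and rule.passes(value)
        if not ok and rule.required:
            failed.append(rule.metric)
    return failed
```

An empty result means the challenger meets every required threshold; a low `win_rate` alone does not block it because that metric is not required.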
CLI Commands
# Stage a model version for validation
tradai strategy set-version MyStrategy 3 --stage Staging
# Promote to production (archives current champion)
tradai strategy set-version MyStrategy 3 --stage Production
# Promote without archiving previous version
tradai strategy set-version MyStrategy 3 --no-archive
# Preview promotion (dry run)
tradai strategy set-version MyStrategy 3 --dry-run
# Rollback to previous deployment
tradai deploy rollback MyStrategy --env dev
# Rollback to a specific deployment
tradai deploy rollback MyStrategy --env dev --deployment deploy-abc123
API Endpoints
The Strategy Service exposes model management APIs:
| Method | Path | Description |
|---|---|---|
| GET | `/api/v1/strategies/{name}/models` | List model versions |
| GET | `/api/v1/strategies/{name}/models/rollback-candidates` | List rollback candidates |
| POST | `/api/v1/strategies/{name}/models/stage` | Stage a model version |
| POST | `/api/v1/strategies/{name}/models/promote` | Promote to Production |
| POST | `/api/v1/strategies/{name}/models/rollback` | Rollback model |
Automated Retraining Pipeline
The retraining workflow is orchestrated by Step Functions:
graph TD
Trigger["Trigger - Schedule / Drift / Manual"] --> Check["Check Retraining - Needed?"]
Check -->|"yes"| Train["Train Model - (ECS + FreqAI)"]
Check -->|"no"| Skip["Skip"]
Train --> Compare["Compare Models - (Lambda)"]
Compare -->|"better"| Promote["Promote Model - (Lambda)"]
Compare -->|"worse"| Keep["Keep Champion"]
Promote --> Notify["Notify - (SNS + Slack)"]
Keep --> Notify

Lambda functions involved:
| Lambda | Role |
|---|---|
| `check-retraining-needed` | Evaluates drift scores and schedules |
| `compare-models` | Champion vs challenger comparison |
| `promote-model` | Transitions model to Production stage |
| `model-rollback` | Reverts to previous model version |
| `drift-monitor` | Monitors PSI for model/data drift |
| `retraining-scheduler` | Triggers retraining on schedule |
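The `drift-monitor` row mentions PSI (Population Stability Index). For readers unfamiliar with it, the following is a generic sketch of the standard formula, the sum over bins of (actual - expected) * ln(actual / expected) on bin proportions. The equal-width binning, bin count, and `eps` floor are assumptions and may differ from what the Lambda does.

```python
from __future__ import annotations

import math
from bisect import bisect_right


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / bins
    # Interior bin edges; values outside [lo, hi] fall into the first or last bin.
    edges = [lo + width * i for i in range(1, bins)]

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Identical samples give a PSI of zero, and every bin contributes a non-negative term, so the value only grows as the two distributions diverge.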
Environment-Specific Configuration
Settings Hierarchy
Each service uses Pydantic settings with environment variable prefixes:
| Service | Prefix | Key Settings |
|---|---|---|
| Backend | BACKEND_ | BACKEND_EXECUTOR_MODE, BACKEND_BACKTEST_QUEUE_URL |
| Strategy Service | STRATEGY_SERVICE_ | STRATEGY_SERVICE_MLFLOW_TRACKING_URI, STRATEGY_SERVICE_STRATEGY_PATH |
| Data Collection | DATA_COLLECTION_ | DATA_COLLECTION_EXCHANGES, DATA_COLLECTION_ARCTIC_S3_BUCKET |
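To show what prefix-based settings mean in practice, here is a standard-library sketch that collects the variables for one service. It is a swapped-in illustration, not the services' actual Pydantic classes (pydantic-settings offers an `env_prefix` option for the same purpose); `load_prefixed_settings` is a hypothetical helper.

```python
from __future__ import annotations

import os


def load_prefixed_settings(prefix: str, env: dict[str, str] | None = None) -> dict[str, str]:
    """Collect variables starting with `prefix`, keyed by the lowercased remainder."""
    source = os.environ if env is None else env
    return {
        key[len(prefix):].lower(): value
        for key, value in source.items()
        if key.startswith(prefix)
    }
```

For example, with `BACKEND_EXECUTOR_MODE=sqs` in the environment, `load_prefixed_settings("BACKEND_")` yields an `executor_mode` entry and ignores variables belonging to other services.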
Settings Mixins
Common settings are shared via mixins:
# MLflow settings (shared by Strategy Service and Backend)
class MLflowSettingsMixin:
    mlflow_tracking_uri: str  # MLFLOW_TRACKING_URI
    mlflow_username: str      # MLFLOW_USERNAME
    mlflow_password: str      # MLFLOW_PASSWORD

# ArcticDB settings (shared by Data Collection and Strategy Service)
class ArcticSettingsMixin:
    arctic_s3_bucket: str      # ARCTIC_S3_BUCKET
    arctic_library_name: str   # ARCTIC_LIBRARY_NAME (default: "ohlcv")
    arctic_s3_endpoint: str    # ARCTIC_S3_ENDPOINT
Per-Environment Differences
| Setting | Dev | Staging | Prod |
|---|---|---|---|
| Executor mode | local | sqs | stepfunctions |
| RDS instance | db.t4g.micro | db.t4g.micro | db.t4g.small |
| ECS launch type | EC2 (consolidated) | EC2 (consolidated) | Fargate |
| Log retention | 30 days | 30 days | 90 days |
| Deletion protection | Off | Off | On |
| MLflow URL | http://localhost:5001 | Service Discovery | Service Discovery |
Traceability
Every operation carries a set of identifiers that let it be followed across services:
graph LR
TraceID["trace_id"] --> Backend["Backend API"]
TraceID --> SQS["SQS Message"]
TraceID --> SF["Step Functions"]
TraceID --> ECS["ECS Task"]
TraceID --> MLflow["MLflow Run"]
TraceID --> DDB["DynamoDB - job record"]
JobID["job_id"] --> DDB
JobID --> S3["S3 Results"]
RunID["mlflow_run_id"] --> MLflow
RunID --> DDB
GitSHA["git_commit"] --> MLflow
GitSHA --> DDB

| Field | Description | Where stored |
|---|---|---|
| `trace_id` | End-to-end correlation ID | DynamoDB, Step Functions input, ECS env |
| `job_id` | DynamoDB job identifier | DynamoDB, S3 result paths |
| `mlflow_run_id` | MLflow experiment run | DynamoDB, BacktestResult, MLflow |
| `git_commit` | Code version SHA | BacktestResult, MLflow tags |
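One way to keep these four identifiers together as a job moves between services is a small immutable context object. This is a sketch only: `TraceContext` and the upper-cased environment variable names it emits are assumptions, not names taken from the TradAI codebase.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TraceContext:
    """Bundle of correlation identifiers (hypothetical helper)."""

    trace_id: str
    job_id: str
    mlflow_run_id: str | None = None
    git_commit: str | None = None

    def as_env(self) -> dict[str, str]:
        """Environment variables for a downstream task; unset fields are omitted."""
        return {key.upper(): value for key, value in asdict(self).items() if value is not None}
```

Passing the result of `as_env()` to a task launch and writing the same fields to the job record would let a single `trace_id` link the API request, the task, and its stored results.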
Quick Reference
Config Commands
| Command | Description |
|---|---|
| `tradai strategy list` | List registered strategies |
| `tradai strategy stage NAME --version V` | Stage a strategy version |
| `tradai strategy set-version NAME V` | Set model version stage |
| `tradai strategy set-version NAME V --stage Production` | Promote model |
| `tradai strategy set-version NAME V --dry-run` | Preview promotion |
| `tradai deploy strategy ./path --env dev` | Deploy strategy to ECS |
| `tradai deploy rollback NAME --env dev` | Rollback deployment |
Key Source Files
| Component | Path |
|---|---|
| ConfigVersion entity | libs/tradai-common/src/tradai/common/entities/config_version.py |
| ConfigVersionService | libs/tradai-common/src/tradai/common/config/service.py |
| S3ConfigRepository | libs/tradai-common/src/tradai/common/aws/s3_config_repository.py |
| ConfigMergeService | libs/tradai-common/src/tradai/common/config/merge.py |
| StrategyConfigLoader | libs/tradai-common/src/tradai/common/config/loader.py |
| ModelStage enum | libs/tradai-common/src/tradai/common/entities/mlflow.py |
| MLflowAdapter | libs/tradai-common/src/tradai/common/mlflow/adapter.py |
| ModelComparator | libs/tradai-common/src/tradai/common/model_comparison/comparator.py |
| ModelRegistrar | libs/tradai-common/src/tradai/common/entrypoint/training/model_registrar.py |
| Promotion routes | services/strategy-service/src/tradai/strategy_service/api/promotion_routes.py |
| Config routes | services/strategy-service/src/tradai/strategy_service/api/config_routes.py |
See Also
- Strategy Lifecycle -- full path from idea to production
- ML Lifecycle Architecture -- detailed ML pipeline design
- Configuration Management Architecture -- system-level config design
- State Machines -- all state machine definitions