retraining-scheduler¶
Schedules and triggers model retraining based on drift detection, scheduled intervals, or manual requests.
Overview¶
| Property | Value |
|---|---|
| Trigger | EventBridge / Manual |
| Runtime | Python 3.11 |
| Timeout | 300 seconds |
| Memory | 512 MB |
Input Schema¶
{
"models": [ # Optional, uses DEFAULT_MODELS if not provided
{
"name": "PascalStrategy",
"strategy": "PascalFreqAIStrategy",
"freqai_model": "LightGBMRegressor",
"train_period_days": 30,
"pairs": ["BTC/USDT:USDT", "ETH/USDT:USDT"],
"timeframe": "1h"
}
],
"force": false, # Force retraining regardless of state
"trigger": "manual" # Optional: override trigger type
}
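As an illustration of the optional fields above, here is a minimal sketch of how a handler might normalise the incoming event. The `parse_event` helper is hypothetical, and the `DEFAULT_MODELS` contents shown here are a placeholder, since the real default list is not documented on this page.

```python
from typing import Any, Optional

# Placeholder only: the actual DEFAULT_MODELS contents are not shown in this page.
DEFAULT_MODELS: list[dict[str, Any]] = [
    {"name": "PascalStrategy", "strategy": "PascalFreqAIStrategy"},
]


def parse_event(event: dict[str, Any]) -> dict[str, Any]:
    """Apply the documented defaults to a raw scheduler event (hypothetical helper)."""
    trigger: Optional[str] = event.get("trigger")  # None: no override requested
    return {
        # "models" is optional; fall back to DEFAULT_MODELS when absent or empty.
        "models": event.get("models") or DEFAULT_MODELS,
        # "force" is shown as false in the schema, so treat a missing key as False.
        "force": bool(event.get("force", False)),
        "trigger": trigger,
    }


parsed = parse_event({"force": True})
```

With only `force` supplied, `parsed["models"]` falls back to the placeholder defaults and `parsed["trigger"]` stays `None`.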
Output Schema¶
{
"success": true,
"data": {
"summary": {
"models_evaluated": 2,
"retraining_triggered": 1,
"skipped": 1,
"errors": 0
},
"results": [
{
"model_name": "PascalStrategy",
"status": "triggered",
"trigger": "drift_detected",
"task_arn": "arn:aws:ecs:...:task/abc123",
"retraining_triggered": true,
"timestamp": "2024-01-01T12:00:00Z"
},
{
"model_name": "RadStrategy",
"status": "skipped",
"reason": "recently_retrained",
"hours_since_last": 12.5,
"retraining_triggered": false,
"timestamp": "2024-01-01T12:00:00Z"
}
]
}
}
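The `summary` block appears to be a tally of the per-model `results`. A minimal sketch of that aggregation follows, assuming a hypothetical `summarize` helper and assuming failed evaluations carry `"status": "error"` (that status value is not shown in the example above).

```python
from collections import Counter
from typing import Any


def summarize(results: list[dict[str, Any]]) -> dict[str, int]:
    """Count per-model results by status (hypothetical helper)."""
    statuses = Counter(r["status"] for r in results)
    return {
        "models_evaluated": len(results),
        "retraining_triggered": statuses["triggered"],
        "skipped": statuses["skipped"],
        # Assumption: failures are reported with status "error".
        "errors": statuses["error"],
    }


example_results = [
    {"model_name": "PascalStrategy", "status": "triggered"},
    {"model_name": "RadStrategy", "status": "skipped"},
]
summary = summarize(example_results)
```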
Environment Variables¶
| Variable | Required | Default | Description |
|---|---|---|---|
| `ECS_CLUSTER` | Yes | - | ECS cluster name/ARN |
| `ECS_SUBNETS` | Yes | - | Comma-separated subnet IDs |
| `ECS_SECURITY_GROUPS` | Yes | - | Comma-separated SG IDs |
| `ECS_TASK_DEFINITION_PREFIX` | No | "tradai-" | Task definition prefix |
| `ECS_CONTAINER_NAME` | No | "strategy" | Container name for overrides |
| `USE_SPOT` | No | false | Use Fargate Spot instances |
| `RETRAINING_STATE_TABLE` | Yes | - | DynamoDB state table |
| `DRIFT_STATE_TABLE` | Yes | - | Drift detection state table |
| `MIN_HOURS_BETWEEN_RETRAINING` | No | 24 | Minimum cooldown between retraining runs (hours) |
| `RETRAINING_INTERVAL_DAYS` | No | 7 | Scheduled retraining interval (days) |
| `MLFLOW_TRACKING_URI` | No | - | MLflow server URL |
| `ALERT_SNS_TOPIC_ARN` | Yes | - | SNS topic ARN |
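A minimal sketch of loading these variables with the documented defaults is shown below. The `SchedulerConfig` class and `load_config` function are hypothetical names, and the set of values accepted as "true" for `USE_SPOT` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SchedulerConfig:
    """Hypothetical container for the variables in the table above."""
    ecs_cluster: str
    ecs_subnets: list[str]
    ecs_security_groups: list[str]
    retraining_state_table: str
    drift_state_table: str
    alert_sns_topic_arn: str
    task_definition_prefix: str = "tradai-"
    container_name: str = "strategy"
    use_spot: bool = False
    min_hours_between_retraining: float = 24.0
    retraining_interval_days: float = 7.0
    mlflow_tracking_uri: Optional[str] = None


def _split(value: str) -> list[str]:
    """Split a comma-separated value, dropping blanks and whitespace."""
    return [part.strip() for part in value.split(",") if part.strip()]


def load_config(env: Mapping[str, str] = os.environ) -> SchedulerConfig:
    # Required variables use env[...] so a missing one raises KeyError early.
    return SchedulerConfig(
        ecs_cluster=env["ECS_CLUSTER"],
        ecs_subnets=_split(env["ECS_SUBNETS"]),
        ecs_security_groups=_split(env["ECS_SECURITY_GROUPS"]),
        retraining_state_table=env["RETRAINING_STATE_TABLE"],
        drift_state_table=env["DRIFT_STATE_TABLE"],
        alert_sns_topic_arn=env["ALERT_SNS_TOPIC_ARN"],
        task_definition_prefix=env.get("ECS_TASK_DEFINITION_PREFIX", "tradai-"),
        container_name=env.get("ECS_CONTAINER_NAME", "strategy"),
        # Assumption: "1", "true" and "yes" (any case) enable Spot.
        use_spot=env.get("USE_SPOT", "false").lower() in ("1", "true", "yes"),
        min_hours_between_retraining=float(env.get("MIN_HOURS_BETWEEN_RETRAINING", "24")),
        retraining_interval_days=float(env.get("RETRAINING_INTERVAL_DAYS", "7")),
        mlflow_tracking_uri=env.get("MLFLOW_TRACKING_URI"),
    )


# Example with made-up values; only required variables are set.
example_env = {
    "ECS_CLUSTER": "example-cluster",
    "ECS_SUBNETS": "subnet-a, subnet-b",
    "ECS_SECURITY_GROUPS": "sg-1",
    "RETRAINING_STATE_TABLE": "example-retraining-state",
    "DRIFT_STATE_TABLE": "example-drift-state",
    "ALERT_SNS_TOPIC_ARN": "arn:aws:sns:...:example-topic",
}
config = load_config(example_env)
```

Passing the environment as a mapping keeps the loader easy to exercise in tests without touching `os.environ`.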
Trigger Types¶
| Trigger | Description | Priority |
|---|---|---|
| `drift_detected` | Significant PSI drift detected | High |
| `scheduled` | Periodic retraining interval reached | Medium |
| `manual` | Explicit user request | Highest |
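The triggers are evaluated in the order given in the Retraining Decision Flow section: force flag, cooldown, drift, then schedule. Below is a minimal sketch of that ordering. The `ModelState` class and `decide` function are hypothetical, the `no_trigger` reason string is an assumption, and treating a never-retrained model as due for scheduled retraining is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelState:
    """Hypothetical per-model state read from the state tables."""
    hours_since_last: Optional[float]  # None if the model was never retrained
    drift_detected: bool


def decide(
    state: ModelState,
    force: bool = False,
    min_hours: float = 24.0,      # MIN_HOURS_BETWEEN_RETRAINING default
    interval_days: float = 7.0,   # RETRAINING_INTERVAL_DAYS default
) -> tuple[str, str]:
    """Return (status, trigger_or_reason) following the documented flow."""
    if force:
        return ("triggered", "manual")
    since = state.hours_since_last
    if since is not None and since < min_hours:
        return ("skipped", "recently_retrained")
    if state.drift_detected:
        return ("triggered", "drift_detected")
    # Assumption: a model with no retraining history counts as due.
    if since is None or since >= interval_days * 24:
        return ("triggered", "scheduled")
    return ("skipped", "no_trigger")  # reason string is an assumption


decision = decide(ModelState(hours_since_last=12.5, drift_detected=True))
```

Note that in this ordering the cooldown check comes before the drift check, so a model retrained 12.5 hours ago is skipped even when drift is reported, unless `force` is set.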
Retraining Decision Flow¶
flowchart TD
A[Evaluate Model] --> B{Force flag?}
B -->|Yes| C[Trigger: manual]
B -->|No| D{Recently retrained?}
D -->|Yes| E[Skip: cooldown]
D -->|No| F{Drift detected?}
F -->|Yes| G[Trigger: drift_detected]
F -->|No| H{Scheduled interval due?}
H -->|Yes| I[Trigger: scheduled]
H -->|No| J[Skip: no trigger]
C --> K[Launch ECS Task]
G --> K
I --> K
K --> L[Update State]
L --> M[Send Notification]
CloudWatch Metrics¶
| Metric | Description |
|---|---|
| `RetrainingTriggered` | Count of triggered retraining jobs |
| `RetrainingTrigger_drift_detected` | Drift-triggered retraining count |
| `RetrainingTrigger_scheduled` | Scheduled retraining count |
| `RetrainingTrigger_manual` | Manual retraining count |
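The per-trigger metric names follow the pattern `RetrainingTrigger_<trigger>`. A minimal sketch of building the corresponding metric entries is shown below, assuming a hypothetical `build_metric_data` helper and assuming both the total and the per-trigger metric are incremented by one for each launched job. The entries use the shape accepted by the `MetricData` parameter of CloudWatch `put_metric_data`; the namespace is not documented here and is left to the caller.

```python
from typing import Any


def build_metric_data(trigger: str) -> list[dict[str, Any]]:
    """Build CloudWatch metric entries for one triggered retraining job."""
    return [
        # Overall count of triggered retraining jobs.
        {"MetricName": "RetrainingTriggered", "Value": 1, "Unit": "Count"},
        # Per-trigger count, e.g. RetrainingTrigger_drift_detected.
        {"MetricName": f"RetrainingTrigger_{trigger}", "Value": 1, "Unit": "Count"},
    ]


metric_data = build_metric_data("drift_detected")
```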
ECS Task Configuration¶
The Lambda launches Fargate tasks with:
- Environment variables: TRADING_MODE=train, STRATEGY, MODEL_NAME, etc.
- Capacity provider: FARGATE_SPOT if enabled, otherwise FARGATE
- Container command: not overridden, uses ENTRYPOINT from image
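The configuration above maps onto the request parameters of the ECS `run_task` API. The following is a minimal sketch that only builds the keyword arguments, without calling AWS. The `build_run_task_kwargs` helper is hypothetical, and the task definition name in the example is made up, since this page does not say how `ECS_TASK_DEFINITION_PREFIX` is combined with the model name.

```python
from typing import Any


def build_run_task_kwargs(
    cluster: str,
    task_definition: str,
    container_name: str,
    subnets: list[str],
    security_groups: list[str],
    use_spot: bool,
    environment: dict[str, str],
) -> dict[str, Any]:
    """Build keyword arguments for boto3's ecs_client.run_task(**kwargs)."""
    provider = "FARGATE_SPOT" if use_spot else "FARGATE"
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "capacityProviderStrategy": [{"capacityProvider": provider, "weight": 1}],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    "environment": [
                        {"name": key, "value": value}
                        for key, value in environment.items()
                    ],
                    # No "command" key: the image ENTRYPOINT is used as-is.
                }
            ]
        },
    }


run_task_kwargs = build_run_task_kwargs(
    cluster="example-cluster",
    task_definition="tradai-example",  # made-up name, see note above
    container_name="strategy",
    subnets=["subnet-a"],
    security_groups=["sg-1"],
    use_spot=True,
    environment={
        "TRADING_MODE": "train",
        "STRATEGY": "PascalFreqAIStrategy",
        "MODEL_NAME": "PascalStrategy",
    },
)
```

Because a capacity provider strategy is supplied, the sketch leaves out `launchType`; the two are mutually exclusive in `run_task`.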
EventBridge Schedule¶
{
"ScheduleExpression": "rate(6 hours)",
"Targets": [{
"Arn": "arn:aws:lambda:...:retraining-scheduler",
"Input": "{\"models\": [{\"name\": \"PascalStrategy\", \"strategy\": \"PascalFreqAIStrategy\"}]}"
}]
}
SNS Notification Format¶
TradAI Model Retraining Notification
Environment: prod
Model: PascalStrategy
Trigger: Drift Detected
Task ARN: arn:aws:ecs:...:task/abc123
Timestamp: 2024-01-01T12:00:00Z
A model retraining job has been triggered. You will receive another
notification when the training completes.
Reason: Significant drift was detected in model predictions.
See Also¶
Related Lambdas:
- drift-monitor - Detects model drift (triggers retraining)
- check-retraining-needed - Step Functions decision variant
- compare-models - Champion vs challenger comparison
- promote-model - Model promotion after training
Architecture:
- ML Lifecycle - Full ML training pipeline
- Architecture Overview - Pipeline diagram
Services:
- Strategy Service - Model registry integration
CLI:
- CLI Reference - A/B testing commands