drift-monitor
Monitors model drift by comparing recent backtest metrics against reference baselines using the Population Stability Index (PSI).
Overview
| Property | Value |
|---|---|
| Trigger | EventBridge scheduled event |
| Runtime | Python 3.11 |
| Timeout | 300 seconds |
| Memory | 512 MB |
Input Schema
{
  "models": ["PascalStrategy", "RadStrategy"],  # Default: predefined list
  "experiment_prefix": "backtest-",             # MLflow experiment prefix
  "reference_period_days": 30,                  # Reference baseline period
  "current_period_days": 7                      # Current evaluation period
}
Output Schema
{
  "summary": {
    "models_analyzed": 2,
    "drifted": 1,
    "errors": 0
  },
  "results": [
    {
      "model_name": "PascalStrategy",
      "status": "analyzed",
      "overall_psi": 0.35,
      "severity": "significant",
      "reference_period": {
        "start": "2024-01-01T00:00:00Z",
        "end": "2024-01-31T00:00:00Z",
        "samples": 45
      },
      "current_period": {
        "start": "2024-02-01T00:00:00Z",
        "end": "2024-02-07T00:00:00Z",
        "samples": 12
      },
      "metric_drifts": [
        {
          "metric": "profit_total",
          "psi": 0.45,
          "severity": "significant",
          "reference_mean": 1250.5,
          "current_mean": 890.2
        }
      ],
      "requires_attention": true,
      "timestamp": "2024-02-07T12:00:00Z"
    }
  ]
}
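As an illustration of how the `summary` block relates to `results`, the sketch below derives the three counters from a list of per-model results. The function name `summarize` is hypothetical, and two rules are assumptions rather than documented behavior: a model counts as drifted when `requires_attention` is true, and a failed model carries the status value `"error"`.

```python
from typing import Any


def summarize(results: list[dict[str, Any]]) -> dict[str, int]:
    """Derive the top-level summary counters from per-model results."""
    analyzed = [r for r in results if r.get("status") == "analyzed"]
    return {
        "models_analyzed": len(analyzed),
        # Assumption: "drifted" counts analyzed models flagged for attention.
        "drifted": sum(1 for r in analyzed if r.get("requires_attention")),
        # Assumption: failures are reported with status "error".
        "errors": sum(1 for r in results if r.get("status") == "error"),
    }


if __name__ == "__main__":
    example = [
        {"model_name": "PascalStrategy", "status": "analyzed", "requires_attention": True},
        {"model_name": "RadStrategy", "status": "analyzed", "requires_attention": False},
    ]
    print(summarize(example))
```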
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `MLFLOW_TRACKING_URI` | Yes | - | MLflow server URL |
| `MLFLOW_TRACKING_USERNAME` | No | - | MLflow username |
| `MLFLOW_TRACKING_PASSWORD` | No | - | MLflow password |
| `DYNAMODB_TABLE_NAME` | Yes | - | State repository table |
| `ALERT_SNS_TOPIC_ARN` | Yes | - | SNS topic for alerts |
| `PSI_MODERATE_THRESHOLD` | No | 0.1 | Moderate drift threshold |
| `PSI_SIGNIFICANT_THRESHOLD` | No | 0.25 | Significant drift threshold |
| `REFERENCE_PERIOD_DAYS` | No | 30 | Reference period length |
| `CURRENT_PERIOD_DAYS` | No | 7 | Current period length |
| `MIN_SAMPLES` | No | 10 | Minimum samples required |
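A minimal sketch of how these variables could be read at cold start, using the defaults from the table. The `DriftSettings` class and `load_settings` function are hypothetical names for illustration; the Lambda's actual configuration code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables marked "Required: Yes" in the table above.
REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "DYNAMODB_TABLE_NAME", "ALERT_SNS_TOPIC_ARN")


@dataclass(frozen=True)
class DriftSettings:
    """Hypothetical settings container, not the Lambda's actual class."""
    psi_moderate_threshold: float
    psi_significant_threshold: float
    reference_period_days: int
    current_period_days: int
    min_samples: int


def load_settings(env: Mapping[str, str] = os.environ) -> DriftSettings:
    """Validate required variables and apply the documented defaults."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return DriftSettings(
        psi_moderate_threshold=float(env.get("PSI_MODERATE_THRESHOLD", "0.1")),
        psi_significant_threshold=float(env.get("PSI_SIGNIFICANT_THRESHOLD", "0.25")),
        reference_period_days=int(env.get("REFERENCE_PERIOD_DAYS", "30")),
        current_period_days=int(env.get("CURRENT_PERIOD_DAYS", "7")),
        min_samples=int(env.get("MIN_SAMPLES", "10")),
    )
```

Passing the environment as a mapping keeps the function easy to exercise in unit tests without touching `os.environ`.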
Monitored Metrics
| Metric | Description |
|---|---|
| `profit_total` | Total profit from backtest |
| `win_rate` | Trade win percentage |
| `sharpe_ratio` | Risk-adjusted return |
| `max_drawdown` | Maximum portfolio drawdown |
| `trades_count` | Number of trades executed |
PSI Severity Levels
| PSI Range | Severity | Action |
|---|---|---|
| < 0.1 | None | No action needed |
| 0.1 - 0.25 | Moderate | Monitor closely |
| > 0.25 | Significant | Trigger retraining |
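The table can be expressed as a small classification function. This is a sketch, not the Lambda's own code: the name `classify_psi` is hypothetical, and treating a PSI of exactly 0.25 as moderate follows the table's `> 0.25` row but is otherwise an assumption about boundary handling. The default thresholds mirror `PSI_MODERATE_THRESHOLD` and `PSI_SIGNIFICANT_THRESHOLD`.

```python
def classify_psi(psi: float, moderate: float = 0.1, significant: float = 0.25) -> str:
    """Map a PSI value to a severity label as used in the output schema."""
    if psi > significant:
        return "significant"  # Trigger retraining
    if psi >= moderate:
        return "moderate"     # Monitor closely
    return "none"             # No action needed


if __name__ == "__main__":
    for value in (0.05, 0.18, 0.35):
        print(value, classify_psi(value))
```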
CloudWatch Metrics
| Metric | Description |
|---|---|
| `OverallPSI` | Overall PSI score per model |
| `DriftDetected` | Count of drift detections |
| `PSI_profit_total` | PSI for profit metric |
| `PSI_sharpe_ratio` | PSI for Sharpe ratio |
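For orientation, the sketch below builds a metric list in the shape that boto3's CloudWatch `put_metric_data` call accepts for its `MetricData` argument. Several details are assumptions: the function name `build_metric_data`, the `ModelName` dimension, emitting one `PSI_<metric>` entry for every item in `metric_drifts`, and reporting `DriftDetected` as 1 when `requires_attention` is true.

```python
from typing import Any


def build_metric_data(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one per-model result into CloudWatch MetricData entries."""
    # Assumption: metrics are dimensioned by model name.
    dimensions = [{"Name": "ModelName", "Value": result["model_name"]}]
    data = [
        {
            "MetricName": "OverallPSI",
            "Dimensions": dimensions,
            "Value": float(result["overall_psi"]),
            "Unit": "None",
        },
        {
            "MetricName": "DriftDetected",
            "Dimensions": dimensions,
            # Assumption: 1 when the model is flagged, otherwise 0.
            "Value": 1.0 if result.get("requires_attention") else 0.0,
            "Unit": "Count",
        },
    ]
    for drift in result.get("metric_drifts", []):
        data.append(
            {
                "MetricName": f"PSI_{drift['metric']}",
                "Dimensions": dimensions,
                "Value": float(drift["psi"]),
                "Unit": "None",
            }
        )
    return data
```

The returned list can be passed as `MetricData` to `put_metric_data` together with a `Namespace` of your choosing; the namespace used by this Lambda is not stated here.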
Key Features
- Uses numpy for vectorized PSI calculation
- Implements quantile-based binning (10 bins)
- Tracks drift state to avoid duplicate alerts
- Sends detailed SNS alerts with metric breakdown
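The vectorized, quantile-binned PSI calculation mentioned above could look roughly like the following. This is a sketch under stated assumptions, not the Lambda's implementation: bin edges are taken from the reference sample, duplicate edges are merged, out-of-range values fall into the outermost bins, and empty bins are floored at a small `eps` to keep the logarithm finite. The function name and the `eps` parameter are hypothetical.

```python
import numpy as np


def population_stability_index(reference, current, bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum((cur - ref) * ln(cur / ref)) over quantile bins of the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    if reference.size == 0 or current.size == 0:
        raise ValueError("both samples must be non-empty")

    # Quantile-based edges from the reference sample; ties can collapse edges.
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, bins + 1)))
    if edges.size < 2:
        raise ValueError("reference needs at least two distinct values")

    # Bin by interior edges so values outside the reference range
    # land in the first or last bin instead of being dropped.
    interior = edges[1:-1]
    n_bins = edges.size - 1
    ref_idx = np.searchsorted(interior, reference, side="right")
    cur_idx = np.searchsorted(interior, current, side="right")

    ref_frac = np.bincount(ref_idx, minlength=n_bins) / reference.size
    cur_frac = np.bincount(cur_idx, minlength=n_bins) / current.size

    # Floor empty bins to avoid division by zero and log(0).
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))
```

Identical samples give a PSI of zero, and the value grows as the current distribution moves away from the reference. With small samples (for example near the `MIN_SAMPLES` limit of 10), ten quantile bins hold very few points each, so the estimate is noisy.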
EventBridge Schedule
{
  "ScheduleExpression": "rate(6 hours)",
  "Targets": [{
    "Arn": "arn:aws:lambda:...:drift-monitor",
    "Input": "{\"models\": [\"PascalStrategy\"]}"
  }]
}
See Also
Related Lambdas:
- check-retraining-needed - Uses drift state for decisions
- retraining-scheduler - Triggers retraining on drift
- model-rollback - Rollback on severe drift
Architecture:
- ML Lifecycle - Drift detection in ML pipeline
- Architecture Overview - Drift monitoring diagram
SDK:
- tradai-common - MLflow integration
CLI:
- CLI Reference - `tradai monitor` commands