
drift-monitor

Monitors model drift by comparing recent backtest metrics against reference baselines using the Population Stability Index (PSI).

Overview

Property	Value
Trigger	EventBridge scheduled event
Runtime	Python 3.11
Timeout	300 seconds
Memory	512 MB

Input Schema

{
    "models": ["PascalStrategy", "RadStrategy"],  # Default: predefined list
    "experiment_prefix": "backtest-",             # MLflow experiment prefix
    "reference_period_days": 30,                  # Reference baseline period
    "current_period_days": 7                      # Current evaluation period
}
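
As a rough illustration of how the optional input fields above could be resolved, the following sketch fills in defaults for any key missing from the event. The function name `resolve_params` and the contents of `DEFAULT_MODELS` are hypothetical; the actual predefined model list is not shown on this page.

```python
from typing import Any, Optional

# Hypothetical stand-in for the "predefined list" mentioned in the schema.
DEFAULT_MODELS = ["PascalStrategy", "RadStrategy"]


def resolve_params(event: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Return the event fields with schema defaults applied to missing keys."""
    event = event or {}
    return {
        # An absent or empty "models" list falls back to the default list.
        "models": event.get("models") or list(DEFAULT_MODELS),
        "experiment_prefix": event.get("experiment_prefix", "backtest-"),
        "reference_period_days": int(event.get("reference_period_days", 30)),
        "current_period_days": int(event.get("current_period_days", 7)),
    }
```

For example, `resolve_params({"models": ["PascalStrategy"]})` keeps the supplied model list and uses the documented defaults for the remaining fields.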

Output Schema

{
    "summary": {
        "models_analyzed": 2,
        "drifted": 1,
        "errors": 0
    },
    "results": [
        {
            "model_name": "PascalStrategy",
            "status": "analyzed",
            "overall_psi": 0.35,
            "severity": "significant",
            "reference_period": {
                "start": "2024-01-01T00:00:00Z",
                "end": "2024-01-31T00:00:00Z",
                "samples": 45
            },
            "current_period": {
                "start": "2024-02-01T00:00:00Z",
                "end": "2024-02-07T00:00:00Z",
                "samples": 12
            },
            "metric_drifts": [
                {
                    "metric": "profit_total",
                    "psi": 0.45,
                    "severity": "significant",
                    "reference_mean": 1250.5,
                    "current_mean": 890.2
                }
            ],
            "requires_attention": true,
            "timestamp": "2024-02-07T12:00:00Z"
        }
    ]
}

Environment Variables

Variable	Required	Default	Description
`MLFLOW_TRACKING_URI`	Yes	-	MLflow server URL
`MLFLOW_TRACKING_USERNAME`	No	-	MLflow username
`MLFLOW_TRACKING_PASSWORD`	No	-	MLflow password
`DYNAMODB_TABLE_NAME`	Yes	-	State repository table
`ALERT_SNS_TOPIC_ARN`	Yes	-	SNS topic for alerts
`PSI_MODERATE_THRESHOLD`	No	0.1	Moderate drift threshold
`PSI_SIGNIFICANT_THRESHOLD`	No	0.25	Significant drift threshold
`REFERENCE_PERIOD_DAYS`	No	30	Reference period length
`CURRENT_PERIOD_DAYS`	No	7	Current period length
`MIN_SAMPLES`	No	10	Minimum samples required
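
One way to read the variables in the table above is a small configuration loader that fails fast when a required variable is missing. This is a minimal sketch, not the Lambda's actual code: `DriftConfig` and `load_config` are hypothetical names, and only the variable names and defaults come from the table. The optional MLflow credentials are omitted for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables marked "Yes" in the Required column.
REQUIRED_VARS = ("MLFLOW_TRACKING_URI", "DYNAMODB_TABLE_NAME", "ALERT_SNS_TOPIC_ARN")


@dataclass(frozen=True)
class DriftConfig:
    mlflow_tracking_uri: str
    dynamodb_table_name: str
    alert_sns_topic_arn: str
    psi_moderate_threshold: float = 0.1
    psi_significant_threshold: float = 0.25
    reference_period_days: int = 30
    current_period_days: int = 7
    min_samples: int = 10


def load_config(env: Mapping[str, str] = os.environ) -> DriftConfig:
    """Build a DriftConfig from an environment mapping, applying table defaults."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return DriftConfig(
        mlflow_tracking_uri=env["MLFLOW_TRACKING_URI"],
        dynamodb_table_name=env["DYNAMODB_TABLE_NAME"],
        alert_sns_topic_arn=env["ALERT_SNS_TOPIC_ARN"],
        psi_moderate_threshold=float(env.get("PSI_MODERATE_THRESHOLD", 0.1)),
        psi_significant_threshold=float(env.get("PSI_SIGNIFICANT_THRESHOLD", 0.25)),
        reference_period_days=int(env.get("REFERENCE_PERIOD_DAYS", 30)),
        current_period_days=int(env.get("CURRENT_PERIOD_DAYS", 7)),
        min_samples=int(env.get("MIN_SAMPLES", 10)),
    )
```

Passing the mapping as an argument keeps the loader easy to exercise in tests without touching the process environment.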

Monitored Metrics

Metric	Description
`profit_total`	Total profit from backtest
`win_rate`	Trade win percentage
`sharpe_ratio`	Risk-adjusted return
`max_drawdown`	Maximum portfolio drawdown
`trades_count`	Number of trades executed

PSI Severity Levels

PSI Range	Severity	Action
< 0.1	None	No action needed
0.1 - 0.25	Moderate	Monitor closely
> 0.25	Significant	Trigger retraining
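
The severity table can be expressed as a small classification function. The sketch below assumes the boundaries read literally from the table (a PSI of exactly 0.1 or 0.25 counts as moderate); the function name `classify_psi` and the lowercase `"none"` and `"moderate"` labels are illustrative, chosen to match the `"significant"` value in the output schema.

```python
def classify_psi(psi: float, moderate: float = 0.1, significant: float = 0.25) -> str:
    """Map a PSI value to a severity label using the documented thresholds."""
    if psi > significant:
        return "significant"  # > 0.25: trigger retraining
    if psi >= moderate:
        return "moderate"     # 0.1 - 0.25: monitor closely
    return "none"             # < 0.1: no action needed
```

With the default thresholds, the `overall_psi` of 0.35 from the output example maps to `"significant"`.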

CloudWatch Metrics

Metric	Description
`OverallPSI`	Overall PSI score per model
`DriftDetected`	Count of drift detections
`PSI_profit_total`	PSI for profit metric
`PSI_sharpe_ratio`	PSI for Sharpe ratio
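
To show how one entry of `results` might translate into the metrics listed above, here is a hedged sketch that builds a `MetricData` list in the shape accepted by the boto3 CloudWatch client's `put_metric_data` call. The `ModelName` dimension, the choice to derive `DriftDetected` from `requires_attention`, and the helper name `build_metric_data` are all assumptions, not details confirmed by this page.

```python
from typing import Any


def build_metric_data(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Convert one drift result into CloudWatch MetricData entries."""
    # Assumed dimension so that metrics can be filtered per model.
    dimensions = [{"Name": "ModelName", "Value": result["model_name"]}]
    data = [
        {
            "MetricName": "OverallPSI",
            "Dimensions": dimensions,
            "Value": float(result["overall_psi"]),
            "Unit": "None",
        },
        {
            "MetricName": "DriftDetected",
            "Dimensions": dimensions,
            # Assumption: one detection is counted when the result needs attention.
            "Value": 1.0 if result["requires_attention"] else 0.0,
            "Unit": "Count",
        },
    ]
    # One PSI_<metric> entry per monitored metric, e.g. PSI_profit_total.
    for drift in result["metric_drifts"]:
        data.append(
            {
                "MetricName": "PSI_" + drift["metric"],
                "Dimensions": dimensions,
                "Value": float(drift["psi"]),
                "Unit": "None",
            }
        )
    return data
```

The returned list could then be passed as the `MetricData` argument of `put_metric_data`, together with a namespace of your choosing.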

Key Features

Uses numpy for vectorized PSI calculation
Implements quantile-based binning (10 bins)
Tracks drift state to avoid duplicate alerts
Sends detailed SNS alerts with metric breakdown
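
The first two features can be illustrated with a short numpy sketch of PSI over quantile bins. It assumes the bin edges are taken from the reference sample's quantiles and that empty bins are floored at a small `eps` to keep the logarithm finite; the function name `population_stability_index` and the `eps` value are illustrative and may differ from the Lambda's own code.

```python
import numpy as np


def population_stability_index(reference, current, n_bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum((cur - ref) * ln(cur / ref)) over quantile bins of the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    if reference.size == 0 or current.size == 0:
        raise ValueError("both samples must be non-empty")

    # Quantile edges from the reference sample; only interior edges are kept so
    # that values outside the reference range land in the first or last bin.
    edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))
    interior = np.unique(edges[1:-1])  # ties in the data can collapse bins

    n_effective = interior.size + 1
    ref_idx = np.searchsorted(interior, reference, side="right")
    cur_idx = np.searchsorted(interior, current, side="right")

    # Fraction of each sample falling into each bin, floored at eps.
    ref_frac = np.clip(np.bincount(ref_idx, minlength=n_effective) / reference.size, eps, None)
    cur_frac = np.clip(np.bincount(cur_idx, minlength=n_effective) / current.size, eps, None)

    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))
```

Comparing a sample with itself yields a PSI of zero, while a clearly shifted sample produces a value well above the 0.25 threshold.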

EventBridge Schedule

{
  "ScheduleExpression": "rate(6 hours)",
  "Targets": [{
    "Arn": "arn:aws:lambda:...:drift-monitor",
    "Input": "{\"models\": [\"PascalStrategy\"]}"
  }]
}
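
The `Input` field above is a JSON document serialized into a string, which is why its quotes are escaped. A minimal sketch of producing such a target entry programmatically follows. The helper name `build_target` and the `Id` value are hypothetical, and the Lambda ARN is left elided as in the example.

```python
import json


def build_target(lambda_arn: str, models: list[str]) -> dict[str, str]:
    """Build an EventBridge target entry whose Input is a JSON-encoded string."""
    return {
        "Id": "drift-monitor",  # hypothetical target identifier
        "Arn": lambda_arn,
        # json.dumps handles the quoting that appears escaped in the config above.
        "Input": json.dumps({"models": models}),
    }


target = build_target("arn:aws:lambda:...:drift-monitor", ["PascalStrategy"])
```

Decoding `target["Input"]` with `json.loads` gives back the same `models` payload that the function receives as its event.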

See Also

Related Lambdas:

check-retraining-needed - Uses drift state for decisions
retraining-scheduler - Triggers retraining on drift
model-rollback - Rolls back the model on severe drift

Architecture:

ML Lifecycle - Drift detection in ML pipeline
Architecture Overview - Drift monitoring diagram

SDK:

tradai-common - MLflow integration

CLI:

CLI Reference - tradai monitor commands