
drift-monitor

Monitors model drift by comparing recent backtest metrics against reference baselines using the Population Stability Index (PSI).

Overview

Property   Value
--------   ---------------------------
Trigger    EventBridge scheduled event
Runtime    Python 3.11
Timeout    300 seconds
Memory     512 MB

Input Schema

{
    "models": ["PascalStrategy", "RadStrategy"],  # Default: predefined list
    "experiment_prefix": "backtest-",             # MLflow experiment prefix
    "reference_period_days": 30,                  # Reference baseline period
    "current_period_days": 7                      # Current evaluation period
}

Output Schema

{
    "summary": {
        "models_analyzed": 2,
        "drifted": 1,
        "errors": 0
    },
    "results": [
        {
            "model_name": "PascalStrategy",
            "status": "analyzed",
            "overall_psi": 0.35,
            "severity": "significant",
            "reference_period": {
                "start": "2024-01-01T00:00:00Z",
                "end": "2024-01-31T00:00:00Z",
                "samples": 45
            },
            "current_period": {
                "start": "2024-02-01T00:00:00Z",
                "end": "2024-02-07T00:00:00Z",
                "samples": 12
            },
            "metric_drifts": [
                {
                    "metric": "profit_total",
                    "psi": 0.45,
                    "severity": "significant",
                    "reference_mean": 1250.5,
                    "current_mean": 890.2
                }
            ],
            "requires_attention": true,
            "timestamp": "2024-02-07T12:00:00Z"
        }
    ]
}

Environment Variables

Variable                   Required  Default  Description
-------------------------  --------  -------  ---------------------------
MLFLOW_TRACKING_URI        Yes       -        MLflow server URL
MLFLOW_TRACKING_USERNAME   No        -        MLflow username
MLFLOW_TRACKING_PASSWORD   No        -        MLflow password
DYNAMODB_TABLE_NAME        Yes       -        State repository table
ALERT_SNS_TOPIC_ARN        Yes       -        SNS topic for alerts
PSI_MODERATE_THRESHOLD     No        0.1      Moderate drift threshold
PSI_SIGNIFICANT_THRESHOLD  No        0.25     Significant drift threshold
REFERENCE_PERIOD_DAYS      No        30       Reference period length
CURRENT_PERIOD_DAYS        No        7        Current period length
MIN_SAMPLES                No        10       Minimum samples required
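
As an illustration of how these variables might be read at cold start, here is a minimal sketch. The `DriftConfig` class and `load_config` function are hypothetical names, not part of the documented Lambda; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftConfig:
    """Hypothetical container for the settings listed in the table."""
    mlflow_tracking_uri: str
    dynamodb_table_name: str
    alert_sns_topic_arn: str
    psi_moderate_threshold: float
    psi_significant_threshold: float
    reference_period_days: int
    current_period_days: int
    min_samples: int


def load_config(env=None) -> DriftConfig:
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    required = ("MLFLOW_TRACKING_URI", "DYNAMODB_TABLE_NAME", "ALERT_SNS_TOPIC_ARN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail fast so a misconfigured deployment is visible immediately.
        raise ValueError("Missing required environment variables: " + ", ".join(missing))
    return DriftConfig(
        mlflow_tracking_uri=env["MLFLOW_TRACKING_URI"],
        dynamodb_table_name=env["DYNAMODB_TABLE_NAME"],
        alert_sns_topic_arn=env["ALERT_SNS_TOPIC_ARN"],
        # Optional values fall back to the documented defaults.
        psi_moderate_threshold=float(env.get("PSI_MODERATE_THRESHOLD", "0.1")),
        psi_significant_threshold=float(env.get("PSI_SIGNIFICANT_THRESHOLD", "0.25")),
        reference_period_days=int(env.get("REFERENCE_PERIOD_DAYS", "30")),
        current_period_days=int(env.get("CURRENT_PERIOD_DAYS", "7")),
        min_samples=int(env.get("MIN_SAMPLES", "10")),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in unit tests.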

Monitored Metrics

Metric        Description
------------  --------------------------
profit_total  Total profit from backtest
win_rate      Trade win percentage
sharpe_ratio  Risk-adjusted return
max_drawdown  Maximum portfolio drawdown
trades_count  Number of trades executed

PSI Severity Levels

PSI Range   Severity     Action
----------  -----------  ------------------
< 0.1       None         No action needed
0.1 - 0.25  Moderate     Monitor closely
> 0.25      Significant  Trigger retraining
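
The mapping in the table can be sketched as a small function. `classify_severity` is a hypothetical name, and the lowercase labels are an assumption based on the `"severity": "significant"` value in the output schema. The boundaries follow the table literally, so a PSI of exactly 0.25 is treated as moderate.

```python
def classify_severity(psi: float, moderate: float = 0.1, significant: float = 0.25) -> str:
    """Map a PSI score to a severity label using the documented thresholds."""
    if psi > significant:
        return "significant"  # > 0.25: trigger retraining
    if psi >= moderate:
        return "moderate"     # 0.1 - 0.25: monitor closely
    return "none"             # < 0.1: no action needed
```

The two threshold parameters correspond to `PSI_MODERATE_THRESHOLD` and `PSI_SIGNIFICANT_THRESHOLD`.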

CloudWatch Metrics

Metric            Description
----------------  ---------------------------
OverallPSI        Overall PSI score per model
DriftDetected     Count of drift detections
PSI_profit_total  PSI for profit metric
PSI_sharpe_ratio  PSI for Sharpe ratio
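
One way these metrics could be assembled from a single entry of `results` is sketched below. The function name `build_metric_data`, the `ModelName` dimension, and the choice to derive `DriftDetected` from `requires_attention` are all assumptions; the source lists only the metric names. The returned list has the shape that boto3's CloudWatch `put_metric_data` call accepts as `MetricData`.

```python
def build_metric_data(result: dict) -> list:
    """Build a CloudWatch MetricData list from one drift result (hypothetical helper)."""
    # Assumed dimension: one time series per model.
    dimensions = [{"Name": "ModelName", "Value": result["model_name"]}]
    metric_data = [
        {
            "MetricName": "OverallPSI",
            "Dimensions": dimensions,
            "Value": float(result["overall_psi"]),
            "Unit": "None",
        },
        {
            "MetricName": "DriftDetected",
            "Dimensions": dimensions,
            # Assumption: a detection is counted when the result needs attention.
            "Value": 1.0 if result.get("requires_attention") else 0.0,
            "Unit": "Count",
        },
    ]
    # One PSI_<metric> entry per metric in the breakdown.
    for drift in result.get("metric_drifts", []):
        metric_data.append(
            {
                "MetricName": "PSI_" + drift["metric"],
                "Dimensions": dimensions,
                "Value": float(drift["psi"]),
                "Unit": "None",
            }
        )
    return metric_data
```

The list would then be published with `boto3.client("cloudwatch").put_metric_data(Namespace=..., MetricData=metric_data)`; the namespace is not stated in the source, so it is left open here.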

Key Features

  • Uses numpy for vectorized PSI calculation
  • Implements quantile-based binning (10 bins)
  • Tracks drift state to avoid duplicate alerts
  • Sends detailed SNS alerts with metric breakdown

EventBridge Schedule

{
  "ScheduleExpression": "rate(6 hours)",
  "Targets": [{
    "Arn": "arn:aws:lambda:...:drift-monitor",
    "Input": "{\"models\": [\"PascalStrategy\"]}"
  }]
}
