model-rollback
Rolls back a model to a previous version when performance degrades or alarms trigger.
Overview
| Property | Value |
|---|---|
| Trigger | CloudWatch Alarm / Step Functions |
| Runtime | Python 3.11 |
| Timeout | 120 seconds |
| Memory | 256 MB |
Input Schema
{
    "model_name": "PascalStrategy",        # Required
    "target_version": "2",                 # Optional; defaults to the latest Archived version
    "reason": "PERFORMANCE_DEGRADATION",   # Default: "MANUAL_REQUEST"
    "alarm_name": "high-drawdown-alarm",   # Optional
    "dry_run": false                       # Default: false
}
Rollback Reasons:
- MANUAL_REQUEST - Manual rollback request
- PERFORMANCE_DEGRADATION - Performance metrics degraded
- TEST_FAILURE - Post-deployment tests failed
- DRIFT_SEVERE - Severe model drift detected
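The input rules above (one required field, documented defaults, a fixed set of reason codes) can be illustrated with a small validation helper. This is only a sketch: `normalize_request` and `ROLLBACK_REASONS` are hypothetical names, and rejecting unknown reasons with `ValueError` is an assumption rather than documented behavior.

```python
from typing import Any

# Reason codes listed in this section.
ROLLBACK_REASONS = {
    "MANUAL_REQUEST",
    "PERFORMANCE_DEGRADATION",
    "TEST_FAILURE",
    "DRIFT_SEVERE",
}


def normalize_request(event: dict[str, Any]) -> dict[str, Any]:
    """Validate a rollback request and fill in the documented defaults."""
    model_name = event.get("model_name")
    if not isinstance(model_name, str) or not model_name:
        raise ValueError("model_name is required")

    reason = event.get("reason", "MANUAL_REQUEST")
    if reason not in ROLLBACK_REASONS:
        # Assumption: unknown reasons are rejected rather than passed through.
        raise ValueError(f"unknown rollback reason: {reason!r}")

    return {
        "model_name": model_name,
        # None stands for "use the latest Archived version".
        "target_version": event.get("target_version"),
        "reason": reason,
        "alarm_name": event.get("alarm_name"),
        "dry_run": bool(event.get("dry_run", False)),
    }
```

Calling `normalize_request({"model_name": "PascalStrategy"})` would yield a request with `reason` set to `"MANUAL_REQUEST"` and `dry_run` set to `False`.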
Output Schema
{
    "rolled_back": true,
    "model_name": "PascalStrategy",
    "from_version": "3",                   # Previous Production version
    "to_version": "2",                     # New Production version
    "reason": "PERFORMANCE_DEGRADATION",
    "alarm_name": "high-drawdown-alarm",
    "timestamp": "2024-01-01T12:00:00Z"
}
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| MLFLOW_TRACKING_URI | Yes | - | MLflow server URL |
| MLFLOW_TRACKING_USERNAME | No | - | MLflow username |
| MLFLOW_TRACKING_PASSWORD | No | - | MLflow password |
| ROLLBACK_STATE_TABLE | No | "tradai-rollback-state" | DynamoDB state table |
| ROLLBACK_COOLDOWN_HOURS | No | 24 | Min hours between rollbacks |
| ALERT_SNS_TOPIC_ARN | Yes | - | SNS topic for notifications |
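As an illustration of how the table above might translate into startup configuration, the following sketch reads the variables with their documented defaults and fails early when a required one is missing. `RollbackConfig` and `load_config` are hypothetical names, and raising `RuntimeError` is an assumption, not the function's documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RollbackConfig:
    mlflow_tracking_uri: str
    alert_sns_topic_arn: str
    mlflow_username: Optional[str]
    mlflow_password: Optional[str]
    state_table: str
    cooldown_hours: int


def load_config(env: Mapping[str, str] = os.environ) -> RollbackConfig:
    """Build the configuration from environment variables."""
    required = ("MLFLOW_TRACKING_URI", "ALERT_SNS_TOPIC_ARN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {missing}")

    return RollbackConfig(
        mlflow_tracking_uri=env["MLFLOW_TRACKING_URI"],
        alert_sns_topic_arn=env["ALERT_SNS_TOPIC_ARN"],
        mlflow_username=env.get("MLFLOW_TRACKING_USERNAME"),
        mlflow_password=env.get("MLFLOW_TRACKING_PASSWORD"),
        # Defaults taken from the table above.
        state_table=env.get("ROLLBACK_STATE_TABLE", "tradai-rollback-state"),
        cooldown_hours=int(env.get("ROLLBACK_COOLDOWN_HOURS", "24")),
    )
```

Passing the environment as a parameter keeps the loader easy to exercise in tests without mutating `os.environ`.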
Rollback Process
flowchart TD
    A[Rollback Request] --> B{Dry run?}
    B -->|Yes| C[Return preview]
    B -->|No| D{Cooldown check}
    D -->|In cooldown| E[Reject: Too recent]
    D -->|OK| F[Archive current Production]
    F --> G[Promote target to Production]
    G --> H[Record rollback state]
    H --> I[Send SNS notification]
Key Features
- Enforces cooldown period to prevent thrashing
- Dry-run mode for inspection without execution
- Archives old Production version before promoting new one
- Tracks rollback count and reason history
- Sends detailed SNS notifications
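The decision logic in the rollback flowchart (dry-run check first, then the cooldown check, then the four execution steps) can be sketched as a pure function. The names `in_cooldown`, `plan_rollback`, and `ROLLBACK_STEPS` are hypothetical, and treating a rollback exactly `cooldown_hours` old as outside the cooldown window is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Steps executed once a rollback is allowed, in flowchart order.
ROLLBACK_STEPS = [
    "archive_current_production",
    "promote_target_to_production",
    "record_rollback_state",
    "send_sns_notification",
]


def in_cooldown(last_rollback: Optional[datetime], now: datetime, cooldown_hours: int) -> bool:
    """True if the previous rollback happened less than cooldown_hours ago."""
    if last_rollback is None:
        return False  # no rollback on record, nothing to cool down from
    return now - last_rollback < timedelta(hours=cooldown_hours)


def plan_rollback(
    dry_run: bool,
    last_rollback: Optional[datetime],
    now: datetime,
    cooldown_hours: int = 24,
) -> dict:
    """Decide what to do with a request, following the flowchart order."""
    if dry_run:
        return {"action": "preview", "steps": []}
    if in_cooldown(last_rollback, now, cooldown_hours):
        return {"action": "rejected", "steps": []}
    return {"action": "rollback", "steps": list(ROLLBACK_STEPS)}


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(hours=3)
print(plan_rollback(False, recent, now)["action"])  # rejected
print(plan_rollback(True, recent, now)["action"])   # preview
```

Because the dry-run branch comes first in the flowchart, a preview is returned even while the cooldown is active.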
CloudWatch Alarm Integration
{
    "AlarmName": "high-drawdown-alarm",
    "AlarmActions": [
        "arn:aws:lambda:...:model-rollback"
    ],
    "Dimensions": [
        {
            "Name": "ModelName",
            "Value": "PascalStrategy"
        }
    ]
}
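To show how an alarm definition like the one above relates to the input schema, the sketch below derives a rollback request from the alarm's name and its `ModelName` dimension. `payload_from_alarm` is a hypothetical helper that works on the alarm definition as shown here; the event CloudWatch actually delivers to a Lambda function has a different shape, so real handler code would need its own extraction step. Defaulting the reason to `PERFORMANCE_DEGRADATION` is also an assumption.

```python
from typing import Any


def payload_from_alarm(alarm: dict[str, Any], reason: str = "PERFORMANCE_DEGRADATION") -> dict[str, Any]:
    """Map an alarm definition to a model-rollback request."""
    # Flatten [{"Name": ..., "Value": ...}, ...] into a plain dict.
    dimensions = {d["Name"]: d["Value"] for d in alarm.get("Dimensions", [])}
    if "ModelName" not in dimensions:
        raise KeyError("alarm has no ModelName dimension")

    return {
        "model_name": dimensions["ModelName"],
        "reason": reason,
        "alarm_name": alarm["AlarmName"],
        "dry_run": False,
    }


alarm = {
    "AlarmName": "high-drawdown-alarm",
    "Dimensions": [{"Name": "ModelName", "Value": "PascalStrategy"}],
}
request = payload_from_alarm(alarm)
```

Here `request` carries `model_name` `"PascalStrategy"` and `alarm_name` `"high-drawdown-alarm"`, and omits `target_version` so the latest Archived version would be used.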
SNS Notification Format
{
    "subject": "Model Rollback: PascalStrategy",
    "message": {
        "model_name": "PascalStrategy",
        "from_version": "3",
        "to_version": "2",
        "reason": "PERFORMANCE_DEGRADATION",
        "triggered_by": "high-drawdown-alarm",
        "timestamp": "2024-01-01T12:00:00Z"
    }
}
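A notification in the format above could be assembled from the rollback result roughly as follows. `build_notification` is a hypothetical helper, and mapping the result's `alarm_name` to `triggered_by` is an assumption inferred from the two examples, which carry the same value in both fields.

```python
import json
from typing import Any


def build_notification(result: dict[str, Any]) -> dict[str, Any]:
    """Turn a rollback result (see Output Schema) into an SNS subject and message."""
    message = {
        "model_name": result["model_name"],
        "from_version": result["from_version"],
        "to_version": result["to_version"],
        "reason": result["reason"],
        # Assumption: the triggering alarm is reported as "triggered_by".
        "triggered_by": result.get("alarm_name"),
        "timestamp": result["timestamp"],
    }
    return {
        "subject": f"Model Rollback: {result['model_name']}",
        "message": message,
    }


result = {
    "rolled_back": True,
    "model_name": "PascalStrategy",
    "from_version": "3",
    "to_version": "2",
    "reason": "PERFORMANCE_DEGRADATION",
    "alarm_name": "high-drawdown-alarm",
    "timestamp": "2024-01-01T12:00:00Z",
}
notification = build_notification(result)
body = json.dumps(notification["message"])  # SNS message bodies are strings
```

With boto3, the subject and serialized body could then be sent using the SNS client's `publish` call with `TopicArn`, `Subject`, and `Message` arguments, the topic coming from `ALERT_SNS_TOPIC_ARN`.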
Related
- promote-model - Forward promotion
- compare-models - Model comparison before promotion