check-retraining-needed¶
Evaluates whether a model requires retraining based on drift detection, scheduled intervals, or manual triggers.
Overview¶
| Property | Value |
|---|---|
| Trigger | Step Functions state machine |
| Runtime | Python 3.11 |
| Timeout | 30 seconds |
| Memory | 256 MB |
Input Schema¶
{
"model_name": "PascalStrategy", # Required
"force": false, # Optional, forces retraining
"manual_trigger": false # Optional, marks as manual request
}
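As an illustration of the input contract above, the following minimal sketch parses a raw event and applies the documented defaults for the optional flags. The `RetrainingRequest` class and `parse_event` function are hypothetical names introduced for this example. How the real handler treats a missing `model_name` is not stated here, so raising `ValueError` is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RetrainingRequest:
    """Hypothetical container for the three documented input fields."""

    model_name: str
    force: bool = False
    manual_trigger: bool = False


def parse_event(event: Mapping[str, Any]) -> RetrainingRequest:
    """Build a request from a raw event, defaulting both optional flags to False."""
    if "model_name" not in event:
        # Assumption: the sketch rejects a missing required field outright.
        raise ValueError("model_name is required")
    return RetrainingRequest(
        model_name=str(event["model_name"]),
        force=bool(event.get("force", False)),
        manual_trigger=bool(event.get("manual_trigger", False)),
    )


request = parse_event({"model_name": "PascalStrategy"})
print(request.force, request.manual_trigger)  # False False
```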
Output Schema¶
{
"model_name": "PascalStrategy",
"decision": "needs_retraining", # needs_retraining | recently_trained | no_retraining | invalid_model
"trigger": "drift_detected", # manual | drift_detected | scheduled
"reason": "Significant drift detected (PSI=0.350)",
"hours_since_retrain": 168.5,
"days_since_retrain": 7,
"drift_severity": "significant",
"drift_psi": 0.35
}
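The two elapsed-time fields in the example output are consistent with each other if `days_since_retrain` is the whole number of days contained in `hours_since_retrain` (168.5 hours gives 7 days). The sketch below shows one way to derive both from a last-retrain timestamp. The function name, the rounding to one decimal place, and the floor division are assumptions rather than confirmed behaviour, and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def time_since_retrain(last_retrain: datetime, now: datetime) -> dict:
    """Return the elapsed time since the last retrain in hours and whole days."""
    elapsed_hours = (now - last_retrain).total_seconds() / 3600
    return {
        "hours_since_retrain": round(elapsed_hours, 1),
        # Assumption: days are floored, not rounded.
        "days_since_retrain": int(elapsed_hours // 24),
    }


# Illustrative timestamps only.
now = datetime(2024, 1, 8, 12, 30, tzinfo=timezone.utc)
last = now - timedelta(hours=168.5)
print(time_since_retrain(last, now))
# {'hours_since_retrain': 168.5, 'days_since_retrain': 7}
```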
Environment Variables¶
| Variable | Required | Default | Description |
|---|---|---|---|
| DYNAMODB_TABLE_NAME | Yes | - | State repository table |
| DRIFT_STATE_TABLE | No | "tradai-drift-state" | Drift state table |
| RETRAINING_STATE_TABLE | No | "tradai-retraining-state" | Retraining state table |
| RETRAINING_INTERVAL_DAYS | No | 7 | Days between scheduled retraining |
| MIN_HOURS_BETWEEN_RETRAINING | No | 24 | Minimum hours between attempts |
| ALLOWED_MODELS | No | "" (all) | Comma-separated allowlist of valid model names. When set, only listed models are accepted; others return invalid_model. Empty string permits any model that passes format validation. |
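The table above could be read into a typed settings object along the following lines. This is a sketch, not the function's actual configuration code: the `Settings` class and `load_settings` function are hypothetical, the numeric variables are assumed to be integers, and trimming whitespace around allowlist entries is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    dynamodb_table_name: str
    drift_state_table: str
    retraining_state_table: str
    retraining_interval_days: int
    min_hours_between_retraining: int
    allowed_models: frozenset[str]  # empty set means "allow any valid name"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    raw_allowlist = env.get("ALLOWED_MODELS", "")
    allowed = frozenset(
        name.strip() for name in raw_allowlist.split(",") if name.strip()
    )
    return Settings(
        # The only required variable: a missing key raises KeyError.
        dynamodb_table_name=env["DYNAMODB_TABLE_NAME"],
        drift_state_table=env.get("DRIFT_STATE_TABLE", "tradai-drift-state"),
        retraining_state_table=env.get(
            "RETRAINING_STATE_TABLE", "tradai-retraining-state"
        ),
        retraining_interval_days=int(env.get("RETRAINING_INTERVAL_DAYS", "7")),
        min_hours_between_retraining=int(
            env.get("MIN_HOURS_BETWEEN_RETRAINING", "24")
        ),
        allowed_models=allowed,
    )


settings = load_settings({"DYNAMODB_TABLE_NAME": "example-state-table"})
print(settings.retraining_interval_days, settings.allowed_models)
```

Passing the environment as a mapping keeps the loader easy to exercise in tests without mutating `os.environ`.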
Decision Logic¶
flowchart TD
A[Start] --> V{model_name format valid?}
V -->|No| X[INVALID_MODEL]
V -->|Yes| AL{In ALLOWED_MODELS?}
AL -->|No| X
AL -->|Yes / allowlist empty| B{force or manual_trigger?}
B -->|Yes| C[NEEDS_RETRAINING: MANUAL]
B -->|No| MH{Hours since last retrain?}
MH -->|< min_hours| I[RECENTLY_TRAINED]
MH -->|>= min_hours| D{Check drift state}
D -->|Significant drift| E[NEEDS_RETRAINING: DRIFT_DETECTED]
D -->|No drift| F{Days since last?}
F -->|>= interval| H[NEEDS_RETRAINING: SCHEDULED]
F -->|< interval| J[NO_RETRAINING]
Key Features¶
- Input validation: rejects model_name values that do not match ^[A-Z][A-Za-z0-9_]{1,49}\Z (PascalCase identifier, max 50 chars; \Z blocks trailing newlines)
- Allowlist gate: when ALLOWED_MODELS is set, rejects any model not in the list, which prevents bogus strategy names from wasting Fargate compute
- Checks drift severity against thresholds
- Enforces the minimum interval between retraining attempts (MIN_HOURS_BETWEEN_RETRAINING)
- Supports forced retraining and manual triggers
- Returns decision rationale for audit trail
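The validation rules and the decision flowchart can be sketched as one pure function, shown below. This is a hedged illustration rather than the Lambda's source: the `decide` function and its parameters are hypothetical, the drift state is reduced to a severity string, any severity other than "significant" is assumed to fall through to the schedule check, and a model with no recorded retrain is assumed to be infinitely old.

```python
import re
from typing import Collection, Optional

# Pattern quoted in the feature list: 2 to 50 characters, leading uppercase letter.
MODEL_NAME_RE = re.compile(r"^[A-Z][A-Za-z0-9_]{1,49}\Z")


def decide(
    model_name: str,
    *,
    force: bool = False,
    manual_trigger: bool = False,
    allowed_models: Collection[str] = (),
    hours_since_retrain: Optional[float] = None,
    drift_severity: str = "none",
    min_hours: float = 24,
    interval_days: float = 7,
) -> dict:
    """Walk the documented decision flow from top to bottom."""
    if not MODEL_NAME_RE.match(model_name):
        return {"decision": "invalid_model"}
    # An empty allowlist permits any name that passed format validation.
    if allowed_models and model_name not in allowed_models:
        return {"decision": "invalid_model"}
    if force or manual_trigger:
        return {"decision": "needs_retraining", "trigger": "manual"}
    # Assumption: a never-trained model bypasses the minimum-interval gate.
    hours = float("inf") if hours_since_retrain is None else hours_since_retrain
    if hours < min_hours:
        return {"decision": "recently_trained"}
    if drift_severity == "significant":
        return {"decision": "needs_retraining", "trigger": "drift_detected"}
    if hours / 24 >= interval_days:
        return {"decision": "needs_retraining", "trigger": "scheduled"}
    return {"decision": "no_retraining"}


print(decide("PascalStrategy", hours_since_retrain=168.5, drift_severity="significant"))
# {'decision': 'needs_retraining', 'trigger': 'drift_detected'}
print(decide("PascalStrategy\n"))
# {'decision': 'invalid_model'}
```

The second call shows why the pattern ends in `\Z` rather than `$`: in Python, `$` also matches just before a trailing newline, so `"PascalStrategy\n"` would otherwise be accepted.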
Step Functions Integration¶
The Lambda feeds the EvaluateRetrainingNeed Choice state, which branches on $.check_result.decision:
| Decision | Route |
|---|---|
| needs_retraining | RunRetraining (ECS Fargate) |
| invalid_model | HandleInvalidModel → NotifyFailure |
| recently_trained / no_retraining | SkipRetraining |
See infra/compute/asl_templates/retraining_workflow.json.j2 for the full ASL definition.
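The real routing lives in the ASL template, but the table can be mirrored in Python when unit-testing code that consumes the Lambda's result. The sketch below is only a model of the Choice state: the `next_state` helper is hypothetical, and raising on an unknown decision is an assumption, since the template's default branch is not described here.

```python
# Decision values and target state names are taken from the routing table.
ROUTES = {
    "needs_retraining": "RunRetraining",
    "invalid_model": "HandleInvalidModel",
    "recently_trained": "SkipRetraining",
    "no_retraining": "SkipRetraining",
}


def next_state(state_input: dict) -> str:
    """Mimic branching on $.check_result.decision."""
    decision = state_input["check_result"]["decision"]
    try:
        return ROUTES[decision]
    except KeyError:
        # Assumption: unknown decisions are treated as errors in this sketch.
        raise ValueError(f"unexpected decision: {decision!r}") from None


print(next_state({"check_result": {"decision": "needs_retraining"}}))
# RunRetraining
```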
Related¶
- drift-monitor - Detects drift that triggers retraining
- retraining-scheduler - Schedules retraining tasks