update-status
Updates a job's status in the DynamoDB workflow-state table. Step Functions calls it at key workflow transitions (RUNNING, COMPLETED, FAILED).
Overview
| Property | Value |
|---|---|
| Trigger | Step Functions task callback |
| Runtime | Python 3.11 |
| Timeout | 30 seconds |
| Memory | 256 MB |
| Settings class | UpdateStatusSettings |
Input Schema
{
"job_id": "run-abc-123", # Required: unique job identifier (DynamoDB key: run_id)
"status": "RUNNING", # Required: PENDING | RUNNING | COMPLETED | FAILED | CANCELLED
"trace_id": "trace-xyz-789", # Optional: E2E correlation ID (P1.4)
"execution_arn": "arn:aws:states:...", # Optional: Step Functions execution ARN
"started_at": "2024-02-07T12:00:00Z", # Optional: ISO timestamp
"completed_at": "2024-02-07T12:30:00Z", # Optional: ISO timestamp
"ecs_task_arn": "arn:aws:ecs:...", # Optional: ECS task ARN
"error": { # Optional: error details (for FAILED)
"Error": "States.TaskFailed",
"Cause": "Container exited with code 1"
}
}
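As a rough illustration of the schema above, the required fields could be checked before any DynamoDB call. This is a minimal sketch, not the handler's actual validation: `StatusUpdate` and `parse_event` are hypothetical names, and the real lambda may validate through its settings or entity classes instead.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Status values listed in the input schema.
VALID_STATUSES = {"PENDING", "RUNNING", "COMPLETED", "FAILED", "CANCELLED"}


@dataclass(frozen=True)
class StatusUpdate:
    """Hypothetical container for a validated event (not the project's model)."""

    job_id: str
    status: str
    trace_id: Optional[str] = None
    execution_arn: Optional[str] = None


def parse_event(event: dict[str, Any]) -> StatusUpdate:
    """Check the two required fields and carry over two of the optional ones."""
    job_id = event.get("job_id")
    if not isinstance(job_id, str) or not job_id:
        raise ValueError("job_id is required")
    status = event.get("status")
    if status not in VALID_STATUSES:
        raise ValueError(f"invalid status: {status!r}")
    return StatusUpdate(
        job_id=job_id,
        status=status,
        trace_id=event.get("trace_id"),
        execution_arn=event.get("execution_arn"),
    )
```

Rejecting a malformed event early keeps a bad payload from reaching the conditional update at all.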
trace_id Field
The trace_id field is persisted to DynamoDB for end-to-end correlation across workflow steps. When provided, it is written as a top-level attribute on the DynamoDB item, enabling cross-service debugging by querying jobs by trace ID.
Output Schema
Responses use a Step Functions-compatible format produced by LambdaResponse.to_step_functions():
Success
{
"statusCode": 200,
"body": {
"success": true,
"data": {
"job_id": "run-abc-123",
"status": "running",
"updated": true
},
"environment": "dev"
}
}
Stale Update (Rejected Transition)
{
"statusCode": 200,
"body": {
"success": true,
"data": {
"job_id": "run-abc-123",
"status": "running",
"updated": false,
"reason": "stale_or_invalid_transition"
},
"environment": "dev"
}
}
When a ConditionalCheckFailedException occurs (invalid transition or stale update), the handler returns HTTP 200 with updated: false rather than failing. This prevents Step Functions from retrying an update that will never succeed.
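The behaviour described above can be sketched as follows. boto3 surfaces a failed condition as a botocore `ClientError` whose `response["Error"]["Code"]` is `ConditionalCheckFailedException`. The helper names `is_conditional_check_failure` and `apply_update` are hypothetical, and the `do_update` callable stands in for the real DynamoDB call.

```python
from typing import Any, Callable


def is_conditional_check_failure(exc: Exception) -> bool:
    """True if exc carries a botocore-style response for a failed condition."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ConditionalCheckFailedException"


def apply_update(do_update: Callable[[], None], job_id: str, status: str) -> dict[str, Any]:
    """Run the update and report a rejected condition as updated=False."""
    data: dict[str, Any] = {"job_id": job_id, "status": status, "updated": True}
    try:
        do_update()
    except Exception as exc:
        if not is_conditional_check_failure(exc):
            raise  # any other error is a genuine failure and should propagate
        data["updated"] = False
        data["reason"] = "stale_or_invalid_transition"
    return data
```

Only the conditional-check failure is swallowed; throttling or permission errors still raise, so Step Functions retry policies continue to apply to them.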
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| WORKFLOW_STATE_TABLE | Yes | - | DynamoDB table for workflow state |
| ENVIRONMENT | No | dev | Environment name |
State Transition Guards
The lambda enforces valid state transitions using DynamoDB conditional expressions. Each transition condition includes attribute_exists(run_id) to verify the item exists.
| Target Status | Allowed From |
|---|---|
| PENDING | PENDING |
| RUNNING | PENDING, RUNNING |
| COMPLETED | RUNNING, COMPLETED |
| FAILED | PENDING, RUNNING, FAILED |
| CANCELLED | PENDING, RUNNING, CANCELLED |
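The table can be read as a lookup from target status to the set of statuses it may be reached from. The sketch below encodes it in plain Python for illustration; `ALLOWED_FROM` and `is_allowed` are hypothetical names, and the lambda itself enforces these rules in the DynamoDB condition rather than in application code.

```python
# Target status -> statuses it may be reached from (copied from the table).
ALLOWED_FROM: dict[str, frozenset[str]] = {
    "PENDING": frozenset({"PENDING"}),
    "RUNNING": frozenset({"PENDING", "RUNNING"}),
    "COMPLETED": frozenset({"RUNNING", "COMPLETED"}),
    "FAILED": frozenset({"PENDING", "RUNNING", "FAILED"}),
    "CANCELLED": frozenset({"PENDING", "RUNNING", "CANCELLED"}),
}


def is_allowed(current: str, target: str) -> bool:
    """Return True if moving from current to target is permitted."""
    return current in ALLOWED_FROM.get(target, frozenset())
```

Every target status lists itself as an allowed source, so repeating the same update is accepted, while a move out of a terminal state such as COMPLETED back to RUNNING is rejected.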
Transition Condition Implementation
The _build_transition_condition() function builds DynamoDB ConditionExpression strings with only the ExpressionAttributeValues that are actually referenced, because DynamoDB rejects unused values with ValidationException.
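One way to keep the values map in step with the expression is to generate both from the same list. The following is a sketch under assumptions, not the project's `_build_transition_condition()`: the function name, the `:fromN` placeholder naming, and the use of `IN` are illustrative, and the values are plain strings as accepted by the boto3 Table resource.

```python
def build_transition_condition_sketch(allowed_from: list[str]) -> tuple[str, dict[str, str]]:
    """Build a ConditionExpression plus exactly the values it references.

    `status` is a DynamoDB reserved word, so the expression uses the `#status`
    alias; the caller is assumed to pass
    ExpressionAttributeNames={"#status": "status"} alongside it.
    """
    if not allowed_from:
        raise ValueError("at least one allowed source status is required")
    # One placeholder per allowed source status, nothing more.
    values = {f":from{i}": status for i, status in enumerate(allowed_from)}
    placeholders = ", ".join(values)
    condition = f"attribute_exists(run_id) AND #status IN ({placeholders})"
    return condition, values
```

Because the placeholders are derived from the values dictionary itself, no entry can be defined without also appearing in the expression.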
Execution ARN Guard
When execution_arn is provided, an additional condition is appended to the transition condition.
This prevents a different Step Functions execution from overwriting another execution's state.
DynamoDB Update Details
The update writes these fields:
| Field | When Written | Description |
|---|---|---|
| status | Always | Job status value (lowercase enum value) |
| updated_at | Always | ISO timestamp of update |
| execution_arn | If provided | Step Functions execution ARN |
| started_at | If provided | Job start timestamp |
| completed_at | If provided | Job completion timestamp |
| ecs_task_arn | If provided | ECS task ARN |
| trace_id | If provided | E2E correlation ID |
| error_message | On FAILED status | Error details (truncated to 2000 chars) |
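The "always" versus "if provided" split above maps naturally onto a `SET` expression assembled field by field. The sketch below is an assumption about how such an update might be built, not the lambda's actual code: `build_update` and `OPTIONAL_FIELDS` are hypothetical names. The `#status` alias is needed because `status` is a DynamoDB reserved word.

```python
from typing import Any, Optional

# Fields written only when present in the event (from the table above).
OPTIONAL_FIELDS = ("execution_arn", "started_at", "completed_at", "ecs_task_arn", "trace_id")


def build_update(
    event: dict[str, Any],
    updated_at: str,
    error_message: Optional[str] = None,
) -> dict[str, Any]:
    """Return keyword arguments for an update_item call (sketch only)."""
    names = {"#status": "status"}
    values: dict[str, Any] = {
        ":status": event["status"].lower(),  # stored as the lowercase enum value
        ":updated_at": updated_at,
    }
    sets = ["#status = :status", "updated_at = :updated_at"]
    for field in OPTIONAL_FIELDS:
        if event.get(field) is not None:
            sets.append(f"{field} = :{field}")
            values[f":{field}"] = event[field]
    if error_message is not None:
        sets.append("error_message = :error_message")
        values[":error_message"] = error_message
    return {
        "UpdateExpression": "SET " + ", ".join(sets),
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }
```

Fields that are absent from the event never enter the expression, so an earlier value such as started_at is left untouched by a later update.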
Error Message Handling
For FAILED status, the error field is processed as follows:
- If error is a dict (Step Functions format): the Error and Cause fields are extracted
- If error is a string: it is used directly
- The result is truncated to 2000 characters to avoid DynamoDB item size limits
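The rules above could be sketched as a single function. Note the assumptions: `format_error` is a hypothetical name, and joining Error and Cause as `"Error: Cause"` is a guess at the format, since the source only says both fields are extracted.

```python
from typing import Any

MAX_ERROR_LENGTH = 2000  # truncation limit stated in this section


def format_error(error: Any) -> str:
    """Flatten a Step Functions error dict or a plain string, then truncate."""
    if isinstance(error, dict):
        name = str(error.get("Error", ""))
        cause = str(error.get("Cause", ""))
        # Assumed join format; fall back to whichever part is present.
        message = f"{name}: {cause}" if name and cause else (name or cause)
    else:
        message = str(error)
    return message[:MAX_ERROR_LENGTH]
```

Truncating after formatting bounds the stored attribute regardless of how long the container's failure cause is.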
Key Features
- Conditional transitions: Validates status transitions to prevent stale/invalid updates
- Stale update handling: Returns `updated: false` (HTTP 200) instead of failing, allowing Step Functions to continue
- trace_id for E2E correlation: Persists trace_id for cross-service debugging
- Execution ARN isolation: Guards against duplicate execution ARN updates from different workflows
- Error truncation: Limits error messages to 2000 chars for DynamoDB item size
- Terminal state tracking: Publishes separate `JobsTerminated` metric for `COMPLETED` and `FAILED`
CloudWatch Metrics
Namespace suffix: WorkflowStatus
| Metric | Dimensions | Description |
|---|---|---|
| StatusUpdates | Status, Success, Environment | Count of status update attempts |
| JobsTerminated | Status, Environment | Count of jobs reaching terminal state (COMPLETED or FAILED) |
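For illustration, the two metrics could be assembled into the `MetricData` list that CloudWatch's `put_metric_data` accepts. This is a sketch under assumptions: `build_metric_data` is a hypothetical name, the `"true"`/`"false"` encoding of the Success dimension is a guess, and emitting `JobsTerminated` only for successful updates is inferred from the metric description rather than confirmed by the source.

```python
from typing import Any

# Statuses counted by JobsTerminated, per the table above.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def build_metric_data(status: str, success: bool, environment: str) -> list[dict[str, Any]]:
    """Build MetricData entries for one status update attempt (sketch only)."""
    metrics: list[dict[str, Any]] = [
        {
            "MetricName": "StatusUpdates",
            "Dimensions": [
                {"Name": "Status", "Value": status},
                {"Name": "Success", "Value": str(success).lower()},
                {"Name": "Environment", "Value": environment},
            ],
            "Value": 1,
            "Unit": "Count",
        }
    ]
    # Assumption: a job only counts as terminated if the update was applied.
    if success and status in TERMINAL_STATUSES:
        metrics.append(
            {
                "MetricName": "JobsTerminated",
                "Dimensions": [
                    {"Name": "Status", "Value": status},
                    {"Name": "Environment", "Value": environment},
                ],
                "Value": 1,
                "Unit": "Count",
            }
        )
    return metrics
```

The returned list would be passed as `MetricData` together with a `Namespace` ending in the WorkflowStatus suffix; the namespace prefix is not given in this document.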
Step Functions Integration
This lambda is called at multiple points in the backtest workflow (v11):
- UpdateStatusRunning -- Before ECS task launch
- UpdateStatusCompleted -- After successful backtest (inside `HandleSuccess` parallel state)
- UpdateStatusFailed -- On any failure, after cleanup
- UpdateStatusValidationFailed -- On strategy validation failure
See Also
Related Lambdas:
- Backtest Consumer - Triggers backtest workflows
- Cleanup Resources - Cleans up on failure
- Notify Completion - Sends completion notifications
Architecture:
- Step Functions - Workflow definitions
- Services - Service architecture
SDK:
- tradai-common - Lambda handler utilities, entities