
update-status

Updates a job's status in the DynamoDB workflow-state table. Step Functions calls it at key workflow transitions (RUNNING, COMPLETED, FAILED).

Overview

Property | Value
--- | ---
Trigger | Step Functions task callback
Runtime | Python 3.11
Timeout | 30 seconds
Memory | 256 MB
Settings class | UpdateStatusSettings

Input Schema

{
    "job_id": "run-abc-123",                    # Required: unique job identifier (DynamoDB key: run_id)
    "status": "RUNNING",                         # Required: PENDING | RUNNING | COMPLETED | FAILED | CANCELLED
    "trace_id": "trace-xyz-789",                # Optional: E2E correlation ID (P1.4)
    "execution_arn": "arn:aws:states:...",       # Optional: Step Functions execution ARN
    "started_at": "2024-02-07T12:00:00Z",       # Optional: ISO timestamp
    "completed_at": "2024-02-07T12:30:00Z",     # Optional: ISO timestamp
    "ecs_task_arn": "arn:aws:ecs:...",           # Optional: ECS task ARN
    "error": {                                   # Optional: error details (for FAILED)
        "Error": "States.TaskFailed",
        "Cause": "Container exited with code 1"
    }
}
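A minimal sketch of how a handler might validate this input. It assumes a hypothetical JobStatus enum whose values are the lowercase strings stored in DynamoDB; the names JobStatus, UpdateStatusInput and parse_event are illustrative and not taken from the source.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class JobStatus(str, Enum):
    """Hypothetical status enum; values are the lowercase strings written to DynamoDB."""
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class UpdateStatusInput:
    """Hypothetical typed view of the event; optional fields default to None."""
    job_id: str
    status: JobStatus
    trace_id: Optional[str] = None
    execution_arn: Optional[str] = None
    started_at: Optional[str] = None
    completed_at: Optional[str] = None
    ecs_task_arn: Optional[str] = None
    error: Any = None  # dict in Step Functions format, or a plain string


def parse_event(event: dict) -> UpdateStatusInput:
    """Check the required fields and map the uppercase status name onto the enum."""
    job_id = event.get("job_id")
    if not job_id:
        raise ValueError("job_id is required")
    raw_status = event.get("status")
    if not isinstance(raw_status, str):
        raise ValueError("status is required")
    try:
        # Look up by member name: "RUNNING" -> JobStatus.RUNNING
        status = JobStatus[raw_status.upper()]
    except KeyError:
        raise ValueError(f"unknown status: {raw_status}") from None
    return UpdateStatusInput(
        job_id=job_id,
        status=status,
        trace_id=event.get("trace_id"),
        execution_arn=event.get("execution_arn"),
        started_at=event.get("started_at"),
        completed_at=event.get("completed_at"),
        ecs_task_arn=event.get("ecs_task_arn"),
        error=event.get("error"),
    )
```

Rejecting a missing job_id or an unknown status up front raises a plain error before any DynamoDB call is made.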

trace_id Field

The trace_id field is persisted to DynamoDB for end-to-end correlation across workflow steps. When provided, it is written as a top-level attribute on the DynamoDB item, so jobs can be looked up by trace ID when debugging across services.

Output Schema

Responses use a Step Functions-compatible format produced by LambdaResponse.to_step_functions():

Success

{
    "statusCode": 200,
    "body": {
        "success": true,
        "data": {
            "job_id": "run-abc-123",
            "status": "running",
            "updated": true
        },
        "environment": "dev"
    }
}

Stale Update (Rejected Transition)

{
    "statusCode": 200,
    "body": {
        "success": true,
        "data": {
            "job_id": "run-abc-123",
            "status": "running",
            "updated": false,
            "reason": "stale_or_invalid_transition"
        },
        "environment": "dev"
    }
}

When DynamoDB raises a ConditionalCheckFailedException (an invalid transition or a stale update), the handler returns statusCode 200 with updated: false instead of failing. Because the condition would fail again on every retry, this prevents Step Functions from retrying an update that can never succeed.
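The two response shapes and the stale-update path can be sketched as follows. build_response is a hypothetical stand-in for LambdaResponse.to_step_functions(), whose real signature is not shown here. apply_update takes the DynamoDB write as a callable so the sketch runs without boto3. The error check relies on botocore's ClientError exposing the service error code at response["Error"]["Code"].

```python
from typing import Any, Callable, Dict, Optional


def build_response(job_id: str, status: str, updated: bool,
                   environment: str = "dev", reason: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical builder for the Step Functions-compatible envelope shown above."""
    data: Dict[str, Any] = {"job_id": job_id, "status": status, "updated": updated}
    if reason is not None:
        data["reason"] = reason
    return {"statusCode": 200,
            "body": {"success": True, "data": data, "environment": environment}}


def is_conditional_check_failure(exc: Exception) -> bool:
    """True when the exception carries DynamoDB's ConditionalCheckFailedException code."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "ConditionalCheckFailedException"


def apply_update(update_item: Callable[[], None], job_id: str, status: str,
                 environment: str = "dev") -> Dict[str, Any]:
    """Run the conditional write; map a failed condition to updated: false."""
    try:
        update_item()
    except Exception as exc:
        if is_conditional_check_failure(exc):
            # Stale or invalid transition: report it, but do not fail the state
            return build_response(job_id, status, False, environment,
                                  reason="stale_or_invalid_transition")
        raise  # other errors (throttling, permissions) still surface to Step Functions
    return build_response(job_id, status, True, environment)
```

Only the conditional-check failure is swallowed in this sketch; any other DynamoDB error is re-raised so that the state machine's retry and catch rules still apply.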

Environment Variables

Variable | Required | Default | Description
--- | --- | --- | ---
WORKFLOW_STATE_TABLE | Yes | - | DynamoDB table for workflow state
ENVIRONMENT | No | dev | Environment name
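A minimal stand-in for loading these variables. SettingsSketch is hypothetical: the real UpdateStatusSettings class is not shown on this page and may validate differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical settings holder mirroring the table above."""
    workflow_state_table: str
    environment: str = "dev"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "SettingsSketch":
        # Read from os.environ unless a mapping is passed in (handy for tests)
        env = os.environ if env is None else env
        table = env.get("WORKFLOW_STATE_TABLE")
        if not table:
            raise RuntimeError("WORKFLOW_STATE_TABLE must be set")
        return cls(workflow_state_table=table,
                   environment=env.get("ENVIRONMENT", "dev"))
```

Failing at load time on a missing table name surfaces a misconfigured deployment on the first invocation rather than as a confusing DynamoDB error later.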

State Transition Guards

The Lambda enforces valid state transitions using DynamoDB conditional expressions. Every transition condition also includes attribute_exists(run_id), so the condition is not met when the job item does not already exist.

Target Status | Allowed From
--- | ---
PENDING | PENDING
RUNNING | PENDING, RUNNING
COMPLETED | RUNNING, COMPLETED
FAILED | PENDING, RUNNING, FAILED
CANCELLED | PENDING, RUNNING, CANCELLED
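The table can also be mirrored as plain data, for example to unit-test handler logic without DynamoDB. This is an illustrative sketch, and ALLOWED_FROM and is_allowed are hypothetical names. The authoritative check remains the conditional expression that DynamoDB evaluates.

```python
from typing import Dict, FrozenSet, Optional

# Target status -> statuses it may be entered from (same content as the table)
ALLOWED_FROM: Dict[str, FrozenSet[str]] = {
    "PENDING": frozenset({"PENDING"}),
    "RUNNING": frozenset({"PENDING", "RUNNING"}),
    "COMPLETED": frozenset({"RUNNING", "COMPLETED"}),
    "FAILED": frozenset({"PENDING", "RUNNING", "FAILED"}),
    "CANCELLED": frozenset({"PENDING", "RUNNING", "CANCELLED"}),
}


def is_allowed(current: Optional[str], target: str) -> bool:
    """Return True if a job in `current` may move to `target`.

    `current` is None when the item does not exist, which the
    attribute_exists(run_id) guard also rejects. Case is normalized because
    DynamoDB stores the lowercase enum value.
    """
    if current is None:
        return False
    return current.upper() in ALLOWED_FROM[target.upper()]
```

Each terminal state accepts itself as a source, which makes repeated deliveries of the same update idempotent instead of rejected.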

Transition Condition Implementation

The _build_transition_condition() function builds each DynamoDB ConditionExpression string together with only the ExpressionAttributeValues it actually references, because DynamoDB rejects requests containing unused values with a ValidationException.
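A sketch of the idea behind _build_transition_condition(). It assumes the condition uses an IN list over the allowed source statuses and a #status alias, because status is a DynamoDB reserved word. The real function's expression shape and placeholder names are not documented here, so build_transition_condition and the :from_N placeholders are illustrative.

```python
from typing import Dict, Tuple

# Lowercase values, matching what is stored in the status attribute
ALLOWED_FROM: Dict[str, Tuple[str, ...]] = {
    "pending": ("pending",),
    "running": ("pending", "running"),
    "completed": ("running", "completed"),
    "failed": ("pending", "running", "failed"),
    "cancelled": ("pending", "running", "cancelled"),
}


def build_transition_condition(target: str) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (ConditionExpression, ExpressionAttributeNames, ExpressionAttributeValues).

    Only the source statuses for this target get a placeholder, so every
    entry in the returned values dict is referenced by the expression.
    """
    sources = ALLOWED_FROM[target.lower()]
    values = {f":from_{i}": source for i, source in enumerate(sources)}
    placeholders = ", ".join(values)  # joins the dict keys in insertion order
    # "status" is a DynamoDB reserved word, so it is aliased as #status
    expression = f"attribute_exists(run_id) AND #status IN ({placeholders})"
    return expression, {"#status": "status"}, values
```

The three return values map onto the ConditionExpression, ExpressionAttributeNames and ExpressionAttributeValues parameters of update_item. Declaring one placeholder per possible status for every call would leave some unused, which is exactly the request DynamoDB rejects.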

Execution ARN Guard

When execution_arn is provided, an additional condition is appended:

AND (attribute_not_exists(execution_arn) OR execution_arn = :execution_arn)

This prevents a different Step Functions execution from overwriting another execution's state.
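Appending the guard can be sketched as a small helper. It leaves the condition untouched when no ARN is supplied, so no unused :execution_arn value is ever sent. add_execution_arn_guard is a hypothetical name.

```python
from typing import Dict, Optional, Tuple


def add_execution_arn_guard(expression: str, values: Dict[str, str],
                            execution_arn: Optional[str]) -> Tuple[str, Dict[str, str]]:
    """Append the execution ARN condition when an ARN is provided."""
    if not execution_arn:
        # No ARN: adding the clause would reference a value we have not bound
        return expression, values
    guarded = (f"{expression} AND (attribute_not_exists(execution_arn) "
               "OR execution_arn = :execution_arn)")
    # Return a new dict rather than mutating the caller's values
    return guarded, {**values, ":execution_arn": execution_arn}
```

The first execution to write an ARN claims the item; later updates succeed only if they carry that same ARN.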

DynamoDB Update Details

The update writes these fields:

Field | When Written | Description
--- | --- | ---
status | Always | Job status value (lowercase enum value)
updated_at | Always | ISO timestamp of the update
execution_arn | If provided | Step Functions execution ARN
started_at | If provided | Job start timestamp
completed_at | If provided | Job completion timestamp
ecs_task_arn | If provided | ECS task ARN
trace_id | If provided | E2E correlation ID
error_message | On FAILED status | Error details (truncated to 2000 chars)
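A sketch of assembling the SET expression for the fields in this table. It assumes the boto3 Table resource style, where values are plain Python objects, and uses a #status alias because status is a DynamoDB reserved word. build_update and its parameters are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple

# Fields written only when present in the input
OPTIONAL_FIELDS = ("execution_arn", "started_at", "completed_at", "ecs_task_arn", "trace_id")


def build_update(status_value: str, fields: Dict[str, Optional[str]],
                 error_message: Optional[str] = None,
                 now: Optional[datetime] = None) -> Tuple[str, Dict[str, str], Dict[str, Any]]:
    """Return (UpdateExpression, ExpressionAttributeNames, ExpressionAttributeValues)."""
    now = now or datetime.now(timezone.utc)
    # status and updated_at are always written
    assignments = ["#status = :status", "updated_at = :updated_at"]
    values: Dict[str, Any] = {":status": status_value, ":updated_at": now.isoformat()}
    for name in OPTIONAL_FIELDS:
        value = fields.get(name)
        if value:  # bind a placeholder only for fields that were provided
            assignments.append(f"{name} = :{name}")
            values[f":{name}"] = value
    if status_value == "failed" and error_message is not None:
        assignments.append("error_message = :error_message")
        values[":error_message"] = error_message
    return "SET " + ", ".join(assignments), {"#status": "status"}, values
```

Because placeholders are added only for provided fields, this expression follows the same rule as the condition: no value is bound unless it is referenced.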

Error Message Handling

For FAILED status, the error field is processed as follows:

  • If error is a dict (Step Functions format), the Error and Cause fields are extracted.
  • If error is a string, it is used directly.
  • The result is truncated to 2000 characters to stay well within DynamoDB's item size limit.
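A sketch of this processing. The rules above do not specify how Error and Cause are combined, so the "Error: Cause" format and the name format_error_message are assumptions.

```python
from typing import Any, Optional

MAX_ERROR_LENGTH = 2000


def format_error_message(error: Any) -> Optional[str]:
    """Flatten a Step Functions error object or string into a bounded message."""
    if error is None:
        return None
    if isinstance(error, dict):
        # Step Functions error objects carry "Error" (the name) and "Cause" (details)
        name = error.get("Error", "")
        cause = error.get("Cause", "")
        # Assumed join format; fall back to whichever part is present
        message = f"{name}: {cause}" if name and cause else str(name or cause)
    else:
        message = str(error)
    return message[:MAX_ERROR_LENGTH]
```

Given the error object from the input schema example, this sketch yields "States.TaskFailed: Container exited with code 1".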

Key Features

  • Conditional transitions: Validates status transitions to prevent stale/invalid updates
  • Stale update handling: Returns updated: false (HTTP 200) instead of failing, allowing Step Functions to continue
  • trace_id for E2E correlation: Persists trace_id for cross-service debugging
  • Execution ARN isolation: Prevents a different Step Functions execution from overwriting a job already bound to another execution's ARN
  • Error truncation: Limits error messages to 2000 chars for DynamoDB item size
  • Terminal state tracking: Publishes a separate JobsTerminated metric when a job reaches COMPLETED or FAILED

CloudWatch Metrics

Namespace suffix: WorkflowStatus

Metric | Dimensions | Description
--- | --- | ---
StatusUpdates | Status, Success, Environment | Count of status update attempts
JobsTerminated | Status, Environment | Count of jobs reaching a terminal state (COMPLETED or FAILED)
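A sketch of the metric payloads, built in the MetricData shape that CloudWatch's PutMetricData accepts. Three details are assumptions: the "true"/"false" encoding of the Success dimension, publishing JobsTerminated only when the update was applied, and the function names.

```python
from typing import Any, Dict, List

TERMINAL_STATUSES = frozenset({"COMPLETED", "FAILED"})


def _metric(name: str, dimensions: Dict[str, str]) -> Dict[str, Any]:
    """One MetricData entry: a count of 1 with the given dimensions."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Value": 1,
        "Unit": "Count",
    }


def build_metric_data(status: str, success: bool, environment: str) -> List[Dict[str, Any]]:
    """Hypothetical builder for the StatusUpdates and JobsTerminated metrics."""
    status = status.upper()
    data = [_metric("StatusUpdates", {
        "Status": status,
        "Success": str(success).lower(),  # assumed "true"/"false" encoding
        "Environment": environment,
    })]
    # Assumption: count a termination only when the status update was applied
    if success and status in TERMINAL_STATUSES:
        data.append(_metric("JobsTerminated", {"Status": status, "Environment": environment}))
    return data
```

The resulting list could be passed as MetricData to the boto3 CloudWatch client's put_metric_data call. The full namespace, meaning the prefix in front of the WorkflowStatus suffix, is not given on this page.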

Step Functions Integration

This Lambda is invoked at four points in the backtest workflow (v11):

  1. UpdateStatusRunning: before ECS task launch
  2. UpdateStatusCompleted: after a successful backtest (inside the HandleSuccess parallel state)
  3. UpdateStatusFailed: on any failure, after cleanup
  4. UpdateStatusValidationFailed: on strategy validation failure
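Each of these states could pass its payload to the Lambda through a Task state like the one below, written as a Python dict for consistency with the other examples. The function name, the $.job_id input path, the ResultPath and the next-state name are all hypothetical. $$.Execution.Id is the Step Functions context-object field that holds the current execution's ARN.

```python
import json


def update_status_state(status: str, function_name: str, next_state: str) -> dict:
    """Build an ASL Task state that invokes update-status with a fixed target status."""
    return {
        "Type": "Task",
        # Optimized Lambda integration; the function's return value arrives under "Payload"
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {
            "FunctionName": function_name,
            "Payload": {
                "job_id.$": "$.job_id",                # assumed location of the job ID
                "status": status,
                "execution_arn.$": "$$.Execution.Id",  # context object: this execution's ARN
            },
        },
        "ResultPath": "$.status_update",  # keep the original state input alongside the result
        "Next": next_state,
    }


if __name__ == "__main__":
    state = update_status_state("RUNNING", "update-status-dev", "LaunchBacktestTask")
    print(json.dumps({"UpdateStatusRunning": state}, indent=2))
```

Passing execution_arn from the context object is what lets the execution ARN guard tie each job to the execution that first claimed it.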
