cleanup-resources

Cleans up orphaned ECS tasks when Step Functions workflows fail.

Overview

Property | Value
Trigger  | Step Functions Catch block
Runtime  | Python 3.11
Timeout  | 120 seconds
Memory   | 256 MB

Input Schema

{
    "job_id": "uuid-string",
    "execution_arn": "arn:aws:states:...",
    "strategy_name": "MyStrategy",
    "error": {
        "Error": "States.TaskFailed",
        "Cause": "Task timeout"
    },
    "cluster": "tradai-cluster"
}

The cluster field is optional. When it is omitted, the function uses the ECS_CLUSTER environment variable.
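One way the handler might turn this input into a typed request is sketched below. The CleanupRequest class and parse_event function are hypothetical names, not taken from the function's source, and treating job_id, execution_arn, and strategy_name as required is an assumption. Only the fallback from the cluster field to ECS_CLUSTER comes from the schema above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanupRequest:
    """Hypothetical container for the fields in the input schema."""
    job_id: str
    execution_arn: str
    strategy_name: str
    error: dict
    cluster: str


def parse_event(event: dict, env=None) -> CleanupRequest:
    """Build a CleanupRequest, falling back to ECS_CLUSTER for the cluster."""
    env = os.environ if env is None else env
    # The event value wins; the environment variable is the fallback.
    cluster = event.get("cluster") or env.get("ECS_CLUSTER")
    if not cluster:
        raise ValueError("no cluster in event and ECS_CLUSTER is not set")
    return CleanupRequest(
        job_id=event["job_id"],  # assumed required
        execution_arn=event["execution_arn"],  # assumed required
        strategy_name=event["strategy_name"],  # assumed required
        error=event.get("error", {}),
        cluster=cluster,
    )
```

Passing env explicitly keeps the function easy to exercise without touching the real process environment.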

Output Schema

{
    "statusCode": 200,
    "job_id": "uuid-string",
    "execution_arn": "arn:aws:states:...",
    "tasks_stopped": 2,
    "stopped_task_arns": [
        "arn:aws:ecs:...:task/abc123",
        "arn:aws:ecs:...:task/def456"
    ]
}
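A response of this shape could be assembled as in the sketch below. build_response is a hypothetical helper, and deriving tasks_stopped from the length of the ARN list is an assumption that happens to match the example above.

```python
def build_response(job_id: str, execution_arn: str, stopped_task_arns: list) -> dict:
    """Assemble a payload in the shape of the output schema."""
    return {
        "statusCode": 200,
        "job_id": job_id,
        "execution_arn": execution_arn,
        # Assumption: the count always equals the number of ARNs returned.
        "tasks_stopped": len(stopped_task_arns),
        "stopped_task_arns": list(stopped_task_arns),
    }
```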

Environment Variables

Variable    | Required | Default | Description
ECS_CLUSTER | Yes      | -       | ECS cluster name or ARN

AWS Services Used

  • ECS - Lists running tasks and stops those whose tags match the failed job
  • CloudWatch - Publishes cleanup metrics

CloudWatch Metrics

Metric             | Description
CleanupInvocations | Number of cleanup invocations
TasksStopped       | Number of ECS tasks stopped
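As a rough illustration, the two metrics could be published with the CloudWatch put_metric_data call as sketched below. The namespace "Example/Cleanup" and both function names are hypothetical, since the page does not state the real namespace, and using the Count unit is likewise an assumption. The cloudwatch argument is expected to be a boto3 CloudWatch client, for example boto3.client("cloudwatch").

```python
def build_metric_data(tasks_stopped: int) -> list:
    """Return MetricData entries for one cleanup invocation."""
    return [
        {"MetricName": "CleanupInvocations", "Value": 1, "Unit": "Count"},
        {"MetricName": "TasksStopped", "Value": tasks_stopped, "Unit": "Count"},
    ]


def publish_cleanup_metrics(cloudwatch, tasks_stopped: int,
                            namespace: str = "Example/Cleanup") -> None:
    """Publish both metrics in a single put_metric_data request.

    The default namespace is a placeholder, not the function's real one.
    """
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=build_metric_data(tasks_stopped),
    )
```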

Key Features

  • Searches for tasks by job_id and strategy_name tags
  • Uses pagination to handle large task lists (100 tasks per batch)
  • Publishes metrics for cleanup success tracking
  • Handles both tagged and untagged task identification
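The tag search and batching described above might look roughly like the following sketch. It assumes a boto3 ECS client is passed in as ecs, that a task matches only when both the job_id and strategy_name tags agree, and that only tasks with desired status RUNNING are considered. The function names are hypothetical, and identification of untagged tasks is left out because the page does not describe how it works.

```python
def chunked(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_matching_tasks(ecs, cluster: str, job_id: str, strategy_name: str) -> list:
    """Return ARNs of running tasks tagged with this job and strategy."""
    task_arns = []
    # The list_tasks paginator follows nextToken across pages for us.
    paginator = ecs.get_paginator("list_tasks")
    for page in paginator.paginate(cluster=cluster, desiredStatus="RUNNING"):
        task_arns.extend(page.get("taskArns", []))

    matches = []
    # describe_tasks is called in batches of 100 ARNs, with tags included.
    for batch in chunked(task_arns, 100):
        described = ecs.describe_tasks(cluster=cluster, tasks=batch, include=["TAGS"])
        for task in described.get("tasks", []):
            tags = {t["key"]: t["value"] for t in task.get("tags", [])}
            # Assumption: both tags must match, not just one of them.
            if tags.get("job_id") == job_id and tags.get("strategy_name") == strategy_name:
                matches.append(task["taskArn"])
    return matches


def stop_tasks(ecs, cluster: str, task_arns: list, reason: str) -> list:
    """Stop each task and return the ARNs that were stopped."""
    stopped = []
    for arn in task_arns:
        ecs.stop_task(cluster=cluster, task=arn, reason=reason)
        stopped.append(arn)
    return stopped
```

Taking the client as a parameter rather than creating it inside the functions makes the logic testable with a stub in place of real AWS calls.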

Step Functions Integration

{
  "RunBacktest": {
    "Type": "Task",
    "Resource": "arn:aws:ecs:...",
    "Catch": [
      {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CleanupResources"
      }
    ],
    "Next": "Success"
  },
  "CleanupResources": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:cleanup-resources",
    "Parameters": {
      "job_id.$": "$.job_id",
      "execution_arn.$": "$$.Execution.Id",
      "strategy_name.$": "$.strategy_name",
      "error.$": "$.error"
    },
    "Next": "FailWorkflow"
  }
}
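Because the Catch block writes the failure to $.error and the Parameters block forwards it, the Lambda receives the Error and Cause fields under the error key of its event. A minimal sketch of turning that object into a short, human-readable reason string follows. describe_failure is a hypothetical helper and the 255-character default is an arbitrary illustrative limit, not a documented one.

```python
def describe_failure(error: dict, max_length: int = 255) -> str:
    """Format the Step Functions error object as 'Error: Cause'.

    max_length is an illustrative cap; adjust it to whatever limit applies
    where the string is used.
    """
    name = error.get("Error", "UnknownError")
    cause = error.get("Cause", "")
    text = f"{name}: {cause}" if cause else name
    return text[:max_length]
```

A string like this could be passed as the reason when stopping tasks, so the stop event records why the workflow failed.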