TradAI Final Architecture - Step Functions Workflows

Version: 9.2 | Date: 2025-12-09


When to Use Step Functions

Step Functions is one of four execution modes for backtesting (see 05-SERVICES.md Section 6).

| Mode | Use Case | Use Step Functions? |
|------|----------|---------------------|
| local | Development/testing | NO - Docker only |
| ecs | Simple production backtests | NO - Direct ECS launch |
| sqs | High-volume with backpressure | NO - SQS → Lambda → ECS |
| stepfunctions | Complex multi-step workflows | YES |

Use Step Functions when you need:
- Data freshness validation before backtest
- Strategy validation (ECR/S3 checks)
- Multi-step workflows (data sync → backtest → analyze)
- Visual debugging and execution history
- Complex retry/catch logic

Skip Step Functions when:
- Simple backtest execution (use ECS direct or SQS mode)
- Development/testing (use local mode)
- Container handles its own status updates (v9.2 architecture)
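
The guidance above can be read as a small decision procedure. The sketch below is one hypothetical way to encode it. The `BacktestRequest` fields and the order of the checks are assumptions made for illustration and are not taken from the TradAI codebase.

```python
from dataclasses import dataclass


@dataclass
class BacktestRequest:
    """Hypothetical request flags; the field names are illustrative only."""
    is_local_dev: bool = False
    needs_validation: bool = False  # data freshness or strategy (ECR/S3) checks
    is_multi_step: bool = False     # e.g. data sync -> backtest -> analyze
    high_volume: bool = False       # many queued backtests needing backpressure


def choose_execution_mode(req: BacktestRequest) -> str:
    """Return one of the four documented modes: local, ecs, sqs, stepfunctions."""
    if req.is_local_dev:
        return "local"
    if req.needs_validation or req.is_multi_step:
        return "stepfunctions"
    if req.high_volume:
        return "sqs"
    return "ecs"


print(choose_execution_mode(BacktestRequest(needs_validation=True)))  # stepfunctions
print(choose_execution_mode(BacktestRequest(high_volume=True)))       # sqs
```

Checking local development first keeps developer runs out of AWS entirely, which matches the "Docker only" note in the mode table.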


Workflow Overview

| Workflow | Type | States | Duration | Trigger |
|----------|------|--------|----------|---------|
| Backtest Workflow (Full) | STANDARD | 9 | 10-70 min | SQS → Lambda |
| Backtest Workflow (Simplified) | STANDARD | 5 | 10-60 min | SQS → Lambda |
| Data Sync Workflow | STANDARD | 5 | 10-30 min | API Gateway |
| Deploy Workflow | STANDARD | 6 | 5-15 min | API Gateway |

Critical: All workflows use STANDARD type (not EXPRESS) because:
- Backtests can run 30-60+ minutes
- EXPRESS has 5-minute maximum duration
- STANDARD provides execution history for 90 days


1. Backtest Workflow

Workflow Diagram

                                    ┌──────────────────┐
                                    │      START       │
                                    └────────┬─────────┘
                              ┌──────────────┴──────────────┐
                              │    Parallel Validation      │
                              │                             │
                    ┌─────────┴─────────┐     ┌────────────┴────────────┐
                    │  ValidateStrategy │     │   CheckDataFreshness    │
                    │     (Lambda)      │     │       (Lambda)          │
                    └─────────┬─────────┘     └────────────┬────────────┘
                              │                            │
                              └──────────────┬─────────────┘
                                    ┌────────┴────────┐
                                    │ EvaluateResults │
                                    │    (Choice)     │
                                    └────────┬────────┘
                           ┌─────────────────┼─────────────────┐
                           │ Valid           │ Invalid         │ Data Stale
                           ▼                 ▼                 ▼
                    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
                    │ PrepareConfig│  │FailValidation│  │ FailDataStale│
                    │  (ECS Task)  │  │    (Fail)    │  │    (Fail)    │
                    └──────┬───────┘  └──────────────┘  └──────────────┘
                           │
                           └───────────────┐
                                  ┌────────┴────────┐
                                  │   RunBacktest   │
                                  │   (ECS Task)    │
                                  │  Timeout: 2hr   │
                                  │  Heartbeat: 5m  │
                                  └────────┬────────┘
                                  ┌────────┴────────┐
                                  │ TransformResults│
                                  │    (Lambda)     │
                                  └────────┬────────┘
                                  ┌────────┴────────┐
                                  │  UpdateState    │
                                  │  (DynamoDB)     │
                                  └────────┬────────┘
                                  ┌────────┴────────┐
                                  │ CleanupResources│
                                  │    (Lambda)     │
                                  └────────┬────────┘
                                  ┌────────┴────────┐
                                  │ NotifyCompletion│
                                  │    (Lambda)     │
                                  └────────┬────────┘
                                    ┌──────┴──────┐
                                    │     END     │
                                    └─────────────┘

State Machine Definition

{
  "Comment": "TradAI Backtest Workflow v8.0",
  "StartAt": "ParallelValidation",
  "TimeoutSeconds": 7200,
  "States": {

    "ParallelValidation": {
      "Type": "Parallel",
      "Comment": "Run validation and data check in parallel (saves 1-2 min)",
      "Branches": [
        {
          "StartAt": "ValidateStrategy",
          "States": {
            "ValidateStrategy": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-validate-strategy",
              "Parameters": {
                "strategy_name.$": "$.strategy_name",
                "strategy_version.$": "$.strategy_version"
              },
              "ResultPath": "$.validation",
              "Retry": [
                {
                  "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
                  "IntervalSeconds": 2,
                  "MaxAttempts": 3,
                  "BackoffRate": 2.0
                }
              ],
              "End": true
            }
          }
        },
        {
          "StartAt": "CheckDataFreshness",
          "States": {
            "CheckDataFreshness": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-data-collection-proxy",
              "Parameters": {
                "operation": "check-freshness",
                "symbols.$": "$.symbols",
                "timeframe.$": "$.timeframe"
              },
              "ResultPath": "$.freshness",
              "Retry": [
                {
                  "ErrorEquals": ["Lambda.ServiceException"],
                  "IntervalSeconds": 1,
                  "MaxAttempts": 2
                }
              ],
              "End": true
            }
          }
        }
      ],
      "ResultPath": "$.parallel_results",
      "ResultSelector": {
        "validation.$": "$[0].validation",
        "freshness.$": "$[1].freshness"
      },
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "UpdateStateFailed"
        }
      ],
      "Next": "EvaluateValidation"
    },

    "EvaluateValidation": {
      "Type": "Choice",
      "Choices": [
        {
          "And": [
            {
              "Variable": "$.parallel_results.validation.valid",
              "BooleanEquals": true
            },
            {
              "Variable": "$.parallel_results.freshness.all_fresh",
              "BooleanEquals": true
            }
          ],
          "Next": "PrepareConfig"
        },
        {
          "Variable": "$.parallel_results.validation.valid",
          "BooleanEquals": false,
          "Next": "FailValidation"
        },
        {
          "Variable": "$.parallel_results.freshness.all_fresh",
          "BooleanEquals": false,
          "Next": "FailDataStale"
        }
      ],
      "Default": "FailValidation"
    },

    "FailValidation": {
      "Type": "Fail",
      "Error": "ValidationError",
      "Cause": "Strategy validation failed. Check ECR image and S3 config exist."
    },

    "FailDataStale": {
      "Type": "Fail",
      "Error": "DataStaleError",
      "Cause": "Data is stale. Please trigger data-sync workflow first."
    },

    "PrepareConfig": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Comment": "Launch Strategy Service Task to prepare config",
      "Parameters": {
        "Cluster": "tradai-cluster",
        "TaskDefinition": "tradai-strategy-service",
        "LaunchType": "FARGATE",
        "NetworkConfiguration": {
          "AwsvpcConfiguration": {
            "Subnets": ["${PrivateSubnet1}", "${PrivateSubnet2}"],
            "SecurityGroups": ["${ECSSecurityGroup}"],
            "AssignPublicIp": "DISABLED"
          }
        },
        "Overrides": {
          "ContainerOverrides": [
            {
              "Name": "strategy-service",
              "Environment": [
                {"Name": "COMMAND", "Value": "prepare-config"},
                {"Name": "INPUT_METHOD", "Value": "inline"},
                {"Name": "INPUT_JSON", "Value.$": "States.JsonToString($)"}
              ]
            }
          ]
        }
      },
      "ResultPath": "$.prepare_result",
      "ResultSelector": {
        "ecr_url.$": "$.Overrides.ContainerOverrides[0].Environment[?(@.Name=='OUTPUT_ECR_URL')].Value",
        "config_path.$": "$.Overrides.ContainerOverrides[0].Environment[?(@.Name=='OUTPUT_CONFIG_PATH')].Value"
      },
      "Retry": [
        {
          "ErrorEquals": ["ECS.AmazonECSException"],
          "IntervalSeconds": 5,
          "MaxAttempts": 2,
          "BackoffRate": 2.0
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "CleanupAfterFailure"
        }
      ],
      "Next": "RunBacktest"
    },

    "RunBacktest": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Comment": "Run backtest in strategy container. LaunchType is omitted because ECS rejects it when CapacityProviderStrategy is set.",
      "TimeoutSeconds": 7200,
      "HeartbeatSeconds": 300,
      "Parameters": {
        "Cluster": "tradai-cluster",
        "TaskDefinition": "tradai-strategy-container",
        "CapacityProviderStrategy": [
          {
            "CapacityProvider": "FARGATE_SPOT",
            "Weight": 1
          }
        ],
        "NetworkConfiguration": {
          "AwsvpcConfiguration": {
            "Subnets": ["${PrivateSubnet1}", "${PrivateSubnet2}"],
            "SecurityGroups": ["${ECSSecurityGroup}"],
            "AssignPublicIp": "DISABLED"
          }
        },
        "Overrides": {
          "ContainerOverrides": [
            {
              "Name": "strategy-container",
              "Image.$": "$.prepare_result.ecr_url",
              "Command": ["backtest", "--config"],
              "Environment": [
                {"Name": "TRADAI_RUN_ID", "Value.$": "$.run_id"},
                {"Name": "TRADAI_EXPERIMENT", "Value.$": "$.experiment_name"},
                {"Name": "CONFIG_S3_PATH", "Value.$": "$.prepare_result.config_path"}
              ]
            }
          ],
          "Cpu": "1024",
          "Memory": "2048"
        },
        "Tags": [
          {"Key": "RunId", "Value.$": "$.run_id"},
          {"Key": "Strategy", "Value.$": "$.strategy_name"},
          {"Key": "Workflow", "Value": "backtest"}
        ]
      },
      "ResultPath": "$.backtest_result",
      "Retry": [
        {
          "ErrorEquals": ["ECS.AmazonECSException"],
          "IntervalSeconds": 10,
          "MaxAttempts": 1
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "CleanupAfterFailure"
        }
      ],
      "Next": "TransformResults"
    },

    "TransformResults": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-transform-results",
      "Parameters": {
        "run_id.$": "$.run_id",
        "strategy_name.$": "$.strategy_name",
        "backtest_output.$": "$.backtest_result"
      },
      "ResultPath": "$.transformed_result",
      "Retry": [
        {
          "ErrorEquals": ["Lambda.ServiceException"],
          "IntervalSeconds": 1,
          "MaxAttempts": 2
        }
      ],
      "Next": "UpdateStateCompleted"
    },

    "UpdateStateCompleted": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:updateItem",
      "Parameters": {
        "TableName": "tradai-workflow-state",
        "Key": {
          "run_id": {"S.$": "$.run_id"}
        },
        "UpdateExpression": "SET #status = :status, completed_at = :completed, #result = :result",
        "ExpressionAttributeNames": {
          "#status": "status",
          "#result": "result"
        },
        "ExpressionAttributeValues": {
          ":status": {"S": "COMPLETED"},
          ":completed": {"S.$": "$$.State.EnteredTime"},
          ":result": {"S.$": "States.JsonToString($.transformed_result)"}
        }
      },
      "ResultPath": null,
      "Next": "CleanupResources"
    },

    "UpdateStateFailed": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:updateItem",
      "Parameters": {
        "TableName": "tradai-workflow-state",
        "Key": {
          "run_id": {"S.$": "$.run_id"}
        },
        "UpdateExpression": "SET #status = :status, completed_at = :completed, error_info = :error",
        "ExpressionAttributeNames": {
          "#status": "status"
        },
        "ExpressionAttributeValues": {
          ":status": {"S": "FAILED"},
          ":completed": {"S.$": "$$.State.EnteredTime"},
          ":error": {"S.$": "States.JsonToString($.error)"}
        }
      },
      "ResultPath": null,
      "Next": "NotifyFailure"
    },

    "CleanupResources": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-cleanup-resources",
      "Parameters": {
        "run_id.$": "$.run_id",
        "config_path.$": "$.prepare_result.config_path"
      },
      "ResultPath": null,
      "Next": "NotifyCompletion"
    },

    "CleanupAfterFailure": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-cleanup-resources",
      "Parameters": {
        "run_id.$": "$.run_id",
        "config_path.$": "$.prepare_result.config_path"
      },
      "ResultPath": null,
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": null,
          "Next": "UpdateStateFailed"
        }
      ],
      "Next": "UpdateStateFailed"
    },

    "NotifyCompletion": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-notify-completion",
      "Parameters": {
        "run_id.$": "$.run_id",
        "status": "COMPLETED",
        "strategy_name.$": "$.strategy_name",
        "result.$": "$.transformed_result"
      },
      "End": true
    },

    "NotifyFailure": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-notify-completion",
      "Parameters": {
        "run_id.$": "$.run_id",
        "status": "FAILED",
        "strategy_name.$": "$.strategy_name",
        "error.$": "$.error"
      },
      "End": true
    }
  }
}
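
A Parallel state returns an array with one element per branch, and the ResultSelector in the definition above reshapes that array before the Choice state reads it. The sketch below mirrors that data flow in plain Python so the routing can be reasoned about without deploying anything. The function names and the `NO_RULE_MATCHED` marker are illustrative, and the real evaluation happens inside Step Functions.

```python
from typing import Any, Dict, List

# Marker meaning "no explicit Choice rule matched, so the state's Default applies".
NO_RULE_MATCHED = "<Default>"


def select_parallel_results(branch_outputs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Mimic the ResultSelector: keep one field from each branch's output."""
    return {
        "validation": branch_outputs[0]["validation"],  # $[0].validation
        "freshness": branch_outputs[1]["freshness"],    # $[1].freshness
    }


def evaluate_validation(parallel_results: Dict[str, Any]) -> str:
    """Mimic the three explicit EvaluateValidation rules, checked in order."""
    valid = parallel_results["validation"].get("valid")
    all_fresh = parallel_results["freshness"].get("all_fresh")
    if valid is True and all_fresh is True:
        return "PrepareConfig"
    if valid is False:
        return "FailValidation"
    if all_fresh is False:
        return "FailDataStale"
    return NO_RULE_MATCHED


# Each branch output is the branch input plus the field named by its ResultPath.
branches = [
    {"strategy_name": "MomentumStrategy", "validation": {"valid": True}},
    {"strategy_name": "MomentumStrategy", "freshness": {"all_fresh": False}},
]
print(evaluate_validation(select_parallel_results(branches)))  # FailDataStale
```

Because the rules are checked in order, an invalid strategy with stale data is reported as `FailValidation` rather than `FailDataStale`.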

2. Simplified Backtest Workflow (v9.2)

In the v9.2 architecture, the strategy container handles its own lifecycle (status updates, MLflow logging, S3 upload). This allows a simplified Step Functions workflow that only handles validation and orchestration.

Simplified Workflow Diagram

                                    ┌──────────────────┐
                                    │      START       │
                                    └────────┬─────────┘
                              ┌──────────────┴──────────────┐
                              │    Parallel Validation      │
                              │                             │
                    ┌─────────┴─────────┐     ┌────────────┴────────────┐
                    │  ValidateStrategy │     │   CheckDataFreshness    │
                    │     (Lambda)      │     │       (Lambda)          │
                    └─────────┬─────────┘     └────────────┬────────────┘
                              │                            │
                              └──────────────┬─────────────┘
                                    ┌────────┴────────┐
                                    │ EvaluateResults │
                                    │    (Choice)     │
                                    └────────┬────────┘
                                  ┌──────────┴──────────┐
                                  │ Valid              │ Invalid
                                  ▼                    ▼
                           ┌──────────────┐     ┌─────────────┐
                           │ RunBacktest  │     │FailWorkflow │
                           │  (ECS Task)  │     │   (Fail)    │
                           │              │     └─────────────┘
                           │ Container    │
                           │ handles:     │
                           │ - Status→DDB │
                           │ - MLflow log │
                           │ - S3 upload  │
                           └──────┬───────┘
                           ┌──────┴──────┐
                           │ Notify      │ (Optional)
                           │ Completion  │
                           └──────┬──────┘
                           ┌──────┴──────┐
                           │     END     │
                           └─────────────┘

Simplified State Machine Definition

{
  "Comment": "TradAI Backtest Workflow v9.2 (Simplified - Container handles status)",
  "StartAt": "ParallelValidation",
  "TimeoutSeconds": 7200,
  "States": {

    "ParallelValidation": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "ValidateStrategy",
          "States": {
            "ValidateStrategy": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-validate-strategy",
              "Parameters": {
                "strategy_name.$": "$.strategy_name",
                "strategy_version.$": "$.strategy_version"
              },
              "ResultPath": "$.validation",
              "End": true
            }
          }
        },
        {
          "StartAt": "CheckDataFreshness",
          "States": {
            "CheckDataFreshness": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-data-collection-proxy",
              "Parameters": {
                "operation": "check-freshness",
                "symbols.$": "$.symbols",
                "timeframe.$": "$.timeframe"
              },
              "ResultPath": "$.freshness",
              "End": true
            }
          }
        }
      ],
      "ResultPath": "$.parallel_results",
      "Next": "EvaluateValidation"
    },

    "EvaluateValidation": {
      "Type": "Choice",
      "Choices": [
        {
          "And": [
            {"Variable": "$.parallel_results[0].validation.valid", "BooleanEquals": true},
            {"Variable": "$.parallel_results[1].freshness.all_fresh", "BooleanEquals": true}
          ],
          "Next": "RunBacktest"
        }
      ],
      "Default": "FailValidation"
    },

    "FailValidation": {
      "Type": "Fail",
      "Error": "ValidationError",
      "Cause": "Strategy or data validation failed"
    },

    "RunBacktest": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Comment": "Container handles DynamoDB status, MLflow, S3 upload. LaunchType is omitted because ECS rejects it when CapacityProviderStrategy is set.",
      "TimeoutSeconds": 7200,
      "HeartbeatSeconds": 300,
      "Parameters": {
        "Cluster": "tradai-cluster",
        "TaskDefinition.$": "States.Format('strategy-{}', $.strategy_name)",
        "CapacityProviderStrategy": [
          {"CapacityProvider": "FARGATE_SPOT", "Weight": 1},
          {"CapacityProvider": "FARGATE", "Weight": 0, "Base": 1}
        ],
        "NetworkConfiguration": {
          "AwsvpcConfiguration": {
            "Subnets": ["${PrivateSubnet1}", "${PrivateSubnet2}"],
            "SecurityGroups": ["${ECSSecurityGroup}"],
            "AssignPublicIp": "DISABLED"
          }
        },
        "Overrides": {
          "ContainerOverrides": [
            {
              "Name": "strategy",
              "Environment": [
                {"Name": "RUN_ID", "Value.$": "$.run_id"},
                {"Name": "STRATEGY", "Value.$": "$.strategy_name"},
                {"Name": "TIMEFRAME", "Value.$": "$.timeframe"},
                {"Name": "TIMERANGE", "Value.$": "States.Format('{}-{}', $.start_date, $.end_date)"},
                {"Name": "PAIRS", "Value.$": "States.JsonToString($.symbols)"},
                {"Name": "EXPERIMENT_NAME", "Value.$": "$.experiment_name"},
                {"Name": "DYNAMODB_TABLE", "Value": "tradai-workflow-state"},
                {"Name": "MLFLOW_TRACKING_URI", "Value": "http://mlflow.tradai.local:5000"}
              ]
            }
          ]
        }
      },
      "ResultPath": "$.ecs_result",
      "Next": "NotifyCompletion",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "NotifyFailure"
        }
      ]
    },

    "NotifyCompletion": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-notify-completion",
      "Parameters": {
        "run_id.$": "$.run_id",
        "status": "COMPLETED",
        "strategy_name.$": "$.strategy_name"
      },
      "End": true
    },

    "NotifyFailure": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-notify-completion",
      "Parameters": {
        "run_id.$": "$.run_id",
        "status": "FAILED",
        "strategy_name.$": "$.strategy_name",
        "error.$": "$.error"
      },
      "End": true
    }
  }
}
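
Since the simplified workflow passes everything to the container through environment variables, the container entrypoint has to parse them. The sketch below shows one hypothetical way to do that with the variable names from the ContainerOverrides above. The `BacktestEnv` class and `load_backtest_env` function are illustrative and do not come from the TradAI code, and the compact date format in the sample is an assumption about what `$.start_date` and `$.end_date` contain.

```python
import json
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class BacktestEnv:
    run_id: str
    strategy: str
    timeframe: str
    timerange: str
    pairs: List[str]
    experiment_name: str
    dynamodb_table: str
    mlflow_tracking_uri: str


def load_backtest_env(env: Mapping[str, str] = os.environ) -> BacktestEnv:
    """Parse the variables set by the RunBacktest ContainerOverrides."""
    return BacktestEnv(
        run_id=env["RUN_ID"],
        strategy=env["STRATEGY"],
        timeframe=env["TIMEFRAME"],
        # Built by States.Format('{}-{}', $.start_date, $.end_date).
        timerange=env["TIMERANGE"],
        # Built by States.JsonToString($.symbols), so it is a JSON array string.
        pairs=json.loads(env["PAIRS"]),
        experiment_name=env["EXPERIMENT_NAME"],
        dynamodb_table=env["DYNAMODB_TABLE"],
        mlflow_tracking_uri=env["MLFLOW_TRACKING_URI"],
    )


# Sample values for illustration only.
sample_env = {
    "RUN_ID": "test-001",
    "STRATEGY": "MomentumStrategy",
    "TIMEFRAME": "1h",
    "TIMERANGE": "20240101-20240601",
    "PAIRS": '["BTC/USDT:USDT"]',
    "EXPERIMENT_NAME": "test-experiment",
    "DYNAMODB_TABLE": "tradai-workflow-state",
    "MLFLOW_TRACKING_URI": "http://mlflow.tradai.local:5000",
}
print(load_backtest_env(sample_env).pairs)  # ['BTC/USDT:USDT']
```

Indexing with `env[...]` rather than `env.get(...)` makes a missing variable fail at startup with a `KeyError`, before any backtest work begins.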

Full vs Simplified Workflow Comparison

| Aspect | Full Workflow (v8.0) | Simplified (v9.2) |
|--------|----------------------|-------------------|
| States | 9 | 5 |
| DynamoDB updates | Step Functions | Container |
| MLflow logging | Lambda | Container |
| S3 upload | Lambda | Container |
| Cleanup | Lambda | Container |
| Result transformation | Lambda | Container |
| Lambdas required | 6 | 3 |
| Complexity | High | Medium |
| Debugging | Visual + Lambda logs | Visual + Container logs |
| Use when | Need orchestration visibility | Simple backtest execution |

3. Error Handling Strategy

Retry Configuration

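| Error Type | Max Attempts | Interval | Backoff |
|------------|--------------|----------|---------|
| Lambda.ServiceException | 3 | 2s | 2.0x |
| Lambda.TooManyRequestsException | 3 | 2s | 2.0x |
| ECS.AmazonECSException | 2 | 5s | 2.0x |
| States.TaskFailed | 1 | 10s | 1.0x |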
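
Step Functions waits `IntervalSeconds` before the first retry and multiplies the wait by `BackoffRate` for each later retry, with `MaxAttempts` counting retries rather than the initial attempt. The helper below is a small sketch (the function name is illustrative) that turns a row of the table into the resulting wait times.

```python
from typing import List


def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> List[float]:
    """Seconds waited before each retry: interval * backoff_rate ** (retry_index)."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


print(retry_delays(2, 3, 2.0))  # [2.0, 4.0, 8.0]   Lambda.ServiceException row
print(retry_delays(5, 2, 2.0))  # [5.0, 10.0]       ECS.AmazonECSException row
```

With these settings a Lambda task spends at most 14 seconds waiting between attempts before the error reaches a Catch block, not counting the run time of the attempts themselves.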

Error Categories

Transient Errors (Retry):
├─ Lambda.ServiceException       → Auto-retry with backoff
├─ Lambda.TooManyRequestsException → Auto-retry with backoff
├─ ECS.AmazonECSException        → Auto-retry once
└─ States.Timeout                → Check heartbeat, may retry

Business Errors (Fail Fast):
├─ ValidationError               → Fail immediately
├─ DataStaleError               → Fail immediately
└─ ConfigurationError           → Fail immediately

Infrastructure Errors (Alert):
├─ States.Permissions           → Alert ops team
├─ States.ResultPathMatchFailure → Alert dev team
└─ Unknown errors               → DLQ + alert
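
The three categories above map naturally onto a lookup. The following sketch shows one hypothetical way a failure handler could route an error name. The action labels (`retry`, `fail_fast`, `alert_ops`, `alert_dev`, `dlq_and_alert`) are invented for illustration.

```python
# Error names are taken from the category tree above; action labels are illustrative.
RETRYABLE = frozenset({
    "Lambda.ServiceException",
    "Lambda.TooManyRequestsException",
    "ECS.AmazonECSException",
    "States.Timeout",
})
FAIL_FAST = frozenset({"ValidationError", "DataStaleError", "ConfigurationError"})
ALERT_TARGETS = {
    "States.Permissions": "ops",
    "States.ResultPathMatchFailure": "dev",
}


def classify_error(error_name: str) -> str:
    """Map a Step Functions error name to a handling action."""
    if error_name in RETRYABLE:
        return "retry"
    if error_name in FAIL_FAST:
        return "fail_fast"
    if error_name in ALERT_TARGETS:
        return "alert_" + ALERT_TARGETS[error_name]
    # Anything unrecognised goes to the dead letter queue and raises an alert.
    return "dlq_and_alert"


for name in ("Lambda.ServiceException", "DataStaleError", "States.Permissions", "SomethingElse"):
    print(name, "->", classify_error(name))
```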

Dead Letter Queue Integration

{
  "PublishToDLQ": {
    "Type": "Task",
    "Resource": "arn:aws:states:::sqs:sendMessage",
    "Parameters": {
      "QueueUrl": "${DeadLetterQueueUrl}",
      "MessageBody": {
        "workflow": "backtest",
        "run_id.$": "$.run_id",
        "error.$": "$.error",
        "timestamp.$": "$$.State.EnteredTime",
        "execution_arn.$": "$$.Execution.Id"
      }
    },
    "Next": "NotifyFailure"
  }
}
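
On the consuming side, the message body written by `PublishToDLQ` is a JSON document with the five fields shown above, and the `error` field holds the `Error` and `Cause` keys that a Catch block produces. The sketch below shows how a hypothetical DLQ consumer might summarize such a message. The function name and the sample ARN are made up for illustration.

```python
import json
from typing import Dict


def summarize_dlq_message(body: str) -> Dict[str, str]:
    """Extract the fields an on-call engineer typically needs from a DLQ message."""
    msg = json.loads(body)
    error = msg.get("error") or {}
    return {
        "workflow": msg["workflow"],
        "run_id": msg["run_id"],
        "error_name": error.get("Error", "Unknown"),
        "execution_arn": msg["execution_arn"],
    }


# Sample body for illustration; the state machine name in the ARN is hypothetical.
sample_body = json.dumps({
    "workflow": "backtest",
    "run_id": "test-001",
    "error": {"Error": "States.TaskFailed", "Cause": "container exited with code 1"},
    "timestamp": "2025-12-09T10:00:00Z",
    "execution_arn": "arn:aws:states:us-east-1:123456789012:execution:example-backtest:test-001",
})
print(summarize_dlq_message(sample_body)["error_name"])  # States.TaskFailed
```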

4. Data Sync Workflow

{
  "Comment": "TradAI Data Sync Workflow",
  "StartAt": "ValidateRequest",
  "States": {

    "ValidateRequest": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-validate-data-request",
      "Next": "FetchAndStoreData"
    },

    "FetchAndStoreData": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "TimeoutSeconds": 1800,
      "HeartbeatSeconds": 300,
      "Parameters": {
        "Cluster": "tradai-cluster",
        "TaskDefinition": "tradai-data-collection-task",
        "LaunchType": "FARGATE",
        "NetworkConfiguration": {
          "AwsvpcConfiguration": {
            "Subnets": ["${PrivateSubnet1}", "${PrivateSubnet2}"],
            "SecurityGroups": ["${ECSSecurityGroup}"],
            "AssignPublicIp": "DISABLED"
          }
        },
        "Overrides": {
          "ContainerOverrides": [
            {
              "Name": "data-collection",
              "Environment": [
                {"Name": "COMMAND", "Value": "full-sync"},
                {"Name": "SYMBOLS", "Value.$": "States.JsonToString($.symbols)"},
                {"Name": "TIMEFRAME", "Value.$": "$.timeframe"}
              ]
            }
          ]
        }
      },
      "ResultPath": "$.sync_result",
      "Next": "ValidateDataQuality"
    },

    "ValidateDataQuality": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-validate-data-quality",
      "Next": "NotifyCompletion"
    },

    "NotifyCompletion": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:${AccountId}:function:tradai-notify-completion",
      "Parameters": {
        "workflow": "data-sync",
        "status": "COMPLETED"
      },
      "End": true
    }
  }
}

5. CloudWatch Integration

Metrics Published

| Metric | Namespace | Dimensions | Unit |
|--------|-----------|------------|------|
| ExecutionStarted | TradAI/StepFunctions | WorkflowType | Count |
| ExecutionSucceeded | TradAI/StepFunctions | WorkflowType | Count |
| ExecutionFailed | TradAI/StepFunctions | WorkflowType | Count |
| ExecutionDuration | TradAI/StepFunctions | WorkflowType | Seconds |
| BacktestDuration | TradAI/Backtest | Strategy | Seconds |

CloudWatch Alarms

Alarms:
  - Name: backtest-failures-high
    Metric: ExecutionFailed
    Threshold: 5 in 1 hour
    Action: SNS notification

  - Name: backtest-duration-long
    Metric: ExecutionDuration
    Threshold: 3600 seconds (1 hour)
    Action: SNS warning

  - Name: dlq-messages
    Metric: ApproximateNumberOfMessagesVisible
    Threshold: > 0
    Action: SNS alert

6. Cost Analysis

Step Functions Pricing (Standard)

| Metric | Rate | Monthly Usage | Cost |
|--------|------|---------------|------|
| State transitions | $0.025 per 1000 | ~4500 (50 backtests × 9 states × 10) | $0.11 |
| Total | | | $0.11 |
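
The figure in the table follows directly from the per-transition rate. The sketch below reproduces the arithmetic. The third factor of 10 is taken as-is from the table, which does not say what it represents, so it appears here under the neutral name `multiplier`.

```python
from typing import Tuple


def monthly_transition_cost(
    backtests: int,
    states_per_run: int,
    multiplier: int,
    rate_per_1000: float = 0.025,
) -> Tuple[int, float]:
    """Return (state transitions, cost in USD) for STANDARD workflows."""
    transitions = backtests * states_per_run * multiplier
    return transitions, transitions * rate_per_1000 / 1000


transitions, cost = monthly_transition_cost(50, 9, 10)
print(transitions, f"${cost:.2f}")  # 4500 $0.11
```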

Comparison: EXPRESS vs STANDARD

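| Aspect | EXPRESS | STANDARD |
|--------|---------|----------|
| Max Duration | 5 minutes | 1 year |
| Pricing | $1.00 per 1M requests, plus a duration charge | $25.00 per 1M state transitions |
| State persistence | No | 90 days |
| Suitable for backtests | NO | YES |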

7. Testing Workflows

Test Input

{
  "run_id": "test-001",
  "strategy_name": "MomentumStrategy",
  "strategy_version": "1.0.0",
  "experiment_name": "test-experiment",
  "symbols": ["BTC/USDT:USDT"],
  "timeframe": "1h",
  "config_overrides": {
    "backtester": {
      "START_DATE": "2024-01-01",
      "END_DATE": "2024-06-01"
    }
  }
}
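
Before starting an execution, it can help to confirm that the input carries every top-level field the full state machine reads through JSONPath (`$.run_id`, `$.strategy_name`, and so on), because a missing path fails the execution at runtime. The check below is a minimal sketch. The helper name is illustrative and the field list is derived from the full workflow definition in section 1.

```python
import json
from typing import Any, Dict, List

# Top-level fields referenced by the full backtest workflow definition.
REQUIRED_FIELDS = (
    "run_id",
    "strategy_name",
    "strategy_version",
    "experiment_name",
    "symbols",
    "timeframe",
)


def missing_fields(payload: Dict[str, Any]) -> List[str]:
    """Return the required top-level fields that are absent from the input."""
    return [field for field in REQUIRED_FIELDS if field not in payload]


test_input = json.loads("""
{
  "run_id": "test-001",
  "strategy_name": "MomentumStrategy",
  "strategy_version": "1.0.0",
  "experiment_name": "test-experiment",
  "symbols": ["BTC/USDT:USDT"],
  "timeframe": "1h",
  "config_overrides": {"backtester": {"START_DATE": "2024-01-01", "END_DATE": "2024-06-01"}}
}
""")
print(missing_fields(test_input))          # []
print(missing_fields({"run_id": "x"}))     # the five other required fields
```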

Expected Flow (Happy Path)

1. ParallelValidation
   ├─ ValidateStrategy: {valid: true, ecr_url: "..."}
   └─ CheckDataFreshness: {all_fresh: true}

2. EvaluateValidation → PrepareConfig

3. PrepareConfig
   └─ Output: {ecr_url: "...", config_path: "s3://..."}

4. RunBacktest
   ├─ Duration: 10-30 minutes
   └─ Output: {exit_code: 0, results_path: "s3://..."}

5. TransformResults
   └─ Output: {sharpe_ratio: 2.1, total_return: 45.3, ...}

6. UpdateStateCompleted → DynamoDB updated

7. CleanupResources → Temp files deleted

8. NotifyCompletion → SNS/Email sent

Next Steps

  1. Review 07-COST-ANALYSIS.md for complete cost breakdown
  2. Review 09-PULUMI-CODE.md for Step Functions deployment code