Skip to content

data-collection-proxy

Provides a Lambda interface to the data-collection service for Step Functions integration, cross-account access, and API Gateway backend.

Overview

Property Value
Trigger Step Functions / API Gateway / Direct
Runtime Python 3.11
Timeout 60 seconds
Memory 128 MB

Input Schema

{
    "operation": "check-freshness",     # Required
    "params": {                         # Operation-specific parameters
        "symbols": ["BTC/USDT:USDT", "ETH/USDT:USDT"],
        "timeframe": "1h"
    }
}

Supported Operations

Operation Description Params
health Health check None
fetch_ohlcv Fetch OHLCV data symbol, timeframe, limit, start, end
symbols List available symbols None
status Collection status None
check-freshness Check data freshness symbols, timeframe

Output Schema

check-freshness Response

{
    "statusCode": 200,
    "body": {
        "success": true,
        "data": {
            "operation": "check-freshness",
            "result": {
                "all_fresh": true,
                "stale_symbols": []
            }
        }
    }
}

fetch_ohlcv Response

{
    "statusCode": 200,
    "body": {
        "success": true,
        "data": {
            "operation": "fetch_ohlcv",
            "result": {
                "symbol": "BTC/USDT:USDT",
                "timeframe": "1h",
                "data": [...],
                "count": 1000
            }
        }
    }
}

Error Response

{
    "statusCode": 200,
    "body": {
        "success": true,
        "data": {
            "operation": "check-freshness",
            "result": {
                "all_fresh": false,         # Safe default
                "stale_symbols": [],
                "error": "Connection failed: timeout"
            }
        }
    }
}

Environment Variables

Variable Required Default Description
DATA_COLLECTION_URL No - Direct URL to service
SERVICE_DISCOVERY_NAMESPACE No "tradai.local" Cloud Map namespace
REQUEST_TIMEOUT No 30 HTTP timeout in seconds
MAX_RETRIES No 2 Retry attempts on failure

Service URL Resolution

  1. Direct URL: If DATA_COLLECTION_URL is set, use it directly
  2. Cloud Map DNS: Construct URL from namespace:
    http://data-collection.{namespace}:8002
    

Request Flow

flowchart TD
    A[Lambda Request] --> B[Resolve Service URL]
    B -->|Not found| C[Return error response]
    B -->|Found| D{Route Operation}
    D -->|health| E[GET /api/v1/health]
    D -->|fetch_ohlcv| F[GET /api/v1/ohlcv]
    D -->|symbols| G[GET /api/v1/symbols]
    D -->|status| H[GET /api/v1/status]
    D -->|check-freshness| I[GET /api/v1/freshness]
    E --> J[Make Request with Retries]
    F --> J
    G --> J
    H --> J
    I --> J
    J -->|Success| K[Return result]
    J -->|5xx Error| L{Retries left?}
    L -->|Yes| J
    L -->|No| M[Return safe error]
    J -->|4xx Error| M

Retry Logic

  • Retries on 5xx server errors with exponential backoff: 1s, 2s, 4s
  • No retry on 4xx client errors
  • Max retries configurable via MAX_RETRIES

OHLCV Parameter Validation

Parameter Validation
symbol Max 50 chars, must contain /
timeframe Regex: ^\d+[mhd]$
limit 1-10000, defaults to 1000

CloudWatch Metrics

Metric Description
ProxyRequests Total requests by operation
ProxyLatency Request latency in ms
ProxyErrors Failed requests by operation

Step Functions Integration

{
  "CheckDataFreshness": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:data-collection-proxy",
    "Parameters": {
      "operation": "check-freshness",
      "params": {
        "symbols.$": "$.symbols",
        "timeframe.$": "$.timeframe"
      }
    },
    "ResultPath": "$.freshness",
    "Next": "EvaluateFreshness"
  },
  "EvaluateFreshness": {
    "Type": "Choice",
    "Choices": [{
      "Variable": "$.freshness.body.data.result.all_fresh",
      "BooleanEquals": true,
      "Next": "RunBacktest"
    }],
    "Default": "CollectData"
  }
}

Error Handling for Step Functions

The Lambda returns Step Functions compatible responses even on errors: - For check-freshness: Returns all_fresh: false on error (safe default) - Error message included in result.error field - statusCode: 200 to prevent Step Functions task failure