Skip to content

data-collection-proxy

Provides a Lambda interface to the data-collection service for Step Functions integration, cross-account access, and API Gateway backend.

Overview

Property Value
Trigger Step Functions / API Gateway / Direct
Runtime Python 3.11
Timeout 180 seconds
Memory 256 MB
Settings class DataCollectionProxySettings

Input Schema

{
    "operation": "ensure-data",          # Required (default: "health")
    "params": {                          # Operation-specific parameters
        "symbols": ["BTC/USDT:USDT", "ETH/USDT:USDT"],
        "timeframe": "1h",
        "start_date": "20240101",
        "end_date": "20240131",
        "exchange": "binance_futures"
    }
}

Supported Operations

Operation Method Endpoint Params
health GET /api/v1/health None
fetch_ohlcv GET /api/v1/ohlcv symbol, timeframe, limit, start, end
symbols GET /api/v1/symbols None
status GET /api/v1/status None
ensure-data POST /api/v1/sync symbols, timeframe, start_date, end_date, exchange, padding_days

ensure-data Operation

The ensure-data operation is designed for Step Functions integration to guarantee data coverage before a backtest runs. It:

  1. Accepts backtest date range (start_date, end_date)
  2. Pads the start date by padding_days (default: 30 days) to cover startup candle lookback
  3. Sends a POST /api/v1/sync request to the data-collection service
  4. The service checks ArcticDB coverage first and only fetches from the exchange when data is missing

ensure-data Parameters

Parameter Required Default Description
symbols Yes - List of symbols (max 50), must contain /
timeframe No 1h Candle timeframe (regex: ^\d+[mhd]$)
start_date Yes - Backtest start date (YYYYMMDD or YYYY-MM-DD)
end_date Yes - Backtest end date (YYYYMMDD or YYYY-MM-DD)
exchange No binance_futures Exchange key (binance_futures, bybit_futures, okx_futures)
padding_days No 30 Startup candle padding days (mirrors ArcticBacktestExecutor default)

ensure-data Payload Example

# Input to Lambda
{
    "operation": "ensure-data",
    "params": {
        "symbols": ["BTC/USDT:USDT"],
        "timeframe": "1h",
        "start_date": "20240201",
        "end_date": "20240228",
        "exchange": "binance_futures"
    }
}

# POST /api/v1/sync payload sent to data-collection service
{
    "symbols": ["BTC/USDT:USDT"],
    "start_date": "2024-01-02",     # Padded 30 days back from 2024-02-01
    "end_date": "2024-02-28",
    "timeframe": "1h",
    "exchange": "binance_futures"
}

Output Schema

All responses use Step Functions compatible format via LambdaResponse.to_step_functions():

Success Response

{
    "statusCode": 200,
    "body": {
        "success": true,
        "data": {
            "operation": "ensure-data",
            "result": { ... }  # Response from data-collection service
        },
        "environment": "dev"
    }
}

Error Response

{
    "statusCode": 200,
    "body": {
        "success": true,
        "data": {
            "operation": "ensure-data",
            "result": {
                "error": "Connection failed: timeout"
            }
        },
        "environment": "dev"
    }
}

Errors return statusCode: 200 to prevent Step Functions task failure. The error message is included in result.error.

Environment Variables

Variable Required Default Description
DATA_COLLECTION_URL No - Direct URL to service
SERVICE_DISCOVERY_NAMESPACE No tradai.local Cloud Map namespace
REQUEST_TIMEOUT No 30 HTTP timeout in seconds
MAX_RETRIES No 2 Retry attempts on failure

Service URL Resolution

  1. Direct URL: If DATA_COLLECTION_URL is set, use it directly
  2. Cloud Map DNS: Construct URL from namespace:
    http://data-collection.{namespace}:8002
    

Request Flow

flowchart TD
    A[Lambda Request] --> B[Resolve Service URL]
    B -->|Not found| C[Return error response]
    B -->|Found| D{Route Operation}
    D -->|health| E[GET /api/v1/health]
    D -->|fetch_ohlcv| F[GET /api/v1/ohlcv]
    D -->|symbols| G[GET /api/v1/symbols]
    D -->|status| H[GET /api/v1/status]
    D -->|ensure-data| I[POST /api/v1/sync]
    D -->|unknown| X[Return unknown operation error]
    E --> J[Make Request with Retries]
    F --> J
    G --> J
    H --> J
    I --> J
    J -->|Success| K[Return result via to_step_functions]
    J -->|5xx Error| L{Retries left?}
    L -->|Yes| J
    L -->|No| M[Return safe error]
    J -->|4xx Error| M

Retry Logic

  • Retries on 5xx server errors with exponential backoff: 1s, 2s, 4s
  • No retry on 4xx client errors
  • Max retries configurable via MAX_RETRIES

OHLCV Parameter Validation

Parameter Validation
symbol Max 50 chars, must contain /
timeframe Regex: ^\d+[mhd]$
limit 1-10000, defaults to 1000

Sync/Ensure Parameter Validation

Parameter Validation
symbols Must be a non-empty list of strings (max 50), each containing /
timeframe Regex: ^\d+[mhd]$
exchange Must be one of: binance_futures, bybit_futures, okx_futures

CloudWatch Metrics

Namespace suffix: DataProxy

Metric Dimensions Description
ProxyRequests Operation, Environment Total requests by operation
ProxyLatency Operation, Environment Request latency in ms
ProxyErrors Operation, Environment Failed requests by operation

Step Functions Integration

{
  "EnsureDataCoverage": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:data-collection-proxy",
    "Parameters": {
      "operation": "ensure-data",
      "params": {
        "symbols.$": "$.symbols",
        "timeframe.$": "$.timeframe",
        "start_date.$": "$.start_date",
        "end_date.$": "$.end_date",
        "exchange.$": "$.exchange"
      }
    },
    "ResultPath": "$.data_sync",
    "Next": "RunBacktest"
  }
}