data-collection-proxy Provides a Lambda interface to the data-collection service for Step Functions integration, cross-account access, and API Gateway backend.
Overview Property Value Trigger Step Functions / API Gateway / Direct Runtime Python 3.11 Timeout 180 seconds Memory 256 MB Settings class DataCollectionProxySettings
{
"operation" : "ensure-data" , # Required (default: "health")
"params" : { # Operation-specific parameters
"symbols" : [ "BTC/USDT:USDT" , "ETH/USDT:USDT" ],
"timeframe" : "1h" ,
"start_date" : "20240101" ,
"end_date" : "20240131" ,
"exchange" : "binance_futures"
}
}
Supported Operations Operation Method Endpoint Params health GET /api/v1/health None fetch_ohlcv GET /api/v1/ohlcv symbol, timeframe, limit, start, end symbols GET /api/v1/symbols None status GET /api/v1/status None ensure-data POST /api/v1/sync symbols, timeframe, start_date, end_date, exchange, padding_days
ensure-data Operation The ensure-data operation is designed for Step Functions integration to guarantee data coverage before a backtest runs. It:
Accepts backtest date range (start_date, end_date) Pads the start date by padding_days (default: 30 days) to cover startup candle lookback Sends a POST /api/v1/sync request to the data-collection service The service checks ArcticDB coverage first and only fetches from the exchange when data is missing ensure-data Parameters Parameter Required Default Description symbols Yes - List of symbols (max 50), must contain / timeframe No 1h Candle timeframe (regex: ^\d+[mhd]$) start_date Yes - Backtest start date (YYYYMMDD or YYYY-MM-DD) end_date Yes - Backtest end date (YYYYMMDD or YYYY-MM-DD) exchange No binance_futures Exchange key (binance_futures, bybit_futures, okx_futures) padding_days No 30 Startup candle padding days (mirrors ArcticBacktestExecutor default)
ensure-data Payload Example # Input to Lambda
{
"operation" : "ensure-data" ,
"params" : {
"symbols" : [ "BTC/USDT:USDT" ],
"timeframe" : "1h" ,
"start_date" : "20240201" ,
"end_date" : "20240228" ,
"exchange" : "binance_futures"
}
}
# POST /api/v1/sync payload sent to data-collection service
{
"symbols" : [ "BTC/USDT:USDT" ],
"start_date" : "2024-01-02" , # Padded 30 days back from 2024-02-01
"end_date" : "2024-02-28" ,
"timeframe" : "1h" ,
"exchange" : "binance_futures"
}
Output Schema All responses use Step Functions compatible format via LambdaResponse.to_step_functions():
Success Response {
"statusCode" : 200 ,
"body" : {
"success" : true ,
"data" : {
"operation" : "ensure-data" ,
"result" : { ... } # Response from data-collection service
},
"environment" : "dev"
}
}
Error Response {
"statusCode" : 200 ,
"body" : {
"success" : true ,
"data" : {
"operation" : "ensure-data" ,
"result" : {
"error" : "Connection failed: timeout"
}
},
"environment" : "dev"
}
}
Errors return statusCode: 200 to prevent Step Functions task failure. The error message is included in result.error.
Environment Variables Variable Required Default Description DATA_COLLECTION_URL No - Direct URL to service SERVICE_DISCOVERY_NAMESPACE No tradai.local Cloud Map namespace REQUEST_TIMEOUT No 30 HTTP timeout in seconds MAX_RETRIES No 2 Retry attempts on failure
Service URL Resolution Direct URL : If DATA_COLLECTION_URL is set, use it directly Cloud Map DNS : Construct URL from namespace: http://data-collection.{namespace}:8002
Request Flow flowchart TD
A[Lambda Request] --> B[Resolve Service URL]
B -->|Not found| C[Return error response]
B -->|Found| D{Route Operation}
D -->|health| E[GET /api/v1/health]
D -->|fetch_ohlcv| F[GET /api/v1/ohlcv]
D -->|symbols| G[GET /api/v1/symbols]
D -->|status| H[GET /api/v1/status]
D -->|ensure-data| I[POST /api/v1/sync]
D -->|unknown| X[Return unknown operation error]
E --> J[Make Request with Retries]
F --> J
G --> J
H --> J
I --> J
J -->|Success| K[Return result via to_step_functions]
J -->|5xx Error| L{Retries left?}
L -->|Yes| J
L -->|No| M[Return safe error]
J -->|4xx Error| M Retry Logic Retries on 5xx server errors with exponential backoff: 1s, 2s, 4s No retry on 4xx client errors Max retries configurable via MAX_RETRIES OHLCV Parameter Validation Parameter Validation symbol Max 50 chars, must contain / timeframe Regex: ^\d+[mhd]$ limit 1-10000, defaults to 1000
Sync/Ensure Parameter Validation Parameter Validation symbols Must be a non-empty list of strings (max 50), each containing / timeframe Regex: ^\d+[mhd]$ exchange Must be one of: binance_futures, bybit_futures, okx_futures
CloudWatch Metrics Namespace suffix: DataProxy
Metric Dimensions Description ProxyRequests Operation, Environment Total requests by operation ProxyLatency Operation, Environment Request latency in ms ProxyErrors Operation, Environment Failed requests by operation
Step Functions Integration {
"EnsureDataCoverage" : {
"Type" : "Task" ,
"Resource" : "arn:aws:lambda:...:data-collection-proxy" ,
"Parameters" : {
"operation" : "ensure-data" ,
"params" : {
"symbols.$" : "$.symbols" ,
"timeframe.$" : "$.timeframe" ,
"start_date.$" : "$.start_date" ,
"end_date.$" : "$.end_date" ,
"exchange.$" : "$.exchange"
}
},
"ResultPath" : "$.data_sync" ,
"Next" : "RunBacktest"
}
}