
Collect Market Data

Sync OHLCV data from exchanges to ArcticDB for backtesting.

Prerequisites

  • TradAI workspace set up (just setup completed)
  • Exchange API credentials (for authenticated endpoints)
  • AWS credentials configured (for S3/ArcticDB storage)

Steps

1. Configure Exchange Credentials

Set up exchange configuration in your environment:

# In .env or shell
export DATA_COLLECTION_EXCHANGES='{
  "binance_futures": {
    "name": "binance",
    "type": "futures",
    "api_key": "your-api-key",
    "api_secret": "your-api-secret"
  }
}'

# Configure ArcticDB storage
export DATA_COLLECTION_ARCTIC_S3_BUCKET="tradai-arcticdb-dev"
export DATA_COLLECTION_ARCTIC_LIBRARY="futures"
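The service is assumed to read `DATA_COLLECTION_EXCHANGES` as a JSON object keyed by exchange key. A minimal sketch of parsing and validating that shape with the standard library (`parse_exchanges` is illustrative, not the service's internal loader):

```python
import json

# Same shape as the .env example above.
SAMPLE = '''{
  "binance_futures": {
    "name": "binance",
    "type": "futures",
    "api_key": "your-api-key",
    "api_secret": "your-api-secret"
  }
}'''

def parse_exchanges(raw: str) -> dict:
    """Parse the DATA_COLLECTION_EXCHANGES JSON into a dict keyed by exchange key."""
    cfgs = json.loads(raw)
    for key, cfg in cfgs.items():
        # Each entry needs the exchange name, market type, and credentials.
        missing = {"name", "type", "api_key", "api_secret"} - cfg.keys()
        if missing:
            raise ValueError(f"{key} is missing fields: {sorted(missing)}")
    return cfgs

cfgs = parse_exchanges(SAMPLE)
print(cfgs["binance_futures"]["name"])  # binance
```

A malformed value fails fast with a clear error rather than surfacing later as an "Exchange not configured" failure at sync time.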

2. Start Data Collection Service

# Start services
just up

# Or run standalone
tradai data serve --port 8002

3. Sync Market Data

# Sync single symbol
tradai data sync BTC/USDT:USDT \
  --start 2024-01-01 \
  --end 2024-06-30 \
  --timeframe 1h

# Sync multiple symbols
tradai data sync BTC/USDT:USDT ETH/USDT:USDT SOL/USDT:USDT \
  --start 2024-01-01 \
  --end 2024-06-30 \
  --timeframe 1h

# Incremental sync (only new data)
tradai data sync BTC/USDT:USDT \
  --start 2024-01-01 \
  --end 2024-06-30 \
  --incremental

Available Commands

List Available Symbols

# List all symbols from Binance Futures
tradai data list-symbols --exchange binance_futures

# Limit output
tradai data list-symbols --exchange binance_futures --limit 20

Check Data Freshness

# Check if data is stale
tradai data check-freshness BTC/USDT:USDT ETH/USDT:USDT

# Custom stale threshold (hours)
tradai data check-freshness BTC/USDT:USDT --threshold 48

Example output:

Checking freshness for 2 symbols (threshold: 24h)...

Symbol Freshness Status:
────────────────────────────────────────────────────────────────
  BTC/USDT:USDT                    FRESH  Latest: 2024-06-30T23:00:00
  ETH/USDT:USDT                    STALE  Latest: 2024-06-15T12:00:00
────────────────────────────────────────────────────────────────
Summary: 1/2 symbols are stale
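The FRESH/STALE verdict is a simple timestamp comparison against the threshold; roughly (an illustration, not the service's implementation):

```python
from datetime import datetime, timedelta

def is_stale(latest: datetime, now: datetime, threshold_hours: int = 24) -> bool:
    """A symbol is stale when its newest candle is older than the threshold."""
    return now - latest > timedelta(hours=threshold_hours)

now = datetime(2024, 7, 1)
print(is_stale(datetime(2024, 6, 30, 23), now))  # False -> FRESH
print(is_stale(datetime(2024, 6, 15, 12), now))  # True  -> STALE
```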

Health Check

tradai data health

API Endpoints

The Data Collection service exposes REST endpoints:

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/v1/health | Service health check |
| POST | /api/v1/sync | Full data sync |
| POST | /api/v1/sync/incremental | Incremental sync |
| GET | /api/v1/freshness | Check data freshness |
| GET | /api/v1/symbols | List available symbols |
| GET | /docs | OpenAPI documentation |

Example API Calls

# Health check
curl http://localhost:8002/api/v1/health

# Sync data via API
curl -X POST http://localhost:8002/api/v1/sync \
  -H "Content-Type: application/json" \
  -d '{
    "symbols": ["BTC/USDT:USDT", "ETH/USDT:USDT"],
    "start_date": "2024-01-01",
    "end_date": "2024-06-30",
    "timeframe": "1h",
    "exchange": "binance_futures"
  }'

# Check freshness via API
curl "http://localhost:8002/api/v1/freshness?symbols=BTC/USDT:USDT&symbols=ETH/USDT:USDT"

Supported Exchanges

| Exchange | Key | Type | Notes |
|----------|-----|------|-------|
| Binance Futures | binance_futures | futures | Most liquid |
| Binance Spot | binance_spot | spot | No leverage |

Supported Timeframes

| Timeframe | Code | Notes |
|-----------|------|-------|
| 1 minute | 1m | High frequency |
| 5 minutes | 5m | Scalping |
| 15 minutes | 15m | Day trading |
| 30 minutes | 30m | Day trading |
| 1 hour | 1h | Swing trading (default) |
| 4 hours | 4h | Swing trading |
| 1 day | 1d | Position trading |
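Each code maps to a fixed bar duration, which is handy for estimating how many rows a sync will fetch before running it (mapping taken from the table above; the row-count helper is illustrative):

```python
from datetime import datetime, timedelta

TIMEFRAMES = {
    "1m": timedelta(minutes=1), "5m": timedelta(minutes=5),
    "15m": timedelta(minutes=15), "30m": timedelta(minutes=30),
    "1h": timedelta(hours=1), "4h": timedelta(hours=4),
    "1d": timedelta(days=1),
}

def expected_rows(start: datetime, end: datetime, code: str) -> int:
    """Number of complete bars between start and end for a timeframe code."""
    return int((end - start) / TIMEFRAMES[code])

# H1 2024 is 182 days (leap year), so 182 * 24 hourly candles:
print(expected_rows(datetime(2024, 1, 1), datetime(2024, 7, 1), "1h"))  # 4368
```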

Storage Architecture

Data is stored in ArcticDB, a high-performance time-series database:

ArcticDB (S3-backed)
└── Library: "futures"
    ├── BTC/USDT:USDT
    │   └── OHLCV DataFrame (timestamp, open, high, low, close, volume)
    ├── ETH/USDT:USDT
    │   └── OHLCV DataFrame
    └── ...

Configuration Reference

| Variable | Description | Default |
|----------|-------------|---------|
| DATA_COLLECTION_HOST | Server host | 0.0.0.0 |
| DATA_COLLECTION_PORT | Server port | 8002 |
| DATA_COLLECTION_DEBUG | Debug mode | false |
| DATA_COLLECTION_LOG_LEVEL | Log level | INFO |
| DATA_COLLECTION_ARCTIC_S3_BUCKET | S3 bucket for ArcticDB | Required |
| DATA_COLLECTION_ARCTIC_LIBRARY | ArcticDB library name | futures |
| DATA_COLLECTION_EXCHANGES | Exchange configs (JSON) | Required |
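A minimal sketch of resolving these variables with their defaults and failing fast on the required ones (illustrative only; the service's own settings loader may differ):

```python
import os

DEFAULTS = {
    "DATA_COLLECTION_HOST": "0.0.0.0",
    "DATA_COLLECTION_PORT": "8002",
    "DATA_COLLECTION_DEBUG": "false",
    "DATA_COLLECTION_LOG_LEVEL": "INFO",
    "DATA_COLLECTION_ARCTIC_LIBRARY": "futures",
}
REQUIRED = ["DATA_COLLECTION_ARCTIC_S3_BUCKET", "DATA_COLLECTION_EXCHANGES"]

def load_settings(env=os.environ) -> dict:
    """Apply defaults and raise if a required variable is missing."""
    missing = [k for k in REQUIRED if k not in env]
    if missing:
        raise RuntimeError(f"missing required settings: {missing}")
    return {k: env.get(k, v) for k, v in DEFAULTS.items()} | {k: env[k] for k in REQUIRED}

# Example with a fake environment:
settings = load_settings({"DATA_COLLECTION_ARCTIC_S3_BUCKET": "tradai-arcticdb-dev",
                          "DATA_COLLECTION_EXCHANGES": "{}"})
print(settings["DATA_COLLECTION_PORT"])  # 8002
```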

Batch Data Collection

For large data collections, use a script:

"""Batch data collection script."""

from tradai.data_collection.core.service import DataCollectionService
from tradai.data_collection.core.settings import get_settings

# Symbols to sync
SYMBOLS = [
    "BTC/USDT:USDT",
    "ETH/USDT:USDT",
    "SOL/USDT:USDT",
    "BNB/USDT:USDT",
    "XRP/USDT:USDT",
]

# Date range
START_DATE = "2023-01-01"
END_DATE = "2024-06-30"

# Timeframes
TIMEFRAMES = ["1h", "4h", "1d"]

def main():
    settings = get_settings()
    service = DataCollectionService(settings)

    for timeframe in TIMEFRAMES:
        print(f"\nSyncing {timeframe} data...")
        for symbol in SYMBOLS:
            print(f"  {symbol}...", end=" ")
            result = service.sync_data(
                symbols=[symbol],
                start_date=START_DATE,
                end_date=END_DATE,
                timeframe=timeframe,
                exchange="binance_futures",
            )
            print(f"{result.rows_synced} rows")

if __name__ == "__main__":
    main()

Run:

uv run python scripts/batch_sync.py

Troubleshooting

"Exchange not configured"

Ensure the DATA_COLLECTION_EXCHANGES environment variable is set and contains valid JSON.

"ArcticDB connection failed"

  1. Check AWS credentials are configured
  2. Verify S3 bucket exists and is accessible
  3. Check DATA_COLLECTION_ARCTIC_S3_BUCKET is correct

"Rate limit exceeded"

The service handles rate limiting automatically. For bulk syncs, consider:

  • Spreading requests over time
  • Using incremental sync mode
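When spreading requests out manually, a capped exponential backoff between retries is a common pattern (a generic sketch, not the service's built-in limiter):

```python
def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delays of base, 2*base, 4*base, ... seconds, capped at `cap`."""
    return [min(base * 2 ** i, cap) for i in range(retries)]

# Sleep for each delay in turn before retrying a rate-limited request:
print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```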

Next Steps