Collect Market Data¶
Sync OHLCV data from exchanges to ArcticDB for backtesting.
Prerequisites¶
- TradAI workspace set up (`just setup` completed)
- Exchange API credentials (for authenticated endpoints)
- AWS credentials configured (for S3/ArcticDB storage)
Steps¶
1. Configure Exchange Credentials¶
Set up exchange configuration in your environment:
```shell
# In .env or shell
export DATA_COLLECTION_EXCHANGES='{
  "binance_futures": {
    "name": "binance",
    "type": "futures",
    "api_key": "your-api-key",
    "api_secret": "your-api-secret"
  }
}'

# Configure ArcticDB storage
export DATA_COLLECTION_ARCTIC_S3_BUCKET="tradai-arcticdb-dev"
export DATA_COLLECTION_ARCTIC_LIBRARY="futures"
```
2. Start Data Collection Service¶
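No start command is shown in the source for this step. Assuming the workspace's `just` recipes and a FastAPI-style app (the service exposes `/docs` and defaults to port 8002), a typical start might look like the following; the recipe name and module path are assumptions, so check your repo's `justfile`:

```shell
# Via a just recipe (recipe name is an assumption):
just data-collection

# Or directly with uvicorn (module path is an assumption):
uvicorn tradai.data_collection.main:app --host 0.0.0.0 --port 8002
```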
3. Sync Market Data¶
```shell
# Sync single symbol
tradai data sync BTC/USDT:USDT \
  --start 2024-01-01 \
  --end 2024-06-30 \
  --timeframe 1h

# Sync multiple symbols
tradai data sync BTC/USDT:USDT ETH/USDT:USDT SOL/USDT:USDT \
  --start 2024-01-01 \
  --end 2024-06-30 \
  --timeframe 1h

# Incremental sync (only new data)
tradai data sync BTC/USDT:USDT \
  --start 2024-01-01 \
  --end 2024-06-30 \
  --incremental
```
Available Commands¶
List Available Symbols¶
```shell
# List all symbols from Binance Futures
tradai data list-symbols --exchange binance_futures

# Limit output
tradai data list-symbols --exchange binance_futures --limit 20
```
Check Data Freshness¶
```shell
# Check if data is stale
tradai data check-freshness BTC/USDT:USDT ETH/USDT:USDT

# Custom stale threshold (hours)
tradai data check-freshness BTC/USDT:USDT --threshold 48
```
Example output:
```text
Checking freshness for 2 symbols (threshold: 24h)...

Symbol Freshness Status:
────────────────────────────────────────────────────────────────
BTC/USDT:USDT    FRESH    Latest: 2024-06-30T23:00:00
ETH/USDT:USDT    STALE    Latest: 2024-06-15T12:00:00
────────────────────────────────────────────────────────────────
Summary: 1/2 symbols are stale
```
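The output above implies a simple rule: a symbol is stale when its latest bar is older than the threshold. The service's actual logic may differ; this is a minimal sketch of that comparison:

```python
from datetime import datetime, timedelta

def is_stale(latest: datetime, now: datetime, threshold_hours: int = 24) -> bool:
    """A symbol is stale when its latest bar is older than the threshold."""
    return now - latest > timedelta(hours=threshold_hours)

now = datetime(2024, 7, 1, 0, 0)
print(is_stale(datetime(2024, 6, 30, 23, 0), now))  # recent bar -> False
print(is_stale(datetime(2024, 6, 15, 12, 0), now))  # two weeks old -> True
```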
Health Check¶
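The source does not show a command for this section. Since the service exposes `GET /api/v1/health` (see the endpoint table under API Endpoints) on its default port 8002, one way to probe it with the service running locally:

```shell
curl http://localhost:8002/api/v1/health
```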
API Endpoints¶
The Data Collection service exposes REST endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/health | Service health check |
| POST | /api/v1/sync | Full data sync |
| POST | /api/v1/sync/incremental | Incremental sync |
| GET | /api/v1/freshness | Check data freshness |
| GET | /api/v1/symbols | List available symbols |
| GET | /docs | OpenAPI documentation |
Example API Calls¶
```shell
# Health check
curl http://localhost:8002/api/v1/health

# Sync data via API
curl -X POST http://localhost:8002/api/v1/sync \
  -H "Content-Type: application/json" \
  -d '{
    "symbols": ["BTC/USDT:USDT", "ETH/USDT:USDT"],
    "start_date": "2024-01-01",
    "end_date": "2024-06-30",
    "timeframe": "1h",
    "exchange": "binance_futures"
  }'

# Check freshness via API
curl "http://localhost:8002/api/v1/freshness?symbols=BTC/USDT:USDT&symbols=ETH/USDT:USDT"
```
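Note that the freshness endpoint takes `symbols` as a repeated query parameter rather than a comma-separated list. If you build the URL from Python instead of curl, `urlencode` with `doseq=True` expands a list into repeated keys (and percent-encodes the `/` and `:` in the symbol names):

```python
from urllib.parse import urlencode

# doseq=True expands the list into repeated `symbols=` keys.
params = urlencode({"symbols": ["BTC/USDT:USDT", "ETH/USDT:USDT"]}, doseq=True)
url = f"http://localhost:8002/api/v1/freshness?{params}"
print(params)  # symbols=BTC%2FUSDT%3AUSDT&symbols=ETH%2FUSDT%3AUSDT
```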
Supported Exchanges¶
| Exchange | Key | Type | Notes |
|---|---|---|---|
| Binance Futures | binance_futures | futures | Most liquid |
| Binance Spot | binance_spot | spot | No leverage |
Supported Timeframes¶
| Timeframe | Code | Notes |
|---|---|---|
| 1 minute | 1m | High frequency |
| 5 minutes | 5m | Scalping |
| 15 minutes | 15m | Day trading |
| 30 minutes | 30m | Day trading |
| 1 hour | 1h | Swing trading (default) |
| 4 hours | 4h | Swing trading |
| 1 day | 1d | Position trading |
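When working with these codes in scripts (for example, to detect gaps between bars), it helps to map each code to its bar duration. A small sketch mirroring the table above; this mapping is illustrative, not part of the tradai API:

```python
from datetime import timedelta

# Minutes per supported timeframe code (mirrors the table above).
TIMEFRAME_MINUTES = {
    "1m": 1, "5m": 5, "15m": 15, "30m": 30,
    "1h": 60, "4h": 240, "1d": 1440,
}

def timeframe_delta(code: str) -> timedelta:
    """Return the duration of one bar for a timeframe code."""
    return timedelta(minutes=TIMEFRAME_MINUTES[code])

print(timeframe_delta("4h"))  # 4:00:00
```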
Storage Architecture¶
Data is stored in ArcticDB, a high-performance time-series database:
```text
ArcticDB (S3-backed)
└── Library: "futures"
    ├── BTC/USDT:USDT
    │   └── OHLCV DataFrame (timestamp, open, high, low, close, volume)
    ├── ETH/USDT:USDT
    │   └── OHLCV DataFrame
    └── ...
```
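A minimal sketch of the per-symbol DataFrame layout described above, built with pandas; the ArcticDB read in the comment assumes the `arcticdb` package and an S3-backed store, so treat the URI and library name as placeholders:

```python
import pandas as pd

# Each symbol stores one OHLCV DataFrame indexed by timestamp:
df = pd.DataFrame(
    {
        "open": [42000.0], "high": [42500.0],
        "low": [41800.0], "close": [42300.0], "volume": [1234.5],
    },
    index=pd.to_datetime(["2024-06-30 23:00:00"]),
)

# Reading a symbol back with arcticdb would look roughly like (assumption):
#   from arcticdb import Arctic
#   lib = Arctic("s3://...").get_library("futures")
#   df = lib.read("BTC/USDT:USDT").data
print(list(df.columns))  # ['open', 'high', 'low', 'close', 'volume']
```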
Configuration Reference¶
| Variable | Description | Default |
|---|---|---|
| DATA_COLLECTION_HOST | Server host | 0.0.0.0 |
| DATA_COLLECTION_PORT | Server port | 8002 |
| DATA_COLLECTION_DEBUG | Debug mode | false |
| DATA_COLLECTION_LOG_LEVEL | Log level | INFO |
| DATA_COLLECTION_ARCTIC_S3_BUCKET | S3 bucket for ArcticDB | Required |
| DATA_COLLECTION_ARCTIC_LIBRARY | ArcticDB library name | futures |
| DATA_COLLECTION_EXCHANGES | Exchange configs (JSON) | Required |
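Because `DATA_COLLECTION_EXCHANGES` holds a JSON object, a malformed value is a common source of startup failures. A quick way to sanity-check it before starting the service (a sketch; the service's own parsing may differ):

```python
import json
import os

# Stand-in value for illustration; in practice this is set in .env or the shell.
os.environ["DATA_COLLECTION_EXCHANGES"] = (
    '{"binance_futures": {"name": "binance", "type": "futures",'
    ' "api_key": "k", "api_secret": "s"}}'
)

# json.loads raises ValueError on malformed JSON, catching typos early.
exchanges = json.loads(os.environ["DATA_COLLECTION_EXCHANGES"])
print(sorted(exchanges))  # ['binance_futures']
```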
Batch Data Collection¶
For large data collections, use a script:
"""Batch data collection script."""
from datetime import datetime, timedelta
from tradai.data_collection.core.service import DataCollectionService
from tradai.data_collection.core.settings import get_settings
# Symbols to sync
SYMBOLS = [
"BTC/USDT:USDT",
"ETH/USDT:USDT",
"SOL/USDT:USDT",
"BNB/USDT:USDT",
"XRP/USDT:USDT",
]
# Date range
START_DATE = "2023-01-01"
END_DATE = "2024-06-30"
# Timeframes
TIMEFRAMES = ["1h", "4h", "1d"]
def main():
settings = get_settings()
service = DataCollectionService(settings)
for timeframe in TIMEFRAMES:
print(f"\nSyncing {timeframe} data...")
for symbol in SYMBOLS:
print(f" {symbol}...", end=" ")
result = service.sync_data(
symbols=[symbol],
start_date=START_DATE,
end_date=END_DATE,
timeframe=timeframe,
exchange="binance_futures",
)
print(f"{result.rows_synced} rows")
if __name__ == "__main__":
main()
Save the script and run it with Python, e.g. `python batch_collect.py` (the filename is your choice).
Troubleshooting¶
"Exchange not configured"¶
Ensure the `DATA_COLLECTION_EXCHANGES` environment variable is set and contains valid JSON.
"ArcticDB connection failed"¶
- Check AWS credentials are configured
- Verify S3 bucket exists and is accessible
- Check that `DATA_COLLECTION_ARCTIC_S3_BUCKET` is correct
"Rate limit exceeded"¶
The service handles rate limiting automatically. For bulk syncs, consider:
- Spreading requests over time
- Using incremental sync mode
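If you drive the exchange from your own scripts and still hit limits, exponential backoff is the usual way to spread requests out. A self-contained sketch (not the service's internal handler; `RuntimeError` stands in for the exchange client's rate-limit exception):

```python
import time

def with_backoff(fetch, max_retries=5, base_delay=1.0):
    """Retry `fetch`, sleeping base_delay * 2**attempt between failures."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except RuntimeError:  # stand-in for the exchange's rate-limit error
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("rate limit: retries exhausted")

# Simulate a call that is rate-limited twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

result = with_backoff(flaky, base_delay=0.01)
print(result)  # ok
```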
Next Steps¶
- Your First Backtest - Run backtests with your data
- Create a New Strategy - Build strategies