Data Collection Service: Design Document
Overview
A market data fetching and storage service. It syncs OHLCV (open, high, low, close, volume) data from exchanges via CCXT into ArcticDB and exposes both a REST API and a CLI.
Architecture
3-Layer Pattern
    src/tradai/data_collection/
    ├── api/                        # Presentation layer
    │   ├── routes.py               # REST endpoints (sync, freshness, symbols)
    │   ├── streaming_routes.py     # WebSocket streaming endpoints
    │   ├── schemas.py              # Request/response Pydantic models
    │   └── dependencies.py         # FastAPI dependency injection
    ├── core/                       # Business logic
    │   ├── service.py              # DataCollectionService (orchestrates sync)
    │   ├── entities.py             # Domain entities (SyncResult, FreshnessCheck)
    │   ├── factories.py            # Repository/adapter factories
    │   ├── settings.py             # Service configuration (Pydantic Settings)
    │   └── streaming/              # Real-time data streaming logic
    └── infrastructure/             # External adapters
        └── health_checkers.py      # Service health check implementations
Module Responsibilities
| Module | Purpose |
|---|---|
| `api/routes.py` | REST endpoints: `/sync`, `/sync/incremental`, `/freshness`, `/symbols` |
| `api/streaming_routes.py` | WebSocket endpoints for real-time data |
| `core/service.py` | Orchestrates data fetching, validation, and storage |
| `core/entities.py` | `SyncResult`, `FreshnessStatus` domain entities |
| `core/factories.py` | Creates exchange clients and storage adapters |
| `core/settings.py` | `DataCollectionSettings` from environment variables |
| `infrastructure/health_checkers.py` | ArcticDB and exchange connectivity checks |
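The fetch, validate, store flow attributed to `core/service.py` can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `CandleStore` protocol, the `SyncResult` fields, the `is_valid` rules, and the `InMemoryStore` test double are all assumptions made for the example. Only the names `DataCollectionService` and `SyncResult` come from this document.

```python
from dataclasses import dataclass
from typing import Callable, Protocol

# One OHLCV candle in CCXT's list layout:
# [timestamp_ms, open, high, low, close, volume]
Candle = list[float]


class CandleStore(Protocol):
    """Hypothetical storage interface; the real adapter lives in tradai-data."""

    def append(self, symbol: str, candles: list[Candle]) -> None: ...


@dataclass(frozen=True)
class SyncResult:
    """Illustrative fields only; the real entity may differ."""

    symbol: str
    fetched: int
    stored: int
    rejected: int


def is_valid(candle: Candle) -> bool:
    """Assumed sanity rules: six fields, high/low bracket open/close, volume >= 0."""
    if len(candle) != 6:
        return False
    _, open_, high, low, close, volume = candle
    return low <= min(open_, close) and high >= max(open_, close) and volume >= 0


class DataCollectionService:
    """Core layer: depends on injected interfaces, not on CCXT or ArcticDB directly."""

    def __init__(self, fetch: Callable[[str], list[Candle]], store: CandleStore) -> None:
        self._fetch = fetch
        self._store = store

    def sync(self, symbol: str) -> SyncResult:
        candles = self._fetch(symbol)
        valid = [c for c in candles if is_valid(c)]
        if valid:
            self._store.append(symbol, valid)
        return SyncResult(symbol, len(candles), len(valid), len(candles) - len(valid))


class InMemoryStore:
    """Test double standing in for the ArcticDB adapter."""

    def __init__(self) -> None:
        self.rows: dict[str, list[Candle]] = {}

    def append(self, symbol: str, candles: list[Candle]) -> None:
        self.rows.setdefault(symbol, []).extend(candles)
```

Because the service only sees a fetch callable and a store interface, the same core logic can be exercised in tests with in-memory doubles and wired to the real adapters by the factories at runtime.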
Dependencies
Libraries Used
- tradai-common: LoggerMixin, health check framework, FastAPI utilities
- tradai-data: CCXT exchange adapters, ArcticDB storage adapters
External Services
- ArcticDB (S3-backed): Time-series storage for OHLCV data
- Exchange APIs: Binance Futures/Spot via CCXT
Consumed By
- Backend service: Proxies data collection requests
- CLI: `tradai data sync`, `tradai data check-freshness`
- Lambdas: `data-collection-proxy` Lambda invokes this service
Key Design Decisions
- Incremental sync: Only data newer than the latest stored timestamp is fetched, which minimizes API calls and storage writes.
- Exchange abstraction via CCXT: All exchange interactions go through `tradai-data`'s CCXT adapter, which makes it easy to add new exchanges.
- ArcticDB for time-series: ArcticDB was chosen over PostgreSQL for OHLCV data because of its columnar storage efficiency and native S3 backend.
- Streaming support: WebSocket endpoints allow real-time data consumption for live trading scenarios.
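The incremental sync decision can be illustrated with a small sketch. It assumes candles in CCXT's list layout, a fixed 1-minute timeframe, and a hypothetical `fetch_since` callable that would wrap something like CCXT's `fetch_ohlcv(symbol, timeframe, since=...)`. A plain list stands in for the ArcticDB library here, and none of these names are taken from the service's code.

```python
from typing import Callable, Optional

# [timestamp_ms, open, high, low, close, volume], as CCXT returns OHLCV rows
Candle = list[float]

TIMEFRAME_MS = 60_000  # assumption: 1-minute candles


def incremental_sync(
    stored: list[Candle],
    fetch_since: Callable[[Optional[int]], list[Candle]],
) -> list[Candle]:
    """Append only candles newer than the latest stored timestamp.

    `fetch_since(since_ms)` is a hypothetical wrapper around the exchange
    adapter; `None` means nothing is stored yet, so the adapter picks its
    own start. Returns the candles that were newly appended.
    """
    latest = stored[-1][0] if stored else None
    since = None if latest is None else int(latest) + TIMEFRAME_MS
    fetched = fetch_since(since)
    # An exchange may return the boundary candle again, so filter defensively.
    new = [c for c in fetched if latest is None or c[0] > latest]
    stored.extend(new)
    return new
```

Requesting data from `latest + TIMEFRAME_MS` keeps the API call small, and the extra timestamp filter guards against duplicate writes if the exchange still returns an already-stored candle.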
Configuration
| Variable | Description | Default |
|---|---|---|
| `DATA_COLLECTION_HOST` | Server host | `0.0.0.0` |
| `DATA_COLLECTION_PORT` | Server port | `8002` |
| `DATA_COLLECTION_EXCHANGES` | Exchange configs (JSON) | Required |
| `DATA_COLLECTION_ARCTIC_S3_BUCKET` | S3 bucket for ArcticDB | Required |
| `DATA_COLLECTION_ARCTIC_LIBRARY` | ArcticDB library name | `futures` |
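As a rough illustration of how the variables above map to a settings object, the sketch below uses only the standard library. The service itself uses Pydantic Settings, so this is a stand-in technique, and the field names, the `load_settings` helper, and the shape of the exchanges JSON are assumptions for the example.

```python
import json
import os
from dataclasses import dataclass
from typing import Any, Mapping

PREFIX = "DATA_COLLECTION_"
REQUIRED = ("EXCHANGES", "ARCTIC_S3_BUCKET")


@dataclass(frozen=True)
class DataCollectionSettings:
    exchanges: Any  # parsed JSON; its exact schema is not specified here
    arctic_s3_bucket: str
    host: str = "0.0.0.0"
    port: int = 8002
    arctic_library: str = "futures"


def load_settings(env: Mapping[str, str] = os.environ) -> DataCollectionSettings:
    """Read the documented variables and fail fast if a required one is missing."""
    missing = [PREFIX + name for name in REQUIRED if PREFIX + name not in env]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return DataCollectionSettings(
        exchanges=json.loads(env[PREFIX + "EXCHANGES"]),
        arctic_s3_bucket=env[PREFIX + "ARCTIC_S3_BUCKET"],
        host=env.get(PREFIX + "HOST", "0.0.0.0"),
        port=int(env.get(PREFIX + "PORT", "8002")),
        arctic_library=env.get(PREFIX + "ARCTIC_LIBRARY", "futures"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to test, for example `load_settings({"DATA_COLLECTION_EXCHANGES": "{}", "DATA_COLLECTION_ARCTIC_S3_BUCKET": "my-bucket"})`, where the bucket name is a placeholder.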
API Reference
See Data Collection README for complete endpoint documentation.