ArcticDB Data Model¶
Overview¶
TradAI stores and retrieves time-series market data using ArcticDB, an open-source DataFrame database backed by S3. ArcticDB provides versioned, columnar storage optimized for financial time-series workloads, with native support for date-range queries and batch operations.
All OHLCV (Open, High, Low, Close, Volume) data flows through the DataAdapter protocol, with ArcticAdapter as the production implementation.
Storage Architecture¶
S3 Bucket¶
Each environment has a dedicated S3 bucket following the naming convention tradai-arcticdb-{env}.
For example: tradai-arcticdb-dev, tradai-arcticdb-staging, tradai-arcticdb-prod.
This is defined in the infrastructure config (infra/shared/tradai_infra_shared/config.py):
The bucket has versioning enabled and no lifecycle policy (data is retained indefinitely).
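The naming convention can be sketched with a small helper (the function name is illustrative, not from the codebase; only the tradai-arcticdb-{env} pattern is from the infrastructure config):

```python
def arctic_bucket_name(env: str) -> str:
    """Return the per-environment bucket name following tradai-arcticdb-{env}."""
    if env not in {"dev", "staging", "prod"}:
        raise ValueError(f"unknown environment: {env}")
    return f"tradai-arcticdb-{env}"
```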
Connection String¶
ArcticDB connects to S3 using a URI of the form:
For LocalStack or MinIO development environments, the adapter switches to plain s3:// with explicit credentials and path-style addressing:
s3://<host>:<bucket>?port=4566&access=test&secret=test&region=eu-central-1&use_virtual_addressing=false
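Assembling that development connection string can be sketched as follows (the function name and its defaults are illustrative; only the URI shape and query parameters come from the adapter):

```python
def localstack_uri(host: str, bucket: str, *, port: int = 4566,
                   access: str = "test", secret: str = "test",
                   region: str = "eu-central-1") -> str:
    """Build a path-style s3:// URI with explicit credentials for local dev."""
    return (
        f"s3://{host}:{bucket}?port={port}&access={access}&secret={secret}"
        f"&region={region}&use_virtual_addressing=false"
    )
```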
Library¶
Within a bucket, data is organized into libraries. The default library name is ohlcv, configured via the ARCTIC_LIBRARY setting. The library is created automatically on first access (create_if_missing=True).
Library names are normalized to lowercase by the settings validator.
Data Schema¶
Each symbol is stored as a separate ArcticDB symbol entry. The DataFrame for each symbol has the following structure:
| Column | Dtype | Description |
|---|---|---|
| date | datetime | Candle timestamp (UTC, used as DataFrame index) |
| open | float64 | Opening price |
| high | float64 | Highest price in the period |
| low | float64 | Lowest price in the period |
| close | float64 | Closing price |
| volume | float64 | Trading volume |
Index column
When stored in ArcticDB, the date column becomes the DataFrame index. On read, it is reset back to a regular column before being wrapped in OHLCVData.
The in-memory representation (OHLCVData) adds a symbol column to identify which trading pair each row belongs to. This column is stripped before writing to ArcticDB and re-added on read.
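The round trip between the in-memory and stored representations can be sketched with plain pandas (function names are illustrative; the adapter performs the equivalent transformations internally):

```python
import pandas as pd

def to_storage(frame: pd.DataFrame) -> pd.DataFrame:
    """Write path: drop the in-memory 'symbol' column, promote 'date' to the index."""
    return frame.drop(columns=["symbol"]).set_index("date")

def from_storage(frame: pd.DataFrame, symbol: str) -> pd.DataFrame:
    """Read path: reset 'date' back to a column and re-add the 'symbol' column."""
    out = frame.reset_index()
    out["symbol"] = symbol
    return out
```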
Required Columns¶
The OHLCVData entity enforces these seven required columns: date, open, high, low, close, volume, and symbol.
Symbol Naming¶
ArcticDB symbol names cannot contain / or : characters. The adapter normalizes trading symbols using a double-underscore separator (__).
Normalization Rules¶
| Trading Symbol | ArcticDB Symbol | Type |
|---|---|---|
| BTC/USDT:USDT | BTC__USDT__USDT | Futures |
| ETH/USDT:USDT | ETH__USDT__USDT | Futures |
| BTC/USDT | BTC__USDT | Spot |
Normalization (write path): replaces / and : with __.
Denormalization (read path): splits on __ and reconstructs the trading format:
- 3+ parts: futures format BASE/QUOTE:SETTLE (e.g., BTC__USDT__USDT becomes BTC/USDT:USDT)
- 2 parts: spot format BASE/QUOTE (e.g., BTC__USDT becomes BTC/USDT)
- 1 part: returned as-is
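The two mappings can be sketched as follows (a minimal sketch; the adapter's actual function names may differ):

```python
def normalize_symbol(trading_symbol: str) -> str:
    """Write path: replace '/' and ':' with '__' for ArcticDB-safe names."""
    return trading_symbol.replace("/", "__").replace(":", "__")

def denormalize_symbol(arctic_symbol: str) -> str:
    """Read path: split on '__' and rebuild the trading format."""
    parts = arctic_symbol.split("__")
    if len(parts) >= 3:                      # futures: BASE/QUOTE:SETTLE
        return f"{parts[0]}/{parts[1]}:{parts[2]}"
    if len(parts) == 2:                      # spot: BASE/QUOTE
        return f"{parts[0]}/{parts[1]}"
    return arctic_symbol                     # single part: returned as-is
```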
Read/Write Patterns¶
Write: Single Symbol (save)¶
The save() method uses upsert semantics:
- Groups the OHLCVData DataFrame by symbol.
- For each symbol, drops the symbol column and sets date as the index.
- Checks if the symbol already exists in the library:
    - Existing symbol: calls library.update() with upsert=True to merge new rows by index.
    - New symbol: calls library.write() to create the entry.
- Attaches metadata (see Metadata below).
- Prunes previous versions (prune_previous_versions=True) to avoid unbounded storage growth.
Write: Batch (save_batch)¶
The save_batch() method uses library.write_batch() for 2-3x faster writes when saving multiple symbols. Unlike save(), batch write replaces existing data rather than upserting. Use this for initial data loads.
Read: Batch (load)¶
The load() method uses library.read_batch() for efficient multi-symbol loading:
- Builds a ReadRequest per symbol with the requested (start, end) date range.
- Executes a batch read.
- For each successful result, resets the index (moving date back to a column) and re-inserts the denormalized symbol column.
- Concatenates all DataFrames and wraps in OHLCVData.
Symbols that fail to read (e.g., not found) are silently skipped. If no symbols return data, a DataNotFoundError is raised.
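The batch-read flow can be sketched as below. The ReadRequest, DataError, and ReadResult classes here are simplified stand-ins for ArcticDB's result types, and FakeBatchLibrary is a test double, so the whole sketch is self-contained:

```python
from dataclasses import dataclass
import pandas as pd

@dataclass
class ReadRequest:          # stand-in for ArcticDB's per-symbol read request
    symbol: str
    date_range: tuple

@dataclass
class DataError:            # stand-in for a failed per-symbol read
    symbol: str

@dataclass
class ReadResult:           # stand-in for a successful per-symbol read
    data: pd.DataFrame

class DataNotFoundError(Exception):
    pass

def load(library, symbols: list[str], start, end) -> pd.DataFrame:
    """Sketch of the batch-read flow: request, skip failures, concatenate."""
    requests = [ReadRequest(s.replace("/", "__").replace(":", "__"), (start, end))
                for s in symbols]
    frames = []
    for trading_symbol, result in zip(symbols, library.read_batch(requests)):
        if isinstance(result, DataError):   # failed reads are silently skipped
            continue
        df = result.data.reset_index()
        df["symbol"] = trading_symbol
        frames.append(df)
    if not frames:
        raise DataNotFoundError("no symbols returned data")
    return pd.concat(frames, ignore_index=True)

class FakeBatchLibrary:
    """In-memory stand-in returning one ReadResult or DataError per request."""
    def __init__(self, store: dict[str, pd.DataFrame]):
        self.store = store

    def read_batch(self, requests):
        return [ReadResult(self.store[r.symbol]) if r.symbol in self.store
                else DataError(r.symbol) for r in requests]
```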
Incremental Sync¶
The data-collection service supports incremental sync to avoid re-fetching historical data:
- get_latest_date() reads metadata for each symbol to find the last stored candle date.
- The CoverageChecker compares this against the requested date range.
- Only symbols with incomplete coverage are fetched from the exchange API.
- New data is upserted via save(), extending the existing time series.
Metadata¶
Each symbol write includes a metadata dictionary attached to the ArcticDB entry. The current schema is version 2:
| Field | Type | Description |
|---|---|---|
| metadata_version | int | Schema version (currently 2) |
| last_query_date | string | ISO 8601 timestamp of when the exchange API was queried |
| last_candle_date | string | ISO 8601 timestamp of the latest candle in the data |
| timeframe | string | CCXT timeframe string (e.g., "1h", "1d"). Optional; absent in legacy data. |
When reading the latest date for incremental sync, the adapter prefers last_candle_date and falls back to last_query_date for backwards compatibility with pre-version-2 data.
Versioning¶
ArcticDB supports automatic versioning of symbol data. Each write or update creates a new version. TradAI uses prune_previous_versions=True on all write operations, which means only the latest version is retained. This prevents unbounded growth of version history in S3.
ArcticDB's internal versioning is separate from the metadata_version field, which tracks the schema of the metadata dictionary itself.
Concurrent Access¶
ArcticDB relies on S3's strong read-after-write consistency (available since December 2020). Key behaviors:
- Multiple readers: fully supported with no coordination needed.
- Single writer per symbol: the adapter does not implement locking. In practice, each symbol is owned by one data-collection service instance at a time.
- Batch operations: write_batch and read_batch are atomic per-symbol but not across symbols. Individual symbol failures in a batch are reported as DataError entries in the result list without failing the entire batch.
Platform Support¶
ArcticDB is only available on Linux x86_64 and Windows. For macOS ARM development:
- The create_data_adapter() factory automatically returns InMemoryAdapter on macOS.
- Tests inject a mock library via the arctic_library constructor parameter.
- The ArcticLibraryProtocol in libs/tradai-data/src/tradai/data/infrastructure/adapters/protocols.py enables type-safe mocking.
Configuration¶
Environment Variables¶
All ArcticDB settings are configured via environment variables with the service prefix (e.g., DATA_COLLECTION_ for the data-collection service). The ArcticSettingsMixin provides the base fields.
| Variable | Default | Description |
|---|---|---|
| {PREFIX}_ARCTIC_S3_BUCKET | (required) | S3 bucket name (e.g., tradai-arcticdb-dev) |
| {PREFIX}_ARCTIC_LIBRARY | ohlcv | ArcticDB library name |
| {PREFIX}_ARCTIC_S3_ENDPOINT | s3.{region}.amazonaws.com | S3 endpoint (use localstack:4566 for local dev) |
| {PREFIX}_ARCTIC_REGION | eu-central-1 | AWS region |
| {PREFIX}_ARCTIC_USE_SSL | true | Use TLS (set false for LocalStack) |
| {PREFIX}_ARCTIC_ACCESS_KEY | (none) | Explicit S3 access key (LocalStack/MinIO only) |
| {PREFIX}_ARCTIC_SECRET_KEY | (none) | Explicit S3 secret key (LocalStack/MinIO only) |
| {PREFIX}_ARCTIC_USE_VIRTUAL_ADDRESSING | true | Virtual-hosted style URLs (set false for LocalStack) |
Where {PREFIX} is the service-specific env var prefix: DATA_COLLECTION, STRATEGY_SERVICE, etc.
Safety Checks¶
- LocalStack in production: If use_ssl=True (indicating production) and the endpoint contains localstack or localhost:4566, the adapter raises a ConfigurationError.
- Non-dev environments: The DataCollectionSettings validator rejects LocalStack endpoints when ENVIRONMENT is not local or dev.
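The first guard can be sketched as follows (ConfigurationError is defined locally here for self-containment; the real exception and function live in the TradAI codebase):

```python
class ConfigurationError(Exception):
    """Raised when a LocalStack endpoint leaks into a production-like config."""

def guard_against_localstack(endpoint: str, use_ssl: bool) -> None:
    """Reject LocalStack/localhost endpoints when SSL is on (i.e., production)."""
    if use_ssl and ("localstack" in endpoint or "localhost:4566" in endpoint):
        raise ConfigurationError(
            f"refusing to use local endpoint {endpoint!r} with use_ssl=True"
        )
```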
Key Source Files¶
| File | Description |
|---|---|
| libs/tradai-data/src/tradai/data/infrastructure/adapters/arctic_adapter.py | ArcticAdapter implementation (read, write, batch, symbol normalization) |
| libs/tradai-data/src/tradai/data/infrastructure/adapters/protocols.py | ArcticLibraryProtocol and related protocols for DI/mocking |
| libs/tradai-data/src/tradai/data/infrastructure/adapters/__init__.py | create_data_adapter() factory with platform detection |
| libs/tradai-data/src/tradai/data/core/entities.py | OHLCVData, DateRange, SymbolList, Timeframe value objects |
| libs/tradai-data/src/tradai/data/core/repositories.py | DataAdapter protocol (storage interface) |
| libs/tradai-data/src/tradai/data/core/coverage.py | CoverageChecker for incremental sync decisions |
| libs/tradai-common/src/tradai/common/settings_mixins.py | ArcticSettingsMixin with shared config fields |
| services/data-collection/src/tradai/data_collection/core/factories.py | create_arctic_adapter() factory wiring settings to adapter |
| services/data-collection/src/tradai/data_collection/core/settings.py | DataCollectionSettings with ArcticDB validation |
| services/data-collection/src/tradai/data_collection/core/service.py | DataCollectionService orchestrating sync flows |
| infra/shared/tradai_infra_shared/config.py | S3 bucket naming (tradai-arcticdb-{env}) and bucket config |