TradAI Configuration Management

Version: 1.0.0 | Date: 2026-03-28 | Status: CURRENT

1. TL;DR

TradAI uses two distinct configuration systems that operate at different lifecycle stages, plus a dedicated path for secrets:

  • Infrastructure configuration (infra/shared/tradai_infra_shared/config.py) defines AWS resources at deployment time via Pulumi. It sets VPC CIDRs, ECS task sizes, S3 bucket names, DynamoDB tables, and Lambda schedules. Values are environment-specific (dev/staging/prod) and resolved at pulumi up.
  • Application configuration runs at service startup and runtime. YAML config files live in S3, loaded by S3ConfigRepository, merged via ConfigMergeService (OmegaConf deep merge), and versioned in DynamoDB through ConfigVersionService. Services bind environment variables to Pydantic BaseSettings subclasses with validation, type coercion, and immutability (frozen=True).
  • Secrets are stored in AWS Secrets Manager with thread-safe retrieval, never hardcoded. RDS uses managed password rotation.

2. Configuration Systems Overview

graph TD
    subgraph "Infrastructure Config (Deploy Time)"
        A[infra/.env] -->|AWS_PROFILE, PULUMI_CONFIG_PASSPHRASE| B[Pulumi CLI]
        C[infra/shared/config.py] -->|VPC, Services, S3, DynamoDB| B
        D[pulumi config set] -->|Overrides| B
        B -->|pulumi up| E[AWS Resources]
    end

    subgraph "Application Config (Runtime)"
        F[S3 Bucket: tradai-configs-ENV] -->|YAML download| G[S3ConfigRepository]
        G -->|raw dict| H[ConfigMergeService]
        I[Runtime Overrides] -->|OmegaConf deep merge| H
        H -->|validated dict| J[StrategyConfigLoader]
        J -->|StrategyConfig entity| K[Services]

        L[.env files] -->|env vars| M[Pydantic BaseSettings]
        M -->|validated, frozen| K

        N[DynamoDB: config-versions] <-->|CRUD| O[ConfigVersionService]
        O -->|versioning, lifecycle| K
    end

    subgraph "Secrets (Runtime)"
        P[AWS Secrets Manager] -->|get_secret| Q[Thread-safe Client]
        Q -->|JSON dict| K
    end

    E -.->|bucket names, table names, ARNs| L

3. Infrastructure Configuration

Infrastructure configuration lives in infra/shared/tradai_infra_shared/config.py and is consumed by Pulumi stacks at deployment time.

3.1 Config Structure

The file defines constants, typed dictionaries, and helper functions consumed by four Pulumi stacks (persistent, foundation, compute, edge):

| Section | Constants | Purpose |
| --- | --- | --- |
| Environment | ENVIRONMENT, PROJECT_NAME, AWS_REGION | Stack identity; region from Pulumi config, env var, or default |
| Network | VPC_CIDR, SUBNETS, AVAILABILITY_ZONES | VPC layout with public/private/database subnets |
| Services | SERVICES dict | Per-service CPU, memory, port, desired count, health check path |
| Storage | S3_BUCKETS, DYNAMODB_TABLES, RDS_CONFIG | Resource names with {project}-{name}-{env} convention |
| Compute | EC2_CONSOLIDATED_CONFIG, NAT_INSTANCE_TYPE | Consolidated EC2 mode for dev/staging cost savings |
| ECR | ECR_REPOS, LAMBDA_ECR_REPOS, STRATEGY_ECR_REPOS | Container image repositories |
| API | API_ROUTES, API_THROTTLING, ALB_PATH_PATTERNS | API Gateway and ALB routing configuration |
| Auth | COGNITO_CONFIG | Password policy, MFA enforcement |
| Schedules | LAMBDA_SCHEDULES | EventBridge schedule expressions for periodic Lambdas |

3.2 Environment-Specific Behavior

Infrastructure config uses ENVIRONMENT = pulumi.get_stack() for conditional logic:

# Example: prod gets higher specs
"desired_count": 1 if ENVIRONMENT != "prod" else 2,
"instance_class": "db.t4g.micro" if ENVIRONMENT != "prod" else "db.t4g.small",
"multi_az": ENVIRONMENT == "prod",

3.3 Pulumi Config Overrides

Deploy-time overrides are set with pulumi config set and take effect on the next pulumi up:

| Override | Command | Effect |
| --- | --- | --- |
| Consolidated mode | pulumi config set consolidated_mode "false" | Switch from EC2 to ECS Fargate |
| Skip ALB | pulumi config set skip_alb "true" | Skip ALB for restricted accounts |
| Custom AMI | pulumi config set custom_ami_id "ami-xxx" | Use Packer-built AMI |
| Lambda schedules | pulumi config set drift_monitor_schedule "rate(1 day)" | Override check frequency |
| Strategy ECR repos | pulumi config set --path 'strategy_ecr_repos[0]' 'my-strat' | Register strategy images |
| CORS origins | pulumi config set --path 'cors_allowed_origins[0]' 'https://...' | Set allowed origins |

Config Sourcing

infra/.env is sourced automatically by justfile recipes (set -a && source ../.env && set +a) before any Pulumi command. It provides AWS_PROFILE, PULUMI_CONFIG_PASSPHRASE, and S3_PULUMI_BACKEND_URL.

4. Application Configuration

Application configuration is loaded at runtime from S3 and optionally merged with overrides. Three layers work together.

4.1 S3ConfigRepository

File: libs/tradai-common/src/tradai/common/aws/s3_config_repository.py

Provides CRUD for YAML config files in S3. Uses OmegaConf for YAML parsing to maintain consistency with Hydra patterns.

| Operation | Method | Details |
| --- | --- | --- |
| Read | download(config_name) | Downloads {prefix}{name}.yaml, parses via OmegaConf |
| Write | upload(config_name, config) | Serializes dict to YAML, uploads to S3 |
| List | list_configs() | Paginates S3 listing, strips .yaml/.yml extensions |
| Exists | exists(config_name) | HEAD object check |
| Merge+Write | merge_and_upload(config_name, updates) | Downloads, shallow-merges, re-uploads |

Key design decisions:

  • Temp file pattern: Downloads to tempfile, parses, then deletes (no local state)
  • DI-friendly: Optional client parameter for test injection
  • Sanitized errors: User input in error messages passes through sanitize_for_display to prevent XSS
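
Because the repository accepts an injected client, tests can also swap the whole repository for an in-memory stand-in. The sketch below is a hypothetical test double, not the project's class: the name InMemoryConfigRepository, the default prefix, and the return value of merge_and_upload are assumptions. It mirrors the operations in the table, including the shallow merge, so the difference from the deep merge in ConfigMergeService is visible.

```python
import copy
from typing import Any


class InMemoryConfigRepository:
    """Hypothetical test double mirroring the S3ConfigRepository operations."""

    def __init__(self, prefix: str = "configs/") -> None:  # prefix is an assumption
        self._prefix = prefix
        self._objects: dict[str, dict[str, Any]] = {}

    def _key(self, config_name: str) -> str:
        # Same key layout as the table: {prefix}{name}.yaml
        return f"{self._prefix}{config_name}.yaml"

    def upload(self, config_name: str, config: dict[str, Any]) -> None:
        self._objects[self._key(config_name)] = copy.deepcopy(config)

    def download(self, config_name: str) -> dict[str, Any]:
        key = self._key(config_name)
        if key not in self._objects:
            raise KeyError(f"Config not found: {key}")
        return copy.deepcopy(self._objects[key])

    def exists(self, config_name: str) -> bool:
        return self._key(config_name) in self._objects

    def list_configs(self) -> list[str]:
        names = []
        for key in self._objects:
            stem = key[len(self._prefix):]
            for ext in (".yaml", ".yml"):
                if stem.endswith(ext):
                    stem = stem[: -len(ext)]
                    break
            names.append(stem)
        return sorted(names)

    def merge_and_upload(self, config_name: str, updates: dict[str, Any]) -> dict[str, Any]:
        # Shallow merge: a top-level key in `updates` replaces the whole value.
        merged = {**self.download(config_name), **updates}
        self.upload(config_name, merged)
        return merged
```

With a shallow merge, updating {"exchange": {"name": "kraken"}} drops any sibling keys that were nested under exchange, which is why nested overrides are better routed through the deep-merge path.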

4.2 ConfigMergeService

File: libs/tradai-common/src/tradai/common/config/merge.py

Orchestrates config loading, OmegaConf deep merge, and validation.

S3ConfigRepository.download() -> base dict
                                    |
                    OmegaConf.merge(base, overrides)
                                    |
                          validate_config()
                           /              \
                  required_fields     Pydantic schema
                  (presence check)    (type + constraint)
                                    |
                              merged dict

  • Deep merge: Nested dictionaries merge recursively; override values take precedence
  • Dual validation: Required-field presence check AND optional Pydantic schema validation
  • Thread-safe: Lazy repository initialization with double-checked locking
  • Convenience method: merge_configs() combines load + merge + validate in one call
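
The merge-then-validate flow can be illustrated without OmegaConf. The following is a standard-library sketch of the same idea, not the service's code: deep_merge stands in for OmegaConf.merge, and merge_and_validate and validate_required are hypothetical names. Only the required-field presence check is shown; the optional Pydantic schema step is omitted.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge nested dicts; override values win. Inputs are not mutated."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists are replaced, not combined.
            merged[key] = copy.deepcopy(value)
    return merged


def validate_required(config: dict[str, Any], required_fields: list[str]) -> None:
    """Presence check only; type and constraint checks belong to a schema."""
    missing = [field for field in required_fields if field not in config]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")


def merge_and_validate(
    base: dict[str, Any],
    overrides: dict[str, Any],
    required_fields: list[str],
) -> dict[str, Any]:
    merged = deep_merge(base, overrides)
    validate_required(merged, required_fields)
    return merged
```

For example, merging {"exchange": {"name": "kraken"}} over a base that also defines exchange.fee keeps the fee and changes only the name.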

4.3 StrategyConfigLoader

File: libs/tradai-common/src/tradai/common/config/loader.py

High-level facade wrapping ConfigMergeService with strategy-specific concerns:

| Method | Flow |
| --- | --- |
| load(name) | S3 download -> validate -> StrategyConfig entity |
| load_with_overrides(name, overrides) | S3 download -> deep merge -> validate -> entity |
| load_from_mlflow(name, stage) | MLflow registry -> extract tags -> S3 download -> merge metadata -> entity |
| load_with_fallback(name, stage) | Try MLflow first, fall back to direct S3 on failure |

StrategyConfig is a frozen Pydantic model with validated fields:

  • name (min 1 char), version (semver pattern), timeframe (pattern ^\d+[mhd]$)
  • pairs, stake_currency, exchange, parameters, buy_params, sell_params, freqai
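
To make the field constraints concrete, here is a standard-library approximation of a frozen, validated config. It is a sketch only: the real StrategyConfig is a Pydantic model, StrategyConfigSketch is a hypothetical name, and the semver regex is an assumption since the document does not give the exact pattern. The timeframe pattern is the one listed above.

```python
import re
from dataclasses import FrozenInstanceError, dataclass

_TIMEFRAME_RE = re.compile(r"^\d+[mhd]$")      # pattern from the field list above
_SEMVER_RE = re.compile(r"^\d+\.\d+\.\d+$")    # assumed; the real pattern may differ


@dataclass(frozen=True)
class StrategyConfigSketch:
    """Hypothetical stand-in covering three of the validated fields."""

    name: str
    version: str
    timeframe: str

    def __post_init__(self) -> None:
        if len(self.name) < 1:
            raise ValueError("name must have at least 1 character")
        if not _SEMVER_RE.fullmatch(self.version):
            raise ValueError(f"version is not semver: {self.version!r}")
        if not _TIMEFRAME_RE.fullmatch(self.timeframe):
            raise ValueError(f"timeframe must look like 5m, 1h, or 1d: {self.timeframe!r}")


config = StrategyConfigSketch(name="momentum", version="1.2.0", timeframe="15m")
try:
    config.timeframe = "1h"  # type: ignore[misc]
except FrozenInstanceError:
    pass  # frozen instances reject attribute assignment
```

Changing a value therefore means building a new instance, which matches the immutability the loader relies on.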

MLflow integration uses a table-driven tag extractor pattern (_TAG_EXTRACTORS) to map model version tags to config overrides with type-specific parsers (str, csv_list, int, bool).
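
A table-driven extractor of this kind might look like the sketch below. The tag names, config keys, and the accepted truthy strings are hypothetical; only the idea of one table row per tag with a type-specific parser comes from the description above.

```python
from typing import Any, Callable


def _parse_csv_list(raw: str) -> list[str]:
    return [item.strip() for item in raw.split(",") if item.strip()]


def _parse_bool(raw: str) -> bool:
    return raw.strip().lower() in {"1", "true", "yes"}  # accepted values are an assumption


# tag name -> (config key, parser). All names below are illustrative only.
TAG_EXTRACTORS: dict[str, tuple[str, Callable[[str], Any]]] = {
    "timeframe": ("timeframe", str),
    "pairs": ("pairs", _parse_csv_list),
    "max_open_trades": ("max_open_trades", int),
    "freqai_enabled": ("freqai_enabled", _parse_bool),
}


def extract_overrides(tags: dict[str, str]) -> dict[str, Any]:
    """Map model-version tags to typed config overrides; unknown tags are ignored."""
    overrides: dict[str, Any] = {}
    for tag, (config_key, parser) in TAG_EXTRACTORS.items():
        if tag in tags:
            overrides[config_key] = parser(tags[tag])
    return overrides
```

Supporting a new tag is then a one-line table change rather than another branch in the loader.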

4.4 Config Versioning

File: libs/tradai-common/src/tradai/common/config/service.py

ConfigVersionService provides content-addressable versioning with DynamoDB persistence:

  • Deduplication: SHA256 hash of normalized JSON; identical configs reuse existing versions
  • Version IDs: Format v{number}-{hash[:8]} (e.g., v3-a1b2c3d4)
  • Atomic activation: Activating a version atomically deprecates the previous active version
  • Repository: ConfigVersionRepository uses composite key (strategy_name, config_id) with GSIs for hash lookup and status queries
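
Content addressing and the version ID format can be sketched in a few lines. The normalization shown here (sorted keys, compact separators) is an assumption; the document states only that the hash is SHA256 over normalized JSON. The function names are hypothetical.

```python
import hashlib
import json
from typing import Any


def config_hash(config: dict[str, Any]) -> str:
    """SHA256 over a canonical JSON form, so key order does not affect the hash."""
    normalized = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def version_id(version_number: int, digest: str) -> str:
    """Build an ID in the v{number}-{hash[:8]} format."""
    return f"v{version_number}-{digest[:8]}"


first = config_hash({"timeframe": "1h", "exchange": {"name": "binance"}})
second = config_hash({"exchange": {"name": "binance"}, "timeframe": "1h"})
```

Here first and second are equal, so a deduplicating service would reuse the existing version instead of creating a new one.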

5. Secret Management

File: libs/tradai-common/src/tradai/common/aws/secrets_manager.py

Secret Handling Rules

  • Never hardcode secret names, ARNs, or values in source code
  • Pass secret names as parameters or read from environment variables
  • Use get_secret(secret_name), which returns the secret parsed from JSON into a dict
  • All retrieval is thread-safe via threading.Lock

5.1 Secret Retrieval Flow

Environment Variable (e.g., MLFLOW_SECRET_NAME)
        |
        v
get_secret(secret_name, client=None)
        |
        v
get_secrets_client()  ──> get_aws_client("secretsmanager")
        |
        v
sm_client.get_secret_value(SecretId=...)
        |
        v
json.loads(response["SecretString"]) -> dict

5.2 Error Handling

| AWS Error Code | Raised Exception |
| --- | --- |
| ResourceNotFoundException | ExternalServiceError("Secret not found: ...") |
| AccessDeniedException | ExternalServiceError("Access denied: ...") |
| InvalidRequestException | ExternalServiceError("Invalid request: ...") |
| JSON decode failure | ExternalServiceError("... not valid JSON") |
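
The retrieval flow and the error mapping can be combined in one short sketch. It assumes an injected client that exposes get_secret_value, as boto3's Secrets Manager client does. ExternalServiceError is redefined locally as a stand-in, get_secret_sketch is a hypothetical name, and the placement of the lock around the call is an assumption. The real function also falls back to get_secrets_client() when no client is passed, which is left out here.

```python
import json
import threading
from typing import Any


class ExternalServiceError(Exception):
    """Local stand-in for the project's exception of the same name."""


_ERROR_MESSAGES = {
    "ResourceNotFoundException": "Secret not found",
    "AccessDeniedException": "Access denied",
    "InvalidRequestException": "Invalid request",
}
_lock = threading.Lock()


def get_secret_sketch(secret_name: str, client: Any) -> dict[str, Any]:
    try:
        with _lock:  # serialize access to the shared client (assumed placement)
            response = client.get_secret_value(SecretId=secret_name)
    except Exception as exc:
        # botocore's ClientError carries the AWS code at exc.response["Error"]["Code"].
        code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
        message = _ERROR_MESSAGES.get(code)
        if message is None:
            raise  # unknown failures propagate unchanged
        raise ExternalServiceError(f"{message}: {secret_name}") from exc
    try:
        return json.loads(response["SecretString"])
    except json.JSONDecodeError as exc:
        raise ExternalServiceError(f"Secret {secret_name} is not valid JSON") from exc
```

In a test, any object with a matching get_secret_value method can be passed as client, so no AWS credentials are needed.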

5.3 Rotation Status

get_rotation_status(secret_id) queries Secrets Manager for rotation metadata, returning:

  • rotation_enabled (bool), rotation_lambda_arn, rotation_interval_days
  • last_rotated_date, next_rotation_date (ISO format strings)
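
One plausible way to produce these fields is to reshape the response of the Secrets Manager DescribeSecret call. The sketch below assumes that is the underlying API call; the document does not say so. The function name is hypothetical, and it works on a plain response dict so it runs without AWS access.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def _iso(value: Optional[datetime]) -> Optional[str]:
    return value.isoformat() if value is not None else None


def parse_rotation_status(describe_response: dict[str, Any]) -> dict[str, Any]:
    """Map DescribeSecret response keys to the rotation fields listed above."""
    rules = describe_response.get("RotationRules", {})
    return {
        "rotation_enabled": bool(describe_response.get("RotationEnabled", False)),
        "rotation_lambda_arn": describe_response.get("RotationLambdaARN"),
        "rotation_interval_days": rules.get("AutomaticallyAfterDays"),
        "last_rotated_date": _iso(describe_response.get("LastRotatedDate")),
        "next_rotation_date": _iso(describe_response.get("NextRotationDate")),
    }


status = parse_rotation_status(
    {
        "RotationEnabled": True,
        "RotationRules": {"AutomaticallyAfterDays": 30},
        "LastRotatedDate": datetime(2026, 3, 1, tzinfo=timezone.utc),
    }
)
```

Fields missing from the response come back as None, so callers can distinguish a secret that has never rotated from one with rotation disabled.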

5.4 RDS Managed Passwords

RDS credentials use AWS-managed password rotation configured in infrastructure:

# From infra config
RDS_CONFIG = {
    "engine": "postgres",
    "engine_version": "15.13",
    "backup_retention": 7 if ENVIRONMENT != "prod" else 14,
    # Password managed by Secrets Manager with automatic rotation
}

Injection Safety

The get_secret() function uses _client_factory.get_aws_client() with optional DI for testing. In production, the factory creates real boto3 clients. In tests, pass a mock client directly.

6. Service Settings Pattern

File: libs/tradai-common/src/tradai/common/base_settings.py

All service settings classes inherit from tradai.common.Settings (which extends Pydantic BaseSettings):

pydantic_settings.BaseSettings
        |
tradai.common.Settings (frozen, extra="ignore")
    - service_name, service_version, host, port, debug, log_level, cors_origins
    - prepare_config() for S3/local config loading
    - from_hydra_cfg() for Hydra integration
        |
    +-- MLflowSettingsMixin (mlflow_tracking_uri, credentials, verify_ssl)
    +-- CognitoSettingsMixin (user_pool_id, client_id, region, auth_enabled)
        |
BackendSettings (env_prefix="BACKEND_")
    - environment, data_collection_url, strategy_service_url
    - executor_mode, ecs_cluster, stepfunctions_state_machine_arn
    - s3_config_bucket, dynamodb_table_name, aws_region
    - Validators: URL format, SQS queue URL, S3 bucket name, ARN format

6.1 Key Design Decisions

| Decision | Implementation |
| --- | --- |
| Immutable | frozen=True -- settings cannot change after creation |
| Env prefix | Each service uses its own prefix (e.g., BACKEND_, STRATEGY_) |
| Alias chains | AliasChoices for flexible env var names (e.g., BACKEND_AWS_REGION or AWS_REGION) |
| Singleton | @lru_cache on get_settings() -- loaded once per process |
| Validation | Field validators for URLs, ARNs, bucket names, comma-separated lists |
| Environment safety | Model validator prevents LOCAL executor mode in non-dev environments |
| Multi-source | env_file=("services/backend/.env", ".env") -- service-local first, then root |

Config Loading Priority

Pydantic Settings resolution order (highest to lowest): Init kwargs > Environment variables > Env file > Field defaults. The S3_CONFIG env var can point to a remote YAML config downloaded at startup via Settings.prepare_config().
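
The combination of immutability, alias chains, a cached accessor, and the environment safety check can be approximated with the standard library. This is a sketch, not the Pydantic-based implementation: BackendSettingsSketch and _first_env are hypothetical names, and the default values are assumptions chosen for illustration.

```python
import os
from dataclasses import dataclass
from functools import lru_cache
from typing import Mapping, Optional


def _first_env(env: Mapping[str, str], *names: str, default: str) -> str:
    """Return the first variable that is set, mimicking an alias chain."""
    for name in names:
        if name in env:
            return env[name]
    return default


@dataclass(frozen=True)
class BackendSettingsSketch:
    environment: str
    aws_region: str
    executor_mode: str

    def __post_init__(self) -> None:
        # Mirrors the environment-safety rule: local execution only in dev.
        if self.executor_mode == "local" and self.environment != "dev":
            raise ValueError("LOCAL executor mode is only allowed in dev")

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "BackendSettingsSketch":
        env = os.environ if env is None else env
        return cls(
            environment=_first_env(env, "BACKEND_ENVIRONMENT", default="dev"),
            aws_region=_first_env(env, "BACKEND_AWS_REGION", "AWS_REGION", default="us-east-1"),
            executor_mode=_first_env(env, "BACKEND_EXECUTOR_MODE", default="local").lower(),
        )


@lru_cache(maxsize=1)
def get_settings() -> BackendSettingsSketch:
    """Build settings once per process; later calls return the cached instance."""
    return BackendSettingsSketch.from_env()
```

Passing an explicit mapping to from_env keeps tests independent of the process environment, while get_settings gives application code the once-per-process behavior.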

7. Config Version Lifecycle

File: libs/tradai-common/src/tradai/common/entities/config_version.py

stateDiagram-v2
    [*] --> DRAFT: create_version()
    DRAFT --> ACTIVE: activate()
    DRAFT --> DEPRECATED: deprecate()
    ACTIVE --> DEPRECATED: activate(newer) / deprecate()
    DEPRECATED --> [*]: TTL auto-cleanup (90 days)

    note right of DRAFT
        Initial state.
        Content-addressable via SHA256.
        Deduplication prevents duplicate versions.
    end note

    note right of ACTIVE
        Only ONE active version per strategy.
        deployed_at timestamp set automatically.
        Activation atomically deprecates previous.
    end note

    note right of DEPRECATED
        Terminal state.
        deprecated_at + superseded_by recorded.
        DynamoDB TTL auto-deletes after 90 days.
    end note

7.1 ConfigVersion Entity

| Field | Type | Description |
| --- | --- | --- |
| strategy_name | str | DynamoDB partition key |
| config_id | str | Sort key, format v{N}-{hash[:8]} |
| config_hash | str | SHA256 of normalized JSON (64 hex chars) |
| config_data | dict | Frozen configuration content |
| status | ConfigVersionStatus | DRAFT / ACTIVE / DEPRECATED |
| version_number | int | Sequential per strategy (>= 1) |
| created_by | str | User or system identifier |
| deployed_at | datetime \| None | Set when transitioning to ACTIVE |
| deprecated_at | datetime \| None | Set when transitioning to DEPRECATED |
| superseded_by | str \| None | config_id of the version that replaced this one |
| ttl | int \| None | Unix epoch for DynamoDB auto-cleanup (90 days after deprecation) |

The entity uses DynamoDBSerializableMixin for automatic serialization and the immutable update pattern via with_status().
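
The lifecycle rules and the immutable update pattern can be sketched with a frozen dataclass. The class name, the with_status signature, and the enum string values other than "active" are assumptions; the allowed transitions and the 90-day TTL follow the state diagram.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

TTL_DAYS = 90


class ConfigVersionStatus(str, Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    DEPRECATED = "deprecated"


_ALLOWED = {
    ConfigVersionStatus.DRAFT: {ConfigVersionStatus.ACTIVE, ConfigVersionStatus.DEPRECATED},
    ConfigVersionStatus.ACTIVE: {ConfigVersionStatus.DEPRECATED},
    ConfigVersionStatus.DEPRECATED: set(),  # terminal state
}


@dataclass(frozen=True)
class ConfigVersionSketch:
    strategy_name: str
    config_id: str
    status: ConfigVersionStatus = ConfigVersionStatus.DRAFT
    deployed_at: Optional[datetime] = None
    deprecated_at: Optional[datetime] = None
    superseded_by: Optional[str] = None
    ttl: Optional[int] = None

    def with_status(
        self,
        new_status: ConfigVersionStatus,
        *,
        now: Optional[datetime] = None,
        superseded_by: Optional[str] = None,
    ) -> "ConfigVersionSketch":
        """Return a new instance in the target state; the original is unchanged."""
        if new_status not in _ALLOWED[self.status]:
            raise ValueError(f"Illegal transition: {self.status.value} -> {new_status.value}")
        now = now or datetime.now(timezone.utc)
        if new_status is ConfigVersionStatus.ACTIVE:
            return replace(self, status=new_status, deployed_at=now)
        expires = int((now + timedelta(days=TTL_DAYS)).timestamp())
        return replace(
            self, status=new_status, deprecated_at=now, superseded_by=superseded_by, ttl=expires
        )
```

Because each transition returns a new object, a caller that activates a newer version can persist the deprecated predecessor and the new active version together without either being mutated mid-operation.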

7.2 DynamoDB Schema

| Access Pattern | Key / Index | Query |
| --- | --- | --- |
| Get by ID | PK: strategy_name, SK: config_id | get_item |
| Get active | GSI status-index: PK strategy_name, SK status | query where status = "active" |
| Deduplicate | GSI config_hash-index: PK config_hash | query for existence check |
| List versions | PK: strategy_name | query with ScanIndexForward=False (newest first) |
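
Two of these access patterns are shown below as keyword arguments for the boto3 low-level DynamoDB client's query call. This is a sketch under assumptions: the table name is hypothetical (derived from the {project}-{name}-{env} convention), the Limit value is arbitrary, and the functions only build the arguments so the example runs without AWS access.

```python
from typing import Any

TABLE_NAME = "tradai-config-versions-dev"  # hypothetical table name


def active_version_query(strategy_name: str) -> dict[str, Any]:
    """Arguments for querying the status-index GSI for the active version."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": "status-index",
        "KeyConditionExpression": "strategy_name = :name AND #s = :status",
        # "status" is a DynamoDB reserved word, so it needs a name placeholder.
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":name": {"S": strategy_name},
            ":status": {"S": "active"},
        },
    }


def list_versions_query(strategy_name: str, limit: int = 20) -> dict[str, Any]:
    """Arguments for listing a strategy's versions in descending sort-key order."""
    return {
        "TableName": TABLE_NAME,
        "KeyConditionExpression": "strategy_name = :name",
        "ExpressionAttributeValues": {":name": {"S": strategy_name}},
        "ScanIndexForward": False,
        "Limit": limit,
    }
```

A caller would pass these as client.query(**active_version_query("momentum")). One point worth checking against the real schema: string sort keys are ordered by their UTF-8 bytes, so v10-... sorts before v9-... unless the number is zero-padded or results are re-sorted by version_number.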

8. Environment-Specific Overrides Table

| Setting | Dev | Staging | Prod |
| --- | --- | --- | --- |
| ECS desired_count (backend) | 1 | 1 | 2 |
| RDS instance class | db.t4g.micro | db.t4g.micro | db.t4g.small |
| RDS allocated storage | 20 GB | 20 GB | 50 GB |
| RDS multi-AZ | No | No | Yes |
| RDS backup retention | 7 days | 7 days | 14 days |
| Consolidated EC2 mode | Yes (cost savings) | Yes (cost savings) | No (ECS Fargate) |
| CloudWatch log retention | 30 days | 30 days | 90 days |
| CORS allowed origins | ["*"] | [] (none) | [] (none) |
| Live trading Spot | N/A (count=0) | N/A (count=0) | No (reliability) |
| Dry-run trading Spot | N/A (count=0) | N/A (count=0) | Yes (cost savings) |
| S3 results lifecycle | Glacier after 30d | Glacier after 30d | Glacier after 30d |
| S3 logs lifecycle | Delete after 90d | Delete after 90d | Delete after 90d |
| API throttle (default) | 100 req/s, burst 200 | 100 req/s, burst 200 | 100 req/s, burst 200 |
| API throttle (backtests) | 10 req/s, burst 20 | 10 req/s, burst 20 | 10 req/s, burst 20 |
| Local executor allowed | Yes | No (validator blocks) | No (validator blocks) |
| WS secret key | Auto-generated | Must be explicit | Must be explicit |
| Cognito MFA | Required | Required | Required |
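
Most rows in this table reduce to conditionals on the stack name. The sketch below shows that idea for a subset of the rows. The function names, the dictionary keys, and the "tradai" project prefix (inferred from the tradai-configs-ENV bucket name) are assumptions, and the real config reads ENVIRONMENT from pulumi.get_stack() rather than taking it as a parameter.

```python
from typing import Any

PROJECT_NAME = "tradai"  # assumed from the tradai-configs-ENV bucket name


def resource_name(name: str, environment: str) -> str:
    """Apply the {project}-{name}-{env} naming convention."""
    return f"{PROJECT_NAME}-{name}-{environment}"


def environment_overrides(environment: str) -> dict[str, Any]:
    """Resolve a subset of the table's rows for one environment."""
    is_prod = environment == "prod"
    return {
        "backend_desired_count": 2 if is_prod else 1,
        "rds_instance_class": "db.t4g.small" if is_prod else "db.t4g.micro",
        "rds_allocated_storage_gb": 50 if is_prod else 20,
        "rds_multi_az": is_prod,
        "rds_backup_retention_days": 14 if is_prod else 7,
        "consolidated_ec2_mode": not is_prod,
        "log_retention_days": 90 if is_prod else 30,
        "local_executor_allowed": environment == "dev",
        "config_bucket": resource_name("configs", environment),
    }
```

Keeping the conditionals in one function makes it straightforward to diff environments, for example by comparing environment_overrides("staging") with environment_overrides("prod").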

Changelog

| Version | Date | Changes |
| --- | --- | --- |
| 1.0.0 | 2026-03-28 | Initial creation: dual config system documented |

Dependencies

| If This Changes | Update This Doc |
| --- | --- |
| infra/shared/tradai_infra_shared/config.py | Infrastructure config (Section 3) |
| libs/tradai-common/src/tradai/common/config/ | Application config (Section 4) |
| libs/tradai-common/src/tradai/common/aws/secrets_manager.py | Secret management (Section 5) |
| New Pydantic Settings class added | Service settings pattern (Section 6) |