TradAI Configuration Management
Version: 1.0.0 | Date: 2026-03-28 | Status: CURRENT
1. TL;DR
TradAI uses two distinct configuration systems that operate at different lifecycle stages:
- Infrastructure configuration (infra/shared/config.py) defines AWS resources at deployment time via Pulumi. It sets VPC CIDRs, ECS task sizes, S3 bucket names, DynamoDB tables, and Lambda schedules. Values are environment-specific (dev/staging/prod) and resolved at pulumi up.
- Application configuration runs at service startup and runtime. YAML config files live in S3, loaded by S3ConfigRepository, merged via ConfigMergeService (OmegaConf deep merge), and versioned in DynamoDB through ConfigVersionService. Services bind environment variables to Pydantic BaseSettings subclasses with validation, type coercion, and immutability (frozen=True).
- Secrets are stored in AWS Secrets Manager with thread-safe retrieval, never hardcoded. RDS uses managed password rotation.
2. Configuration Systems Overview
graph TD
subgraph "Infrastructure Config (Deploy Time)"
A[infra/.env] -->|AWS_PROFILE, PULUMI_CONFIG_PASSPHRASE| B[Pulumi CLI]
C[infra/shared/config.py] -->|VPC, Services, S3, DynamoDB| B
D[pulumi config set] -->|Overrides| B
B -->|pulumi up| E[AWS Resources]
end
subgraph "Application Config (Runtime)"
F[S3 Bucket: tradai-configs-ENV] -->|YAML download| G[S3ConfigRepository]
G -->|raw dict| H[ConfigMergeService]
I[Runtime Overrides] -->|OmegaConf deep merge| H
H -->|validated dict| J[StrategyConfigLoader]
J -->|StrategyConfig entity| K[Services]
L[.env files] -->|env vars| M[Pydantic BaseSettings]
M -->|validated, frozen| K
N[DynamoDB: config-versions] <-->|CRUD| O[ConfigVersionService]
O -->|versioning, lifecycle| K
end
subgraph "Secrets (Runtime)"
P[AWS Secrets Manager] -->|get_secret| Q[Thread-safe Client]
Q -->|JSON dict| K
end
E -.->|bucket names, table names, ARNs| L
3. Infrastructure Configuration
Infrastructure configuration lives in infra/shared/tradai_infra_shared/config.py and is consumed by Pulumi stacks at deployment time.
3.1 Config Structure
The file defines constants, typed dictionaries, and helper functions consumed by four Pulumi stacks (persistent, foundation, compute, edge):
| Section | Constants | Purpose |
|---|---|---|
| Environment | ENVIRONMENT, PROJECT_NAME, AWS_REGION | Stack identity; region from Pulumi config, env var, or default |
| Network | VPC_CIDR, SUBNETS, AVAILABILITY_ZONES | VPC layout with public/private/database subnets |
| Services | SERVICES dict | Per-service CPU, memory, port, desired count, health check path |
| Storage | S3_BUCKETS, DYNAMODB_TABLES, RDS_CONFIG | Resource names with {project}-{name}-{env} convention |
| Compute | EC2_CONSOLIDATED_CONFIG, NAT_INSTANCE_TYPE | Consolidated EC2 mode for dev/staging cost savings |
| ECR | ECR_REPOS, LAMBDA_ECR_REPOS, STRATEGY_ECR_REPOS | Container image repositories |
| API | API_ROUTES, API_THROTTLING, ALB_PATH_PATTERNS | API Gateway and ALB routing configuration |
| Auth | COGNITO_CONFIG | Password policy, MFA enforcement |
| Schedules | LAMBDA_SCHEDULES | EventBridge schedule expressions for periodic Lambdas |
3.2 Environment-Specific Behavior
Infrastructure config uses ENVIRONMENT = pulumi.get_stack() for conditional logic:
# Example: prod gets higher specs
"desired_count": 1 if ENVIRONMENT != "prod" else 2,
"instance_class": "db.t4g.micro" if ENVIRONMENT != "prod" else "db.t4g.small",
"multi_az": ENVIRONMENT == "prod",
3.3 Pulumi Config Overrides
Deploy-time overrides are set with pulumi config set and take effect on the next pulumi up:
| Override | Command | Effect |
|---|---|---|
| Consolidated mode | pulumi config set consolidated_mode "false" | Switch from EC2 to ECS Fargate |
| Skip ALB | pulumi config set skip_alb "true" | Skip ALB for restricted accounts |
| Custom AMI | pulumi config set custom_ami_id "ami-xxx" | Use Packer-built AMI |
| Lambda schedules | pulumi config set drift_monitor_schedule "rate(1 day)" | Override check frequency |
| Strategy ECR repos | pulumi config set --path 'strategy_ecr_repos[0]' 'my-strat' | Register strategy images |
| CORS origins | pulumi config set --path 'cors_allowed_origins[0]' 'https://...' | Set allowed origins |
Config Sourcing
infra/.env is sourced automatically by justfile recipes (set -a && source ../.env && set +a) before any Pulumi command. It provides AWS_PROFILE, PULUMI_CONFIG_PASSPHRASE, and S3_PULUMI_BACKEND_URL.
4. Application Configuration
Application configuration is loaded at runtime from S3 and optionally merged with overrides. Three layers work together.
4.1 S3ConfigRepository
File: libs/tradai-common/src/tradai/common/aws/s3_config_repository.py
Provides CRUD for YAML config files in S3. Uses OmegaConf for YAML parsing to maintain consistency with Hydra patterns.
| Operation | Method | Details |
|---|---|---|
| Read | download(config_name) | Downloads {prefix}{name}.yaml, parses via OmegaConf |
| Write | upload(config_name, config) | Serializes dict to YAML, uploads to S3 |
| List | list_configs() | Paginates S3 listing, strips .yaml/.yml extensions |
| Exists | exists(config_name) | HEAD object check |
| Merge+Write | merge_and_upload(config_name, updates) | Downloads, shallow-merges, re-uploads |
Key design decisions:
- Temp file pattern: Downloads to tempfile, parses, then deletes (no local state)
- DI-friendly: Optional client parameter for test injection
- Sanitized errors: User input in error messages passes through sanitize_for_display to prevent XSS
4.2 ConfigMergeService
File: libs/tradai-common/src/tradai/common/config/merge.py
Orchestrates config loading, OmegaConf deep merge, and validation.
S3ConfigRepository.download() -> base dict
              |
OmegaConf.merge(base, overrides)
              |
       validate_config()
        /             \
required_fields     Pydantic schema
(presence check)    (type + constraint)
              |
         merged dict
- Deep merge: Nested dictionaries merge recursively; override values take precedence
- Dual validation: Required-field presence check AND optional Pydantic schema validation
- Thread-safe: Lazy repository initialization with double-checked locking
- Convenience method: merge_configs() combines load + merge + validate in one call
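The difference between the deep merge described here and the shallow merge used by merge_and_upload can be sketched in plain Python. This is a standard-library stand-in for OmegaConf.merge on nested dictionaries, not the service's code; deep_merge, missing_fields, and the sample keys are illustrative names.

```python
from typing import Any, Iterable


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge overrides into a copy of base; override values win."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend instead of replacing wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def missing_fields(config: dict[str, Any], required: Iterable[str]) -> list[str]:
    """Presence check only: required top-level keys absent from config."""
    return [field for field in required if field not in config]


# Hypothetical config content, used only to show the two merge behaviours.
base = {"name": "demo", "parameters": {"rsi": 14, "ema": 50}}
overrides = {"parameters": {"rsi": 21}}

deep = deep_merge(base, overrides)
shallow = {**base, **overrides}

print(deep["parameters"])     # {'rsi': 21, 'ema': 50}
print(shallow["parameters"])  # {'rsi': 21}
print(missing_fields(deep, ["name", "timeframe"]))  # ['timeframe']
```

The shallow result drops ema because the whole parameters dictionary is replaced, which is why the choice of merge matters for nested strategy parameters.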
4.3 StrategyConfigLoader
File: libs/tradai-common/src/tradai/common/config/loader.py
High-level facade wrapping ConfigMergeService with strategy-specific concerns:
| Method | Flow |
|---|---|
| load(name) | S3 download -> validate -> StrategyConfig entity |
| load_with_overrides(name, overrides) | S3 download -> deep merge -> validate -> entity |
| load_from_mlflow(name, stage) | MLflow registry -> extract tags -> S3 download -> merge metadata -> entity |
| load_with_fallback(name, stage) | Try MLflow first, fall back to direct S3 on failure |
StrategyConfig is a frozen Pydantic model with validated fields:
- name (min 1 char), version (semver pattern), timeframe (pattern ^\d+[mhd]$)
- pairs, stake_currency, exchange, parameters, buy_params, sell_params, freqai
MLflow integration uses a table-driven tag extractor pattern (_TAG_EXTRACTORS) to map model version tags to config overrides with type-specific parsers (str, csv_list, int, bool).
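A table-driven extractor of this kind might look like the sketch below. The tag names, config keys, and helper functions are hypothetical (the real _TAG_EXTRACTORS table is not shown in this document); only the four parser types and the timeframe pattern come from the text above.

```python
import re
from typing import Any, Callable

TIMEFRAME_RE = re.compile(r"^\d+[mhd]$")  # pattern quoted for StrategyConfig.timeframe


def parse_csv_list(raw: str) -> list[str]:
    """Split a comma-separated tag value, dropping empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def parse_bool(raw: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return raw.strip().lower() in {"true", "1", "yes"}


# tag name -> (config key, parser). All names here are illustrative assumptions.
TAG_EXTRACTORS: dict[str, tuple[str, Callable[[str], Any]]] = {
    "timeframe": ("timeframe", str),
    "pairs": ("pairs", parse_csv_list),
    "max_open_trades": ("max_open_trades", int),
    "dry_run": ("dry_run", parse_bool),
}


def extract_overrides(tags: dict[str, str]) -> dict[str, Any]:
    """Map known tags to typed config overrides; unknown tags are ignored."""
    overrides: dict[str, Any] = {}
    for tag, (key, parser) in TAG_EXTRACTORS.items():
        if tag in tags:
            overrides[key] = parser(tags[tag])
    return overrides


tags = {"timeframe": "1h", "pairs": "BTC/USDT, ETH/USDT", "dry_run": "true", "owner": "x"}
overrides = extract_overrides(tags)
print(overrides)
print(bool(TIMEFRAME_RE.match(overrides["timeframe"])))  # True
print(bool(TIMEFRAME_RE.match("1w")))                    # False
```

Adding a new tag then means adding one table row rather than another branch in the loader.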
4.4 Config Versioning
File: libs/tradai-common/src/tradai/common/config/service.py
ConfigVersionService provides content-addressable versioning with DynamoDB persistence:
- Deduplication: SHA256 hash of normalized JSON; identical configs reuse existing versions
- Version IDs: Format v{number}-{hash[:8]} (e.g., v3-a1b2c3d4)
- Atomic activation: Activating a version atomically deprecates the previous active version
- Repository: ConfigVersionRepository uses composite key (strategy_name, config_id) with GSIs for hash lookup and status queries
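The hashing and deduplication rules above can be sketched with the standard library. Two things are assumptions: that "normalized JSON" means sorted keys with compact separators, and that deduplication is scoped per strategy. InMemoryVersions is an illustrative stand-in for the DynamoDB-backed repository.

```python
import hashlib
import json
from typing import Any


def config_hash(config: dict[str, Any]) -> str:
    """SHA256 over a canonical JSON form (assumed: sorted keys, compact separators)."""
    normalized = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def version_id(version_number: int, digest: str) -> str:
    """Format v{number}-{hash[:8]}."""
    return f"v{version_number}-{digest[:8]}"


class InMemoryVersions:
    """Illustrative stand-in for the repository, keyed by (strategy, hash)."""

    def __init__(self) -> None:
        self._by_hash: dict[tuple[str, str], str] = {}
        self._counters: dict[str, int] = {}

    def create_version(self, strategy: str, config: dict[str, Any]) -> str:
        digest = config_hash(config)
        existing = self._by_hash.get((strategy, digest))
        if existing is not None:
            return existing  # identical content reuses the existing version
        number = self._counters.get(strategy, 0) + 1
        self._counters[strategy] = number
        new_id = version_id(number, digest)
        self._by_hash[(strategy, digest)] = new_id
        return new_id


repo = InMemoryVersions()
first = repo.create_version("demo", {"a": 1, "b": 2})
same = repo.create_version("demo", {"b": 2, "a": 1})  # key order differs only
second = repo.create_version("demo", {"a": 2})
print(first == same)               # True
print(second.startswith("v2-"))    # True
```

Because the hash is taken over the canonical form, key order in the uploaded YAML does not create a new version.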
5. Secret Management
File: libs/tradai-common/src/tradai/common/aws/secrets_manager.py
Secret Handling Rules
- Never hardcode secret names, ARNs, or values in source code
- Pass secret names as parameters or read from environment variables
- Use get_secret(secret_name) which returns parsed JSON dict
- All retrieval is thread-safe via threading.Lock
5.1 Secret Retrieval Flow
Environment Variable (e.g., MLFLOW_SECRET_NAME)
|
v
get_secret(secret_name, client=None)
|
v
get_secrets_client() ──> get_aws_client("secretsmanager")
|
v
sm_client.get_secret_value(SecretId=...)
|
v
json.loads(response["SecretString"]) -> dict
5.2 Error Handling
| AWS Error Code | Raised Exception |
|---|---|
| ResourceNotFoundException | ExternalServiceError("Secret not found: ...") |
| AccessDeniedException | ExternalServiceError("Access denied: ...") |
| InvalidRequestException | ExternalServiceError("Invalid request: ...") |
| JSON decode failure | ExternalServiceError("... not valid JSON") |
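The retrieval flow and error mapping can be sketched without boto3 by injecting a fake client. This is not the module's implementation: ExternalServiceError is redefined locally, the client argument is required here (the real signature makes it optional), and where the real code takes its lock is an assumption.

```python
import json
import os
import threading
from typing import Any


class ExternalServiceError(Exception):
    """Local stand-in for the project's exception of the same name."""


_MESSAGES = {
    "ResourceNotFoundException": "Secret not found",
    "AccessDeniedException": "Access denied",
    "InvalidRequestException": "Invalid request",
}
_lock = threading.Lock()


def get_secret(secret_name: str, client: Any) -> dict[str, Any]:
    """Fetch a secret and parse its SecretString as JSON."""
    try:
        with _lock:  # assumed placement: serialize use of the shared client
            response = client.get_secret_value(SecretId=secret_name)
    except Exception as exc:
        # botocore's ClientError exposes the code at exc.response["Error"]["Code"].
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code in _MESSAGES:
            raise ExternalServiceError(f"{_MESSAGES[code]}: {secret_name}") from exc
        raise
    try:
        return json.loads(response["SecretString"])
    except json.JSONDecodeError as exc:
        raise ExternalServiceError(f"Secret {secret_name} is not valid JSON") from exc


class FakeClient:
    """Test double exposing the one method get_secret needs."""

    def __init__(self, payload: str) -> None:
        self._payload = payload

    def get_secret_value(self, SecretId: str) -> dict[str, str]:
        return {"SecretString": self._payload}


# The name comes from the environment, never from a literal in service code.
secret_name = os.environ.get("MLFLOW_SECRET_NAME", "demo/secret")
print(get_secret(secret_name, FakeClient('{"username": "svc"}')))  # {'username': 'svc'}
```

In tests, a fake like this replaces the real client; in production the factory described in the flow above would supply a boto3 Secrets Manager client instead.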
5.3 Rotation Status
get_rotation_status(secret_id) queries Secrets Manager for rotation metadata, returning:
- rotation_enabled (bool), rotation_lambda_arn, rotation_interval_days
- last_rotated_date, next_rotation_date (ISO format strings)
5.4 RDS Managed Passwords
RDS credentials use AWS-managed password rotation configured in infrastructure:
# From infra config
RDS_CONFIG = {
"engine": "postgres",
"engine_version": "15.13",
"backup_retention": 7 if ENVIRONMENT != "prod" else 14,
# Password managed by Secrets Manager with automatic rotation
}
Injection Safety
The get_secret() function uses _client_factory.get_aws_client() with optional DI for testing. In production, the factory creates real boto3 clients. In tests, pass a mock client directly.
6. Service Settings Pattern
File: libs/tradai-common/src/tradai/common/base_settings.py
All services inherit from tradai.common.Settings (which extends Pydantic BaseSettings):
pydantic_settings.BaseSettings
|
tradai.common.Settings (frozen, extra="ignore")
- service_name, service_version, host, port, debug, log_level, cors_origins
- prepare_config() for S3/local config loading
- from_hydra_cfg() for Hydra integration
|
+-- MLflowSettingsMixin (mlflow_tracking_uri, credentials, verify_ssl)
+-- CognitoSettingsMixin (user_pool_id, client_id, region, auth_enabled)
|
BackendSettings (env_prefix="BACKEND_")
- environment, data_collection_url, strategy_service_url
- executor_mode, ecs_cluster, stepfunctions_state_machine_arn
- s3_config_bucket, dynamodb_table_name, aws_region
- Validators: URL format, SQS queue URL, S3 bucket name, ARN format
6.1 Key Design Decisions
| Decision | Implementation |
|---|---|
| Immutable | frozen=True -- settings cannot change after creation |
| Env prefix | Each service uses its own prefix (e.g., BACKEND_, STRATEGY_) |
| Alias chains | AliasChoices for flexible env var names (e.g., BACKEND_AWS_REGION or AWS_REGION) |
| Singleton | @lru_cache on get_settings() -- loaded once per process |
| Validation | Field validators for URLs, ARNs, bucket names, comma-separated lists |
| Environment safety | Model validator prevents LOCAL executor mode in non-dev environments |
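| Multi-source | env_file=("services/backend/.env", ".env") -- both files are read in order (service-local, then root); in pydantic-settings a later file overrides an earlier one for the same key |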
Config Loading Priority
Pydantic Settings resolution order (highest to lowest): Init kwargs > Environment variables > Env file > Field defaults. The S3_CONFIG env var can point to a remote YAML config downloaded at startup via Settings.prepare_config().
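Three of the decisions in the table (immutability, alias fallback for env var names, and the cached singleton) can be illustrated with a standard-library analogue. This sketch deliberately does not use Pydantic; BackendSettingsSketch, _first_env, and the default values are hypothetical and stand in for what BaseSettings, AliasChoices, and field defaults do in the real class.

```python
import os
from dataclasses import FrozenInstanceError, dataclass
from functools import lru_cache


def _first_env(*names: str, default: str) -> str:
    """Return the first set variable among names (mimics an alias chain)."""
    for name in names:
        if name in os.environ:
            return os.environ[name]
    return default


@dataclass(frozen=True)  # analogue of frozen=True on the settings model
class BackendSettingsSketch:
    aws_region: str
    port: int
    debug: bool


@lru_cache(maxsize=1)
def get_settings() -> BackendSettingsSketch:
    """Build settings once per process; later calls return the cached object."""
    return BackendSettingsSketch(
        aws_region=_first_env("BACKEND_AWS_REGION", "AWS_REGION", default="us-east-1"),
        port=int(_first_env("BACKEND_PORT", default="8000")),
        debug=_first_env("BACKEND_DEBUG", default="false").lower() == "true",
    )


# Demo environment: only the unprefixed alias is set.
os.environ["AWS_REGION"] = "eu-west-1"
os.environ.pop("BACKEND_AWS_REGION", None)

settings = get_settings()
print(settings.aws_region)         # eu-west-1
print(get_settings() is settings)  # True
try:
    settings.port = 9000  # type: ignore[misc]
except FrozenInstanceError:
    print("settings are immutable")
```

One consequence of the cached singleton is that environment changes made after the first get_settings() call are not picked up until the cache is cleared, which matters mostly in tests.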
7. Config Version Lifecycle
File: libs/tradai-common/src/tradai/common/entities/config_version.py
stateDiagram-v2
[*] --> DRAFT: create_version()
DRAFT --> ACTIVE: activate()
DRAFT --> DEPRECATED: deprecate()
ACTIVE --> DEPRECATED: activate(newer) / deprecate()
DEPRECATED --> [*]: TTL auto-cleanup (90 days)
note right of DRAFT
Initial state.
Content-addressable via SHA256.
Deduplication prevents duplicate versions.
end note
note right of ACTIVE
Only ONE active version per strategy.
deployed_at timestamp set automatically.
Activation atomically deprecates previous.
end note
note right of DEPRECATED
Terminal state.
deprecated_at + superseded_by recorded.
DynamoDB TTL auto-deletes after 90 days.
end note
7.1 ConfigVersion Entity
| Field | Type | Description |
|---|---|---|
| strategy_name | str | DynamoDB partition key |
| config_id | str | Sort key, format v{N}-{hash[:8]} |
| config_hash | str | SHA256 of normalized JSON (64 hex chars) |
| config_data | dict | Frozen configuration content |
| status | ConfigVersionStatus | DRAFT / ACTIVE / DEPRECATED |
| version_number | int | Sequential per strategy (>= 1) |
| created_by | str | User or system identifier |
| deployed_at | datetime \| None | Set when transitioning to ACTIVE |
| deprecated_at | datetime \| None | Set when transitioning to DEPRECATED |
| superseded_by | str \| None | config_id of the version that replaced this one |
| ttl | int \| None | Unix epoch for DynamoDB auto-cleanup (90 days after deprecation) |
The entity uses DynamoDBSerializableMixin for automatic serialization and the immutable update pattern via with_status().
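The lifecycle rules and the immutable update pattern can be sketched with a frozen dataclass. ConfigVersionSketch is a simplified stand-in for the real entity: it omits most fields, and the with_status signature, the lowercase status values, and the transition table are assumptions derived from the state diagram.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Status(str, Enum):
    DRAFT = "draft"
    ACTIVE = "active"
    DEPRECATED = "deprecated"


# Allowed transitions, read off the state diagram; DEPRECATED is terminal.
ALLOWED = {
    Status.DRAFT: {Status.ACTIVE, Status.DEPRECATED},
    Status.ACTIVE: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}
TTL_DAYS = 90


@dataclass(frozen=True)
class ConfigVersionSketch:
    config_id: str
    status: Status = Status.DRAFT
    deployed_at: Optional[datetime] = None
    deprecated_at: Optional[datetime] = None
    superseded_by: Optional[str] = None
    ttl: Optional[int] = None

    def with_status(
        self, new: Status, now: datetime, superseded_by: Optional[str] = None
    ) -> "ConfigVersionSketch":
        """Return a new instance in the target state; the original is unchanged."""
        if new not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new.value} is not allowed")
        if new is Status.ACTIVE:
            return replace(self, status=new, deployed_at=now)
        # Only DEPRECATED remains: record timestamps and the cleanup deadline.
        expires = now + timedelta(days=TTL_DAYS)
        return replace(
            self,
            status=new,
            deprecated_at=now,
            superseded_by=superseded_by,
            ttl=int(expires.timestamp()),
        )


now = datetime(2026, 3, 28, tzinfo=timezone.utc)
v1 = ConfigVersionSketch("v1-a1b2c3d4").with_status(Status.ACTIVE, now)
old = v1.with_status(Status.DEPRECATED, now, superseded_by="v2-deadbeef")
print(old.status.value, old.ttl - int(now.timestamp()))  # deprecated 7776000
```

Returning a new object on each transition keeps earlier instances valid as an audit trail within a request, at the cost of callers having to persist the returned value explicitly.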
7.2 DynamoDB Schema
| Access Pattern | Key / Index | Query |
|---|---|---|
| Get by ID | PK: strategy_name, SK: config_id | get_item |
| Get active | GSI status-index: PK strategy_name, SK status | query where status = "active" |
| Deduplicate | GSI config_hash-index: PK config_hash | query for existence check |
| List versions | PK: strategy_name | query with ScanIndexForward=False (newest first) |
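Two of these access patterns can be written as keyword arguments for the low-level DynamoDB Query operation. The helpers below only build the request dictionaries, so they run without AWS access; the function names are illustrative, the table name is passed in because the deployed name is not given here, and the lowercase "active" value follows the query column above.

```python
from typing import Any


def list_versions_query(table: str, strategy_name: str) -> dict[str, Any]:
    """All versions of one strategy, sort key descending."""
    return {
        "TableName": table,
        "KeyConditionExpression": "strategy_name = :name",
        "ExpressionAttributeValues": {":name": {"S": strategy_name}},
        "ScanIndexForward": False,
    }


def active_version_query(table: str, strategy_name: str) -> dict[str, Any]:
    """The active version of one strategy via the status-index GSI."""
    return {
        "TableName": table,
        "IndexName": "status-index",
        # "status" is a DynamoDB reserved word, so it needs a name placeholder.
        "KeyConditionExpression": "strategy_name = :name AND #status = :status",
        "ExpressionAttributeNames": {"#status": "status"},
        "ExpressionAttributeValues": {
            ":name": {"S": strategy_name},
            ":status": {"S": "active"},
        },
        "Limit": 1,
    }


query = active_version_query("example-config-versions", "demo")
print(query["IndexName"], query["KeyConditionExpression"])
```

With a boto3 client these would be passed as client.query(**query). One thing to verify against the real repository: DynamoDB orders string sort keys by their UTF-8 bytes, so v10-... sorts before v9-.... With ScanIndexForward=False the result is newest-first only while version numbers stay single-digit, unless the implementation re-sorts on version_number or pads the number; this document does not say which.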
8. Environment-Specific Overrides Table
| Setting | Dev | Staging | Prod |
|---|---|---|---|
| ECS desired_count (backend) | 1 | 1 | 2 |
| RDS instance class | db.t4g.micro | db.t4g.micro | db.t4g.small |
| RDS allocated storage | 20 GB | 20 GB | 50 GB |
| RDS multi-AZ | No | No | Yes |
| RDS backup retention | 7 days | 7 days | 14 days |
| Consolidated EC2 mode | Yes (cost savings) | Yes (cost savings) | No (ECS Fargate) |
| CloudWatch log retention | 30 days | 30 days | 90 days |
| CORS allowed origins | ["*"] | [] (none) | [] (none) |
| Live trading Spot | N/A (count=0) | N/A (count=0) | No (reliability) |
| Dry-run trading Spot | N/A (count=0) | N/A (count=0) | Yes (cost savings) |
| S3 results lifecycle | Glacier after 30d | Glacier after 30d | Glacier after 30d |
| S3 logs lifecycle | Delete after 90d | Delete after 90d | Delete after 90d |
| API throttle (default) | 100 req/s, burst 200 | 100 req/s, burst 200 | 100 req/s, burst 200 |
| API throttle (backtests) | 10 req/s, burst 20 | 10 req/s, burst 20 | 10 req/s, burst 20 |
| Local executor allowed | Yes | No (validator blocks) | No (validator blocks) |
| WS secret key | Auto-generated | Must be explicit | Must be explicit |
| Cognito MFA | Required | Required | Required |
Changelog
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-03-28 | Initial creation — dual config system documented |
Dependencies
| If This Changes | Update This Doc |
|---|---|
| infra/shared/tradai_infra_shared/config.py | Infrastructure config (Section 3) |
| libs/tradai-common/src/tradai/common/config/ | Application config (Section 4) |
| libs/tradai-common/src/tradai/common/aws/secrets_manager.py | Secret management (Section 5) |
| New Pydantic Settings class added | Service settings pattern (Section 6) |