TradAI Final Architecture - Security
Version: 10.0.0 | Date: 2026-03-28 | Source: infra/persistent/modules/, infra/compute/modules/iam.py, infra/edge/modules/waf.py, infra/shared/tradai_infra_shared/core/policy_builder.py
TL;DR: 5-layer defense: WAF (rate limiting + OWASP rules) intended for API Gateway but not yet associated with any resource (see Section 4), Cognito JWT auth (MFA required, TOTP), security groups per service tier, SSE-S3 encryption at rest with SSL-enforced RDS, and CloudTrail audit logging with S3 data events. IAM follows least-privilege via PolicyBuilder.
Security Architecture
flowchart TD
Internet[Internet Traffic]
Internet --> WAF["1. WAF<br/>Rate Limit · OWASP · SQLi"]
WAF --> Cognito["2. Cognito JWT Auth<br/>MFA Required · TOTP"]
Cognito --> SGs["3. Security Groups<br/>ALB → ECS → RDS"]
SGs --> Encryption["4. Encryption<br/>SSE-S3 · RDS SSL · Secrets Manager"]
Encryption --> Audit["5. CloudTrail<br/>Management + S3 Data Events"]
style WAF fill:#f9f,stroke:#333
style Cognito fill:#bbf,stroke:#333
style SGs fill:#bfb,stroke:#333
style Encryption fill:#ffb,stroke:#333
style Audit fill:#fbb,stroke:#333
1. Authentication (Cognito)
User Pool Configuration
| Property | Value |
|---|---|
| Name | tradai-users-{env} |
| Username attribute | Email (case-insensitive) |
| MFA | Required (ON) -- TOTP only (no SMS) |
| Password min length | 12 |
| Require lowercase | Yes |
| Require uppercase | Yes |
| Require numbers | Yes |
| Require symbols | Yes |
| Temporary password validity | 7 days |
| Account recovery | Email only (verified_email, priority 1) |
| Auto-verified attributes | |
| Email sending | COGNITO_DEFAULT |
| Deletion protection | ACTIVE in prod, INACTIVE otherwise |
| Schema attributes | email (String, required, mutable) |
No custom attributes (custom:org, custom:role) are defined. No advanced security mode is enabled.
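The password rules in the table can be expressed as a small client-side pre-check. This is a minimal sketch, assuming `string.punctuation` is an acceptable stand-in for Cognito's symbol set; the function name is hypothetical and Cognito remains the authority on what it accepts.

```python
import string

# Minimum length taken from the User Pool Configuration table.
MIN_LENGTH = 12


def meets_password_policy(password: str) -> bool:
    """Return True if the password satisfies the documented policy.

    Assumption: string.punctuation approximates Cognito's symbol set.
    """
    return (
        len(password) >= MIN_LENGTH
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


if __name__ == "__main__":
    print(meets_password_policy("Tr4d!ng-Secure"))  # 14 chars, all four classes
    print(meets_password_policy("Short1!a"))        # fails the length rule
```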
Public Client (Web/CLI)
| Property | Value |
|---|---|
| Name | tradai-api-client-{env} |
| Client secret | None (public client) |
| OAuth flows | code only (no implicit) |
| OAuth scopes | email, openid, profile |
| Auth flows | ALLOW_REFRESH_TOKEN_AUTH, ALLOW_USER_SRP_AUTH |
| Access token validity | 1 hour |
| ID token validity | 1 hour |
| Refresh token validity | 30 days |
| Callback URLs | http://localhost:3000/callback, http://localhost:8400/callback |
| Logout URLs | http://localhost:3000/logout, http://localhost:8400/logout |
| Identity providers | COGNITO only |
| Prevent user existence errors | Enabled |
Note: The callback and logout URLs above are for development only. A production user pool should register only HTTPS URLs.
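A deployment check for this rule could look like the sketch below. The function name and the `"prod"` environment label are assumptions for illustration, not taken from the infrastructure code.

```python
from urllib.parse import urlparse


def insecure_callback_urls(urls: list[str], env: str) -> list[str]:
    """Return the URLs that should not be registered in the given environment.

    Non-production environments are allowed plain-HTTP localhost URLs,
    so nothing is flagged for them.
    """
    if env != "prod":
        return []
    return [u for u in urls if urlparse(u).scheme != "https"]


if __name__ == "__main__":
    dev_urls = [
        "http://localhost:3000/callback",
        "http://localhost:8400/callback",
    ]
    print(insecure_callback_urls(dev_urls, "prod"))
```

Running such a check before a production deploy would flag both URLs currently listed in the table.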
Machine-to-Machine Client (CI/CD)
| Property | Value |
|---|---|
| Name | tradai-m2m-client-{env} |
| Client secret | Generated (stored in Secrets Manager for prod) |
| OAuth flows | client_credentials only |
| OAuth scopes | tradai-api/read, tradai-api/write, tradai-api/admin |
| Auth flows | None (empty list) |
| Access token validity | 1 hour |
| ID token validity | 1 hour |
| Identity providers | COGNITO only |
Resource Server
| Property | Value |
|---|---|
| Identifier | tradai-api |
| Scopes | read (read access), write (write access), admin (administrative access) |
User Pool Domain
Cognito hosted UI at tradai-{env}.auth.eu-central-1.amazoncognito.com.
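For the machine-to-machine client, a `client_credentials` token request against this domain might be built as follows. This sketch only constructs the request and does not send it. It assumes the standard Cognito `/oauth2/token` endpoint on the hosted domain; the function name and the placeholder credentials are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(env: str, client_id: str, client_secret: str,
                        scopes: list[str]) -> urllib.request.Request:
    """Build (but do not send) an OAuth2 client_credentials token request."""
    token_url = (
        f"https://tradai-{env}.auth.eu-central-1.amazoncognito.com/oauth2/token"
    )
    # Confidential clients authenticate with HTTP Basic: base64(id:secret).
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),  # space-separated resource-server scopes
    }).encode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


if __name__ == "__main__":
    req = build_token_request("dev", "example-id", "example-secret",
                              ["tradai-api/read"])
    print(req.full_url)
```

In CI the secret would come from Secrets Manager rather than a literal, and the request would be sent with `urllib.request.urlopen(req)`.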
Source: infra/persistent/modules/cognito.py
2. IAM Roles
All roles use PolicyBuilder for DRY policy creation (infra/shared/tradai_infra_shared/core/policy_builder.py). The builder provides fluent methods like .with_dynamodb_crud(), .with_s3_readwrite(), .with_secrets_read() that generate least-privilege statements with resource patterns scoped to tradai-*.
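To illustrate the fluent pattern, here is a minimal sketch of a builder with the same three method names. It is not the real `PolicyBuilder`: the class name, the specific IAM actions, and the ARN patterns are assumptions chosen to show how chained calls can accumulate statements scoped to a `tradai-` prefix.

```python
import json


class PolicyBuilderSketch:
    """Hypothetical fluent builder; NOT the real PolicyBuilder implementation."""

    def __init__(self, prefix: str = "tradai-") -> None:
        self._prefix = prefix
        self._statements: list[dict] = []

    def _allow(self, actions: list[str], resources: list[str]) -> "PolicyBuilderSketch":
        self._statements.append(
            {"Effect": "Allow", "Action": actions, "Resource": resources}
        )
        return self  # returning self is what makes the calls chainable

    def with_dynamodb_crud(self) -> "PolicyBuilderSketch":
        return self._allow(
            ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:UpdateItem",
             "dynamodb:DeleteItem", "dynamodb:Query", "dynamodb:Scan"],
            [f"arn:aws:dynamodb:*:*:table/{self._prefix}*"],
        )

    def with_s3_readwrite(self) -> "PolicyBuilderSketch":
        return self._allow(
            ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            [f"arn:aws:s3:::{self._prefix}*", f"arn:aws:s3:::{self._prefix}*/*"],
        )

    def with_secrets_read(self) -> "PolicyBuilderSketch":
        return self._allow(
            ["secretsmanager:GetSecretValue"],
            [f"arn:aws:secretsmanager:*:*:secret:{self._prefix}*"],
        )

    def build(self) -> dict:
        return {"Version": "2012-10-17", "Statement": list(self._statements)}


if __name__ == "__main__":
    policy = PolicyBuilderSketch().with_dynamodb_crud().with_secrets_read().build()
    print(json.dumps(policy, indent=2))
```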
Role Inventory
| Role | Trust Principal | Purpose | Key Permissions |
|---|---|---|---|
| tradai-ecs-execution-{env} | ecs-tasks.amazonaws.com | ECS agent operations | ECR pull, CloudWatch Logs, Secrets Manager read |
| tradai-ecs-task-{env} | ecs-tasks.amazonaws.com | Container runtime | DynamoDB CRUD, S3 R/W, Secrets Manager, CloudWatch metrics, SNS publish, CodeArtifact read, RDS secrets |
| tradai-lambda-role-{env} | lambda.amazonaws.com | All Lambda functions (shared) | Basic execution, VPC access, ECS RunTask, SQS, DynamoDB (scoped to used tables), S3, Secrets Manager, SNS, CloudWatch |
| tradai-cli-ci-{env} | IAM users (same account, sts:ExternalId=tradai-cli) | CLI/CI strategy lifecycle | ECS service management, DynamoDB (tradai-deployments-*), ECR describe, iam:PassRole for task/execution roles |
| tradai-consolidated-{env} | ec2.amazonaws.com | Consolidated EC2 (dev/staging) | ECR read-only, CloudWatch agent, SSM, DynamoDB CRUD, S3 R/W, SQS, Secrets Manager, RDS secrets, CloudWatch metrics, SNS, CodeArtifact, Service Discovery register, ASG lifecycle |
| tradai-nat-role-{env} | ec2.amazonaws.com | NAT instance | ec2:AssociateAddress, ec2:ModifyInstanceAttribute, ec2:DescribeInstances |
| tradai-nat-lambda-role-{env} | lambda.amazonaws.com | NAT route updater Lambda | CloudWatch Logs, ec2:CreateRoute/ReplaceRoute/DeleteRoute, autoscaling:CompleteLifecycleAction |
| tradai-flow-logs-role-{env} | vpc-flow-logs.amazonaws.com | VPC Flow Logs delivery | CloudWatch Logs write |
| tradai-cloudtrail-role-{env} | cloudtrail.amazonaws.com | CloudTrail delivery | CloudWatch Logs write |
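The trust relationship for `tradai-cli-ci-{env}` combines a same-account principal with an `sts:ExternalId` condition. A sketch of such an assume-role policy is shown below. Using the account root ARN as the principal is an assumption (the real code may list individual IAM users), and the account ID is a placeholder.

```python
def cli_ci_trust_policy(account_id: str, external_id: str = "tradai-cli") -> dict:
    """Sketch of a trust policy requiring a matching sts:ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Assumption: account root principal; narrows to same-account callers.
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": "sts:AssumeRole",
            # AssumeRole calls without this ExternalId are rejected.
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


if __name__ == "__main__":
    import json
    print(json.dumps(cli_ci_trust_policy("123456789012"), indent=2))
```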
Lambda Role Design
Lambda functions share a single execution role (tradai-lambda-role-{env}), not per-function roles. The role includes managed policies (AWSLambdaBasicExecutionRole, AWSLambdaVPCAccessExecutionRole) plus an inline policy granting ECS RunTask, SQS, DynamoDB (scoped to specific table names that are actually provisioned), S3, Secrets Manager, SNS, and CloudWatch access.
Shared Lambda Role = Wide Blast Radius
Because every Lambda function runs under tradai-lambda-role-{env}, each one holds permissions across ECS, DynamoDB, S3, SQS, and SNS. A compromised Lambda (e.g., promote-model) could invoke ecs:RunTask or modify any DynamoDB table the role is scoped to, whether or not that function needs the access. Consider per-function roles for production to reduce blast radius.
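The blast-radius argument can be made concrete with set arithmetic: a shared role grants the union of every function's needs, so each function carries the other functions' permissions as excess. In the sketch below only `promote-model` is named in this document; the second function and both permission sets are hypothetical.

```python
# Hypothetical per-function needs; only "promote-model" appears in this document.
FUNCTION_NEEDS = {
    "promote-model": {"dynamodb:UpdateItem", "s3:GetObject"},
    "backtest-dispatcher": {"ecs:RunTask", "sqs:SendMessage"},
}


def shared_role_actions(needs: dict[str, set[str]]) -> set[str]:
    """A shared role must grant the union of all functions' needs."""
    return set().union(*needs.values())


def excess_actions(function_name: str, needs: dict[str, set[str]]) -> set[str]:
    """Actions a function holds under the shared role but does not need."""
    return shared_role_actions(needs) - needs[function_name]


if __name__ == "__main__":
    print(sorted(excess_actions("promote-model", FUNCTION_NEEDS)))
```

With per-function roles the excess set for every function would be empty.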
Consolidated Role (dev/staging only)
This role is created only when is_consolidated_mode() returns True. It combines the ECS task role permissions with EC2-specific access (ECR read-only, SSM for Session Manager, SQS for the backtest queue, Service Discovery registration, ASG lifecycle hooks). Using PolicyBuilder eliminates ~80 lines of duplicate JSON.
Source: infra/compute/modules/iam.py, infra/foundation/modules/nat_instance.py
3. Encryption
S3 Buckets
All buckets (configs, results, arcticdb, logs, mlflow) use:
| Property | Value |
|---|---|
| Encryption | SSE-S3 (AES256) with bucket key enabled |
| Public access | Fully blocked (all 4 public access block settings enabled) |
| Versioning | Enabled on all except logs |
Lifecycle rules are configured per bucket where applicable:
- results: Glacier transition at 30 days
- logs: Delete at 90 days
- configs, arcticdb, mlflow: No lifecycle rules
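The lifecycle rules above can be written in the dictionary shape used by the S3 lifecycle configuration API. This is a sketch: the rule IDs are hypothetical, and `GLACIER` as the target storage class is an assumption, since the source only says "Glacier transition".

```python
def lifecycle_rules(bucket_kind: str) -> list[dict]:
    """Return S3 lifecycle rules for one of the five documented bucket kinds."""
    if bucket_kind == "results":
        return [{
            "ID": "glacier-after-30-days",  # hypothetical rule ID
            "Status": "Enabled",
            "Filter": {"Prefix": ""},       # empty prefix applies to all objects
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    if bucket_kind == "logs":
        return [{
            "ID": "expire-after-90-days",   # hypothetical rule ID
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Expiration": {"Days": 90},
        }]
    if bucket_kind in {"configs", "arcticdb", "mlflow"}:
        return []  # no lifecycle rules for these buckets
    raise ValueError(f"unknown bucket kind: {bucket_kind!r}")


if __name__ == "__main__":
    for kind in ("configs", "results", "arcticdb", "logs", "mlflow"):
        print(kind, lifecycle_rules(kind))
```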
RDS PostgreSQL
| Property | Value |
|---|---|
| Storage encryption | storage_encrypted=True |
| SSL enforcement | rds.force_ssl=1 via parameter group |
| Master password | Managed by Secrets Manager (manage_master_user_password=True) |
| Publicly accessible | False |
| Performance Insights | Enabled (7-day retention) |
| Log exports | PostgreSQL logs to CloudWatch |
| Parameter group | log_statement=ddl, log_min_duration_statement=1000 (slow queries >1s) |
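With `rds.force_ssl=1` the server rejects unencrypted connections, so clients need an SSL-enabled connection string. The sketch below builds a libpq-style URI. The host, database name, user, and CA bundle path are placeholders, and choosing `sslmode=verify-full` (which additionally verifies the server certificate) is an assumption about how clients are configured.

```python
from urllib.parse import quote


def build_dsn(host: str, dbname: str, user: str, password: str,
              ca_bundle: str = "/etc/ssl/certs/rds-ca.pem") -> str:
    """Build a PostgreSQL connection URI that requires a verified TLS session."""
    # Percent-encode credentials so characters like '@' or '/' stay unambiguous.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:5432/{quote(dbname, safe='')}"
        f"?sslmode=verify-full&sslrootcert={quote(ca_bundle, safe='/')}"
    )


if __name__ == "__main__":
    print(build_dsn("db.example.internal", "tradai", "app", "p@ss/word"))
```

In practice the password would be read from the Secrets Manager secret that RDS manages rather than passed as a literal.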
DynamoDB
DynamoDB tables use AWS default encryption (AWS owned keys). No customer-managed KMS keys are configured.
Source: infra/persistent/modules/s3.py, infra/foundation/modules/rds.py
4. WAF (Web Application Firewall)
WAF Not Currently Effective
The WebACL is created but not associated with any resource. WAFv2 cannot parse the $default stage ARN of HTTP APIs. Until the API Gateway is migrated to REST API or the WAF is associated with the ALB, the WAF rules provide no protection. See 09-PULUMI-CODE.md Section 7 for details.
The WAF WebACL is created with scope REGIONAL and is intended for API Gateway association.
Rules
| Priority | Rule Name | Type | Action | Description |
|---|---|---|---|---|
| 1 | RateLimitRule | Rate-based | Block | 100 requests per 5 minutes per IP |
| 2 | AWSManagedRulesCommonRuleSet | Managed (AWS) | Override: none | OWASP Top 10 protection |
| 3 | AWSManagedRulesKnownBadInputsRuleSet | Managed (AWS) | Override: none | Known malicious input patterns |
| 4 | AWSManagedRulesSQLiRuleSet | Managed (AWS) | Override: none | SQL injection protection |
Default action: Allow (only matched rules block/count).
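The intent of RateLimitRule (100 requests per 5 minutes per IP) can be illustrated with a sliding-window counter. This is a simplified model for reasoning about the threshold, not how AWS WAF is implemented: WAF evaluates rates periodically and its exact accounting differs. The class name is hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowRateLimit:
    """Per-IP sliding-window limiter (simplified model of a rate-based rule)."""

    def __init__(self, limit: int = 100, window_seconds: float = 300.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: block
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowRateLimit()
    results = [limiter.allow("203.0.113.7", float(t)) for t in range(101)]
    print(results.count(True), results[-1])
```

Because the default action is Allow, requests under the limit that match no managed rule pass through.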
WAF Logging
| Property | Value |
|---|---|
| Log destination | CloudWatch Logs (aws-waf-logs-tradai-{env}) |
| Retention | 30 days (dev/staging), 90 days (prod) |
| Logged requests | All (blocked, allowed, counted) |
CloudWatch Metrics
All rules have CloudWatch metrics enabled with sampled requests. Metric names: RateLimitRule, CommonRuleSet, KnownBadInputs, SQLiRuleSet, plus overall tradai-waf-metrics.
Source: infra/edge/modules/waf.py
5. CloudTrail
Trail Configuration
| Property | Value |
|---|---|
| Trail name | tradai-audit-trail-{env} |
| S3 bucket | tradai-logs-{env} (prefix: cloudtrail/) |
| CloudWatch Logs | /aws/cloudtrail/tradai |
| Log retention | 30 days (dev/staging), 90 days (prod) |
| Multi-region | No (single-region trail saves ~$2/month vs multi-region. For compliance requirements or multi-region deployments, enable multi-region trail.) |
| Global service events | Yes |
| Log file validation | Enabled |
Event Selectors
| Type | Resources | Read/Write |
|---|---|---|
| Management events | All | All |
| S3 data events | tradai-configs-{env}/* | All |
| S3 data events | tradai-results-{env}/* | All |
No DynamoDB data events are configured. The code comment notes: "DynamoDB data events removed -- wildcards not supported by CloudTrail. Management events already track DynamoDB API calls. For item-level auditing, use DynamoDB Streams instead."
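In the CloudTrail API, the selectors in the table correspond to a structure like the one sketched below. Grouping both buckets into one selector is an assumption about the code. The trailing-slash ARN is CloudTrail's notation for "all objects in this bucket".

```python
def event_selectors(env: str) -> list[dict]:
    """Sketch of CloudTrail event selectors: management events + S3 data events."""
    buckets = [f"tradai-configs-{env}", f"tradai-results-{env}"]
    return [{
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [{
            "Type": "AWS::S3::Object",
            # "arn:aws:s3:::bucket/" selects every object in the bucket.
            "Values": [f"arn:aws:s3:::{bucket}/" for bucket in buckets],
        }],
    }]


if __name__ == "__main__":
    import json
    print(json.dumps(event_selectors("dev"), indent=2))
```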
Insight Selectors
| Insight Type |
|---|
| ApiCallRateInsight |
| ApiErrorRateInsight |
S3 Bucket Policy
CloudTrail has a dedicated S3 bucket policy allowing s3:GetBucketAcl and s3:PutObject (with bucket-owner-full-control ACL condition) from the cloudtrail.amazonaws.com service principal, scoped to the specific trail ARN.
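A policy with these properties typically follows the two-statement pattern sketched below. The statement IDs, the `AWSLogs/{account_id}` object path, and the use of `aws:SourceArn` as the scoping condition are assumptions based on the standard CloudTrail bucket policy, not a copy of the project's code.

```python
def cloudtrail_bucket_policy(bucket: str, trail_arn: str, account_id: str,
                             prefix: str = "cloudtrail") -> dict:
    """Sketch of an S3 bucket policy letting one specific trail deliver logs."""
    principal = {"Service": "cloudtrail.amazonaws.com"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AWSCloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringEquals": {"aws:SourceArn": trail_arn}},
            },
            {
                "Sid": "AWSCloudTrailWrite",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/AWSLogs/{account_id}/*",
                "Condition": {"StringEquals": {
                    # Delivered objects must grant the bucket owner full control.
                    "s3:x-amz-acl": "bucket-owner-full-control",
                    "aws:SourceArn": trail_arn,
                }},
            },
        ],
    }


if __name__ == "__main__":
    import json
    arn = "arn:aws:cloudtrail:eu-central-1:123456789012:trail/tradai-audit-trail-dev"
    print(json.dumps(cloudtrail_bucket_policy("tradai-logs-dev", arn, "123456789012"), indent=2))
```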
Source: infra/persistent/modules/cloudtrail.py
Verified Security Controls
- Cognito MFA is required (TOTP only, no SMS) with 12-character minimum passwords.
- WAF defines rate limiting (100 req/5min) plus 3 AWS managed rule sets (OWASP, bad inputs, SQLi), but is not currently associated with any resource (see Section 4).
- All S3 buckets have public access fully blocked and SSE-S3 encryption enabled.
- RDS enforces SSL via parameter group (rds.force_ssl=1) with a Secrets Manager-managed master password.
- CloudTrail logs management events + S3 data events with log file validation enabled.
- IAM roles use PolicyBuilder for least-privilege, scoped to tradai-* resource patterns.
6. Network Security
Network security is covered in detail in 03-VPC-NETWORKING.md. Key points:
- Security groups enforce stateful rules: ALB -> ECS/Consolidated (ports 8000-8003, 5000), Lambda -> ECS/Consolidated (same ports), ECS/Lambda -> RDS (5432), NAT accepts ALL TCP from private subnets
- NACLs provide stateless defense-in-depth at the subnet boundary; database tier only allows PostgreSQL from private subnets
- VPC endpoints keep S3, DynamoDB, ECR, STS, Secrets Manager, CloudWatch, SSM, and SQS traffic within the AWS network
- VPC Flow Logs capture all traffic to CloudWatch Logs with 7-day retention
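The security group rules listed above can be modelled as an allow-list keyed by source and destination tier, which is handy for reviewing whether a proposed flow is permitted. The tier labels and function name are hypothetical. The NAT rule is omitted because it is subnet-based rather than tier-based.

```python
# Application ports from the security group summary: 8000-8003 and 5000.
APP_PORTS = set(range(8000, 8004)) | {5000}

# (source tier, destination tier) -> allowed TCP ports
ALLOWED_FLOWS = {
    ("alb", "ecs"): APP_PORTS,
    ("alb", "consolidated"): APP_PORTS,
    ("lambda", "ecs"): APP_PORTS,
    ("lambda", "consolidated"): APP_PORTS,
    ("ecs", "rds"): {5432},
    ("lambda", "rds"): {5432},
}


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Return True if the documented security groups permit this flow."""
    return port in ALLOWED_FLOWS.get((src, dst), set())


if __name__ == "__main__":
    print(is_allowed("alb", "ecs", 8003))   # application port from the ALB
    print(is_allowed("alb", "rds", 5432))   # ALB has no path to the database
```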
7. What's NOT Implemented
Known Security Gaps
Prioritize security headers and ALB access logs for production readiness.
The following security features are documented as planned but are not present in the current infrastructure code:
| Gap | Description |
|---|---|
| Security headers middleware | No X-Content-Type-Options, X-Frame-Options, Strict-Transport-Security, Content-Security-Policy headers are added by application middleware or ALB |
| Advanced Cognito security | user_pool_add_ons with ENFORCED advanced security mode is not configured (no adaptive authentication, compromised credential checks, or risk-based MFA) |
| Security alarms | No CloudWatch Alarms for security events (e.g., unauthorized API calls, root account usage, console sign-in without MFA) |
| ALB access logs | ALB does not have access_logs configured to an S3 bucket |
| DisableExecuteApiEndpoint | API Gateway disable_execute_api_endpoint is not set, so the default execute-api endpoint remains accessible alongside any custom domain |
| KMS customer-managed keys | S3 uses SSE-S3 (AES256), not KMS CMKs; DynamoDB uses AWS-owned keys; no envelope encryption for sensitive config values |
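As an illustration of the first gap, security headers could be added at the application layer with a small middleware. The sketch below uses the WSGI interface for portability. Whether the services are WSGI applications is an assumption, and the header values are common defaults rather than a reviewed policy for this project.

```python
# Assumed default values; a real policy should be reviewed per application.
SECURITY_HEADERS = [
    ("X-Content-Type-Options", "nosniff"),
    ("X-Frame-Options", "DENY"),
    ("Strict-Transport-Security", "max-age=31536000; includeSubDomains"),
    ("Content-Security-Policy", "default-src 'self'"),
]


class SecurityHeadersMiddleware:
    """WSGI middleware that appends security headers the app did not set."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def _start_response(status, headers, exc_info=None):
            present = {name.lower() for name, _ in headers}
            extra = [(n, v) for n, v in SECURITY_HEADERS
                     if n.lower() not in present]
            return start_response(status, list(headers) + extra, exc_info)

        return self.app(environ, _start_response)
```

Wrapping an application is a one-liner, `app = SecurityHeadersMiddleware(app)`. Headers the application already sets are left untouched.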
Changelog
| Version | Date | Changes |
|---|---|---|
| 10.0.0 | 2026-03-28 | Full regeneration. Corrected Cognito (TOTP only), WAF (API Gateway not ALB), honest gaps section |
Dependencies
| If This Changes | Update This Doc |
|---|---|
| infra/persistent/modules/cognito.py | Authentication section |
| infra/edge/modules/waf.py | WAF rules section |
| infra/persistent/modules/cloudtrail.py | Audit section |
| infra/compute/modules/iam.py | IAM roles section |