Security Incident Response Runbook

Procedures for handling security incidents including credential exposure, unauthorized access, and security policy violations.

Severity Classification

Level	Description	Response Time	Examples
P1	Active breach, data exposure	Immediate (<15 min)	Compromised AWS creds, unauthorized trading
P2	Suspected compromise	< 1 hour	Suspicious API activity, failed auth spike
P3	Policy violation	< 4 hours	Exposed secrets in logs, missing encryption

API Key/Secret Exposure

Symptoms

Secrets found in logs, code commits, or public repositories
Unexpected API calls from unknown sources
Exchange reporting unusual activity

Immediate Actions (P1)

Identify exposed credentials:

# Check what was exposed
# - Exchange API keys (Binance, etc.)
# - AWS credentials
# - Database passwords
# - JWT signing keys
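
As a starting point for working out what leaked, a pattern scan over a log file or a checkout can surface AWS access key IDs. This is a minimal sketch, assuming the usual key ID shape (`AKIA` for long-term keys or `ASIA` for temporary ones, followed by 16 uppercase alphanumerics). The function name `scan_for_aws_key_ids` is hypothetical, and other secret types (exchange keys, database passwords, JWT signing keys) would need their own patterns.

```shell
# Print each distinct AWS access key ID found in the given files.
# -E: extended regex, -o: print only the match, -h: omit file names.
scan_for_aws_key_ids() {
  grep -Eoh '(AKIA|ASIA)[A-Z0-9]{16}' "$@" | sort -u
}

# Example (file names are placeholders):
#   scan_for_aws_key_ids app.log deploy.log
```

An empty result only means this one pattern was not found; it does not prove that nothing else was exposed.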

Rotate exchange API keys immediately:

# 1. Create new API key on exchange (manually via exchange UI)
# 2. Update Secrets Manager
aws secretsmanager put-secret-value \
  --secret-id tradai/${ENVIRONMENT}/exchange-keys \
  --secret-string '{"api_key":"NEW_KEY","api_secret":"NEW_SECRET"}'

# 3. Restart services to pick up new credentials
aws ecs update-service \
  --cluster tradai-${ENVIRONMENT} \
  --service tradai-strategy-service-${ENVIRONMENT} \
  --force-new-deployment
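
Before revoking the old keys, it can help to confirm that the redeployed tasks are actually running, since they are the ones holding the new credentials. The sketch below wraps `aws ecs wait services-stable`, which polls until the service reaches a steady state or the waiter gives up. The function name `wait_for_rollout` is an assumption made for illustration.

```shell
# Block until the service reaches a steady state, then report the outcome.
# $1 = cluster name, $2 = service name
wait_for_rollout() {
  if aws ecs wait services-stable --cluster "$1" --services "$2"; then
    echo "stable: $2"
  else
    echo "not stable: $2" >&2
    return 1
  fi
}

# Example:
#   wait_for_rollout "tradai-${ENVIRONMENT}" "tradai-strategy-service-${ENVIRONMENT}"
```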

Revoke old exchange keys (via exchange UI)

Check for unauthorized activity:

# Review exchange order history for unauthorized trades
# Review API call logs for suspicious patterns

AWS Credential Rotation

If AWS credentials are exposed:

Disable compromised credentials:

# For IAM user access keys
aws iam update-access-key \
  --user-name $USER_NAME \
  --access-key-id $COMPROMISED_KEY_ID \
  --status Inactive

# For IAM user console access (removes the user's console password)
aws iam delete-login-profile --user-name $USER_NAME

Create new credentials:

aws iam create-access-key --user-name $USER_NAME

Review CloudTrail for unauthorized actions:

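# Note: -v-24H is BSD/macOS date syntax. With GNU date (most Linux hosts), use
# $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) instead.
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=AccessKeyId,AttributeValue=$COMPROMISED_KEY_ID \
  --start-time $(date -u -v-24H +%Y-%m-%dT%H:%M:%SZ)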
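
The `date -v` form used throughout this runbook works only with BSD/macOS `date`. A small helper can hide the difference so the same command runs on either platform. This is a sketch: the name `hours_ago_iso` is hypothetical, and it detects GNU `date` by checking whether `--version` is accepted.

```shell
# Print the UTC timestamp N hours ago in ISO 8601 form.
# GNU date accepts --version and -d; BSD/macOS date uses -v instead.
hours_ago_iso() {
  if date --version >/dev/null 2>&1; then
    date -u -d "$1 hours ago" +%Y-%m-%dT%H:%M:%SZ
  else
    date -u -v-"$1"H +%Y-%m-%dT%H:%M:%SZ
  fi
}

# Example:
#   aws cloudtrail lookup-events --start-time "$(hours_ago_iso 24)" ...
```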

Delete compromised credentials (after verification):

aws iam delete-access-key \
  --user-name $USER_NAME \
  --access-key-id $COMPROMISED_KEY_ID
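
After deletion, a quick check that the key no longer appears on the user closes the loop. The sketch below relies on `aws iam list-access-keys`; the helper name `key_still_exists` is an assumption for illustration.

```shell
# Succeed (exit 0) if the given access key ID is still attached to the user.
# $1 = IAM user name, $2 = access key ID
key_still_exists() {
  aws iam list-access-keys \
    --user-name "$1" \
    --query 'AccessKeyMetadata[].AccessKeyId' \
    --output text \
  | tr '\t' '\n' \
  | grep -qxF "$2"
}

# Example:
#   if key_still_exists "$USER_NAME" "$COMPROMISED_KEY_ID"; then
#     echo "compromised key is still present" >&2
#   fi
```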

JWT Token Invalidation

If a JWT signing key is compromised:

Cognito security configuration (canonical source: infra/shared/tradai_infra_shared/config.py):

Setting	Value
MFA	Required (TOTP)
Password minimum length	12 characters
Password requirements	Uppercase + lowercase + numbers + symbols
Access token lifetime	1 hour
Refresh token lifetime	30 days

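Rotate Cognito app client secret:

# update-user-pool-client has no --generate-secret option, so an existing
# client's secret cannot be regenerated this way. Create a replacement client
# (NEW_CLIENT_NAME is a placeholder) and re-apply the old client's settings.
aws cognito-idp create-user-pool-client \
  --user-pool-id $USER_POOL_ID \
  --client-name $NEW_CLIENT_NAME \
  --generate-secret

# Once services use the new client ID and secret, delete the compromised client
aws cognito-idp delete-user-pool-client \
  --user-pool-id $USER_POOL_ID \
  --client-id $CLIENT_ID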

Force sign-out (the command signs out one user; repeat it for every user in the pool):

aws cognito-idp admin-user-global-sign-out \
  --user-pool-id $USER_POOL_ID \
  --username $USERNAME
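
Because `admin-user-global-sign-out` acts on a single user, signing out the whole pool means iterating over `list-users`. The loop below is a sketch of that; the function name `sign_out_all_users` is hypothetical. Services that only verify the JWT signature may keep accepting already issued access tokens until they expire, so treat this as one containment step rather than a complete cutoff.

```shell
# Sign out every user in the pool, one user at a time.
# $1 = user pool ID
sign_out_all_users() {
  pool_id="$1"
  aws cognito-idp list-users \
    --user-pool-id "$pool_id" \
    --query 'Users[].Username' \
    --output text \
  | tr '\t' '\n' \
  | while IFS= read -r username; do
      [ -n "$username" ] || continue
      # </dev/null keeps the inner command from consuming the loop's input.
      if aws cognito-idp admin-user-global-sign-out \
           --user-pool-id "$pool_id" \
           --username "$username" </dev/null; then
        echo "signed out: $username"
      else
        echo "FAILED: $username" >&2
      fi
    done
}

# Example:
#   sign_out_all_users "$USER_POOL_ID"
```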

Verify MFA is enforced (the query should return ON; the required factor is TOTP):

aws cognito-idp describe-user-pool \
  --user-pool-id $USER_POOL_ID \
  --query 'UserPool.MfaConfiguration'
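
The same check can be turned into a pass/fail step for scripted verification. This sketch assumes that "Required" in the table above corresponds to the `MfaConfiguration` value `ON` (the other values Cognito reports are `OFF` and `OPTIONAL`). The function name `check_mfa_required` is hypothetical.

```shell
# Exit 0 only when the user pool requires MFA for all users.
# $1 = user pool ID
check_mfa_required() {
  mfa=$(aws cognito-idp describe-user-pool \
    --user-pool-id "$1" \
    --query 'UserPool.MfaConfiguration' \
    --output text)
  if [ "$mfa" = "ON" ]; then
    echo "MFA enforced"
  else
    echo "MFA NOT enforced (MfaConfiguration=$mfa)" >&2
    return 1
  fi
}

# Example:
#   check_mfa_required "$USER_POOL_ID"
```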

CodeArtifact Credential Rotation

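CodeArtifact auth tokens expire after 12 hours by default. If credentials are suspected compromised or need rotation:

Issue a fresh token. Generating a new token does not invalidate earlier ones; a leaked token stays valid until it expires. To cut it off sooner, deny CodeArtifact access in IAM for the principal that created it: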

aws codeartifact get-authorization-token \
  --domain tradai \
  --domain-owner ${AWS_ACCOUNT_ID} \
  --query authorizationToken \
  --output text

Refresh all CI/CD pipeline credentials:

# Local development
just codeartifact-login dev

# Verify access
pip config get global.index-url

Tighten the domain policy if domain-level access is compromised:

# Review current domain policy
aws codeartifact get-domain-permissions-policy \
  --domain tradai \
  --domain-owner ${AWS_ACCOUNT_ID}

# Update policy to restrict access
aws codeartifact put-domain-permissions-policy \
  --domain tradai \
  --domain-owner ${AWS_ACCOUNT_ID} \
  --policy-document file://updated-domain-policy.json

Unauthorized Access Detected

Symptoms

Unexpected IAM activity in CloudTrail
Unknown IP addresses in access logs
Privilege escalation attempts
Resource creation outside normal patterns

Diagnosis

Review CloudTrail events:

# Recent suspicious events
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=CreateAccessKey \
  --start-time $(date -u -v-24H +%Y-%m-%dT%H:%M:%SZ) \
  --query 'Events[*].{Time:EventTime,User:Username,Event:EventName,Source:EventSource}'

Check for new IAM users/roles:

# List recently created users
aws iam list-users \
  --query "Users[?CreateDate>='$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)']"

# List recently created roles
aws iam list-roles \
  --query "Roles[?CreateDate>='$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)']"

Check API Gateway access logs (API Gateway access logging is not currently enabled in the infrastructure, so this query returns results only after logging has been configured):

aws logs filter-log-events \
  --log-group-name /aws/api-gateway/tradai-${ENVIRONMENT} \
  --start-time $(date -u -v-24H +%s)000 \
  --filter-pattern '{ $.status = 401 || $.status = 403 }'

ECS Task Isolation

If a container is suspected compromised:

Stop the suspicious task:

aws ecs stop-task \
  --cluster tradai-${ENVIRONMENT} \
  --task $TASK_ARN \
  --reason "Security incident investigation"

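Create an isolated security group with no ingress or egress rules (preserve for forensics):

# Create isolated security group and capture its ID
ISOLATED_SG_ID=$(aws ec2 create-security-group \
  --group-name tradai-isolated-${ENVIRONMENT} \
  --description "Isolated for security investigation" \
  --vpc-id $VPC_ID \
  --query GroupId --output text)

# A new security group has no ingress rules but allows ALL egress by default.
# Revoke that default rule so the group is fully isolated.
aws ec2 revoke-security-group-egress \
  --group-id $ISOLATED_SG_ID \
  --ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'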

Capture task logs for forensics:

aws logs get-log-events \
  --log-group-name /ecs/tradai/${ENVIRONMENT}/services \
  --log-stream-name $LOG_STREAM \
  --start-time $(date -u -v-24H +%s)000 \
  --output json > forensics-logs-$(date +%Y%m%d).json
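
To support the "preserve evidence" step, the captured file can be hashed right away so later tampering is detectable. This is a sketch, not an established procedure from this runbook: the name `seal_evidence` is hypothetical, and it uses whichever of `sha256sum` or `shasum` is available.

```shell
# Record a SHA-256 digest next to the evidence file and drop write permission.
# $1 = path to the captured evidence file
seal_evidence() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" > "$1.sha256"
  else
    shasum -a 256 "$1" > "$1.sha256"
  fi
  chmod a-w "$1" "$1.sha256"
}

# Example:
#   seal_evidence "forensics-logs-$(date +%Y%m%d).json"
```

The digest file can later be checked with `sha256sum -c` (or `shasum -a 256 -c`) from the same directory.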

Network Security Breach

Symptoms

Unexpected outbound connections
Data exfiltration attempts in VPC Flow Logs
Unusual traffic patterns

Diagnosis

Check VPC Flow Logs:

aws logs filter-log-events \
  --log-group-name /aws/vpc/tradai-flow-logs \
  --start-time $(date -u -v-24H +%s)000 \
  --filter-pattern '[version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action=REJECT, logstatus]'

Look for unexpected destinations:

# Check for traffic to unexpected IPs
aws logs filter-log-events \
  --log-group-name /aws/vpc/tradai-flow-logs \
  --start-time $(date -u -v-1H +%s)000 \
  --filter-pattern '[version, accountid, interfaceid, srcaddr, dstaddr != 10.*, srcport, dstport, protocol, packets, bytes, start, end, action=ACCEPT, logstatus]'
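
Raw flow log records are hard to scan by eye, so summing bytes per destination can make an exfiltration target stand out. The sketch below assumes the default flow log record format that the filter patterns above describe (field 5 is `dstaddr`, field 10 is `bytes`) and reads one record per line on standard input. The function name `top_destinations` is hypothetical.

```shell
# Read flow log records on stdin and print "total_bytes dstaddr",
# largest first. Records whose bytes field is not numeric are skipped.
top_destinations() {
  awk '$10 ~ /^[0-9]+$/ { bytes[$5] += $10 }
       END { for (dst in bytes) print bytes[dst], dst }' \
  | sort -rn
}
```

One way to feed it, assuming `jq` is installed, is to add `--query 'events[].message' --output json` to the `filter-log-events` command and pipe the result through `jq -r '.[]'` into `top_destinations`.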

Containment

Block suspicious IPs via WAF:

# Get WAF Web ACL
WAF_ACL_ID=$(aws wafv2 list-web-acls --scope REGIONAL --query "WebACLs[?Name=='tradai-${ENVIRONMENT}-waf'].Id" --output text)

# Add IP to block list (requires updating WAF rule)
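
The block-list update can be sketched with `aws wafv2 get-ip-set` and `update-ip-set`. This assumes the Web ACL already has a block rule that references an IP set; the IP set name and ID passed in, and the function name `block_ip_in_waf`, are placeholders that do not come from this runbook. `update-ip-set` replaces the whole address list, so the current entries are read first and sent back with the new one.

```shell
# Append one IPv4 address to an existing regional WAF IP set.
# $1 = IP set name, $2 = IP set ID, $3 = IPv4 address to block
block_ip_in_waf() {
  # Fetch the lock token first: if the set changes before the update,
  # the stale token makes the update fail instead of overwriting entries.
  lock_token=$(aws wafv2 get-ip-set --name "$1" --scope REGIONAL --id "$2" \
    --query 'LockToken' --output text)
  current=$(aws wafv2 get-ip-set --name "$1" --scope REGIONAL --id "$2" \
    --query 'IPSet.Addresses' --output text)
  # Guard against an empty list being rendered as "None" in text output.
  if [ "$current" = "None" ]; then
    current=""
  fi
  # $current is intentionally unquoted so each CIDR becomes its own argument.
  aws wafv2 update-ip-set --name "$1" --scope REGIONAL --id "$2" \
    --addresses $current "$3/32" \
    --lock-token "$lock_token"
}

# Example (placeholder names):
#   block_ip_in_waf "$IP_SET_NAME" "$IP_SET_ID" "$SUSPICIOUS_IP"
```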

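Block the IP with a network ACL deny rule (security groups cannot express deny rules):

# Add deny rule to NACL for immediate block.
# --egress blocks outbound traffic to the IP; add a matching entry with
# --ingress to also block inbound traffic from it.
aws ec2 create-network-acl-entry \
  --network-acl-id $NACL_ID \
  --rule-number 50 \
  --protocol -1 \
  --rule-action deny \
  --egress \
  --cidr-block $SUSPICIOUS_IP/32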

CloudTrail Audit Analysis

Common Queries

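Failed API calls (write operations only, because of the ReadOnly=false filter):

# lookup-events has no top-level error code field; the code lives inside the
# CloudTrailEvent JSON string, so parse that with jq (assumed to be installed).
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ReadOnly,AttributeValue=false \
  --start-time $(date -u -v-24H +%Y-%m-%dT%H:%M:%SZ) \
  --query 'Events[].CloudTrailEvent' \
  --output json \
  | jq -r '.[] | fromjson | select(.errorCode != null) | [.eventTime, .eventName, .errorCode, (.userIdentity.arn // "unknown")] | @tsv'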

Root account activity (should be rare):

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=root \
  --start-time $(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)

IAM policy changes:

aws cloudtrail lookup-events \
  --start-time $(date -u -v-24H +%Y-%m-%dT%H:%M:%SZ) \
  --query 'Events[?contains(EventName, `Policy`)].{Time:EventTime,Event:EventName,User:Username}'

Security group changes:

aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventSource,AttributeValue=ec2.amazonaws.com \
  --start-time $(date -u -v-24H +%Y-%m-%dT%H:%M:%SZ) \
  --query 'Events[?contains(EventName, `SecurityGroup`)].{Time:EventTime,Event:EventName,User:Username}'

Post-Incident Actions

Immediate (within 1 hour)

Contain the incident (rotate credentials, isolate resources)
Preserve evidence (logs, configurations, artifacts)
Notify stakeholders (team lead, security contact)
Document initial findings

Short-term (within 24 hours)

Complete root cause analysis
Verify all compromised credentials rotated
Review all related resources for tampering
Check for persistence mechanisms (backdoors, new users/keys)
Update WAF/security group rules as needed

Long-term (within 7 days)

Write incident report with timeline
Identify and implement preventive measures
Update runbooks with lessons learned
Review and update monitoring/alerting
Conduct team debrief

Emergency Contacts

Role	Contact	When to Escalate
On-call Engineer	[PagerDuty]	P1/P2 incidents
Security Lead	[Contact info]	All P1, suspected breaches
AWS Support	[Support case]	AWS resource compromise
Exchange Support	[Exchange contact]	Trading account compromise

Verification Checklist

After security incident resolution:

All compromised credentials rotated
Unauthorized access terminated
No persistent backdoors found
CloudTrail showing normal activity
VPC Flow Logs clean
Services operating normally
Monitoring alerts configured for recurrence
Incident documented and reported