Security Incident Response Runbook
Procedures for handling security incidents including credential exposure, unauthorized access, and security policy violations.
Severity Classification
| Level | Description | Response Time | Examples |
|---|---|---|---|
| P1 | Active breach, data exposure | Immediate (<15 min) | Compromised AWS creds, unauthorized trading |
| P2 | Suspected compromise | < 1 hour | Suspicious API activity, failed auth spike |
| P3 | Policy violation | < 4 hours | Exposed secrets in logs, missing encryption |
API Key/Secret Exposure
Symptoms
- Secrets found in logs, code commits, or public repositories
- Unexpected API calls from unknown sources
- Exchange reporting unusual activity
Immediate Actions (P1)
- Identify the exposed credentials.
- Rotate exchange API keys immediately:
# 1. Create a new API key on the exchange (manually, via the exchange UI)
# 2. Update Secrets Manager
aws secretsmanager put-secret-value \
  --secret-id tradai/${ENVIRONMENT}/exchange-keys \
  --secret-string '{"api_key":"NEW_KEY","api_secret":"NEW_SECRET"}'
# 3. Restart services to pick up the new credentials
aws ecs update-service \
  --cluster tradai-${ENVIRONMENT} \
  --service tradai-strategy-service-${ENVIRONMENT} \
  --force-new-deployment
- Revoke old exchange keys (via exchange UI)
- Check for unauthorized activity.
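The first step above can be sketched as a small scan for AWS access key IDs in any text stream (log output, `git log -p`, a file). This is a minimal sketch, not the team's actual tooling: the function names are hypothetical, and the pattern only matches the well-known `AKIA`/`ASIA` access key ID prefixes. Exchange API key formats vary by exchange and are not covered.

```shell
# Hypothetical helpers; names are illustrative and not part of any existing tooling.

# Read text on stdin and print each distinct AWS access key ID found.
# Matches long-term (AKIA...) and temporary (ASIA...) key IDs only.
scan_for_aws_key_ids() {
  grep -Eo '(AKIA|ASIA)[0-9A-Z]{16}' | sort -u
}

# Show which IAM user owns a key and when and where it was last used.
# Usage: key_owner <access-key-id>
key_owner() {
  aws iam get-access-key-last-used \
    --access-key-id "$1" \
    --query '[UserName,AccessKeyLastUsed.LastUsedDate,AccessKeyLastUsed.ServiceName]' \
    --output text
}
```

For example, `git log -p | scan_for_aws_key_ids` lists key IDs that ever appeared in the repository history, and each hit can then be passed to `key_owner`.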
AWS Credential Rotation
If AWS credentials are exposed:
- Disable the compromised credentials.
- Create new credentials.
- Review CloudTrail for unauthorized actions.
- Delete the compromised credentials (after verification).
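The four steps above could look roughly like the following for an IAM user's access key. This is a sketch under assumptions: the function names are hypothetical, and it assumes the exposed credential is a long-term IAM user access key rather than role or root credentials.

```shell
# Hypothetical wrappers around standard AWS CLI calls; one per step above.

# 1. Disable (not delete) the key so it stops working but remains visible for auditing.
# Usage: disable_access_key <user-name> <access-key-id>
disable_access_key() {
  aws iam update-access-key --user-name "$1" --access-key-id "$2" --status Inactive
}

# 2. Issue a replacement key for the same user.
# Usage: create_replacement_key <user-name>
create_replacement_key() {
  aws iam create-access-key --user-name "$1"
}

# 3. List CloudTrail events recorded for the compromised key.
# Usage: review_key_activity <access-key-id>
review_key_activity() {
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=AccessKeyId,AttributeValue=$1" \
    --query 'Events[].{Time:EventTime,Event:EventName,Source:EventSource}' \
    --output table
}

# 4. Delete the key once the review is complete.
# Usage: delete_access_key <user-name> <access-key-id>
delete_access_key() {
  aws iam delete-access-key --user-name "$1" --access-key-id "$2"
}
```

Disabling before deleting keeps the key ID attached to the user while the CloudTrail review is still in progress.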
JWT Token Invalidation
If the JWT signing key is compromised:
- Rotate the Cognito app client secret.
- Force sign-out of all users.
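The forced sign-out step might be scripted as below. This is a sketch only: the function name and the user pool ID argument are hypothetical, and it covers only the sign-out step, not the client secret rotation. Note that global sign-out revokes refresh tokens, but services that validate tokens purely by signature may keep accepting already-issued access and ID tokens until they expire.

```shell
# Hypothetical helper: sign out every user in a Cognito user pool.
# Usage: sign_out_all_users <user-pool-id>
sign_out_all_users() {
  pool_id=$1
  # list-users prints usernames tab-separated in text mode; split to one per line.
  aws cognito-idp list-users \
    --user-pool-id "$pool_id" \
    --query 'Users[].Username' \
    --output text |
  tr '\t' '\n' |
  while IFS= read -r username; do
    [ -n "$username" ] || continue
    aws cognito-idp admin-user-global-sign-out \
      --user-pool-id "$pool_id" \
      --username "$username"
  done
}
```

For large pools this issues one API call per user, so request throttling may need to be taken into account.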
Unauthorized Access Detected
Symptoms
- Unexpected IAM activity in CloudTrail
- Unknown IP addresses in access logs
- Privilege escalation attempts
- Resource creation outside normal patterns
Diagnosis
- Review CloudTrail events.
- Check for new IAM users/roles.
- Check API Gateway access logs.
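The first two diagnosis steps can be approximated with the sketch below. The function names and the chosen event list are assumptions for illustration. `lookup-events` accepts only one lookup attribute per call, hence the loop. IAM is a global service, so depending on the trail configuration its events may only appear when querying `us-east-1`. The API Gateway step is omitted because the access log group name is not given in this runbook.

```shell
# Hypothetical helpers for spotting newly created identities.

# Print CloudTrail events that commonly indicate persistence attempts.
# Usage: iam_creation_events <start-time, e.g. 2024-01-01T00:00:00Z>
iam_creation_events() {
  start=$1
  for event in CreateUser CreateRole CreateAccessKey AttachUserPolicy; do
    aws cloudtrail lookup-events \
      --lookup-attributes "AttributeKey=EventName,AttributeValue=$event" \
      --start-time "$start" \
      --query 'Events[].[EventTime,EventName,Username]' \
      --output text
  done
}

# List IAM users, newest first, so recent additions stand out.
newest_iam_users() {
  aws iam list-users \
    --query 'reverse(sort_by(Users,&CreateDate))[].[CreateDate,UserName]' \
    --output text
}

# Same idea for roles.
newest_iam_roles() {
  aws iam list-roles \
    --query 'reverse(sort_by(Roles,&CreateDate))[].[CreateDate,RoleName]' \
    --output text
}
```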
ECS Task Isolation
If a container is suspected to be compromised:
- Stop the suspicious task.
- Update the security group to deny all egress (preserve for forensics).
- Capture task logs for forensics.
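One way to script these three steps is sketched below. All function names are hypothetical, and the cluster, task, security group, and log group values are placeholders the operator must supply. The egress step assumes the group still carries the default allow-all egress rule. If the security group is shared with healthy tasks, revoking egress affects them too.

```shell
# Hypothetical helpers; one per isolation step above.

# Stop the task and record why.
# Usage: stop_task <cluster> <task-id-or-arn>
stop_task() {
  aws ecs stop-task \
    --cluster "$1" \
    --task "$2" \
    --reason "Security incident: suspected compromise"
}

# Remove the allow-all egress rule from a security group.
# Usage: deny_all_egress <security-group-id>
deny_all_egress() {
  aws ec2 revoke-security-group-egress \
    --group-id "$1" \
    --ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'
}

# Save the task's CloudWatch log events to a local JSON file.
# Usage: capture_task_logs <log-group> <log-stream-prefix> <output-file>
capture_task_logs() {
  aws logs filter-log-events \
    --log-group-name "$1" \
    --log-stream-name-prefix "$2" \
    --output json > "$3"
}
```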
Network Security Breach
Symptoms
- Unexpected outbound connections
- Data exfiltration attempts in VPC Flow Logs
- Unusual traffic patterns
Diagnosis
- Check VPC Flow Logs.
- Look for unexpected destinations:
# Check for traffic to unexpected IPs
aws logs filter-log-events \
  --log-group-name /aws/vpc/flowlogs/tradai-${ENVIRONMENT} \
  --start-time $(date -u -v-1H +%s)000 \
  --filter-pattern '[version, accountid, interfaceid, srcaddr, dstaddr != 10.*, srcport, dstport, protocol, packets, bytes, start, end, action=ACCEPT, logstatus]'
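Once flow log records have been pulled down, they can be triaged locally. The sketch below assumes the default (version 2) flow log format, where field 5 is the destination address, field 10 the byte count, and field 13 the action. The function names are hypothetical. Unlike the filter pattern above, which excludes only `10.*`, it treats all RFC 1918 ranges as internal; that is my assumption, not something the runbook states.

```shell
# Hypothetical helpers; input is one default-format flow log record per line on stdin.

# Keep accepted flows whose destination is outside private address space.
# Output columns: srcaddr dstaddr dstport bytes
external_accepts() {
  awk '$13 == "ACCEPT" && $5 !~ /^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/ {
         print $4, $5, $7, $10
       }'
}

# Sum bytes per destination from external_accepts output, largest first.
# Output columns: total_bytes dstaddr
top_destinations_by_bytes() {
  awk '{ bytes[$2] += $4 } END { for (d in bytes) print bytes[d], d }' | sort -rn
}
```

A possible pipeline is `aws logs filter-log-events ... --query 'events[].message' --output text | tr '\t' '\n' | external_accepts | top_destinations_by_bytes`, which surfaces the destinations receiving the most data.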
Containment
- Block suspicious IPs via WAF.
- Update security group to block IP.
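A rough sketch of both containment steps follows. The function names, IP set name and ID, network ACL ID, and rule number are all hypothetical placeholders. Two caveats apply. First, `update-ip-set` replaces the whole address list, so the sketch reads the current addresses and appends to them; it also assumes a WAF rule already blocks members of that IP set. Second, security groups support only allow rules, so for an explicit deny the sketch swaps in a network ACL entry. That substitution is mine; the alternative is to revoke whichever allow rule admits the address.

```shell
# Hypothetical helpers for blocking a single IPv4 address.

# Append <ip>/32 to an existing regional WAFv2 IP set.
# Usage: waf_block_ip <ip-set-name> <ip-set-id> <ip>
waf_block_ip() {
  set_name=$1
  set_id=$2
  block_ip=$3
  # Fetch the lock token first: if the set changes afterwards, the update is rejected.
  lock_token=$(aws wafv2 get-ip-set --name "$set_name" --scope REGIONAL --id "$set_id" \
    --query 'LockToken' --output text)
  current=$(aws wafv2 get-ip-set --name "$set_name" --scope REGIONAL --id "$set_id" \
    --query 'IPSet.Addresses' --output text)
  # $current is intentionally unquoted so each existing CIDR becomes its own argument.
  aws wafv2 update-ip-set --name "$set_name" --scope REGIONAL --id "$set_id" \
    --addresses $current "$block_ip/32" \
    --lock-token "$lock_token"
}

# Add an inbound deny entry for <ip>/32 to a network ACL.
# Usage: nacl_deny_ip <network-acl-id> <rule-number> <ip>
nacl_deny_ip() {
  aws ec2 create-network-acl-entry \
    --network-acl-id "$1" \
    --ingress \
    --rule-number "$2" \
    --protocol=-1 \
    --rule-action deny \
    --cidr-block "$3/32"
}
```

Network ACL rules are evaluated in ascending rule-number order, so the deny entry needs a lower number than any allow rule that would otherwise match.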
CloudTrail Audit Analysis
Common Queries
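Failed API calls (write operations only; requires jq, because the error code exists only inside the raw CloudTrailEvent JSON and not as a top-level field):
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=ReadOnly,AttributeValue=false \
--start-time $(date -u -v-24H +%Y-%m-%dT%H:%M:%SZ) \
--query 'Events[].CloudTrailEvent' --output json \
| jq -r '.[] | fromjson | select(.errorCode != null) | [.eventTime, .eventName, .errorCode, (.userIdentity.arn // "unknown")] | @tsv'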
Root account activity (should be rare):
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=Username,AttributeValue=root \
--start-time $(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)
IAM policy changes:
aws cloudtrail lookup-events \
--start-time $(date -u -v-24H +%Y-%m-%dT%H:%M:%SZ) \
--query 'Events[?contains(EventName, `Policy`)].{Time:EventTime,Event:EventName,User:Username}'
Security group changes:
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventSource,AttributeValue=ec2.amazonaws.com \
--start-time $(date -u -v-24H +%Y-%m-%dT%H:%M:%SZ) \
--query 'Events[?contains(EventName, `SecurityGroup`)].{Time:EventTime,Event:EventName,User:Username}'
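The queries above compute `--start-time` with `date -v`, which is BSD/macOS syntax; GNU `date` on most Linux hosts uses `-d` instead. A small helper along the following lines could make the queries portable. The function name is hypothetical, and the sketch assumes one of those two `date` implementations is installed.

```shell
# Hypothetical helper: print the UTC timestamp <hours> hours ago in ISO 8601 form.
# Usage: hours_ago_iso <hours>
hours_ago_iso() {
  hours=$1
  if date -u -v-1H +%s >/dev/null 2>&1; then
    # BSD/macOS date supports -v adjustments.
    date -u -v-"${hours}"H +%Y-%m-%dT%H:%M:%SZ
  else
    # GNU date supports relative strings via -d.
    date -u -d "${hours} hours ago" +%Y-%m-%dT%H:%M:%SZ
  fi
}
```

With this in place, a query can use `--start-time "$(hours_ago_iso 24)"` for the last day, or `hours_ago_iso 168` for the last seven days.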
Post-Incident Actions
Immediate (within 1 hour)
- [ ] Contain the incident (rotate credentials, isolate resources)
- [ ] Preserve evidence (logs, configurations, artifacts)
- [ ] Notify stakeholders (team lead, security contact)
- [ ] Document initial findings
Short-term (within 24 hours)
- [ ] Complete root cause analysis
- [ ] Verify all compromised credentials rotated
- [ ] Review all related resources for tampering
- [ ] Check for persistence mechanisms (backdoors, new users/keys)
- [ ] Update WAF/security group rules as needed
Long-term (within 7 days)
- [ ] Write incident report with timeline
- [ ] Identify and implement preventive measures
- [ ] Update runbooks with lessons learned
- [ ] Review and update monitoring/alerting
- [ ] Conduct team debrief
Emergency Contacts
| Role | Contact | When to Escalate |
|---|---|---|
| On-call Engineer | [PagerDuty] | P1/P2 incidents |
| Security Lead | [Contact info] | All P1, suspected breaches |
| AWS Support | [Support case] | AWS resource compromise |
| Exchange Support | [Exchange contact] | Trading account compromise |
Verification Checklist
After security incident resolution:
- [ ] All compromised credentials rotated
- [ ] Unauthorized access terminated
- [ ] No persistent backdoors found
- [ ] CloudTrail showing normal activity
- [ ] VPC Flow Logs clean
- [ ] Services operating normally
- [ ] Monitoring alerts configured for recurrence
- [ ] Incident documented and reported