pulumi-drift-detector¶
Monitors infrastructure drift by running pulumi preview --expect-no-changes against each configured Pulumi stack and alerting when drift is detected.
Overview¶
| Property | Value |
|---|---|
| Trigger | EventBridge scheduled event |
| Runtime | Python 3.11 |
| Timeout | 300 seconds |
| Memory | 512 MB |
| Settings class | PulumiDriftSettings |
How It Works¶
The handler runs the Pulumi CLI directly via subprocess.run() (not an S3 state check). For each configured stack it:
- Logs in to the S3 backend with pulumi login {PULUMI_BACKEND_URL}
- Runs pulumi preview --stack {stack_name} --expect-no-changes --json --non-interactive
- Parses the JSON output to count resource changes by operation (create, update, delete, same)
- Treats a non-zero exit code from pulumi preview as an indication of drift
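The parsing step can be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: count_operations is a hypothetical helper, and it assumes the --json output is a single JSON object that carries a changeSummary mapping of operation name to count. Verify that field name against the Pulumi CLI version in use.

```python
import json

# Operations the handler is described as counting.
TRACKED_OPS = ("create", "update", "delete", "same")


def count_operations(preview_stdout: str) -> dict[str, int]:
    """Count resource changes by operation from `pulumi preview --json` output.

    Assumption: the output object has a "changeSummary" mapping such as
    {"update": 2, "same": 45}. Operations that are absent count as zero.
    """
    digest = json.loads(preview_stdout)
    summary = digest.get("changeSummary") or {}
    return {op: int(summary.get(op, 0)) for op in TRACKED_OPS}


# Hand-written sample in the assumed shape, not real CLI output.
sample = '{"steps": [], "changeSummary": {"update": 2, "same": 45}}'
counts = count_operations(sample)
```

The four counts map onto the resources_to_create, resources_to_update, resources_to_delete, and resources_unchanged fields of the per-stack result.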
Both commands run with a working directory of /var/task/infra and environment variables including PULUMI_CONFIG_PASSPHRASE (from Secrets Manager) and PULUMI_SKIP_UPDATE_CHECK=true.
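A minimal sketch of the two CLI invocations, assuming the behavior described above. preview_stack and build_env are hypothetical names, the 120-second per-command timeout is an illustrative value rather than a documented one, and the run parameter exists only so the function can be exercised without the Pulumi CLI installed.

```python
import os
import subprocess
from typing import Any, Callable

INFRA_DIR = "/var/task/infra"  # working directory named in the text above


def build_env(passphrase: str) -> dict[str, str]:
    """Environment for both commands: the current env plus the Pulumi settings."""
    return {
        **os.environ,
        "PULUMI_CONFIG_PASSPHRASE": passphrase,
        "PULUMI_SKIP_UPDATE_CHECK": "true",
    }


def preview_stack(
    backend_url: str,
    stack_name: str,
    passphrase: str,
    run: Callable[..., Any] = subprocess.run,
    timeout: int = 120,  # assumption: not a documented value
) -> dict[str, Any]:
    """Log in to the backend, preview one stack, and report drift by exit code."""
    common = {
        "cwd": INFRA_DIR,
        "env": build_env(passphrase),
        "capture_output": True,
        "text": True,
        "timeout": timeout,
    }
    # A failed login raises CalledProcessError because check=True.
    run(["pulumi", "login", backend_url], check=True, **common)
    preview = run(
        [
            "pulumi", "preview",
            "--stack", stack_name,
            "--expect-no-changes",
            "--json",
            "--non-interactive",
        ],
        check=False,  # a non-zero exit code is the drift signal, not an exception
        **common,
    )
    return {
        "stack_name": stack_name,
        "has_drift": preview.returncode != 0,
        "raw_output": preview.stdout,
    }
```

This sketch follows the text in treating any non-zero exit code as drift. It does not try to distinguish drift from other preview failures.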
Input Schema¶
{
  "stacks": ["tradai-foundation-prod", "tradai-compute-prod"],
  "dry_run": false
}
- stacks: overrides the default stacks
- dry_run: if true, skips Pulumi operations (for testing)
Output Schema¶
{
  "success": true,
  "data": {
    "summary": {
      "stacks_checked": 2,
      "drifted": 1,
      "errors": 0
    },
    "results": [
      {
        "stack_name": "tradai-foundation-prod",
        "status": "checked",
        "has_drift": true,
        "resources_to_create": 0,
        "resources_to_update": 2,
        "resources_to_delete": 0,
        "resources_unchanged": 45,
        "drift_details": "{...}",
        "timestamp": "2024-02-07T12:00:00+00:00"
      }
    ]
  },
  "environment": "dev"
}
Error Results¶
Individual stack checks that fail return error results without halting other stacks:
{
  "stack_name": "tradai-compute-prod",
  "status": "error",
  "error": "Pulumi preview timed out",
  "timestamp": "2024-02-07T12:05:00+00:00"
}
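The per-stack isolation described above might look like the following sketch. check_all and the check_one callback are hypothetical names. Counting errored stacks in stacks_checked is an assumption, since the source does not say whether they are included.

```python
from datetime import datetime, timezone
from typing import Any, Callable


def check_all(
    stacks: list[str],
    check_one: Callable[[str], dict[str, Any]],
) -> dict[str, Any]:
    """Check each stack, turning a failure into an error result for that stack only."""
    results: list[dict[str, Any]] = []
    for name in stacks:
        try:
            results.append(check_one(name))
        except Exception as exc:  # one failing stack must not halt the others
            now = datetime.now(timezone.utc).replace(microsecond=0)
            results.append(
                {
                    "stack_name": name,
                    "status": "error",
                    "error": str(exc),
                    "timestamp": now.isoformat(),
                }
            )
    summary = {
        # Assumption: errored stacks are included in stacks_checked.
        "stacks_checked": len(results),
        "drifted": sum(1 for r in results if r.get("has_drift")),
        "errors": sum(1 for r in results if r.get("status") == "error"),
    }
    return {"summary": summary, "results": results}
```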
Settings: PulumiDriftSettings¶
Extends DynamoDBSettings with Pulumi-specific configuration.
| Setting | Env Var | Default | Description |
|---|---|---|---|
| pulumi_backend_url | PULUMI_BACKEND_URL | - | S3 backend URL for Pulumi state |
| pulumi_config_passphrase_secret_arn | PULUMI_CONFIG_PASSPHRASE_SECRET_ARN | - | Secrets Manager ARN for Pulumi passphrase |
| stacks_to_check | STACKS_TO_CHECK | dev,staging,prod | Comma-separated list of stack names |
| alert_on_drift | ALERT_ON_DRIFT | true | Whether to send SNS alerts on drift |
| dynamodb_table_name | INFRA_DRIFT_STATE_TABLE | - | State table for drift tracking |
The class also inherits the LambdaSettings fields ENVIRONMENT, SNS_ALERTS_TOPIC_ARN, and LOG_LEVEL.
Stack Filtering¶
Stacks to check are determined by, in order of precedence:
1. event["stacks"], if provided (overrides settings)
2. settings.get_stacks(), which parses the comma-separated STACKS_TO_CHECK list
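The precedence can be sketched as below. resolve_stacks and parse_stacks are hypothetical stand-ins for the handler logic and settings.get_stacks(). Treating an empty event["stacks"] list the same as a missing key is an assumption.

```python
def parse_stacks(raw: str) -> list[str]:
    """Split a comma-separated list, dropping whitespace and empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def resolve_stacks(event: dict, stacks_to_check: str) -> list[str]:
    """Prefer the event override, otherwise fall back to the configured list."""
    override = event.get("stacks")
    if override:  # assumption: an empty list falls back to settings
        return list(override)
    return parse_stacks(stacks_to_check)


from_settings = resolve_stacks({}, "dev, staging,prod")
from_event = resolve_stacks({"stacks": ["tradai-foundation-prod"]}, "dev,staging,prod")
```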
Pulumi Passphrase¶
The passphrase is retrieved from AWS Secrets Manager using the ARN in PULUMI_CONFIG_PASSPHRASE_SECRET_ARN. If retrieval fails, the handler returns an error response without checking any stacks.
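A minimal sketch of the fail-fast retrieval, assuming the secret is stored as a plain SecretString. load_passphrase is a hypothetical helper and the shape of the error response is illustrative only. The client is passed in, so in a Lambda it would be something like boto3.client("secretsmanager").

```python
from typing import Any, Optional


def load_passphrase(
    secrets_client: Any, secret_arn: str
) -> tuple[Optional[str], Optional[dict]]:
    """Return (passphrase, None) on success or (None, error_response) on failure."""
    try:
        response = secrets_client.get_secret_value(SecretId=secret_arn)
        return response["SecretString"], None
    except Exception as exc:
        # Assumption: the real error payload may differ; this shape is illustrative.
        error = {
            "success": False,
            "error": f"Failed to retrieve Pulumi passphrase: {exc}",
        }
        return None, error
```

A caller would return the error response immediately when the second element is not None, before any stack is checked.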
Drift State Tracking¶
Uses DynamoDBStateRepository with InfraDriftState entity to track drift state per stack:
InfraDriftState(
    stack_name="dev",
    has_drift=True,
    resources_to_create=0,
    resources_to_update=2,
    resources_to_delete=0,
    resources_unchanged=45,
    drift_detected_at="2024-02-07T12:00:00+00:00",
    last_check="2024-02-07T12:00:00+00:00",
    drift_details="{...}"
)
Alert deduplication: an alert is sent only on the transition from a no-drift state to a drifted state, which prevents repeated alerts for the same drift. If state tracking fails, the alert is still sent as a safety measure.
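The deduplication rule reduces to a small decision function. The sketch below uses a hypothetical should_alert helper. Treating a stack with no stored state as previously drift-free is an assumption.

```python
from typing import Optional


def should_alert(
    has_drift: bool,
    previous_has_drift: Optional[bool],
    state_ok: bool = True,
) -> bool:
    """Decide whether a drift alert should be sent for one stack.

    previous_has_drift is None when no state record exists yet.
    state_ok is False when reading or writing the state table failed.
    """
    if not has_drift:
        return False
    if not state_ok:
        return True  # fail open: alert anyway when state tracking is unavailable
    # Alert only on the transition from no-drift (or no record) to drifted.
    return not previous_has_drift
```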
SNS Alert Logic¶
When drift is newly detected and alert_on_drift=True:
- Subject: [{ENV}] Infrastructure Drift Detected: {stack_name}
- Body includes resource change counts and remediation recommendations
- Message attributes: stack, environment, drift_type=infrastructure
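As an illustration, the publish arguments could be assembled as below. build_drift_alert is a hypothetical helper, and the body wording and remediation text are placeholders rather than the handler's actual message. The returned dict matches the keyword arguments of the boto3 SNS publish call, apart from TopicArn, which the caller supplies. The subject is truncated to SNS's 100-character limit.

```python
from typing import Any


def build_drift_alert(environment: str, result: dict[str, Any]) -> dict[str, Any]:
    """Build Subject, Message, and MessageAttributes for a drift alert."""
    stack = result["stack_name"]
    subject = f"[{environment}] Infrastructure Drift Detected: {stack}"
    body = "\n".join(
        [
            f"Drift detected in stack {stack} ({environment}).",
            f"Resources to create: {result.get('resources_to_create', 0)}",
            f"Resources to update: {result.get('resources_to_update', 0)}",
            f"Resources to delete: {result.get('resources_to_delete', 0)}",
            # Placeholder remediation text, not the handler's wording.
            "Recommendation: review the preview output and reconcile the stack.",
        ]
    )

    def attr(value: str) -> dict[str, str]:
        return {"DataType": "String", "StringValue": value}

    return {
        "Subject": subject[:100],  # SNS subjects are limited to 100 characters
        "Message": body,
        "MessageAttributes": {
            "stack": attr(stack),
            "environment": attr(environment),
            "drift_type": attr("infrastructure"),
        },
    }
```

With a boto3 SNS client, the result would be passed as sns.publish(TopicArn=topic_arn, **alert).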
CloudWatch Metrics¶
Namespace suffix: InfraDrift
| Metric | Dimensions | Description |
|---|---|---|
| DriftDetected | Stack, Environment | 1.0 if drift detected, 0.0 otherwise |
| ResourcesToCreate | Stack, Environment | Count of resources to create |
| ResourcesToUpdate | Stack, Environment | Count of resources to update |
| ResourcesToDelete | Stack, Environment | Count of resources to delete |
| CheckSuccess | Stack, Environment | 1.0 if check succeeded, 0.0 on error |
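The table above could translate into a MetricData payload along these lines. build_metric_data is a hypothetical helper. Emitting only CheckSuccess for an errored stack and using the Count unit are assumptions. The full namespace is left to the caller because only its suffix is documented.

```python
from typing import Any


def build_metric_data(
    stack: str, environment: str, result: dict[str, Any]
) -> list[dict[str, Any]]:
    """Build CloudWatch MetricData entries for one stack result."""
    dimensions = [
        {"Name": "Stack", "Value": stack},
        {"Name": "Environment", "Value": environment},
    ]

    def metric(name: str, value: float) -> dict[str, Any]:
        return {
            "MetricName": name,
            "Dimensions": dimensions,
            "Value": float(value),
            "Unit": "Count",  # assumption: the unit is not documented
        }

    if result.get("status") == "error":
        # Assumption: a failed check reports only CheckSuccess = 0.0.
        return [metric("CheckSuccess", 0.0)]
    return [
        metric("DriftDetected", 1.0 if result.get("has_drift") else 0.0),
        metric("ResourcesToCreate", result.get("resources_to_create", 0)),
        metric("ResourcesToUpdate", result.get("resources_to_update", 0)),
        metric("ResourcesToDelete", result.get("resources_to_delete", 0)),
        metric("CheckSuccess", 1.0),
    ]
```

With a boto3 CloudWatch client, the list would be sent as cloudwatch.put_metric_data(Namespace=namespace, MetricData=data).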
EventBridge Schedule¶
{
  "ScheduleExpression": "rate(6 hours)",
  "Targets": [{
    "Arn": "arn:aws:lambda:...:pulumi-drift-detector",
    "Input": "{}"
  }]
}
See Also¶
Related Lambdas:
- Cleanup Resources - Cleans up orphaned infrastructure
- Orphan Scanner - Scans for orphaned cloud resources
- Health Check - Infrastructure health monitoring
Architecture:
- Pulumi Code - Infrastructure as Code
- Architecture Overview - System design
Guides:
- Pulumi Operations - Pulumi operational runbooks
- Infrastructure Issues - Infrastructure incident response