Incident Response Runbook
Overview
Operational companion to the Incident Response Plan. GitHub Issues are the system of record for all incident data. Slack is the coordination and post-mortem authoring surface. Incident channels: #incidents (live coordination), #security-alerts (automated alerts), #incident-YYYY-MM-DD-brief (dedicated SEV-1/2).
1. Detection and Triage
1.1 How Incidents Are Created
GCP Cloud Monitoring creates alerts automatically when uptime checks or alert policies detect a failure. Notification channels route alerts to Slack #incidents; no manual severity assignment is needed for monitored systems.
When an alert fires:
- GCP Cloud Monitoring evaluates the alert policy conditions and fires the notification channel.
- The Slack notification channel posts to #incidents.
- The on-call responder acknowledges and begins response.
User-reported incidents not detected by monitoring: Open a GitHub Issue using the Incident Response template with title [SEV-?] Brief description.
1.2 Severity Reference
| Observation | Severity |
|---|---|
| Confirmed data breach / active exploitation / complete outage | SEV-1 |
| Partial outage / suspected compromise / critical vuln in the wild | SEV-2 |
| Degraded performance / single account compromise / failed brute-force | SEV-3 |
| Suspicious but unconfirmed / policy violation / informational | SEV-4 |
For monitored incidents, severity is set by Catalog metadata. For user-reported incidents, assign within 15 min (SEV-1/2) or 1 hour (SEV-3/4).
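As a rough illustration of the table above, the lookup for user-reported incidents could be sketched in Python as follows. The names `SEVERITY_BY_OBSERVATION`, `classify`, and `issue_title` are hypothetical, and picking the most severe match when several observations apply is an assumption rather than a stated rule.

```python
# Observation -> severity, copied from the Severity Reference table.
SEVERITY_BY_OBSERVATION = {
    "confirmed data breach": "SEV-1",
    "active exploitation": "SEV-1",
    "complete outage": "SEV-1",
    "partial outage": "SEV-2",
    "suspected compromise": "SEV-2",
    "critical vuln in the wild": "SEV-2",
    "degraded performance": "SEV-3",
    "single account compromise": "SEV-3",
    "failed brute-force": "SEV-3",
    "suspicious but unconfirmed": "SEV-4",
    "policy violation": "SEV-4",
    "informational": "SEV-4",
}

# Time allowed to assign severity to a user-reported incident, in minutes.
ASSIGNMENT_DEADLINE_MINUTES = {"SEV-1": 15, "SEV-2": 15, "SEV-3": 60, "SEV-4": 60}


def classify(observations):
    """Return the most severe level among the given observations.

    Assumption: when several observations apply, the most severe one wins.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    severities = [SEVERITY_BY_OBSERVATION[o.strip().lower()] for o in observations]
    # "SEV-1" sorts before "SEV-4", so min() yields the most severe level.
    return min(severities)


def issue_title(severity, description):
    """Format a GitHub Issue title as '[SEV-n] Brief description'."""
    return f"[{severity}] {description.strip()}"
```

For example, `classify(["Degraded performance", "Partial outage"])` yields `"SEV-2"`, which `issue_title` then turns into a title such as `[SEV-2] Checkout errors`.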
1.3 Alert Sources
| Source | Where to Check |
|---|---|
| GCP Cloud Monitoring | console.cloud.google.com → Monitoring → Alerting |
| CrowdStrike | falcon.crowdstrike.com → Detections |
| Google Workspace | admin.google.com → Reporting → Audit |
| GitHub | github.com → Security → Code scanning |
| 1Password | 1password.com → Watchtower |
| Manual report | #security-alerts Slack |
1.4 First Response (first 5 minutes)
- Acknowledge the alert in the #incidents Slack thread.
- Open the GCP Cloud Monitoring alert link for full context on the affected system.
- Begin a Slack thread in #incidents for SEV-1/2/3 coordination.
2. Communication
2.1 Internal Notification
GCP Cloud Monitoring alert policies handle primary notification routing via Slack and email channels. The table below describes what happens automatically and what requires human action.
| Severity | Automatic (GCP Cloud Monitoring + Slack) | Human Action |
|---|---|---|
| SEV-1 | Slack #incidents notification + CISO follow-up after 15 min | Open #incident-YYYY-MM-DD-brief; phone CTO; if customer data at risk, call legal |
| SEV-2 | Slack #incidents notification + CISO follow-up after 15 min | Post updates every 2 hours in #incidents thread |
| SEV-3 | Slack #incidents notification | Post daily update in #incidents thread |
| SEV-4 | None automatic | CISO logs in GitHub Issue if tracking is warranted; no broader notification |
Status update format for #incidents thread:
Cadence: SEV-1 every 30 min, SEV-2 every 2 hours, SEV-3 daily.
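The cadence can be checked mechanically. The following is a minimal sketch, assuming the clock starts at the previous status update; `next_update_due` and `is_overdue` are illustrative names, not existing tooling.

```python
from datetime import datetime, timedelta, timezone

# Status update cadence per severity, as listed above. SEV-4 has no cadence.
UPDATE_INTERVAL = {
    "SEV-1": timedelta(minutes=30),
    "SEV-2": timedelta(hours=2),
    "SEV-3": timedelta(days=1),
}


def next_update_due(severity, last_update):
    """Return when the next status update is due, or None if no cadence applies."""
    interval = UPDATE_INTERVAL.get(severity)
    if interval is None:
        return None
    return last_update + interval


def is_overdue(severity, last_update, now):
    """True when the next update for this severity is already past due."""
    due = next_update_due(severity, last_update)
    return due is not None and now > due


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(next_update_due("SEV-1", last))
```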
2.2 External Communication
Only CTO or CISO may authorize external communications. All must be reviewed by legal before release.
| Event | Action | Timing |
|---|---|---|
| Customer data breach | CTO + Legal via email | Within 72 hours of confirmation |
| Status page update | CTO or CISO authorizes update via GCP Cloud Monitoring status dashboard | When customer-visible impact occurs |
| Regulatory notification | Legal leads, coordinated with CISO | Per applicable regulation |
3. Containment by System
Preserve evidence (Section 4) before destructive containment actions.
3.1 Google Workspace
- Account compromise: Admin console → Users → Suspend; revoke all active sessions; reset password; export 30-day audit log.
- Unauthorized OAuth grants: Security → API controls → App access control → revoke suspicious grants.
3.2 GitHub
- Compromised PAT or deploy key: Revoke immediately at github.com/settings/tokens or repo → Deploy keys. Check audit log for actions taken.
- If `GH_PAT_ADMIN` is compromised: Revoke it, generate a new token, run `doppler secrets set GH_PAT_ADMIN -p m7-security -c prd`, then verify the evidence workflows.
- Malicious workflow: Disable via GitHub UI (Actions → workflow → disable); review recent runs.
3.3 CrowdStrike
- Endpoint threat: Endpoint Security → Detections → review details → Hosts → “Contain host” (isolates from network, preserves forensics). Export detection report. Do not reimage until investigation is complete.
3.4 Cloudflare
- Active attack: Toggle “Under Attack Mode” → Security → WAF → create block rule. Export security events before making changes.
- DNS tampering: DNS → compare to known-good state → revert unauthorized changes. If token compromised: revoke, create replacement, `doppler secrets set CF_SECURITY_API_TOKEN -p m7-security -c prd`.
3.5 1Password
- Compromised account: Admin console → People → Suspend; review vault access; rotate all credentials the account could reach.
- Service account compromise: Admin console → Service Accounts → revoke token → generate replacement.
3.6 Doppler
- Compromised secret: Rotate at source → `doppler secrets set SECRET_NAME -p m7-security -c prd`. Check Activity log for unauthorized reads.
- Compromised `DOPPLER_SA_TOKEN`: Revoke in Workplace → Service Accounts; generate new; update `DOPPLER_TOKEN` in GitHub Actions secrets; rotate all secrets it had read access to.
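Several containment steps in this section end with the same `doppler secrets set ... -p m7-security -c prd` call. A small wrapper can reduce typos under pressure. The following Python sketch is illustrative only: `doppler_set_argv`, `rotate_secret`, and the name-validation pattern are assumptions, not existing tooling. It assembles exactly the command shown in the runbook and runs it in the current terminal, leaving the entry of the new value to the CLI.

```python
import re
import subprocess

# Project and config taken from the commands in this runbook.
DOPPLER_PROJECT = "m7-security"
DOPPLER_CONFIG = "prd"

# Assumption: secret names are upper-case letters, digits, and underscores.
_SECRET_NAME = re.compile(r"[A-Z_][A-Z0-9_]*")


def doppler_set_argv(secret_name):
    """Build the argv for `doppler secrets set NAME -p m7-security -c prd`."""
    if not _SECRET_NAME.fullmatch(secret_name):
        raise ValueError(f"unexpected secret name: {secret_name!r}")
    return [
        "doppler", "secrets", "set", secret_name,
        "-p", DOPPLER_PROJECT,
        "-c", DOPPLER_CONFIG,
    ]


def rotate_secret(secret_name):
    """Run the command in the current terminal; raises if the CLI exits non-zero."""
    # An argv list (no shell) avoids quoting and injection problems.
    subprocess.run(doppler_set_argv(secret_name), check=True)
```

Calling `rotate_secret("GH_PAT_ADMIN")` would then run the same command listed under 3.2, with the secret value never appearing in the Python source or shell history.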
3.7 Supabase
- Unauthorized database access: Roll `service_role` key in API Settings → `doppler secrets set SB_SERVICE_ROLE_KEY -p m7-security -c prd`. Check auth.audit_log_entries and Postgres logs.
- Leaked `anon` key: Dashboard → API Settings → Roll anon key. Update all application deployments with the new key immediately.
- Emergency lockdown: Settings → API → disable “Enable API” (takes application offline; last resort only).
3.8 GCP Cloud Run (web app)
- Compromised deployment: GCP Console → Cloud Run → web service → Revisions → select last known-good → route 100% traffic. Check Cloud Logging for the service for request/error logs.
- Compromised service account: IAM → locate service account → remove all role bindings → create replacement → update Doppler.
3.9 GCP Cloud Run (agents)
- Compromised deployment: GCP Console → Cloud Run → agent service → Revisions → select previous known-good → route 100% traffic.
- Compromised service account: IAM → locate service account → remove all role bindings → create replacement → update Doppler.
4. Evidence Preservation
Capture within the first hour, before destructive containment. Nightly automation collects GCP Cloud Monitoring alert data (policies, notification channels, uptime check state); the manual collection below is for system-specific artifacts and mid-incident snapshots.
- Screenshots of triggering alert with timestamps
- Current state of affected system
- Log exports (at least 24 hours before the alert)
- GitHub audit log (if access-related)
- Doppler activity log (if secrets may be compromised)
| System | Log Location | Export Method |
|---|---|---|
| GCP Cloud Monitoring | console.cloud.google.com → Monitoring → Alerting | Nightly automated collection via API |
| GCP Cloud Logging | console.cloud.google.com → Logging → Log Explorer | Filter by service, export to Cloud Storage or CSV |
| CrowdStrike | falcon.crowdstrike.com → Detections | Export detection report |
| Google Workspace | admin.google.com → Reporting → Audit | Export to CSV |
| GitHub | github.com/orgs/Meridian7-io/audit-log | gh api /orgs/Meridian7-io/audit-log |
| Cloudflare | dash.cloudflare.com → Security → Events | Export security events |
| Supabase | Supabase dashboard → Logs | SQL query + screenshot |
| Doppler | doppler.com → Activity | Screenshot |
Store incident-specific artifacts in evidence/incidents/YYYY/YYYY-MM-DD-incident-brief/. Naming: YYYY-MM-DD-HH-MM-[system]-[artifact-type].[ext]. Commit with a signed commit.
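The directory and file naming convention can be generated rather than typed by hand. The sketch below is one possible reading of the pattern: it assumes `incident-brief` stands for a short slug describing the incident, and that `[system]` and `[artifact-type]` are lower-cased and hyphenated. The helper names `slug` and `evidence_path` are hypothetical.

```python
import re
from datetime import date, datetime
from pathlib import PurePosixPath


def slug(text):
    """Lower-case the text and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def evidence_path(incident_date, incident_brief, captured_at, system, artifact_type, ext):
    """Build evidence/incidents/YYYY/YYYY-MM-DD-brief/YYYY-MM-DD-HH-MM-system-type.ext."""
    directory = (
        PurePosixPath("evidence/incidents")
        / f"{incident_date:%Y}"
        / f"{incident_date:%Y-%m-%d}-{slug(incident_brief)}"
    )
    filename = (
        f"{captured_at:%Y-%m-%d-%H-%M}-{slug(system)}-{slug(artifact_type)}"
        f".{ext.lstrip('.')}"
    )
    return directory / filename


if __name__ == "__main__":
    print(evidence_path(
        date(2024, 3, 5), "API outage",
        datetime(2024, 3, 5, 14, 7), "GitHub", "audit log", "json",
    ))
```

Using the capture time rather than the incident date in the file name keeps multiple snapshots of the same artifact distinct within one incident folder.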
5. Post-Incident Review
5.1 Postmortem Requirement
| Severity | Required | Due |
|---|---|---|
| SEV-1 | Yes | 5 business days |
| SEV-2 | Yes | 5 business days |
| SEV-3 | CISO discretion | 10 business days if written |
| SEV-4 | No | — |
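The due dates above can be computed with a simple business-day counter. This is a sketch under stated assumptions: the clock starts on the day after resolution, business days are Monday through Friday, and public holidays are ignored. `add_business_days` and `postmortem_due` are illustrative names.

```python
from datetime import date, timedelta

# Business days allowed per severity, from the table above.
# SEV-3 applies only if the CISO decides a postmortem is written; SEV-4 has none.
POSTMORTEM_DUE_BUSINESS_DAYS = {"SEV-1": 5, "SEV-2": 5, "SEV-3": 10}


def add_business_days(start, days):
    """Return the date `days` business days after `start` (Mon-Fri, no holidays)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def postmortem_due(severity, resolved_on):
    """Return the postmortem due date, or None when none is required."""
    days = POSTMORTEM_DUE_BUSINESS_DAYS.get(severity)
    if days is None:
        return None
    return add_business_days(resolved_on, days)


if __name__ == "__main__":
    print(postmortem_due("SEV-1", date(2024, 1, 5)))
```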
5.2 Postmortem Authoring Flow
For SEV-1/2, after resolution the incident-postmortem Edge Function receives a GCP Cloud Monitoring notification and posts a “Write Post-Mortem” button to the #incidents Slack thread. Clicking it opens a structured modal. The submitted content is stored as a GitHub incident issue comment, which is the system-of-record entry picked up by nightly evidence automation.
Template for the modal (and for manually authored post-mortems):
Process: draft via Slack modal (incident commander) → submitted as GitHub incident issue comment → share in #incidents for 24h async review → 30-min sync with CTO + CISO (SEV-1/2) → create GitHub issues for all action items → mark incident resolved.
6. Escalation Matrix
Internal
| Condition | Escalate To | Method |
|---|---|---|
| On-call no ack in 15 min | CISO (backup) | Slack DM |
| Both on-call and backup unreachable | Next available | Phone |
| Active data breach suspected | CTO + Legal | Phone immediately |
GCP Cloud Monitoring routes SEV-1/2 alerts to Slack #incidents with CISO follow-up after 15 min without acknowledgment. SEV-3/4 routes to Slack #incidents only.
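The internal escalation rules can be expressed as a small decision function. The sketch below mirrors the three rows of the internal table; the function name `escalation_target`, its parameters, and the rule that a suspected data breach takes priority over the acknowledgment timer are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Acknowledgment window from the internal escalation table.
ACK_TIMEOUT = timedelta(minutes=15)


def escalation_target(alert_fired_at, now, on_call_acked,
                      backup_reachable=True, data_breach_suspected=False):
    """Return (escalate_to, method), or None when no escalation is needed."""
    # Assumption: a suspected active data breach escalates regardless of ack state.
    if data_breach_suspected:
        return ("CTO + Legal", "Phone immediately")
    # No escalation while the on-call has acknowledged or the window is still open.
    if on_call_acked or now - alert_fired_at < ACK_TIMEOUT:
        return None
    if backup_reachable:
        return ("CISO (backup)", "Slack DM")
    return ("Next available", "Phone")


if __name__ == "__main__":
    fired = datetime(2024, 1, 1, 9, 0)
    print(escalation_target(fired, fired + timedelta(minutes=20), on_call_acked=False))
```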
External
| Condition | Contact | When |
|---|---|---|
| Confirmed customer data breach | Legal counsel | Immediately |
| Customer notification required | Legal + CTO | Within 24h of confirmed breach |
| Regulatory notification | Legal counsel | Per applicable regulation |
| Criminal activity suspected | Legal, then law enforcement | After legal consultation |
| Active intrusion beyond internal capability | CrowdStrike Overwatch | As needed |
Vendor Emergency Contacts
| Vendor | Contact |
|---|---|
| CrowdStrike | support.crowdstrike.com |
| Google Cloud / GCP | cloud.google.com/support |
| Google Workspace | Admin console → Support |
| Cloudflare | dash.cloudflare.com → Support |
| 1Password | support.1password.com |
| Supabase | support.supabase.com |
Meridian Seven — Confidential