Incident Response Runbook

Operational companion to the Incident Response Plan. GitHub Issues are the system of record for all incident data. Slack is the coordination and post-mortem authoring surface. Incident channels: #incidents (live coordination), #security-alerts (automated alerts), #incident-YYYY-MM-DD-brief (dedicated SEV-1/2).


GCP Cloud Monitoring creates alerts automatically when uptime checks or alert policies detect a failure. Notification channels route alerts to Slack #incidents — no manual severity assignment needed for monitored systems.

Upon alert firing, GCP Cloud Monitoring:

  1. Evaluates alert policy conditions and fires the notification channel
  2. Posts to #incidents via Slack notification channel
  3. On-call responder acknowledges and begins response

User-reported incidents not detected by monitoring: Open a GitHub Issue using the Incident Response template with title [SEV-?] Brief description.
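For user-reported incidents, the issue can be opened from the command line. A minimal sketch using the GitHub CLI — only the [SEV-?] title format comes from this runbook; the label and the dry-run echo are illustrative assumptions:

```shell
# Build the "[SEV-?] Brief description" title and print the gh command for
# review before running it. The "incident" label is an assumption.
make_incident_title() {
  sev="$1"    # 1-4, or "?" if severity is not yet assigned
  brief="$2"
  printf '[SEV-%s] %s' "$sev" "$brief"
}

title=$(make_incident_title "?" "Checkout API returning 500s")
echo "gh issue create --title \"$title\" --label incident"
```

Severity in the title can start as "?" and be corrected once triage assigns it.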

| Observation | Severity |
| --- | --- |
| Confirmed data breach / active exploitation / complete outage | SEV-1 |
| Partial outage / suspected compromise / critical vuln in the wild | SEV-2 |
| Degraded performance / single account compromise / failed brute-force | SEV-3 |
| Suspicious but unconfirmed / policy violation / informational | SEV-4 |

For monitored incidents, severity is set by Catalog metadata. For user-reported incidents, assign within 15 min (SEV-1/2) or 1 hour (SEV-3/4).
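The triage table can be approximated as a small lookup for tooling. A sketch only — the keyword matching is illustrative, and a human still confirms the final severity within the deadlines above:

```shell
# Map an observation to a severity per the triage table. Keywords are
# illustrative assumptions; unmatched observations default to SEV-4.
assign_severity() {
  case "$1" in
    *"data breach"*|*"active exploitation"*|*"complete outage"*) echo "SEV-1" ;;
    *"partial outage"*|*"suspected compromise"*|*"critical vuln"*) echo "SEV-2" ;;
    *"degraded"*|*"single account"*|*"brute-force"*) echo "SEV-3" ;;
    *) echo "SEV-4" ;;
  esac
}

assign_severity "complete outage of the web service"   # prints SEV-1
```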

| Source | Where to Check |
| --- | --- |
| GCP Cloud Monitoring | console.cloud.google.com → Monitoring → Alerting |
| CrowdStrike | falcon.crowdstrike.com → Detections |
| Google Workspace | admin.google.com → Reporting → Audit |
| GitHub | github.com → Security → Code scanning |
| 1Password | 1password.com → Watchtower |
| Manual report | #security-alerts Slack |

First-response steps:

  1. Acknowledge the alert in the #incidents Slack thread.
  2. Open the GCP Cloud Monitoring alert link for full context on the affected system.
  3. Start a Slack thread in #incidents for SEV-1/2/3 coordination.

GCP Cloud Monitoring alert policies handle primary notification routing via Slack and email channels. The table below describes what happens automatically and what requires human action.

| Severity | Automatic (GCP Cloud Monitoring + Slack) | Human Action |
| --- | --- | --- |
| SEV-1 | Slack #incidents notification + CISO follow-up after 15 min | Open #incident-YYYY-MM-DD-brief; phone CTO; if customer data at risk, call legal |
| SEV-2 | Slack #incidents notification + CISO follow-up after 15 min | Post updates every 2 hours in #incidents thread |
| SEV-3 | Slack #incidents notification | Post daily update in #incidents thread |
| SEV-4 | None automatic | CISO logs a GitHub Issue if tracking is warranted; no broader notification |
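The dedicated SEV-1/2 channel name follows the #incident-YYYY-MM-DD-brief convention. A sketch deriving it from today's UTC date and a short slug for the brief:

```shell
# Derive the dedicated incident channel name from the current UTC date.
# The slug (e.g. "db-breach") stands in for the brief description.
channel_name() {
  echo "#incident-$(date -u +%Y-%m-%d)-$1"
}

channel_name db-breach   # e.g. #incident-2025-06-01-db-breach
```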

Status update format for #incidents thread:

[TIME] Status: [Investigating / Contained / Remediating / Resolved]
What we know: [1-2 sentences]
Current action: [what is happening now]
Next update: [time]

Cadence: SEV-1 every 30 min, SEV-2 every 2 hours, SEV-3 daily.
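The status update above can be generated so every post follows the same shape. A minimal sketch, assuming UTC timestamps:

```shell
# Print a status update in the runbook format, ready to paste into the
# #incidents thread. Times are UTC; adjust to team convention if needed.
status_update() {
  printf '[%s] Status: %s\n' "$(date -u +%H:%M)" "$1"
  printf 'What we know: %s\n' "$2"
  printf 'Current action: %s\n' "$3"
  printf 'Next update: %s\n' "$4"
}

status_update "Investigating" "Elevated 500s on checkout since 14:02 UTC." \
  "Rolling back to last known-good Cloud Run revision." "15:00 UTC"
```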

Only the CTO or CISO may authorize external communications, and legal must review all of them before release.

| Event | Action | Timing |
| --- | --- | --- |
| Customer data breach | CTO + Legal via email | Within 72 hours of confirmation |
| Status page update | CTO or CISO authorizes update via GCP Cloud Monitoring status dashboard | When customer-visible impact occurs |
| Regulatory notification | Legal leads, coordinated with CISO | Per applicable regulation |

Preserve evidence (Section 4) before destructive containment actions.

Google Workspace:

  • Account compromise: Admin console → Users → Suspend; revoke all active sessions; reset password; export the 30-day audit log.
  • Unauthorized OAuth grants: Security → API controls → App access control → revoke suspicious grants.

GitHub:

  • Compromised PAT or deploy key: Revoke immediately at github.com/settings/tokens or repo → Deploy keys. Check the audit log for actions taken.
  • Compromised GH_PAT_ADMIN: Revoke, generate a new token, run doppler secrets set GH_PAT_ADMIN -p m7-security -c prd, then verify evidence workflows.
  • Malicious workflow: Disable via the GitHub UI (Actions → workflow → disable); review recent runs.

CrowdStrike:

  • Endpoint threat: Endpoint Security → Detections → review details → Hosts → “Contain host” (isolates from network, preserves forensics). Export the detection report. Do not reimage until the investigation is complete.

Cloudflare:

  • Active attack: Enable “Under Attack Mode”; under Security → WAF, create a block rule. Export security events before making changes.
  • DNS tampering: DNS → compare to known-good state → revert unauthorized changes. If the token is compromised: revoke, create a replacement, doppler secrets set CF_SECURITY_API_TOKEN -p m7-security -c prd.

1Password:

  • Compromised account: Admin console → People → Suspend; review vault access; rotate all credentials the account could reach.
  • Service account compromise: Admin console → Service Accounts → revoke token → generate replacement.

Doppler:

  • Compromised secret: Rotate at the source → doppler secrets set SECRET_NAME -p m7-security -c prd. Check the Activity log for unauthorized reads.
  • Compromised DOPPLER_SA_TOKEN: Revoke in Workplace → Service Accounts; generate a new token; update DOPPLER_TOKEN in GitHub Actions secrets; rotate all secrets it had read access to.

Supabase:

  • Unauthorized database access: Roll the service_role key in API Settings → doppler secrets set SB_SERVICE_ROLE_KEY -p m7-security -c prd. Check auth.audit_log_entries and the Postgres logs.
  • Leaked anon key: Dashboard → API Settings → Roll anon key. Updates all application deployments immediately.
  • Emergency lockdown: Settings → API → disable “Enable API” (takes the application offline; last resort only).

Cloud Run (web service):

  • Compromised deployment: GCP Console → Cloud Run → web service → Revisions → select last known-good → route 100% traffic. Check Cloud Logging for the service's request/error logs.
  • Compromised service account: IAM → locate the service account → remove all role bindings → create a replacement → update Doppler.

Cloud Run (agent service):

  • Compromised deployment: GCP Console → Cloud Run → agent service → Revisions → select previous known-good → route 100% traffic.
  • Compromised service account: IAM → locate the service account → remove all role bindings → create a replacement → update Doppler.
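Several of the containment steps above end with the same rotate-then-update-Doppler motion. A minimal sketch of that pattern; it defaults to a dry run, and the step that supplies the new secret value to the doppler CLI is deliberately left interactive because that detail is not specified here:

```shell
# Rotate-then-update pattern: print (or run) the doppler command that points
# the m7-security prd config at a freshly rotated secret. DRY_RUN defaults
# to 1 so nothing is changed until you opt in with DRY_RUN=0.
rotate_secret() {
  name="$1"
  cmd="doppler secrets set $name -p m7-security -c prd"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $cmd"
  else
    $cmd   # assumption: the CLI prompts for the new value interactively
  fi
}

rotate_secret GH_PAT_ADMIN
```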

Capture within the first hour, before destructive containment. Nightly automation collects GCP Cloud Monitoring alert data (policies, notification channels, uptime check state) — manual collection below is for system-specific artifacts and mid-incident snapshots.

  • Screenshots of triggering alert with timestamps
  • Current state of affected system
  • Log exports (at least 24 hours before the alert)
  • GitHub audit log (if access-related)
  • Doppler activity log (if secrets may be compromised)

| System | Log Location | Export Method |
| --- | --- | --- |
| GCP Cloud Monitoring | console.cloud.google.com → Monitoring → Alerting | Nightly automated collection via API |
| GCP Cloud Logging | console.cloud.google.com → Logging → Log Explorer | Filter by service, export to Cloud Storage or CSV |
| CrowdStrike | falcon.crowdstrike.com → Detections | Export detection report |
| Google Workspace | admin.google.com → Reporting → Audit | Export to CSV |
| GitHub | github.com/orgs/Meridian7-io/audit-log | gh api /orgs/Meridian7-io/audit-log |
| Cloudflare | dash.cloudflare.com → Security → Events | Export security events |
| Supabase | Supabase dashboard → Logs | SQL query + screenshot |
| Doppler | doppler.com → Activity | Screenshot |
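For the GitHub row above, the export usually wants a date filter so the evidence window matches the incident. A sketch that prints the command for review; the created:>= filter follows GitHub's audit-log search-phrase syntax:

```shell
# Print the gh api command that exports the org audit log since a given
# date. Printed rather than run so it can be reviewed first.
audit_log_cmd() {
  since="$1"   # e.g. 2025-06-01
  echo "gh api --paginate \"/orgs/Meridian7-io/audit-log?phrase=created:>=$since\""
}

audit_log_cmd 2025-06-01
```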

Store incident-specific artifacts in evidence/incidents/YYYY/YYYY-MM-DD-incident-brief/. Naming: YYYY-MM-DD-HH-MM-[system]-[artifact-type].[ext]. Commit the evidence and sign the commit.
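The directory and filename conventions above can be built mechanically from the current UTC time. A sketch, assuming "brief" is a short hyphenated slug:

```shell
# Build the evidence directory and artifact filename per the runbook's
# naming convention, using the current UTC time.
evidence_path() {
  slug="$1"      # short incident brief, e.g. api-outage
  system="$2"    # e.g. cloudflare
  artifact="$3"  # e.g. security-events
  ext="$4"       # e.g. csv
  year=$(date -u +%Y)
  day=$(date -u +%Y-%m-%d)
  stamp=$(date -u +%Y-%m-%d-%H-%M)
  echo "evidence/incidents/$year/$day-incident-$slug/$stamp-$system-$artifact.$ext"
}

evidence_path api-outage cloudflare security-events csv
```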


| Severity | Required | Due |
| --- | --- | --- |
| SEV-1 | Yes | 5 business days |
| SEV-2 | Yes | 5 business days |
| SEV-3 | CISO discretion | 10 business days if written |
| SEV-4 | No | — |

For SEV-1/2, after resolution the incident-postmortem Edge Function receives a GCP Cloud Monitoring notification and posts a “Write Post-Mortem” button to the #incidents Slack thread. Clicking it opens a structured modal. The submitted content is stored as a GitHub incident issue comment — this is the system-of-record entry picked up by nightly evidence automation.

Template for the modal (and for manually authored post-mortems):

Incident: [Title]
Date: YYYY-MM-DD  Severity: SEV-[N]  Duration: HH:MM
Incident Commander: [Role]  Responders: [Roles]

Summary
[1-2 sentences: what happened and impact]

Timeline
HH:MM — Alert triggered
HH:MM — Severity assigned
HH:MM — Containment taken
HH:MM — Root cause identified
HH:MM — Remediation complete

Root Cause
Immediate cause: ...  Root cause: ...

Impact
Systems affected: ...
Customer impact: [None / Description]
Data exposed: [None / Description]

What Went Well / What Could Be Improved
- ...

Action Items
[Action] | [Owner] | [Due Date]

Process: draft via Slack modal (incident commander) → submitted as GitHub incident issue comment → share in #incidents for 24h async review → 30-min sync with CTO + CISO (SEV-1/2) → create GitHub issues for all action items → mark incident resolved.


| Condition | Escalate To | Method |
| --- | --- | --- |
| On-call does not acknowledge within 15 min | CISO (backup) | Slack DM |
| Both on-call and backup unreachable | Next available responder | Phone |
| Active data breach suspected | CTO + Legal | Phone immediately |

GCP Cloud Monitoring routes SEV-1/2 alerts to Slack #incidents with CISO follow-up after 15 min without acknowledgment. SEV-3/4 routes to Slack #incidents only.

| Condition | Contact | When |
| --- | --- | --- |
| Confirmed customer data breach | Legal counsel | Immediately |
| Customer notification required | Legal + CTO | Within 24h of confirmed breach |
| Regulatory notification | Legal counsel | Per applicable regulation |
| Criminal activity suspected | Legal, then law enforcement | After legal consultation |
| Active intrusion beyond internal capability | CrowdStrike Overwatch | As needed |

| Vendor | Contact |
| --- | --- |
| CrowdStrike | support.crowdstrike.com |
| Google Cloud / GCP | cloud.google.com/support |
| Google Workspace | Admin console → Support |
| Cloudflare | dash.cloudflare.com → Support |
| 1Password | support.1password.com |
| Supabase | support.supabase.com |

Meridian Seven — Confidential