Audience: Owners, Admins, SRE, Support
Outcomes: Repeatable diligence; smooth incident handling
Incident playbooks (quick map)
Credential leak: force logout; rotate; notify; post-mortem
Webhook flood: throttle workers; ensure dedupe; drain queue
Duplicate payout: check release reference constraint; reconcile; issue refund/adjustment
SSE outage: disable proxy buffering; verify CORS/credentials; restart stream workers
Common fixes
CORS error → add exact origin (never
*
with credentials)“Invalid signature” → ensure raw body + correct secret; check clock skew
“Pending forever” → replay webhook; handler logs; DLQ reprocess
Excess 401/403 → session expiry or KYC gate; policy logs
QA checklist
Dry-run incident drills at least once per quarter with timing captured.
Every P1 produces a written post-mortem with actions & owners.