Audience: SecOps, SRE
Outcomes: Detections that catch real issues, not every blip
High-value detections
Credential stuffing: a burst of failed logins (e.g., more than 20 in 5 minutes) from one IP on a previously unseen ASN → alert & temporary CAPTCHA.
Privilege change anomalies: role escalations outside the maintenance window → page.
Webhook signature-failure spike: potential signing-secret leak → rotate & quarantine.
Export spikes: potential exfiltration → freeze exports & review.
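The list leaves "spike" undefined. The sketch below shows one common reading. It assumes a spike means the current window's export count is at least `factor` times the median of recent windows and also above an absolute floor. `is_export_spike`, `factor` and `min_count` are hypothetical names, and the defaults are not tuned values.

```python
from statistics import median
from typing import Sequence

def is_export_spike(history: Sequence[int], current: int,
                    factor: float = 3.0, min_count: int = 50) -> bool:
    """Return True if `current` looks like an export spike (hypothetical thresholds).

    history: export counts for previous windows (e.g. hourly) for one tenant or user.
    current: export count for the window being evaluated.
    """
    if current < min_count:
        # Absolute floor: small numbers never alert, even from a zero baseline.
        return False
    if not history:
        # No baseline yet: the floor alone decides.
        return True
    # The median resists one earlier spike dragging the baseline upward.
    return current >= factor * median(history)
```

The floor is there so that a tenant going from 0 to 3 exports does not trigger a freeze. The median keeps a single past incident from inflating the baseline the way a mean would.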
Example KQL/SPL (pseudocode)
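// Credential stuffing: >20 failed logins per IP and ASN in a 5-minute bucket, from a new ASN
where action == "auth.failed" | stats count() by ip, asn, bin(time, 5m) | where count > 20 and asn is new
// Privilege change outside the maintenance window
where action == "rbac.change" | where hour(time) not in maint_window
// Webhook signature failures per provider in a 10-minute bucket
where action == "webhook.sig_fail" | stats count() by provider, bin(time, 10m) | where count > threshold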
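If you are prototyping outside a SIEM, the first two queries can be sketched in Python. The sketch makes several assumptions:

- Events carry `action`, `ip`, `asn` and a UTC timestamp `ts`.
- "New ASN" means absent from a caller-supplied `known_asns` set.
- The maintenance window is a set of UTC hours.

`Event`, `credential_stuffing_hits`, `outside_maintenance`, `known_asns` and `maint_hours` are all hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Set, Tuple

BUCKET_SECONDS = 5 * 60        # 5-minute buckets, as in the first query
FAILED_LOGIN_THRESHOLD = 20    # "count > 20" from the first query

@dataclass(frozen=True)
class Event:
    action: str                # e.g. "auth.failed", "rbac.change"
    ip: str
    asn: int
    ts: datetime               # assumed timezone-aware, in UTC

def credential_stuffing_hits(events: Iterable[Event],
                             known_asns: Set[int]) -> List[Tuple[str, int, int]]:
    """Return (ip, asn, bucket_start_epoch) for buckets with >20 failures from an unseen ASN."""
    counts: Counter = Counter()
    for e in events:
        if e.action != "auth.failed":
            continue
        # Tumbling bucket: round the epoch timestamp down to the bucket start.
        bucket = int(e.ts.timestamp()) // BUCKET_SECONDS * BUCKET_SECONDS
        counts[(e.ip, e.asn, bucket)] += 1
    return [key for key, n in counts.items()
            if n > FAILED_LOGIN_THRESHOLD and key[1] not in known_asns]

def outside_maintenance(e: Event, maint_hours: Set[int]) -> bool:
    """True for an rbac.change whose UTC hour falls outside the maintenance window."""
    return e.action == "rbac.change" and e.ts.hour not in maint_hours

# Usage: 21 failures from a documentation-range IP on an ASN not in known_asns.
t0 = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
burst = [Event("auth.failed", "203.0.113.7", 64500, t0)] * 21
print(credential_stuffing_hits(burst, known_asns={64501}))
```

Tumbling buckets, like `bin()`, can split a burst across a boundary. For example, 30 failures straddling two buckets may stay under the threshold in both. A sliding window closes that gap at the cost of more state.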
Alert hygiene
Page only when an issue is user-visible or money-impacting.
Everything else → ticket or Slack, with a cooldown so repeats don't re-notify.
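Below is a minimal routing sketch. It assumes each alert arrives already tagged with `user_visible` and `money_impacting` flags. It also assumes "cooldown" means at most one ticket or Slack notification per (rule, key) pair per interval. `Alert`, `AlertRouter` and the 15-minute default are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class Alert:
    rule: str                      # e.g. "webhook.sig_fail"
    key: str                       # what the alert is about, e.g. a provider or IP
    user_visible: bool = False
    money_impacting: bool = False

@dataclass
class AlertRouter:
    cooldown_s: float = 15 * 60    # hypothetical 15-minute cooldown per (rule, key)
    _last_notified: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def route(self, alert: Alert, now: Optional[float] = None) -> str:
        """Return "page", "ticket", or "suppressed" for a single alert."""
        now = time.monotonic() if now is None else now
        if alert.user_visible or alert.money_impacting:
            return "page"          # paging criteria from the hygiene rule above
        dedup_key = (alert.rule, alert.key)
        last = self._last_notified.get(dedup_key)
        if last is not None and now - last < self.cooldown_s:
            return "suppressed"    # same rule and key notified recently: stay quiet
        self._last_notified[dedup_key] = now
        return "ticket"            # ticket or Slack message, never a page
```

The cooldown is measured from the last notification, not the last occurrence. A condition that never clears therefore still re-notifies once per interval instead of going silent forever.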
QA checklist
Simulate each rule and confirm it raises one alert, not a storm.
Include a runbook link in every alert.
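The "one alert, not a storm" check can be automated by replaying synthetic events. This sketch assumes rules are stateful objects that latch after firing once per key and window. `ThresholdRule` and `simulate` are hypothetical helpers, not part of any SIEM API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

class ThresholdRule:
    """Hypothetical rule: alert once per (key, window) when a count exceeds a threshold."""

    def __init__(self, action: str, key_field: str, threshold: int, window_s: int):
        self.action = action
        self.key_field = key_field
        self.threshold = threshold
        self.window_s = window_s
        self.counts: Dict[Tuple[str, int], int] = defaultdict(int)
        self.fired: Set[Tuple[str, int]] = set()

    def observe(self, event: dict) -> Optional[str]:
        if event["action"] != self.action:
            return None
        # Tumbling window: integer-divide the epoch-seconds timestamp.
        k = (event[self.key_field], event["ts"] // self.window_s)
        self.counts[k] += 1
        if self.counts[k] > self.threshold and k not in self.fired:
            self.fired.add(k)  # latch so the rest of the burst stays quiet
            return f"{self.action} over threshold for {k[0]}"
        return None

def simulate(rule: ThresholdRule, events: Iterable[dict]) -> List[str]:
    """Replay synthetic events through a rule and collect every alert it raises."""
    return [a for a in (rule.observe(e) for e in events) if a is not None]

# Synthetic storm: 500 signature failures from one provider, one per second,
# all inside a single 10-minute window (epoch seconds 1800-2299).
rule = ThresholdRule("webhook.sig_fail", "provider", threshold=50, window_s=600)
storm = [{"action": "webhook.sig_fail", "provider": "provider-a", "ts": 1_800 + i}
         for i in range(500)]
alerts = simulate(rule, storm)
assert len(alerts) == 1, f"expected one alert, got {len(alerts)}"
```

A second case worth replaying is a burst that straddles a window boundary, such as one starting at second 1500. The same rule then fires twice, and whether that is acceptable is a per-rule decision.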
Runbook: webhook secret leak
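Rotate the secret.
Temporarily suppress invalid-signature alerts for the old secret; deliveries still signed with it are expected until every sender has switched.
Re-verify events received since the suspected leak: a valid signature under a leaked secret proves nothing.
Run a post-mortem with a timeline.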
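During the rotation window, the verifier can separate old-secret deliveries from genuine failures. That way step 2 suppresses only the expected noise. The sketch assumes a hex HMAC-SHA256 signature over the raw body, which many but not all providers use, so check your provider's scheme. `sign` and `classify_delivery` are hypothetical names.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body (assumed signing scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def classify_delivery(body: bytes, signature: str,
                      current_secret: bytes, old_secret: bytes) -> str:
    """Classify a webhook delivery during secret rotation.

    "valid"        -> signed with the current secret; process it.
    "stale_secret" -> signed with the rotated-out (possibly leaked) secret;
                      reject it, but do not count it toward sig_fail alerts.
    "invalid"      -> matches neither secret; reject and count toward alerts.
    """
    received = signature.encode()
    # compare_digest is designed to resist timing attacks on the comparison.
    if hmac.compare_digest(sign(current_secret, body).encode(), received):
        return "valid"
    if hmac.compare_digest(sign(old_secret, body).encode(), received):
        return "stale_secret"
    return "invalid"
```

Old-secret deliveries are still rejected, because the old secret may be in an attacker's hands. Only their alerting is muted. Delete the `old_secret` branch when the suppression window ends.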