Retries, Poison Events, Logs & Alerts

Keep pipelines moving under failure; detect drift early.

Written by Catalin Fetean
Updated over 2 weeks ago

Audience: Backend, SRE, Support
Outcomes: No lost events; quick root cause

Retries & backoff

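  • Providers retry delivery on any non-2xx response, so the same event can arrive more than once → your handler must be idempotent (e.g. skip events whose ID was already processed)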

  • Outbound provider calls: retry with exponential backoff plus jitter, capped at a maximum delay and a maximum number of attempts
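A minimal TypeScript sketch of both bullets, assuming an in-memory set in place of durable storage and a hypothetical `WebhookEvent` shape whose `id` stays the same across redeliveries (Stripe, for example, resends the same event ID when it retries). `handleWebhook`, `backoffDelayMs` and `callWithRetries` are illustrative names, and the 200 ms base, 30 s cap and 5 attempts are placeholder values, not recommendations from this article. The jitter variant shown is "full jitter": a uniform random delay below the capped exponential ceiling.

```typescript
// Hypothetical event shape; real provider payloads are richer.
interface WebhookEvent {
  id: string; // provider's event ID, assumed stable across redeliveries
  type: string;
}

// IDs of events whose side effects already succeeded. In-memory for this
// sketch; in production this would likely be a DB table with a unique key.
const processedEventIds = new Set<string>();

// Returns the HTTP status to send back to the provider.
function handleWebhook(event: WebhookEvent, apply: (e: WebhookEvent) => void): number {
  if (processedEventIds.has(event.id)) return 200; // duplicate: ack without re-applying
  try {
    apply(event);
    processedEventIds.add(event.id); // record only after the side effect succeeded
    return 200;
  } catch {
    return 500; // non-2xx: the provider will redeliver later
  }
}

// Delay before retry number `attempt` (0-based): capped exponential backoff
// with full jitter, i.e. uniform in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 200,
  capMs = 30_000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Retries an outbound provider call, making at most `maxAttempts` calls in total.
async function callWithRetries<T>(call: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // out of attempts: surface the error
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Jitter spreads retries from many callers so they do not hit a recovering provider in lockstep. In a real handler the "already processed?" check and the write should be one atomic operation (for example an insert guarded by a unique constraint); otherwise two concurrent deliveries of the same event can both pass the check.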

Poison events

  • Events that keep failing after N retries → move them to a dead-letter queue and alert a human

  • Provide a “reprocess” action in admin tooling so dead-lettered events can be replayed once the cause is fixed
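The poison-event flow could look roughly like this. The sketch assumes a hypothetical in-memory `EventQueue`: failed events go back to `pending` until they reach `MAX_ATTEMPTS` (an assumed value of 5), then move to `deadLetters` and trigger an injected `alert` callback; `reprocess` stands in for the admin action. Backoff between attempts is omitted for brevity, and a real system would more likely rely on its broker's own dead-letter support and persistent storage.

```typescript
// Hypothetical queued-event record.
interface QueuedEvent {
  id: string;
  type: string;
  attempts: number; // failed processing attempts so far
  lastError?: string;
}

const MAX_ATTEMPTS = 5; // assumption: tune per pipeline

class EventQueue {
  readonly pending: QueuedEvent[] = [];
  readonly deadLetters: QueuedEvent[] = [];
  private readonly alert: (msg: string) => void;

  constructor(alert: (msg: string) => void) {
    this.alert = alert;
  }

  enqueue(id: string, type: string): void {
    this.pending.push({ id, type, attempts: 0 });
  }

  // Processes every currently pending event once. Failures are re-queued
  // until they reach MAX_ATTEMPTS, then dead-lettered and a human is alerted.
  drain(process: (e: QueuedEvent) => void): void {
    const batch = this.pending.splice(0, this.pending.length);
    for (const event of batch) {
      try {
        process(event);
      } catch (err) {
        event.attempts += 1;
        event.lastError = err instanceof Error ? err.message : String(err);
        if (event.attempts >= MAX_ATTEMPTS) {
          this.deadLetters.push(event);
          this.alert(`dead-lettered ${event.type} ${event.id}: ${event.lastError}`);
        } else {
          this.pending.push(event); // retried on the next drain
        }
      }
    }
  }

  // "Reprocess" admin action: move a dead-lettered event back to pending with
  // a fresh attempt budget. Returns false if no such event is dead-lettered.
  reprocess(id: string): boolean {
    const index = this.deadLetters.findIndex((e) => e.id === id);
    if (index === -1) return false;
    const [event] = this.deadLetters.splice(index, 1);
    event.attempts = 0;
    delete event.lastError;
    this.pending.push(event);
    return true;
  }
}
```

Keeping `lastError` on the dead-lettered record gives whoever handles the alert a starting point before they hit “reprocess”.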

Observability

  • Logs: include x-correlation-id, event type, orderId, provider ref; redact PII/secrets

  • Metrics: webhook throughput/backlog & error rate; SSE clients & drops; payment success by rail

  • Alerts: backlog > threshold; no Stripe events for X minutes; release failures > N/hr; unexpected drop in SSE clients
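One way to make the logging and alert bullets concrete, assuming JSON log lines and a periodic job that collects pipeline metrics into a snapshot. The sensitive-key list, the field names in `WebhookLogContext` and `PipelineSnapshot`, and comparing SSE clients against a baseline as a fraction are all assumptions. The article's `X` minutes and `N`/hr become the `maxStripeSilenceMs` and `maxReleaseFailuresPerHour` thresholds. Redacting by key name is deliberately blunt: it will not catch PII embedded in free-text messages.

```typescript
// Keys whose values are masked before logging (assumed list, compared lowercase).
const SENSITIVE_KEYS = new Set(["email", "phone", "cardnumber", "authorization", "apikey", "password"]);

// Recursively copies `value`, masking any property whose key is sensitive.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(v);
    }
    return out;
  }
  return value;
}

// Fields every webhook log line carries (names are illustrative).
interface WebhookLogContext {
  correlationId: string; // value of the x-correlation-id header
  eventType: string;
  orderId?: string;
  providerRef?: string;
}

function logLine(level: "info" | "warn" | "error", msg: string, ctx: WebhookLogContext, extra: Record<string, unknown> = {}): string {
  const safeExtra = redact(extra) as Record<string, unknown>;
  // Context fields are spread last so `extra` cannot overwrite them.
  return JSON.stringify({ ...safeExtra, level, msg, ts: new Date().toISOString(), ...ctx });
}

// Hypothetical metrics snapshot read by a periodic alert job.
interface PipelineSnapshot {
  nowMs: number;
  webhookBacklog: number;
  lastStripeEventAtMs: number;
  releaseFailuresLastHour: number;
  sseClients: number;
  sseClientsBaseline: number; // e.g. typical count for this time of day (assumption)
}

interface AlertThresholds {
  maxBacklog: number;
  maxStripeSilenceMs: number; // "no Stripe events for X minutes"
  maxReleaseFailuresPerHour: number; // "release failures > N/hr"
  minSseRatio: number; // alert below this fraction of the baseline
}

// Returns the names of the alert rules that currently fire.
function evaluateAlerts(s: PipelineSnapshot, t: AlertThresholds): string[] {
  const fired: string[] = [];
  if (s.webhookBacklog > t.maxBacklog) fired.push("webhook_backlog_high");
  if (s.nowMs - s.lastStripeEventAtMs > t.maxStripeSilenceMs) fired.push("stripe_events_silent");
  if (s.releaseFailuresLastHour > t.maxReleaseFailuresPerHour) fired.push("release_failures_high");
  if (s.sseClientsBaseline > 0 && s.sseClients < s.sseClientsBaseline * t.minSseRatio) fired.push("sse_clients_dropped");
  return fired;
}
```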

QA checklist

  • Synthetic heartbeat events trigger expected alerts

  • Dead-lettered events are visible and reprocessable

Runbook: “Webhook 2xx but no change”

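  • Likely cause: the handler caught (swallowed) an error but still returned 2xx, so the provider never retried. Fix: raise the log level to surface the error; verify the DB write actually commits inside its transaction; return non-2xx on failure; add a unit test that reproduces the case.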
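The failure mode in this runbook is easy to pin down in a unit test. The sketch below uses a hypothetical in-memory `OrderStore` whose `transaction` method discards staged writes when the callback throws. `handleSwallowing` reproduces the bug (error caught, 200 returned, nothing committed) and `handleFixed` logs the error and returns 500 so the provider redelivers. All names here are illustrative, not part of any real framework.

```typescript
// Hypothetical stand-in for a database: writes are staged on a copy and
// published only if the callback finishes without throwing.
class OrderStore {
  private orders = new Map<string, string>(); // orderId -> status

  get(orderId: string): string | undefined {
    return this.orders.get(orderId);
  }

  transaction(fn: (tx: Map<string, string>) => void): void {
    const staged = new Map(this.orders);
    fn(staged); // if this throws, `staged` is dropped (rollback)
    this.orders = staged; // commit
  }
}

type Write = (tx: Map<string, string>) => void;

// Anti-pattern: the error is caught and ignored, nothing commits, and the
// provider receives 200 so it never redelivers.
function handleSwallowing(store: OrderStore, write: Write): number {
  try {
    store.transaction(write);
  } catch {
    // swallowed: no log, no retry
  }
  return 200;
}

// Fix: log the failure at error level and return 500 so the provider retries.
function handleFixed(store: OrderStore, orderId: string, write: Write, log: (line: string) => void): number {
  try {
    store.transaction(write);
    return 200;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    log(`error order=${orderId} update failed: ${reason}`);
    return 500;
  }
}
```

A test that feeds both handlers a write that throws mid-transaction shows the symptom directly: the swallowing handler returns 200 while the order status is unchanged.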
