Webhook Retry Mechanism

Overview

We attempt to deliver each webhook message based on a retry schedule with exponential backoff. This ensures that temporary failures (like network issues or brief service outages) don't result in permanently lost webhook events.

Retry Schedule

Each message is attempted according to the following schedule. Each delay starts when the preceding attempt fails:

  1. Immediately (first attempt)
  2. 5 seconds after first failure
  3. 5 minutes after second failure
  4. 30 minutes after third failure
  5. 2 hours after fourth failure
  6. 5 hours after fifth failure
  7. 10 hours after sixth failure
  8. 10 hours after seventh failure (final attempt)

Example Timeline

A message that fails three times before succeeding is delivered roughly 35 minutes and 5 seconds after the first attempt:

  • T+0: First attempt fails
  • T+5s: Second attempt fails
  • T+5m5s: Third attempt fails
  • T+35m5s: Fourth attempt succeeds ✓
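
The arithmetic behind this timeline can be checked with a short sketch. The delays are copied from the retry schedule above; the names RETRY_DELAYS and attempt_offsets are illustrative only and are not part of any API. The sketch assumes each attempt fails instantly, so time spent waiting for a response is ignored.

```python
from datetime import timedelta
from itertools import accumulate

# Delay before each attempt, measured from the failure of the previous one.
# Values are taken from the retry schedule; the first attempt is immediate.
RETRY_DELAYS = [
    timedelta(0),
    timedelta(seconds=5),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=2),
    timedelta(hours=5),
    timedelta(hours=10),
    timedelta(hours=10),
]


def attempt_offsets(delays=RETRY_DELAYS):
    """Return the elapsed time from the first attempt to each attempt."""
    return list(accumulate(delays))


offsets = attempt_offsets()
for number, offset in enumerate(offsets, start=1):
    print(f"attempt {number}: T+{offset}")
```

Under that assumption the fourth attempt lands at 35 minutes and 5 seconds, and the final (eighth) attempt lands 27 hours, 35 minutes, and 5 seconds after the first.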

What Triggers a Retry?

A webhook delivery is considered failed and will be retried in any of the following cases.

HTTP Response Codes and Network Errors

  • 4xx errors (except 410 Gone) - Client errors like 400, 404, 429
  • 5xx errors - Server errors like 500, 502, 503, 504
  • Network timeouts - No response within 15 seconds
  • Connection failures - DNS resolution failures, connection refused, etc.

Special Case: 410 Gone

  • 410 Gone responses are treated as permanent failures and will not be retried
  • Use this response code when you want to permanently disable webhook delivery to an endpoint

Success Indicators

A webhook delivery is considered successful when both of the following are true:

  • A 2xx status code (200-299) is returned
  • The response is received within 15 seconds
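
The retry, permanent-failure, and success rules above can be summarized in one function. This is a minimal sketch of the documented rules, not the actual delivery code: the function name classify_delivery and its return labels are hypothetical. Status codes the rules do not mention (1xx and 3xx) are reported as "unspecified" rather than guessed.

```python
from typing import Optional

TIMEOUT_SECONDS = 15  # response deadline stated in this document


def classify_delivery(status_code: Optional[int], elapsed_seconds: float) -> str:
    """Classify one delivery attempt.

    status_code is None when no HTTP response arrived at all
    (timeout, DNS failure, connection refused).
    """
    # No response, or a response that arrived after the deadline: retry.
    if status_code is None or elapsed_seconds > TIMEOUT_SECONDS:
        return "retry"
    # 410 Gone is the one error that is never retried.
    if status_code == 410:
        return "permanent_failure"
    if 200 <= status_code <= 299:
        return "success"
    # All other 4xx and 5xx responses are retried.
    if 400 <= status_code <= 599:
        return "retry"
    # 1xx and 3xx are not covered by the rules in this document.
    return "unspecified"
```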

Important: We interpret any 2xx response as successful delivery, even if your response payload indicates a failure. Make sure to use the correct HTTP status codes to control retry behavior.
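
On the receiving side, this means the status code should follow the outcome of processing. The sketch below uses Python's standard http.server module; process_event is a hypothetical placeholder for your own logic, and signature verification is omitted for brevity.

```python
import json
from http.server import BaseHTTPRequestHandler


def process_event(event: dict) -> None:
    """Hypothetical business logic; raises an exception when processing fails."""
    if "type" not in event:
        raise ValueError("event has no type")


def status_for(body: bytes) -> int:
    """Pick the HTTP status that matches the outcome of processing."""
    try:
        process_event(json.loads(body))
    except Exception:
        # A 5xx tells the sender the delivery failed, so it is retried later.
        return 500
    # Any 2xx tells the sender the delivery succeeded and needs no retry.
    return 204


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.send_response(status_for(self.rfile.read(length)))
        self.end_headers()
```

For local testing, the handler could be served with HTTPServer(("127.0.0.1", 8000), WebhookHandler).serve_forever(), where HTTPServer is also imported from http.server. Keep slow work out of the request path so the response goes out well within the 15-second limit.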

Automatic Endpoint Disabling

If every delivery attempt to an endpoint fails for 5 consecutive days, the endpoint is automatically disabled to prevent further failed attempts.
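
One way to read this rule is as an unbroken streak of failures that has lasted at least 5 days. The sketch below encodes that interpretation. It is an assumption about how the rule could be evaluated, not a description of the actual implementation, and should_disable is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

DISABLE_AFTER = timedelta(days=5)  # threshold stated in this document


def should_disable(attempts: Iterable[Tuple[datetime, bool]], now: datetime) -> bool:
    """Return True if the current failure streak is at least 5 days old.

    attempts holds (timestamp, succeeded) pairs in chronological order.
    """
    streak_start = None
    for timestamp, succeeded in attempts:
        if succeeded:
            # Any success resets the streak.
            streak_start = None
        elif streak_start is None:
            # First failure after a success (or the first attempt ever).
            streak_start = timestamp
    return streak_start is not None and now - streak_start >= DISABLE_AFTER
```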

When an endpoint is disabled:

  • ❌ No new webhook deliveries will be attempted
  • ❌ The endpoint will not receive any events until manually re-enabled
  • ✅ You'll be notified about the disabled endpoint
  • ✅ You can re-enable it manually from the dashboard

Re-enabling a Disabled Endpoint

To re-enable a disabled endpoint:

  1. Go to the webhook dashboard
  2. Find the disabled endpoint in the list
  3. Click on the endpoint
  4. Select "Enable Endpoint" from the options menu

Manual Retries and Recovery

Single Message Retry

If you want to replay a specific event:

  1. Find the message in the webhook dashboard UI
  2. Click the options menu (⋯) next to any of the delivery attempts
  3. Click "Resend" to send the same message to your endpoint again

Bulk Recovery Options


Option 1: Recover All Failed Messages Since Date

  1. Go to your endpoint's details page
  2. Click "Options" → "Recover Failed Messages"
  3. Choose a time window to recover from
  4. All failed messages in that timeframe will be retried

Option 2: Recover from Specific Message

  1. Find any message on the endpoint page
  2. Click the options menu (⋯) next to the message
  3. Click "Replay..."
  4. Choose "Replay all failed messages since this time"

This method gives you more granular control over exactly which messages to retry.


Monitoring

Keep track of your webhook endpoint health by monitoring:

  • Success rate - Percentage of webhooks that succeed on first attempt
  • Retry rate - Percentage of webhooks that require retries
  • Average processing time - How long your endpoint takes to respond
  • Error patterns - Common error types and their frequency
  • Endpoint availability - Uptime of your webhook endpoints

Regular monitoring helps you identify and fix issues before they cause webhook endpoints to be disabled.
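
Several of the metrics above can be computed from your own delivery logs. The sketch below assumes a hypothetical DeliveryRecord structure with one entry per webhook message. The field names are illustrative and do not correspond to any dashboard export format.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DeliveryRecord:
    """Hypothetical summary of one webhook message, built from your own logs."""
    attempts: int            # total delivery attempts made for the message
    delivered: bool          # True if any attempt eventually succeeded
    response_seconds: float  # response time of the final attempt


def endpoint_health(records):
    """Compute first-attempt success rate, retry rate, and mean response time."""
    total = len(records)
    if total == 0:
        return {"success_rate": None, "retry_rate": None, "avg_response_seconds": None}
    first_try = sum(1 for r in records if r.delivered and r.attempts == 1)
    retried = sum(1 for r in records if r.attempts > 1)
    return {
        "success_rate": first_try / total,
        "retry_rate": retried / total,
        "avg_response_seconds": mean(r.response_seconds for r in records),
    }
```

A rising retry rate or an average response time approaching the 15-second limit is an early sign that an endpoint may start failing deliveries.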