Troubleshooting & Failure Recovery
Common Webhook Failures
There are some common reasons why your webhook endpoint might be failing. Understanding these issues can help you debug and fix problems quickly.
Most Common Issues
1. Not Using the Raw Payload Body
This is the most common issue. When generating the signed content, we use the raw string body of the message payload.
Problem: If you parse the JSON payload and then convert it back into a string with a method like JSON.stringify(), the result may differ from the body that was actually sent (for example in whitespace, number formatting, or character escaping). A signature computed over that re-serialized string will not match the one we sent.
Solution: It's crucial to verify the payload exactly as it was sent, byte-for-byte or string-for-string, to ensure accurate verification.
// ❌ WRONG - This will cause signature verification to fail
@Post('webhooks')
async handleWebhook(@Body() body: any, @Headers() headers: any) {
  const payload = JSON.stringify(body); // Don't do this!
  this.verifySignature(payload, headers);
}

// ✅ CORRECT - Use raw body for signature verification
// (rawBody is only populated when the app is created with { rawBody: true })
@Post('webhooks')
async handleWebhook(@Req() req: RawBodyRequest<Request>, @Headers() headers: any) {
  if (!req.rawBody) {
    throw new HttpException('Missing raw body', HttpStatus.BAD_REQUEST);
  }
  const payload = req.rawBody.toString('utf8'); // Use raw body
  this.verifySignature(payload, headers);
}
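To make the failure mode concrete, here is a minimal sketch showing how a parse-and-stringify round trip changes the string even though the JSON is semantically identical. The payload is a made-up example, not a real event from this service.

```typescript
// Hypothetical raw body as it might arrive on the wire (note the spaces and "10.0").
const rawBody = '{"amount": 10.0, "id": "cb_123"}';

// Parsing and re-serializing normalizes whitespace and number formatting.
const reserialized = JSON.stringify(JSON.parse(rawBody));

console.log(rawBody);                  // {"amount": 10.0, "id": "cb_123"}
console.log(reserialized);             // {"amount":10,"id":"cb_123"}
console.log(rawBody === reserialized); // false
```

Because the signature is computed over the exact bytes that were sent, any difference like this is enough to make verification fail.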
2. Missing or Wrong Secret Key
Problem: Using the incorrect secret key or forgetting to configure it entirely.
Solution: Remember that signing secrets are unique to each endpoint. Double-check your endpoint's signing secret in the webhook dashboard.
// ❌ WRONG - Using a hardcoded or wrong secret
const secret = "wrong-secret-key";
// ✅ CORRECT - Get secret from environment/config
const secret = process.env.WEBHOOK_SECRET; // From your endpoint configuration
if (!secret) {
  throw new Error('WEBHOOK_SECRET environment variable is required');
}
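A wrong secret only ever shows up as a signature mismatch, so it can help to see where the secret enters the computation. The sketch below assumes an HMAC-SHA256 signature over `id.timestamp.body`, encoded as base64. That scheme, the function names, and the sample values are all assumptions for illustration; check the verification documentation for the exact format your endpoint uses.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// ASSUMPTION: signature = base64(HMAC-SHA256(secret, `${id}.${timestamp}.${rawBody}`)).
// The real scheme may differ; this only shows how the per-endpoint secret is used.
function computeSignature(secret: string, id: string, timestamp: string, rawBody: string): string {
  return createHmac('sha256', secret)
    .update(`${id}.${timestamp}.${rawBody}`)
    .digest('base64');
}

// Constant-time comparison; timingSafeEqual throws on unequal lengths, so check first.
function signaturesMatch(expected: string, received: string): boolean {
  const a = Buffer.from(expected);
  const b = Buffer.from(received);
  return a.length === b.length && timingSafeEqual(a, b);
}

const body = '{"event":"chargeback.created"}';
const sent = computeSignature('secret-for-endpoint-a', 'msg_1', '1700000000', body);

// Same secret: match. Secret from a different endpoint: mismatch.
console.log(signaturesMatch(computeSignature('secret-for-endpoint-a', 'msg_1', '1700000000', body), sent)); // true
console.log(signaturesMatch(computeSignature('secret-for-endpoint-b', 'msg_1', '1700000000', body), sent)); // false
```

If verification fails for every message on one endpoint but works on another, a secret copied from the wrong endpoint is a likely cause.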
3. Sending the Wrong Response Codes
Problem: When we receive a response with a 2xx status code, we interpret that as a successful delivery even if you indicate a failure in the response payload.
Solution: Make sure to use the correct HTTP response status codes to control retry behavior.
@Post('webhooks')
async handleWebhook(@Body() event: WebhookEvent) {
  try {
    await this.processEvent(event);
    // ✅ Success - return 2xx status
    return { received: true };
  } catch (error: any) {
    // ❌ WRONG - Don't return a 2xx with an error in the payload
    // return { error: 'Processing failed' }; // Still returns a 2xx status!

    // ✅ CORRECT - Return appropriate error status codes
    if (error?.code === 'PERMANENT_FAILURE') {
      throw new HttpException('Cannot process', HttpStatus.GONE); // 410 - Don't retry
    } else {
      throw new HttpException('Temporary failure', HttpStatus.INTERNAL_SERVER_ERROR); // 500 - Retry
    }
  }
}
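The rule can be reduced to a small mapping from processing outcome to status code. The sketch below is framework-independent; the `Outcome` type and function names are hypothetical, and the 410 convention follows the example above.

```typescript
type Outcome = 'processed' | 'transient_failure' | 'permanent_failure';

// Maps a processing outcome to the HTTP status the endpoint should return.
function statusFor(outcome: Outcome): number {
  switch (outcome) {
    case 'processed':
      return 200; // any 2xx counts as a successful delivery
    case 'permanent_failure':
      return 410; // Gone: do not retry
    case 'transient_failure':
      return 500; // 5xx: the delivery will be retried
  }
}

// Mirrors the sender's rule: only the status code matters, never the response body.
function isDelivered(statusCode: number): boolean {
  return statusCode >= 200 && statusCode < 300;
}

console.log(isDelivered(statusFor('processed')));         // true
console.log(isDelivered(statusFor('transient_failure'))); // false
```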
4. Response Timeouts
Problem: If your endpoint does not respond within 15 seconds, we consider the delivery attempt failed.
Solution: If your endpoint runs complicated workflows inline, it may time out and produce failed messages. We suggest having your endpoint simply receive the message and add it to a queue for asynchronous processing, so that it can respond promptly and avoid timing out.
@Post('webhooks')
async handleWebhook(@Body() event: WebhookEvent) {
  try {
    // ✅ CORRECT - Quick validation and queue for async processing
    this.validateEvent(event);
    // Add to queue for async processing
    // (job options assume a Bull/BullMQ-style queue)
    await this.webhookQueue.add('process-chargeback', event, {
      delay: 0,
      attempts: 3,
      backoff: { type: 'exponential', delay: 1000 } // exponential backoff from a 1s base delay
    });
    // Respond immediately
    return { received: true, queued: true };
  } catch (error) {
    console.error('Webhook queuing failed:', error);
    throw new HttpException('Failed to queue webhook', HttpStatus.INTERNAL_SERVER_ERROR);
  }
}
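The receive-then-process split can be illustrated without any queue library. The sketch below uses a hypothetical in-memory queue purely to show the shape of the pattern; a real deployment would normally use a durable queue so that accepted events survive a restart.

```typescript
interface QueuedEvent {
  id: string;
  payload: string;
}

class InMemoryWebhookQueue {
  private pending: QueuedEvent[] = [];

  // Called from the request handler: constant-time, no business logic.
  enqueue(item: QueuedEvent): number {
    this.pending.push(item);
    return this.pending.length;
  }

  // Called by a background worker, outside the request/response cycle.
  drain(handler: (item: QueuedEvent) => void): number {
    let processed = 0;
    while (this.pending.length > 0) {
      handler(this.pending.shift()!);
      processed++;
    }
    return processed;
  }
}

const webhookQueue = new InMemoryWebhookQueue();
webhookQueue.enqueue({ id: 'msg_1', payload: '{"event":"chargeback.created"}' });
webhookQueue.enqueue({ id: 'msg_2', payload: '{"event":"chargeback.updated"}' });

// Later, in the worker: the slow work happens here, not in the HTTP handler.
const handledIds: string[] = [];
webhookQueue.drain((item) => handledIds.push(item.id));
console.log(handledIds); // [ 'msg_1', 'msg_2' ]
```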
Advanced Troubleshooting
Network and Connectivity Issues
DNS Resolution Problems
# Test if your endpoint is accessible
curl -I https://your-endpoint.com/webhooks/chargebacks
# Check DNS resolution
nslookup your-endpoint.com
dig your-endpoint.com
SSL/TLS Certificate Issues
# Check SSL certificate validity (prints the notBefore/notAfter dates)
openssl s_client -connect your-endpoint.com:443 -servername your-endpoint.com </dev/null 2>/dev/null | openssl x509 -noout -dates
# Verify certificate chain
curl -vvI https://your-endpoint.com/webhooks/chargebacks
Firewall and Security Groups
- Ensure your server accepts incoming connections on the webhook port
- Check that webhook requests aren't being blocked by firewalls
- Verify security groups allow inbound HTTPS traffic (port 443)
Payload and Parsing Issues
JSON Parsing Errors
@Post('webhooks')
async handleWebhook(@Req() req: Request) {
  let event: WebhookEvent;
  try {
    // Parse JSON safely
    event = typeof req.body === 'string'
      ? JSON.parse(req.body)
      : req.body;
  } catch (error) {
    console.error('JSON parsing failed:', error);
    throw new HttpException('Invalid JSON payload', HttpStatus.BAD_REQUEST);
  }
  // Validate required fields
  if (!event.event || !event.data || !event.webhookId) {
    throw new HttpException('Missing required fields', HttpStatus.BAD_REQUEST);
  }
  // Process the event...
}
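The same parse-and-validate step can be written as a standalone type guard, which is easier to unit test than a controller method. The field names come from the check above; the field types are assumptions, since the full event schema is not shown here.

```typescript
// ASSUMPTION: field types are guesses; only the field names appear in the example above.
interface WebhookEvent {
  event: string;
  data: Record<string, unknown>;
  webhookId: string;
}

function isWebhookEvent(value: unknown): value is WebhookEvent {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.event === 'string' && v.event.length > 0 &&
    typeof v.data === 'object' && v.data !== null &&
    typeof v.webhookId === 'string' && v.webhookId.length > 0
  );
}

// Returns the typed event, or null for malformed JSON or missing fields.
function parseWebhookEvent(raw: string): WebhookEvent | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isWebhookEvent(parsed) ? parsed : null;
  } catch {
    return null; // malformed JSON
  }
}

console.log(parseWebhookEvent('{"event":"chargeback.created","data":{},"webhookId":"wh_1"}') !== null); // true
console.log(parseWebhookEvent('{"event":"chargeback.created"}')); // null
console.log(parseWebhookEvent('not json'));                       // null
```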
Character Encoding Issues
// Ensure proper UTF-8 handling
@Post('webhooks')
async handleWebhook(@Req() req: RawBodyRequest<Request>) {
  if (!req.rawBody) {
    throw new HttpException('Missing raw body', HttpStatus.BAD_REQUEST);
  }
  // Make sure to use UTF-8 encoding
  const payload = req.rawBody.toString('utf8');
  // Process with proper encoding
  await this.processWebhook(payload);
}
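To see why the encoding matters, the following sketch decodes the same bytes two ways. The payload is a made-up example containing non-ASCII characters; decoding it with the wrong encoding yields a different string, which would then fail signature verification and corrupt stored data.

```typescript
// Hypothetical payload containing multi-byte UTF-8 characters.
const original = '{"merchant":"Café Zoë"}';
const wireBytes = Buffer.from(original, 'utf8');

const decodedUtf8 = wireBytes.toString('utf8');
const decodedLatin1 = wireBytes.toString('latin1'); // wrong encoding

console.log(decodedUtf8 === original);   // true
console.log(decodedLatin1 === original); // false
// Each 2-byte character becomes two separate characters under latin1.
console.log(decodedLatin1.length - original.length); // 2
```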
Database and Persistence Issues
Connection Pool Exhaustion
@Injectable()
export class WebhookService {
  constructor(
    @InjectRepository(ChargebackEntity)
    private chargebackRepo: Repository<ChargebackEntity>
  ) {}

  async processChargeback(data: ChargebackDto) {
    // Use transactions for atomic operations. An open transaction holds a pooled
    // connection until it commits or rolls back, so keep the callback short.
    return await this.chargebackRepo.manager.transaction(async (manager) => {
      // Your database operations here.
      // If anything throws, the transaction is rolled back automatically.
      const chargeback = await manager.save(ChargebackEntity, data);
      // Additional operations...
      return chargeback;
    });
  }
}
Deadlock Prevention
// Lock the row so that concurrent deliveries for the same chargeback are applied one at a time
async processChargebackUpdate(chargebackId: string, data: Partial<ChargebackDto>) {
  return await this.chargebackRepo.manager.transaction(async (manager) => {
    // Lock the record to prevent concurrent updates
    const chargeback = await manager
      .createQueryBuilder(ChargebackEntity, 'cb')
      .setLock('pessimistic_write')
      .where('cb.id = :id', { id: chargebackId })
      .getOne();
    if (!chargeback) {
      throw new Error(`Chargeback ${chargebackId} not found`);
    }
    // Apply updates
    Object.assign(chargeback, data);
    return await manager.save(chargeback);
  });
}
Failure Recovery Strategies
Re-enable a Disabled Endpoint
If all attempts to a specific endpoint fail for a period of 5 days, the endpoint will be disabled.
To re-enable a disabled endpoint:
- Go to the webhook dashboard
- Find the endpoint from the list
- Select "Enable Endpoint"
Recovering/Resending Failed Messages
Single Message Recovery
If you want to replay a single event:
- Find the message in the UI
- Click the options menu next to any of the attempts
- Click "resend" to have the same message sent to your endpoint again
Bulk Recovery from Service Outage
If you need to recover from a service outage and want to replay all events since a given time:
- Go to the Endpoint details page
- Click "Options" → "Recover Failed Messages"
- Choose a time window to recover from
Granular Recovery
For more granular recovery (e.g., if you know the exact timestamp):
- Click the options menu on any message from the endpoint page
- Click "Replay..."
- Choose "Replay all failed messages since this time"
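Replaying messages can redeliver events that your endpoint already processed before or during the outage, so handlers should tolerate duplicates. The sketch below deduplicates on a message ID; it assumes a replayed delivery carries the same ID as the original (for example in the webhook-id header), and it uses an in-memory set only for illustration. In practice the record of processed IDs would normally live in your database, for instance behind a unique constraint.

```typescript
class ProcessedMessageLog {
  private seen = new Set<string>();

  // Returns true only the first time a given message ID is offered.
  markIfNew(messageId: string): boolean {
    if (this.seen.has(messageId)) return false;
    this.seen.add(messageId);
    return true;
  }
}

function handleDelivery(
  log: ProcessedMessageLog,
  messageId: string,
  apply: () => void
): 'applied' | 'duplicate' {
  if (!log.markIfNew(messageId)) return 'duplicate'; // still respond with a 2xx
  apply();
  return 'applied';
}

const processedLog = new ProcessedMessageLog();
let timesApplied = 0;

console.log(handleDelivery(processedLog, 'msg_1', () => timesApplied++)); // applied
console.log(handleDelivery(processedLog, 'msg_1', () => timesApplied++)); // duplicate (replayed)
console.log(timesApplied); // 1
```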
Emergency Procedures
Complete Service Outage Recovery
// 1. Fix your service issues first
// 2. Test with a single webhook to ensure it's working
// 3. Bulk recover failed messages from the outage period
// 4. Monitor recovery progress
// Note: logger, webhookQueue, and alertingService are assumed to be injected;
// the queue calls below are placeholders for your queue library's API.
@Injectable()
export class WebhookRecoveryService {
  async handleRecoveryPeriod() {
    // Log recovery start
    this.logger.log('Starting webhook recovery process');
    // Temporarily increase processing capacity
    await this.scaleUpProcessingWorkers();
    // Monitor recovery progress
    this.startRecoveryMonitoring();
  }

  private async scaleUpProcessingWorkers() {
    // Increase worker concurrency during recovery
    this.webhookQueue.concurrency = 10; // Increase from normal 3
  }

  private startRecoveryMonitoring() {
    // Monitor recovery metrics
    setInterval(async () => {
      const pendingJobs = await this.webhookQueue.waiting();
      const failedJobs = await this.webhookQueue.failed();
      this.logger.log(`Recovery progress: ${pendingJobs} pending, ${failedJobs} failed`);
      // Alert if recovery is stalling
      if (pendingJobs > 1000) {
        await this.alertingService.sendAlert({
          type: 'recovery_stalling',
          pendingJobs,
          failedJobs
        });
      }
    }, 30000); // Check every 30 seconds
  }
}
Debugging Checklist
When troubleshooting webhook failures, work through this checklist:
✅ Basic Connectivity
- Endpoint URL is accessible via HTTPS
- DNS resolves correctly
- SSL certificate is valid and not expired
- Firewall allows inbound HTTPS traffic
- Server is running and responding to requests
✅ Request Handling
- Endpoint accepts POST requests
- Content-Type: application/json is handled correctly
- Raw request body is preserved for signature verification
- Request processing completes within 15 seconds
- Proper HTTP status codes are returned
✅ Signature Verification
- Webhook signing secret is correctly configured
- Using the raw request body (not re-stringified JSON)
- Headers are being read correctly (webhook-id, webhook-timestamp, webhook-signature)
- Signature verification logic is implemented correctly
- Timestamp tolerance allows for reasonable clock skew
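As a rough illustration of the timestamp check, the sketch below accepts a message only if its timestamp is within a tolerance window of the server clock, in either direction. It assumes the webhook-timestamp header holds Unix seconds and uses a hypothetical five-minute tolerance; confirm both against the verification documentation.

```typescript
// ASSUMPTION: five minutes is an illustrative tolerance, not a documented value.
const TOLERANCE_SECONDS = 5 * 60;

// ASSUMPTION: the header value is a Unix timestamp in seconds.
function isTimestampFresh(
  headerValue: string,
  nowMs: number,
  toleranceSeconds: number = TOLERANCE_SECONDS
): boolean {
  const sentAtSeconds = Number(headerValue);
  if (!Number.isFinite(sentAtSeconds)) return false;
  // Absolute difference, so both slow and fast clocks are tolerated.
  const skewSeconds = Math.abs(nowMs / 1000 - sentAtSeconds);
  return skewSeconds <= toleranceSeconds;
}

const serverNowMs = 1_700_000_000_000; // fixed clock for the example

console.log(isTimestampFresh('1700000000', serverNowMs)); // true  (no skew)
console.log(isTimestampFresh('1700000200', serverNowMs)); // true  (200s ahead)
console.log(isTimestampFresh('1699999000', serverNowMs)); // false (1000s old)
```

A tolerance that is too tight rejects valid messages when server clocks drift; one that is too loose widens the window for replayed requests.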
✅ Error Handling
- Transient errors return 5xx status codes (for retries)
- Permanent errors return 410 Gone (to stop retries)
- Success cases return 2xx status codes
- Proper logging for debugging failed requests
✅ Application Logic
- Webhook events are processed idempotently
- Database operations are atomic and handle concurrency
- Long-running operations are queued for async processing
- Memory and resource usage are within limits
Getting Help
If you're still experiencing issues after working through this troubleshooting guide:
- Check the webhook dashboard for detailed error logs and retry information
- Review your application logs for any error messages or stack traces
- Test with the webhook testing feature to isolate the issue
- Verify your implementation matches the code examples in this documentation
The webhook dashboard provides detailed logs, delivery attempts, and error messages that can help you pinpoint exactly where the failure is occurring.