
Claude in Production: Webhook Architectures

Production-ready webhook architectures for Claude deployments. Learn patterns, failure scenarios, verification, retries, and real code examples.

The PADISO Team ·2026-06-16

Why Webhooks Matter for Claude in Production
Webhook Architecture Fundamentals
Security and Signature Verification
Retry Logic and Delivery Guarantees
Reference Diagram and Flow
Code Patterns and Implementation
Failure Scenarios and How to Prevent Them
Monitoring and Observability
Scaling Webhook Infrastructure
Next Steps and Production Readiness

Why Webhooks Matter for Claude in Production {#why-webhooks-matter}

When you deploy Claude at scale—whether you’re running agentic AI workflows, automating customer support triage, or orchestrating multi-step business processes—you need a way for Claude to notify your systems of state changes, completions, and errors in real time. Webhooks are the production-grade pattern for this.

Unlike polling (where you repeatedly ask “Is the work done yet?”), webhooks let Claude push events to your infrastructure. This means lower latency, reduced database load, and the ability to react to events as they happen. For teams building AI-powered products, this translates directly into faster user feedback loops and more responsive automation.

At PADISO, we’ve deployed Claude webhooks across dozens of production systems—from fintech platforms automating compliance workflows to e-commerce engines generating product descriptions at scale. The pattern we’ve standardised here reflects real production constraints: verification must be cryptographically sound, retries must be idempotent, and failure modes must be observable.

This guide covers the complete architecture: how to subscribe to Claude webhooks, verify signatures, handle retries, prevent failure cascades, and scale reliably. We’ll walk through code, diagrams, and the specific failure scenarios that catch teams in production.

Webhook Architecture Fundamentals {#webhook-fundamentals}

What Is a Webhook?

A webhook is an HTTP callback—a way for one system (Claude) to notify another system (your application) when something happens. Instead of your code asking Claude “Is the message processed?” every second, Claude pushes a message to a URL you control when processing is complete.

For Claude deployments, webhooks typically fire when:

A managed agent completes a task
An API request reaches a terminal state (success, error, timeout)
A long-running inference finishes
A streaming session closes

The webhook event is a JSON payload sent via HTTP POST to your endpoint. Your endpoint receives it, processes it, and returns a 2xx status code to confirm receipt.
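The handlers later in this guide read an `id`, a `type`, a nested `data` object, and a `timestamp` from each event. A minimal sketch of parsing that envelope is shown here. The field names are taken from this guide's own code, while the `WebhookEvent` class and `parse_event` helper are hypothetical names used for illustration and are not part of any SDK.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class WebhookEvent:
    """Envelope fields assumed by the handlers in this guide (hypothetical type)."""
    id: str
    type: str
    data: Dict[str, Any] = field(default_factory=dict)
    timestamp: Optional[str] = None


def parse_event(raw_body: bytes) -> WebhookEvent:
    """Parse a raw JSON body, raising ValueError if required fields are missing."""
    payload = json.loads(raw_body)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("event body must be a JSON object")
    missing = [name for name in ("id", "type") if not payload.get(name)]
    if missing:
        raise ValueError(f"event is missing required fields: {missing}")
    return WebhookEvent(
        id=str(payload["id"]),
        type=str(payload["type"]),
        data=payload.get("data") or {},
        timestamp=payload.get("timestamp"),
    )


sample = b'{"id": "evt_xyz789", "type": "agent.task.completed", "data": {"task_id": "abc123"}}'
event = parse_event(sample)
print(event.type, event.data["task_id"])
```

Validating the envelope up front means a malformed body fails with one clear error instead of surfacing later as a missing key deep inside a handler.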

Webhook vs. Polling: Why It Matters

In a polling architecture, your code runs on a schedule—every 5 seconds, every minute—asking “Is the work done?” This works for low-volume workloads but breaks at scale:

Database load: Every poll is a query. 1,000 concurrent tasks polling every 5 seconds = 200 queries per second.
Latency: You don’t know work is done until the next poll interval. If polls run every 60 seconds, you wait up to 60 seconds to react.
Cost: API calls and database queries add up fast.

Webhooks flip this: Claude calls you when work is done. No polling. No wasted queries. Latency drops to milliseconds.
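The polling arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch: `polling_load` is a hypothetical helper, and the average delay assumes tasks finish at a uniformly random point within the poll interval.

```python
def polling_load(concurrent_tasks: int, poll_interval_s: float) -> dict:
    """Return steady-state query rate and detection delay for a polling design."""
    return {
        "queries_per_second": concurrent_tasks / poll_interval_s,
        "worst_case_delay_s": poll_interval_s,
        # Assumes completion time is uniformly distributed within the interval.
        "average_delay_s": poll_interval_s / 2,
    }


print(polling_load(1000, 5))   # the 1,000-task, 5-second example from the list above
print(polling_load(1000, 60))  # fewer queries, but up to a minute before you react
```

The two calls show the trade-off polling forces on you: shortening the interval cuts the delay but multiplies the query rate by the same factor.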

For teams running AI automation at PADISO’s platform development centres, whether in Sydney, San Francisco, or across multiple regions, this is what separates a system that scales smoothly from one that becomes a bottleneck.

Event-Driven Architecture

Webhooks are the foundation of event-driven architecture. When Claude fires a webhook, your system can:

Update a database record
Trigger downstream workflows
Send notifications to users
Log metrics and analytics
Enqueue background jobs

This decoupling means Claude doesn’t need to wait for your entire response chain to complete. Claude fires the webhook and moves on. Your system processes asynchronously.

Security and Signature Verification {#security-verification}

Why Verification Is Non-Negotiable

Any HTTP endpoint on the internet can be called by anyone. Without verification, an attacker could:

Forge webhook events to trigger false alerts
Mark completed work as failed to trigger retries
Inject malicious payloads into your processing pipeline

Production webhook architectures must verify that events actually came from Claude. This is done via cryptographic signatures.

How Claude Signs Webhooks

When you subscribe to Claude webhooks (see the Claude API Docs page “Subscribe to webhooks”), you receive a signing key. Claude uses this key to generate an HMAC-SHA256 signature of the webhook payload and includes it in the request headers.

Your endpoint must:

Extract the signature from the request header
Compute the HMAC-SHA256 of the raw request body using your signing key
Compare the two signatures (using constant-time comparison to prevent timing attacks)
Reject the request if signatures don’t match

This is the same HMAC-based pattern described in the Stripe Docs (Webhooks) and GitHub Docs (Webhooks), so it has seen wide production use.

Implementation: Signature Verification in Python

import hashlib
import hmac
import json
import os
from flask import Flask, request

app = Flask(__name__)

# Your Claude webhook signing key, read from the environment rather than hardcoded
CLAUDE_WEBHOOK_KEY = os.environ["CLAUDE_WEBHOOK_KEY"]

def verify_webhook_signature(payload: bytes, signature: str, key: str) -> bool:
    """
    Verify a Claude webhook signature using constant-time comparison.
    
    Args:
        payload: Raw request body as bytes
        signature: Signature from X-Claude-Signature header
        key: Your webhook signing key
    
    Returns:
        True if signature is valid, False otherwise
    """
    # Compute expected signature
    expected_signature = hmac.new(
        key.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    
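    # Use constant-time comparison to prevent timing attacks.
    # Comparing bytes avoids the TypeError that compare_digest raises
    # when a str argument contains non-ASCII characters.
    return hmac.compare_digest(signature.encode(), expected_signature.encode())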

@app.route('/webhooks/claude', methods=['POST'])
def handle_claude_webhook():
    """
    Handle incoming Claude webhook.
    """
    # Get raw body (before Flask parses it)
    raw_body = request.get_data()
    signature = request.headers.get('X-Claude-Signature')
    
    # Verify signature
    if not signature or not verify_webhook_signature(raw_body, signature, CLAUDE_WEBHOOK_KEY):
        return {'error': 'Unauthorized'}, 401
    
    # Parse and process event
    event = json.loads(raw_body)
    process_claude_event(event)
    
    return {'status': 'received'}, 200

def process_claude_event(event: dict):
    """
    Process a Claude webhook event.
    """
    event_type = event.get('type')
    event_id = event.get('id')
    
    if event_type == 'agent.task.completed':
        handle_task_completed(event)
    elif event_type == 'agent.task.failed':
        handle_task_failed(event)
    elif event_type == 'agent.task.timeout':
        handle_task_timeout(event)

HTTPS Requirement

Claude only sends webhooks to HTTPS endpoints. This is non-negotiable. Your webhook endpoint must:

Use a valid TLS certificate (self-signed certificates are rejected)
Have a certificate chain that validates against standard root CAs
Support TLS 1.2 or higher

For local development, use tools like ngrok or localtunnel to expose a local HTTP server via a public HTTPS URL.

Retry Logic and Delivery Guarantees {#retry-logic}

Understanding Claude’s Retry Behavior

Claude uses exponential backoff with jitter to retry failed webhook deliveries. According to the Claude API Docs — Subscribe to webhooks, if your endpoint returns a non-2xx status code, Claude will retry with increasing delays.

This is good news and bad news:

Good: You get multiple chances to receive the event. If your service is temporarily down, Claude will keep trying.

Bad: You might receive the same event multiple times. Your code must be idempotent.

Idempotency: The Core Requirement

Idempotency means processing the same webhook multiple times produces the same result as processing it once. This is critical because:

Network failures can cause duplicate deliveries
Your endpoint might crash after processing but before returning a 2xx response
Clock skew or retry logic bugs can cause re-deliveries

To achieve idempotency:

Use event IDs: Every webhook has a unique id. Store processed event IDs in a database.
Check before processing: Before processing an event, check if its ID has been seen before.
Use database transactions: Ensure the check and update are atomic.

Implementation: Idempotent Webhook Handler

import json
from datetime import datetime, timedelta
from sqlalchemy import Column, String, DateTime, create_engine
# declarative_base has lived in sqlalchemy.orm since SQLAlchemy 1.4
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()
engine = create_engine('postgresql://...')
Session = sessionmaker(bind=engine)

class ProcessedWebhookEvent(Base):
    """Track processed webhook events to prevent duplicates."""
    __tablename__ = 'processed_webhook_events'
    
    event_id = Column(String, primary_key=True)
    event_type = Column(String)
    received_at = Column(DateTime, default=datetime.utcnow)
    processed_at = Column(DateTime)

def handle_claude_webhook_idempotent(event: dict):
    """
    Process a webhook event idempotently.
    """
    event_id = event.get('id')
    event_type = event.get('type')
    
    session = Session()
    try:
        # Check if event has been processed
        existing = session.query(ProcessedWebhookEvent).filter(
            ProcessedWebhookEvent.event_id == event_id
        ).first()
        
        if existing:
            # Already processed; return success without reprocessing
            return {'status': 'already_processed'}
        
        # Process the event
        if event_type == 'agent.task.completed':
            result = handle_task_completed(event)
        elif event_type == 'agent.task.failed':
            result = handle_task_failed(event)
        else:
            result = None
        
        # Record that we've processed this event
        processed = ProcessedWebhookEvent(
            event_id=event_id,
            event_type=event_type,
            processed_at=datetime.utcnow()
        )
        session.add(processed)
        session.commit()
        
        return {'status': 'processed', 'result': result}
    
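    except Exception:
        # Roll back so a failed event is not recorded as processed,
        # then re-raise so the caller can return a 5xx and trigger a retry
        session.rollback()
        raise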
    finally:
        session.close()

Cleanup and Retention

Processed event records accumulate over time. Implement a cleanup job to remove old records (e.g., older than 90 days) to prevent the table from growing unbounded:

def cleanup_old_webhook_events(days_to_keep: int = 90):
    """
    Remove processed webhook events older than the retention period.
    """
    session = Session()
    try:
        cutoff = datetime.utcnow() - timedelta(days=days_to_keep)

        deleted = session.query(ProcessedWebhookEvent).filter(
            ProcessedWebhookEvent.processed_at < cutoff
        ).delete()

        session.commit()
        return deleted
    except Exception:
        session.rollback()
        raise
    finally:
        # Always release the connection, even if the delete fails
        session.close()

Comparison with Other Platforms

The idempotency pattern we’ve outlined mirrors best practices from OpenAI Platform Docs — Webhooks guide and Stripe Docs — Webhooks. All production webhook systems require idempotent handlers.

Reference Diagram and Flow {#reference-diagram}

High-Level Architecture

┌──────────────────────────────────────────────────────────────┐
│                    Your Application                           │
├──────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌─────────────────┐         ┌──────────────────────┐        │
│  │  API Client     │         │  Webhook Endpoint    │        │
│  │  (initiate task)│         │  (receive events)    │        │
│  └────────┬────────┘         └──────────┬───────────┘        │
│           │                             ▲                     │
│           │                             │                     │
│           ▼                             │                     │
│  ┌──────────────────────────┐   ┌───────┴─────────────┐      │
│  │  Claude API              │   │  Event Processing   │      │
│  │  (task_id: abc123)       │   │  - Verify signature │      │
│  └──────────────────────────┘   │  - Check idempotency│      │
│           │                     │  - Update database  │      │
│           │                     │  - Trigger workflows│      │
│           │                     └─────────────────────┘      │
│           │                                                   │
│           └──────────────────────────────────────────────────┤
│                      (HTTPS POST to                           │
│                   /webhooks/claude)                          │
│                                                                │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Database                                           │    │
│  │  - Tasks table (status, result)                     │    │
│  │  - Processed events (deduplication)                 │    │
│  │  - Audit logs                                       │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                                │
└──────────────────────────────────────────────────────────────┘
                           ▲
                           │
                    HTTPS (TLS 1.2+)
                           │
┌──────────────────────────────────────────────────────────────┐
│                  Claude (Anthropic)                           │
├──────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────┐                                    │
│  │  Task Execution      │                                    │
│  │  - Process request   │                                    │
│  │  - Generate response │                                    │
│  │  - Determine outcome │                                    │
│  └──────────┬───────────┘                                    │
│             │                                                 │
│             ▼                                                 │
│  ┌──────────────────────────────────────────────┐            │
│  │  Webhook Event Generation                    │            │
│  │  - Build event payload                       │            │
│  │  - Sign with HMAC-SHA256                     │            │
│  │  - Prepare headers                           │            │
│  └──────────┬───────────────────────────────────┘            │
│             │                                                 │
│             ▼                                                 │
│  ┌──────────────────────────────────────────────┐            │
│  │  Retry Queue                                 │            │
│  │  - Initial attempt                           │            │
│  │  - Exponential backoff on failure            │            │
│  │  - Max 10 retries (configurable)             │            │
│  └──────────────────────────────────────────────┘            │
│                                                                │
└──────────────────────────────────────────────────────────────┘

Event Flow Sequence

1. Client initiates task
   → POST /api/v1/agents/tasks
   → Returns {task_id: "abc123"}

2. Claude processes task
   → Inference runs
   → Result determined

3. Claude generates webhook event
   → Event ID: "evt_xyz789"
   → Type: "agent.task.completed"
   → Payload includes task_id, result, metadata

4. Claude signs webhook
   → Computes HMAC-SHA256 of payload
   → Includes signature in X-Claude-Signature header

5. Claude sends HTTPS POST
   → Target: https://yourapp.com/webhooks/claude
   → Headers: X-Claude-Signature, Content-Type
   → Body: JSON event payload

6. Your endpoint receives request
   → Verifies signature (constant-time comparison)
   → Checks event ID against processed_webhook_events table
   → If new, processes event
   → Returns 200 OK

7. If your endpoint returns non-2xx
   → Claude retries with exponential backoff
   → Retry delays: 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 512s
   → After 10 retries, event is abandoned

8. If your endpoint returns 200 OK
   → Claude marks delivery as successful
   → No further retries
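The retry schedule in step 7 can be reproduced with a short helper, which is useful when deciding how long your deduplication records and timestamp checks need to tolerate late deliveries. This is a sketch only: `backoff_delays` and its parameters are hypothetical, and the way jitter is applied on the sending side is an assumption, since this guide only lists the un-jittered delays.

```python
import random
from typing import List, Optional


def backoff_delays(retries: int = 10, base_s: float = 1.0,
                   jitter: float = 0.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delay before each retry: base * 2**n, optionally spread by +/- a jitter fraction."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        delay = base_s * (2 ** attempt)
        if jitter:
            # Assumed jitter model: scale each delay by a random factor in [1-j, 1+j].
            delay *= 1 + rng.uniform(-jitter, jitter)
        delays.append(delay)
    return delays


schedule = backoff_delays()
print(schedule)       # 1.0, 2.0, 4.0, ... up to 512.0
print(sum(schedule))  # total seconds spent waiting between attempts
```

Without jitter the ten delays add up to 1,023 seconds, a little over 17 minutes, so a delivery of the same event can legitimately arrive that long after the first attempt.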

Code Patterns and Implementation {#code-patterns}

Complete Flask Implementation

Here’s a production-ready Flask application that handles Claude webhooks end-to-end:

import os
import json
import hmac
import hashlib
import logging
from datetime import datetime
from flask import Flask, request, jsonify
from sqlalchemy import Column, String, DateTime, Text, create_engine
# declarative_base has lived in sqlalchemy.orm since SQLAlchemy 1.4
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.exc import IntegrityError

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Flask app
app = Flask(__name__)

# Database setup
Base = declarative_base()
engine = create_engine(os.environ['DATABASE_URL'])
Session = sessionmaker(bind=engine)

class ProcessedWebhookEvent(Base):
    __tablename__ = 'processed_webhook_events'
    event_id = Column(String, primary_key=True)
    event_type = Column(String)
    payload = Column(Text)
    received_at = Column(DateTime, default=datetime.utcnow)
    processed_at = Column(DateTime, default=datetime.utcnow)

class TaskResult(Base):
    __tablename__ = 'task_results'
    task_id = Column(String, primary_key=True)
    status = Column(String)  # 'pending', 'completed', 'failed', 'timeout'
    result = Column(Text)
    error = Column(Text)
    completed_at = Column(DateTime)

Base.metadata.create_all(engine)

def verify_webhook_signature(payload: bytes, signature: str, key: str) -> bool:
    """
    Verify Claude webhook signature using constant-time comparison.
    """
    expected = hmac.new(key.encode(), payload, hashlib.sha256).hexdigest()
    # Compare bytes: compare_digest raises TypeError on non-ASCII str input
    return hmac.compare_digest(signature.encode(), expected.encode())

@app.route('/webhooks/claude', methods=['POST'])
def handle_webhook():
    """
    Main webhook handler for Claude events.
    """
    try:
        # Get raw body and signature
        raw_body = request.get_data()
        signature = request.headers.get('X-Claude-Signature')
        webhook_key = os.environ.get('CLAUDE_WEBHOOK_KEY')
        
        # Verify signature
        if not signature or not verify_webhook_signature(raw_body, signature, webhook_key):
            logger.warning('Invalid webhook signature')
            return jsonify({'error': 'Unauthorized'}), 401
        
        # Parse event
        event = json.loads(raw_body)
        event_id = event.get('id')
        event_type = event.get('type')
        
        logger.info(f'Received webhook event: {event_id} ({event_type})')
        
        # Check for duplicate
        session = Session()
        try:
            existing = session.query(ProcessedWebhookEvent).filter(
                ProcessedWebhookEvent.event_id == event_id
            ).first()
            
            if existing:
                logger.info(f'Event {event_id} already processed')
                return jsonify({'status': 'already_processed'}), 200
            
            # Process based on event type
            if event_type == 'agent.task.completed':
                process_task_completed(event, session)
            elif event_type == 'agent.task.failed':
                process_task_failed(event, session)
            elif event_type == 'agent.task.timeout':
                process_task_timeout(event, session)
            else:
                logger.warning(f'Unknown event type: {event_type}')
            
            # Record processed event
            processed = ProcessedWebhookEvent(
                event_id=event_id,
                event_type=event_type,
                payload=raw_body.decode('utf-8')
            )
            session.add(processed)
            session.commit()
            
            logger.info(f'Successfully processed event {event_id}')
            return jsonify({'status': 'processed'}), 200
        
        except IntegrityError:
            session.rollback()
            logger.info(f'Event {event_id} was processed by another worker')
            return jsonify({'status': 'processed'}), 200
        
        except Exception:
            session.rollback()
            # logger.exception records the full stack trace
            logger.exception(f'Error processing event {event_id}')
            # Return 5xx to trigger Claude retry
            return jsonify({'error': 'Processing failed'}), 500
        
        finally:
            session.close()
    
    except json.JSONDecodeError:
        logger.error('Invalid JSON in webhook body')
        return jsonify({'error': 'Invalid JSON'}), 400
    
    except Exception:
        # logger.exception records the full stack trace
        logger.exception('Unexpected error while handling webhook')
        return jsonify({'error': 'Internal error'}), 500

def process_task_completed(event: dict, session):
    """
    Handle task completion event.
    """
    task_id = event.get('data', {}).get('task_id')
    result = event.get('data', {}).get('result')
    
    if not task_id:
        logger.warning('No task_id in completed event')
        return
    
    # Update database
    task = session.query(TaskResult).filter(
        TaskResult.task_id == task_id
    ).first()
    
    if not task:
        task = TaskResult(task_id=task_id)
        session.add(task)
    
    task.status = 'completed'
    task.result = json.dumps(result) if result else None
    task.completed_at = datetime.utcnow()
    
    logger.info(f'Task {task_id} completed')

def process_task_failed(event: dict, session):
    """
    Handle task failure event.
    """
    task_id = event.get('data', {}).get('task_id')
    error = event.get('data', {}).get('error')
    
    if not task_id:
        logger.warning('No task_id in failed event')
        return
    
    task = session.query(TaskResult).filter(
        TaskResult.task_id == task_id
    ).first()
    
    if not task:
        task = TaskResult(task_id=task_id)
        session.add(task)
    
    task.status = 'failed'
    task.error = error
    task.completed_at = datetime.utcnow()
    
    logger.error(f'Task {task_id} failed: {error}')

def process_task_timeout(event: dict, session):
    """
    Handle task timeout event.
    """
    task_id = event.get('data', {}).get('task_id')
    
    if not task_id:
        logger.warning('No task_id in timeout event')
        return
    
    task = session.query(TaskResult).filter(
        TaskResult.task_id == task_id
    ).first()
    
    if not task:
        task = TaskResult(task_id=task_id)
        session.add(task)
    
    task.status = 'timeout'
    task.error = 'Task exceeded maximum execution time'
    task.completed_at = datetime.utcnow()
    
    logger.warning(f'Task {task_id} timed out')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, debug=False)

Node.js Implementation with Express

For teams preferring Node.js, here’s an equivalent Express implementation:

const express = require('express');
const crypto = require('crypto');
const { Pool } = require('pg');

const app = express();
app.use(express.raw({ type: 'application/json' }));

const pool = new Pool({
  connectionString: process.env.DATABASE_URL
});

const CLAUDE_WEBHOOK_KEY = process.env.CLAUDE_WEBHOOK_KEY;

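function verifyWebhookSignature(payload, signature, key) {
  const expected = crypto
    .createHmac('sha256', key)
    .update(payload)
    .digest('hex');

  const received = Buffer.from(signature);
  const expectedBuffer = Buffer.from(expected);

  // timingSafeEqual throws if the buffers differ in length,
  // so reject mismatched lengths before the constant-time comparison
  if (received.length !== expectedBuffer.length) {
    return false;
  }
  return crypto.timingSafeEqual(received, expectedBuffer);
}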

app.post('/webhooks/claude', async (req, res) => {
  try {
    const rawBody = req.body;
    const signature = req.headers['x-claude-signature'];
    
    // Verify signature
    if (!signature || !verifyWebhookSignature(rawBody, signature, CLAUDE_WEBHOOK_KEY)) {
      console.warn('Invalid webhook signature');
      return res.status(401).json({ error: 'Unauthorized' });
    }
    
    // Parse event
    const event = JSON.parse(rawBody.toString('utf-8'));
    const eventId = event.id;
    const eventType = event.type;
    
    console.log(`Received webhook event: ${eventId} (${eventType})`);
    
    // Check for duplicate
    const existing = await pool.query(
      'SELECT event_id FROM processed_webhook_events WHERE event_id = $1',
      [eventId]
    );
    
    if (existing.rows.length > 0) {
      console.log(`Event ${eventId} already processed`);
      return res.status(200).json({ status: 'already_processed' });
    }
    
    // Process based on event type
    if (eventType === 'agent.task.completed') {
      await processTaskCompleted(event);
    } else if (eventType === 'agent.task.failed') {
      await processTaskFailed(event);
    } else if (eventType === 'agent.task.timeout') {
      await processTaskTimeout(event);
    }
    
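    // Record processed event. ON CONFLICT tolerates another worker recording it first
    // (assumes event_id is the primary key, as in the Python model above).
    await pool.query(
      'INSERT INTO processed_webhook_events (event_id, event_type, payload) VALUES ($1, $2, $3) ON CONFLICT (event_id) DO NOTHING',
      [eventId, eventType, rawBody.toString('utf-8')]
    );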
    
    console.log(`Successfully processed event ${eventId}`);
    return res.status(200).json({ status: 'processed' });
  
  } catch (error) {
    console.error('Webhook processing error:', error);
    // Return 5xx to trigger Claude retry
    return res.status(500).json({ error: 'Processing failed' });
  }
});

async function processTaskCompleted(event) {
  const taskId = event.data?.task_id;
  const result = event.data?.result;
  
  if (!taskId) return;
  
  await pool.query(
    'INSERT INTO task_results (task_id, status, result, completed_at) VALUES ($1, $2, $3, NOW()) ON CONFLICT (task_id) DO UPDATE SET status = $2, result = $3, completed_at = NOW()',
    [taskId, 'completed', JSON.stringify(result)]
  );
  
  console.log(`Task ${taskId} completed`);
}

async function processTaskFailed(event) {
  const taskId = event.data?.task_id;
  const error = event.data?.error;
  
  if (!taskId) return;
  
  await pool.query(
    'INSERT INTO task_results (task_id, status, error, completed_at) VALUES ($1, $2, $3, NOW()) ON CONFLICT (task_id) DO UPDATE SET status = $2, error = $3, completed_at = NOW()',
    [taskId, 'failed', error]
  );
  
  console.error(`Task ${taskId} failed: ${error}`);
}

async function processTaskTimeout(event) {
  const taskId = event.data?.task_id;
  
  if (!taskId) return;
  
  await pool.query(
    'INSERT INTO task_results (task_id, status, error, completed_at) VALUES ($1, $2, $3, NOW()) ON CONFLICT (task_id) DO UPDATE SET status = $2, error = $3, completed_at = NOW()',
    [taskId, 'timeout', 'Task exceeded maximum execution time']
  );
  
  console.warn(`Task ${taskId} timed out`);
}

app.listen(3000, () => {
  console.log('Webhook server listening on port 3000');
});

Failure Scenarios and How to Prevent Them {#failure-scenarios}

Production deployments fail in predictable ways. Here are the failure modes we see most often, how they manifest, and how to prevent them.

Scenario 1: Signature Verification Bypass

What happens: Your code skips signature verification or implements it incorrectly. An attacker forges webhook events to trigger false alerts or corrupt data.

How it happens:

Developer uses string comparison instead of constant-time comparison (signature == expected instead of hmac.compare_digest)
Signing key is logged or exposed in error messages
Signature is verified against the wrong key (e.g., a test key in production)

Prevention:

Always use constant-time comparison (built into hmac.compare_digest in Python, crypto.timingSafeEqual in Node.js)
Store signing keys in environment variables, never in code
Rotate signing keys periodically
Log verification failures but never log the actual key or signature
Unit test signature verification with both valid and forged signatures
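The last prevention item can be sketched with the standard library's `unittest`. The verification function mirrors the one used earlier in this guide; the signing key and payload are made-up test values, and the test class name is hypothetical.

```python
import hashlib
import hmac
import unittest


def verify_webhook_signature(payload: bytes, signature: str, key: str) -> bool:
    """Hex HMAC-SHA256 of the raw body, compared in constant time."""
    expected = hmac.new(key.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature.encode(), expected.encode())


class SignatureVerificationTest(unittest.TestCase):
    KEY = "test-signing-key"  # made-up key, used only by these tests
    BODY = b'{"id": "evt_1", "type": "agent.task.completed"}'

    def sign(self, body: bytes) -> str:
        return hmac.new(self.KEY.encode(), body, hashlib.sha256).hexdigest()

    def test_valid_signature_is_accepted(self):
        self.assertTrue(verify_webhook_signature(self.BODY, self.sign(self.BODY), self.KEY))

    def test_forged_signature_is_rejected(self):
        self.assertFalse(verify_webhook_signature(self.BODY, "0" * 64, self.KEY))

    def test_tampered_body_is_rejected(self):
        signature = self.sign(self.BODY)
        self.assertFalse(verify_webhook_signature(self.BODY + b" ", signature, self.KEY))

    def test_wrong_key_is_rejected(self):
        self.assertFalse(verify_webhook_signature(self.BODY, self.sign(self.BODY), "other-key"))
```

Saved as a test module, this runs with `python -m unittest`. The tampered-body case is worth keeping: it catches handlers that sign a re-serialised JSON object instead of the raw request bytes.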

Scenario 2: Duplicate Event Processing

What happens: The same webhook event is processed multiple times, leading to duplicate records, double-charged transactions, or repeated notifications.

How it happens:

Your endpoint processes the event but crashes before returning a 2xx response
Claude retries, and your code processes it again
Multiple instances of your application process the same event simultaneously

Prevention:

Implement idempotent event handling using event IDs
Use database transactions to make the “check if processed” and “mark as processed” operations atomic
Use a distributed lock (Redis, DynamoDB) if running multiple instances
Return 2xx only after the event has been processed or durably recorded; return 5xx for transient failures that Claude should retry, and reserve 400 Bad Request for permanent failures such as malformed events
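One way to make the "check" and "mark" steps atomic is to skip the separate check and let the primary key decide: insert the event ID first and proceed only if the insert succeeded. The sketch below uses the standard library's `sqlite3` so it runs anywhere; `claim_event` is a hypothetical helper, and the table mirrors the `processed_webhook_events` table used in this guide.

```python
import sqlite3


def claim_event(conn: sqlite3.Connection, event_id: str, event_type: str) -> bool:
    """Insert the event ID; return True only for the caller that inserted it first."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO processed_webhook_events (event_id, event_type) "
        "VALUES (?, ?)",
        (event_id, event_type),
    )
    # rowcount is 1 when the row was inserted and 0 when the key already existed.
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_webhook_events (event_id TEXT PRIMARY KEY, event_type TEXT)"
)

with conn:  # commits on success, rolls back if the block raises
    first = claim_event(conn, "evt_xyz789", "agent.task.completed")
with conn:
    second = claim_event(conn, "evt_xyz789", "agent.task.completed")

print(first, second)
```

The first delivery claims the event and the duplicate does not. In PostgreSQL the equivalent statement is `INSERT ... ON CONFLICT (event_id) DO NOTHING`. Running the claim and the business update in the same transaction means a failed handler rolls the claim back, so the retry is processed normally.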

Scenario 3: Endpoint Timeout

What happens: Your webhook endpoint takes too long to respond, Claude times out, and retries the event.

How it happens:

Your event handler makes synchronous database queries that lock
You call external APIs without timeouts
You process the entire event synchronously instead of queueing work

Prevention:

Keep webhook handlers fast. Target sub-100ms response times.
Offload heavy processing to background jobs (Celery, Bull, etc.)
Set timeouts on all external API calls
Use connection pooling to avoid database connection exhaustion
Monitor webhook endpoint latency and alert on p99 > 500ms
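The "acknowledge fast, process later" advice can be sketched with an in-process queue and a worker thread. This is illustration only: `receive` and `worker` are hypothetical names, and an in-memory queue loses events if the process dies after returning 200, so a production system would put a durable queue (such as the Celery or Bull options listed above) in its place.

```python
import json
import queue
import threading

event_queue: "queue.Queue[dict]" = queue.Queue(maxsize=1000)
processed = []


def receive(raw_body: bytes) -> int:
    """Called after signature verification: enqueue and return an HTTP status."""
    event = json.loads(raw_body)
    try:
        event_queue.put_nowait(event)
    except queue.Full:
        return 503  # ask the sender to retry later instead of blocking the request
    return 200


def worker() -> None:
    """Drain the queue in the background; the heavy work happens here."""
    while True:
        event = event_queue.get()
        try:
            processed.append(event["id"])  # stand-in for real processing
        finally:
            event_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

status = receive(b'{"id": "evt_xyz789", "type": "agent.task.completed"}')
event_queue.join()  # demo only: wait until the worker has caught up
print(status, processed)
```

The bounded queue is deliberate: when the backlog is full the handler answers 503 immediately, which keeps response times flat and hands the back-pressure to the sender's retry logic.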

Scenario 4: Database Connection Exhaustion

What happens: Your webhook handler opens database connections but doesn’t close them. Eventually, all available connections are consumed, and new webhooks fail.

How it happens:

Exception handling doesn’t close connections
Connection pooling is misconfigured (pool size too small)
A single webhook handler spawns multiple database connections

Prevention:

Use connection pooling with appropriate pool size (typically 10-20 per CPU core)
Always close connections in finally blocks or use context managers
Monitor connection pool usage and alert when utilisation exceeds 80%
Set connection timeouts to prevent hung connections from blocking the pool
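A context manager is a compact way to guarantee the "always close" rule. The sketch below uses the standard library's `sqlite3` and a temporary database file so it is runnable as-is; `db_connection` is a hypothetical helper, and the `task_results` table is a cut-down version of the one used in this guide.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def db_connection(path: str) -> Iterator[sqlite3.Connection]:
    """Yield a connection that is committed or rolled back, then always closed."""
    conn = sqlite3.connect(path, timeout=5.0)  # wait at most 5 s on a locked database
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()  # runs on success and on failure


path = os.path.join(tempfile.mkdtemp(), "demo.db")

with db_connection(path) as conn:
    conn.execute("CREATE TABLE task_results (task_id TEXT PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO task_results VALUES ('abc123', 'completed')")

try:
    with db_connection(path) as conn:
        conn.execute("UPDATE task_results SET status = 'failed'")
        raise RuntimeError("simulated handler failure")
except RuntimeError:
    pass  # the update above was rolled back and the connection closed

with db_connection(path) as conn:
    status = conn.execute("SELECT status FROM task_results").fetchone()[0]
print(status)
```

The simulated failure leaves the row untouched and the connection released. SQLAlchemy 1.4 and later supports the same shape with `with Session() as session:`, which closes the session when the block exits.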

Scenario 5: Silent Failures

What happens: Your webhook handler catches exceptions and returns 200 OK without actually processing the event. Claude thinks delivery succeeded and doesn’t retry.

How it happens:

@app.route('/webhooks/claude', methods=['POST'])
def handle_webhook():
    try:
        event = parse_event(request)
        process_event(event)
        return {'status': 'ok'}, 200
    except Exception:
        # BUG: Silently swallow the error
        return {'status': 'ok'}, 200

Prevention:

Return 5xx status codes for transient failures (500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable)
Return 4xx status codes only for permanent failures (400 Bad Request for malformed events)
Log all exceptions with full stack traces
Use structured logging (JSON logs) so exceptions are searchable
Set up alerts for 5xx webhook responses
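The status-code rules above can be captured in one small function, so individual handlers cannot accidentally swallow an error. This is a sketch under assumptions: `PermanentEventError` and `respond` are hypothetical names, and treating every other exception as transient is a simplification you may want to refine.

```python
import json
import logging
from typing import Callable, Tuple

logger = logging.getLogger("webhook")


class PermanentEventError(Exception):
    """Hypothetical marker for events that can never succeed, however often retried."""


def respond(process: Callable[[dict], None], raw_body: bytes) -> Tuple[dict, int]:
    """Run the handler and map the outcome to a response body and HTTP status."""
    try:
        event = json.loads(raw_body)
    except json.JSONDecodeError:
        logger.exception("Malformed webhook body")
        return {"error": "Invalid JSON"}, 400       # permanent: a retry cannot help
    try:
        process(event)
    except PermanentEventError as exc:
        logger.exception("Permanent failure for event %s", event.get("id"))
        return {"error": str(exc)}, 400
    except Exception:
        logger.exception("Transient failure for event %s", event.get("id"))
        return {"error": "Processing failed"}, 500  # transient: let the sender retry
    return {"status": "processed"}, 200


def flaky(event: dict) -> None:
    raise ConnectionError("database unavailable")


print(respond(lambda event: None, b'{"id": "evt_1"}')[1])
print(respond(flaky, b'{"id": "evt_1"}')[1])
print(respond(flaky, b"not json")[1])
```

Because `logger.exception` records the stack trace on every failure path, the only way to reach a 200 response is for the handler to return without raising.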

Scenario 6: Clock Skew and Replay Attacks

What happens: An attacker captures a webhook event and replays it hours or days later. Your code processes it as if it were new.

How it happens:

Your idempotency check only uses event ID, not timestamp
You don’t validate event timestamps
Webhook events are stored indefinitely, allowing old events to be replayed

Prevention:

Include timestamp in the event payload (Claude does this)
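Reject events older than a threshold (e.g., > 5 minutes). If the timestamp records when the event was created rather than when the delivery attempt was sent, make the threshold longer than the retry window (the delays listed earlier add up to roughly 17 minutes), or legitimate retries will be rejected
Combine event ID and timestamp for deduplication
Keep processed event IDs for at least as long as that threshold, then clean them up after a retention period (90 days is reasonable)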

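from datetime import datetime, timezone

def verify_event_timestamp(event: dict, max_age_seconds: int = 300) -> bool:
    """Verify that the event timestamp is recent (naive values are treated as UTC)."""
    raw = event.get('timestamp')
    try:
        # replace() lets Python < 3.11 parse a trailing 'Z'
        event_timestamp = datetime.fromisoformat(str(raw).replace('Z', '+00:00'))
    except ValueError:
        return False  # missing or malformed timestamp
    if event_timestamp.tzinfo is None:
        event_timestamp = event_timestamp.replace(tzinfo=timezone.utc)
    age = (datetime.now(timezone.utc) - event_timestamp).total_seconds()
    return 0 <= age <= max_age_seconds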

Monitoring and Observability {#monitoring}

Key Metrics to Track

Production webhook systems require comprehensive observability. Track these metrics:

Delivery rate: Percentage of events successfully delivered (target: > 99.9%)
Latency: Time from webhook dispatch to endpoint response (target: p99 < 500ms)
Error rate: Percentage of requests returning non-2xx status (target: < 0.1%)
Duplicate rate: Percentage of events processed multiple times (target: 0%)
Retry rate: Percentage of events requiring retries (target: < 1%)
Processing time: Time to process event after signature verification (target: p99 < 100ms)
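
As a rough sketch of how some of these numbers could be derived from raw delivery records, the example below computes error rate, retry rate, and p99 latency. The `Delivery` record and its fields are assumptions for illustration; in production these values would usually come from your metrics backend rather than in-process lists.

```python
import math
from dataclasses import dataclass


@dataclass
class Delivery:
    status: int        # HTTP status returned by the endpoint
    latency_ms: float  # dispatch-to-response time
    attempt: int       # 1 for the first delivery, 2+ for retries


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarise(deliveries: list[Delivery]) -> dict:
    """Aggregate a window of deliveries into the headline metrics."""
    if not deliveries:
        raise ValueError("no deliveries in window")
    total = len(deliveries)
    errors = sum(1 for d in deliveries if not 200 <= d.status < 300)
    retries = sum(1 for d in deliveries if d.attempt > 1)
    return {
        "error_rate": errors / total,
        "retry_rate": retries / total,
        "latency_p99_ms": percentile([d.latency_ms for d in deliveries], 99),
    }
```

Percentiles matter more than averages here: a mean latency of 50ms can hide a tail that regularly exceeds the sender's timeout.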

Structured Logging

Use structured logging (JSON) so logs are machine-readable and searchable:

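import json
import logging
from datetime import datetime, timezone

# Attributes present on every LogRecord; anything else was passed via extra={...}
_STANDARD_ATTRS = set(logging.makeLogRecord({}).__dict__) | {'message', 'asctime'}

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_data = {
            # Use the record's own creation time, as a timezone-aware UTC value
            'timestamp': datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
            'module': record.module,
            'function': record.funcName,
            'line': record.lineno
        }
        # Include structured fields such as event_id passed via extra={...}
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                log_data[key] = value
        if record.exc_info:
            log_data['exception'] = self.formatException(record.exc_info)
        # default=str keeps non-serialisable values from breaking logging
        return json.dumps(log_data, default=str)

handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger()
logger.setLevel(logging.INFO)  # Root logger defaults to WARNING, which would drop INFO logs
logger.addHandler(handler)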

Example log output:

{
  "timestamp": "2024-01-15T10:23:45.123Z",
  "level": "INFO",
  "logger": "webhook",
  "message": "Webhook event processed successfully",
  "event_id": "evt_abc123",
  "event_type": "agent.task.completed",
  "task_id": "task_xyz789",
  "processing_time_ms": 45
}

Alerting Rules

Set up alerts for these conditions:

Webhook endpoint returning 5xx: Immediate alert (indicates service degradation)
Error rate > 1%: 5-minute alert (something is wrong)
Latency p99 > 1 second: 10-minute alert (performance degradation)
Duplicate processing detected: Daily digest (indicates idempotency issues)
Webhook delivery backlog: 15-minute alert (Claude is queuing events)
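
Rules like these are normally configured in your monitoring system, but the evaluation logic can be sketched in a few lines. The example below assumes that an "N-minute alert" means the condition held in every one of the last N one-minute snapshots; that interpretation, and the `AlertRule` and `firing` names, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlertRule:
    name: str
    window_minutes: int               # how long the condition must hold
    breached: Callable[[dict], bool]  # predicate over one metrics snapshot


RULES = [
    AlertRule("error_rate_above_1pct", 5, lambda m: m["error_rate"] > 0.01),
    AlertRule("latency_p99_above_1s", 10, lambda m: m["latency_p99_ms"] > 1000),
]


def firing(rules: list[AlertRule], snapshots_by_minute: list[dict]) -> list[str]:
    """Return names of rules breached in every one of their last N minutes."""
    alerts = []
    for rule in rules:
        window = snapshots_by_minute[-rule.window_minutes:]
        has_full_window = len(window) == rule.window_minutes
        if has_full_window and all(rule.breached(m) for m in window):
            alerts.append(rule.name)
    return alerts
```

Requiring a full window of breaches trades a few minutes of detection delay for fewer pages from one-off spikes.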

Distributed Tracing

For complex systems, use distributed tracing to correlate webhook events with downstream actions:

import json
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

@app.route('/webhooks/claude', methods=['POST'])
def handle_webhook():
    with tracer.start_as_current_span('handle_webhook') as span:
        raw_body = request.get_data()

        # Verify before parsing so unauthenticated payloads are never trusted
        with tracer.start_as_current_span('verify_signature'):
            # Assumes verify_webhook_signature returns False for a bad signature
            if not verify_webhook_signature(raw_body, request.headers.get('X-Claude-Signature'), WEBHOOK_KEY):
                return {'error': 'invalid signature'}, 401

        event = json.loads(raw_body)

        # Span attributes must not be None, so fall back to an empty string
        span.set_attribute('event.id', event.get('id', ''))
        span.set_attribute('event.type', event.get('type', ''))

        with tracer.start_as_current_span('process_event'):
            process_claude_event(event)

        return {'status': 'processed'}, 200

Scaling Webhook Infrastructure {#scaling}

Horizontal Scaling

As webhook volume grows, you’ll need to scale horizontally. Key considerations:

Load balancing: Distribute incoming webhooks across multiple instances
Idempotency database: Shared across all instances (e.g., PostgreSQL, DynamoDB)
Distributed locks: Prevent duplicate processing when multiple instances receive the same event

Here’s a pattern using Redis for distributed locking:

import logging

import redis
from redis.exceptions import LockError
from redis.lock import Lock

logger = logging.getLogger(__name__)
redis_client = redis.Redis(host='localhost', port=6379)

def handle_webhook_with_lock(event_id: str, event_type: str, process_fn):
    """
    Process webhook with distributed locking to prevent duplicates.
    """
    lock_key = f'webhook:lock:{event_id}'
    lock = Lock(redis_client, lock_key, timeout=30, blocking=True)

    # Acquire outside the try block: releasing a lock we never acquired raises LockError
    if not lock.acquire(blocking_timeout=5):
        # Another instance is processing this event
        logger.info(f'Event {event_id} is being processed by another instance')
        return {'status': 'processing'}

    try:
        # Check if already processed
        if redis_client.exists(f'webhook:processed:{event_id}'):
            logger.info(f'Event {event_id} already processed')
            return {'status': 'already_processed'}

        # Process event
        process_fn(event_id, event_type)

        # Mark as processed (the marker expires after 24 hours)
        redis_client.setex(f'webhook:processed:{event_id}', 86400, '1')

        return {'status': 'processed'}

    finally:
        try:
            lock.release()
        except LockError:
            # The lock expired because processing outran the 30-second timeout
            logger.warning(f'Lock for event {event_id} expired before release')

Queue-Based Architecture

For very high-volume deployments, decouple webhook ingestion from processing:

┌─────────────────────────────────────────────────────┐
│  Webhook Ingestion Layer                            │
│  - Verify signature                                 │
│  - Check idempotency                                │
│  - Enqueue to message queue                         │
│  - Return 200 OK immediately                        │
└────────────────┬────────────────────────────────────┘
                 │
                 ▼
     ┌────────────────────────┐
     │ Message Queue          │
     │ (Kafka, RabbitMQ, SQS) │
     └───────────┬────────────┘
                 │
        ┌────────▼──────────┐
        │ Worker Pool       │
        │ - Consume events  │
        │ - Process events  │
        │ - Update DB       │
        │ - Trigger actions │
        └───────────────────┘

This pattern allows you to:

Respond to webhooks immediately (ingestion layer returns 200 OK)
Scale processing independently (add more workers without adding webhook endpoints)
Retry failed processing from your own queue, without relying on Claude’s retry mechanism
Process events in parallel with configurable concurrency
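
The ingestion and worker split can be sketched with the standard library alone. This is an in-process illustration only: `queue.Queue` stands in for a durable broker such as Kafka, RabbitMQ, or SQS, events held in memory are lost if the process crashes, and the function names are assumptions. Signature verification is assumed to happen before `ingest` is called.

```python
import queue
import threading


def ingest(q: queue.Queue, event: dict) -> tuple[dict, int]:
    """Ingestion layer: enqueue and acknowledge without doing the work."""
    try:
        q.put_nowait(event)
    except queue.Full:
        # Backpressure: ask the sender to retry rather than dropping the event.
        return {"status": "busy"}, 503
    return {"status": "accepted"}, 200


def worker(q: queue.Queue, handle) -> None:
    """Worker: drain the queue until a None sentinel arrives."""
    while True:
        event = q.get()
        try:
            if event is None:
                return
            handle(event)
        except Exception:
            # A real worker would log and re-enqueue or dead-letter the event.
            pass
        finally:
            q.task_done()


def run_workers(q: queue.Queue, handle, count: int = 4) -> list[threading.Thread]:
    threads = [
        threading.Thread(target=worker, args=(q, handle), daemon=True)
        for _ in range(count)
    ]
    for t in threads:
        t.start()
    return threads


def shutdown(q: queue.Queue, threads: list[threading.Thread]) -> None:
    q.join()                # wait until every queued event has been handled
    for _ in threads:
        q.put(None)         # one sentinel per worker
    for t in threads:
        t.join()


events = queue.Queue(maxsize=1000)
processed: list[str] = []
threads = run_workers(events, lambda e: processed.append(e["id"]))
for i in range(10):
    ingest(events, {"id": f"evt_{i}"})
shutdown(events, threads)
```

Returning 503 when the queue is full is a deliberate choice: it pushes the retry back to the sender instead of silently discarding work.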

Database Optimization

As webhook volume grows, optimize your database for high throughput:

Partition processed_webhook_events table: Partition by event timestamp to keep indexes small
Batch inserts: Insert processed events in batches of 100-1000
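Async writes: PostgreSQL always uses write-ahead logging (WAL); for lower commit latency, consider asynchronous commit (synchronous_commit = off) only on tables where losing the most recent writes in a crash is acceptable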
Connection pooling: Use PgBouncer or similar to manage connections
Monitoring: Track query latency, index usage, and table bloat

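-- Partition processed_webhook_events by month
CREATE TABLE processed_webhook_events (
    event_id VARCHAR NOT NULL,
    event_type VARCHAR,
    payload TEXT,
    received_at TIMESTAMP,
    processed_at TIMESTAMP NOT NULL,
    -- PostgreSQL requires the partition key in any primary key on a partitioned
    -- table, so uniqueness is enforced on (event_id, processed_at), not event_id alone
    PRIMARY KEY (event_id, processed_at)
) PARTITION BY RANGE (processed_at);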

-- Create partitions for each month
CREATE TABLE processed_webhook_events_2024_01 PARTITION OF processed_webhook_events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE processed_webhook_events_2024_02 PARTITION OF processed_webhook_events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
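
The batch-insert recommendation above can be illustrated as follows. This is a sketch under stated assumptions: it uses the standard library's `sqlite3` with an in-memory table as a stand-in for PostgreSQL, and `batched` and `insert_processed` are hypothetical helper names. With PostgreSQL the equivalent statement would typically use `ON CONFLICT DO NOTHING` rather than SQLite's `INSERT OR IGNORE`.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator


def batched(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Yield lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM processed_webhook_events").fetchone()[0]


def insert_processed(conn: sqlite3.Connection, rows: Iterable[tuple], size: int = 500) -> int:
    """Insert (event_id, event_type) rows in batches; return rows actually added."""
    before = count_rows(conn)
    for chunk in batched(rows, size):
        with conn:  # one transaction per batch, committed on success
            conn.executemany(
                "INSERT OR IGNORE INTO processed_webhook_events "
                "(event_id, event_type) VALUES (?, ?)",
                chunk,
            )
    return count_rows(conn) - before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_webhook_events (event_id TEXT PRIMARY KEY, event_type TEXT)"
)
rows = [(f"evt_{i}", "agent.task.completed") for i in range(1200)]
written = insert_processed(conn, rows)
```

Committing once per batch rather than once per row is where most of the throughput gain comes from; ignoring conflicts keeps a redelivered batch from failing the whole transaction.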

Next Steps and Production Readiness {#next-steps}

Pre-Launch Checklist

Before deploying webhook infrastructure to production:

Testing Strategy

Implement comprehensive tests for webhook handling:

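import pytest
import json
import hmac
import hashlib

# `client` (a Flask test client) and `db` (a test database handle) are assumed
# to be module-level objects created by your test setup. The app under test
# must be configured with the same signing key used below ('test_key').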

def test_valid_webhook_signature():
    """
    Test that valid webhooks are accepted.
    """
    payload = json.dumps({'id': 'evt_123', 'type': 'agent.task.completed'})
    key = 'test_key'
    signature = hmac.new(key.encode(), payload.encode(), hashlib.sha256).hexdigest()
    
    response = client.post(
        '/webhooks/claude',
        data=payload,
        headers={'X-Claude-Signature': signature}
    )
    
    assert response.status_code == 200

def test_invalid_webhook_signature():
    """
    Test that invalid webhooks are rejected.
    """
    payload = json.dumps({'id': 'evt_123', 'type': 'agent.task.completed'})
    invalid_signature = 'invalid_signature'
    
    response = client.post(
        '/webhooks/claude',
        data=payload,
        headers={'X-Claude-Signature': invalid_signature}
    )
    
    assert response.status_code == 401

def test_duplicate_event_handling():
    """
    Test that duplicate events are handled idempotently.
    """
    payload = json.dumps({'id': 'evt_123', 'type': 'agent.task.completed'})
    key = 'test_key'
    signature = hmac.new(key.encode(), payload.encode(), hashlib.sha256).hexdigest()
    
    # First request
    response1 = client.post(
        '/webhooks/claude',
        data=payload,
        headers={'X-Claude-Signature': signature}
    )
    assert response1.status_code == 200
    
    # Second request (duplicate)
    response2 = client.post(
        '/webhooks/claude',
        data=payload,
        headers={'X-Claude-Signature': signature}
    )
    assert response2.status_code == 200
    
    # Verify event was only processed once
    assert db.query('SELECT COUNT(*) FROM task_results').scalar() == 1

Observability Setup

For teams operating at scale, we recommend Cloudflare Developers — Webhook signature verification example as a reference for edge-based verification, and AWS EventBridge User Guide — Invoking targets with events for understanding event-driven patterns at scale.

Also consider Microsoft Learn — HTTP-triggered workflows and endpoints for webhook integration with enterprise automation platforms.

Getting Help

If you’re building production Claude deployments and need fractional CTO leadership, platform engineering expertise, or help scaling webhook infrastructure, PADISO’s Services include CTO as a Service and Platform Design & Engineering tailored to your architecture. We’ve deployed webhook systems across Sydney, San Francisco, and other regions.

For a diagnostic of your current Claude readiness and infrastructure, consider our AI Quickstart Audit—a fixed-scope, two-week engagement that identifies production risks and prioritises what to ship first.

Key Takeaways

Webhooks are essential for production Claude deployments: They enable real-time event-driven architectures without polling.
Security is non-negotiable: Always verify signatures using constant-time comparison, and store keys in environment variables.
Idempotency is mandatory: Use event IDs and database transactions to ensure the same event is never processed twice.
Failure modes are predictable: Implement the patterns we’ve outlined to prevent signature bypass, duplicates, timeouts, and silent failures.
Observability is critical: Monitor delivery rate, latency, error rate, and duplicate detection from day one.
Scale horizontally with care: Use distributed locking and message queues to handle high volumes reliably.

Webhook architectures are mature, battle-tested patterns. Applying the practices covered here (signature verification, idempotency, error handling, and monitoring) will give you a production-grade foundation for Claude deployments at any scale.

Additional Resources

For reference implementations and deeper technical guidance, review the official documentation from Claude API Docs — Subscribe to webhooks, OpenAI Platform Docs — Webhooks guide, and Stripe Docs — Webhooks. All three platforms use similar patterns for security and reliability.

For edge-based verification and advanced routing, Cloudflare Developers — Webhook signature verification example shows how to verify signatures at the edge before routing to origin servers. For enterprise event orchestration, GitHub Docs — Webhooks and Sentry Docs — Webhooks provide additional reference patterns.

If you’re building multi-region or event-driven infrastructure, AWS EventBridge User Guide — Invoking targets with events covers managed event routing at scale, and Microsoft Learn — HTTP-triggered workflows and endpoints shows how to integrate webhooks with enterprise automation platforms.
