Retries and delivery

Webhook delivery is at-least-once: a failed delivery is retried with exponential backoff. A delivery succeeds when your endpoint returns any 2xx response within 15 seconds.

Retry timeline at a glance

If your endpoint isn’t reachable, we retry on a backoff that totals roughly seven days before giving up. Visually:

flowchart LR
A(["Initial<br/>delivery"]) --> B(["Retry 1<br/>~30s"])
B --> C(["Retry 2<br/>~2m"])
C --> D(["Retry 3<br/>~10m"])
D --> E(["Retry 4<br/>~30m"])
E --> F(["Retry 5<br/>~2h"])
F --> G(["Retry 6<br/>~6h"])
G --> H(["Retry 7<br/>~18h"])
H --> I(["Retry 8<br/>~48h"])
I --> J(["Retry 9<br/>~96h"])
J --> K{{"Endpoint<br/>marked dead<br/>~7 days"}}
classDef stage fill:#E2E6EE,stroke:#2A3A5E,color:#0F172A;
classDef terminal fill:#F8E5E1,stroke:#C64A3B,color:#8B3023;
class A,B,C,D,E,F,G,H,I,J stage
class K terminal

Times shown are the approximate wait before each retry, measured from the previous attempt (30s + 2m + 10m + 30m + 2h + 6h + 18h + 48h + 96h comes to just over seven days). After roughly seven days of failure we mark the endpoint dead and email you.

What counts as failure

We retry if:

The request times out (no response after 15 seconds).
We receive a 408, 409, 425, 429, or any 5xx response.
The TLS handshake fails.
The connection is refused or dropped mid-stream.

We do not retry if:

We receive any other 4xx response. We assume your endpoint is rejecting deliberately (signature failure, schema rejection, etc.) and that retrying won’t help.
We receive a 2xx response. Even if your handler errored after responding, we consider the delivery successful.
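
The rules above can be restated as a small predicate. This is a sketch for reasoning about your own endpoint's responses, not QairoPay's implementation; the `DeliveryOutcome` type and the `shouldRetry` name are hypothetical, and the handling of 1xx and 3xx responses is an assumption because the rules do not mention them.

```typescript
// Outcome of one delivery attempt, as seen by the sender.
// This shape is hypothetical; it only exists to express the rules above.
type DeliveryOutcome =
  | { kind: "response"; status: number }
  | { kind: "timeout" } // no response within 15 seconds
  | { kind: "tls_failure" }
  | { kind: "connection_error" }; // refused or dropped mid-stream

// 4xx codes that are retried even though they are client errors.
const RETRYABLE_4XX = new Set([408, 409, 425, 429]);

function shouldRetry(outcome: DeliveryOutcome): boolean {
  if (outcome.kind !== "response") {
    return true; // timeouts, TLS failures, and connection errors are retried
  }
  const code = outcome.status;
  if (code >= 200 && code < 300) return false; // delivered
  if (code >= 500 && code < 600) return true; // any 5xx
  if (RETRYABLE_4XX.has(code)) return true;
  // Any other 4xx is treated as a deliberate rejection.
  // Assumption: 1xx and 3xx are not covered by the text and are not retried here.
  return false;
}
```

One practical consequence: if your handler rejects a bad signature, return a 4xx outside the retryable set (for example 400 or 401) so the delivery is not retried.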

Retry schedule

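Exponential backoff with jitter. Each delay is measured from the previous attempt, so the ten attempts span roughly 7 days in total:

Attempt	Delay after previous attempt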
1 (initial)	immediate
2	~30 seconds
3	~2 minutes
4	~10 minutes
5	~30 minutes
6	~2 hours
7	~6 hours
8	~18 hours
9	~48 hours
10	~96 hours

After 10 failed attempts (totaling roughly 7 days), the endpoint is marked dead and disabled. You’ll receive an email; the dashboard shows the failure.
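
To see how the schedule adds up to roughly 7 days, the arithmetic can be sketched as follows. It assumes each delay in the table is measured from the previous attempt and ignores jitter; the constant and function names are illustrative only.

```typescript
// Approximate delay before each retry, in seconds, copied from the table above.
// Assumption: each delay is measured from the previous attempt; jitter is ignored.
const RETRY_DELAYS_SECONDS = [
  30, // attempt 2
  2 * 60, // attempt 3
  10 * 60, // attempt 4
  30 * 60, // attempt 5
  2 * 3600, // attempt 6
  6 * 3600, // attempt 7
  18 * 3600, // attempt 8
  48 * 3600, // attempt 9
  96 * 3600, // attempt 10
];

// Running total: seconds elapsed since the initial delivery when each retry fires.
function cumulativeOffsets(delays: number[]): number[] {
  let elapsed = 0;
  return delays.map((d) => (elapsed += d));
}

const offsets = cumulativeOffsets(RETRY_DELAYS_SECONDS);
const totalSeconds = RETRY_DELAYS_SECONDS.reduce((sum, d) => sum + d, 0);
const totalDays = totalSeconds / 86_400;

console.log(totalDays.toFixed(2)); // 7.11
```

Under this reading the final attempt fires a little over seven days after the initial delivery, which matches the point at which the endpoint is marked dead.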

Replay and recovery

Even after an endpoint goes dead, the events are not lost. We retain delivered and undelivered event payloads for 30 days. You can replay any event from the dashboard or via the API:

await qp.webhooks.events.replay("evt_01HZX", { endpoint_id: "whk_01HZX" });

To bulk-replay everything an endpoint missed during downtime:

await qp.webhooks.endpoints.replayMissed(endpointId, { since: "2026-05-18T00:00:00Z" });

Both replay calls are aggressively rate-limited on our side so that a backlog does not overwhelm your recovering endpoint.
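
If you need to replay a specific list of events rather than a time range, a loop over the single-event call is one option. The sketch below assumes only the `replay(eventId, { endpoint_id })` call shown above; the `ReplayClient` interface, its return type, the `replayAll` helper, and the 250 ms pause are hypothetical and are not documented limits.

```typescript
// Minimal shape of the client call shown above. Only `replay` comes from this
// page; the interface name and the return type are assumptions.
interface ReplayClient {
  replay(eventId: string, opts: { endpoint_id: string }): Promise<unknown>;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Replays events one at a time and collects the ids that failed, so a single
// bad event does not abort the whole recovery run.
async function replayAll(
  client: ReplayClient,
  eventIds: string[],
  endpointId: string,
  pauseMs = 250, // hypothetical pacing value, not a documented limit
): Promise<string[]> {
  const failed: string[] = [];
  for (const eventId of eventIds) {
    try {
      await client.replay(eventId, { endpoint_id: endpointId });
    } catch {
      failed.push(eventId); // keep going; the caller can retry these later
    }
    await sleep(pauseMs);
  }
  return failed;
}
```

Assuming `qp.webhooks.events` satisfies that interface, a call would look like `replayAll(qp.webhooks.events, ids, "whk_01HZX")`, and the returned array lists the event ids worth a second pass.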

Designing your endpoint for at-least-once delivery

Two requirements your handler should satisfy:

Acknowledge fast, process async. Reply 200 OK as soon as you’ve durably enqueued the event for processing (e.g., wrote it to a queue). The 15-second budget is generous, but don’t spend it on CPU-bound work or on business-logic database writes that might block; the only write on the request path should be the one that durably records the event.
Deduplicate by event.id. The same event id can arrive twice (rarely, but it happens). Persist the id with a unique constraint, or use an idempotent upsert keyed by id.

A canonical pattern:

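// verifyAndParse, db, and queue are your application's own helpers.
export async function handler(req: Request): Promise<Response> {
  let event;
  try {
    event = await verifyAndParse(req); // throws on invalid signature
  } catch {
    // Reject with a non-retryable 4xx. An uncaught throw would surface as a
    // 5xx and cause the delivery to be retried.
    return new Response("invalid signature", { status: 400 });
  }

  // Record the event first; a re-delivery of the same id is a no-op here.
  await db.webhookEvents.insertIgnoreConflict({
    id: event.id,
    received_at: new Date(),
    payload: event,
  });

  // Hand off to the worker, then acknowledge immediately.
  await queue.enqueue("process-event", { id: event.id });

  return new Response("ok");
}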

The downstream worker reads from the queue, fetches the event by id, processes it, and marks it done. On a re-delivery, insertIgnoreConflict is a no-op and the worker skips any event already marked done, so duplicates are harmless.
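
The worker side of that pattern might look like the sketch below. It uses an in-memory `Map` as a stand-in for the events table so the example is self-contained; `StoredEvent`, `eventStore`, and `processJob` are hypothetical names. With a real database, the check and the final update should be a single atomic operation so that two workers cannot both claim the same event.

```typescript
type StoredEvent = { id: string; payload: unknown; done: boolean };

// In-memory stand-in for the webhook events table (hypothetical).
const eventStore = new Map<string, StoredEvent>();

// Processes one queued job. Returns false when the event is unknown or was
// already handled, which is what makes duplicate queue messages harmless.
async function processJob(
  job: { id: string },
  handle: (payload: unknown) => Promise<void>,
): Promise<boolean> {
  const stored = eventStore.get(job.id);
  if (!stored || stored.done) return false;

  // If this throws, `done` stays false and the job can be retried.
  await handle(stored.payload);

  stored.done = true;
  return true;
}
```

Marking the event done only after `handle` succeeds means a crash mid-processing leads to a retry rather than a lost event, so `handle` itself should tolerate being run more than once.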

Scaling your endpoint

If you’re processing high-volume events (e.g., card.transaction.authorized at peak hours):

Use a queue between receipt and processing. At-least-once delivery from QairoPay, combined with retries inside your own system, keeps accepting an event cleanly separate from processing it.
Run the receipt handler in an autoscaling environment (serverless, or a long-lived service behind a Kubernetes Horizontal Pod Autoscaler).
Watch your endpoint’s p99 response time in the QairoPay dashboard. If it creeps toward 15 seconds, you’re heading for retry storms.

Event ordering

Events for different resources can arrive in any order. Events for the same resource are best-effort ordered by created time, but because a failed delivery is retried later, you may occasionally see a pass.updated arrive before the pass.created it depends on (if the latter was retried).

Resolve this by treating every event as a state update on a snapshot, using previous_attributes to detect ordering anomalies, and refetching the resource from the API when in doubt. Don’t rely on event order for correctness.
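
One way to apply that advice is sketched below. Only `previous_attributes` is named on this page; the other event fields (`created` as epoch milliseconds, `attributes` as the post-change state), the `Snapshot` shape, and the `applyEvent` helper are assumptions for illustration. The comparison uses strict equality, so it only suits primitive attribute values.

```typescript
// Hypothetical shapes; only `previous_attributes` is named in the text above.
type ResourceEvent = {
  id: string;
  created: number; // event creation time, epoch milliseconds
  attributes: Record<string, unknown>; // resource state after the change
  previous_attributes?: Record<string, unknown>; // changed fields, old values
};

type Snapshot = {
  attributes: Record<string, unknown>;
  lastEventId: string;
  lastEventCreated: number;
};

type ApplyResult =
  | { action: "applied"; snapshot: Snapshot }
  | { action: "ignored"; snapshot: Snapshot }
  | { action: "refetch" };

function applyEvent(current: Snapshot | undefined, ev: ResourceEvent): ApplyResult {
  const next: Snapshot = {
    attributes: ev.attributes,
    lastEventId: ev.id,
    lastEventCreated: ev.created,
  };
  if (!current) {
    // An update for a resource we have never seen: the create event is
    // probably still being retried, so ask the API for the current state.
    return ev.previous_attributes ? { action: "refetch" } : { action: "applied", snapshot: next };
  }
  // A duplicate, or a late retry older than what we hold: keep the newer snapshot.
  if (ev.id === current.lastEventId || ev.created < current.lastEventCreated) {
    return { action: "ignored", snapshot: current };
  }
  // The event's "before" values should match our snapshot; a mismatch means
  // an intermediate event has not arrived yet.
  const prev = ev.previous_attributes ?? {};
  const gap = Object.keys(prev).some((key) => prev[key] !== current.attributes[key]);
  return gap ? { action: "refetch" } : { action: "applied", snapshot: next };
}
```

On a `refetch` result, the caller would load the resource from the API and rebuild the snapshot from that response instead of trusting the event stream.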