indepth-webhooks

Hey, I recently worked on designing a scalable webhook system, and in this post I'll walk through its actual implementation, the reasoning behind each decision, the cost, and where to go from here (meaning once you cross 10M requests).

Let's get started!

A bit of history: before webhooks became common, most systems relied heavily on polling. A service would repeatedly ask another system:
“Do you have new data?”
“Any updates?”
“Any new orders/events?”
Our server opens a connection to the third-party integration server and asks for updates on a fixed timer. If the connection breaks, that request is gone with no recovery, and the connection resets after each new update.

It worked… but at scale it created unnecessary traffic, delayed updates, wasted compute, and inefficient integrations. It is not reliable when you need low latency and high throughput.

Webhooks changed that model completely. Instead of constantly asking for updates, systems can now react to events in real time:
“Something happened → notify immediately.”
In the initial request, our server registers the webhook: usually the topic (the action or behavior we care about) and the endpoint on our side that the event should be sent to.

payload = {
    "webhook": {
        "topic": topic,  # which event/topic to set the webhook up for
        "address": webhook_url,  # address of your server
        "format": "json",
    }
}

Run a loop over all the events/topics you want to set a webhook for.
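To make the loop concrete, here is a minimal sketch that builds one registration payload per topic. The topic names and the URL are made-up placeholders, and the actual HTTP call is left out because it depends on the provider you integrate with.

```python
import json


def build_webhook_payload(topic: str, webhook_url: str) -> dict:
    """Build the registration payload for a single topic."""
    return {
        "webhook": {
            "topic": topic,
            "address": webhook_url,
            "format": "json",
        }
    }


def build_all_payloads(topics, webhook_url):
    """Build one payload per topic; each would be sent in its own request."""
    return [build_webhook_payload(topic, webhook_url) for topic in topics]


# Hypothetical topics and endpoint, only for illustration.
topics = ["orders/create", "orders/updated"]
payloads = build_all_payloads(topics, "https://example.com/webhooks")

for payload in payloads:
    # In a real setup this is where you would POST to the provider's API.
    print(json.dumps(payload))
```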

The provider generally also expects a response with a 200 status code in under 5 seconds, so it knows the webhook event has been delivered and can mark it completed.

Most webhook systems work fine… until scale, retries, duplicate deliveries, and downstream dependencies start piling up.

In this blog we are designing a scalable webhook processing architecture focused on reliability, observability, and asynchronous execution, built in a way that can work for any platform and is not tied to a single provider.


1) Entry point

First we have to build an entry point where the webhook will hit. We can't let it hit our server directly, because the server sits in a private subnet in a VPC and can't be exposed to third-party networks, so we have 2 options:

Bastion/proxy server: a server outside our private subnet that has a public IP, and every webhook call goes to it. This server first verifies the request using HMAC and a pre-created secret code (optional, HMAC is mostly enough), responds immediately with status 200 to acknowledge the webhook event, and then forwards the payload and request to our main server in the private subnet.

Downsides:

  1. an extra server to spin up
  2. it has to run 24/7, which increases cost (we pay even while it sits idle)

The recommended 2nd option is serverless functions (e.g. AWS Lambda): it works the same as the proxy server. It allows public invocation, first verifies the HMAC signature, responds immediately, and sends the request and payload to the main server. It's serverless, so there is no server to set up and manage, and we only pay per use, which reduces the bill. Cold start is also manageable, as we will be using Node, which can cold start in around 200 milliseconds on average, and we only have to verify the request and respond immediately, which takes around 1-2 seconds on average. (can set up WAL for it)

Remember: never let external systems wait on heavy business logic. Also, in both cases our core server in the private subnet verifies the HMAC again, and only after that processes the event and performs the business logic.
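The HMAC check done at the entry point (and again on the core server) can be sketched as below. This assumes the provider signs the raw request body with HMAC-SHA256 and sends the result base64-encoded; the exact algorithm, encoding, and header name vary by provider, so treat those as assumptions.

```python
import base64
import hashlib
import hmac


def compute_signature(secret: bytes, body: bytes) -> str:
    """Sign the raw request body with HMAC-SHA256 and base64-encode it."""
    digest = hmac.new(secret, body, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_signature(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Compare signatures in constant time to avoid timing leaks."""
    expected = compute_signature(secret, body)
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

Always verify against the raw bytes of the body as received. Re-serializing parsed JSON can change whitespace or key order and break the signature.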

2) Adding a middle-layer queue (e.g. AWS SQS)

Our server can become a SPOF: it can be down for a moment or for hours, and that leads to missed webhook events and cascading failures.

Instead of processing requests synchronously, events are pushed into a queue for async workers to consume.

This isolates traffic spikes and prevents missed webhook events and cascading failures.

Our entry point pushes the request payload into the queue instead of sending it directly to the server. Only the Lambda function will have write access to the queue.
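Put together, the entry point does three things: verify, enqueue, acknowledge. Here is a minimal sketch of that flow. The `verify` and `enqueue` callables are hypothetical hooks injected by the caller; with SQS, `enqueue` could wrap boto3's `send_message(QueueUrl=..., MessageBody=...)`.

```python
from typing import Callable


def handle_webhook(
    body: bytes,
    signature: str,
    verify: Callable[[bytes, str], bool],
    enqueue: Callable[[str], None],
) -> int:
    """Return the HTTP status code the entry point should respond with."""
    if not verify(body, signature):
        return 401  # reject requests that fail the signature check

    try:
        message = body.decode("utf-8")
    except UnicodeDecodeError:
        return 400  # payload is not valid text, a retry would not help

    try:
        enqueue(message)
    except Exception:
        # Non-2xx response, so the provider retries instead of losing the event.
        return 500

    # Acknowledge only after the event is safely in the queue.
    return 200
```

No business logic runs here on purpose: the handler only does the cheap work needed to accept the event safely.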

We will have 2 queues: one for the webhook event payloads and another as a DLQ. If a message in the normal queue fails more than 3 retries, it is pushed to the DLQ (the fallback mechanism).

VisibilityTimeout (locking a queue event) will be around 10-15 sec, including buffer time. It is the queue parameter that hides an event from other consumers while a worker is processing it. If worker A takes queue event 1, it should process it and mark it done within that window, or the attempt is considered failed. During that window other workers can't see queue event 1 and will not try it again, which prevents race conditions and partly helps with idempotency.

MessageRetentionPeriod = 4 days (can be adjusted) means a queue event stays in the queue for up to 4 days, which guards against events lost to a misbehaving worker and against missed webhook events.

DLQ events can be monitored with your preferred observability tool (e.g. AWS CloudWatch) and handled from there. Set an alert for when multiple events are pushed to the DLQ or when it crosses a particular size.

One more question: why didn't we use a DB instead of a queue? Because we have to accept requests fast and respond, a queue is the better option: it provides the fallback mechanism and reduces the load on the DB.
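As a rough sketch, the queue settings above map to SQS queue attributes like this. The helper name and the ARN are placeholders. SQS expects attribute values as strings, with the retention period in seconds and `RedrivePolicy` as a JSON string. Whether "3" counts attempts or retries is a choice you should make explicitly, since `maxReceiveCount` counts receives.

```python
import json


def main_queue_attributes(
    dlq_arn: str,
    max_receive_count: int = 3,
    visibility_timeout_s: int = 15,
    retention_days: int = 4,
) -> dict:
    """Build the attribute map for the main webhook queue."""
    return {
        # Seconds a message stays hidden while one worker is processing it.
        "VisibilityTimeout": str(visibility_timeout_s),
        # Retention is expressed in seconds.
        "MessageRetentionPeriod": str(retention_days * 24 * 60 * 60),
        # After max_receive_count receives, SQS moves the message to the DLQ.
        "RedrivePolicy": json.dumps(
            {
                "deadLetterTargetArn": dlq_arn,
                "maxReceiveCount": max_receive_count,
            }
        ),
    }


# Placeholder ARN, only for illustration.
attrs = main_queue_attributes("arn:aws:sqs:us-east-1:123456789012:webhook-dlq")
# With boto3 this map would be passed as:
#   sqs.create_queue(QueueName="webhook-events", Attributes=attrs)
```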

Webhook → Queue → Worker → Business Logic

3) Crossroad

Now we have 2 options for collecting events from the queue and processing them. Both are correct, and the choice depends on your preferred architecture and needs:

  1. Worker node: this can be another serverless function that transfers requests from the queue to the server. It can send events in bulk and distribute requests evenly among multiple servers (suitable for mid-scale architectures, or around 10k/sec). It improves scaling and fault isolation.

  2. Server as worker: our server fetches the events from the queue itself, in bulk/chunks, and updates the status of each event. Because locking is in place, two servers should not pick up the same event at the same time even when multiple servers are running (suitable for startups, from pre-launch to up and running).

Note: every unique webhook event in the queue carries an event ID, and the server uses it to check idempotency and prevent duplicate processing.
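The event ID check can be sketched with a table that has a unique key on the event ID. This example uses SQLite from the standard library only to stay self-contained; the table name and helper names are made up, and in production the same idea applies to whatever database the server already uses.

```python
import sqlite3


def make_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the table that remembers which event IDs were processed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    conn.commit()
    return conn


def process_once(conn: sqlite3.Connection, event_id: str, handler) -> bool:
    """Run handler only the first time event_id is seen.

    Returns True if the handler ran, False if the event was a duplicate.
    """
    try:
        conn.execute(
            "INSERT INTO processed_events (event_id) VALUES (?)", (event_id,)
        )
    except sqlite3.IntegrityError:
        conn.rollback()
        return False  # already processed, skip the side effects

    try:
        handler()
    except Exception:
        # Undo the marker so a later retry can process the event again.
        conn.rollback()
        raise

    conn.commit()
    return True
```

The marker row is committed only after the handler succeeds, so a crashed attempt does not block the retry.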


Some key engineering decisions that made a big difference:

• Immediate ACK pattern
Webhook receivers should respond fast. Never let external systems wait on heavy business logic.
The API layer validates, authenticates, persists the event, and acknowledges quickly.

• Queue-first architecture
Instead of processing requests synchronously, events are pushed into a queue for async workers to consume.
This isolates traffic spikes and prevents cascading failures. Webhook → Queue → Worker → Business Logic

• Idempotency handling
Webhook providers retry aggressively.
Without idempotent consumers, duplicate events can silently corrupt state or trigger repeated side effects.

• Retry & dead-letter strategy
Transient failures are normal in distributed systems.
Exponential backoff, retry limits, and DLQs become essential once integrations grow. Transient failures should retry safely using:

  • exponential backoff
  • retry limits
  • dead-letter queues
  • replay mechanisms
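A minimal sketch of exponential backoff with a retry limit is shown here. The base delay, cap, and attempt count are illustrative values, and the jitter is one common variant (a random delay between zero and the computed value), not the only option.

```python
import random
import time


def backoff_delay(
    attempt: int, base_s: float = 1.0, cap_s: float = 60.0, jitter: bool = True
) -> float:
    """Delay before the next retry: base * 2^attempt, capped at cap_s."""
    delay = min(cap_s, base_s * (2 ** attempt))
    # Jitter spreads retries out so failed workers don't retry in lockstep.
    return random.uniform(0, delay) if jitter else delay


def run_with_retries(operation, max_attempts: int = 3, sleep=time.sleep):
    """Call operation until it succeeds or the retry limit is reached."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                # Out of retries: let the caller route the event to the DLQ.
                raise
            sleep(backoff_delay(attempt))
```

When the queue handles redelivery for you (visibility timeout plus a DLQ), this kind of in-process retry is mostly useful for calls to downstream dependencies.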

• Worker isolation
Separating API services, workers, schedulers, and stateful dependencies improves scaling and fault isolation significantly.

  • API services
  • queue consumers
  • schedulers
  • background workers

• Event observability
One of the most underrated parts of webhook systems:

  • request tracing
  • structured logs
  • correlation IDs
  • retry visibility
  • processing latency metrics
  • queue depth monitoring

Without observability, debugging distributed event systems becomes painful very quickly.
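As a small illustration of structured logs with correlation IDs, each stage of the pipeline can emit one JSON line that carries the same ID. The field names and stage names here are assumptions, not a standard.

```python
import json
import time


def log_line(correlation_id: str, stage: str, now_s=None, **fields) -> str:
    """Format one structured log record as a JSON line."""
    record = {
        "ts": time.time() if now_s is None else now_s,
        "correlation_id": correlation_id,  # same ID across entry point, queue, worker
        "stage": stage,
        **fields,
    }
    return json.dumps(record, sort_keys=True)


# Example: the same event traced through two stages.
print(log_line("evt_123", "received", signature_valid=True))
print(log_line("evt_123", "processed", attempt=1, latency_ms=42))
```

Using the webhook event ID as the correlation ID makes it possible to follow one event from the entry point to the worker with a single search.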

• Security considerations
Webhook signature validation, replay attack prevention, secret rotation, rate limiting, and timestamp verification are critical for production-grade systems.

  • signature validation
  • replay protection
  • secret rotation
  • timestamp verification
  • rate limiting
  • payload verification
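Timestamp verification and replay protection can be sketched together: the signature covers the timestamp plus the body, and old timestamps are rejected. The `timestamp.body` signing format, the hex encoding, and the 5 minute tolerance are assumptions for this example; check what your provider actually signs.

```python
import hashlib
import hmac
import time


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Sign '<timestamp>.<body>' so the timestamp cannot be changed alone."""
    signed_payload = str(timestamp).encode() + b"." + body
    return hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()


def verify(
    secret: bytes,
    timestamp: int,
    body: bytes,
    signature_hex: str,
    now_s=None,
    tolerance_s: int = 300,
) -> bool:
    """Reject stale timestamps first, then check the signature."""
    now = time.time() if now_s is None else now_s
    if abs(now - timestamp) > tolerance_s:
        return False  # too old or too far in the future: possible replay
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

A timestamp window only narrows the replay window. Deduplicating on the event ID is still needed to catch replays inside the window.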

A webhook system is not just backend code; it's a distributed systems problem.

One important lesson:
There is no “one-size-fits-all” webhook architecture.

A queue-based worker model works extremely well for many systems initially, but as throughput, fan-out requirements, and real-time event distribution grow, architectures often evolve toward pub/sub or streaming-based approaches for better scalability and decoupling.

Good architecture is rarely about choosing the “perfect” solution on day one.
It’s about choosing a design that can evolve safely as the system grows.

