# What You’ll Build and Why It Matters#
If your product emits events and n8n turns them into actions, your reliability is only as strong as the bridge between the two. Directly calling n8n webhooks from application code often fails in ways that are hard to recover from: timeouts, n8n downtime, transient network issues, and duplicate deliveries.
This guide shows a practical pattern for reliable integrations using Postgres as an outbox and queue, with n8n as the worker. The target is exactly-once-ish outcomes: deliver each event at least once, but ensure side effects happen once via idempotency and deduplication.
You’ll implement:
- An outbox table that stores events in the same database transaction as your business change.
- A queue/claiming mechanism so n8n can safely process events concurrently.
- Retry semantics, backoff, and a dead-letter flow for poison messages.
- Monitoring queries and operational guardrails for production.
For additional patterns and examples with managed Postgres stacks, also see n8n + Supabase/Postgres automation patterns. For failure handling inside n8n, refer to n8n error handling, retries, and alerting.
# Architecture Overview: Postgres as the Source of Truth#
The core idea: your app writes events to Postgres first, then n8n reads and processes them. That makes Postgres the durable boundary.
A reliable shape looks like this:
- 1App performs business transaction, and in the same transaction writes an outbox row.
- 2n8n periodically polls (or is triggered) to claim pending outbox rows.
- 3n8n processes each event, writes results, then marks the outbox row as done.
- 4If processing fails, the row remains pending and is retried with backoff until it succeeds or is dead-lettered.
Why this works:
- Postgres already provides durable storage and transactional guarantees.
- You get replayability by keeping events until they are confirmed processed.
- You can reason about reliability with measurable queue depth and event age.
🎯 Key Takeaway: Persist intent in Postgres before contacting anything external. Reliability starts with a durable write you control.
# Prerequisites and When This Pattern Fits#
This pattern fits best when:
- You already use Postgres as your primary DB, or you can provision a dedicated Postgres for integration events.
- Your workflows can tolerate seconds-level latency if polling is used.
- You need strong auditability and the ability to replay events.
It may not fit when:
- You require sub-100 millisecond latency end-to-end.
- You cannot tolerate polling load and you do not control database access patterns.
- Your event volume is high enough that a streaming platform is more appropriate.
Practical thresholds:
- Polling-based outboxes commonly handle hundreds to low thousands of events per minute on a modest Postgres instance when indexed well and claimed in batches.
- If you regularly exceed tens of thousands of events per minute, consider a dedicated queue or log-based pipeline. You can still keep an outbox as a source of truth, but consume it with a specialized worker.
# Data Model: Outbox and Optional Delivery Tracking#
You need two concepts:
- Outbox: what should be processed.
- Idempotency and deduplication keys: how to ensure exactly-once-ish outcomes.
A practical schema for many products is one table. For stricter idempotency, add a second table for consumers, but many teams do fine with one outbox plus idempotent downstream writes.
Outbox Table Schema#
The outbox stores event metadata, payload, and processing state. Use jsonb for flexibility, but keep important fields indexed.
create table if not exists integration_outbox (
id bigserial primary key,
event_type text not null,
aggregate_type text not null,
aggregate_id text not null,
event_key text not null, -- deterministic idempotency key
payload jsonb not null,
status text not null default 'pending', -- pending, processing, done, failed
attempts int not null default 0,
available_at timestamptz not null default now(),
locked_at timestamptz,
locked_by text,
last_error text,
created_at timestamptz not null default now(),
processed_at timestamptz
);
create unique index if not exists integration_outbox_event_key_uq
on integration_outbox (event_key);
create index if not exists integration_outbox_pending_idx
on integration_outbox (status, available_at, created_at);
create index if not exists integration_outbox_aggregate_idx
on integration_outbox (aggregate_type, aggregate_id, created_at);Design notes:
event_keyis the heart of exactly-once-ish behavior. Make it deterministic, not random.available_atsupports backoff without needing a separate scheduler.locked_atandlocked_bysupport safe claiming and concurrent workers.
ℹ️ Note:
status = processingis optional if you use a single SQL statement that both claims and returns rows. Keeping it explicit helps with debugging and dashboards.
What Should event_key Be?#
Use a stable key that represents the side effect you want to happen once. Examples:
order.paid:orderId:paymentIdinvoice.sent:invoiceId:versioncrm.upsert:customerId:updatedAtEpoch
If you use a random UUID, you can still do at-least-once delivery, but you lose deduplication for duplicate publishes of the same intent.
Schema Options and Tradeoffs#
| Option | Pros | Cons | When to use |
|---|---|---|---|
Single outbox table with event_key unique | Simple, easy to operate, works for most teams | Requires careful event_key design | Default choice |
Outbox + separate integration_deliveries per target system | Per-consumer tracking, easier multi-sink routing | More schema and logic | Many workflows consuming the same event |
| Outbox + partitioning by time | Faster pruning, better performance at scale | More operational work | Large volumes, long retention |
# Writing to the Outbox: Transactional Publishing#
The point of the outbox is atomicity: business state and event intent must commit together.
A simplified approach:
- 1Update business rows.
- 2Insert outbox row with deterministic
event_key. - 3Commit.
Here is a Postgres example where paying an order emits an event:
begin;
update orders
set status = 'paid', paid_at = now()
where id = $1 and status = 'pending';
insert into integration_outbox (
event_type, aggregate_type, aggregate_id, event_key, payload
) values (
'order.paid',
'order',
$1::text,
'order.paid:' || $1::text,
jsonb_build_object(
'orderId', $1,
'paidAt', now()
)
)
on conflict (event_key) do nothing;
commit;This pattern prevents duplicates from repeated calls or retries at the application layer. The on conflict do nothing makes the publishing path idempotent.
⚠️ Warning: Avoid writing to the outbox in a separate transaction after the business update. If the second transaction fails, you have paid orders with no event, and you will be forced into manual reconciliation.
# Consuming the Outbox with n8n: Poll, Claim, Process, Ack#
n8n is not a message broker. Treat it as a workflow engine and worker, and let Postgres play the role of queue.
A robust consumer loop has these steps:
- 1Poll for available events.
- 2Claim a batch using row-level locking.
- 3Process each event.
- 4On success, mark done.
- 5On failure, schedule retry with backoff, track error, and possibly dead-letter.
Polling Trigger: Why It’s the Default#
Polling is boring, and that is why it’s reliable.
- If n8n is down, Postgres keeps events.
- If a workflow crashes mid-run, events remain pending or become claimable again.
- You can scale by running multiple n8n workers, each claiming different rows.
Latency: with a 10-second poll interval, typical end-to-end latency is 10 to 20 seconds for most events. For many business workflows like CRM sync, invoicing, and notifications, that is acceptable.
Claiming Rows Safely with FOR UPDATE SKIP LOCKED#
This is the key to concurrency.
A common approach is a single SQL statement that:
- selects pending, available rows
- locks them so other workers skip them
- updates lock metadata
- returns the claimed rows
with cte as (
select id
from integration_outbox
where status = 'pending'
and available_at <= now()
order by created_at
limit 50
for update skip locked
)
update integration_outbox o
set status = 'processing',
locked_at = now(),
locked_by = $1
from cte
where o.id = cte.id
returning o.*;In n8n, you typically execute this with a Postgres node and then iterate the returned rows.
Operational advice:
- Start with
limit 50and adjust based on average processing time. - Keep one workflow run short. Large batches can exceed n8n execution timeouts or memory limits.
💡 Tip: Set
locked_byto the n8n instance name plus workflow ID. It makes stuck locks diagnosable without guessing which worker claimed the row.
Processing and Acknowledging Success#
After you process an event, update the outbox row:
update integration_outbox
set status = 'done',
processed_at = now(),
locked_at = null,
locked_by = null,
last_error = null
where id = $1;If your workflow triggers side effects in third-party systems, store idempotency keys there too where possible. For example, Stripe, Slack, and many CRMs support external IDs or idempotency headers.
# Retry Semantics: Backoff, Jitter, and Dead Letters#
Retries are where most “reliable integration” designs fail. They either retry too aggressively and overload dependencies, or they hide failures until customers complain.
A practical retry policy:
- Max attempts: 10
- Backoff: exponential with a cap, plus jitter
- Dead-letter: after max attempts, mark as failed and alert
A simple backoff schedule:
| Attempt | Delay |
|---|---|
| 1 | 30 seconds |
| 2 | 2 minutes |
| 3 | 5 minutes |
| 4 | 15 minutes |
| 5 | 30 minutes |
| 6+ | 60 minutes |
You can implement this using available_at.
Marking Failure and Scheduling the Next Retry#
update integration_outbox
set status = 'pending',
attempts = attempts + 1,
available_at = now() + ($2::int || ' seconds')::interval,
last_error = left($3, 2000),
locked_at = null,
locked_by = null
where id = $1;Where $2 is a delay in seconds computed in n8n, based on attempts.
Dead-Lettering Poison Messages#
When attempts reaches a threshold, mark as failed:
update integration_outbox
set status = 'failed',
processed_at = now(),
locked_at = null,
locked_by = null,
last_error = left($2, 2000)
where id = $1;You can later replay failed events by setting them back to pending after fixing the root cause.
For a deeper n8n-focused error strategy, including alert routing and failure workflows, use n8n error handling, retries, and alerting.
# Polling vs Webhooks: Tradeoffs and Hybrid Designs#
Teams often start with “just call an n8n webhook”. Webhooks are fine, but only if you still persist events, otherwise you are coupling business correctness to webhook delivery.
Decision Matrix#
| Approach | Reliability | Latency | Complexity | Best for |
|---|---|---|---|---|
| Direct app to n8n webhook only | Low | Very low | Low | Non-critical notifications |
| Outbox + polling consumer | High | Medium | Medium | Most business-critical automations |
| Outbox + webhook nudge + polling fallback | Very high | Low | Higher | Critical + low-latency needs |
Hybrid: Webhook Nudge with Postgres as the Source of Truth#
Use Postgres outbox as the durable record, and a webhook just to wake up n8n quickly.
Flow:
- 1App writes outbox row in transaction.
- 2App calls n8n webhook with
event_keyonly. - 3n8n webhook triggers a workflow that immediately claims from Postgres.
If the webhook call fails, the polling loop still picks up the event.
This gives:
- low latency when everything is healthy
- durable recovery when it is not
⚠️ Warning: Do not pass the full payload via webhook as the primary source of truth. Keep the payload in Postgres and have n8n read it from there, otherwise your “replay” becomes impossible.
# Exactly-Once-ish Delivery: What You Can and Cannot Guarantee#
Strict exactly-once delivery requires coordinated transactions across systems, which you do not have when n8n calls third-party APIs. What you can achieve is:
- at-least-once delivery from Postgres to n8n
- exactly-once effects via idempotency and deduplication
Where Duplicates Come From#
Duplicates usually happen because:
- n8n times out after calling a third-party API, so the run is marked failed, then retried
- worker crashes mid-run
- network errors cause ambiguous outcomes
- the same business intent is published twice due to application retries
Your defenses:
- 1Unique
event_keyin the outbox prevents duplicate intent. - 2Idempotent operations in the destination system prevent repeated side effects.
- 3A local “idempotency table” can prevent repeats when the destination lacks idempotency.
Optional: Local Idempotency Ledger#
If a destination system does not support idempotency, store a ledger row keyed by event_key and the target action. Use a unique constraint to enforce exactly-once-ish behavior even across retries.
create table if not exists integration_idempotency (
id bigserial primary key,
event_key text not null,
action text not null,
created_at timestamptz not null default now()
);
create unique index if not exists integration_idempotency_uq
on integration_idempotency (event_key, action);In n8n, before performing the external side effect, attempt to insert. If it conflicts, skip.
# Building the n8n Workflow: Practical Node-Level Guidance#
You typically implement this as two workflows:
- 1Poller workflow that claims events and processes them.
- 2Alert workflow that monitors failures and backlog.
Workflow 1: Poll and Process#
Recommended nodes and steps:
- 1Schedule Trigger every 10 to 30 seconds.
- 2Postgres node: claim batch with the CTE update returning rows.
- 3Split in Batches: process each row.
- 4Switch on
event_type: route to handlers. - 5For each handler:
- call external API
- write any audit rows if needed
- mark outbox row done
- 6Error path:
- compute backoff delay from
attempts - update outbox row with retry or failed status
Keep each handler idempotent. If you cannot make the third-party call idempotent, add the local idempotency ledger.
Backoff Computation Example#
This is a compact function you can implement in an n8n Code node. Keep it short and deterministic.
const attempts = $json.attempts || 0;
const schedule = [30, 120, 300, 900, 1800, 3600];
const base = schedule[Math.min(attempts, schedule.length - 1)];
// jitter between 0 and 10 percent
const jitter = Math.floor(base * (Math.random() * 0.1));
return [{ delaySeconds: base + jitter }];Use delaySeconds as the $2 parameter to the retry update query.
# Monitoring and Alerting: What to Measure in Postgres#
If you cannot see backlog growth and stuck rows, you do not have reliable integrations. Monitoring is part of the design, not an add-on.
Core Metrics#
Track these four:
- 1Queue depth: count of pending rows.
- 2Oldest pending age: how long the oldest pending has been waiting.
- 3Failure rate: failed rows per hour.
- 4Retry pressure: attempts distribution.
These map to actionable alerts.
Monitoring Queries#
Queue depth:
select count(*) as pending_count
from integration_outbox
where status = 'pending'
and available_at <= now();Oldest pending age:
select now() - min(created_at) as oldest_pending_age
from integration_outbox
where status = 'pending';Failures last 24 hours by event type:
select event_type, count(*) as failed_count
from integration_outbox
where status = 'failed'
and created_at >= now() - interval '24 hours'
group by event_type
order by failed_count desc;Stuck processing locks, for example older than 15 minutes:
select id, event_type, locked_by, locked_at, attempts
from integration_outbox
where status = 'processing'
and locked_at < now() - interval '15 minutes'
order by locked_at asc;Operational SLOs You Can Actually Use#
Pick thresholds that match your business impact. Examples:
- Alert if
oldest_pending_ageis greater than 5 minutes for customer-facing notifications. - Alert if pending backlog is greater than 1000 for more than 10 minutes.
- Alert if any failed rows exist for payment-related
event_type.
💡 Tip: Define an “automation latency SLO” like
P95 time from created_at to processed_at less than 2 minutes. Then you can optimize based on a number, not gut feeling.
# Operational Guidance: Lock Recovery, Retention, and Performance#
Recovering Stuck Rows#
If n8n crashes, rows may stay processing with old locked_at. Add a periodic “reaper” step, either inside the poller workflow or as a separate job:
- find rows in processing where
locked_atis older than a timeout - set them back to pending with
available_at = now()
update integration_outbox
set status = 'pending',
locked_at = null,
locked_by = null,
available_at = now()
where status = 'processing'
and locked_at < now() - interval '15 minutes';This turns crashes into retry events.
Retention and Pruning#
If you keep every row forever, the table will grow and indexes will slow. Most teams keep:
- done rows: 7 to 30 days
- failed rows: 30 to 90 days, depending on audit needs
Prune in small batches to avoid heavy locks:
delete from integration_outbox
where status = 'done'
and processed_at < now() - interval '30 days'
limit 5000;If your Postgres does not allow delete ... limit, delete via a subquery selecting IDs.
Indexing and Hotspots#
The most important index is the pending selector. This guide uses:
(status, available_at, created_at)to support “pending and due, oldest first”- unique index on
event_key
If you see contention, common fixes:
- reduce batch size
- run more frequent polls with smaller batches
- avoid heavy payload reads if not needed by selecting only required columns for routing
Scaling n8n Consumers#
This pattern scales horizontally:
- run multiple n8n instances
- each claims rows using
skip locked - no coordination needed
Be realistic about concurrency:
- If one event handler calls a third-party API with a rate limit of 10 requests per second, your maximum safe throughput is bounded there, not in Postgres.
- Use event-type-specific throttling in n8n when needed.
For production automation services and implementation help, see Samioda automation.
# Common Pitfalls and How to Avoid Them#
Pitfall 1: Treating n8n as a Queue#
n8n stores execution history, not a durable event log for your domain. If you rely on n8n to hold the only copy of the event, you lose replayability and you increase coupling.
Avoid it by storing events in Postgres first, always.
Pitfall 2: No Idempotency in Downstream Systems#
At-least-once delivery without idempotency equals duplicated side effects. Typical symptoms are duplicate invoices, duplicate Slack messages, or repeated CRM updates that flip fields unexpectedly.
Avoid it with event_key, destination idempotency keys, and a local idempotency ledger when required.
Pitfall 3: Retrying Everything the Same Way#
Not all failures are transient. For example, “invalid API key” will never succeed with retries and will just amplify noise.
Avoid it by classifying errors inside n8n:
- transient: retry with backoff
- permanent: mark failed immediately and alert
Pitfall 4: No Monitoring Until It Breaks#
Teams often add monitoring after the first incident. By then, you have lost time and trust.
Avoid it by shipping at least backlog and oldest pending alerts on day one.
# Key Takeaways#
- Use a Postgres outbox table written in the same transaction as your business change to prevent lost events.
- Claim events in batches with
FOR UPDATE SKIP LOCKEDto enable safe parallel n8n workers without double-processing. - Aim for exactly-once-ish outcomes by combining at-least-once delivery with deterministic
event_keydeduplication and idempotent downstream actions. - Implement retries with backoff using
available_at, and send poison messages to a failed state with clear alerting. - Monitor queue depth, oldest pending age, stuck locks, and failure counts, and set SLO-driven alerts before going live.
# Conclusion#
Reliable integrations are rarely about one clever webhook. They are about designing a durable boundary, replayability, and operational visibility. Postgres gives you transactional guarantees and observability, and n8n gives you flexible workflow execution. Together, with an outbox and queue semantics, you can ship integrations that survive downtime, retries, and scaling.
If you want this pattern implemented end-to-end, including event modeling, retry policies, dashboards, and production hardening, Samioda can help you design and build it with n8n, Postgres, and your stack. Start here: https://samioda.com/en/automation.
FAQ
Founder & Senior Developer at Samioda. 8+ years building React, Next.js, Flutter and n8n automation solutions for clients across Europe.
More in Business Automation
All →Idempotent n8n Workflows: Concurrency, Locking, and Preventing Duplicate Side Effects
A practical 2026 guide to n8n idempotency under concurrency: why duplicates happen and how to prevent double charges, double emails, and double writes using dedupe keys, DB locks, upserts, and the outbox pattern.
Document Processing Automation with n8n: OCR, Classification, Extraction, and Routing (Production-Ready Guide for 2026)
Build a production-grade n8n document processing automation pipeline for inbound PDFs and images: OCR, classification, field extraction, validation, human review, audit trails, and routing to CRM and accounting tools.
Automated Reporting with n8n: Build Weekly KPI Digests from GA4, Stripe, and Postgres
A practical guide to automated reporting with n8n: pull weekly KPIs from GA4, Stripe, and Postgres, validate data quality, generate a concise narrative summary, and send it to Slack and email with retries and maintainable structure.
Need help with your project?
We build custom solutions using the technologies discussed in this article. Senior team, fixed prices.
Related Articles
Document Processing Automation with n8n: OCR, Classification, Extraction, and Routing (Production-Ready Guide for 2026)
Build a production-grade n8n document processing automation pipeline for inbound PDFs and images: OCR, classification, field extraction, validation, human review, audit trails, and routing to CRM and accounting tools.
n8n + Supabase/Postgres Automation Patterns: Webhooks, RLS-Safe Writes, and Reliable Sync
A practical guide to n8n Supabase Postgres automation patterns: webhook ingestion, idempotency keys, upserts, RLS-safe writes, and reliable two-way sync for SaaS back-office workflows.
10 E-Commerce Automation Workflows That Save Hours Every Week (n8n Examples)
A practical guide to ecommerce automation workflows: 10 proven automations for order processing, inventory alerts, reviews, abandoned carts, support, and analytics — with n8n workflow examples you can copy.