Business Automation
n8nPostgresAutomationIntegrationsOutbox PatternReliability Engineering

Reliable Integrations with n8n and Postgres: Queue Tables, the Outbox Pattern, and Exactly-Once-ish Delivery

AO
Adrijan Omićević
·17 min read

# What You’ll Build and Why It Matters#

If your product emits events and n8n turns them into actions, your reliability is only as strong as the bridge between the two. Directly calling n8n webhooks from application code often fails in ways that are hard to recover from: timeouts, n8n downtime, transient network issues, and duplicate deliveries.

This guide shows a practical pattern for reliable integrations using Postgres as an outbox and queue, with n8n as the worker. The target is exactly-once-ish outcomes: deliver each event at least once, but ensure side effects happen once via idempotency and deduplication.

You’ll implement:

  • An outbox table that stores events in the same database transaction as your business change.
  • A queue/claiming mechanism so n8n can safely process events concurrently.
  • Retry semantics, backoff, and a dead-letter flow for poison messages.
  • Monitoring queries and operational guardrails for production.

For additional patterns and examples with managed Postgres stacks, also see n8n + Supabase/Postgres automation patterns. For failure handling inside n8n, refer to n8n error handling, retries, and alerting.

# Architecture Overview: Postgres as the Source of Truth#

The core idea: your app writes events to Postgres first, then n8n reads and processes them. That makes Postgres the durable boundary.

A reliable shape looks like this:

  1. 1
    App performs business transaction, and in the same transaction writes an outbox row.
  2. 2
    n8n periodically polls (or is triggered) to claim pending outbox rows.
  3. 3
    n8n processes each event, writes results, then marks the outbox row as done.
  4. 4
    If processing fails, the row remains pending and is retried with backoff until it succeeds or is dead-lettered.

Why this works:

  • Postgres already provides durable storage and transactional guarantees.
  • You get replayability by keeping events until they are confirmed processed.
  • You can reason about reliability with measurable queue depth and event age.

🎯 Key Takeaway: Persist intent in Postgres before contacting anything external. Reliability starts with a durable write you control.

# Prerequisites and When This Pattern Fits#

This pattern fits best when:

  • You already use Postgres as your primary DB, or you can provision a dedicated Postgres for integration events.
  • Your workflows can tolerate seconds-level latency if polling is used.
  • You need strong auditability and the ability to replay events.

It may not fit when:

  • You require sub-100 millisecond latency end-to-end.
  • You cannot tolerate polling load and you do not control database access patterns.
  • Your event volume is high enough that a streaming platform is more appropriate.

Practical thresholds:

  • Polling-based outboxes commonly handle hundreds to low thousands of events per minute on a modest Postgres instance when indexed well and claimed in batches.
  • If you regularly exceed tens of thousands of events per minute, consider a dedicated queue or log-based pipeline. You can still keep an outbox as a source of truth, but consume it with a specialized worker.

# Data Model: Outbox and Optional Delivery Tracking#

You need two concepts:

  • Outbox: what should be processed.
  • Idempotency and deduplication keys: how to ensure exactly-once-ish outcomes.

A practical schema for many products is one table. For stricter idempotency, add a second table for consumers, but many teams do fine with one outbox plus idempotent downstream writes.

Outbox Table Schema#

The outbox stores event metadata, payload, and processing state. Use jsonb for flexibility, but keep important fields indexed.

SQL
create table if not exists integration_outbox (
  id bigserial primary key,
  event_type text not null,
  aggregate_type text not null,
  aggregate_id text not null,
  event_key text not null, -- deterministic idempotency key
  payload jsonb not null,
 
  status text not null default 'pending', -- pending, processing, done, failed
  attempts int not null default 0,
  available_at timestamptz not null default now(),
  locked_at timestamptz,
  locked_by text,
 
  last_error text,
  created_at timestamptz not null default now(),
  processed_at timestamptz
);
 
create unique index if not exists integration_outbox_event_key_uq
  on integration_outbox (event_key);
 
create index if not exists integration_outbox_pending_idx
  on integration_outbox (status, available_at, created_at);
 
create index if not exists integration_outbox_aggregate_idx
  on integration_outbox (aggregate_type, aggregate_id, created_at);

Design notes:

  • event_key is the heart of exactly-once-ish behavior. Make it deterministic, not random.
  • available_at supports backoff without needing a separate scheduler.
  • locked_at and locked_by support safe claiming and concurrent workers.

ℹ️ Note: status = processing is optional if you use a single SQL statement that both claims and returns rows. Keeping it explicit helps with debugging and dashboards.

What Should event_key Be?#

Use a stable key that represents the side effect you want to happen once. Examples:

  • order.paid:orderId:paymentId
  • invoice.sent:invoiceId:version
  • crm.upsert:customerId:updatedAtEpoch

If you use a random UUID, you can still do at-least-once delivery, but you lose deduplication for duplicate publishes of the same intent.

Schema Options and Tradeoffs#

OptionProsConsWhen to use
Single outbox table with event_key uniqueSimple, easy to operate, works for most teamsRequires careful event_key designDefault choice
Outbox + separate integration_deliveries per target systemPer-consumer tracking, easier multi-sink routingMore schema and logicMany workflows consuming the same event
Outbox + partitioning by timeFaster pruning, better performance at scaleMore operational workLarge volumes, long retention

# Writing to the Outbox: Transactional Publishing#

The point of the outbox is atomicity: business state and event intent must commit together.

A simplified approach:

  1. 1
    Update business rows.
  2. 2
    Insert outbox row with deterministic event_key.
  3. 3
    Commit.

Here is a Postgres example where paying an order emits an event:

SQL
begin;
 
update orders
set status = 'paid', paid_at = now()
where id = $1 and status = 'pending';
 
insert into integration_outbox (
  event_type, aggregate_type, aggregate_id, event_key, payload
) values (
  'order.paid',
  'order',
  $1::text,
  'order.paid:' || $1::text,
  jsonb_build_object(
    'orderId', $1,
    'paidAt', now()
  )
)
on conflict (event_key) do nothing;
 
commit;

This pattern prevents duplicates from repeated calls or retries at the application layer. The on conflict do nothing makes the publishing path idempotent.

⚠️ Warning: Avoid writing to the outbox in a separate transaction after the business update. If the second transaction fails, you have paid orders with no event, and you will be forced into manual reconciliation.

# Consuming the Outbox with n8n: Poll, Claim, Process, Ack#

n8n is not a message broker. Treat it as a workflow engine and worker, and let Postgres play the role of queue.

A robust consumer loop has these steps:

  1. 1
    Poll for available events.
  2. 2
    Claim a batch using row-level locking.
  3. 3
    Process each event.
  4. 4
    On success, mark done.
  5. 5
    On failure, schedule retry with backoff, track error, and possibly dead-letter.

Polling Trigger: Why It’s the Default#

Polling is boring, and that is why it’s reliable.

  • If n8n is down, Postgres keeps events.
  • If a workflow crashes mid-run, events remain pending or become claimable again.
  • You can scale by running multiple n8n workers, each claiming different rows.

Latency: with a 10-second poll interval, typical end-to-end latency is 10 to 20 seconds for most events. For many business workflows like CRM sync, invoicing, and notifications, that is acceptable.

Claiming Rows Safely with FOR UPDATE SKIP LOCKED#

This is the key to concurrency.

A common approach is a single SQL statement that:

  • selects pending, available rows
  • locks them so other workers skip them
  • updates lock metadata
  • returns the claimed rows
SQL
with cte as (
  select id
  from integration_outbox
  where status = 'pending'
    and available_at <= now()
  order by created_at
  limit 50
  for update skip locked
)
update integration_outbox o
set status = 'processing',
    locked_at = now(),
    locked_by = $1
from cte
where o.id = cte.id
returning o.*;

In n8n, you typically execute this with a Postgres node and then iterate the returned rows.

Operational advice:

  • Start with limit 50 and adjust based on average processing time.
  • Keep one workflow run short. Large batches can exceed n8n execution timeouts or memory limits.

💡 Tip: Set locked_by to the n8n instance name plus workflow ID. It makes stuck locks diagnosable without guessing which worker claimed the row.

Processing and Acknowledging Success#

After you process an event, update the outbox row:

SQL
update integration_outbox
set status = 'done',
    processed_at = now(),
    locked_at = null,
    locked_by = null,
    last_error = null
where id = $1;

If your workflow triggers side effects in third-party systems, store idempotency keys there too where possible. For example, Stripe, Slack, and many CRMs support external IDs or idempotency headers.

# Retry Semantics: Backoff, Jitter, and Dead Letters#

Retries are where most “reliable integration” designs fail. They either retry too aggressively and overload dependencies, or they hide failures until customers complain.

A practical retry policy:

  • Max attempts: 10
  • Backoff: exponential with a cap, plus jitter
  • Dead-letter: after max attempts, mark as failed and alert

A simple backoff schedule:

AttemptDelay
130 seconds
22 minutes
35 minutes
415 minutes
530 minutes
6+60 minutes

You can implement this using available_at.

Marking Failure and Scheduling the Next Retry#

SQL
update integration_outbox
set status = 'pending',
    attempts = attempts + 1,
    available_at = now() + ($2::int || ' seconds')::interval,
    last_error = left($3, 2000),
    locked_at = null,
    locked_by = null
where id = $1;

Where $2 is a delay in seconds computed in n8n, based on attempts.

Dead-Lettering Poison Messages#

When attempts reaches a threshold, mark as failed:

SQL
update integration_outbox
set status = 'failed',
    processed_at = now(),
    locked_at = null,
    locked_by = null,
    last_error = left($2, 2000)
where id = $1;

You can later replay failed events by setting them back to pending after fixing the root cause.

For a deeper n8n-focused error strategy, including alert routing and failure workflows, use n8n error handling, retries, and alerting.

# Polling vs Webhooks: Tradeoffs and Hybrid Designs#

Teams often start with “just call an n8n webhook”. Webhooks are fine, but only if you still persist events, otherwise you are coupling business correctness to webhook delivery.

Decision Matrix#

ApproachReliabilityLatencyComplexityBest for
Direct app to n8n webhook onlyLowVery lowLowNon-critical notifications
Outbox + polling consumerHighMediumMediumMost business-critical automations
Outbox + webhook nudge + polling fallbackVery highLowHigherCritical + low-latency needs

Hybrid: Webhook Nudge with Postgres as the Source of Truth#

Use Postgres outbox as the durable record, and a webhook just to wake up n8n quickly.

Flow:

  1. 1
    App writes outbox row in transaction.
  2. 2
    App calls n8n webhook with event_key only.
  3. 3
    n8n webhook triggers a workflow that immediately claims from Postgres.

If the webhook call fails, the polling loop still picks up the event.

This gives:

  • low latency when everything is healthy
  • durable recovery when it is not

⚠️ Warning: Do not pass the full payload via webhook as the primary source of truth. Keep the payload in Postgres and have n8n read it from there, otherwise your “replay” becomes impossible.

# Exactly-Once-ish Delivery: What You Can and Cannot Guarantee#

Strict exactly-once delivery requires coordinated transactions across systems, which you do not have when n8n calls third-party APIs. What you can achieve is:

  • at-least-once delivery from Postgres to n8n
  • exactly-once effects via idempotency and deduplication

Where Duplicates Come From#

Duplicates usually happen because:

  • n8n times out after calling a third-party API, so the run is marked failed, then retried
  • worker crashes mid-run
  • network errors cause ambiguous outcomes
  • the same business intent is published twice due to application retries

Your defenses:

  1. 1
    Unique event_key in the outbox prevents duplicate intent.
  2. 2
    Idempotent operations in the destination system prevent repeated side effects.
  3. 3
    A local “idempotency table” can prevent repeats when the destination lacks idempotency.

Optional: Local Idempotency Ledger#

If a destination system does not support idempotency, store a ledger row keyed by event_key and the target action. Use a unique constraint to enforce exactly-once-ish behavior even across retries.

SQL
create table if not exists integration_idempotency (
  id bigserial primary key,
  event_key text not null,
  action text not null,
  created_at timestamptz not null default now()
);
 
create unique index if not exists integration_idempotency_uq
  on integration_idempotency (event_key, action);

In n8n, before performing the external side effect, attempt to insert. If it conflicts, skip.

# Building the n8n Workflow: Practical Node-Level Guidance#

You typically implement this as two workflows:

  1. 1
    Poller workflow that claims events and processes them.
  2. 2
    Alert workflow that monitors failures and backlog.

Workflow 1: Poll and Process#

Recommended nodes and steps:

  1. 1
    Schedule Trigger every 10 to 30 seconds.
  2. 2
    Postgres node: claim batch with the CTE update returning rows.
  3. 3
    Split in Batches: process each row.
  4. 4
    Switch on event_type: route to handlers.
  5. 5
    For each handler:
    • call external API
    • write any audit rows if needed
    • mark outbox row done
  6. 6
    Error path:
    • compute backoff delay from attempts
    • update outbox row with retry or failed status

Keep each handler idempotent. If you cannot make the third-party call idempotent, add the local idempotency ledger.

Backoff Computation Example#

This is a compact function you can implement in an n8n Code node. Keep it short and deterministic.

JavaScript
const attempts = $json.attempts || 0;
 
const schedule = [30, 120, 300, 900, 1800, 3600];
const base = schedule[Math.min(attempts, schedule.length - 1)];
 
// jitter between 0 and 10 percent
const jitter = Math.floor(base * (Math.random() * 0.1));
 
return [{ delaySeconds: base + jitter }];

Use delaySeconds as the $2 parameter to the retry update query.

# Monitoring and Alerting: What to Measure in Postgres#

If you cannot see backlog growth and stuck rows, you do not have reliable integrations. Monitoring is part of the design, not an add-on.

Core Metrics#

Track these four:

  1. 1
    Queue depth: count of pending rows.
  2. 2
    Oldest pending age: how long the oldest pending has been waiting.
  3. 3
    Failure rate: failed rows per hour.
  4. 4
    Retry pressure: attempts distribution.

These map to actionable alerts.

Monitoring Queries#

Queue depth:

SQL
select count(*) as pending_count
from integration_outbox
where status = 'pending'
  and available_at <= now();

Oldest pending age:

SQL
select now() - min(created_at) as oldest_pending_age
from integration_outbox
where status = 'pending';

Failures last 24 hours by event type:

SQL
select event_type, count(*) as failed_count
from integration_outbox
where status = 'failed'
  and created_at >= now() - interval '24 hours'
group by event_type
order by failed_count desc;

Stuck processing locks, for example older than 15 minutes:

SQL
select id, event_type, locked_by, locked_at, attempts
from integration_outbox
where status = 'processing'
  and locked_at < now() - interval '15 minutes'
order by locked_at asc;

Operational SLOs You Can Actually Use#

Pick thresholds that match your business impact. Examples:

  • Alert if oldest_pending_age is greater than 5 minutes for customer-facing notifications.
  • Alert if pending backlog is greater than 1000 for more than 10 minutes.
  • Alert if any failed rows exist for payment-related event_type.

💡 Tip: Define an “automation latency SLO” like P95 time from created_at to processed_at less than 2 minutes. Then you can optimize based on a number, not gut feeling.

# Operational Guidance: Lock Recovery, Retention, and Performance#

Recovering Stuck Rows#

If n8n crashes, rows may stay processing with old locked_at. Add a periodic “reaper” step, either inside the poller workflow or as a separate job:

  • find rows in processing where locked_at is older than a timeout
  • set them back to pending with available_at = now()
SQL
update integration_outbox
set status = 'pending',
    locked_at = null,
    locked_by = null,
    available_at = now()
where status = 'processing'
  and locked_at < now() - interval '15 minutes';

This turns crashes into retry events.

Retention and Pruning#

If you keep every row forever, the table will grow and indexes will slow. Most teams keep:

  • done rows: 7 to 30 days
  • failed rows: 30 to 90 days, depending on audit needs

Prune in small batches to avoid heavy locks:

SQL
delete from integration_outbox
where status = 'done'
  and processed_at < now() - interval '30 days'
limit 5000;

If your Postgres does not allow delete ... limit, delete via a subquery selecting IDs.

Indexing and Hotspots#

The most important index is the pending selector. This guide uses:

  • (status, available_at, created_at) to support “pending and due, oldest first”
  • unique index on event_key

If you see contention, common fixes:

  • reduce batch size
  • run more frequent polls with smaller batches
  • avoid heavy payload reads if not needed by selecting only required columns for routing

Scaling n8n Consumers#

This pattern scales horizontally:

  • run multiple n8n instances
  • each claims rows using skip locked
  • no coordination needed

Be realistic about concurrency:

  • If one event handler calls a third-party API with a rate limit of 10 requests per second, your maximum safe throughput is bounded there, not in Postgres.
  • Use event-type-specific throttling in n8n when needed.

For production automation services and implementation help, see Samioda automation.

# Common Pitfalls and How to Avoid Them#

Pitfall 1: Treating n8n as a Queue#

n8n stores execution history, not a durable event log for your domain. If you rely on n8n to hold the only copy of the event, you lose replayability and you increase coupling.

Avoid it by storing events in Postgres first, always.

Pitfall 2: No Idempotency in Downstream Systems#

At-least-once delivery without idempotency equals duplicated side effects. Typical symptoms are duplicate invoices, duplicate Slack messages, or repeated CRM updates that flip fields unexpectedly.

Avoid it with event_key, destination idempotency keys, and a local idempotency ledger when required.

Pitfall 3: Retrying Everything the Same Way#

Not all failures are transient. For example, “invalid API key” will never succeed with retries and will just amplify noise.

Avoid it by classifying errors inside n8n:

  • transient: retry with backoff
  • permanent: mark failed immediately and alert

Pitfall 4: No Monitoring Until It Breaks#

Teams often add monitoring after the first incident. By then, you have lost time and trust.

Avoid it by shipping at least backlog and oldest pending alerts on day one.

# Key Takeaways#

  • Use a Postgres outbox table written in the same transaction as your business change to prevent lost events.
  • Claim events in batches with FOR UPDATE SKIP LOCKED to enable safe parallel n8n workers without double-processing.
  • Aim for exactly-once-ish outcomes by combining at-least-once delivery with deterministic event_key deduplication and idempotent downstream actions.
  • Implement retries with backoff using available_at, and send poison messages to a failed state with clear alerting.
  • Monitor queue depth, oldest pending age, stuck locks, and failure counts, and set SLO-driven alerts before going live.

# Conclusion#

Reliable integrations are rarely about one clever webhook. They are about designing a durable boundary, replayability, and operational visibility. Postgres gives you transactional guarantees and observability, and n8n gives you flexible workflow execution. Together, with an outbox and queue semantics, you can ship integrations that survive downtime, retries, and scaling.

If you want this pattern implemented end-to-end, including event modeling, retry policies, dashboards, and production hardening, Samioda can help you design and build it with n8n, Postgres, and your stack. Start here: https://samioda.com/en/automation.

FAQ

Share
A
Adrijan OmićevićFounder & Senior Developer

Founder & Senior Developer at Samioda. 8+ years building React, Next.js, Flutter and n8n automation solutions for clients across Europe.

Need help with your project?

We build custom solutions using the technologies discussed in this article. Senior team, fixed prices.