
Reliable Data Sync in n8n: Pagination, Incremental Loads, Deduplication, and CDC

Adrijan Omićević · 14 min read

# What This Guide Covers

A reliable n8n data sync workflow is not just “fetch data, insert rows”. Production syncs fail due to pagination bugs, timestamp edge cases, duplicate deliveries, and silent drift that only shows up in dashboards weeks later.

This guide shows a robust design for:

  • Cursor-based pagination that never skips or duplicates pages
  • Incremental loads with safe high-watermarks and overlap windows
  • Idempotency keys and upserts for safe re-runs
  • Deduplication storage that survives restarts
  • CDC patterns in n8n, including webhook-based and poll-based
  • Monitoring metrics that detect drift before stakeholders do

If you need baseline API patterns first, read API Integration Guide. For reliability patterns like retries and alerts, see n8n Error Handling, Retries, and Alerting. For metrics and tracing concepts, use Web App Observability Guide.

# Data Sync Reliability Requirements

Most teams discover reliability requirements after the first incident. Make them explicit up front.

## What “Reliable Sync” Means in Practice

A reliable sync should meet these conditions:

  • No data loss: every eligible record is eventually synced.
  • No duplicates: downstream does not double count events or entities.
  • Idempotent re-runs: you can replay any time window safely.
  • Deterministic pagination: new records appearing mid-sync do not reshuffle pages.
  • Detectable drift: you measure divergence between source and target.

## Failure Modes You Must Design For

| Failure mode | Symptom | Root cause | Mitigation |
| --- | --- | --- | --- |
| Skipped records | Gaps in target | Offset pagination + concurrent inserts | Cursor pagination + stable sort |
| Duplicate records | Inflation in counts | Retries replay same page | Idempotency keys + upserts |
| Missed updates | Stale fields | Timestamp watermark moves too far | Overlap window + tie-breaker |
| Partial loads | Only some pages | Timeout, rate limits, memory | Batch commits + backoff + streaming pages |
| Silent drift | Slowly diverging totals | API filtering changes, schema changes | Metrics, reconciliation checks, alerts |

🎯 Key Takeaway: Reliability is mostly about state and determinism: you need durable watermarks, stable pagination, and idempotent writes so retries become safe.

# Choose the Right Sync Strategy

Different sources and APIs require different approaches. Pick the simplest strategy that guarantees correctness.

## Snapshot, Incremental, or CDC

| Strategy | When to use | Pros | Cons |
| --- | --- | --- | --- |
| Full snapshot (replace) | Small datasets, daily refresh | Simple logic | Expensive, high risk of rate limits |
| Incremental load (poll) | APIs with updated_at | Efficient | Timestamp edge cases |
| CDC (webhook / log) | High-volume changes, low latency | Near-real-time | Requires change feed correctness |

A practical hybrid is common: initial snapshot followed by incremental updates, plus webhooks if the provider supports them.

# Cursor-Based Pagination That Does Not Break

Offset pagination is attractive but dangerous. With offset, if new rows are inserted at the beginning during your sync, your “page 2” shifts and you skip or duplicate records.

Cursor pagination avoids that by using a stable cursor token or last-seen field.

## What to Ask of an API

A robust paginated endpoint ideally supports:

  • Sorting by a stable field, typically updated_at then id
  • Returning a next_cursor token, or allowing starting_after style cursors
  • Consistent results for a given cursor
  • Explicit page size limits and rate limit headers

If you control the API, implement cursor pagination. If not, you can often emulate it using updated_at and a tie-breaker.

## Example Pagination Loop in n8n

This pattern uses:

  • A stored cursor state in a database table
  • Loop: fetch page, process items, update cursor, repeat until no cursor

You can implement the loop with Split In Batches or by re-invoking a subworkflow, but the cleanest approach is “request, then IF has next cursor, set cursor, repeat”.
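In plain JavaScript, the loop reduces to a few lines. This is a minimal sketch, assuming a hypothetical `fetchPage(cursor)` wrapper around your HTTP Request node that returns `{ data, next_cursor }`, and a `processBatch` that performs the downstream upserts:

```javascript
// Minimal cursor-pagination loop. `fetchPage` and `processBatch` are
// stand-ins for your HTTP Request node and destination writes.
async function syncAllPages(fetchPage, processBatch, savedCursor = null) {
  let cursor = savedCursor;
  let total = 0;
  do {
    const page = await fetchPage(cursor);  // fetch one page at the cursor
    await processBatch(page.data);         // upsert this batch downstream
    total += page.data.length;
    cursor = page.next_cursor;             // in production, persist this before looping
  } while (cursor);                        // a null/empty cursor ends the loop
  return total;
}
```

In a real workflow the cursor should be persisted to the state table after each successfully committed page, so an interrupted run resumes where it stopped instead of restarting.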

Data model for cursor state

FieldTypeMeaning
sync_nametextIdentifier for this sync, like crm_contacts
cursortextOpaque cursor token or last-seen marker
updated_attimestampLast successful advancement time

HTTP Request with cursor

```json
{
  "method": "GET",
  "url": "https://api.example.com/v1/contacts",
  "qs": {
    "limit": 200,
    "cursor": "={{$json.cursor}}"
  }
}
```

Process response fields:

  • data: array of records
  • next_cursor: token or null

💡 Tip: Always log the cursor and the count processed per page. When debugging drift, you will want to correlate gaps to specific cursors and run IDs.

## Deterministic Sorting for Cursor Emulation

If the API does not return an opaque cursor token, you can build a cursor from fields:

  • Primary: updated_at
  • Tie-breaker: id

Cursor becomes a compound marker: last processed updated_at and last processed id for that timestamp.

Avoid using only updated_at. Many systems have second-level precision, so dozens of records can share the same timestamp. Without a tie-breaker, you can loop or skip.
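A compound cursor check can be expressed as a small pure function. This sketch assumes ISO 8601 timestamps (which compare correctly as plain strings) and numeric ids; the field names are illustrative:

```javascript
// Compare a record against a compound cursor (updated_at, id).
// Returns true when the record sorts strictly after the cursor.
function isAfterCursor(record, cursor) {
  if (record.updated_at > cursor.updated_at) return true;   // later timestamp
  if (record.updated_at < cursor.updated_at) return false;  // earlier timestamp
  // Same timestamp: fall back to the id tie-breaker, compared numerically
  return Number(record.id) > Number(cursor.id);
}
```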

# Incremental Loads with Safe Watermarks

Incremental loads reduce cost, but the watermark logic must handle late updates, clock skew, and partial failures.

## High-Watermark Design

Track:

  • watermark_ts: last committed updated_at you fully processed
  • watermark_id: last committed id at watermark_ts (tie-breaker)
  • overlap_minutes: re-read a small window to catch late-arriving updates

A common overlap window is 5 to 15 minutes. If the source is eventually consistent or has delayed indexing, use a larger overlap.

Query window

Compute a start time:

  • start_ts = watermark_ts - overlap

Then request records where:

  • updated_at is greater than or equal to start_ts

Additionally filter in processing:

  • Skip records that are strictly before the committed watermark
  • For equal timestamps, only process id greater than watermark id
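The window computation itself is one line of date arithmetic. A sketch, assuming the watermark is stored as an ISO 8601 string:

```javascript
// Compute the incremental query window: start_ts = watermark_ts - overlap.
// The overlap re-reads a small window to catch late-arriving updates.
function queryWindowStart(watermarkTs, overlapMinutes) {
  const start = new Date(Date.parse(watermarkTs) - overlapMinutes * 60 * 1000);
  return start.toISOString();
}
```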

## Example: Filter Logic in a Code node

This keeps the watermark safe while allowing overlap.

```javascript
// Runs in an n8n Code node (Function-node style, where `items` is in scope)
const wmTs = $json.watermark_ts; // ISO 8601 string, last committed watermark
const wmId = $json.watermark_id; // id committed at that timestamp

return items.filter((item) => {
  const ts = item.json.updated_at;
  const id = item.json.id;

  if (ts > wmTs) return true;  // ISO 8601 strings compare correctly as strings
  if (ts < wmTs) return false; // strictly before the committed watermark: skip
  // Equal timestamps: apply the id tie-breaker. Compare numerically when ids
  // are numeric; a plain string comparison would sort "10" before "9".
  return Number(id) > Number(wmId);
});
```

## When to Advance the Watermark

Advance the watermark only after:

  1. You successfully wrote the processed batch to the destination.
  2. You recorded dedup keys, if you rely on them.
  3. You can guarantee the run has no partial commits remaining.

If you advance the watermark immediately after fetching, a write failure will cause data loss.

⚠️ Warning: Never store “last seen” based on the last item fetched. Store “last committed” based on the last item successfully written.

# Idempotency Keys and Deduplication

Retries and overlaps will create duplicates unless you make writes idempotent and track what you already processed.

## Idempotency Keys for Writes

If you write to an API that supports idempotency headers, use them. If you write to a database, use upsert constraints.

Examples:

  • idempotency_key = source + ":" + entity + ":" + entity_id + ":" + updated_at
  • For events: include event type and event timestamp.

Make sure the key changes when the record changes. If you only use entity_id, you can accidentally block legitimate updates.

## Upsert Patterns

For a SQL destination, create a unique constraint on the natural key:

  • Contacts: source_contact_id
  • Invoices: source_invoice_id
  • Line items: source_invoice_id + source_line_id

Then use upsert so duplicates become updates.

Example Postgres upsert SQL:

```sql
INSERT INTO crm_contacts (source_id, email, name, updated_at)
VALUES ($1, $2, $3, $4)
ON CONFLICT (source_id)
DO UPDATE SET
  email = EXCLUDED.email,
  name = EXCLUDED.name,
  updated_at = EXCLUDED.updated_at;
```

In n8n, run this via Postgres node or a generic database node.

## Deduplication Storage That Survives Restarts

Dedup in memory is not enough. n8n can restart, runs can overlap, and retries can happen days later.

Use one of these state stores:

| Store | Best for | Pros | Cons |
| --- | --- | --- | --- |
| Postgres table | Most production syncs | Durable, queryable, backups | Needs cleanup strategy |
| Redis set with TTL | High volume event dedup | Fast, built-in expiry | Operational dependency |
| n8n workflow static data | Small low-risk syncs | Simple | Not safe for high volume, tricky migrations |

A practical schema for dedup keys:

| Field | Type | Notes |
| --- | --- | --- |
| sync_name | text | crm_contacts |
| dedup_key | text | hashed idempotency key |
| first_seen_at | timestamp | when processed |
| expires_at | timestamp | TTL cleanup |

Dedup TTL depends on your overlap window and retry policy. If you overlap 15 minutes and can retry up to 24 hours, keep TTL at least 24 to 72 hours.
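A sketch of building a dedup row matching the schema above, with a TTL-derived `expires_at` (the `dedupRow` helper is illustrative):

```javascript
// Build a dedup row with a TTL-based expires_at. ttlHours should cover
// your overlap window plus the longest possible retry delay.
function dedupRow(syncName, dedupKey, ttlHours, now = new Date()) {
  return {
    sync_name: syncName,
    dedup_key: dedupKey,
    first_seen_at: now.toISOString(),
    expires_at: new Date(now.getTime() + ttlHours * 3600 * 1000).toISOString(),
  };
}
```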

ℹ️ Note: If your destination upsert is truly correct, dedup becomes a cost optimization. If your destination cannot upsert cleanly, dedup becomes a correctness requirement.

# CDC in n8n: Practical Patterns

CDC means you process changes, not snapshots. In n8n you can approximate CDC in three common ways.

## Pattern 1: Provider Webhooks

If the source supports webhooks, this is usually the most reliable low-latency option.

Workflow outline:

  1. Webhook trigger receives event.
  2. Validate signature and parse payload.
  3. Dedup by event id.
  4. Fetch full entity if payload is partial.
  5. Upsert into destination.
  6. Record checkpoint and metrics.

Key requirement: the provider must include a stable event id, delivery id, or sequence.

## Pattern 2: Change Log Endpoint

Some APIs expose endpoints like GET /changes?since=....

This is CDC-like and often more reliable than guessing with updated_at across many endpoints.

Design:

  • Store since_token or since_ts.
  • Page through changes with cursor pagination.
  • For each change, apply idempotent upsert or delete.

## Pattern 3: Poll and Diff (Last Resort)

If there is no updated_at and no change feed, you can periodically fetch a list and diff.

This is expensive and can be brittle, but can work for small datasets.

Use it only when:

  • Total entities are low enough to fetch fully.
  • The API has stable IDs.
  • You can tolerate more load and latency.
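The diff itself can be a small pure function over two id-keyed maps of content hashes. A sketch; `diffSnapshots` and its shape are illustrative:

```javascript
// Diff two snapshots keyed by id. Values should be stable hashes of the
// fields you care about. Returns ids to upsert and ids to soft-delete.
function diffSnapshots(sourceById, targetById) {
  const upserts = [];
  const deletes = [];
  for (const [id, hash] of Object.entries(sourceById)) {
    if (targetById[id] !== hash) upserts.push(id); // new or changed at source
  }
  for (const id of Object.keys(targetById)) {
    if (!(id in sourceById)) deletes.push(id);     // gone from source
  }
  return { upserts, deletes };
}
```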

# Example Workflow Designs

These examples are intentionally generic so you can adapt them to HubSpot, Stripe, Shopify, internal CRMs, or custom APIs.

## Example A: Incremental Sync with Cursor Pagination and Upsert

Goal: Sync contacts from an API to Postgres with no misses or duplicates.

Workflow steps:

  1. Cron Trigger every 5 minutes.
  2. Get state from Postgres sync_state table by sync_name.
  3. Set query window: start_ts = watermark_ts - overlap.
  4. HTTP Request fetch page with updated_at >= start_ts and cursor.
  5. Filter using watermark tie-breaker logic.
  6. Upsert into Postgres using source_id unique constraint.
  7. Update dedup keys table if needed.
  8. Advance cursor or watermark only after successful upsert.
  9. Loop until next_cursor is empty.
  10. Emit metrics: processed, inserted, updated, duplicates, duration.

Suggested state table:

| Field | Example |
| --- | --- |
| sync_name | crm_contacts |
| watermark_ts | 2026-04-28T12:10:00Z |
| watermark_id | 983244 |
| cursor | eyJwYWdlIjoyfQ... |

## Example B: Webhook CDC with Dedup and Backfill

Goal: React to changes instantly, but still guarantee eventual consistency.

Workflow steps:

  1. Webhook Trigger receives contact.updated events.
  2. Verify signature using a shared secret.
  3. Dedup check in Postgres dedup_keys on event_id.
  4. Fetch entity from API to get full canonical state.
  5. Upsert into destination.
  6. Store dedup key with TTL.
  7. If a step fails, send the event to a retry workflow or queue.

Add a daily backfill incremental sync using Example A to catch webhook delivery gaps. This is common because even good providers occasionally drop or delay webhooks.

💡 Tip: Combine CDC with periodic reconciliation. Webhooks optimize latency, incremental polls guarantee completeness.

# Handling Deletes and Hard-to-Detect Changes

Many syncs look correct until someone deletes data.

## Delete Strategies

| Source capability | Best approach | Implementation idea in n8n |
| --- | --- | --- |
| Emits delete events | CDC delete handling | Webhook or changes feed triggers soft delete in target |
| Has deleted_at | Incremental with tombstones | Treat as update, set is_deleted |
| No delete signal | Reconciliation | Periodic full ID list compare and mark missing |

For many business systems, soft deletes are safer than hard deletes. Keep an is_deleted flag and a deleted_at timestamp in your destination.

# Monitoring Metrics to Catch Drift

If you do not measure drift, you will discover it from a finance report or a user complaint. Monitoring should be part of the workflow design, not an afterthought.

For observability fundamentals and metric naming, refer to Web App Observability Guide.

## Metrics That Matter for Data Sync

Track these per run and per sync:

| Metric | Type | Why it matters | Alert suggestion |
| --- | --- | --- | --- |
| records_fetched | counter | Detect source API changes | Sudden drop to near zero |
| records_processed | counter | Validate filter logic | Drop or spike compared to baseline |
| records_inserted | counter | Detect growth | Unexpected spikes |
| records_updated | counter | Detect churn | Sustained zero may indicate failure |
| duplicates_skipped | counter | Detect overlap and retries | Spike indicates instability |
| run_duration_seconds | histogram | Performance regression | Greater than 2x baseline |
| api_429_count | counter | Rate limiting | Any sustained 429s |
| watermark_lag_seconds | gauge | How far behind you are | Greater than SLA, like 900 seconds |
| error_count | counter | Reliability | Any non-zero with paging failures |

If you are using n8n self-hosted, push metrics to Prometheus via a lightweight endpoint, or log structured JSON and derive metrics in your log pipeline.

## Drift Detection Checks

Add a reconciliation workflow daily or weekly:

  • Compare counts by day: source updates per day vs target updates per day
  • Compare sample entity hashes: pick 100 random IDs and compare canonical fields
  • Compare min and max updated_at and the watermark lag

Example reconciliation query idea:

  • Source: number of contacts updated in last 24 hours
  • Target: number of rows with updated_at in last 24 hours

If source is 10,000 and target is 8,000, you have drift and need investigation.
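That check is easy to automate. A sketch of a tolerance-based drift test (the `hasDrift` helper and the 1 percent default are illustrative; tune the tolerance to your data):

```javascript
// Flag drift when the target count diverges from the source count by more
// than a relative tolerance (a fraction, e.g. 0.01 for 1 percent).
function hasDrift(sourceCount, targetCount, tolerance = 0.01) {
  if (sourceCount === 0) return targetCount !== 0; // avoid division by zero
  return Math.abs(sourceCount - targetCount) / sourceCount > tolerance;
}
```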

⚠️ Warning: Do not rely on “workflow succeeded” as a correctness signal. A workflow can succeed while silently skipping records due to filter or pagination bugs.

# Common Pitfalls and How to Avoid Them

  1. Using offset pagination on mutable datasets — switch to cursor pagination or stable sort by updated_at plus id.
  2. Advancing watermark before commit — only commit watermark after destination write success.
  3. Using timestamps without overlap — add an overlap window and tie-breaker to avoid missing late updates.
  4. No idempotency on writes — use upserts or idempotency keys so retries are safe.
  5. No durable state — store watermark and dedup in Postgres or Redis, not only in memory.
  6. No drift monitoring — add reconciliation and metric-based alerts early.

If you need robust retry and alert patterns in n8n, implement them using the approach in n8n Error Handling, Retries, and Alerting.

# Key Takeaways

  • Prefer cursor-based pagination or stable sorting by updated_at plus id to avoid skipped or duplicated pages.
  • Implement incremental loads with overlap and store a last committed watermark, not last fetched.
  • Make every write idempotent using upserts and idempotency keys so retries and replays are safe.
  • Persist deduplication keys in Postgres or Redis to survive restarts and prevent duplicate processing across runs.
  • Add monitoring and reconciliation using metrics like watermark lag, duplicates skipped, and daily count comparisons to catch drift early.

# Conclusion

A production-grade n8n data sync workflow is a state machine: deterministic pagination, cautious watermarks, idempotent writes, and durable dedup storage turn inevitable retries into safe replays.

If you want Samioda to design and implement a reliable sync for your CRM, billing system, or internal platform, contact us with your source and destination details and your required sync SLA. We will propose a workflow architecture, state storage, and monitoring plan that prevents drift and scales with volume.
