
Web Application Observability: A Practical Guide to Logging, Metrics, and Tracing for React and Next.js

Adrijan Omičević · 15 min read

# What You’ll Build in This Guide

This guide is a practical blueprint for web application observability in a modern React and Next.js stack. The goal is not to “collect more data” but to shorten the time it takes to detect and resolve real production issues.

You’ll set up an end-to-end approach covering:

  • Error tracking for frontend and server-side failures
  • Performance monitoring for Web Vitals and backend latency
  • Structured logs you can actually query during incidents
  • Tracing to connect a slow page load to a slow database query
  • Dashboards and alerts that catch incidents users care about

You’ll also get a recommended stack, a phased instrumentation plan, and copy-pasteable examples.

# Why Web Application Observability Matters for React and Next.js

React and Next.js apps fail in ways that traditional server apps did not. A single “page load” spans browser rendering, API calls, server-side rendering, edge caching, third-party scripts, and data stores.

Without observability, teams fall into expensive patterns:

  • Bug reports become guesswork because you cannot reproduce the user’s state, route, or network conditions.
  • Performance regressions ship silently, then conversions drop and you only see it in revenue analytics days later.
  • Incidents take longer because you cannot correlate “users saw a blank screen” with “SSR errors spiked” and “database p95 latency doubled”.

Industry data consistently shows the cost of bad detection: IBM’s widely cited estimate puts the average cost of a data breach at USD 4.45 million, and detection and escalation is the largest chunk of that lifecycle cost. Observability is not only an ops concern; it is also a security and risk control. Pair this with your security baseline from Web Application Security Checklist.

Performance is equally business-critical. Google’s Web Vitals research and ecosystem benchmarks repeatedly show that improvements in LCP and INP correlate with better engagement and conversion. If you care about speed, combine this guide with Website Performance Optimization.

# Observability 101: Logs, Metrics, Traces, and What Each Solves

You need three pillars because each answers a different question:

| Signal | Best for | Question it answers | Common anti-pattern |
| --- | --- | --- | --- |
| Logs | Debugging specific events | What happened and with what context | Unstructured console spam you cannot query |
| Metrics | Detecting trends and incidents | Is it getting worse and for whom | High-cardinality labels that explode cost |
| Traces | Debugging latency across services | Where is the time spent end-to-end | Tracing everything without sampling |

A practical rule: metrics detect, traces explain, logs confirm.

🎯 Key Takeaway: Treat observability as an incident workflow tool, not a data collection project. Instrument what helps you detect and explain user impact.

# Choosing an Observability Stack

There is no single perfect stack, but you want tools that can correlate signals and support both browser and server runtimes.

## A pragmatic, widely adopted stack

| Concern | Recommended | Why it’s practical for Next.js |
| --- | --- | --- |
| Error tracking | Sentry | Excellent Next.js integration, source maps, session replay, performance spans |
| Metrics and dashboards | Grafana Cloud or Datadog | Strong alerting, SLOs, easy dashboards |
| Logs | Grafana Loki, Datadog Logs, or Elasticsearch | Structured JSON ingestion, queryable during incidents |
| Tracing | OpenTelemetry plus Grafana Tempo or Datadog APM | Vendor-neutral instrumentation with strong correlation |
| Uptime and synthetic checks | Better Uptime, Pingdom, or Datadog Synthetics | Catch outages and broken critical flows outside real-user traffic |

If you want the lowest operational overhead for a small team, a single vendor suite can reduce integration pain. If you want maximum portability and control, standardize on OpenTelemetry and pick best-of-breed backends.

## What we typically recommend at Samioda

  • Sentry for error tracking and frontend performance visibility
  • OpenTelemetry for traces and server metrics
  • Grafana for dashboards and alert rules
  • Structured JSON logs routed to Loki or a managed log platform

This gives fast time-to-value while keeping the core instrumentation portable.

ℹ️ Note: For Next.js running on Vercel, you may need to align your logging and tracing approach with serverless constraints. You can still do structured logs and Sentry reliably, and add OpenTelemetry where runtime support is available.

# Instrumentation Order: What to Add First for Fast ROI

Most teams fail by trying to instrument everything in week one. Instead, instrument in this order:

  1. Error tracking with release and commit metadata
  2. Frontend performance (Web Vitals and route transitions)
  3. Server performance for your critical API and SSR routes
  4. Structured logs with request correlation
  5. Distributed tracing across services and data stores
  6. SLO-based dashboards and actionable alerts

This order matches incident frequency. In typical production apps, the most common “stop the line” events are uncaught exceptions, broken deploys, and a single slow dependency causing global slowness.

💡 Tip: Define your top 3 user journeys before instrumentation. Examples: sign up, checkout, search. If you cannot measure these, you are blind even with perfect infrastructure metrics.
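One lightweight way to make those journeys explicit is a shared config that your logging, metrics, and dashboard code can all reference. A minimal sketch; the journey names and routes below are examples, not prescribed values:

```javascript
// Critical user journeys, defined once and reused across instrumentation.
// Names and routes are illustrative; substitute your own.
const CRITICAL_JOURNEYS = [
  { name: "signup", routes: ["/signup", "/api/signup"] },
  { name: "checkout", routes: ["/checkout", "/api/checkout"] },
  { name: "search", routes: ["/search", "/api/search"] },
];

// Look up which critical journey a route belongs to, if any.
function journeyForRoute(route) {
  const match = CRITICAL_JOURNEYS.find((j) => j.routes.includes(route));
  return match ? match.name : null;
}
```

Tagging logs and metrics with the journey name makes “is checkout healthy?” a one-line query instead of a route-by-route hunt.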

# Step 1: Error Tracking for React and Next.js

Error tracking should answer: what broke, for how many users, after which release, and how do we reproduce it.

## Set up Sentry with releases and source maps

Key best practices:

  • Upload source maps on CI so stack traces map to real code
  • Tag events with release and environment
  • Capture user identifiers carefully to respect privacy and compliance
  • Track deploys so you can correlate spikes with releases

A minimal example for initializing Sentry on the client:

```javascript
// sentry.client.config.js
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  environment: process.env.NEXT_PUBLIC_APP_ENV,
  release: process.env.NEXT_PUBLIC_APP_RELEASE,
  tracesSampleRate: 0.1,
  replaysSessionSampleRate: 0.01,
  replaysOnErrorSampleRate: 0.1,
});
```

Keep sampling conservative at first. Increase for specific routes later.
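To raise sampling for specific routes later, Sentry accepts a `tracesSampler` function in place of a fixed `tracesSampleRate`. A sketch; the rates and route-name matching below are illustrative and depend on how your transactions are named:

```javascript
// Per-route trace sampling: pass this as `tracesSampler` in Sentry.init
// instead of `tracesSampleRate`. Rates and route names are examples.
function tracesSampler(samplingContext) {
  const name = (samplingContext && samplingContext.name) || "";
  if (name.includes("/checkout")) return 0.5; // high-value flow: sample heavily
  if (name.includes("/api/health")) return 0; // drop health-check noise
  return 0.05; // conservative default everywhere else
}
```

This keeps the default cheap while giving your revenue-critical route enough traces to debug with.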

## What to capture for actionable errors

Capture context that makes issues reproducible:

| Context | Example | Why it matters |
| --- | --- | --- |
| Route and params | /checkout?plan=pro | Bugs are often route-specific |
| App version | 2026.03.31-1a2b3c | Fast rollback decisions |
| API error payload | Sanitized error code, not raw PII | Lets you classify failures |
| Device and browser | Safari iOS 17 | Mobile-only issues are common |
| Feature flags | newCheckout=true | Prevents false attribution |

⚠️ Warning: Do not attach raw request bodies or tokens to error events. It is a common privacy and security incident waiting to happen. Keep sensitive fields out and use allowlists.
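An allowlist can be as simple as copying only known-safe fields before attaching context to an event. A sketch; the field names are placeholders for your own schema:

```javascript
// Copy only explicitly allowed fields into error-event context.
// Everything not on the list (tokens, emails, request bodies) is dropped.
const ALLOWED_CONTEXT_FIELDS = ["route", "plan", "error_code", "status"];

function sanitizeContext(context) {
  const safe = {};
  for (const key of ALLOWED_CONTEXT_FIELDS) {
    if (key in context) safe[key] = context[key];
  }
  return safe;
}
```

An allowlist fails closed: a new sensitive field added by a teammate is dropped by default, whereas a denylist silently leaks it.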

## Set up alerting for error spikes and new issues

Configure alerts for:

  • New issue introduced in the last release
  • Error rate above a threshold, for example errors per minute at 3x your baseline
  • Crash-free sessions below a target, for example below 99.5 percent for consumer apps (set a higher bar for B2B SaaS)

Make sure alerts page a human only when action is required. Everything else should go to a triage channel.
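Crash-free sessions is just a ratio, which makes the threshold cheap to compute and alert on yourself if your backend does not provide it. A minimal sketch:

```javascript
// Percentage of sessions that ended without a crash.
function crashFreeSessionsPct(totalSessions, crashedSessions) {
  if (totalSessions === 0) return 100; // no traffic, nothing to alert on
  return ((totalSessions - crashedSessions) / totalSessions) * 100;
}
```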

# Step 2: Performance Monitoring that Maps to UX and Revenue

Performance monitoring is not “CPU and memory”. For web apps, it is user experience.

## Track Web Vitals and route-level performance

At minimum track:

  • LCP for loading experience
  • INP for interaction latency
  • CLS for visual stability
  • TTFB for server and cache responsiveness

Next.js supports reporting Web Vitals out of the box: `reportWebVitals` in the pages router, or the `useReportWebVitals` hook in the App Router. You can forward these to your observability backend.

Example: capturing Web Vitals and sending to an API route:

```javascript
// pages/_app.js
export function reportWebVitals(metric) {
  const body = JSON.stringify(metric);
  // sendBeacon survives page unload; fall back to fetch with keepalive
  // for browsers that do not support it.
  if (navigator.sendBeacon) {
    navigator.sendBeacon("/api/vitals", body);
  } else {
    fetch("/api/vitals", { body, method: "POST", keepalive: true });
  }
}
```

Then store aggregated metrics server-side.
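Aggregation here usually means percentiles rather than averages. A minimal nearest-rank percentile helper you could run in that API route or a worker; a real backend would use histograms or a time-series store instead:

```javascript
// Nearest-rank percentile over a list of samples, e.g. LCP values in ms.
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}
```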

## Set thresholds that reflect reality

Use thresholds aligned with Google guidance and business requirements:

| Metric | Good target | Investigate when | Common root causes |
| --- | --- | --- | --- |
| LCP | 2.5 seconds or less | greater than 4 seconds | large hero images, render-blocking scripts, slow SSR |
| INP | 200 ms or less | greater than 500 ms | heavy JS, long tasks, chat widgets |
| CLS | 0.1 or less | greater than 0.25 | late-loading fonts, dynamic banners |
| TTFB | 800 ms or less | greater than 1.8 seconds | cold starts, DB latency, cache misses |

If your app is content-heavy, LCP is usually the first win. If it is dashboard-like, INP tends to drive perceived quality.

For deeper optimization tactics, connect this to Website Performance Optimization.

## Monitor real user flows, not just averages

Averages hide pain. Instrument p75 and p95 for:

  • /login
  • /checkout
  • /search
  • any SSR route with data fetching

Most performance incidents show up in the tail. A small p95 regression can impact conversions disproportionately.

# Step 3: Structured Logging You Can Query During Incidents

Logs should be structured, consistent, and correlated to a request or trace. Console strings are not enough.

## Define a JSON log schema

Start with a schema like this:

| Field | Type | Example | Notes |
| --- | --- | --- | --- |
| timestamp | string | 2026-03-31T10:20:30Z | ISO 8601 |
| level | string | info | info, warn, error |
| message | string | Checkout API failed | short, stable |
| service | string | web | useful in multi-service setups |
| env | string | production | prod, staging |
| request_id | string | req_9f... | correlate logs |
| trace_id | string | 4bf9... | correlate with traces |
| user_id | string | u_123 | avoid emails if possible |
| route | string | /api/checkout | route template preferred |
| duration_ms | number | 812 | for latency |
| error_code | string | PAYMENT_TIMEOUT | stable categorization |

## Implement a lightweight logger in Next.js

Keep it simple and consistent:

```javascript
// lib/logger.js
export function log(level, message, context = {}) {
  const entry = {
    timestamp: new Date().toISOString(),
    level,
    message,
    service: "web",
    env: process.env.APP_ENV,
    ...context,
  };
  console[level === "error" ? "error" : "log"](JSON.stringify(entry));
}
```

In API routes:

```javascript
// pages/api/checkout.js
import { log } from "../../lib/logger";

export default async function handler(req, res) {
  const start = Date.now();
  const requestId =
    req.headers["x-request-id"] || `req_${Math.random().toString(16).slice(2)}`;

  try {
    // ... your logic
    log("info", "Checkout request", {
      request_id: requestId,
      route: "/api/checkout",
      duration_ms: Date.now() - start,
    });
    res.status(200).json({ ok: true });
  } catch (e) {
    log("error", "Checkout failed", {
      request_id: requestId,
      route: "/api/checkout",
      duration_ms: Date.now() - start,
      error_message: e?.message,
    });
    res.status(500).json({ ok: false });
  }
}
```

This produces logs you can filter by request_id, route, and duration.

💡 Tip: Prefer stable error_code values over raw exception messages. Error codes let you build reliable dashboards and alerts, and reduce noise from message variations.
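One way to get stable codes is a small classifier that maps raw exceptions to your own vocabulary. A sketch; the codes and patterns below are examples you would tailor to your actual dependencies:

```javascript
// Map raw errors to stable error_code values for logs and dashboards.
function classifyError(err) {
  const msg = (err && err.message) || "";
  if (/timeout|timed out/i.test(msg)) return "TIMEOUT";
  if (/ECONNREFUSED|ENOTFOUND|ECONNRESET/.test(msg)) return "UPSTREAM_UNAVAILABLE";
  return "UNKNOWN";
}
```

Dashboards grouped by these codes stay stable even when upstream libraries reword their error messages.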

# Step 4: Metrics That Actually Detect Production Issues

Metrics should be low cardinality and aligned with user impact.

## Core metrics to start with

| Metric | Type | Dimensions to keep | Why it detects real issues |
| --- | --- | --- | --- |
| Request rate | counter | route, method, status | Traffic changes and spikes |
| Error rate | counter | route, status, error_code | Broken deploys and dependency outages |
| Latency | histogram | route, status | Slow APIs and SSR regressions |
| Cache hit rate | gauge | cache layer | Unexpected cache misses |
| Queue lag | gauge | worker name | Backlog leading to timeouts |
| Third-party latency | histogram | provider name | Payment, email, maps failures |

Avoid dimensions like full URL with ids, user ids, or session ids. Those belong in logs or traces.
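A small normalizer that collapses ids into route templates keeps metric labels low cardinality. A sketch; the patterns are heuristics, not a complete solution:

```javascript
// Replace numeric and hash-like path segments with :id and drop the query
// string, so "/orders/12345?ref=x" and "/orders/67890" share one label.
function routeTemplate(path) {
  return path
    .split("?")[0]
    .split("/")
    .map((seg) => (/^(\d+|[0-9a-f]{8,})$/i.test(seg) ? ":id" : seg))
    .join("/");
}
```

In Next.js you can often skip this and use the route template the framework already knows (for example `/orders/[id]`).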

## Create an SLI and SLO per critical user journey

Even if you do not roll out full SRE practices, define one SLO:

  • Checkout availability: at least 99.9 percent of requests return 200 or 201 in 30 days
  • Checkout latency: p95 less than 1.5 seconds for /api/checkout

SLOs keep dashboards focused. They also make alert tuning easier.
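An SLO also gives you an error budget: the number of failures you can tolerate in the window before the target is blown. A sketch of the arithmetic for an availability SLO:

```javascript
// Fraction of the error budget still remaining for an availability SLO.
// slo = 0.999 means 0.1 percent of requests in the window may fail.
function errorBudgetRemaining(totalRequests, failedRequests, slo = 0.999) {
  const allowedFailures = totalRequests * (1 - slo);
  if (allowedFailures === 0) return failedRequests === 0 ? 1 : 0;
  return (allowedFailures - failedRequests) / allowedFailures;
}
```

With 1,000,000 requests and a 99.9 percent SLO you may fail about 1,000 of them; 500 failures leaves roughly half the budget. Alerting on budget burn rate, rather than raw error rate, scales naturally with traffic.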

# Step 5: Distributed Tracing for Next.js and Downstream Services

Tracing answers one question: where did the time go?

When tracing becomes mandatory#

Add tracing when you see these patterns:

  • You can detect slowdowns in metrics, but root cause takes hours
  • Problems span multiple services, for example SSR calls API, API calls DB and payment provider
  • You need to prove whether the bottleneck is app code or dependency latency

## Use OpenTelemetry and propagate context

The key is consistent context propagation with trace_id. Even if you start with server-only tracing, the ability to correlate slow requests to logs is a big win.
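If one hop in the chain is not auto-instrumented, you can still carry context manually via the W3C `traceparent` header, whose format is `version-traceid-parentid-flags`. A minimal sketch of building the header value; the hex ids in the test are placeholders:

```javascript
// Build a W3C traceparent header value: 00-<trace-id>-<parent-id>-<flags>.
// trace-id is 32 hex chars, parent-id is 16, flags 01 = sampled.
function buildTraceparent(traceId, parentId, sampled = true) {
  return `00-${traceId}-${parentId}-${sampled ? "01" : "00"}`;
}
```

In practice, prefer OpenTelemetry's propagation API so injection and extraction stay consistent across services.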

A simplified example of creating spans around a downstream call:

```javascript
// lib/instrumentation.js
import { trace } from "@opentelemetry/api";

export async function tracedFetch(url, options = {}) {
  const tracer = trace.getTracer("web");
  return tracer.startActiveSpan("http.client", async (span) => {
    try {
      span.setAttribute("http.url", url);
      const res = await fetch(url, options);
      span.setAttribute("http.status_code", res.status);
      return res;
    } catch (e) {
      span.recordException(e);
      throw e;
    } finally {
      span.end();
    }
  });
}
```

Use this around your critical dependencies: payments, search, auth, and CMS.

⚠️ Warning: Trace data can become expensive fast. Sample aggressively and focus on critical routes. A common approach is 1 percent sampling globally and 10 percent for checkout.

# Step 6: Dashboards That Catch Real Incidents

Dashboards should answer: are users impacted, where, and since when.

## The three dashboards every Next.js team should have

1) Executive user-impact dashboard

Focus: outcomes, not internals.

Panels to include:

  • Error rate over time, split by environment
  • p75 and p95 LCP and INP over time
  • p95 SSR latency for key routes
  • Checkout success rate and latency

2) Release health dashboard

Focus: “did the deploy break anything”.

Panels to include:

  • Errors per minute by release
  • New issues count by release
  • p95 latency change compared to previous release
  • Web Vitals deltas

Tie this into your delivery workflow from Web Development Process Step-by-Step.

3) Dependency health dashboard

Focus: upstream and downstream reliability.

Panels to include:

  • Payment provider latency and errors
  • Email provider latency and errors
  • Database p95 latency and connection saturation
  • Cache hit ratio and evictions

## A simple, effective alert matrix

Alerting should map to user harm:

| Alert | Trigger | Severity | First action |
| --- | --- | --- | --- |
| Checkout error spike | error rate greater than 2 percent for 5 minutes | Page | rollback or disable feature flag |
| SSR 500 spike | 500s greater than 3x baseline for 10 minutes | Page | check recent deploy and logs |
| p95 API latency regression | p95 increases by 50 percent for 15 minutes | Page during business hours | check dependency dashboard and traces |
| Web Vitals regression | LCP p75 worsens by 20 percent day-over-day | Ticket | investigate bundle and images |
| Background job backlog | queue lag greater than 5 minutes for 10 minutes | Page | scale workers or fix stuck jobs |

Make sure each alert has an owner and a runbook link.

ℹ️ Note: Many teams set static latency thresholds and get alert fatigue. Prefer alerts based on change from baseline, combined with absolute guardrails.
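Baseline-relative alerting plus an absolute guardrail can be expressed in one predicate. A sketch; the factor and floor are example values to tune per metric:

```javascript
// Fire only when the value is meaningfully large in absolute terms AND
// well above its recent baseline, so 3x spikes on tiny numbers stay quiet.
function shouldAlert(current, baseline, factor = 3, minAbsolute = 10) {
  return current >= minAbsolute && current > baseline * factor;
}
```

For example, 2 errors per minute against a baseline of 0.5 stays silent, while 35 against a baseline of 10 pages someone.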

# Step 7: What to Instrument First in a Real Project

If you have one week, do this:

  1. Sentry for frontend and Next.js server errors, with releases and source maps
  2. Web Vitals reporting to your metrics backend
  3. Structured JSON logs from API routes and SSR, with request ids
  4. One dashboard: release health plus checkout journey
  5. Two paging alerts: checkout errors and SSR 500 spikes

If you have one month, add:

  • OpenTelemetry traces for critical routes
  • Dependency metrics for payment, auth, and database
  • SLOs and error budget alerting
  • Synthetic monitoring for sign-in and checkout

This creates a compounding effect: each new signal becomes more valuable because you can correlate it.

# Common Pitfalls and How to Avoid Them

  1. Collecting too much data without questions
     Start from failure modes. Example: “users cannot pay”, “homepage is slow on mobile”, “SSR returns 500”.
  2. Putting high-cardinality data into metrics
     Avoid user ids and full URLs in metric labels. Keep them in logs and traces.
  3. No correlation ids across logs, errors, and traces
     A request_id and trace_id should show up everywhere. Without this, debugging becomes manual.
  4. Alerts without runbooks
     If an alert fires at 3 AM, the responder needs a next step in 30 seconds.
  5. Ignoring security in telemetry
     Telemetry often contains sensitive data. Apply the same rigor as in your security practices from Web Application Security Checklist.

# Key Takeaways

  • Start your web application observability setup with error tracking and release health, then add performance, logs, and tracing in phases.
  • Instrument Web Vitals and critical user journeys using p75 and p95, not only averages.
  • Use structured JSON logs with stable error_code, request_id, and trace_id to make incidents searchable and fast to debug.
  • Keep metrics low cardinality and aligned with SLOs so dashboards and alerts reflect user impact.
  • Build three dashboards that map to reality: user impact, release health, and dependency health, then wire alerts to clear runbooks.

# Conclusion

A solid web application observability foundation for React and Next.js is not complicated, but it must be intentional. Start with high-signal instrumentation, correlate everything with request and trace identifiers, and build dashboards and alerts around the user journeys that drive revenue.

If you want Samioda to implement an end-to-end observability stack for your Next.js app, including Sentry, OpenTelemetry, dashboards, and actionable alerts, contact us and we’ll ship a production-ready setup aligned with your release process and performance goals.
