
Web Application Observability: A Practical Guide to Logging, Metrics, and Tracing for React and Next.js

Adrijan Omičević · 15 min read

# What You’ll Build in This Guide

This guide is a practical blueprint for web application observability in a modern React and Next.js stack. The goal is not to “collect more data” but to shorten the time it takes to detect and resolve real production issues.

You’ll set up an end-to-end approach covering:

  • Error tracking for frontend and server-side failures
  • Performance monitoring for Web Vitals and backend latency
  • Structured logs you can actually query during incidents
  • Tracing to connect a slow page load to a slow database query
  • Dashboards and alerts that catch incidents users care about

You’ll also get a recommended stack, a phased instrumentation plan, and copy-pasteable examples.

# Why Web Application Observability Matters for React and Next.js

React and Next.js apps fail in ways that traditional server apps did not. A single “page load” spans browser rendering, API calls, server-side rendering, edge caching, third-party scripts, and data stores.

Without observability, teams fall into expensive patterns:

  • Bug reports become guesswork because you cannot reproduce the user’s state, route, or network conditions.
  • Performance regressions ship silently, then conversions drop and you only see it in revenue analytics days later.
  • Incidents take longer because you cannot correlate “users saw a blank screen” with “SSR errors spiked” and “database p95 latency doubled”.

Industry data consistently shows the cost of bad detection: IBM’s widely cited estimate puts the average cost of a data breach at USD 4.45 million, and detection and escalation is the largest chunk of that lifecycle cost. Observability is not only an ops concern; it is also a security and risk control. Pair this with your security baseline from Web Application Security Checklist.

Performance is equally business-critical. Google’s Web Vitals research and ecosystem benchmarks repeatedly show that improvements in LCP and INP correlate with better engagement and conversion. If you care about speed, combine this guide with Website Performance Optimization.

# Observability 101: Logs, Metrics, Traces, and What Each Solves

You need three pillars because each answers a different question:

| Signal | Best for | Question it answers | Common anti-pattern |
| --- | --- | --- | --- |
| Logs | Debugging specific events | What happened and with what context | Unstructured console spam you cannot query |
| Metrics | Detecting trends and incidents | Is it getting worse and for whom | High-cardinality labels that explode cost |
| Traces | Debugging latency across services | Where is the time spent end-to-end | Tracing everything without sampling |

A practical rule: metrics detect, traces explain, logs confirm.

🎯 Key Takeaway: Treat observability as an incident workflow tool, not a data collection project. Instrument what helps you detect and explain user impact.

# Choosing an Observability Stack

There is no single perfect stack, but you want tools that can correlate signals and support both browser and server runtimes.

## A pragmatic, widely adopted stack

| Concern | Recommended | Why it’s practical for Next.js |
| --- | --- | --- |
| Error tracking | Sentry | Excellent Next.js integration, source maps, session replay, performance spans |
| Metrics and dashboards | Grafana Cloud or Datadog | Strong alerting, SLOs, easy dashboards |
| Logs | Grafana Loki, Datadog Logs, or Elasticsearch | Structured JSON ingestion, queryable during incidents |
| Tracing | OpenTelemetry plus Grafana Tempo or Datadog APM | Vendor-neutral instrumentation with strong correlation |
| Uptime and synthetic checks | Better Uptime, Pingdom, or Datadog Synthetics | Catch outages and broken critical flows outside real-user traffic |

If you want the lowest operational overhead for a small team, a single vendor suite can reduce integration pain. If you want maximum portability and control, standardize on OpenTelemetry and pick best-of-breed backends.

## What we typically recommend at Samioda

  • Sentry for error tracking and frontend performance visibility
  • OpenTelemetry for traces and server metrics
  • Grafana for dashboards and alert rules
  • Structured JSON logs routed to Loki or a managed log platform

This gives fast time-to-value while keeping the core instrumentation portable.

ℹ️ Note: For Next.js running on Vercel, you may need to align your logging and tracing approach with serverless constraints. You can still do structured logs and Sentry reliably, and add OpenTelemetry where runtime support is available.

# Instrumentation Order: What to Add First for Fast ROI

Most teams fail by trying to instrument everything in week one. Instead, instrument in this order:

  1. Error tracking with release and commit metadata
  2. Frontend performance (Web Vitals and route transitions)
  3. Server performance for your critical API and SSR routes
  4. Structured logs with request correlation
  5. Distributed tracing across services and data stores
  6. SLO-based dashboards and actionable alerts

This order matches incident frequency. In typical production apps, the most common “stop the line” events are uncaught exceptions, broken deploys, and a single slow dependency causing global slowness.

💡 Tip: Define your top 3 user journeys before instrumentation. Examples: sign up, checkout, search. If you cannot measure these, you are blind even with perfect infrastructure metrics.
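One lightweight way to make those journeys explicit is a shared config that your logging, metrics, and dashboard code can all reference. A minimal sketch; the journey names and routes below are examples, not prescribed values:

```javascript
// Critical user journeys, defined once and reused across instrumentation.
// Names and routes are illustrative; substitute your own.
const CRITICAL_JOURNEYS = [
  { name: "signup", routes: ["/signup", "/api/signup"] },
  { name: "checkout", routes: ["/checkout", "/api/checkout"] },
  { name: "search", routes: ["/search", "/api/search"] },
];

// Look up which critical journey a route belongs to, if any.
function journeyForRoute(route) {
  const match = CRITICAL_JOURNEYS.find((j) => j.routes.includes(route));
  return match ? match.name : null;
}
```

Tagging logs and metrics with the journey name makes “is checkout healthy?” a one-line query instead of a route-by-route hunt.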

# Step 1: Error Tracking for React and Next.js

Error tracking should answer: what broke, for how many users, after which release, and how do we reproduce it.

## Set up Sentry with releases and source maps

Key best practices:

  • Upload source maps on CI so stack traces map to real code
  • Tag events with release and environment
  • Capture user identifiers carefully to respect privacy and compliance
  • Track deploys so you can correlate spikes with releases

A minimal example for initializing Sentry on the client:

```javascript
// sentry.client.config.js
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.NEXT_PUBLIC_SENTRY_DSN,
  environment: process.env.NEXT_PUBLIC_APP_ENV,
  release: process.env.NEXT_PUBLIC_APP_RELEASE,
  tracesSampleRate: 0.1,
  replaysSessionSampleRate: 0.01,
  replaysOnErrorSampleRate: 0.1,
});
```

Keep sampling conservative at first. Increase for specific routes later.
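To raise sampling for specific routes later, Sentry accepts a `tracesSampler` function in place of a fixed `tracesSampleRate`. A sketch; the rates and route-name matching below are illustrative and depend on how your transactions are named:

```javascript
// Per-route trace sampling: pass this as `tracesSampler` in Sentry.init
// instead of `tracesSampleRate`. Rates and route names are examples.
function tracesSampler(samplingContext) {
  const name = (samplingContext && samplingContext.name) || "";
  if (name.includes("/checkout")) return 0.5; // high-value flow: sample heavily
  if (name.includes("/api/health")) return 0; // drop health-check noise
  return 0.05; // conservative default everywhere else
}
```

This keeps the default cheap while giving your revenue-critical route enough traces to debug with.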

## What to capture for actionable errors

Capture context that makes issues reproducible:

| Context | Example | Why it matters |
| --- | --- | --- |
| Route and params | /checkout?plan=pro | Bugs are often route-specific |
| App version | 2026.03.31-1a2b3c | Fast rollback decisions |
| API error payload | Sanitized error code, not raw PII | Lets you classify failures |
| Device and browser | Safari iOS 17 | Mobile-only issues are common |
| Feature flags | newCheckout=true | Prevents false attribution |

⚠️ Warning: Do not attach raw request bodies or tokens to error events. It is a common privacy and security incident waiting to happen. Keep sensitive fields out and use allowlists.
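An allowlist can be as simple as copying only known-safe fields before attaching context to an event. A sketch; the field names are placeholders for your own schema:

```javascript
// Copy only explicitly allowed fields into error-event context.
// Everything not on the list (tokens, emails, request bodies) is dropped.
const ALLOWED_CONTEXT_FIELDS = ["route", "plan", "error_code", "status"];

function sanitizeContext(context) {
  const safe = {};
  for (const key of ALLOWED_CONTEXT_FIELDS) {
    if (key in context) safe[key] = context[key];
  }
  return safe;
}
```

An allowlist fails closed: a new sensitive field added by a teammate is dropped by default, whereas a denylist silently leaks it.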

## Set up alerting for error spikes and new issues

Configure alerts for:

  • New issue introduced in the last release
  • Error rate above a threshold, for example errors per minute at 3x your baseline
  • Crash-free sessions below a target, for example below 99.5 percent for consumer apps (set a higher bar for B2B SaaS)

Make sure alerts page a human only when action is required. Everything else should go to a triage channel.
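Crash-free sessions is just a ratio, which makes the threshold cheap to compute and alert on yourself if your backend does not provide it. A minimal sketch:

```javascript
// Percentage of sessions that ended without a crash.
function crashFreeSessionsPct(totalSessions, crashedSessions) {
  if (totalSessions === 0) return 100; // no traffic, nothing to alert on
  return ((totalSessions - crashedSessions) / totalSessions) * 100;
}
```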

# Step 2: Performance Monitoring that Maps to UX and Revenue

Performance monitoring is not “CPU and memory”. For web apps, it is user experience.

## Track Web Vitals and route-level performance

At minimum track:

  • LCP for loading experience
  • INP for interaction latency
  • CLS for visual stability
  • TTFB for server and cache responsiveness

Next.js supports reporting Web Vitals out of the box: `reportWebVitals` in the pages router, or the `useReportWebVitals` hook in the App Router. You can forward these to your observability backend.

Example: capturing Web Vitals and sending to an API route:

```javascript
// pages/_app.js
export function reportWebVitals(metric) {
  const body = JSON.stringify(metric);
  // sendBeacon survives page unload; fall back to fetch with keepalive
  // for browsers that do not support it.
  if (navigator.sendBeacon) {
    navigator.sendBeacon("/api/vitals", body);
  } else {
    fetch("/api/vitals", { body, method: "POST", keepalive: true });
  }
}
```

Then store aggregated metrics server-side.
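Aggregation here usually means percentiles rather than averages. A minimal nearest-rank percentile helper you could run in that API route or a worker; a real backend would use histograms or a time-series store instead:

```javascript
// Nearest-rank percentile over a list of samples, e.g. LCP values in ms.
function percentile(values, p) {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}
```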

## Set thresholds that reflect reality

Use thresholds aligned with Google guidance and business requirements:

| Metric | Good target | Investigate when | Common root causes |
| --- | --- | --- | --- |
| LCP | 2.5 seconds or less | greater than 4 seconds | large hero images, render-blocking scripts, slow SSR |
| INP | 200 ms or less | greater than 500 ms | heavy JS, long tasks, chat widgets |
| CLS | 0.1 or less | greater than 0.25 | late-loading fonts, dynamic banners |
| TTFB | 800 ms or less | greater than 1.8 seconds | cold starts, DB latency, cache misses |

If your app is content-heavy, LCP is usually the first win. If it is dashboard-like, INP tends to drive perceived quality.

For deeper optimization tactics, connect this to Website Performance Optimization.

## Monitor real user flows, not just averages

Averages hide pain. Instrument p75 and p95 for:

  • /login
  • /checkout
  • /search
  • any SSR route with data fetching

Most performance incidents show up in the tail. A small p95 regression can impact conversions disproportionately.

# Step 3: Structured Logging You Can Query During Incidents

Logs should be structured, consistent, and correlated to a request or trace. Console strings are not enough.

## Define a JSON log schema

Start with a schema like this:

| Field | Type | Example | Notes |
| --- | --- | --- | --- |
| timestamp | string | 2026-03-31T10:20:30Z | ISO 8601 |
| level | string | info | info, warn, error |
| message | string | Checkout API failed | short, stable |
| service | string | web | useful in multi-service setups |
| env | string | production | prod, staging |
| request_id | string | req_9f... | correlate logs |
| trace_id | string | 4bf9... | correlate with traces |
| user_id | string | u_123 | avoid emails if possible |
| route | string | /api/checkout | route template preferred |
| duration_ms | number | 812 | for latency |
| error_code | string | PAYMENT_TIMEOUT | stable categorization |

## Implement a lightweight logger in Next.js

Keep it simple and consistent:

```javascript
// lib/logger.js
export function log(level, message, context = {}) {
  const entry = {
    timestamp: new Date().toISOString(),
    level,
    message,
    service: "web",
    env: process.env.APP_ENV,
    ...context,
  };
  console[level === "error" ? "error" : "log"](JSON.stringify(entry));
}
```

In API routes:

```javascript
// pages/api/checkout.js
import { log } from "../../lib/logger";

export default async function handler(req, res) {
  const start = Date.now();
  const requestId =
    req.headers["x-request-id"] || `req_${Math.random().toString(16).slice(2)}`;

  try {
    // ... your logic
    log("info", "Checkout request", {
      request_id: requestId,
      route: "/api/checkout",
      duration_ms: Date.now() - start,
    });
    res.status(200).json({ ok: true });
  } catch (e) {
    log("error", "Checkout failed", {
      request_id: requestId,
      route: "/api/checkout",
      duration_ms: Date.now() - start,
      error_message: e?.message,
    });
    res.status(500).json({ ok: false });
  }
}
```

This produces logs you can filter by request_id, route, and duration.

💡 Tip: Prefer stable error_code values over raw exception messages. Error codes let you build reliable dashboards and alerts, and reduce noise from message variations.
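One way to get stable codes is a small classifier that maps raw exceptions to your own vocabulary. A sketch; the codes and patterns below are examples you would tailor to your actual dependencies:

```javascript
// Map raw errors to stable error_code values for logs and dashboards.
function classifyError(err) {
  const msg = (err && err.message) || "";
  if (/timeout|timed out/i.test(msg)) return "TIMEOUT";
  if (/ECONNREFUSED|ENOTFOUND|ECONNRESET/.test(msg)) return "UPSTREAM_UNAVAILABLE";
  return "UNKNOWN";
}
```

Dashboards grouped by these codes stay stable even when upstream libraries reword their error messages.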

# Step 4: Metrics That Actually Detect Production Issues

Metrics should be low cardinality and aligned with user impact.

## Core metrics to start with

| Metric | Type | Dimensions to keep | Why it detects real issues |
| --- | --- | --- | --- |
| Request rate | counter | route, method, status | Traffic changes and spikes |
| Error rate | counter | route, status, error_code | Broken deploys and dependency outages |
| Latency | histogram | route, status | Slow APIs and SSR regressions |
| Cache hit rate | gauge | cache layer | Unexpected cache misses |
| Queue lag | gauge | worker name | Backlog leading to timeouts |
| Third-party latency | histogram | provider name | Payment, email, maps failures |

Avoid dimensions like full URL with ids, user ids, or session ids. Those belong in logs or traces.
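A small normalizer that collapses ids into route templates keeps metric labels low cardinality. A sketch; the patterns are heuristics, not a complete solution:

```javascript
// Replace numeric and hash-like path segments with :id and drop the query
// string, so "/orders/12345?ref=x" and "/orders/67890" share one label.
function routeTemplate(path) {
  return path
    .split("?")[0]
    .split("/")
    .map((seg) => (/^(\d+|[0-9a-f]{8,})$/i.test(seg) ? ":id" : seg))
    .join("/");
}
```

In Next.js you can often skip this and use the route template the framework already knows (for example `/orders/[id]`).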

## Create an SLI and SLO per critical user journey

Even if you do not roll out full SRE practices, define one SLO:

  • Checkout availability: at least 99.9 percent of requests return 200 or 201 in 30 days
  • Checkout latency: p95 less than 1.5 seconds for /api/checkout

SLOs keep dashboards focused. They also make alert tuning easier.
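An SLO also gives you an error budget: the number of failures you can tolerate in the window before the target is blown. A sketch of the arithmetic for an availability SLO:

```javascript
// Fraction of the error budget still remaining for an availability SLO.
// slo = 0.999 means 0.1 percent of requests in the window may fail.
function errorBudgetRemaining(totalRequests, failedRequests, slo = 0.999) {
  const allowedFailures = totalRequests * (1 - slo);
  if (allowedFailures === 0) return failedRequests === 0 ? 1 : 0;
  return (allowedFailures - failedRequests) / allowedFailures;
}
```

With 1,000,000 requests and a 99.9 percent SLO you may fail about 1,000 of them; 500 failures leaves roughly half the budget. Alerting on budget burn rate, rather than raw error rate, scales naturally with traffic.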

# Step 5: Distributed Tracing for Next.js and Downstream Services

Tracing answers one question: where did the time go?

When tracing becomes mandatory#

Add tracing when you see these patterns:

  • You can detect slowdowns in metrics, but root cause takes hours
  • Problems span multiple services, for example SSR calls API, API calls DB and payment provider
  • You need to prove whether the bottleneck is app code or dependency latency

## Use OpenTelemetry and propagate context

The key is consistent context propagation with trace_id. Even if you start with server-only tracing, the ability to correlate slow requests to logs is a big win.
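If one hop in the chain is not auto-instrumented, you can still carry context manually via the W3C `traceparent` header, whose format is `version-traceid-parentid-flags`. A minimal sketch of building the header value; the hex ids in the test are placeholders:

```javascript
// Build a W3C traceparent header value: 00-<trace-id>-<parent-id>-<flags>.
// trace-id is 32 hex chars, parent-id is 16, flags 01 = sampled.
function buildTraceparent(traceId, parentId, sampled = true) {
  return `00-${traceId}-${parentId}-${sampled ? "01" : "00"}`;
}
```

In practice, prefer OpenTelemetry's propagation API so injection and extraction stay consistent across services.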

A simplified example of creating spans around a downstream call:

```javascript
// lib/instrumentation.js
import { trace } from "@opentelemetry/api";

export async function tracedFetch(url, options = {}) {
  const tracer = trace.getTracer("web");
  return tracer.startActiveSpan("http.client", async (span) => {
    try {
      span.setAttribute("http.url", url);
      const res = await fetch(url, options);
      span.setAttribute("http.status_code", res.status);
      return res;
    } catch (e) {
      span.recordException(e);
      throw e;
    } finally {
      span.end();
    }
  });
}
```

Use this around your critical dependencies: payments, search, auth, and CMS.

⚠️ Warning: Trace data can become expensive fast. Sample aggressively and focus on critical routes. A common approach is 1 percent sampling globally and 10 percent for checkout.

# Step 6: Dashboards That Catch Real Incidents

Dashboards should answer: are users impacted, where, and since when.

## The three dashboards every Next.js team should have

1) Executive user-impact dashboard

Focus: outcomes, not internals.

Panels to include:

  • Error rate over time, split by environment
  • p75 and p95 LCP and INP over time
  • p95 SSR latency for key routes
  • Checkout success rate and latency

2) Release health dashboard

Focus: “did the deploy break anything”.

Panels to include:

  • Errors per minute by release
  • New issues count by release
  • p95 latency change compared to previous release
  • Web Vitals deltas

Tie this into your delivery workflow from Web Development Process Step-by-Step.

3) Dependency health dashboard

Focus: upstream and downstream reliability.

Panels to include:

  • Payment provider latency and errors
  • Email provider latency and errors
  • Database p95 latency and connection saturation
  • Cache hit ratio and evictions

## A simple, effective alert matrix

Alerting should map to user harm:

| Alert | Trigger | Severity | First action |
| --- | --- | --- | --- |
| Checkout error spike | error rate greater than 2 percent for 5 minutes | Page | rollback or disable feature flag |
| SSR 500 spike | 500s greater than 3x baseline for 10 minutes | Page | check recent deploy and logs |
| p95 API latency regression | p95 increases by 50 percent for 15 minutes | Page during business hours | check dependency dashboard and traces |
| Web Vitals regression | LCP p75 worsens by 20 percent day-over-day | Ticket | investigate bundle and images |
| Background job backlog | queue lag greater than 5 minutes for 10 minutes | Page | scale workers or fix stuck jobs |

Make sure each alert has an owner and a runbook link.

ℹ️ Note: Many teams set static latency thresholds and get alert fatigue. Prefer alerts based on change from baseline, combined with absolute guardrails.
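Baseline-relative alerting plus an absolute guardrail can be expressed in one predicate. A sketch; the factor and floor are example values to tune per metric:

```javascript
// Fire only when the value is meaningfully large in absolute terms AND
// well above its recent baseline, so 3x spikes on tiny numbers stay quiet.
function shouldAlert(current, baseline, factor = 3, minAbsolute = 10) {
  return current >= minAbsolute && current > baseline * factor;
}
```

For example, 2 errors per minute against a baseline of 0.5 stays silent, while 35 against a baseline of 10 pages someone.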

# Step 7: What to Instrument First in a Real Project

If you have one week, do this:

  1. Sentry for frontend and Next.js server errors, with releases and source maps
  2. Web Vitals reporting to your metrics backend
  3. Structured JSON logs from API routes and SSR, with request ids
  4. One dashboard: release health plus checkout journey
  5. Two paging alerts: checkout errors and SSR 500 spikes

If you have one month, add:

  • OpenTelemetry traces for critical routes
  • Dependency metrics for payment, auth, and database
  • SLOs and error budget alerting
  • Synthetic monitoring for sign-in and checkout

This creates a compounding effect: each new signal becomes more valuable because you can correlate it.

# Common Pitfalls and How to Avoid Them

  1. Collecting too much data without questions
     Start from failure modes. Example: “users cannot pay”, “homepage is slow on mobile”, “SSR returns 500”.
  2. Putting high-cardinality data into metrics
     Avoid user ids and full URLs in metric labels. Keep them in logs and traces.
  3. No correlation ids across logs, errors, and traces
     A request_id and trace_id should show up everywhere. Without this, debugging becomes manual.
  4. Alerts without runbooks
     If an alert fires at 3 AM, the responder needs a next step in 30 seconds.
  5. Ignoring security in telemetry
     Telemetry often contains sensitive data. Apply the same rigor as in your security practices from Web Application Security Checklist.

# Key Takeaways

  • Start your web application observability setup with error tracking and release health, then add performance, logs, and tracing in phases.
  • Instrument Web Vitals and critical user journeys using p75 and p95, not only averages.
  • Use structured JSON logs with stable error_code, request_id, and trace_id to make incidents searchable and fast to debug.
  • Keep metrics low cardinality and aligned with SLOs so dashboards and alerts reflect user impact.
  • Build three dashboards that map to reality: user impact, release health, and dependency health, then wire alerts to clear runbooks.

# Conclusion

A solid web application observability foundation for React and Next.js is not complicated, but it must be intentional. Start with high-signal instrumentation, correlate everything with request and trace identifiers, and build dashboards and alerts around the user journeys that drive revenue.

If you want Samioda to implement an end-to-end observability stack for your Next.js app, including Sentry, OpenTelemetry, dashboards, and actionable alerts, contact us and we’ll ship a production-ready setup aligned with your release process and performance goals.
