As developers, we often aim to optimize our applications for performance and resilience—but how do we truly know if our systems are working at their best? The answer lies in effective monitoring. In this tutorial, we’ll walk through setting up a powerful monitoring stack for a Node.js application using OpenTelemetry for tracing, Prometheus for metric collection, and Grafana for beautiful, real-time visualizations. By the end, you’ll have a system that gives you deep insights into your application’s health, focusing on monitoring specific endpoints and visualizing key metrics—all within a dynamic Grafana dashboard.
Why Monitoring Matters
Imagine an e-commerce platform where even a few milliseconds of delay can impact customer satisfaction and revenue. Or consider a backend service processing thousands of requests per second—one small error, and performance degradation can ripple through the entire system. Monitoring is your gateway to understanding how your application performs in real-world conditions. With proper monitoring, you can proactively optimize, troubleshoot issues faster, and ensure a seamless experience for users.
So, how do we implement a monitoring setup that’s both scalable and provides deep visibility? Let’s dive in.
Introducing the Stack
Our monitoring setup comprises three key technologies:
- OpenTelemetry: A unified standard for collecting, processing, and exporting telemetry data from applications. With OpenTelemetry, we’ll collect trace and metric data to understand our application’s internal processes and latency bottlenecks.
- Prometheus: An open-source metrics collection and alerting toolkit that’s particularly powerful for time-series data. It will act as our data source, continuously scraping metrics from our application.
- Grafana: A visualization tool that turns raw data into insightful dashboards. Grafana will provide us with a dynamic interface to visualize our Prometheus metrics and build customized dashboards.
This stack is a tried-and-true setup, widely adopted in production environments to monitor microservices, applications, and infrastructure.
Step 1: Setting Up OpenTelemetry in Node.js
To begin, let’s integrate OpenTelemetry in our Node.js application. This setup allows our application to export key metrics to Prometheus.
Create a new file, tracing.js, to configure OpenTelemetry for your app.
tracing.js
// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
// Set up the Prometheus exporter to expose metrics at /metrics on port 9091
const prometheusExporter = new PrometheusExporter(
  { port: 9091, endpoint: '/metrics' },
  () => console.log('Prometheus scrape endpoint: http://localhost:9091/metrics')
);

// Initialize the OpenTelemetry SDK with auto-instrumentations for common modules
const sdk = new NodeSDK({
  metricReader: prometheusExporter,
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
console.log('OpenTelemetry initialized with Prometheus Exporter');
Here’s what’s happening:
- Prometheus Exporter: We set up OpenTelemetry to expose metrics via a Prometheus exporter, making metrics available at http://localhost:9091/metrics.
- Auto-Instrumentations: We include getNodeAutoInstrumentations to automatically collect telemetry data from common modules (e.g., HTTP server, database calls). This removes the need for additional configuration for those modules.
With OpenTelemetry in place, our application will automatically generate and expose metrics without extra coding on each endpoint.
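Auto-instrumentation covers the common HTTP and database metrics, but the same pipeline can carry custom metrics too. As a minimal sketch (the meter and counter names here are illustrative, not part of the required setup), you could record an application-level counter through the OpenTelemetry API, and it would appear at the same /metrics endpoint:
// Optional sketch: a custom counter exported through the same Prometheus endpoint.
// The names 'example-app' and 'app_query_requests_total' are illustrative.
const { metrics } = require('@opentelemetry/api');

const meter = metrics.getMeter('example-app');
const queryCounter = meter.createCounter('app_query_requests_total', {
  description: 'Total number of /query requests handled',
});

// Increment from anywhere in the app, e.g. inside the /query handler:
queryCounter.add(1, { route: '/query' });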
Step 2: Building the Mock Application
Let’s create a basic Express application with a single route (/query). We’ll simulate a database call within this route and add manual instrumentation with OpenTelemetry, allowing us to trace and measure each request’s performance.
index.js
require('./tracing'); // Initialize OpenTelemetry
const express = require('express');
const { trace } = require('@opentelemetry/api');
const app = express();
const PORT = 4000;
// Middleware to parse JSON requests
app.use(express.json());
// Mock function to simulate data fetching
function getMockData() {
  return {
    userId: 1,
    name: 'John Doe',
    email: 'johndoe@example.com',
    orders: [
      { orderId: 101, item: 'Laptop', price: 1200 },
      { orderId: 102, item: 'Smartphone', price: 800 },
    ],
  };
}

// Main endpoint to simulate data retrieval with instrumentation
app.get('/query', async (req, res) => {
  const tracer = trace.getTracer('default');
  const span = tracer.startSpan('GET /query'); // Track request duration
  try {
    await new Promise(resolve => setTimeout(resolve, 200)); // Simulate database delay
    const mockData = getMockData();
    res.json({ success: true, data: mockData });
  } catch (error) {
    console.error('Error fetching data:', error);
    res.status(500).json({ success: false, message: 'Error fetching data' });
  } finally {
    span.end(); // End span for tracking
  }
});

app.listen(PORT, () => {
  console.log(`Server running at http://localhost:${PORT}`);
});
In this code:
- The /query endpoint simulates a database query with a 200ms delay.
- We use OpenTelemetry spans to trace each request’s duration, enabling us to capture latency data specifically for this endpoint.
- Note that require('./tracing') comes first, before express is loaded. The auto-instrumentations patch modules as they are required, so initializing the SDK later would leave the HTTP and Express layers uninstrumented.
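Before wiring up Prometheus, it’s worth a quick smoke test. Assuming both files above are saved in the project root and ports 4000 and 9091 are free:
node index.js
# In a second terminal: hit the endpoint, then check the exported metrics
curl http://localhost:4000/query
curl -s http://localhost:9091/metrics | head -n 20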
Step 3: Configuring Prometheus and Grafana
Now, let’s set up Prometheus and Grafana with Docker Compose to collect and visualize our metrics.
docker-compose.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    restart: unless-stopped

volumes:
  grafana-storage:
This configuration will:
- Launch Prometheus on http://localhost:9090 and Grafana on http://localhost:3000.
- Persist Grafana dashboards and settings, even after stopping the containers.
Prometheus Configuration
Prometheus needs a configuration file to know where to scrape metrics. Here’s the prometheus.yml:
global:
  scrape_interval: 1s

scrape_configs:
  - job_name: 'example'
    static_configs:
      - targets: ['host.docker.internal:9091']
This file:
- Sets scrape_interval to 1s for near-real-time data collection.
- Configures the target as host.docker.internal:9091, pointing Prometheus to the OpenTelemetry metrics endpoint on the host (see the Linux note below).
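One caveat: host.docker.internal resolves out of the box on Docker Desktop (macOS and Windows). On Linux, you may need to map it yourself by adding an extra_hosts entry to the Prometheus service in docker-compose.yml (supported on Docker 20.10+):
  prometheus:
    # ...existing settings from above...
    extra_hosts:
      - "host.docker.internal:host-gateway"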
Step 4: Installing Dependencies
To run this setup, make sure you install the following Node.js dependencies:
"dependencies": {
"@opentelemetry/api": "^1.9.0",
"@opentelemetry/auto-instrumentations-node": "^0.52.0",
"@opentelemetry/exporter-prometheus": "^0.54.0",
"@opentelemetry/sdk-node": "^0.54.0",
"express": "^4.21.1"
}
To install these dependencies, run:
npm install @opentelemetry/api@^1.9.0 @opentelemetry/auto-instrumentations-node@^0.52.0 @opentelemetry/exporter-prometheus@^0.54.0 @opentelemetry/sdk-node@^0.54.0 express@^4.21.1
With these dependencies in place, your environment is ready to collect and export metrics.
Step 5: Running the Application
With everything set up, launch the services with Docker Compose:
docker-compose up -d
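If you want to confirm that both containers came up cleanly before opening the UIs:
docker-compose ps
docker-compose logs --tail=20 prometheus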
Access the Services
- Prometheus: Open http://localhost:9090 to view metrics.
- Grafana: Open http://localhost:3000 to configure and view dashboards. Log in with admin / admin.
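You can verify that Prometheus is scraping the app at http://localhost:9090/targets: the example job should show as UP. Grafana also needs Prometheus added as a data source before you can build dashboards. You can do this in the UI (data source type Prometheus, URL http://prometheus:9090, using the Compose service name, since localhost inside the Grafana container won’t reach Prometheus), or provision it declaratively. Here’s a minimal sketch of a provisioning file; the filename is illustrative, and it assumes you mount it into the container:
# grafana-datasources.yml
# Mount into the Grafana container by adding this line to the grafana
# service's volumes in docker-compose.yml:
#   - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true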
Step 6: Visualizing Data in Grafana
With Grafana connected to Prometheus, you can now create dashboards to monitor various aspects of your application’s performance. Visualize metrics such as:
- Average request duration
- 95th percentile latency
- Success and error rates
- Total request count
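As a starting point, here are sketches of the PromQL behind panels like those. The exact metric and label names depend on your SDK and semantic-convention versions, so check http://localhost:9091/metrics for what your app actually exports; with the versions pinned above, the HTTP server histogram typically appears as http_server_duration_* (in milliseconds), possibly with a _milliseconds unit suffix:
# Average request duration over the last minute
rate(http_server_duration_sum[1m]) / rate(http_server_duration_count[1m])

# 95th percentile latency
histogram_quantile(0.95, sum(rate(http_server_duration_bucket[1m])) by (le))

# Requests per second, broken down by route
sum(rate(http_server_duration_count[1m])) by (http_route)

# Error rate: share of requests returning 5xx
sum(rate(http_server_duration_count{http_status_code=~"5.."}[1m]))
  / sum(rate(http_server_duration_count[1m]))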
Below is a sample screenshot of a Grafana dashboard that showcases request metrics for the /query route.
Wrapping Up
Congratulations! You’ve now set up a robust monitoring stack with OpenTelemetry, Prometheus, and Grafana. By collecting detailed telemetry from your application and visualizing it in real-time, you’re equipped to:
- Identify latency bottlenecks: Monitor endpoints like /query for slow responses and optimize them.
- Track error rates: Quickly identify when requests fail and respond proactively.
- Gauge overall health: Use total request counts, latency percentiles, and success rates to understand your application’s performance under various loads.
With this stack in place, you can detect issues before they affect users, optimize resource usage, and maintain a high-performance application. Monitoring isn’t just about fixing problems—it’s about making your application resilient and proactive in delivering a great user experience.
Happy monitoring! 🚀