As developers, we often aim to optimize our applications for performance and resilience—but how do we truly know if our systems are working at their best? The answer lies in effective monitoring. In this tutorial, we’ll walk through setting up a powerful monitoring stack for a Node.js application using OpenTelemetry for tracing, Prometheus for metric collection, and Grafana for beautiful, real-time visualizations. By the end, you’ll have a system that gives you deep insights into your application’s health, focusing on monitoring specific endpoints and visualizing key metrics—all within a dynamic Grafana dashboard.
Why Monitoring Matters
Imagine an e-commerce platform where even a few milliseconds of delay can impact customer satisfaction and revenue. Or consider a backend service processing thousands of requests per second—one small error, and performance degradation can ripple through the entire system. Monitoring is your gateway to understanding how your application performs in real-world conditions. With proper monitoring, you can proactively optimize, troubleshoot issues faster, and ensure a seamless experience for users.
So, how do we implement a monitoring setup that’s both scalable and provides deep visibility? Let’s dive in.
Introducing the Stack
Our monitoring setup comprises three key technologies:
- OpenTelemetry: A unified standard for collecting, processing, and exporting telemetry data from applications. With OpenTelemetry, we’ll collect trace and metric data to understand our application’s internal processes and latency bottlenecks.
- Prometheus: An open-source metrics collection and alerting toolkit that’s particularly powerful for time-series data. It will act as our data source, continuously scraping metrics from our application.
- Grafana: A visualization tool that turns raw data into insightful dashboards. Grafana will provide us with a dynamic interface to visualize our Prometheus metrics and build customized dashboards.
This stack is a tried-and-true setup, widely adopted in production environments to monitor microservices, applications, and infrastructure.
Step 1: Setting Up OpenTelemetry in Node.js
To begin, let’s integrate OpenTelemetry in our Node.js application. This setup allows our application to export key metrics to Prometheus.
Create a new file, tracing.js, to configure OpenTelemetry for your app.
tracing.js
// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { PrometheusExporter } = require('@opentelemetry/exporter-prometheus');
// Set up the Prometheus exporter to expose metrics at /metrics on port 9091
const prometheusExporter = new PrometheusExporter(
  { port: 9091, endpoint: '/metrics' },
  () => console.log('Prometheus scrape endpoint: http://localhost:9091/metrics')
);

// Initialize the OpenTelemetry SDK with auto-instrumentations for common modules
const sdk = new NodeSDK({
  metricReader: prometheusExporter,
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
console.log('OpenTelemetry initialized with Prometheus Exporter');
Here’s what’s happening:
- Prometheus Exporter: We set up OpenTelemetry to expose metrics via a Prometheus exporter, making metrics available at http://localhost:9091/metrics.
- Auto-Instrumentations: We include getNodeAutoInstrumentations to automatically collect telemetry data from common modules (e.g., HTTP server, database calls). This removes the need for additional configuration for those modules.
With OpenTelemetry in place, our application will automatically generate and expose metrics without extra coding on each endpoint.
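Auto-instrumentation covers the common HTTP and database metrics, but the same pipeline can carry custom metrics too. As a minimal sketch (the meter and counter names here are illustrative, not part of the required setup), you could record an application-level counter through the OpenTelemetry API, and it would appear at the same /metrics endpoint:
// Optional sketch: a custom counter exported through the same Prometheus endpoint.
// The names 'example-app' and 'app_query_requests_total' are illustrative.
const { metrics } = require('@opentelemetry/api');

const meter = metrics.getMeter('example-app');
const queryCounter = meter.createCounter('app_query_requests_total', {
  description: 'Total number of /query requests handled',
});

// Increment from anywhere in the app, e.g. inside the /query handler:
queryCounter.add(1, { route: '/query' });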
Step 2: Building the Mock Application
Let’s create a basic Express application with a single route (/query). We’ll simulate a database call within this route and add manual instrumentation with OpenTelemetry, allowing us to trace and measure each request’s performance.
index.js
require('./tracing'); // Initialize OpenTelemetry
const express = require('express');
const { trace } = require('@opentelemetry/api');
const app = express();
const PORT = 4000;
// Middleware to parse JSON requests
app.use(express.json());
// Mock function to simulate data fetching
function getMockData() {
  return {
    userId: 1,
    name: 'John Doe',
    email: 'johndoe@example.com',
    orders: [
      { orderId: 101, item: 'Laptop', price: 1200 },
      { orderId: 102, item: 'Smartphone', price: 800 },
    ],
  };
}

// Main endpoint to simulate data retrieval with instrumentation
app.get('/query', async (req, res) => {
  const tracer = trace.getTracer('default');
  const span = tracer.startSpan('GET /query'); // Track request duration
  try {
    await new Promise(resolve => setTimeout(resolve, 200)); // Simulate database delay
    const mockData = getMockData();
    res.json({ success: true, data: mockData });
  } catch (error) {
    console.error('Error fetching data:', error);
    res.status(500).json({ success: false, message: 'Error fetching data' });
  } finally {
    span.end(); // End span for tracking
  }
});

app.listen(PORT, () => {
  console.log(`Server running at http://localhost:${PORT}`);
});
In this code:
- The /query endpoint simulates a database query with a 200ms delay.
- We use OpenTelemetry spans to trace each request’s duration, enabling us to capture latency data specifically for this endpoint.
- Note that require('./tracing') comes first, before express is loaded. The auto-instrumentations patch modules as they are required, so initializing the SDK later would leave the HTTP and Express layers uninstrumented.
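Before wiring up Prometheus, it’s worth a quick smoke test. Assuming both files above are saved in the project root and ports 4000 and 9091 are free:
node index.js
# In a second terminal: hit the endpoint, then check the exported metrics
curl http://localhost:4000/query
curl -s http://localhost:9091/metrics | head -n 20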
Step 3: Configuring Prometheus and Grafana
Now, let’s set up Prometheus and Grafana with Docker Compose to collect and visualize our metrics.
docker-compose.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    restart: unless-stopped

volumes:
  grafana-storage:
This configuration will:
- Launch Prometheus on http://localhost:9090 and Grafana on http://localhost:3000.
- Persist Grafana dashboards and settings, even after stopping the containers.
Prometheus Configuration
Prometheus needs a configuration file to know where to scrape metrics. Here’s the prometheus.yml:
global:
  scrape_interval: 1s

scrape_configs:
  - job_name: 'example'
    static_configs:
      - targets: ['host.docker.internal:9091']
This file:
- Sets scrape_interval to 1s for near-real-time data collection.
- Configures the target as host.docker.internal:9091, pointing Prometheus to the OpenTelemetry metrics endpoint on the host (see the Linux note below).
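One caveat: host.docker.internal resolves out of the box on Docker Desktop (macOS and Windows). On Linux, you may need to map it yourself by adding an extra_hosts entry to the Prometheus service in docker-compose.yml (supported on Docker 20.10+):
  prometheus:
    # ...existing settings from above...
    extra_hosts:
      - "host.docker.internal:host-gateway"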
Step 4: Installing Dependencies
To run this setup, make sure you install the following Node.js dependencies:
"dependencies": {
"@opentelemetry/api": "^1.9.0",
"@opentelemetry/auto-instrumentations-node": "^0.52.0",
"@opentelemetry/exporter-prometheus": "^0.54.0",
"@opentelemetry/sdk-node": "^0.54.0",
"express": "^4.21.1"
}
To install these dependencies, run:
npm install @opentelemetry/api@^1.9.0 @opentelemetry/auto-instrumentations-node@^0.52.0 @opentelemetry/exporter-prometheus@^0.54.0 @opentelemetry/sdk-node@^0.54.0 express@^4.21.1
With these dependencies in place, your environment is ready to collect and export metrics.
Step 5: Running the Application
With everything set up, launch the services with Docker Compose:
docker-compose up -d
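If you want to confirm that both containers came up cleanly before opening the UIs:
docker-compose ps
docker-compose logs --tail=20 prometheus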
Access the Services
- Prometheus: Open http://localhost:9090 to view metrics.
- Grafana: Open http://localhost:3000 to configure and view dashboards. Log in with admin / admin.
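You can verify that Prometheus is scraping the app at http://localhost:9090/targets: the example job should show as UP. Grafana also needs Prometheus added as a data source before you can build dashboards. You can do this in the UI (data source type Prometheus, URL http://prometheus:9090, using the Compose service name, since localhost inside the Grafana container won’t reach Prometheus), or provision it declaratively. Here’s a minimal sketch of a provisioning file; the filename is illustrative, and it assumes you mount it into the container:
# grafana-datasources.yml
# Mount into the Grafana container by adding this line to the grafana
# service's volumes in docker-compose.yml:
#   - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true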
Step 6: Visualizing Data in Grafana
With Grafana connected to Prometheus, you can now create dashboards to monitor various aspects of your application’s performance. Visualize metrics such as:
- Average request duration
- 95th percentile latency
- Success and error rates
- Total request count
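As a starting point, here are sketches of the PromQL behind panels like those. The exact metric and label names depend on your SDK and semantic-convention versions, so check http://localhost:9091/metrics for what your app actually exports; with the versions pinned above, the HTTP server histogram typically appears as http_server_duration_* (in milliseconds), possibly with a _milliseconds unit suffix:
# Average request duration over the last minute
rate(http_server_duration_sum[1m]) / rate(http_server_duration_count[1m])

# 95th percentile latency
histogram_quantile(0.95, sum(rate(http_server_duration_bucket[1m])) by (le))

# Requests per second, broken down by route
sum(rate(http_server_duration_count[1m])) by (http_route)

# Error rate: share of requests returning 5xx
sum(rate(http_server_duration_count{http_status_code=~"5.."}[1m]))
  / sum(rate(http_server_duration_count[1m]))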
Below is a sample screenshot of a Grafana dashboard that showcases request metrics for the /query route.
Wrapping Up
Congratulations! You’ve now set up a robust monitoring stack with OpenTelemetry, Prometheus, and Grafana. By collecting detailed telemetry from your application and visualizing it in real-time, you’re equipped to:
- Identify latency bottlenecks: Monitor endpoints like /query for slow responses and optimize them.
- Track error rates: Quickly identify when requests fail and respond proactively.
- Gauge overall health: Use total request counts, latency percentiles, and success rates to understand your application’s performance under various loads.
With this stack in place, you can detect issues before they affect users, optimize resource usage, and maintain a high-performance application. Monitoring isn’t just about fixing problems—it’s about making your application resilient and proactive in delivering a great user experience.
Happy monitoring! 🚀