gaurang101197

Posted on Aug 10 • Edited on Aug 13

Plotting Histogram Distribution In Grafana

#grafana #prometheus #observability

If you want to plot a histogram distribution like the one in the image above, this post is for you. It does not cover the internals of histograms or of Grafana.

Why Histogram Distribution

A histogram distribution gives an overview of how the data is distributed over a selected period.
An API latency histogram is very useful for understanding the performance and behavior of an API.
Range of latency: the distribution shows how latency is spread across the buckets, which helps us understand the typical range of response times.

Pre-requisite

Internals of histograms: https://prometheus.io/docs/practices/histograms/
It helps to have hands-on experience with how Prometheus histograms work, and prior experience with Grafana.

Use-case

Plot the latency distribution for a selected time period, e.g. API latency or DB latency.

Setup

Latency is measured with a Prometheus Histogram.
The metric name is my_latency_metric.
The histogram buckets are [0, 80, 160, 320, 640, 1280, 2560, 5120].
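
To make the bucket layout concrete, here is a minimal pure-Python sketch of how a classic Prometheus histogram counts observations: each bucket is cumulative, so an observation is counted in every bucket whose upper bound (le) is greater than or equal to it, plus the implicit +Inf bucket. The function name and the sample latencies are made up for illustration; this is not the Prometheus client library.

```python
import math

# Bucket upper bounds from the Setup section, plus the implicit +Inf bucket.
BUCKETS = [0, 80, 160, 320, 640, 1280, 2560, 5120]


def cumulative_buckets(latencies_ms, bounds=BUCKETS):
    """Return cumulative counts keyed by upper bound (le).

    Hypothetical helper: mimics how my_latency_metric_bucket{le="..."}
    counters would grow, not how the client library is implemented.
    """
    uppers = list(bounds) + [math.inf]
    counts = {le: 0 for le in uppers}
    for value in latencies_ms:
        for le in uppers:
            if value <= le:  # counted in every bucket that can hold it
                counts[le] += 1
    return counts


# Made-up latencies in milliseconds.
sample_latencies = [12, 95, 95, 300, 700, 6000]
buckets = cumulative_buckets(sample_latencies)
```

Note that the 6000 ms observation only lands in the +Inf bucket, because it is above the largest configured bound.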

Step 1: Panel visualization

Select Bar gauge as the panel visualization.

Step 2: Query

round(sum by (le) (increase(my_latency_metric_bucket{label_name=~"label_value"}[$__interval])))

label_name=~"label_value" - [Optional] Filters the metric by label.
increase - Calculates how much each bucket counter grew over the range window. We use $__interval so that the window is the interval Grafana calculates automatically.

Quote from the Prometheus documentation:

increase(v range-vector) calculates the increase in the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. The increase is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if a counter increases only by integer increments.

increase acts on native histograms by calculating a new histogram where each component (sum and count of observations, buckets) is the increase between the respective component in the first and last native histogram in v.
sum by (le): Sums the values by le, the label that holds each histogram bucket's upper bound. Suppose you measure the latency of an API that is deployed on k8s with multiple pods, and the pod ID is a label. Each pod emits its own latency data, but we want a picture of the overall deployment, so we need to aggregate the data of all pods. sum by (le) does this: it adds up the increase from each pod per le.
round: increase can return a non-integer value, and a non-integer count looks odd. To avoid this, we use round to round every value to the nearest integer.
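
The aggregation part of the query can be sketched in plain Python. This is only an illustration of what sum by (le) followed by round does to the per-pod increases; the pod names and values are invented, and ties are rounded up to match the behavior documented for PromQL's round (Python's built-in round resolves ties differently).

```python
import math
from collections import defaultdict


def sum_by_le_and_round(per_pod_increase):
    """Add up each pod's increase per le, then round to the nearest integer.

    per_pod_increase maps a pod name to {le: increase}. All names and
    numbers here are hypothetical.
    """
    totals = defaultdict(float)
    for by_le in per_pod_increase.values():
        for le, inc in by_le.items():
            totals[le] += inc
    # floor(x + 0.5) rounds ties up, like PromQL's round().
    return {le: math.floor(value + 0.5) for le, value in totals.items()}


# Extrapolation makes the per-pod increases non-integer.
per_pod = {
    "pod-a": {"80": 3.2, "160": 5.4, "+Inf": 6.1},
    "pod-b": {"80": 1.1, "160": 2.3, "+Inf": 2.3},
}
aggregated = sum_by_le_and_round(per_pod)
```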

Step 3: Query Options

Select Heatmap in Format and type {{le}} in Legend, as shown in the image below.
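
Why Heatmap? Prometheus buckets are cumulative, and the Grafana documentation describes the Heatmap format as converting cumulative histograms to regular ones and sorting the series by bucket bound. A rough sketch of that conversion, with made-up counts and a hypothetical helper name, looks like this:

```python
import math


def to_per_bucket(cumulative):
    """Turn cumulative (le, count) pairs into per-bucket counts.

    Expects the pairs sorted by le in ascending order. Each bucket's own
    count is its cumulative count minus the previous bucket's.
    """
    result = []
    previous = 0
    for le, count in cumulative:
        result.append((le, count - previous))
        previous = count
    return result


# Invented cumulative counts for the buckets used in this post.
cumulative_counts = [
    (0, 0), (80, 1), (160, 3), (320, 4), (640, 4),
    (1280, 5), (2560, 5), (5120, 5), (math.inf, 6),
]
per_bucket = to_per_bucket(cumulative_counts)
```

Without this step, every bar would also include the counts of all smaller buckets, and the chart would not look like a distribution.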

Step 4: Panel Query Options

Set Min interval to twice the scrape interval. In this example I used 1m. This handles variation in the scrape interval, if any.
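
The reasoning behind "twice the scrape interval" is that increase needs at least two samples inside the window to compute a difference. The sketch below counts how many scrapes fall in a window; it assumes a 30s scrape interval (not stated above, chosen only so that 1m is twice that), perfectly regular scrapes, and a window that excludes its left edge, so treat it as a simplification.

```python
def samples_in_window(window_end_s, window_s, scrape_interval_s, scrape_offset_s=0):
    """Count scrapes t = offset + k * interval with end - window < t <= end.

    All arguments are whole seconds. Hypothetical helper for illustration.
    """
    upper = (window_end_s - scrape_offset_s) // scrape_interval_s
    lower = (window_end_s - window_s - scrape_offset_s) // scrape_interval_s
    return upper - lower


# Window equal to the scrape interval: only one sample, nothing to diff.
one_interval = samples_in_window(window_end_s=95, window_s=30, scrape_interval_s=30)

# Window of twice the scrape interval: two samples, so increase has data.
two_intervals = samples_in_window(window_end_s=95, window_s=60, scrape_interval_s=30)
```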

Step 5: Value options

Want to know more? https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/bar-gauge/#value-options

Select Total as the calculation, as shown in the image below.
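
Because the query returns one value per bucket for every interval in the dashboard time range, the Total calculation (the sum of all values in a series) is what collapses them into a single count per bar. A minimal sketch with invented numbers:

```python
def total_per_bucket(series_by_le):
    """Sum every interval's value for each bucket, like the Total calculation.

    series_by_le maps le to the list of per-interval counts over the
    selected time range. The data below is hypothetical.
    """
    return {le: sum(values) for le, values in series_by_le.items()}


# Three intervals' worth of per-bucket counts.
series = {
    "80": [1, 0, 2],
    "160": [2, 1, 0],
    "320": [0, 0, 1],
}
totals = total_per_bucket(series)
```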

Reference

https://grafana.com/blog/2020/06/23/how-to-visualize-prometheus-histograms-in-grafana/
