DEV Community

gaurang101197

Plotting Histogram Distribution In Grafana

Histogram Distribution

If you want to plot a histogram distribution like the one in the image above, this post is for you. It does not cover the internals of Prometheus histograms or Grafana.

Why Histogram Distribution

  • A histogram distribution gives an overview of how the data is distributed over the selected period.
  • An API latency histogram is useful for understanding the performance and behavior of an API.
  • Range of latency: the distribution shows how latency is spread across the buckets, which helps us understand the typical range of response times.

Prerequisites

  1. Internals of histogram: https://prometheus.io/docs/practices/histograms/
  2. Hands-on experience with Prometheus histograms and prior experience with Grafana are helpful.

Use-case

Plot the latency distribution for a selected time period, e.g. API latency or database latency.

Setup

  • Latency is measured with a Prometheus histogram.
  • The metric name is my_latency_metric.
  • The histogram buckets are [0, 80, 160, 320, 640, 1280, 2560, 5120].
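
To make the bucket layout concrete, here is a minimal Python sketch of how cumulative le buckets fill up as observations arrive. It is a plain simulation, not the Prometheus client library; the latency values and the observe helper are made up for illustration.

```python
import math

# Bucket upper bounds from the setup above; Prometheus always adds a +Inf bucket.
BUCKETS = [0, 80, 160, 320, 640, 1280, 2560, 5120, math.inf]


def observe(counts, value):
    """Record one observation in every bucket whose upper bound is >= value."""
    for le in BUCKETS:
        if value <= le:
            counts[le] += 1


counts = {le: 0 for le in BUCKETS}
for latency in [50, 75, 100, 300, 6000]:  # hypothetical latencies
    observe(counts, latency)

for le, count in counts.items():
    print(f"le={le}: {count}")
```

Each bucket counts every observation at or below its bound, so the counts never decrease as le grows. This is why the +Inf bucket always equals the total number of observations.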

Step 1: Panel visualization

Select Bar gauge as the panel visualization.

Bar gauge

Step 2: Query

round(sum by (le) (increase(my_latency_metric_bucket{label_name=~"label_value"}[$__interval])))
  1. label_name=~"label_value" - [Optional] Filters the series by label. The =~ operator matches the label value against a regular expression.

  2. increase - Calculates how much the counter grew over the range given in square brackets. We use $__interval so that Grafana picks an appropriate interval automatically, based on the selected time range and the panel width.

    Quote from the Prometheus documentation:

    increase(v range-vector) calculates the increase in the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. The increase is extrapolated to cover the full time range as specified in the range vector selector, so that it is possible to get a non-integer result even if a counter increases only by integer increments.

    increase acts on native histograms by calculating a new histogram where each component (sum and count of observations, buckets) is the increase between the respective component in the first and last native histogram in v.

  3. sum by (le): Sums the values by le (the histogram bucket label, short for "less than or equal"). Suppose your API is deployed on k8s with multiple pods and each series carries a pod ID label. Each pod emits its own latency data, but we want a picture of the whole deployment, so we need to aggregate across pods. sum by (le) does this: it adds up the per-pod increases for each le value.

  4. round: increase can return non-integer values, which look odd for a count of requests. We use round to convert every value to the nearest integer.
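
The three functions in the query can be sketched in Python as follows. This is a simplified illustration, not what Prometheus does internally: simple_increase ignores counter resets, the extrapolation factor of 1.25 is a made-up stand-in for Prometheus's range extrapolation, and the pod names and sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical counter values at the start and end of one interval,
# keyed by (pod, le).
samples = {
    ("pod-a", "80"): (10, 14),
    ("pod-a", "160"): (12, 19),
    ("pod-a", "+Inf"): (12, 20),
    ("pod-b", "80"): (5, 6),
    ("pod-b", "160"): (7, 9),
    ("pod-b", "+Inf"): (7, 10),
}


def simple_increase(first, last, extrapolation=1.25):
    """Simplified increase(): last minus first, scaled by an assumed factor."""
    return (last - first) * extrapolation


# sum by (le): add up the increases of all pods that share the same le.
summed = defaultdict(float)
for (pod, le), (first, last) in samples.items():
    summed[le] += simple_increase(first, last)

# round(): convert the non-integer sums to whole request counts.
result = {le: round(value) for le, value in summed.items()}
print(dict(summed))
print(result)
```

The pod label disappears after the aggregation and only le remains, which is what the bar gauge needs: one series per bucket.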

Step 3: Query Options

Select Heatmap in Format and type {{le}} in Legend, as shown in the image below.

Latency Histogram Query Option
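
Why the Heatmap format? Prometheus buckets are cumulative, and as far as the Grafana documentation describes it, this format converts them into regular per-bucket counts and sorts the series by bucket bound. A minimal Python sketch of that conversion, with hypothetical counts (the to_per_bucket helper is my own name, not a Grafana function):

```python
def to_per_bucket(cumulative):
    """Turn cumulative (le, count) pairs, sorted by le, into per-bucket counts."""
    per_bucket = []
    previous = 0
    for le, count in cumulative:
        # Subtract the count of the next-smaller bucket to isolate this bucket.
        per_bucket.append((le, count - previous))
        previous = count
    return per_bucket


# Hypothetical cumulative counts for one interval, smallest bucket first.
cumulative = [("80", 6), ("160", 11), ("320", 13), ("+Inf", 14)]
per_bucket = to_per_bucket(cumulative)
print(per_bucket)
```

Without this step every bar would also include the counts of all smaller buckets, and the chart would not look like a distribution.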

Step 4: Panel Query Options

Set Min interval to twice the scrape interval. In this example, I used 1m. This way each $__interval window contains at least two samples, which increase needs, even if the scrape interval varies a little.

Panel Query Options

Step 5: Value options

Want to know more? See https://grafana.com/docs/grafana/latest/panels-visualizations/visualizations/bar-gauge/#value-options

Select Total as the calculation, as shown in the image below.

Bar Gauge Value Option
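
The query returns one value per interval for each bucket, and the Total calculation adds those values up over the selected time range. A rough Python sketch with made-up per-interval counts:

```python
# Hypothetical per-interval counts for each bucket series over the dashboard range.
series = {
    "80": [3, 0, 2, 1],
    "160": [1, 1, 0, 0],
    "+Inf": [0, 0, 1, 0],
}

# Total: sum all values of a series, giving one number (one bar) per bucket.
totals = {le: sum(values) for le, values in series.items()}
print(totals)
```

Each total becomes the length of one bar, so the panel shows how many requests fell into each bucket during the selected period.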

Reference

  1. https://grafana.com/blog/2020/06/23/how-to-visualize-prometheus-histograms-in-grafana/
