Distributed systems help an organisation absorb countless benefits but at the cost of complexity. With the rise of the adoption of container orchestrators like Kubernetes, a need for monitoring and alerting systems came.
One such system is Prometheus which is famous for being “the Kubernetes monitoring solution.”
This post will explore Prometheus in a beginner way without any intricate details that generally scare a novice.
Let’s Start!
What is Prometheus?
SoundCloud first developed Prometheus in 2012 after determining that their existing monitoring tools were insufficient for their needs. However, the first release of Prometheus v1 would not be available to the public until 2016, but now it’s fully open sourced and a CNCF graduate project.
Prometheus is a time series database-based Monitoring and Alerting tool. Prometheus gathers data from apps and systems and allows you to visualise it and set up alerts. We will go deeper with this in a minute, but why do we need monitoring again?
The Stroy Behind Prometeus and Cloud
Long before you start reading this blog, there used to stay giant creatures called Monoliths. They were slow and had complexity. It was hard to understand them and harder to diagnose problems. But then we had an evolution, and micro creatures was a better option. They had a lot of benefits and had less complexity with them. But managing one creature is more demanding than managing hundreds or thousands.
You’re correct!
So, we need someone to orchestrate the herd and someone to look over or monitor the flock. The orchestrator is Kubernetes, and Prometheus manages the monitoring. Suppose your micro creatures had one telephone in their camp. Prometheus helps you look for how many hours they’re talking, whether they can connect to the telephone service and how many times they can’t call someone. All without going to every individual’s tent and enquiring!
Your micro creatures are microservices in a container (or Virtual Machine) that performs a job.
Prometheus Architecture
The Prometheus Server consists of 3 components:
Time series database (TSDB)
All of the metrics get stores in a time series database which is optimised for time-stamped data that needs measuring changes over time and also help query efficiently. So, simply it holds every generated metric for you to reference later.
The architecture of Prometheus and its components Source: Prometheus
Data Retrieval Worker
It collects (by pulling/scraping) metrics from external sources and stores them in the TSBD.
Web UI
It provides a simple web interface UI for configuring and querying the database to help you visualise your queries. We receive centralised management and configuration using the server that can track when we show new data from each unique target.
How does Prometheus work?
Querying and Scraping
Prometheus scrapes metrics from our apps and services regularly via HTTP endpoints from the target systems that have the client library installed. So, in order to collect metrics, you don’t need to install custom software or configure anything on your physical servers, nor do you need to do so on your container images.
This pull based approach use by Prometheus is better than the traditional push based approach because:
"You can run your monitoring on your laptop when developing changes. You can more easily tell if a target is down.
You can manually go to a target and inspect its health with a web browser." — Prometheus — FAQ
Service discovery
Prometheus was built from the bottom up to function in dynamic environments like Kubernetes and requires very minimal configuration when first installed. As a result, it undertakes automatic discovery of operating services in order to make a “best guess” as to what it should be monitoring.
Prometheus Exporters Source: Devconnected
Snapshots and Querying
Now, as your data is scraped what’s next? We want to store the metrics.
Prometheus stores a database record with each scraped metric’s snapshot of the data. You can use PromQL queries and the Prometheus web UI, or other tools like Grafana to explore how metric data evolves over time by querying and analysing metric data snapshots.
Also, you can label your metrics to manage the metrics. Now you don’t have to worry about searching the metrics name manually after changes because you can search them by labels. How cool is that?
Different Type of Metrics
Metrics is the specific set of data from an endpoint. It consists of TYPE
and HELP
Attributes which are described below.
HELP
For a token like HELP, it is expected that another token would follow, which is the metric name. The docstring for the metric name consists of all of the tokens that are left over. HELP lines should be composed of a metric name at a time.
TYPE
There are two more tokens required if the token is TYPE. The first identifier is the metric name, and the second identifier is a qualifier specifying what kind of metric it is (for example, a counter, gauge, histogram, summary, or type unknown). A single TYPE line is allowed for each metric name. Metric names should have their TYPE lines, which specify the metric’s unit of measurement, positioned first before any metric samples are given. If a metric name does not have a TYPE line, the type is left as untyped
.
Metrics Type
Counter
It simply counts the metrics. We’ll count these by things like the number of errors or the number of requests. This type is recommended unless your metric value can fluctuate and go down.
Gauge
It is ideal for metrics that fluctuate, like CPU utilisation.
Histogram
A histogram is used to measure information by counting information in specific observed buckets. Additionally, it presents a complete account of the total amount of all observed values, so it is one of the most challenging types of metrics to read.
Summary
For every observation, a summary takes a sample (usually things like request durations and response sizes). It likewise gives you the total number of observations and a list of all observed values, but it is capable of generating a customisable number of quantiles across a sliding time window.
You can only utilise four sorts of metrics (Counter, Gauge, Summary and Histogram), therefore choose the best one for the purpose.
Fail Proofs
When “pulling” metrics is not possible (for example, short-lived jobs that will not live long enough to be scraped), Prometheus provides a Pushgateway that allows applications to still push metric data if necessary. Essentially, we get the best of both worlds.
Alerting
You can use PromQL, the querying language used by prometheus to retrieve metrics via the database and use Prometheus Web UI, API clients and Grafana to visualise. Setting up external alerting services like pagerduty and email is even possible via Alertmanager.
Final Thoughts
Prometheus is popular, and everyone in the CNCF landscape knows this. The features help it become one of the most promising monitoring systems and beat the likes of Amazon CloudWatch, ApplicationInsights, NewRelic, which are push based platforms. But, Prometheus itself states that the pull based factor shouldn’t be the major point when considering a monitoring system. Also, even if Prometheus serves as Kubernetes best friend it’s capable of more!
If you want to get started with prometheus no other place is better than the First Steps with Prometheus documentation.
Top comments (0)