Understanding Event-Driven Metrics

#metrics #kubernetes #tutorial

Event-driven systems currently dominate software system design. Two of their defining characteristics are asynchronous
actions and eventual consistency.

In traditional systems, each call produces an immediate response. In event-driven systems (EDS), the response is not immediate; it is produced when
the system is ready to process the call. An EDS is not easy to debug, and its next state is hard to predict. But we can try to understand what is happening in the system by looking
at the events it produces.

I'll try to explain the motivation behind metrics, the key metrics, and how to design metrics for an EDS, as well as some of the methodologies
used in practice.

What is a Metric?

A metric is a measurement of a quantitative attribute of a system.

The system produces the metric. For an EDS, we also take into account the time when the measurement happens. In the end, a metric is a triplet of name,
value, and timestamp. The name is a unique identifier of the metric.
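
To make the triplet concrete, here is a minimal sketch in Python. The class name `Metric`, the helper `record`, and the metric name `orders_processed_total` are made up for illustration; real metric libraries model this differently and usually add labels and types on top.

```python
from dataclasses import dataclass
from typing import Optional
import time


@dataclass(frozen=True)
class Metric:
    """A metric as a (name, value, timestamp) triplet."""
    name: str         # unique identifier of the metric
    value: float      # the measured quantity
    timestamp: float  # when the measurement happened, in seconds since the Unix epoch


def record(name: str, value: float, now: Optional[float] = None) -> Metric:
    """Build a metric, stamping it with the current time unless one is given."""
    return Metric(name=name, value=value, timestamp=time.time() if now is None else now)


# Hypothetical metric name and a fixed timestamp, so the example is repeatable.
sample = record("orders_processed_total", 42.0, now=1_700_000_000.0)
```

Passing the timestamp explicitly matters in an EDS: the measurement may be taken long after the original call was made, so the producer of the metric should decide which moment it describes.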

What can I do with metrics?

In essence, metrics are for decision-making, for example:

data processing is slow: refactor and improve performance
there are too many calls to the system: scale up
there are too few calls to the system: scale down
some features are used more than others: improve them
some features are not used at all: remove them
too many errors: let's call someone to fix it

I think these sound familiar.
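
The scale up and scale down decisions from the list above can be sketched as a simple rule over one metric. This is only an illustration: the function name `scaling_decision` and the thresholds of 10 and 100 calls per second are assumptions, not recommended values.

```python
def scaling_decision(calls_per_second: float,
                     low: float = 10.0,
                     high: float = 100.0) -> str:
    """Map a call-rate metric to a decision, using hypothetical thresholds."""
    if calls_per_second > high:
        return "scale up"    # too many calls to the system
    if calls_per_second < low:
        return "scale down"  # too few calls to the system
    return "hold"            # the rate is within the expected band


decisions = [scaling_decision(rate) for rate in (150.0, 5.0, 50.0)]
```

In practice the input would be a rate observed over a time window rather than a single reading, so that one short spike does not trigger a decision.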

Looking at metrics over time lets us detect and predict problems in the system, which is crucial for further development and maintenance.
Day to day, metrics are used for monitoring and alerting. The same metrics can also be used for resource planning, incident analysis, SLOs and SLIs, and much more.

Metric design

The metric type and what needs to be measured depend on the system component. An EDS is composed of multiple components of different natures. Each of those
components has characteristics that will drive metric design.

For example, a web server facing the users will have different metrics than an event streaming platform, like Apache Kafka, or a database.

The metrics for the web server will target the user experience, like request duration, request rate, request size, error rate, etc.

Metrics for event streaming platforms, by contrast, will target system health, like the number of messages in a topic, message size, the number of received messages,
the number of sent messages, and so on.

Databases will have different metrics, like the number of queries, query duration, number of rows returned, errors, etc.

For all the components mentioned, we can also have resource metrics, like CPU usage, memory usage, disk usage, network usage, etc. These metrics are common to all of them.

Metric methodologies

Taking all of this together, we can say that the key metrics are:

Latency, or duration - distribution of time it takes to complete an action
Traffic, or rate - distribution of the number of actions per time
Errors - distribution of the number of errors per time
Saturation - the resource use level
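
As a rough sketch, the four key metrics above can be computed from a window of processed events. Everything named here is an assumption for illustration: the `Event` fields, the nearest-rank percentile, and the use of queue depth over queue capacity as the saturation measure (any bounded resource would do).

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Event:
    duration_s: float  # time it took to complete the action
    ok: bool           # False if the action ended in an error


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted sequence."""
    if not sorted_values:
        return 0.0
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(events: List[Event], window_s: float,
              queue_depth: int, queue_capacity: int) -> Dict[str, float]:
    """Compute latency, traffic, errors, and saturation for one time window."""
    durations = sorted(e.duration_s for e in events)
    errors = sum(1 for e in events if not e.ok)
    return {
        "latency_p50_s": percentile(durations, 50),
        "latency_p95_s": percentile(durations, 95),
        "traffic_per_s": len(events) / window_s,
        "errors_per_s": errors / window_s,
        "saturation": queue_depth / queue_capacity,
    }


# Four events observed in a 2-second window, one of them failed.
window = [Event(0.1, True), Event(0.2, True), Event(0.3, False), Event(0.4, True)]
summary = summarize(window, window_s=2.0, queue_depth=50, queue_capacity=200)
```

Latency is reported as percentiles rather than a single average because the list describes it as a distribution, and a few slow actions can hide behind a healthy-looking mean.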

If you are interested in this topic, please continue reading the article on my blog.
