OpenTelemetry is generating a lot of buzz lately, and everybody is talking about distributed tracing as a means to escape microservices debugging hell and minimize MTTR.
Developers want to spend less time on debugging and have a better experience. Dev leaders need to ensure velocity and quality, lower the cost of the engineering unit, and find creative ways to deliver at scale (especially in today's atmosphere, when saving costs is a main KPI in the tech ecosystem).
Debugging microservices is a big pain. Huge.
Dependencies, complexity, endless components: these aspects make root cause analysis extremely complex, and traditional monitoring methods built on statistical analysis, such as logging, can't offer a reasonable solution.
What's needed is microservices observability, which differs from monitoring in that it is built on instrumentation and recorded events.
The distributed tracing capabilities of OpenTelemetry (an OSS project) compensate for the limits of traditional methods, which work well for monolithic apps but are hardly sufficient for observing and debugging distributed environments.
OTel allows developers to instrument their microservices apps with standard instrumentation libraries that generate telemetry data in the form of logs, metrics, and traces.
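To make the idea of a trace more concrete, here is a minimal sketch of how spans from two services end up in the same trace. This is not the OpenTelemetry SDK: `Span`, `start_span`, `checkout_service`, and `payment_service` are hypothetical names made up for illustration, and only the Python standard library is used. The one real-world detail is the header layout, which follows the W3C `traceparent` format (version, trace id, parent span id, flags).

```python
import secrets
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Toy span: one timed unit of work inside a trace (hypothetical type)."""
    name: str
    trace_id: str             # shared by every span in the same request
    span_id: str              # unique per span
    parent_id: Optional[str]  # links a child span to its caller
    start: float = field(default_factory=time.time)
    end: Optional[float] = None


FINISHED: List[Span] = []  # stand-in for "telemetry that would be exported"


@contextmanager
def start_span(name: str, parent: Optional[Span] = None):
    """Open a span, reusing the parent's trace id when there is one."""
    span = Span(
        name=name,
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_id=parent.span_id if parent else None,
    )
    try:
        yield span
    finally:
        span.end = time.time()
        FINISHED.append(span)


def to_traceparent(span: Span) -> str:
    """Serialize the context as a W3C-style traceparent header value."""
    return f"00-{span.trace_id}-{span.span_id}-01"


def from_traceparent(header: str) -> Span:
    """Rebuild the caller's context on the receiving service."""
    _version, trace_id, span_id, _flags = header.split("-")
    return Span(name="remote-parent", trace_id=trace_id,
                span_id=span_id, parent_id=None)


def payment_service(headers: dict) -> None:
    # Second service: continues the trace started by the caller.
    parent = from_traceparent(headers["traceparent"])
    with start_span("charge-card", parent=parent):
        pass  # real work would happen here


def checkout_service() -> None:
    # First service: starts the trace and passes the context downstream.
    with start_span("checkout") as span:
        payment_service({"traceparent": to_traceparent(span)})


if __name__ == "__main__":
    checkout_service()
    for s in FINISHED:
        print(s.name, s.trace_id, s.span_id, s.parent_id)
```

Both spans carry the same trace id, and the `charge-card` span points back to the `checkout` span as its parent. That link is what lets a tracing backend stitch one request together across services.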
OpenTelemetry agents can then collect and export this telemetry data to multiple systems for logging, tracing, and monitoring. A main advantage of OpenTelemetry is that it aims to be vendor-agnostic: the collected data can be sent to any compatible backend, and moving between backends doesn't require any client-side changes. Instrumentation enables a deep dive into the contextual error data needed for fast root cause analysis.
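The vendor-agnostic idea can be sketched in a few lines. Again, this is a toy model under stated assumptions, not OpenTelemetry's actual API: `Exporter`, `ConsoleExporter`, `InMemoryExporter`, `Pipeline`, and `handle_request` are hypothetical names. The point it tries to show is that the instrumented code only talks to a pipeline, so the backend can be swapped without touching that code.

```python
import json
from typing import Dict, List, Protocol


class Exporter(Protocol):
    """Anything that can receive a batch of spans (hypothetical interface)."""
    def export(self, spans: List[Dict]) -> None: ...


class ConsoleExporter:
    """Writes spans to stdout as JSON lines."""
    def export(self, spans: List[Dict]) -> None:
        for span in spans:
            print(json.dumps(span))


class InMemoryExporter:
    """Keeps spans in a list; stands in for a real tracing backend."""
    def __init__(self) -> None:
        self.received: List[Dict] = []

    def export(self, spans: List[Dict]) -> None:
        self.received.extend(spans)


class Pipeline:
    """Buffers spans and fans each batch out to every configured exporter."""
    def __init__(self, exporters: List[Exporter]) -> None:
        self._exporters = exporters
        self._buffer: List[Dict] = []

    def record(self, span: Dict) -> None:
        self._buffer.append(span)

    def flush(self) -> None:
        batch = list(self._buffer)
        self._buffer.clear()
        for exporter in self._exporters:
            exporter.export(batch)


def handle_request(pipeline: Pipeline, order_id: str) -> None:
    # Instrumented application code: it knows nothing about the backend.
    pipeline.record({"name": "handle_request", "order_id": order_id})


if __name__ == "__main__":
    backend = InMemoryExporter()
    pipeline = Pipeline([ConsoleExporter(), backend])
    handle_request(pipeline, "A-1")
    pipeline.flush()
```

Changing the list passed to `Pipeline` is the only edit needed to send the same data somewhere else, and `handle_request` stays as it is.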
To make OpenTelemetry data actionable, you need to export it to a third-party tool that can help generate insights, such as Jaeger or similar.
Some of these tools are better than others: they are easier to maintain, offer advanced trace-based visualization, and highlight granular payload data that is critical for understanding and solving issues such as bottlenecks and latency.
Read this article for OTel specifics, as well as best practices and tools to minimize overhead and maximize value >>