The OpenTelemetry (OTel) Collector is a component that receives, processes, and exports telemetry data (signals) from sources to observability backends such as Elasticsearch, Cassandra, Datadog, New Relic, and others.
Advantages
- Avoids resource contention thanks to its scalable nature and variety of deployment modes
- Customizable at multiple levels, with no need for constant reboots/reloads of the pipeline
- Tolerant of network partitions for the most part
Use a collector when you need to
- provide a common ingestion point for a variety of signals such as metrics and traces
- collect signals from multiple sources such as applications, infrastructure, clusters, frameworks, and databases
- apply transformations to signal data before storing it in the backend
- enrich signals with additional metadata
- filter out signals based on predefined criteria
- send signal data to multiple observability backends
- build a loosely coupled, scalable pipeline for signal data flow
Installation
- Create a configuration file, say config.yaml; details here
- Run the collector
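As a starting point, a minimal config.yaml might look like the following sketch; the endpoints are the collector defaults, and the logging exporter (stdout) is just for experimentation:

```yaml
# Minimal illustrative configuration: receive OTLP, batch, log to stdout.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  logging:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```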
$ docker pull otel/opentelemetry-collector:latest
$ docker run [-d] -v $(pwd)/config.yaml:/etc/otelcol/config.yaml [port-config] otel/opentelemetry-collector:latest
# port configuration (compose-style mappings; pass each as a -p flag to docker run)
- "1888:1888" # pprof extension
- "8888:8888" # Prometheus metrics exposed by the collector
- "8889:8889" # Prometheus exporter metrics
- "13133:13133" # health_check extension
- "4317:4317" # OTLP gRPC receiver
- "4318:4318" # OTLP http receiver
- "55679:55679" # zpages extension
Configuration
The configuration has 3 (or 4) component types, all of which need to be enabled in the service section.
Note: At least one service > pipeline is mandatory.
- Receivers - describe how the collector gets data IN; can be PUSH or PULL based (e.g. host metrics, application metrics, Zipkin traces)
- Processors - run on the data in transit and optionally massage, transform, and filter it (e.g. filter, batch, samplers)
- Exporters - specify how data is sent OUT to one or more configured backends; can be PUSH or PULL based (e.g. file, Jaeger, Prometheus). In production environments they typically carry authentication details.
- Extensions (optional) - provide additional collector capabilities that do not require direct access to signal data (e.g. health_check, pprof)
More info: configuration wiki
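Putting the four component types together, a sketch of how they wire into the service section (component names are standard; the pipeline choice is illustrative):

```yaml
# Declaring a component is not enough: it must also be
# referenced under service (extensions or a pipeline) to be active.
extensions:
  health_check:
  pprof:

receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:

exporters:
  logging:

service:
  extensions: [health_check, pprof]
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```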
Deployment modes
- Agent: a collector instance running on the same node as the application (as a binary, sidecar, or daemonset)
- Gateway: one or more instances running centrally as a standalone service, generally acting as a receiver for the agents. It can offer advanced capabilities such as simple load balancing, tail-based sampling, and independent scaling.
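An agent forwarding to a gateway can be sketched as below; the gateway hostname is hypothetical, and TLS is disabled only for the sake of the example:

```yaml
# Agent-side configuration: receive locally over OTLP,
# forward everything to the central gateway over OTLP gRPC.
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlp:
    endpoint: otel-gateway.example.com:4317  # hypothetical gateway address
    tls:
      insecure: true  # example only; use TLS in production

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```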
Demo
Gist to working demo
Note: the metrics exporter does not seem to work as described in the official documentation
OTel Deployment patterns
Source of all the information below: CNCF presentation
Basic - instrument and send to a collector
Used when the application is instrumented with the OTel SDK and signals are sent to a predefined collector
Basic - fanout
Used when signals are (processed and) sent to multiple destinations. Useful when multiple views/perspectives of the same data are to be generated (e.g. one from Jaeger, one from Datadog)
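Fan-out is expressed as a single pipeline with multiple exporters. A sketch, assuming a Jaeger endpoint and a Datadog API key (the Datadog exporter ships in the contrib distribution; all endpoint values are illustrative):

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  jaeger:
    endpoint: jaeger-collector:14250  # hypothetical Jaeger address
  datadog:
    api:
      key: ${DD_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      # every span is delivered to both backends
      exporters: [jaeger, datadog]
```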
Normalizer
The collector works as an intermediate proxy and massages the data before passing it on to the destination; used when common processors are to be applied to the incoming signals
Kubernetes sidecar
Workloads send signals to an OTel collector sidecar, which forwards them to a collector residing in a central namespace (which processes them and sends them to the destination). Advantages of this pattern are a decoupled central collector, an easily customizable sidecar, and implicit load balancing.
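A sketch of the sidecar wiring as a pod spec fragment; the container names, image, ConfigMap name, and central collector address are all hypothetical:

```yaml
# Pod spec fragment: the app sends to localhost, the sidecar
# forwards to the central collector in another namespace.
spec:
  containers:
    - name: app
      image: my-app:latest  # hypothetical application image
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: http://localhost:4317
    - name: otel-sidecar
      image: otel/opentelemetry-collector:latest
      volumeMounts:
        - name: otel-config
          mountPath: /etc/otelcol
  volumes:
    - name: otel-config
      configMap:
        # hypothetical ConfigMap holding a config.yaml that exports
        # to the central collector, e.g. otel-collector.observability:4317
        name: otel-sidecar-config
```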
Kubernetes daemonset
The collector is deployed as a daemonset; while this eases management, multi-tenancy and scaling requirements are hard to customize.
Loadbalanced collector
A central load-balancing collector routes all signals from a given source to a given backend collector (similar to how session affinity is handled). The idea behind the implementation is that any given backend collector should independently provide the full picture of the source application.
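The contrib distribution ships a loadbalancing exporter for this pattern; routing on the trace ID ensures each backend collector sees complete traces. The backend hostnames below are illustrative:

```yaml
exporters:
  loadbalancing:
    routing_key: traceID   # all spans of a trace go to the same backend
    protocol:
      otlp:
        tls:
          insecure: true   # example only
    resolver:
      static:
        hostnames:         # hypothetical backend collectors
          - backend-collector-1:4317
          - backend-collector-2:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```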
Multicluster
A common OTel collector is deployed on a central cluster and acts as the final stop before writing to destinations. It is useful in regulatory scenarios where a common point of control must be established
Multi-tenant
Multiple destinations are generally involved; the OTel collector processes signals and routes them to the destinations based on filtering tags
Per signal
An OTel collector per signal type (e.g. one for metrics, one for traces, ...). Useful to establish separate observability pipelines per signal. Note: a PUSH-based collector can be scaled easily, while scaling a PULL-based one (Prometheus) is not straightforward given the idempotency semantics.
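Per-signal separation can be sketched as distinct pipelines (or, taken further, as separate collector deployments each carrying only one of them); the exporter choices and endpoints below are illustrative:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  jaeger:
    endpoint: jaeger-collector:14250  # hypothetical Jaeger address
  prometheus:
    endpoint: 0.0.0.0:8889            # scraped by Prometheus (PULL)

service:
  pipelines:
    traces:                           # trace pipeline, PUSH to Jaeger
      receivers: [otlp]
      exporters: [jaeger]
    metrics:                          # metric pipeline, PULL via Prometheus
      receivers: [otlp]
      exporters: [prometheus]
```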