Observability is the ability to measure the internal states of a system by examining its outputs. A system is considered “observable” if its current state can be estimated using only information from its outputs, namely sensor data.
In the context of microservices, observability allows teams to:
- Monitor modern systems more effectively
- Find and connect effects in a complex chain and trace them back to their cause
- Give system administrators, IT operations analysts and developers visibility into the entire architecture
3 pillars of observability
Metrics - A metric is a numeric value measured over an interval of time and includes specific attributes such as timestamp, name, KPIs and value. Unlike logs, metrics are structured by default, which makes them easier to query and to optimize for storage, so you can retain them for longer periods.
Logs - A log is a text record of an event that happened at a particular time. It includes a timestamp that tells when the event occurred and a payload that provides context. Logs come in three formats: plain text, structured and binary.
Traces - A trace represents the end-to-end journey of a request through a distributed system. As a request moves through the host system, every operation performed on it, called a “span”, is encoded with important data relating to the microservice performing that operation.
Each trace includes one or more spans. By viewing traces, you can follow a request's course through a distributed system and identify the cause of a bottleneck or breakdown.
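To make the relationship between a trace and its spans concrete, here is a minimal stdlib-only sketch. The Span class, the find_bottleneck helper and the sample timings are all hypothetical and exist only for illustration; real tracing libraries record far more data per span. The sketch also assumes that child spans run one after another without overlapping.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One operation performed on a request (hypothetical, simplified model)."""
    name: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, trace: List[Span]) -> float:
    """Time spent in the span itself, excluding its direct children.

    Assumes the children do not overlap in time.
    """
    children = [s for s in trace if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)


def find_bottleneck(trace: List[Span]) -> Span:
    """Return the span with the largest self time."""
    return max(trace, key=lambda s: self_time_ms(s, trace))


# Made-up trace: a gateway calls an orders service, which queries a database.
trace = [
    Span("gateway /checkout", "a1", None, 0.0, 120.0),
    Span("orders.create", "b2", "a1", 5.0, 115.0),
    Span("db.insert", "c3", "b2", 10.0, 105.0),
]

slowest = find_bottleneck(trace)
print(f"{slowest.name}: {self_time_ms(slowest, trace):.1f} ms of self time")
```

With these made-up timings the database span accounts for 95 ms of the 120 ms request, so it is reported as the bottleneck.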
Instrumentation with Python
Let us start with a simple Flask server.
$ pip install flask
import datetime
import flask
######################
## initialization
######################
app = flask.Flask(__name__)
start = datetime.datetime.now()
######################
## routes
######################
@app.route('/', methods=['GET'])
def root():
    return flask.jsonify({'message': 'flask app root/'})

@app.route('/healthz', methods=['GET'])
def healthz():
    now = datetime.datetime.now()
    return flask.jsonify({'message': f'up and running since {(now - start)}'})

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)
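Let us add the OpenTelemetry (OTel) libraries. The opentelemetry-instrument command used later is installed with opentelemetry-distro, and Flask requests are traced by opentelemetry-instrumentation-flask.
$ pip install opentelemetry-api opentelemetry-sdk opentelemetry-distro opentelemetry-instrumentation-flask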
Now let us start instrumenting: add tracing and a counter metric that records how many times /healthz is called.
import datetime
import flask
from opentelemetry import trace
from opentelemetry import metrics
######################
## initialization
######################
app = flask.Flask(__name__)
start = datetime.datetime.now()
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)
hltz_counter = meter.create_counter('healthz_count', description='Number of /healthz requests')
######################
## routes
######################
@app.route('/', methods=['GET'])
def root():
    return flask.jsonify({'message': 'flask app root/'})

@app.route('/healthz', methods=['GET'])
def healthz():
    now = datetime.datetime.now()
    hltz_counter.add(1)
    return flask.jsonify({'message': f'up and running since {(now - start)}'})

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5000)
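To see what a counter like healthz_count does conceptually, here is a stdlib-only sketch of a monotonic, cumulative counter. MonotonicCounter is a hypothetical class written for this post, not the OpenTelemetry SDK's implementation; in particular, raising an error on a negative amount is a choice made in this sketch, and the real SDK may handle that case differently.

```python
import threading
from collections import defaultdict
from typing import Dict, Optional, Tuple

AttributeKey = Tuple[Tuple[str, str], ...]


class MonotonicCounter:
    """Illustrative counter: totals only ever grow, one total per attribute set."""

    def __init__(self, name: str, description: str = "") -> None:
        self.name = name
        self.description = description
        self._lock = threading.Lock()  # request handlers may run concurrently
        self._totals: Dict[AttributeKey, int] = defaultdict(int)

    def add(self, amount: int, attributes: Optional[Dict[str, str]] = None) -> None:
        if amount < 0:
            # Assumption of this sketch: reject decrements outright.
            raise ValueError("a monotonic counter can only increase")
        # Sort the attributes so that equal dicts map to the same key.
        key = tuple(sorted((attributes or {}).items()))
        with self._lock:
            self._totals[key] += amount

    def collect(self) -> Dict[AttributeKey, int]:
        """Return a snapshot of the cumulative totals since start-up."""
        with self._lock:
            return dict(self._totals)


counter = MonotonicCounter("healthz_count", "Number of /healthz requests")
counter.add(1)  # first /healthz request
counter.add(1)  # second /healthz request
print(counter.collect())
```

Two calls to add(1) without attributes give a single cumulative total of 2 under the empty attribute set, which is the same shape as a data point with "attributes": {} and "value": 2 in an exported metric.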
Run the instrumented code (save it as app.py so that flask run can find it)
$ opentelemetry-instrument --traces_exporter console --metrics_exporter console flask run
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
...
Send some traffic
$ curl localhost:5000
{"message":"flask app root/"}
$ curl localhost:5000/healthz
{"message":"up and running since 0:00:53.605913"}
Observe the terminal and check for healthz_count
127.0.0.1 - - [13/Oct/2022 09:16:54] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [13/Oct/2022 09:16:58] "GET /healthz HTTP/1.1" 200 -
{
    "name": "/healthz",
    "context": {
        "trace_id": "0x7d30b2042efe9a4661cc427352119754",
        "span_id": "0x479211d157c16733",
        "trace_state": "[]"
    },
    "kind": "SpanKind.SERVER",
    "parent_id": null,
    "start_time": "2022-10-13T03:50:31.090144Z",
    "end_time": "2022-10-13T03:50:31.090545Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "http.method": "GET",
        "http.server_name": "127.0.0.1",
        "http.scheme": "http",
        "net.host.port": 5000,
        "http.host": "localhost:5000",
        "http.target": "/healthz",
        "net.peer.ip": "127.0.0.1",
        "http.user_agent": "curl/7.79.1",
        "net.peer.port": 50286,
        "http.flavor": "1.1",
        "http.route": "/healthz",
        "http.status_code": 200
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.13.0",
            "telemetry.auto.version": "0.34b0",
            "service.name": "unknown_service"
        },
        "schema_url": ""
    }
}
{"resource_metrics": [{"resource": {"attributes": {"telemetry.sdk.language": "python", "telemetry.sdk.name": "opentelemetry", "telemetry.sdk.version": "1.13.0", "telemetry.auto.version": "0.34b0", "service.name": "unknown_service"}, "schema_url": ""}, "scope_metrics": [{"scope": {"name": "app", "version": "", "schema_url": ""}, "metrics": [{"name": "healthz_count", "description": "Number of /healthz requests", "unit": "", "data": {"data_points": [{"attributes": {}, "start_time_unix_nano": 1665632818794016000, "time_unix_nano": 1665632825058633000, "value": 2}], "aggregation_temporality": 2, "is_monotonic": true}}], "schema_url": ""}], "schema_url": ""}]}
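If you would rather check this output with a script than by eye, the following stdlib-only sketch may help. The helper names span_duration_us and counter_value are hypothetical, and the sample data is a trimmed copy of the span and metrics output above; the field layout is assumed to match what this SDK version printed.

```python
import json
from datetime import datetime, timedelta

# Timestamp format used in the console span output above.
SPAN_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def span_duration_us(span: dict) -> int:
    """Return how long a span lasted, in whole microseconds."""
    start = datetime.strptime(span["start_time"], SPAN_TIME_FORMAT)
    end = datetime.strptime(span["end_time"], SPAN_TIME_FORMAT)
    return (end - start) // timedelta(microseconds=1)


def counter_value(metrics_doc: dict, metric_name: str) -> int:
    """Sum the data points of the named metric across all resources and scopes."""
    total = 0
    for resource in metrics_doc["resource_metrics"]:
        for scope in resource["scope_metrics"]:
            for metric in scope["metrics"]:
                if metric["name"] == metric_name:
                    total += sum(p["value"] for p in metric["data"]["data_points"])
    return total


# Trimmed copies of the console output shown above.
span = {
    "name": "/healthz",
    "start_time": "2022-10-13T03:50:31.090144Z",
    "end_time": "2022-10-13T03:50:31.090545Z",
}
metrics_line = (
    '{"resource_metrics": [{"scope_metrics": [{"scope": {"name": "app"}, '
    '"metrics": [{"name": "healthz_count", "data": {"data_points": '
    '[{"attributes": {}, "value": 2}], "aggregation_temporality": 2, '
    '"is_monotonic": true}}]}]}]}'
)

print(f'{span["name"]} took {span_duration_us(span)} microseconds')
print(f'healthz_count = {counter_value(json.loads(metrics_line), "healthz_count")}')
```

For the output above, this reports a span duration of 401 microseconds and a healthz_count of 2.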
We have successfully generated traces and metrics (sometimes they take a couple of seconds to show up).