Table of Contents
- Background
- Goal
- A Minimal Interesting Example
- Future Directions
- Parting Thoughts
- References
- Console Output
Background
Deno is a runtime for JavaScript, TypeScript, and WebAssembly that is based on the V8 JavaScript engine and the Rust programming language.
OpenTelemetry (OTEL) is a collection of tools, APIs, and SDKs. It's used to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior.
Until relatively recently, it wasn’t possible to bring the power of OpenTelemetry to bear on Deno, which meant missing out on the information OpenTelemetry can gather, in particular for tracing.
Goal
This article will go over, in depth, one simplified example of using OpenTelemetry for tracing in Deno.
The aim is primarily to serve as an introduction to OpenTelemetry concepts for folks already somewhat familiar with Deno.
A Minimal Interesting Example
Here is a visualization of one trace emitted by the example code (taken from Honeycomb, an observability vendor).
The "HTTP GET" span ending after its parent is due to a small bug in otel-js. For now, just pretend it ended before its parent did.
Note the following things you can tell without even looking at the code:
- Different parts of the code have their duration measured
- Outgoing HTTP requests are captured
- The structure of the code is probably reflected in the structure of the diagram
And now here is the code that was used to generate that telemetry.
import { registerInstrumentations } from "npm:@opentelemetry/instrumentation";
import { FetchInstrumentation } from "npm:@opentelemetry/instrumentation-fetch";
import { NodeTracerProvider } from "npm:@opentelemetry/sdk-trace-node";
import { Resource } from "npm:@opentelemetry/resources";
import { SemanticResourceAttributes } from "npm:@opentelemetry/semantic-conventions";
import { BatchSpanProcessor, ConsoleSpanExporter } from "npm:@opentelemetry/sdk-trace-base";
import { OTLPTraceExporter } from "npm:@opentelemetry/exporter-trace-otlp-proto";
import opentelemetry from "npm:@opentelemetry/api";
import { serve } from "https://deno.land/std@0.180.0/http/server.ts";

// autoinstrumentation.ts
registerInstrumentations({
  instrumentations: [new FetchInstrumentation()],
});

// Monkeypatching to get past FetchInstrumentation's dependence on sdk-trace-web, which has runtime dependencies on some browser-only constructs. See https://github.com/open-telemetry/opentelemetry-js/issues/3413#issuecomment-1496834689 for more details
// Specifically for this line - https://github.com/open-telemetry/opentelemetry-js/blob/main/packages/opentelemetry-sdk-trace-web/src/utils.ts#L310
globalThis.location = {};

// tracing.ts
const resource = Resource.default().merge(
  new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "deno-demo-service",
    [SemanticResourceAttributes.SERVICE_VERSION]: "0.1.0",
  })
);

const provider = new NodeTracerProvider({
  resource: resource,
});

const consoleExporter = new ConsoleSpanExporter();
provider.addSpanProcessor(new BatchSpanProcessor(consoleExporter));
const traceExporter = new OTLPTraceExporter();
provider.addSpanProcessor(new BatchSpanProcessor(traceExporter));
provider.register();

// Application code
const tracer = opentelemetry.trace.getTracer("deno-demo-tracer");

const port = 8080;

const handler = async (request: Request): Promise<Response> => {
  // This call will be autoinstrumented
  await fetch("http://www.example.com");

  const span = tracer.startSpan(`constructBody`);
  const body = `Your user-agent is:\n\n${request.headers.get("user-agent") ?? "Unknown"}`;
  span.end();

  return new Response(body, { status: 200 });
};

await serve(instrument(handler), { port });

// Helper code
function instrument(handler) {
  async function instrumentedHandler(request) {
    let response;
    await tracer.startActiveSpan("handler", async (span) => {
      response = await handler(request);
      span.end();
    });
    return response;
  }
  return instrumentedHandler;
}
It’s a lot! But let’s take a look at it piece-by-piece.
The Boilerplate
Autoinstrumentation
This is the code for the fetch autoinstrumentation:
import { registerInstrumentations } from "npm:@opentelemetry/instrumentation";
import { FetchInstrumentation } from "npm:@opentelemetry/instrumentation-fetch";
...
registerInstrumentations({
  instrumentations: [new FetchInstrumentation()],
});
This monkeypatches the global fetch so that all network calls made with it are instrumented (i.e. have data on them recorded in telemetry). It saves you the trouble of needing to instrument every fetch call in application code yourself. And it also instruments fetch calls your dependencies are making, which may have otherwise been impossible to track.
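For a sense of what the autoinstrumentation saves you, here is a rough sketch of what manually instrumenting a single fetch call could look like (it reuses the tracer created later in the example, and the attribute names mirror the ones the autoinstrumentation records):

// A hand-rolled equivalent of what FetchInstrumentation does for one call
const span = tracer.startSpan("HTTP GET");
try {
  const res = await fetch("http://www.example.com");
  span.setAttributes({ "http.method": "GET", "http.status_code": res.status });
} finally {
  span.end();
}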
Span Processing and Exporting
This is the code for setting up the span processing and exporting pipelines.
import { BatchSpanProcessor, ConsoleSpanExporter } from "npm:@opentelemetry/sdk-trace-base";
import { OTLPTraceExporter } from "npm:@opentelemetry/exporter-trace-otlp-proto";
...
const consoleExporter = new ConsoleSpanExporter();
provider.addSpanProcessor(new BatchSpanProcessor(consoleExporter));
const traceExporter = new OTLPTraceExporter();
provider.addSpanProcessor(new BatchSpanProcessor(traceExporter));
Once a span has ended, it will be sent to a span processor, which will decide what to do with it and eventually pass it on to an exporter. In this case a BatchSpanProcessor is being used, meaning that spans are queued up in memory and flushed in batches via a setTimeout every 5 seconds.
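If the defaults don’t fit, BatchSpanProcessor also accepts a configuration object. A minimal sketch (the values shown here are the library defaults):

provider.addSpanProcessor(
  new BatchSpanProcessor(traceExporter, {
    scheduledDelayMillis: 5000, // how often queued spans are flushed
    maxExportBatchSize: 512, // maximum number of spans per export call
    maxQueueSize: 2048, // spans beyond this limit are dropped
  }),
);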
Exporting to the Console and to a Tracing Vendor
The first “endpoint” spans are exported to is the console, via the ConsoleSpanExporter. This is useful for debugging, especially when you’re not seeing traces show up in your vendor but you are seeing them in the console.
The second endpoint spans are exported to is your tracing vendor (e.g. Honeycomb, New Relic, etc.), via the OTLPTraceExporter. The details will depend on the vendor, but specifying your vendor’s remote endpoint and auth credentials as environment variables should be enough:
export OTEL_EXPORTER_OTLP_ENDPOINT=<your vendor HTTP OTLP ingest endpoint>
export OTEL_EXPORTER_OTLP_HEADERS=<vendor specific auth credentials>
This should enable the exporter code to send telemetry via OTLP (the OpenTelemetry Protocol) over HTTP to your tracing vendor.
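Alternatively, the same settings can be passed to the exporter directly in code instead of via environment variables. A sketch with placeholder values (the header name here is Honeycomb-specific; your vendor will document its own):

const traceExporter = new OTLPTraceExporter({
  url: "https://api.honeycomb.io/v1/traces", // your vendor's OTLP/HTTP traces endpoint
  headers: { "x-honeycomb-team": "<your API key>" }, // vendor-specific auth header
});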
The App Code
OpenTelemetry’s API Surface
Everything so far has been part of the OpenTelemetry SDK, and should not be touched directly by application code.
Application code should only ever have to interact with the OpenTelemetry API. The API then hooks up to the SDK behind the scenes to do all of the things previously discussed.
To create spans and other instrumentation, application code should use the OpenTelemetry API, as shown below.
import opentelemetry from "npm:@opentelemetry/api";
...
const tracer = opentelemetry.trace.getTracer("deno-demo-tracer");
...
const span = tracer.startSpan(`constructBody`);
const body = `Your user-agent is:\n\n${request.headers.get("user-agent") ?? "Unknown"}`;
span.end();
...
async function instrumentedHandler(request) {
  let response;
  await tracer.startActiveSpan("handler", async (span) => {
    response = await handler(request);
    span.end();
  });
  return response;
}
A key, subtle difference here is between startSpan and startActiveSpan:
- startSpan
  - Creates a span, finds the currently “active” span, and adds the newly created span as a child of it
- startActiveSpan
  - Creates a span, finds the currently “active” span, and adds the newly created span as a child of it
  - Makes the new span the “active” span, so all spans created in the callback passed as the second argument will be added as child spans of it
The difference is that startActiveSpan makes a new parent by default, whereas startSpan does not.
In order to have a span created by startSpan become “active” and the parent of any child spans, you have to do this:
const ctx = opentelemetry.trace.setSpan(
  opentelemetry.context.active(),
  wantsToBeAParentSpan
);
Then passing that ctx to subsequent startSpan calls (or making it the active context) will create the new spans as children of wantsToBeAParentSpan.
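Putting it together, a minimal sketch (continuing with wantsToBeAParentSpan from above, with a made-up span name) might look like this:

const ctx = opentelemetry.trace.setSpan(
  opentelemetry.context.active(),
  wantsToBeAParentSpan
);
// Passing ctx as the third argument parents the new span under wantsToBeAParentSpan
const childSpan = tracer.startSpan("childWork", undefined, ctx);
// ... do some work ...
childSpan.end();
wantsToBeAParentSpan.end();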
Handling Concurrent Requests and the Need for Async Context
Imagine two requests (say, request A and request B) come in nearly simultaneously. Both create their own parent spans (parent span A and parent span B), and both end up waiting on the asynchronous fetch call to resolve at the same time.
Say request A’s fetch call finishes first. How does the tracer.startSpan call know to attach itself to parent span A and not parent span B?
const handler = async (request: Request): Promise<Response> => {
  await fetch("http://www.example.com");

  const span = tracer.startSpan(`constructBody`);
  const body = `Your user-agent is:\n\n${request.headers.get("user-agent") ?? "Unknown"}`;
  span.end();

  return new Response(body, { status: 200 });
};
We need some way to keep the context of the request across asynchronous events, so that tracer.startSpan can know that this is still for request A, and it should make the new span a child of parent span A.
OpenTelemetry-JS handles this differently based on the situation:
- Browser - Uses zone.js to keep track of these asynchronous contexts
- Node.js - Uses AsyncLocalStorage to keep track of asynchronous contexts
Thanks to Deno’s Node.js compatibility efforts, all that’s needed to benefit from the Node.js async context management approach is to use the Node SDK:
import { NodeTracerProvider } from "npm:@opentelemetry/sdk-trace-node";
...
const provider = new NodeTracerProvider({
  resource: resource,
});
This will automatically make sure tracer.startSpan attaches the span it creates to the correct parent span in the above situation.
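To get a feel for the mechanism, here’s a small standalone sketch (separate from the example above) of AsyncLocalStorage keeping per-request state across an await, which is essentially what the Node SDK’s context manager relies on to pick the correct parent span:

import { AsyncLocalStorage } from "node:async_hooks";

const als = new AsyncLocalStorage<{ requestId: string }>();

async function handle(requestId: string) {
  await als.run({ requestId }, async () => {
    // Simulate async work with a network call (see the References for a caveat
    // about setTimeout and AsyncLocalStorage in Deno as of this writing)
    await fetch("http://www.example.com");
    // Still this request's id, even if another request's work interleaved
    console.log(als.getStore()?.requestId);
  });
}

await Promise.all([handle("A"), handle("B")]);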
Future Directions
This example is just scratching the surface of using OpenTelemetry in Deno. Here are some ways to take it further.
Adding Structured Data to Spans
One thing the example didn't highlight is the ability to add structured data to spans, just like you would with logs (the OTEL-JS docs linked in the References cover how to do this in more detail).
In certain use cases, you could potentially get away with only using traces and not using logs at all.
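For example, a minimal sketch of enriching the constructBody span from the example with structured data (the attribute names here are made up for illustration):

const span = tracer.startSpan(`constructBody`);
const body = `Your user-agent is:\n\n${request.headers.get("user-agent") ?? "Unknown"}`;
span.setAttributes({
  "app.user_agent": request.headers.get("user-agent") ?? "Unknown",
  "app.body_length": body.length,
});
span.end();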
To Deploy and Beyond
As of this writing, Deno Deploy doesn’t support npm: specifiers, so it’s not possible to use the OTEL example above there.
But once that support lands, it should be good to go (!)
Actually Knowing CPU Time per Request
Deno Deploy (and other edge function services) limit or bill based on the milliseconds of CPU time your functions use.
However, there’s no easy way to profile or measure this today (as far as I’m aware).
With OpenTelemetry traces, you should be able to drop spans into CPU-intensive parts of your code to zero in on what’s eating up CPU time.
And with autoinstrumentation, you could measure the underlying framework you’re using too.
Autoinstrumenting Deno-specific APIs and Libraries
There are many Deno APIs outside of fetch that could probably benefit from being autoinstrumented (e.g. Cache), in particular the ones that don’t already have a browser autoinstrumentation package available (i.e. Deno-specific APIs, like the filesystem ones).
There are also a wide variety of Deno-specific libraries (e.g. oak, Fresh) that could possibly benefit from being autoinstrumented too, either via a separate autoinstrumentation library or by getting built-in to the library itself.
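Until such packages exist, one workaround is to hand-instrument the Deno-specific calls you care about. A rough sketch (tracedReadTextFile is a made-up helper, reusing the tracer from the example):

async function tracedReadTextFile(path: string): Promise<string> {
  const span = tracer.startSpan("Deno.readTextFile", {
    attributes: { "file.path": path },
  });
  try {
    return await Deno.readTextFile(path);
  } finally {
    span.end();
  }
}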
Putting the “Distributed” in Distributed Tracing
This example was only of 1 service, but you could imagine running several different distributed services as well.
In that case, trace context propagation across the network is critical to creating one unbroken distributed trace across all services: Service B needs to be aware of the parent span Service A created before sending the request, so that Service B can add its spans to the correct parent.
I haven't yet tried to see if this just works out-of-the-box, but I’m guessing it may need some effort to get firing on all cylinders.
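If it turns out not to work out-of-the-box, a manual version might look roughly like this (an untested sketch; service-b.internal is a placeholder, and it assumes the default W3C trace context propagator registered by provider.register()):

// Service A: inject the current trace context into the outgoing request's headers
const headers: Record<string, string> = {};
opentelemetry.propagation.inject(opentelemetry.context.active(), headers);
await fetch("http://service-b.internal/work", { headers });

// Service B: extract the incoming context and parent new spans under it
const serviceBHandler = async (request: Request): Promise<Response> => {
  const parentCtx = opentelemetry.propagation.extract(
    opentelemetry.context.active(),
    Object.fromEntries(request.headers),
  );
  const span = tracer.startSpan("service-b-work", undefined, parentCtx);
  // ... do the actual work here ...
  span.end();
  return new Response("done", { status: 200 });
};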
All the Pillars
This example was just of tracing, but OpenTelemetry also covers metrics and logging (enabling cool things like finding trace exemplars that are contributing to a metric).
It would be interesting to see if those work out-of-the-box for Deno too.
Parting Thoughts
I personally didn't anticipate that the Node.js compat work done by the Deno team would impact OpenTelemetry support, so this came as a pleasant surprise to me. I have no idea what that involved but hats off to the Deno folks for their efforts.
And I hope there is more to come in making Deno the best server-side JS runtime around for production workloads!
References
- https://en.wikipedia.org/wiki/Deno_(software) is where I got the definition of Deno from
- https://opentelemetry.io/ is where I got the definition of OpenTelemetry from
- https://opentelemetry.io/docs/instrumentation/js/instrumentation/ and the remaining OTEL-JS docs have a lot of information on what more you can do with spans and how to go about doing it
- For more tips on troubleshooting beyond ConsoleSpanExporter, including turning on OpenTelemetry-JS's diagnostic Debug-level logging, see https://opentelemetry.io/docs/instrumentation/js/getting-started/nodejs/#troubleshooting
- As of this writing, Deno supports Node.js’s AsyncLocalStorage API for async context management, except for setTimeout, per Luca Casonato
- This older article I wrote collects some notes on the history of async context management in JS
Console Output
For the curious, here is the output that the ConsoleSpanExporter generates after the server handles a request:
{
  traceId: "9a9a625e1562e9847ffe97e09fcf1bea",
  parentId: "f80a96f14dc334d8",
  traceState: undefined,
  name: "constructBody",
  id: "5a1974974e18fcf6",
  kind: 0,
  timestamp: 1680829418679000,
  duration: 195,
  attributes: {},
  status: { code: 0 },
  events: [],
  links: []
}
{
  traceId: "9a9a625e1562e9847ffe97e09fcf1bea",
  parentId: undefined,
  traceState: undefined,
  name: "handler",
  id: "f80a96f14dc334d8",
  kind: 0,
  timestamp: 1680829418663000,
  duration: 16340,
  attributes: {},
  status: { code: 0 },
  events: [],
  links: []
}
{
  traceId: "9a9a625e1562e9847ffe97e09fcf1bea",
  parentId: "f80a96f14dc334d8",
  traceState: undefined,
  name: "HTTP GET",
  id: "7ec3993d6f40671c",
  kind: 2,
  timestamp: 1680829418669000,
  duration: 12000,
  attributes: {
    component: "fetch",
    "http.method": "GET",
    "http.url": "http://www.example.com/",
    "http.status_code": 200,
    "http.status_text": "OK",
    "http.host": "www.example.com",
    "http.scheme": "http",
    "http.user_agent": "Deno/1.32.1"
  },
  status: { code: 0 },
  events: [],
  links: []
}
{
  traceId: "5086d5f4412c11a533d81044998ac7d6",
  parentId: undefined,
  traceState: undefined,
  name: "HTTP POST",
  id: "698b36fbc35e09c9",
  kind: 2,
  timestamp: 1680829423694000,
  duration: 94000,
  attributes: {
    component: "fetch",
    "http.method": "POST",
    "http.url": "https://api.honeycomb.io/v1/traces",
    "http.status_code": 200,
    "http.status_text": "OK",
    "http.host": "api.honeycomb.io",
    "http.scheme": "https",
    "http.user_agent": "Deno/1.32.1"
  },
  status: { code: 0 },
  events: [],
  links: []
}
As an aside, all of that initial OpenTelemetry boilerplate code seems to take 10-20ms to run on GitHub Codespaces.
We’ll have to wait and see how that translates to Deno Deploy, but it could take a bite out of the 50ms max CPU budget, at least for the first request that causes a new isolate to spin up (subsequent requests to that isolate shouldn’t pay the penalty).