Gernot Glawe for AWS Community Builders

Posted on Dec 20, 2022 • Originally published at tecracer.com

Serverless Spy Vs. Spy Chapter 3: X-Ray vs Jaeger - Send Lambda traces with open telemetry

#serverless #telemetry #aws #jaeger

In modern architectures, Lambda functions co-exist with containers. Cloud Native Observability is achieved with open telemetry. I show you how to send open telemetry traces from Lambda to a Jaeger tracing server. Let's see how this compares to the X-Ray tracing service.

As the Lambda setup with Typescript and Python already had a good coverage in chapter 2, I will stick to GO here. The CDK code is easy to migrate.

Setting

)
Architecture overview

The Lambda Function (2) sends traces to the jaeger backend with the OpenTelemetry Protocol. Because we do not want to accept requests from the internet, Lambda has to run within the network of the VPC called basevpc. This VPC is created (1) at first. The jaeger container announces its IP via the AWS Serviced-Discovery service.

To access the frontend/UI of jaeger a Load Balancer is created between the internal jaeger service private IP and the internet.

The CDK code, the application code and jaeger itself are written on GO.

Lambda

Lambda Resources

AWS Lambda Resources

  1   lambdaPath := filepath.Join(path, "../dist/main.zip")
  2   adotLayer := lambda.LayerVersion_FromLayerVersionArn(this, aws.String("adotlayer"),
  3     aws.String("arn:aws:lambda:eu-central-1:901920570463:layer:aws-otel-collector-amd64-ver-0-62-1:1"))
  4   fn := lambda.NewFunction(this, aws.String("adotlambda"),
  5   &lambda.FunctionProps{
  6     Vpc: vpc,
  7     Handler: aws.String("main"),
  8     Runtime: lambda.Runtime_PROVIDED_AL2(),
  9     Tracing: lambda.Tracing_ACTIVE,
 10     Environment: &map[string]*string{
 11       "OPENTELEMETRY_COLLECTOR_CONFIG_FILE" : aws.String("/var/task/config.yml"),
 12       // "https://opentelemetry.io/docs/concepts/sdk-configuration/general-sdk-configuration/"
 13       "OTEL_SERVICE_NAME" : aws.String("documentcounter"),
 14     },
 15     AllowPublicSubnet: aws.Bool(true),
 16     Layers: &[]lambda.ILayerVersion{
 17         adotLayer,
 18     },
 19     },
 20   )

You have to define the following configuration, see chapter 2:

Line 2:3 - The Lambda Layer for the otel collector
Line 6 - run in the VPC
Line 1 - Set the configuration file location
Line 16 - Activate the layer

Lambda Code

In the application you have to do:

1. Configure the middleware to send traces

otelaws.AppendMiddlewares(&cfg.APIOptions)
ClientDDB = dynamodb.NewFromConfig(cfg)

2. Propagate the context through all functions:

From main:

tp, err := xrayconfig.NewTracerProvider(ctx)
//...
lambda.Start(otellambda.InstrumentHandler(HandleRequest, xrayconfig.WithRecommendedOptions(tp)... ))

to HandleRequest

func HandleRequest(ctx context.Context, s3Event events.S3Event) (string, error) {
//...
putItem(ctx,s3input)

to putitem

func putItem(ctx context.Context, itemID string){
//...
result, err := ClientDDB.PutItem(ctx,input)

)

In the app, at the end an s3 listobjects is performed, so that you have two AWS services in the traces.

See chapter 2 for more details.

Now Lambda could send traces, so we need a target. I chose Jaeger, an open-source, end-to-end distributed tracing, originally provided by Uber Technologies.

Jaeger Installation

VPC

We provide a VPN to run the ECS service - just a VPC with a private subnet.

Fargate Service

)
The JAEGER service

The front end will be provided on port 16686, the OTEL request will go to port 4317 via gRPC. All jaeger ports are described in the deployment part of the jaeger documentation.

To access the jager front end with a DNS name, you have to have a domain. So change the following configurations in jaeger/cluster.go:

var SERVICE_NAME = "jaeger"
var NAMESPACE = "otel.letsbuild-aws.com"
var HOSTED_ZONE_ID = "Z042038724KH99T9LFKK6"
var DNS_NAME = "service.letsbuild-aws.com"

In this example, I have created a subdomain "service.letsbuild-aws.com" for the Load Balancer. The NAMESPACE is used for service discovery. You do not need a real domain for service discovery.

To get jaeger up and running, there is an all-in-one image we use:

jaegertracing/all-in-one:1.39.0

The jaeger container can be configured via the environment:

"SPAN_STORAGE_TYPE":      aws.String("memory"),
"COLLECTOR_OTLP_ENABLED": aws.String("true"),
"LOG_LEVEL":              aws.String("debug"),

To keep it (almost) simple, the storage is set to memory. In production, you could use Cassandra, elasticsearch and other backends. As stated in the jaeger documentation, all CLI parameters can be set via ENV variables. To be able to receive otlp data, its enabled.

The management ui and otlp ports are configured for the container:

task.AddContainer(aws.String("jaegerContainer"),
    &ecs.ContainerDefinitionOptions{
        Image:         ecs.ContainerImage_FromRegistry(aws.String("jaegertracing/all-in-one:1.39.0"), nil),
        ContainerName: aws.String("jaeger-all"),
//...
        PortMappings: &[]*ecs.PortMapping{
            {
                ContainerPort: MANAGEMENT_PORT,
                HostPort:      MANAGEMENT_PORT,
                Protocol:      ecs.Protocol_TCP,
                // management
            },
            {
                ContainerPort: aws.Float64(4317),
                HostPort:      aws.Float64(4317),
                Protocol:      ecs.Protocol_TCP,
                // "otel-grpc"
            },
//...

See the jaeger/cluster.go file for the complete source.

Connect Lambda to Jaeger

On the jaeger side a namespace is configured:

namespace := awsservicediscovery.NewPrivateDnsNamespace(this, aws.String("oteltrace-namespace"),
    &awsservicediscovery.PrivateDnsNamespaceProps{
        Name:        aws.String(NAMESPACE),
        Description: aws.String("DNS service discovery subdomain"),
        Vpc:         vpc,
    },
)

This creates an entry in the private domain otel.letsbuild-aws.com:

On the Lambda side the first thing is to tell the adot Layer, where to find the config file:

"OPENTELEMETRY_COLLECTOR_CONFIG_FILE" : aws.String("/var/task/config.yml"),

Because Lambda apps are deployed into the directory /var/task on the Lambda micro-vm, you have to prepend the path /var/task.

You find the file here: app/config.yml

The configuration is added to the Lambda deployment package:

env GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -ldflags="-s -w" -o ../dist/main main/main.go
chmod +x ../dist/main
mv ../dist/main ../dist/bootstrap
cp config.yml ../dist
cd ../dist && zip main.zip bootstrap config.yml

The lambda package build script

Configuration

In the configuration, we have three parts

1) The local receiver:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

It receives the traces.

2) The exporter

exporters:
  otlp:
    endpoint: jaeger.otel.letsbuild-aws.com:4317
    tls:
      insecure: true

Here the dns name from the awsservicediscovery is used for the ENDPOINT.

3) The pipelines

Now incoming receiver is piped to the outgoing exporter

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [ otlp]

Details are described in the OpenTelemetry documentation. As stated in chapter 2, not all configurations are valid here.

Compare X-Ray UI to Jager UI

X-Ray now

As the collector is not configured for X-Ray traces, we just see the data from the Lambda service, not the function:

Switch the configuration back to x-rays:

1) Change app/config.yml

Samples for the configurations are provided in

app/config-otel.yml
app/config-xray.yml

2) Deploy app

cd app
task fastdeploy

Then some traffic:

cd ..
task traffic

Then we see all nodes in the X-Ray Map view:

And some traces.

X-Ray Trace Map

Jaeger/Otel

Switch the configuration back to otel and deploy Lambda app again. After creating some traffic, you see traces in the jaeger ui.

Access the jaeger UI from the loadbalancer dns entry or your domain name.

Jaeger Trace Map

1) Choose Service documentcounter here
This is the name I set with the environment variable
OTEL_SERVICE_NAME, configured in the Lambda Resource.
2) The button [Find Traces] shows a graphical view (4) and the single traces

Click on a trace (3) to see the detail view:

Jaeger Timeline

Comparing both trace maps we notice the missing nodes with the AWS service icons.

Comparing the timelines, you see that the segments otellambda AWS::Lambda and otellamba AWS::Lambda::Function only appear in X-Ray, not on jaeger.

That is because only the Lambda Function sends traces, not the Lambda Service. In the "Cloud-Native" container world, usually, it is assumed that the container is already running. So the startup time is not interesting. In Lambda the micro-vm is started, when a request hits a cold start. If that happens often, it may affect your overall latency, so you want to have data. You may get the init duration also from the Lambda Logs. If you need information from the Lambda Resource, you might use the Lambda extensions and the AWS Lambda Telemetry API, which I will cover in the last chapter.

The detail information are almost the same:

Is there a winner?

Functionality

If you have to decide whether to use X-Ray or other services for your traces, ADOT is the more flexible choice. It provides more support from various sources.

For services that have a large AWS part, the X-Ray service provide some more functionality like creating nodes.

Cost

It depends on your metrics!

Otel open source tracing e.g. jaeger

I have seen some other posts, which stated that an extra tracing service would be cheaper, "because it is open source". If you compare the costs the tco have some more parts:

Costs of the running container

Price in eu-central-1
per vCPU per hour $0.04656
per GB per hour $0.00511
With 2 vCPU | 4 GB

Which would be 62.01 €/month

Cost of storage: depends on backend

In the production environment, you would like to set up an application load balancer with cognito authentication with additional costs.

X-Ray

The X-Ray server costs are $5.00 per 1 million traces in eu-central-1. You can also adapt the sample rate to not have a trace with each call.

Operations

The telemetry infrastructure setup is done only once. Once you have it running, there should be not much to do.

With X-Ray, there is no additional operational cost.

Usage

Because you provide the jaeger container yourself, you can adapt the size to the speed you need. In my tests the jaeger frontend seemed very much faster than the X-Ray aka CloudWatch Service Map.

Conclusion

With the sample apps from the opentelemetry-lambda repository the Lambda part itself was easy to implement. What took me some time was to provide the jaeger Fargate service with IaC ouside of an k8s environment. But with ECS and ServiceDiscovery that was easy in the end. This should be even more simple in an EKS environment with the jaegertracing helm-charts.

Using something else as tracing solution instead of X-Ray not looks like a good choice for AWS serverless projects.
But if you have a container solution up and running, otel would be a good choice for an environment, where container traces and Lambda traces are stored together.

Appendix: Quick Walkthrough

Clone repository

git clone https://github.com/megaproaktiv/adot-otelstarter.git
cd adot-otelstarter

Set region export AWS_REGION=yourregion, e.g.

  export AWS_REGION=eu-central-1

If CDK is not bootstrapped:

  task bootstrap

Create VPC

  task jaeger:deploy-vpc

Set Domain and Service configuration

Edit jaeger/cluster.go:

  var SERVICE_NAME = "jaeger"
  var NAMESPACE = "otel.letsbuild-aws.com"
  var HOSTED_ZONE_ID = "Z042035555KH99T9LFKK6"
  var DNS_NAME = "service.letsbuild-aws.com"

Create ECS cluster with jaeger service

  task jaeger:deploy-jaeger

Deploy Lambda Resources and function

  task deploy

Note: because of the ENI this could take a few minutes

Create Traffic

  task traffic

DEV Community