Written by Ignacio Bonafonte
Issues and failures in production
Tracking down crashes and issues in asynchronous code is usually very hard. If your code crashes, the crash report and the stack trace describe the thread that crashed, but the context in which the crash happened is usually lost. If the app didn’t crash but misbehaved, it can be even worse, because the best information you are likely to get is a log line buried in a log file.
To track these problems locally, Apple provides ActivityTracing.framework, which lets you group your application code into activities and assign logs to those activities. ActivityTracing also allows you to leave a trail of events to help you identify the path your code took before the problem happened. This functionality is very helpful for identifying problems locally. If you are not yet using this technology in your app, take a look at Apple’s documentation; it can make your life as a developer easier.
However, the usefulness of ActivityTracing is limited if your application interacts with multiple services and performs requests and receives responses asynchronously. As a developer, identifying the root cause of a failure turns into a guessing game: if the failure happens in a repeatable manner, you can run the problematic code many times while monitoring the service to find it; but if the error arises in the hands of your users and is not reproducible, you lack the visibility to find a solution. This is where observability comes to the rescue.
Observability and distributed tracing
Observability means understanding how and why an application reached its current state just from its outputs, without modifying the application. So, when something goes wrong in the wild, you should have all the data needed to know why it happened just by checking the output of the existing application and related services.
It consists of several practices that must be followed in all the systems involved: monitoring, alerts, logs, and a common way to compose all the information from your different systems working together.
Distributed tracing is a method used to profile and monitor applications, especially useful for those built using a microservices architecture. It follows the transactions that happen between systems and reports the monitoring and log results of every system that participates in that communication to a central server, which unifies the different reports around the transaction. You could say that it works like ActivityTracing but in a multi-system environment.
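To make the idea of "unifying reports around the transaction" more concrete, here is a minimal sketch of how a client could tag an outgoing request so that the backend can attach its own reports to the same trace. It assumes the W3C Trace Context `traceparent` header as the propagation format, which this article does not prescribe; `TraceContext` and `tracedRequest` are hypothetical names made up for this example and are not part of any Apple or OpenTelemetry API.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

// Hypothetical type for illustration only.
struct TraceContext {
    let traceID: String  // 32 hex characters, shared by every span in the transaction
    let spanID: String   // 16 hex characters, identifies a single operation

    // Renders `byteCount` random bytes as lowercase hex.
    static func randomHex(byteCount: Int) -> String {
        (0..<byteCount)
            .map { _ in String(format: "%02x", UInt8.random(in: 0...255)) }
            .joined()
    }

    // Starts a new trace with a fresh trace ID and root span ID.
    static func newRoot() -> TraceContext {
        TraceContext(traceID: randomHex(byteCount: 16),
                     spanID: randomHex(byteCount: 8))
    }

    // Creates a child operation that stays inside the same trace.
    func child() -> TraceContext {
        TraceContext(traceID: traceID,
                     spanID: TraceContext.randomHex(byteCount: 8))
    }

    // Assumed header format: version-traceid-spanid-flags ("01" means sampled).
    var traceparent: String { "00-\(traceID)-\(spanID)-01" }
}

// Attaches the context to an outgoing request so the receiving service
// can report its work under the same trace ID.
func tracedRequest(url: URL, context: TraceContext) -> URLRequest {
    var request = URLRequest(url: url)
    request.setValue(context.traceparent, forHTTPHeaderField: "traceparent")
    return request
}
```

Every system that handles the request reads the trace ID from the header and includes it in whatever it reports, which is what lets the central server stitch the separate reports into one transaction.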
Why bother with microservices?
IT and DevOps teams use distributed tracing for debugging and monitoring distributed software architectures, but you can also use it for your own development and debugging purposes. When developing application functionality around an external service, the communications with that service are not always as smooth as desired: maybe you misimplemented the specification of a REST API and your application sometimes doesn’t work, maybe the server didn’t handle a corner case properly and your application is receiving a 500 error, or maybe the service changed under the hood and your code is no longer compatible.
When your application is released and your code is running in your users’ hands, if some feature doesn’t work as expected, the error reports will find their way back to you. You have to start the process of debugging to figure out why it is happening: probably trying to reproduce the issue yourself, checking the latest code added to that functionality, or checking the crash reports if you have them. Wouldn’t it be better if you could just look for the issue in the logs and see that the server returned an error because it had an internal error? Wouldn’t it be even better if you could provide the exact request that made the service fail to the colleague who’s writing the backend?
This is what observability and distributed tracing can bring to your workflow: the complete context for every interaction that your application or framework has with an external system. You can also add your existing ActivityTracing activities to get a complete picture of the code and narrow down issues even faster.
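As a rough illustration of what "complete context for an interaction" might look like on the client side, the sketch below records one request to an external service together with its outcome. `SpanRecord`, its fields, and the attribute keys are all hypothetical and chosen for this example; a real tracing client would define its own model and send the record to a collector.

```swift
import Foundation

// Hypothetical record of a single interaction with an external service.
struct SpanRecord {
    let name: String
    let traceID: String
    let start: Date
    var end: Date? = nil
    var attributes: [String: String] = [:]
    var isError = false

    // Closes the record and stores the outcome of the request.
    mutating func finish(statusCode: Int, at date: Date = Date()) {
        end = date
        attributes["http.status_code"] = String(statusCode)
        // Treat any 4xx or 5xx response as a failed interaction.
        isError = statusCode >= 400
    }
}

// Example values only: the trace ID and URL are placeholders.
var span = SpanRecord(name: "GET /profile",
                      traceID: "4bf92f3577b34da6a3ce929d0e0e4736",
                      start: Date())
span.attributes["http.method"] = "GET"
span.attributes["http.url"] = "https://example.com/profile"

// Pretend the server answered with an internal error.
span.finish(statusCode: 500)
```

With a record like this attached to the trace, the exact request that failed and the status the server returned can be looked up by trace ID instead of being reconstructed from a bug report.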
Meet OpenTelemetry
Observability is mainly used in microservices environments, and each solution supports only a subset of systems and languages. Finding one that supports both your backend systems and your iOS platform may not be an easy task.
OpenTelemetry is an open-source observability framework still in the works, formed through a merger of the two most popular distributed tracing standards (OpenTracing and OpenCensus). The goal of OpenTelemetry is to provide both the API and a vendor-neutral implementation so you won’t be tied to what the vendor of your solution provides for all your platforms.
Right now, iOS is not among the officially supported platforms, but at Undefined Labs we are working on a Swift client, to be released as source, that will allow any iOS or macOS developer to use this technology in their products. We will be providing the first alpha version of the code in the coming weeks.
At Undefined Labs, we are interested in the standardization of distributed tracing in all platforms. One of our products in the works, Scope, is a management and monitoring platform for all your testing needs. Scope provides these observability superpowers to all your tests, so when a test unexpectedly fails, the root cause of the failure can be easily found and fixed. Scope makes it easy for you to keep your test suites healthy and robust.
Testing is a core competency for building great software. But testing has failed to keep up with the fundamental shift in how we build applications. Scope gives engineering teams production-level visibility on every test for every app, spanning mobile, monoliths, and microservices.
Your journey to better applications through better testing starts with Scope.