Warmup
Introduction
Imagine you set a life goal to compete at the Olympics. You are not yet sure why, but let’s see how it goes. Of course, you have also prepared a plan B: just show up at your local gym at least once a week.
Remember what managers keep repeating - goals should be ambitious and realistic. We take the second part seriously 😇
Anyway, you have started training regularly. Over time you introduce little tweaks into your life here and there, hoping they will improve things or at least help with your initial goal. After some time, you get that weird feeling. Something is wrong, even though you are working out regularly. What did we miss?
Step back
Context
It appears that we forgot about almost, well… everything 😅
Speaking in programming terms for a moment - our app is live… but that’s it.
Let me show you a few areas that we have missed.
We didn’t track anything. As the youngsters say - pics or it didn’t happen.
- We don’t know if we can do more reps over time or whether we can increase the weights (metrics and monitoring - this post’s focus, by the way).
- We don’t track our progress with notes or a gym app (logs and performance budgets).
- We don’t know about the mistakes we might be making during workouts. We do the exercises the way we imagine they should look. Yes, we have a big imagination (error monitoring/tracking).
There are a few more, but I’m out of analogies for now.
Basically, you just go there. I’m not even sure if you were training or just staring at your phone the whole time. Did you? 🤨
marathon
historical data
If we want to survive, we have to analyze. This knowledge can be crucial if we want to head in the right direction. Historical data to the rescue!
How are you going to validate your training plan?
I would start with workout notes (there are apps for that already, you don’t need another side project 😒). A “flex” photo from the gym can work as well. Regular weight checks are a must-have.
This unlocks better observations. It works the same way with our application. We can compare metrics, set milestones and make long-term predictions, plans and analyses.
It’s a common concept to set alert conditions for the parts that are crucial to you - like setting an alarm clock to avoid oversleeping. You can also observe your heart rate during a workout, the same way you would set up dashboards with base metrics.
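To make it less abstract, here is a minimal alert-condition sketch in TypeScript. The endpoint, the helper names and the threshold are assumptions for illustration only - in practice an alerting tool (e.g. Prometheus Alertmanager) evaluates rules like this for you:

```typescript
// Minimal alert-condition sketch. The endpoint and threshold are
// hypothetical - real setups define such rules in an alerting tool.
const ERROR_RATE_THRESHOLD = 0.05; // alert above 5% errors

async function fetchErrorRate(): Promise<number> {
  // Hypothetical internal endpoint exposing the current error rate.
  const res = await fetch("/internal/metrics/error-rate");
  const body = (await res.json()) as { errorRate: number };
  return body.errorRate;
}

async function checkAlertCondition(): Promise<void> {
  const errorRate = await fetchErrorRate();
  if (errorRate > ERROR_RATE_THRESHOLD) {
    // Wake somebody up - this is our alarm clock.
    console.warn(`Error rate ${errorRate} above ${ERROR_RATE_THRESHOLD}!`);
  }
}

// Evaluate the condition periodically, like a dashboard refresh.
setInterval(checkAlertCondition, 60_000);
```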
Usually, there are industry standards represented by metrics. You want to follow them and keep them within your scope of interest. Web Vitals metrics are a good example here. The same goes for the gym world: plenty of techniques are available, and only a few of them might interest you, depending on your goal.
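On the web side, collecting those standard metrics is mostly plumbing. A minimal sketch using the open-source web-vitals package - the /analytics endpoint is a placeholder for whatever metrics backend you use:

```typescript
// Collecting Web Vitals in the browser with the `web-vitals` package.
import { onCLS, onINP, onLCP, type Metric } from "web-vitals";

function sendToAnalytics(metric: Metric): void {
  const body = JSON.stringify({
    name: metric.name, // e.g. "CLS", "INP", "LCP"
    value: metric.value,
    id: metric.id,
  });
  // `sendBeacon` survives page unloads better than fetch.
  navigator.sendBeacon("/analytics", body);
}

onCLS(sendToAnalytics);
onINP(sendToAnalytics);
onLCP(sendToAnalytics);
```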
In the end, you build good habits around the workout. It’s healthier, motivating and productive. Over time, you can introduce your own custom metrics once you have built a strong foundation.
our pace
metrics
I have mentioned metrics. What I meant is that we can develop our own tailored area of interest. We can rely on the collected statistics to define our current condition and where to set the bar.
After the first failures and regrets, we can also define lines that we shouldn’t cross. We can agree with ourselves that we will go to the gym at least twice a week (our SLA - Service Level Agreement). It’s also a good habit to define a penalty for crossing this line and to tell somebody about it, so we convince ourselves to respect it. At work, that somebody is usually our stakeholder or customer.
In general, we can agree on some acceptable error rate per month for our application. If we exceed it, we have to reconsider the problem and follow it to the root cause. This should be defined clearly for everyone. SLOs (service-level objectives) come in handy for that.
For example, we expect to train for at least 1 hour per visit 80% of the time over the last month. In other words, we accept that some of our trainings can be short due to unforeseen circumstances.
Or we declare that we do a full-body workout once a month and focus on different body parts (up to two) in every other workout.
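Translated into code, checking an SLO is just counting “good” events against all events over a time window. A minimal sketch, with gym sessions standing in for requests:

```typescript
// SLO sketch: "at least 80% of sessions in the last month last 1h+".
// The same math works for request latencies or uptime checks.
interface Session {
  date: Date;
  minutes: number;
}

const SLO_TARGET = 0.8; // 80% of sessions...
const MIN_MINUTES = 60; // ...should last at least an hour.

function sloCompliance(sessions: Session[]): number {
  if (sessions.length === 0) return 1; // nothing measured, nothing violated
  const good = sessions.filter((s) => s.minutes >= MIN_MINUTES).length;
  return good / sessions.length;
}

const lastMonth: Session[] = [
  { date: new Date("2024-05-02"), minutes: 75 },
  { date: new Date("2024-05-06"), minutes: 40 }, // short one - still fine
  { date: new Date("2024-05-09"), minutes: 65 },
  { date: new Date("2024-05-13"), minutes: 90 },
  { date: new Date("2024-05-16"), minutes: 62 },
];

const compliance = sloCompliance(lastMonth); // 4/5 = 0.8
console.log(compliance >= SLO_TARGET ? "SLO met 💪" : "SLO missed 😓");
```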
Improvement plans
Based on the historical data you were able to collect, you should rethink whether going there is worth it in your case. I mean, you really rock that Candy Crush Saga game between exercises, but after a few months you should be able to do more than 5 push-ups 😒
Jokes aside, you should probably consider defining a performance budget for your app. For example, you want to keep (or reach) the green zone score for the CLS value based on the Web Vitals report.
With such information, you can plan to improve that value in the next quarter. This can include an investigation, planning & consultation, and an implementation part. After introducing the improvements, you can revisit your stats and verify the results. By the way - a great OKR example!
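If you want to make such a budget enforceable, a tiny gate script can fail the build when the metric leaves the green zone (good CLS is 0.1 or less, per web.dev guidance). A sketch, assuming a hypothetical readLatestCls() that pulls the value from your RUM data or a Lighthouse report:

```typescript
// Performance-budget gate sketch: fail a CI step when the measured
// CLS leaves the "green zone". `readLatestCls` is hypothetical -
// plug in your own metrics backend or Lighthouse report instead.
const CLS_BUDGET = 0.1;

async function readLatestCls(): Promise<number> {
  // Placeholder value - replace with a real data source.
  return 0.08;
}

async function main(): Promise<void> {
  const cls = await readLatestCls();
  if (cls > CLS_BUDGET) {
    console.error(`CLS ${cls} exceeds budget ${CLS_BUDGET}`);
    process.exit(1); // break the build so the regression is visible
  }
  console.log(`CLS ${cls} within budget ✅`);
}

main();
```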
source: https://web.dev/articles/cls
Are profiling and performance measurements still part of observability?
I would say it’s the same area of interest with a different scope. Usually, we want to observe some metrics constantly, which allows us to compare results with the past. Profiling, on the other hand, is usually used to investigate specific problems with application processes or tasks.
Usually, running performance profiling on production is expensive in terms of the resources used. The impact on our app might be noticeable, so make sure you run it only on a small subset of your production. One or a few application pods should give you enough data.
On the other hand, performance measurements can be partially automated to collect metrics periodically.
To know which one to choose, we need to know how low-level our problem is. Performance tests address a more generic purpose, while profiling helps investigate specific problems and find root causes. The most generic example would be memory usage increasing over time. Most of the time it’s caused by a bug on our end.
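For the automated measurement side, the standard User Timing API often goes a long way before you need to reach for a profiler. A minimal Node.js sketch:

```typescript
// Automated performance measurement with the standard User Timing API
// (available in browsers and Node.js).
import { performance, PerformanceObserver } from "node:perf_hooks";

// Report every completed measurement - in a real setup you would
// ship these numbers to your metrics backend instead of logging.
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(`${entry.name}: ${entry.duration.toFixed(1)} ms`);
  }
});
observer.observe({ entryTypes: ["measure"] });

performance.mark("task-start");
// ... the work you want to measure goes here ...
performance.mark("task-end");
performance.measure("task", "task-start", "task-end");
```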
It’s good to think about the performance layer after your app already has other, more common observability data available. It’s not the first line of debugging problems.
How does observability help in the team's work?
Of course, we need to assume that the data we collect is valuable to us. A universal example - collecting production errors.
After we release our code to production, we should be promptly informed when something is wrong. With that data in hand, the team can take action - for example, roll back and prepare a fix, notify others about the problem, or both 😃
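A minimal sketch of that first step - catching uncaught errors in the browser and shipping them somewhere. The /errors endpoint is a placeholder; in practice you would likely use an existing error tracker like Sentry instead of rolling your own:

```typescript
// Minimal browser error-reporting sketch. `/errors` is a placeholder.
function report(message: string, extra: Record<string, unknown>): void {
  navigator.sendBeacon("/errors", JSON.stringify({ message, ...extra }));
}

// Uncaught exceptions.
window.addEventListener("error", (event) => {
  report(event.message, { source: event.filename, line: event.lineno });
});

// Unhandled promise rejections.
window.addEventListener("unhandledrejection", (event) => {
  report(String(event.reason), { type: "unhandledrejection" });
});
```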
The same goes for any other data, and it helps from many angles. Based on it, we can plan, make decisions and build the next iterations. It is like building on a foundation.
It took me a while to come up with a decent analogy for that. Here is the one I have for now.
It can be understood as another good habit for a healthier life. At the same time, a healthy life is built from many small chunks. One is our diet, another is workout consistency, our routine, nature and even motivation. We get the best and fastest results when we put everything together, and we can notice very quickly if something goes off-track.
shared responsibility
Allowing others to check what’s up with our app can be very beneficial. It enables us to delegate work instead of making every improvement ourselves. With such an approach, we maintain a clear and healthy responsibility split. Everybody can gain from it. Let’s try an analogy for this part.
Certain exercises target selected muscles or one part of the body. Also, we can’t do only one exercise for the whole training and expect different results. It’s good to plan what we will do in the gym and use the proper tools/exercises to target our goal. We could do everything alone, but a coach can get us up to speed way faster.
The coach can be understood in two ways. One is a dedicated observability team that can support you whenever you need anything. The second is a custom dashboard that keeps team-scoped queries in one place.
If you have an observability team in your organization, this can work way better. They should provide the feedback you need for the things you care about. Cooperation with such a team can only lead to good outcomes. Sharing knowledge with them pays back in the long term, as more people will learn about and observe the app.
There is another side of the coin, though. Don’t try to cram too much data into your dashboards. Otherwise, you can lose visibility into your service and not know what is going on because you observe “too much”.
Of course, "too much" is a very generic term. Dashboards should suggest which part is worth debugging and be informative instead scream at you that something is on fire.
Finish… or just the beginning
We have briefly touched on the observability topic, in a bit of an unusual way. At least I hope it was original.
Quick recap! I have tried to present the main observability topics here. They can be treated as signposts for further discovery, so we know where to head next.
I hope to describe some of these topics in more depth in future posts - especially the part about shared responsibility, examples of data to track (or not to track) and how to avoid noise in dashboards.
Who knows, maybe even with extended technical coverage and fewer analogies. It depends on my creativity 🤞 For now, let me know if such a format is readable.