While renovating my house, my partner's dad guided and assisted me. It was my first real foray into DIY, but he had plenty of experience. The apprentice meets the master.
One thing drilled into my head throughout the renovation was that you go faster when you make fewer mistakes, which resonates generally in life. One of the easiest ways to avoid mistakes is to 'measure twice, cut once' — a popular saying taught to apprentice tradesmen, I'm sure.
We should take this approach with software: Make fewer mistakes by measuring. Too often, we make assumptions and act on them before validating them - these acts are mistakes because they lead to wasted effort.
Contents
When should we measure?
How do we measure?
Pitfalls to avoid
Counter Arguments
Summary
When should we measure?
To measure, or not to measure?
If you're making decisions based on assumptions or incomplete information, it's time to pause and measure. Rushing into action without proper validation can lead to costly mistakes. If you can't measure yet, the best decision might be to do nothing.
We're wired to believe that inaction is terrible; spoiler - it's not.
Let's take a look at an example:
The trading system
You're building a system for trading stocks. It has two components: a front end (UI) and a ‘backend’ that does the work on behalf of the UI.
The UI needs to let users create requests to trade stocks. Users want to create anywhere from one to one thousand requests at a time, but they average around ten (according to the product owners!). The backend can already create these requests, but only one at a time. To improve the user experience, you've been tasked with building a UI that lets users create many trade requests at once.
The engineers are discussing adding new backend functionality to create multiple trades in a single call, with two aims (sketched below):
Simplify the work for the engineers building 'bulk' trade creation in the UI.
Reduce the performance concerns of calling the single-create functionality thousands of times.
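To make the debate concrete, here's a minimal sketch of the two client-side shapes under discussion. The base URL, endpoint paths, payload, and helper names are all hypothetical - they only illustrate the difference between looping over the existing single-create call and calling a proposed bulk endpoint.

```python
import requests

BASE_URL = "https://trading.example.com/api"  # hypothetical service URL


def create_trade(trade: dict) -> None:
    """Existing functionality: one HTTP call per trade request."""
    resp = requests.post(f"{BASE_URL}/trades", json=trade, timeout=5)
    resp.raise_for_status()


def create_trades_one_by_one(trades: list[dict]) -> None:
    """What the UI would have to do today: N calls for N trades."""
    for trade in trades:
        create_trade(trade)


def create_trades_bulk(trades: list[dict]) -> None:
    """The proposed functionality: a single call to a new bulk endpoint."""
    resp = requests.post(f"{BASE_URL}/trades/bulk", json={"trades": trades}, timeout=30)
    resp.raise_for_status()
```

Whether the second shape is worth building is exactly what we should measure before deciding.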
So, what do you do?
I understand the instant urge to say, 'Create the ‘bulk create’ functionality on the backend, easy.' I've felt that urge; it's an easy answer and sounds like it solves the problems.
Stop. Do nothing. Breathe and measure.
Before you commit to any action, validate your hypothesis. You will save time, prevent technical debt, and gather facts about system use and user behaviour. It's time to start building measurement into your software development process.
How do we measure?
First, we need to start fresh and remove both bias and assumption. We need to be analytical here and remove emotion.
Different hypotheses require different methods of measuring - Users might need surveys, and systems might need specific tooling.
Refrain from making assumptions about what to measure. Find evidence that the metric will answer your question.
Let’s work through the above example. First, let’s reiterate what we’re trying to solve:
Simplify the work for the engineers trying to 'bulk' create trades.
Reduce performance concerns of calling the create functionality thousands of times.
There are two assumptions here, so let's call them out explicitly. Assumption one: enabling users to create bulk trades is simpler for engineers if there's a dedicated endpoint.
Assumption two: a dedicated bulk-create endpoint will be more performant (latency and throughput) than calling the single-create functionality once per trade.
If we don't spell these assumptions out, it's easy to get lost in what we are trying to analyse - spelling them out is how we discover our hypotheses.
There are four assumptions. Did you spot the third and fourth?
Assumption three: Users create bulk orders regularly.
Assumption four: Users create large (1000+) quantities of orders in bulk.
If assumptions three and four don't hold, one and two cannot hold either; and if assumption three is false, assumption four cannot be true. We should understand this chain of dependency to avoid over-investigating where it isn't needed. Let's investigate assumption three first, as it's the root of all the others.
Assumptions three and four
We must understand user behaviour and measure how often orders are created individually versus in bulk - assumption three. Then we can investigate assumption four - how many orders are created per ‘bulk’?
If you have an observability system connected to the API, it should be as simple as recording every time each endpoint is used and which user it relates to. You can then aggregate orders over a short period, grouping by user, to see how many orders are being created in bulk. If you have access to observability on the system producing these ‘bulk’ orders, you can more accurately observe there instead.
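Here's a rough sketch of that aggregation. The event shape, field names, and five-minute window are assumptions rather than a real schema - the idea is just to group create-order calls per user into short time windows and count how many land in each.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical events exported from your observability tooling:
# one record per call to the single-create endpoint.
events = [
    {"user_id": "u1", "timestamp": datetime(2024, 1, 1, 9, 0, 1)},
    {"user_id": "u1", "timestamp": datetime(2024, 1, 1, 9, 0, 2)},
    {"user_id": "u2", "timestamp": datetime(2024, 1, 1, 9, 5, 0)},
    # ... thousands more in reality
]

WINDOW = timedelta(minutes=5)  # assumed definition of "one bulk action"


def bulk_sizes(events):
    """Group create-order calls per user into time windows and count each group."""
    by_user = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        by_user[event["user_id"]].append(event["timestamp"])

    sizes = []
    for timestamps in by_user.values():
        window_start, count = timestamps[0], 0
        for ts in timestamps:
            if ts - window_start <= WINDOW:
                count += 1
            else:
                sizes.append(count)
                window_start, count = ts, 1
        sizes.append(count)
    return sizes


print(bulk_sizes(events))  # e.g. [2, 1] - one 'bulk' of two orders and one single order
```

The distribution of those sizes tells you whether users really do create orders in bulk (assumption three) and how large those bulks actually are (assumption four).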
If assumption three (users create bulk orders) is satisfied, and assumption four is satisfied (we now have a confirmed figure for the quantity), we can justify spending time examining the remaining assumptions.
Assumption two
Next, we need to determine whether latency is acceptable at the expected throughput - assumption two. Thanks to assumption four, we should already know the expected throughput. So the question we need to answer is: what counts as acceptable latency?
It’s subjective and depends on many factors. The easy answer would be ‘whatever the product owner decides’, and that’s ok if they have that answer. Otherwise, it’s complex and requires a lot of research.
Some things to consider might be:
Does the user care about individual requests succeeding? Or is it only about the whole bulk succeeding?
How time-critical is this functionality? Are users blocked from doing further things until it is complete and they have visual confirmation?
Where does the latency requirement sit? Is it in the confirmation for users in the UI (the whole process), or is it just in the sending of the request to another system (a sub-process)?
In this example, latency isn't high on our list of priorities: the process requires manual input from another user on a downstream system, and our user is not blocked waiting for that confirmation. This means we can essentially drop the need to satisfy this assumption.
In another world, where we do have a latency requirement, this comes down to observability and performance testing. You need to set up scenarios that stress test the system to see whether it meets the agreed latency requirement. The great thing is that, having built the tooling to prove this out, you can then performance test your system at will.
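Here's a minimal sketch of such a test, assuming a hypothetical /trades endpoint and the roughly 1,000-request bulk size from assumption four: fire the requests concurrently, time each round trip, and report latency percentiles to compare against whatever target you've agreed.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib import request

URL = "https://trading.example.com/api/trades"    # hypothetical endpoint
PAYLOAD = b'{"symbol": "ACME", "quantity": 1}'    # hypothetical trade request
N_REQUESTS = 1000                                 # expected bulk size (assumption four)
CONCURRENCY = 50                                  # assumed client-side parallelism


def timed_request(_) -> float:
    """Send one create-trade request and return its round-trip time in seconds."""
    start = time.perf_counter()
    req = request.Request(URL, data=PAYLOAD, headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=30):
        pass
    return time.perf_counter() - start


with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = list(pool.map(timed_request, range(N_REQUESTS)))

p = statistics.quantiles(latencies, n=100)  # percentile cut points
print(f"p50={p[49]:.3f}s  p95={p[94]:.3f}s  p99={p[98]:.3f}s")
```

Run the same script against the looped single-create calls and against any prototype bulk endpoint, and you have an objective comparison instead of a hunch.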
Assumption one
Our last assumption to review (assumption one) is that enabling users to create bulk trades through a dedicated endpoint is simpler for engineers. Engineering productivity is really hard to measure, so maybe that's the case, or maybe it's not - you'll need to do your own measuring and study here, I'm afraid.
Where does that leave us?
Assumption three - Users create bulk orders regularly ✅
Assumption four - Users create large (1000+) quantities of orders in bulk ✅
Assumption two - A dedicated bulk-create endpoint will be more performant (latency and throughput) than calling the single-create functionality once per trade ❌
Assumption one - Enabling users to create bulk trades is simpler for engineers if there's a dedicated endpoint ❓
Depending on which way assumption one swings, you need to make a decision. Here are my thoughts:
If assumption one turns out not to be true, then just stop caring. You’ve measured and determined it’s not worth the effort. Win.
If assumption one is true, then you still need to weigh up the cost. It costs to build the required functionality to ‘simplify’ the work. Does that cost more than the (technical debt) cost of leaving it as it is?
Weighing up technical debt cost against engineering cost is hard and probably subjective. I don’t know how to measure that, but I’m sure somebody on the internet has something less subjective than gut feeling.
Pitfalls to avoid
Before you get going on measuring everything everywhere, there are some pitfalls to be aware of.
Measuring only once. Look at the title; it doesn't say measure once. It says measure twice - in fact, measure continuously. Don't just measure twice and call it done; keep measuring after the cut and see what you can learn. The secondary benefit of all this measuring is the learning.
Measuring for perfection. Sometimes, it’s just not worth measuring absolutely everything. Do you need to invest time measuring productivity when working in a specific codebase to know if it’s worth refactoring to something more manageable? That’s a big investment upfront, and it's hard to measure the impact - especially on something as subjective as code. How do you measure the effectiveness of the action? My point is that it’s okay to trust your gut and let your brain do that automatic thing where it tells you the answer without spending brain cycles deliberating it. Previous experiences do still count as long as we make sure to take lessons from them.
A subset of measuring for perfection is spending too long deciding what to measure or how to measure. You’ll reach analysis paralysis, get nowhere, and just waste time. Make sure it’s going to have a worthwhile impact if you’re going to spend a long time deliberating it.
Measuring the wrong thing. As a counterweight to not measuring for perfection, don't measure the wrong thing. It isn't the worst thing in the world if you still learn from it and can use the measurements, but it's still a waste of time. Do spend some time making sure you're measuring the correct metrics!
Counter Arguments
Measuring may take longer than implementing (I can't measure it, I need to build the tooling to measure it, or it's very complex to measure).
Yes, it might. But with the tooling in place, you'll be able to measure more things and objectively discuss other work in the future. Its impact reaches beyond ‘just now’.
Yes, it might. But you won't build something unneeded and won't have to remove it / have the tech debt of maintaining it.
If it's too complex, you may be trying to measure too much at once or measuring the wrong thing.
My previous experience tells me that I am right in my assumptions. It’s a waste of time to measure it all first.
How did you find out that you were right last time? You measured.
Where’s the harm in proving it out? Having definitive evidence is always better when staring down the barrel.
There are great learning opportunities for everybody involved in measuring - it’s not just about you being right.
Summary
Stop and measure once in a while. Commit to knowledge before action so you can make informed decisions - it's easier to account for when things go wrong, and things should go wrong less frequently because you're using real data to shape your actions. If it's too expensive to measure, take no action; measure once the expense of measuring is less than the expense of not measuring.
Figuring out what to measure is hard and involves confronting our biases and assumptions, but it’s worthwhile. Data is hard to argue with. You’ll save yourself a lot of time and work in the long run to make better products for your users, which is what we’re here for at the end of the day.
The conclusion to the example
So what happened in the end? We did nothing. It wasn’t worth measuring because we had nothing causing us to need to measure - we didn’t have users complaining, and the engineers could still easily build their functionality. So, we didn’t measure and took no action.
The ‘issue’ just went away.