Objective data to measure software development is here, and it’s here to stay.
For a long time, measuring software development with objective data wasn't thought to be truly possible. Thought leaders like Martin Fowler and Joel Spolsky basically said it couldn't be done. Clearly, it's a challenging task, one that has frustrated software development managers everywhere. Shoot, I wrote an article way back when basically arguing that it's impossible to do.
Well, I'd continue to argue that it's impossible to do. But with the rise of tooling like Git, Jira, and other project management tools, it started becoming clear that the data is there to enable us to get a closer, more data-driven look at what is going on inside software development projects. That data just had to be revealed.
And of course, people have started doing so. One of the most important and well-known results came from the DevOps Research and Assessment organization — commonly known as DORA. They surveyed thousands of DevOps engineers and leaders over six years, coming up with a set of four metrics that were considered critical to the success of software development projects.
The four DORA engineering metrics are:
Deployment Frequency
Mean Lead Time for Changes
Mean Time to Recovery (MTTR)
Change Failure Rate
The first two metrics — Deployment Frequency and Mean Lead Time for Changes — measure the velocity of a team. MTTR and Change Failure Rate measure the quality and stability of a project. All four metrics can be derived by mining the tools that you are currently using.
These four DORA engineering metrics are designed to allow software developers to align their work against the goals of the business. They have become the standard way for CTOs and VPs of Engineering to get a high-level overview of how their organizations are performing. By keeping an eye on the DORA metrics and organizing their work around improving them, development teams can ensure that they are doing the right things to move their projects, and more importantly their business, forward.
Of course, understanding what the metrics actually measure and what they mean is necessary to make them useful. In addition, knowing the current state of these metrics is required for improving them as you move forward.
So let’s take a look at these four key measures.
Deployment Frequency
What is it?
Deployment Frequency measures the number of times that code is deployed into production. It’s usually reported in “Deployments Per Day”.
Now, production can mean different things to different organizations. For a SaaS company, it normally means actually delivering code to the production platform used by actual customers. For other companies, it might mean "made a version available for use by customers".
Why it’s important
An increasing deployment frequency is an indication of a team's efficiency and confidence in its process. A team that can deploy more frequently is moving work through its pipeline faster and handling all of its work products more efficiently.
How is it calculated?
It tallies the total number of deployments an organization does in a single day. As noted, the definition of "deployment" can vary between organizations. This metric can be automated if a team has a Continuous Integration/Continuous Delivery (CI/CD) tool that provides an API into its activity.
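To make the arithmetic concrete, here's a minimal Python sketch, assuming you've already exported deployment timestamps from your CI/CD tool's API (the timestamps below are made up):

```python
from collections import Counter
from datetime import datetime

# Hypothetical deployment timestamps exported from a CI/CD tool's API
deployments = [
    "2021-06-14T09:30:00",
    "2021-06-14T16:05:00",
    "2021-06-15T11:45:00",
    "2021-06-17T10:20:00",
]

# Group deployments by calendar day, then average over the whole window,
# including days with zero deployments
per_day = Counter(datetime.fromisoformat(ts).date() for ts in deployments)
window_days = (max(per_day) - min(per_day)).days + 1
frequency = len(deployments) / window_days

print(f"Deployment frequency: {frequency:.2f} deployments per day")
```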
How do you improve it?
If you wish to improve your deployment frequency, you should invest in:
Improving automated test coverage
Integrating with CI/CD tools
Automating the release validation phase and release process
Reducing the error recovery time on production
Mean Lead Time for Changes
What is it?
Mean Lead Time for Changes is the average time it takes from code being committed to that code being released into production.
Some organizations begin tracking the time from the first commit of the project’s code, while others measure it beginning from merging the code to the main branch.
Many organizations roll Mean Lead Time for Changes into a metric called Cycle Time, which is discussed below.
Why it’s important
A lower Mean Lead Time for Changes means that your team is efficient in coding and deploying projects and is adding value to your product in a timely manner. Attempting to lower the average incentivizes teams to properly divide the work, to thoroughly review the code, and to maintain a fast deployment process.
How is it calculated?
Each change is measured from code commit to production release, and an average of those times is calculated.
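Here's a minimal Python sketch of that calculation, assuming you can pair each change's commit timestamp (from Git) with its production release timestamp (from your CI/CD tool); the sample data is invented:

```python
from datetime import datetime, timedelta

# Hypothetical (committed, deployed) timestamp pairs for recent changes
changes = [
    ("2021-06-14T09:00:00", "2021-06-15T14:30:00"),
    ("2021-06-15T10:15:00", "2021-06-15T18:00:00"),
    ("2021-06-16T08:45:00", "2021-06-17T09:10:00"),
]

# Lead time for a change is the gap between commit and production release
lead_times = [
    datetime.fromisoformat(deployed) - datetime.fromisoformat(committed)
    for committed, deployed in changes
]
mean_lead_time = sum(lead_times, timedelta()) / len(lead_times)

print(f"Mean lead time for changes: {mean_lead_time}")
```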
How do you improve it?
This metric can be improved by:
Adding automation to the deployment process
Ensuring that the CI/CD process is as efficient as possible
Breaking projects into smaller and more manageable chunks
Creating an efficient code review process
Mean Time to Recovery (MTTR)
What is it?
This metric measures the average time it takes the team to recover from a failure in the system.
“Failure” can mean anything from a bug in production to the production system going down.
Why it’s important
Obviously, downtime is not good, and the quicker a team can recover from it, the better.
Keeping an eye on mean time to recovery will encourage the building of more robust systems and increased monitoring of those systems.
Quick recovery times are a reflection of the team's ability to diagnose problems and correct them. Measuring mean time to recovery can have the effect of making the team more careful and concerned about quality throughout the entire development process.
How is it calculated?
Normally, this metric is tracked by measuring the average time between a production bug report being created in your system and that bug report being resolved. Alternatively, it can be calculated by measuring the time between the report being created and the fix being deployed to production.
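A minimal Python sketch of the first approach, assuming you can export the created and resolved timestamps for production bug reports from your issue tracker (sample data invented):

```python
from datetime import datetime, timedelta

# Hypothetical (created, resolved) timestamps for production failures
incidents = [
    ("2021-06-10T02:15:00", "2021-06-10T04:45:00"),
    ("2021-06-18T13:00:00", "2021-06-18T13:50:00"),
]

# Recovery time for an incident is the gap between report and resolution
recovery_times = [
    datetime.fromisoformat(resolved) - datetime.fromisoformat(created)
    for created, resolved in incidents
]
mttr = sum(recovery_times, timedelta()) / len(recovery_times)

print(f"Mean time to recovery: {mttr.total_seconds() / 3600:.1f} hours")
```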
How do you improve it?
MTTR can be improved by:
Building a CI/CD system that quickly reports failure
Ensuring there is a process in place to take immediate action on failures
Prioritizing recovery from failure over all other tasks
Improving deployment time
Change Failure Rate
What is it?
Change Failure Rate measures how often a code change results in a failure in production. Changes that result in a rollback, in production failing, or in production having a bug all contribute to this metric.
Why it’s important
This metric is important because all time spent dealing with failures is time not spent delivering new features and value to customers. Obviously, lowering the number of problems in your software is desirable.
How is it calculated?
Normally, this metric is calculated by counting the number of deployments that result in a failure and dividing by the total number of deployments to get a rate. A lower rate is better.
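As a quick sketch, assuming you can flag each deployment record with whether it caused a failure (a rollback, outage, or production bug):

```python
# Hypothetical deployment log: True means the deployment caused a failure
deployment_outcomes = [False, False, True, False, False, False, True, False]

# Failures divided by total deployments, expressed as a percentage
failures = sum(deployment_outcomes)  # True counts as 1
change_failure_rate = failures / len(deployment_outcomes) * 100

print(f"Change failure rate: {change_failure_rate:.1f}%")  # 25.0%
```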
How do you improve it?
Change Failure Rate is improved when you:
Ensure all new code is covered by automated unit tests
Improve automated testing as part of your continuous integration process
Do thorough and complete code reviews to help prevent issues being introduced into production
The Benefits of Tracking DORA Metrics
Decision Making
Consistently tracking DORA metrics will enable you to make better decisions about where and how to improve your development process. Doing so will reveal bottlenecks and let you focus attention on the places where the process may be stalled. Trends can be identified, and you can validate whether the areas you chose to focus on were the right ones.
DORA tracking can help focus both the development team and management on the things that will really drive value. They allow you to make decisions based on data rather than merely a finger in the wind or a gut feeling.
Delivering Value
DORA measures the value that your team is delivering. If your DORA metrics are favorable, your team is delivering value to your customers and maintaining the quality necessary not to be distracted from that focus. And that's pretty much the bottom line for any business — delivering value to your customers.
Virtuous Cycle
When anything gets measured, it will likely be gamed — that is, people will change their behavior to optimize whatever is measured. Many times this can have a negative, distorting effect on what a development team does.
DORA metrics can be gamed too, but the great thing is that you want them to be gamed. Normally, gaming a metric has a negative impact on teams, but these metrics were carefully devised to do the exact opposite. Since they highlight inefficiencies and wasted time, gaming them will increase efficiency and reduce waste.
LinearB helps you measure and improve DORA engineering metrics
DORA Metrics are important, and LinearB allows them to be tracked easily. We give you DORA metrics right out of the box that can be easily displayed and tracked.
A dashboard of these metrics is useful for giving senior members of your software development organization a high-level view of how the organization is performing. With this simple view, leaders can see at a glance how the team is doing and what mid-course corrections might need to be made.
In addition to the actual DORA metrics themselves, LinearB can track other metrics that can help improve your organization’s performance.
Metrics like Pull Request Size, Pull Request Review Depth, and Pull Request Review Time can all be monitored, and improving them will reduce Mean Lead Time for Changes and increase Deployment Frequency.
Track DORA engineering metrics in your own code repository. Click here to get started for free and see your Cycle Time drop!
Going above and beyond
LinearB goes beyond the DORA metric of Mean Lead Time for Changes to provide Cycle Time.
Cycle Time is a powerful metric that measures how long it takes a given unit of code to progress from branch creation to deployment in production. It’s really a measure of how fast a given task or subtask gets delivered to end users. And of course actually delivering functionality is the purpose of every development organization.
Cycle Time is divided into four subsections:
Coding Time — Normally measured as the time between the first commit to a given branch and the moment a pull request is created for that branch
Pull Request Pickup Time — The time between a pull request being created and the review for that pull request starting
Pull Request Review Time — The time between the pull request review starting and the code being merged
Deploy Time — The span between the merging of the code and that code actually being deployed to production
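To make the breakdown concrete, here is a minimal Python sketch that derives the four phases from the timeline of a single branch. The event names and timestamps are hypothetical; in practice, a tool assembles them from your Git and pull request data.

```python
from datetime import datetime

# Hypothetical timeline of one branch, from first commit to production deploy
events = {
    "first_commit":   "2021-06-14T09:00:00",
    "pr_created":     "2021-06-15T11:30:00",
    "review_started": "2021-06-15T15:00:00",
    "merged":         "2021-06-16T10:45:00",
    "deployed":       "2021-06-16T13:20:00",
}
t = {name: datetime.fromisoformat(ts) for name, ts in events.items()}

# Each phase is the gap between two consecutive events
phases = {
    "Coding Time":              t["pr_created"] - t["first_commit"],
    "Pull Request Pickup Time": t["review_started"] - t["pr_created"],
    "Pull Request Review Time": t["merged"] - t["review_started"],
    "Deploy Time":              t["deployed"] - t["merged"],
}
cycle_time = t["deployed"] - t["first_commit"]

for name, duration in phases.items():
    print(f"{name}: {duration}")
print(f"Total Cycle Time: {cycle_time}")
```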
Improving Cycle Time has a number of benefits:
Closely tracking Coding Time encourages you to divide work into smaller, more manageable chunks. Cycle Time goes up if a given branch or project is big and takes a long time, so tracking it encourages smaller bites of work.
It pushes the team to process pull requests in a timely manner. It helps prevent languishing pull requests and pull requests that are too large to review effectively.
Teams that track deployment time are motivated to focus on improving and streamlining build and deployment processes.
Rising Cycle Times can be an early warning system for project difficulties. If I had to pick one thing for a team to measure, it would be Cycle Time.
WorkerB improves DORA Metrics
Idle time is the time spent waiting for things to happen in your software development process — pull requests sitting idle and unreviewed is a good example. It is a big killer of two important DORA metrics: Deployment Frequency and Mean Lead Time for Changes.
WorkerB is a feature provided by LinearB that can have a drastic, positive effect on reducing idle time and thus improving your DORA metrics. By notifying your team members about repository events, it ensures that the team is immediately aware of the events that drive the components of Cycle Time (e.g., Pull Request Pickup Time and Pull Request Review Time) and allows them to react in a more timely manner.
LinearB customers routinely report more than a 50% reduction in Cycle Time in their first four months of using WorkerB.
Measure for Success
DORA Metrics are based on years of research into what really matters for software development teams. Focusing on them will result in more value being delivered through your development pipeline. More value means happier customers.
LinearB can help your team track them consistently and thus bring about a profound and lasting impact on your software development process and your business.
Sign up for LinearB for free today.
If you haven't already heard, Dev Interrupted is hosting INTERACT: The interactive, community-driven, digital conference that takes place September 30th. Designed by engineering leaders, for engineering leaders, INTERACT will feature 10 speakers, 100s of engineers and engineering leaders, and is totally free.
Register Now
If you haven’t already joined the best developer discord out there, WYD?
Look, I know we talk about it a lot but we love our developer discord community. With over 1500 members, the Dev Interrupted Discord Community is the best place for Engineering Leaders to engage in daily conversation. No salespeople allowed. Join the community >>
Originally published at https://linearb.io on June 17, 2021.
Top comments (4)
Nice article, Nick! I think this is a really good primer on DORA metrics and why they matter. I definitely agree that now that these metrics exist, they won't be going anywhere.
Gaming these metrics isn't a problem?
The opposite problem also applies: DORA takes no account of how complex a change may be. Also, DORA metrics often span multiple teams (dev vs. ops, etc.), so using DORA as performance metrics for team management is a thorny subject at best.
As we all know, a more complicated change increases the size of the commit, and the amount of time in development and test.
The obvious answer is to integrate Story Points into the metrics, to get an idea of "how long it takes 1 Story Point to become running code in Production" - but then we descend into the debate about how to determine story points, whether we should use Fibonacci, how Story Points from team A compare to Story Points from team B, etc. (and a myriad of other problems).
Don't get me wrong, DORA is a good start if you have nothing else, but no matter what you're tracking or how you're tracking it, someone will gamify it the second you start tracking it.
Thanks for the breakdown, Dave. I appreciate it!