At one point or another you might have found yourself putting a Pull Request up for review that was significantly bigger than what you were expecting it to be. And you found yourself wondering:
“How big should it really be? Is there a sweet spot for the size of a review? If we could theoretically always fully control it, how big should we make it?”
You googled around, and you found a lot of resources, sites, and articles like this one, analysing the subject and ending up with something along the lines of:
“Too few lines might not offer a comprehensive view of the changes, while an excessively large PR can overwhelm reviewers and make it challenging to identify issues or provide meaningful feedback”
And although you understood the sentiment of the writer, you also understood that the theoretical answer could only be vague, as there is no silver bullet. As always, life is more complicated than that.
What we are gonna be doing in this article is something different however:
“We will analyze the PRs of ~30k developers to see how the size of PRs correlates with lead time, comments received and change failure, to try and find what statistically is the best size, as well as examine what affects it.”
Disclaimer: For anyone who has played around with data, and especially if you have done any courses or training in data, the above might bring back memories of the phrase “Correlation does not mean causation”. First of all, hello to you, my fellow scholar, and secondly, you are absolutely right. We will look at it from various angles to see how this correlation varies by company, by developer, and by the amount of code each developer has committed, along with any other angles that might help us understand which other values follow relevant patterns. However, these are “only” numbers and correlations; they do not explain the reasons behind them, so any causes we suggest are anecdotal rather than scientifically backed.
Methodology
Lead Time
Here, lead time is the time between the earliest PR event (either the first commit or the PR being opened) and the moment the PR gets merged.
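As a minimal sketch of that definition, assuming each PR carries a first-commit timestamp, an opened timestamp and a merged timestamp (the function and parameter names here are hypothetical, not taken from the analysis itself):

```python
from datetime import datetime, timedelta
from typing import Optional


def lead_time(first_commit_at: Optional[datetime],
              opened_at: Optional[datetime],
              merged_at: datetime) -> timedelta:
    """Time from the earliest PR event (first commit or PR open) to merge."""
    events = [t for t in (first_commit_at, opened_at) if t is not None]
    if not events:
        raise ValueError("need at least a first-commit or an opened timestamp")
    return merged_at - min(events)


# Hypothetical PR: first commit on March 1st, opened a day later, merged March 4th.
example = lead_time(
    first_commit_at=datetime(2024, 3, 1, 9, 0),
    opened_at=datetime(2024, 3, 2, 14, 30),
    merged_at=datetime(2024, 3, 4, 9, 0),
)
print(example)  # 3 days, 0:00:00
```

Taking the minimum of the two events means a PR whose commits predate the PR being opened is measured from the first commit, not from the open.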
Data Preparation
Data that are removed as outliers:
- PRs that had a lead time of more than 6 months
- PRs that had a lead time of less than 5 minutes
- File changes of more than 20k lines
- PRs with more than 20k line changes
After this cleanup we are left with a few hundred thousand merged Pull Requests, which are used to produce the analysis below.
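The outlier rules above could be sketched roughly like this. It is one possible reading, not the actual pipeline: it assumes "6 months" means about 182 days, that oversized file changes are dropped from a PR before its total is checked, and the `PullRequest` shape and `remove_outliers` name are made up for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List

MIN_LEAD_TIME = timedelta(minutes=5)
MAX_LEAD_TIME = timedelta(days=182)  # assumption: "6 months" taken as ~182 days
MAX_LINES = 20_000


@dataclass
class PullRequest:
    lead_time: timedelta
    file_changes: List[int]  # lines changed (added + deleted) per file


def remove_outliers(prs: List[PullRequest]) -> List[PullRequest]:
    kept = []
    for pr in prs:
        # Drop PRs that were merged suspiciously fast or took too long.
        if not (MIN_LEAD_TIME <= pr.lead_time <= MAX_LEAD_TIME):
            continue
        # Drop single file changes above the limit (assumed reading of the rule).
        files = [n for n in pr.file_changes if n <= MAX_LINES]
        # Drop PRs whose remaining total is still above the limit.
        if sum(files) > MAX_LINES:
            continue
        kept.append(PullRequest(pr.lead_time, files))
    return kept


sample = [
    PullRequest(timedelta(hours=3), [40, 12]),        # kept as-is
    PullRequest(timedelta(minutes=2), [5]),           # lead time under 5 minutes
    PullRequest(timedelta(days=200), [300]),          # lead time over 6 months
    PullRequest(timedelta(days=1), [25_000, 80]),     # huge file dropped, PR kept
    PullRequest(timedelta(days=2), [15_000, 9_000]),  # PR total above 20k lines
]
cleaned = remove_outliers(sample)
print([sum(pr.file_changes) for pr in cleaned])  # [52, 80]
```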
Algorithm
All correlations were computed with Kendall’s tau, a rank-based measure that should estimate the correlation better than a linear coefficient when the relationship is non-linear but monotonic.
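To make the method concrete, here is a small pure-Python sketch of Kendall's tau (the tau-b variant, which corrects for ties). It is for illustration only: the article does not say which implementation was used, this quadratic version would be slow on hundreds of thousands of PRs, and in practice a library routine such as `scipy.stats.kendalltau` is the usual choice. The sample numbers are made up.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-b: compares the ordering of every pair of observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes nothing
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both variables move in the same direction
        else:
            discordant += 1  # the variables move in opposite directions
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Made-up PR sizes (lines changed) and lead times (hours).
sizes = [10, 50, 200, 400, 1500]
hours = [1, 2, 30, 20, 100]
print(kendall_tau_b(sizes, hours))  # 0.8

# A monotonic but clearly non-linear relationship still scores a perfect 1.0.
print(kendall_tau_b([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0
```

Because only the ordering of the pairs matters, rescaling either variable (for example taking the log of lead time) leaves the result unchanged.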
How does Lead Time relate to PR size
Before we dig deeper: intuitively we expect the size of a PR to correlate in one way or another with the lead time, but is that actually the case? Running a correlation between the two variables over the whole dataset gives us the correlation matrix below.
From these numbers we could say that there is some correlation between the two variables, but it is only a little above the level we would treat as negligible, meaning that:
Their correlation is there, but is not very strong, maybe less than one would have expected.
Seems like we’ll have to dig deeper to see why this correlation appears to be so weak. Unfortunately, plotting total line changes against lead time makes things, if anything, less clear: the trend suggests that PRs with a higher lead time were slightly bigger on average, but any link between the two is hard to see.
Now, if we change this chart a bit by grouping the data points by day and taking the median of the total changes for each day, we start to see a bit more clearly how they relate, and potentially an explanation for why their correlation is not that high.
So this suggests that PRs with fast lead times are consistently small in lines changed, and that as PRs get bigger there is a linear increase in the lead time. However, a high lead time can be produced by a PR of any size, which is why the overall correlation between the two is so low.
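The day-by-day grouping described above can be sketched as follows, assuming "by day" means bucketing PRs by whole days of lead time. The function name and the sample data are made up for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def median_size_by_lead_day(prs: Iterable[Tuple[float, int]]) -> Dict[int, float]:
    """prs holds (lead_time_hours, lines_changed) pairs.

    Returns the median lines changed for each whole day of lead time.
    """
    buckets = defaultdict(list)
    for lead_hours, lines in prs:
        buckets[int(lead_hours // 24)].append(lines)
    return {day: median(sizes) for day, sizes in sorted(buckets.items())}


# Made-up sample: (lead time in hours, lines changed).
sample = [(2, 20), (5, 40), (20, 30), (30, 120), (40, 80), (100, 300)]
print(median_size_by_lead_day(sample))
```

Using the median per day rather than the mean keeps a single very large PR from dominating its bucket, which is what makes the trend easier to see than in the raw scatter.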
What is the best size
To try and answer this question, we first have to ask ourselves what matters to us, i.e. what we are trying to optimize for. That is a question with endless possible answers. For our purposes, however, we will look for the largest PR size that statistically works well given these 3 wants:
- Low lead time (aka be done fast)
- High number of comments (not too big to review properly)
- Low defects/reverts (aka we are not breaking things)
If we plot a heatmap of the probability of a PR getting done within a given number of weeks against its size, we get the below.
Meaning that a PR of less than 100 lines of code has a ~80% chance of getting done within the first week.
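One column of such a heatmap, the share of PRs merged within the first week per size bucket, could be computed along these lines. The bucket boundaries, names and sample data are all hypothetical and are not the ones behind the chart.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable, Tuple

SIZE_EDGES = [100, 400, 1000]  # hypothetical bucket boundaries (lines changed)
LABELS = ["<100", "100-399", "400-999", "1000+"]


def share_merged_within_week(prs: Iterable[Tuple[int, float]]) -> Dict[str, float]:
    """prs holds (lines_changed, lead_time_days) pairs."""
    total = Counter()
    fast = Counter()
    for lines, lead_days in prs:
        label = LABELS[bisect_right(SIZE_EDGES, lines)]
        total[label] += 1
        if lead_days < 7:
            fast[label] += 1
    # Only report buckets that actually contain PRs.
    return {label: fast[label] / total[label] for label in LABELS if total[label]}


# Made-up sample: (lines changed, lead time in days).
sample = [(20, 1), (50, 3), (80, 10), (90, 2), (300, 5), (350, 12), (1200, 30)]
print(share_merged_within_week(sample))
# {'<100': 0.75, '100-399': 0.5, '1000+': 0.0}
```

Repeating the same count for "within 2 weeks", "within 3 weeks" and so on would give the remaining columns.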
A similar heatmap for the amount of comments gives us the below.
Which means that a PR of 6000 lines of code has the same probability of getting 0 review comments as a PR of less than 50 lines of code.
And finally, doing the same for reverts and depicting the probability of no commits from that PR getting reverted gives us the below heatmap.
Which means that larger PRs generally have a higher probability of having some part of their code reverted (i.e. being faulty).
From the above, if we plot on the same graph the probability of completing a PR within the 1st week, the probability of getting at least 1 comment, and the probability of not having to revert a commit from that PR, we get the below.
Therefore, statistically, below ~400 lines of code per PR gives a good probability of getting some review comments, completing it within the first week and not having issues with the code.
Of course, that is only “statistically” the case. It surely depends on a lot of things. Let’s examine some potential ones.
Does it depend on the user
We would expect it to vary per user, but how much it varies, whether that user is the author or the reviewer, is the more interesting question. We first remove all users that have one or more of the below:
- Less than 10 merged PRs
- Less than 10 commits
- Less than 100 lines of code changed
And all reviewers that have:
- Less than 10 approved and merged PRs
We then perform a correlation analysis between Lead Time and PR size per user. If we put the results on a histogram showing how many users had each correlation value, we get the below charts:
The correlation between Lead Time and PR size heavily depends on the PR author as well as the PR reviewer
There is a wide range of reasons why that could happen, such as level of seniority, company/team process, coding language, review tool, etc.
Below we plot how the correlation relates to the amount of lines of code a developer has written over the last 6 months. Although instinctively we might read more lines as a more “experienced” developer, that is not necessarily true, as the number is also affected by multiple factors, e.g. the amount of meetings, mentoring, and collaboration per day, which can vary with seniority, the tasks each person took up, and so on.
Nonetheless we depict it here for anyone that might find it interesting. Also keep in mind that the difference in the correlation between a user with many PRs merged and few is not a very large one.
The more lines one has written the more correlated the PR size is with the lead time. This could also mean that lead time becomes more predictable in this case, and it depends more heavily on the size of the PR and not other parameters (e.g. complexity). However, more analysis would be required to establish that.
Does it depend on the company
We mentioned earlier that there are various potential reasons for a correlation between Lead Time and PR size, and that the strength of the correlation is likely driven by several factors too, one of them being company/team processes. If that is the case, we’d expect to see the correlation vary by company.
Taking a small sample of companies and examining the strength of that correlation suggests that this is a valid assumption as well. As we can see here, it varies from 0.1, suggesting that the two metrics are barely related for that company, to almost 0.7, suggesting a relatively strong correlation between the two.
How much PR size relates to Lead Time seems to depend heavily on the specific company
Does it change over time
It absolutely does, and massively so! Unfortunately, it’s rather hard to depict that for everyone in a single chart. However, I’m putting here my own correlation over time that I got from our free analytics platform so you can get an image of how much it can vary.
Conclusion
We examined the correlation between Lead Time and PR size to see if we could draw some conclusions about what size we should be aiming for. We found that statistically there are some generalisations we can make, and that we can estimate an optimal size. However, we also came to the conclusion that the link between them heavily depends on the company, the team and even the individual developer, which seems to suggest in the end that:
Each developer works in unique ways, and only you, if anyone, know what is optimal for you and your team.
Now, if you would like to check where you or your team/company stands with respect to this correlation between Lead Time and PR size, we created a simple way for developers and teams to get insight into how this correlation changes over time and see where they stand, whether as an individual, a team, or a whole company. If you are curious about it, feel free to check it out.