DEV Community

How to cleanup a branch (PR) with huge number of commits

Albert Zeyer on September 02, 2021

I was trying to implement some new feature in some larger somewhat messy project (RETURNN but not so relevant). So I created a new branch, also ma...

Albert Zeyer • Sep 2 '21

Hm, there was a post here, but it disappeared while I was already answering. Here is my answer:

We already do trunk-based development, and we also have lots of CI tests. That is exactly why I did not merge this earlier: some tests were still failing. In the end, after everything was fixed, it turned out to be a huge number of commits. That is the situation now, i.e. I want to split this up.

Usually this does not happen like this, and changes turn out to be smaller. But because we require that tests are always passing, one particular change, fix, or extension needs follow-up fixes of other things.

Benjioe • Sep 2 '21

Yes, I deleted it because you're on an open source project, so it's not relevant ...

Albert Zeyer • Sep 2 '21

Yes? What was not relevant about it? You mentioned trunk-based development, which is relevant. I thought also your other points might have been valid options.

Benjioe • Sep 2 '21 • Edited

Yeah, you're right, so here is what it was:

Maybe you can:

Create a new branch from master whenever you need an unrelated change, merge it, and then rebase your feature branch.
Or use trunk-based development, where your branches live 1 day at most.
Ask yourself: do you really need a clean history? Or are good comments good enough? (Not for code-maat, though.)
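
As a rough illustration of the first suggestion, here is a minimal Python sketch that only builds the git commands involved. The function name, the branch names, and the assumption that the trunk is `master` on a remote called `origin` are all hypothetical; the merge step itself (PR plus CI) happens outside the script.

```python
def extract_unrelated_change_plan(commits, side_branch, feature_branch, trunk="master"):
    """Return git commands (as argument lists) for moving `commits` into their own branch.

    All names here are placeholders for illustration, not part of any real project.
    """
    return [
        # 1. Start a small branch from the trunk and copy the unrelated commits onto it.
        ["git", "switch", "-c", side_branch, trunk],
        ["git", "cherry-pick", *commits],
        # (Open a PR for side_branch here and merge it once CI passes.)
        # 2. Go back to the feature branch and replay it on the updated trunk.
        ["git", "switch", feature_branch],
        ["git", "fetch", "origin"],
        ["git", "rebase", f"origin/{trunk}"],
    ]


if __name__ == "__main__":
    plan = extract_unrelated_change_plan(["abc1234"], "small-fix", "big-feature")
    for cmd in plan:
        print(" ".join(cmd))
```

During the final rebase, git normally skips commits whose patch is already contained in the trunk, so the feature branch shrinks as the small branches get merged.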

Benjioe • Sep 2 '21

Do you have some automated tests that take a long time?

Albert Zeyer • Sep 2 '21

There are about 50 GitHub CI jobs, each running a lot of different tests. The longest job takes about 10 minutes, but most take only 1-3 minutes. Because many of them run in parallel, a full run takes maybe 12 minutes or so once it has started.

These are run on every push and for every PR. When I keep updating the PR, this can queue a lot of CI runs. And then sometimes I need to wait hours until I see the CI results.

This is a big problem. Because we must know that the tests are passing before we merge something in. So I'm often just waiting.

Benjioe • Sep 2 '21

And could you rewrite those tests using test doubles to make them quicker?
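
For readers unfamiliar with the term, a test double replaces a slow dependency with a cheap stand-in. A minimal sketch using Python's standard `unittest.mock`; the function `load_dataset_size` and its `fetch` parameter are made up for illustration and are not taken from the project discussed here.

```python
from unittest import mock


def load_dataset_size(fetch):
    """Hypothetical code under test: `fetch` stands for a slow call (disk, network, ...)."""
    records = fetch()
    return len(records)


def test_load_dataset_size_with_stub():
    # The stub returns canned data immediately instead of doing the slow work.
    slow_fetch = mock.Mock(return_value=[1, 2, 3])
    assert load_dataset_size(slow_fetch) == 3
    # The double also lets us check how the dependency was used.
    slow_fetch.assert_called_once_with()


if __name__ == "__main__":
    test_load_dataset_size_with_stub()
    print("ok")
```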

Albert Zeyer • Sep 2 '21

I think most tests on their own are already pretty small (most probably take about 0.1 sec or so). I don't think you can optimize much there.

This longer 10 minute test runs PyCharm CLI checks on code style. I haven't really found ways to speed it up.

Maybe with a lot of effort, you could somehow reduce it a bit. But even if you reduced it by half (unrealistic), the problem would still persist: you would often need to wait (although of course to a lesser degree).

I think we are still doing fine with about 10 minutes per CI run. From what I have heard from Google and elsewhere, it can take much longer, and they still manage to stay productive. So maybe they don't need to iterate too often on one set of changes, or they don't wait each time for the tests to pass. For me, the waiting is mainly a problem right now because I want to split this up and actually figure out independent sets of commits that work (working = passing tests).

Benjioe • Sep 2 '21 • Edited

Do you run those fast tests on your own machine before committing (before CI)?

Albert Zeyer • Sep 2 '21

Yes, sure. I do that a lot, especially while debugging.

For this cleanup and the search for good subsets of commits, I maybe don't do it enough. I mostly look at the big list of commits, find related commits (via my script, or just manually), structure and partition them logically, push them to individual branches/PRs, and then wait for CI to finish.

Running them locally (before or after the push) could maybe speed this up. But this would maybe distract me from continuing with the partitioning/structuring of the other commits, because then I would wait for the local tests to finish. This still takes a couple of minutes, so I would get out of the flow and lose my train of thought.

Or maybe I could have a separate checkout of the repo, where I run the tests locally in the background. Then I could continue with structuring or other things in the meanwhile. But managing this seems a bit annoying and still manual work.
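
The separate-checkout idea could be semi-automated with `git worktree`. The following is only a sketch under assumptions: the helper names, the temporary directory prefix, and the test command passed in are hypothetical, and cleanup of the worktree is left to the caller.

```python
import subprocess
import tempfile
from pathlib import Path


def worktree_add_cmd(worktree_dir, ref):
    """Build the `git worktree add` command for a detached checkout of `ref`."""
    return ["git", "worktree", "add", "--detach", str(worktree_dir), ref]


def start_background_tests(repo_dir, ref, test_cmd, log_path):
    """Check out `ref` in a fresh worktree and start `test_cmd` there without blocking.

    Returns the running process and the worktree path. The main checkout stays
    untouched, so the commit structuring can continue in the meantime.
    """
    worktree_dir = Path(tempfile.mkdtemp(prefix="local-ci-")) / "checkout"
    subprocess.run(worktree_add_cmd(worktree_dir, ref), cwd=repo_dir, check=True)
    with open(log_path, "w") as log:
        # The child process keeps its own handle to the log file.
        proc = subprocess.Popen(test_cmd, cwd=worktree_dir, stdout=log, stderr=subprocess.STDOUT)
    return proc, worktree_dir
```

Usage might look like `proc, wt = start_background_tests(".", "my-split-branch", ["python", "-m", "pytest"], "split.log")`, where the branch name and the test command are placeholders. Later, `proc.poll()` returns `None` while the tests are still running and the exit code once they are done.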

This is basically why I posted the question here. Nothing I can think of really seems optimal. Maybe I just don't know about some better workflow or better tools.

Benjioe • Sep 2 '21

Ok, I think I get it:

Your tests are too slow (but can't be made faster), so launching them gets you out of the flow and breaks your train of thought.
You want independent commits that can be reverted without breaking anything.

So you write the full feature, including the refactoring, without running the tests, send it to the CI to check for regressions, fix everything, and when it's good, rewrite the history to get independent commits.

Albert Zeyer • Sep 2 '21

Yeah, this is what I do. But it is a lot of work, with lots of effort, lots of trial and error, and lots of waiting. And I feel like there should be tools which support me in this at least to some degree, to make it a bit more convenient. Some of these things could be automated, or semi-automated. E.g. there could be a tool or script which automatically figures out the sets of commits which can be applied individually without breaking the tests.
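
As a rough sketch of what such a script could look like (not an existing tool): for every commit on the branch, cherry-pick it alone onto the base in a throwaway worktree and run the tests. All function names and the test command are assumptions for illustration. This only checks single commits; finding groups of commits that work together would need an additional search on top.

```python
import subprocess
import tempfile
from pathlib import Path


def list_commits(repo_dir, base, branch):
    """Commits in base..branch, oldest first."""
    out = subprocess.run(
        ["git", "rev-list", "--reverse", f"{base}..{branch}"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def passes_on_base(repo_dir, base, commit, test_cmd):
    """True if `commit` cherry-picks cleanly onto `base` and `test_cmd` then succeeds."""
    worktree_dir = Path(tempfile.mkdtemp(prefix="split-")) / "checkout"
    subprocess.run(
        ["git", "worktree", "add", "--detach", str(worktree_dir), base],
        cwd=repo_dir, check=True, capture_output=True,
    )
    try:
        picked = subprocess.run(["git", "cherry-pick", commit], cwd=worktree_dir, capture_output=True)
        if picked.returncode != 0:
            return False  # Does not apply on its own, so it depends on other commits.
        return subprocess.run(test_cmd, cwd=worktree_dir).returncode == 0
    finally:
        # Throw the temporary checkout away, even if a cherry-pick is half done.
        subprocess.run(
            ["git", "worktree", "remove", "--force", str(worktree_dir)],
            cwd=repo_dir, capture_output=True,
        )


def partition_commits(commits, is_independent):
    """Split commits into (independent, dependent) according to the given check."""
    independent, dependent = [], []
    for commit in commits:
        (independent if is_independent(commit) else dependent).append(commit)
    return independent, dependent
```

A call could look like `partition_commits(list_commits(".", "master", "feature"), lambda c: passes_on_base(".", "master", c, ["python", "-m", "pytest"]))`, again with placeholder branch names and test command. Running the full test suite once per commit is slow, so this is something to leave running in the background.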

Benjioe • Sep 2 '21 • Edited

And if you don't test for regressions but only check the code style of the new code (configuring your PyCharm with the same rules as the CLI) ... are the tests still too slow?

Albert Zeyer • Sep 2 '21

Yes. Checking the code style is actually slower than the regression tests. But also, the regression tests are even more important than the code style.

Benjioe • Sep 2 '21 • Edited

Yeah, I was thinking about the Open/Closed principle, so that you don't always have to run the regression tests.

So, yeah, a tool like yours, or something like the gitflow CLI, with a set of commands to fix CI errors while keeping your history clean (but I don't know of any :'().

Or git rebase --autosquash
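
For context, `--autosquash` works together with fixup commits: a follow-up fix is committed with `git commit --fixup=<sha>`, and a later interactive rebase folds it into the commit it fixes. A minimal Python sketch of that workflow; the function names are hypothetical, and setting `GIT_SEQUENCE_EDITOR` to `true` (to accept the generated todo list without opening an editor) assumes a Unix-like system.

```python
import os
import subprocess


def fixup_commit_cmd(target_sha):
    """Command that records the staged changes as a fixup for `target_sha`."""
    return ["git", "commit", f"--fixup={target_sha}"]


def autosquash_cmd(base):
    """Command that rebases onto `base`, moving fixup commits next to their targets."""
    return ["git", "rebase", "--interactive", "--autosquash", base]


def autosquash(repo_dir, base):
    """Run the autosquash rebase non-interactively (assumption: `true` exists on PATH)."""
    env = dict(os.environ, GIT_SEQUENCE_EDITOR="true")
    subprocess.run(autosquash_cmd(base), cwd=repo_dir, env=env, check=True)
```

This keeps each CI fix attached to the commit that caused the failure, which helps when the goal is a history of commits that each pass the tests.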