A recurring conversation in developer circles is whether you should use `git merge --squash` when merging or create explicit merge commits. The short answer: you sh...
I assume you are referring to the "Squash and merge" option on GitHub? If so, yes I 100% agree with you.
On the other hand, if you mean devs should not squash and rebase before pushing a PR, then I disagree.
PS. Some of the formatting in your post is off :) Around the last example with git
By squash, do you mean collapsing all commits into a single one? Because I think that's wrong :)
I do think spending some time in `git rebase --interactive` (or magit, in my case) makes a lot of sense, however.

Yeah, totally agree!
No, that's not what I mean. I wasn't sure whether that was what you meant. I guess we're on the same page :D
It's not that bad, provided you keep a reference to the PR number in the commit message. Luckily, GitHub includes the PR number in the commit message when merging automatically, and in the GitHub history it even creates a link directly to the PR. That way you don't lose the history of the PR itself, should anyone really need it.
It worked well with one team I was involved in.
Teaching good commit practices and using git to its full potential is doable when the majority of the team is already good with it and only some developers need help. If the whole team struggles with it, it's not so easy, and the option to squash merge saves a lot of time.
It also helps get rid of the nasty merge commits that come from merging the main branch into the feature branch, if GitHub is set up to require the feature branch to have the latest changes from the main branch (which it should). Rebasing would be preferable, but it's less comfortable, because it requires a new approval from your team if your protected branch rules require an approval before merging (which they should).
If squashing means losing too much information, then your PRs are probably too big to begin with. IMHO it's a code (or process) smell that should be brought to attention ASAP.
As for looking at what the developer did in their branch, I tend to think the PR should speak for itself. How we got there is not important. Unless you're also prepared to spend a lot of time cleaning up your branches before sending PRs. (Time you could possibly spend making many small PRs instead.)
Nice trick for the CLI tools with "first parent"! I was not aware it even existed. Unfortunately it's not available in most graphical tools that I'm aware of, so those users will be stuck with the "ugly" history.
I used to insist devs squashed/rebased/etc. their commits before opening a PR and then use rebase-merge to merge the PR into main.
Over time I've learned the value of a squash merge. If a PR is too big to be able to describe in one commit message, or too complicated to understand from looking at the diff, then you're doing too much in one go.
Squash merging PRs is absolutely fine if your branch has nothing but work-in-progress commits. If you feel like you're losing something by squashing, then you need to rethink your process...
So you are saying to only open pull requests that are the size of a single commit?
As a (very) general rule, yes. You should be able to understand a change based on a single commit message.
Obviously, sometimes you have a big feature that can't be released piece by piece. In that case I would have a feature branch, and then individual branches off that. You PR each smaller piece of work (with squash commits) into the feature branch, and then at the end merge (not squash) the feature branch in. You have a history of all the pieces of work done, but not all the useless WIP commits that don't actually tell any kind of story...
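The nested-branch workflow described above can be sketched as a script. This is a minimal illustration, not a prescribed setup: it assumes a throwaway repository, POSIX sh, and hypothetical branch names (`big-feature`, `piece-api`, `piece-ui`). Each piece is squash-merged into the feature branch, and the feature branch itself gets a real merge commit into `main`.

```sh
set -eu
# Throwaway repository; names, files and identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/main
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

# Long-lived branch for the big feature that can't ship piece by piece.
git checkout -q -b big-feature

for piece in api ui; do
  # Each piece gets its own short-lived branch full of WIP commits...
  git checkout -q -b "piece-$piece" big-feature
  echo wip > "$piece.txt" && git add "$piece.txt" && git commit -q -m "wip"
  echo done > "$piece.txt" && git commit -q -a -m "wip again"
  # ...and is squash-merged into the feature branch as a single commit.
  git checkout -q big-feature
  git merge -q --squash "piece-$piece"
  git commit -q -m "Add $piece piece"
done

# At the end, the feature branch gets a real (non-squash) merge into main.
git checkout -q main
git merge -q --no-ff -m "Merge big-feature" big-feature
git log --oneline --graph main
```

The WIP commits never reach `main`: its first-parent history is just the merge and the initial commit, and the merged side holds one commit per piece.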
That makes sense. I see a pattern emerging here. I think that usually, when we "argue", we often are actually solving the same problem, often in the same way, but with different words.
I operate under the premise that your branch history is meaningful and has relevant commits. If you do a ton of WIP commits, I would question why you would do a "WIP" commit in the first place, because squash merge or not, you are robbing yourself of helpful history while developing your feature. I also heavily use interactive staging (staging individual hunks), both for "pseudo review" and to split up my work into proper chunks, with git commit hooks validating at every step of the way that my tests run. If I still manage to make a mess (say, I'm tired, or in a rush, or just frustrated), I will often spend the time to go back with interactive rebase and eliminate the junk, either with squash / revert+squash or plain delete, and more rarely split up bigger commits into smaller ones.
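Cleaning up with interactive rebase can also run without an editor. The sketch below swaps in git's `--fixup` / `--autosquash` pair as one way to fold a junk follow-up commit into the commit it fixes; it assumes a scratch repository, POSIX sh, and hypothetical file and commit names. Setting `GIT_SEQUENCE_EDITOR=:` simply accepts the todo list that `--autosquash` has already rearranged.

```sh
set -eu
# Throwaway repository; names, files and identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/main
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"
git checkout -q -b feature

# A meaningful commit, then a junk follow-up recorded as "fixup! Add feature".
echo 'feature v1' > feature.txt && git add feature.txt && git commit -q -m "Add feature"
echo 'feature v2' > feature.txt && git commit -q -a --fixup=HEAD

# Interactive rebase with the todo list accepted as-is: --autosquash has
# already placed the fixup right after its target and marked it "fixup".
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash main

count=$(git rev-list --count main..feature)
git log --oneline main..feature
```

The branch ends up with a single "Add feature" commit containing the final content.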
What you describe in your workflow above is basically what I achieve by keeping side-branch history. As someone who has often had to merge dirty crap branches, I do like to keep the WIP commits anyway, because they give me insight into what someone was trying to do, what their cognitive style is, and what they were struggling with, so I can assist them better.
But let's let things speak. I recently merged a "big" commit, setting out to build a feature that led me to start introducing TypeScript annotations. We are a fairly fast-moving two-dev team and reasonably trust each other, so the other dev was fine with keeping both the TypeScript introduction and the actual feature in the same PR. Here's my history in this case:
Those are all valuable commits to me that I would like to keep for the long run. Maybe I'll figure out that the reason a certain DB query doesn't work anymore is not say, the API change, but actually the "Fix php-cs-fixer" commit. Of course I could make a separate PR for "fix php-cs-fixer", and then again for "Make linkbutton clickable", and then again for "Remove logging entries", but then we end up where we started, except with a lot more PRs and CI runs.
You'll note I use gitmoji, which I also find very useful, as I can at a glance recognize what the reason is behind commits. To show the graphical view:
I had a long conversation about that with other developers, it was very interesting, and I plan to write about "big evil merges" in the future.
Situations where big branch merges might happen (for valid reasons, imo):
In general, I'm not a fan of "in a perfect world you wouldn't need more information" arguments.
In my experience, even small clean PRs can benefit from having a granular history, say when git blaming something 3 years down the road.
As for UI tools, I use magit / sourcetree / intellij's history browser, I'm sorry if other tools don't support it :/
I wish all tools supported --first-parent. The reason people give is (validly, because tool friction matters) "my tool doesn't know how to display the information I want, so I have to lose context for it". Arguing that "merges make the history sloppy" is just a cop-out; it's just not true. I think one reason for that is that many developers don't know how git works internally, and thus have a warped understanding of what the history is. Git's CLI tooling really doesn't help here.
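A small sketch of what `--first-parent` hides, assuming a scratch repository, POSIX sh, and hypothetical branch and file names: the side-branch commits stay in the repository, but the first-parent log only walks the main line.

```sh
set -eu
# Throwaway repository; names, files and identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/main
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

# A side branch with three granular commits, merged with a merge commit.
git checkout -q -b feature
for step in 1 2 3; do
  echo "step $step" >> feature.txt
  git add feature.txt && git commit -q -m "Feature step $step"
done
git checkout -q main
git merge -q --no-ff -m "Merge feature" feature

echo "--- full history"
git log --oneline --graph
echo "--- first-parent history"
git log --oneline --first-parent

full=$(git rev-list --count HEAD)
mainline=$(git rev-list --count --first-parent HEAD)
```

The full log shows five commits; the first-parent view shows only the merge and the initial commit.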
100% agreed. A "clean history" does not mean that every PR should have the commit history of the work process wiped out.
It's also really harmful to code reviews, where seeing the differences between revisions of the pull request is impossible with Git and nearly impossible with most git servers. GitLab supports this FWIW; I still don't think GitHub does.
There are lots of things that benefit from having separate commits, like a rename and then a diff, or a refactor inside of a PR. PLEASE DO NOT MERGE A REFACTOR WITH A FUNCTIONAL CHANGE. I don't want to see it in the same PR, let alone in the same commit.
No one who says you should squash has ever had to do a difficult task involving git history. If you had to find a bug with git bisect, do subdirectory migrations, find when a critical feature changed, or track down a refactor gone wrong, you would know. I'll die on that hill.
That isn't to say you shouldn't clean up your commits before you open your PR. The only thing I don't want to see is "updates from pull request review", but anyone bringing squashing to the table as a fix for commit messages is barking up the wrong tree. Fix your commit messages; squashing isn't a solution to that problem, it's a patch.
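To make the bisect point concrete, here is a minimal sketch assuming a scratch repository, POSIX sh, and a hypothetical `status.txt` check standing in for a real test: with small commits, `git bisect run` lands on the one commit that introduced the problem.

```sh
set -eu
# Throwaway repository; names, files and identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/main
echo fine > status.txt && git add status.txt && git commit -q -m "Initial commit"
git tag known-good

# Small, granular commits; only one of them breaks status.txt.
echo a > a.txt && git add a.txt && git commit -q -m "Rename helper"
echo broken > status.txt && git commit -q -a -m "Change status handling"
echo c > c.txt && git add c.txt && git commit -q -m "Update docs"

# The stand-in test: exit 0 means good, exit 1 means bad.
git bisect start HEAD known-good >/dev/null
git bisect run sh -c 'grep -qx fine status.txt' >/dev/null
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "First bad commit: $culprit"
```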
This is a good point.
Which makes me think: how can you give someone who hasn't experienced larger git pains the context in which some of those decisions are made? Or, more generally, workflow/code hygiene steps that might seem like red tape until you've experienced some of the nightmares that can ensue.
It's a paradoxical kind of thing, because if these systems are in place, by definition you will not encounter the reason why these systems were put in place (similar to any kind of preventative measure that works). I ran into really heated discussions where my point of view was basically "it caused me much pain in the past, trust me, we should do X", which is not a great argument.
Honestly it goes both ways. If they are allowed to say "I like it better this way", then you are allowed to say "have my wisdom rather than learning through the experience of failure". "It looks prettier" is less of an argument. This is where:
I use that in those circumstances, and it has yet to come back to me. In most cases, years later I have these same engineers coming and telling me "OMG remember this problem, I was totally wrong, and I tried to convey that same point to others, but they also didn't get it" Followed up by: "What am I supposed to do with these 'seniors'?"
Hi Manuel, I'm Manuel.
you got me with this:
Above all considering that my opinion is the correct one. Kidding.
Nice post, I just disagree, squashing is fine. But I can see your points. In the end it's a tool and sometimes will be handy sometimes won't.
Do you use squashing because you want to have a "clean" history per default? Or do you have other reasons?
IMO too much information leads to disinformation. Checking the actual "WIP" commits from a feature branch is a kind of fine-grained info I've never, ever needed.
Cleaner history is, yep, the main reason.
But actually I've faced another issue in the past: there was this 15+-year-old repo in my company, with hundreds of committers over the years and commits on the order of n * 100,000. Dealing with this repo was a real challenge! Too much useless info in the `.git/` folder. What I'm trying to say is that fine-grained info does weigh something. Of course, you need to reach those numbers.

A lot of people bring up "WIP" commits. Do you often make WIP commits? I personally rarely do (I do get frustrated and use the 💩 emoji, as I use gitmoji, but I still make meaningful commits). But the point of the article was that you can easily hide all that information and focus on what you need.
As for historical repos, I wonder if people here have ever done a git "cleanup" where most of the ancient state gets culled and just the last few years are kept. Cruft does indeed accumulate.
Just adding a little perspective, I don't think I've ever worked on a team with anyone who didn't use WIP commits. I work on a small team and there is a lot of context switching that needs to happen and 'finishing' a commit before switching to something else just isn't an option.
Interesting, I have the opposite experience. I use `git stash` in those cases, or do a `git rebase --interactive` to clean things up later.

I don't like the idea of most of the ancient state being culled.
The codebase at my current job has a cutoff from when it was moved to git, and there's even less hope of finding out why something was done for code that predates that than there is for the rest of the codebase.
Maybe I've always worked at the wrong places, but I've never been in a place where I wish there were fewer commits in the history, but a lot of the time I have wished there were more commits (often when trying to review code), so that I had a finer grain insight into why a particular line of code was written, and what else was changed for the same purpose.
I'm 100% with you that losing this information is nothing but a bad thing. I find that even the worst git commits tend to provide the best and most accurate and up-to date documentation of the code; it amazes me that so many people choose to throw that away!
What no one is mentioning is that the squash feature on GitHub PRs preserves the original commits that were squashed. If you REALLY need to go back and examine the granular history of a PR, then you can still do so. On teams I've been on, we strive for PRs that are not over-scoped, where the squashed message tells you exactly what feature / fix was added by those line changes. We also often use Conventional Commits, which help in a big way with release note automation. When I look at a `main` branch history like this:

I see VERY clearly what features and fixes have gone in between versions 1.0.0 and 1.1.0. I also have easy links to the PRs that were squashed to produce those commits if I need to drill down any further. If a feature needs to be reverted, it's a very easy reversion (no extra parameters).
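With squash-merged PRs and Conventional Commits subjects, a rough changelog between two tags can come straight out of `git log`. This is only a sketch: the tags, subjects, PR numbers and the `feat`/`fix` grouping are assumptions for illustration, and dedicated release-notes tools do much more.

```sh
set -eu
# Throwaway repository; names, files and identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/main

# One squash-merged PR = one commit on main (hypothetical subjects).
pr() { echo "$1" >> changes.txt && git add changes.txt && git commit -q -m "$1"; }

pr "feat: first public version"
git tag v1.0.0
pr "feat(search): add fuzzy matching (#12)"
pr "fix(api): handle empty search query (#13)"
pr "chore: bump dependencies (#14)"
git tag v1.1.0

# --first-parent limits this to main-line commits, so the side-branch
# commits of any PRs merged with merge commits are skipped.
subjects=$(git log --first-parent --format=%s v1.0.0..v1.1.0)
echo "Features:"; printf '%s\n' "$subjects" | grep '^feat' || true
echo "Fixes:";    printf '%s\n' "$subjects" | grep '^fix' || true
```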
If you end up with a PR that has a very large scope, there are two things one can/should do:
I'll agree that just having a "linear commit history" shouldn't be the only reason for doing squash commits. But if it simplifies your team's workflow, reduces cognitive load, and makes understanding exactly what is included in a release easier to find then I say it's worth doing.
That said, I feel there is a "best of both worlds" place we could get to if we strived for it.
First, a merge commit is really a squash but with an extra parent link to the branch where the squash originated. I wish this were driven home more. However, the standard commit title for these PR merge commits is always something like these:
Those titles don't help me. It has the PR number, which I have to individually click on and look up, and a branch name, which could easily be too brief, poorly written or even irrelevant.
Nothing in Git, from what I understand, prevents those titles being similar to the squash titles I shared above. So if GitHub produced them you'd suddenly have the clarity you have with squash merges.
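The claim that a merge commit is "a squash with an extra parent link" is easy to check with `git cat-file`. A minimal sketch, assuming a scratch repository and hypothetical branch names: the same feature branch is squash-merged on one branch and merged with `--no-ff` on another; both results record the same tree, and only the number of `parent` lines differs.

```sh
set -eu
# Throwaway repository; names, files and identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/main
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo one > feature.txt && git add feature.txt && git commit -q -m "Step one"
echo two >> feature.txt && git commit -q -a -m "Step two"

# Variant A: squash merge (one parent).
git checkout -q -b squashed main
git merge -q --squash feature
git commit -q -m "Add feature (squash)"

# Variant B: explicit merge commit (two parents).
git checkout -q -b merged main
git merge -q --no-ff -m "Merge feature" feature

# Same snapshot...
git rev-parse 'squashed^{tree}' 'merged^{tree}'
# ...but a different number of parent links.
git cat-file -p squashed | grep '^parent '
git cat-file -p merged | grep '^parent '
```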
Second, the Git CLI commands and various Git UIs default to showing/working with the full branch out history of everything. You need to know special parameters or set certain settings in order to see and work with a simplified linear view. If these tools defaulted to a linear history and required special parameters in order to drill down into merged branches I feel that would improve the developer experience a bunch. You get an easy to understand summarized linear history and the ability to go deeper when you need to.
Of course, outside of the GUIs maybe, any change to how people work with Git is a very hard push, from what I understand.
Third, GitHub could allow the PR author to set the merge strategy to be used in advance. Each developer may have their own style: some make very intentional and effortful PR commits, like your own @wesen, others commit WIP things quickly and often, and some do a mix. This gives the author the ability to decide how they will ultimately formulate their PR. Obviously certain projects can still limit what PR merge strategies are available, and admins could still override the author's preset wishes. But the author at least has a chance to influence how the commits in the PR will be laid into the base branch.
But just to wrap up my argument, I think like any other tool squashing vs merging vs rebasing are options that teams can consider and make a decision on using given whatever their needs and circumstances are. There is no one size fits all approach to it.
Tell that to GitHub, they seem to have made squash commits the new default. Not possible to make merge commits in their web UI anymore, and that sucks!
I think you can with an option?
It used to be possible, but in practice it is always grayed out, and this does not seem to be the result of a conscious decision by the project maintainers.
I think it has to be enabled in the repo settings. But now that `git bisect` and `git blame` both support `--first-parent`, I really don't see the point anymore. Maybe it saves some space because the intermediary blobs are garbage collected, but that only really makes sense on big public repositories like Linux. And even then, people maintain separate repositories that still keep the individual history.

My understanding is that GitHub also has some kind of hidden tag on the PR branch, so you can still view it on GitHub after it is squashed, so it presumably doesn't save any space for them.
Yep. Not squashing your commits is like submitting your rough draft to an editor for publishing. It's unnecessary verbosity.
I'm not sure I understand. If you don't want to see the individual commits, you don't have to look at them?
In the past I've argued against squashing commits from the POV of making bisects easier later. That said, if your bisect is broken by intermediate buggy commits, that could throw you off too.
Probably my main reason to dislike squashing commits is that I like to review bigger PRs by going commit by commit, so I can better grasp the author's original intention and thought process as they evolved it.
I don't buy the idea of "tidying up" previous commits, and I'm not sure who would really benefit from that. Seeing misconceptions and things that didn't work out, and what you did to arrive at the current solution, is valuable for my understanding.
I'm so glad that someone mentioned this, because I was thinking the exact same, and didn't understand this bizarro upside-down world, where people seem to think pull requests were easier to review if all the commits were squashed beforehand.
Yes, it's good to do a bit of tidy up with an interactive rebase first, if you have too many WIP commits, or "oops, missed this bit" commits, but I'd generally prefer more rather than fewer commits.
Makes total sense. I "tidy" up commits because I often do partial commits (staging individual hunks) in the moment, and then I go back to make sure everything builds properly at intermediate steps. I agree that bisecting on non-building side branches is a serious pain. There are ways to address it, but nonetheless, it's not the best experience.
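One way to check that every intermediate commit still builds is to walk the branch commit by commit. This sketch assumes a scratch repository and a hypothetical `build.sh` standing in for the real build/test command; `git rebase --exec` is an alternative that runs a command after each commit while rebasing.

```sh
set -eu
# Throwaway repository; names, files and identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/main
echo 'echo ok' > build.sh && git add build.sh && git commit -q -m "Initial commit"

git checkout -q -b feature
printf 'echo ok\necho more\n' > build.sh && git commit -q -a -m "Extend build"
echo 'exit 1' > build.sh && git commit -q -a -m "WIP: half-done change"
printf 'echo ok\necho done\n' > build.sh && git commit -q -a -m "Finish change"

# Check out each commit of the branch, oldest first, run the stand-in
# build against it, and count the commits that fail.
failures=0
for c in $(git rev-list --reverse main..feature); do
  git checkout -q "$c"
  if sh build.sh >/dev/null 2>&1; then
    echo "ok      $(git log -1 --format=%s)"
  else
    echo "BROKEN  $(git log -1 --format=%s)"
    failures=$((failures + 1))
  fi
done
git checkout -q feature
echo "$failures broken commit(s)"
```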
Yeah, me too. This is one of (if not the most) important practices you should do to make the PR review process quick. A quick PR review process in turn speeds up the shipping of code => business moves faster => greater chances of success for the company.
Always make your commits nice before asking for a PR review!
Do you look at individual commits when doing a review? Because the diff view shown is just the comparison of the trees; the history itself is irrelevant.
I do look at individual commits of the PR, yeah. Sometimes it makes sense to split up a task into multiple commits, or include refactor work, and that work should not be combined IMO.
I agree. I'm not the greatest at this (often solo dev on things), but it's a good skill to know.
I'm using GitLab at work and we are using GitLab's Squash-and-Merge option. I don't know how that compares to the way GitHub does it. But I also don't know how Squash-and-Merge compares to manually squashing, pushing and then opening a PR.
We, as a team, are using Squash-and-Merge because then one single feature maps to one single commit. But I suppose the same is possible with unsquashed merge commits.
Only one way to know! Look at the graph, and use `git cat-file` to look into the internals.

How do you mean you can't not look at it? In which context? I usually use `git log --first-parent` when doing release work, and only look at side histories when I need to get in there.

I really liked the way you explained how squash works. But it seems that the main argument against squashing is that it just drops the history?
That's fair, but it doesn't mean squash is bad; it means one just needs to know the cost, correct?
Yes. But most people argue that it "cleans up" history, because they are unaware that you can easily hide the right parent when printing out logs, for example. I find losing history a very high cost to avoid using a pretty-print flag.
That's a very good argument. The other perspective to consider is that more and more people don't use git from the command line, so they see whatever their git tool shows, which may not be able to do that kind of pretty-printing in the first place.
A number of commenters have made a distinction between squashing before the PR is submitted and after it is submitted. I contend it's an irrelevant distinction.
If you are in the squash-before-but-not-after crowd, I counter that once a PR starts being reviewed, updated, and re-reviewed, you're going to get a whole list of commits that you'll eventually want to squash anyway.
My criterion for squashing is this: for each particular commit, if you cannot roll back that commit and have a working, functional system, then there is no point in having that commit in your history; squash it out.
Now, if you think there is value in the various conversations surrounding those commits, then keep them around, off the `main` branch, like this:

`main` branch directly...), naming the copy it `archive/branchname`
`branchbname`
`archive/branchname` in the PR's comments.

As with all blanket statements, you're wrong :)
I do agree, with a caveat: as your contributors' maturity increases, merges become the dominant option.
IF a PR is composed of well-formed, meaningful commits, you should probably merge.
If a PR is composed of point-in-time or fix-fix-fix commits, and/or the individual commits don't build and pass tests, then you should squash them.
The point you're missing is that individual commits within a branch might be data, or they might be noise.
Agree, I ask people to put some effort into their own git histories. It not only provides better information for the rest of the team, but also helps structure your own development workflow and helps you debug / bisect your own stuff.
That said, I like having a bunch of fix fix wtf wip wip commits still, because it still provides insight into what people struggled with, what their thinking process was, and gives a starting point when looking for some hairy stuff.
One reason I'm for the squash merge camp hasn't been mentioned, so I think I should mention it here so @wesen can correct me.
We use squash merge to make it easy to revert a whole PR since it's just a commit.
It can be reverted after some time has passed easily by just reverting that commit.
What do you recommend for this so I can leave the wrong squash merge camp and follow the righteous path oh great @wesen .
You can do exactly the same for a merge commit by using `git revert -m1`. The squash merge commit and the merge commit both point to the same tree hash; they only differ in their parent commits. With a squash merge, you only have one parent, so `git revert` knows "OK, well, you just want to revert to the parent". With the merge commit, you have two, so you have to tell it "please use the left parent (aka the parent on the main branch) to revert to". Easy peasy!

I don't like Squash: if you are working with good commits, they should be few per PR, and if you have something like 10 commits on a feature, it probably should be separated into smaller tasks.
If the developer wants to aggregate two or more commits, like if they refactored some part of the code, BEFORE MAKING THE PR, then I totally like Squash.
If you have small commits with good names, it probably is better to just Merge than Squash and Merge, e.g.
Commit 1: Add DatePicker component to App
Commit 2: Make API call for Date Service on UserScreen
Commit 3: Make API call for Date Service on PostScreen
If something is breaking, it will probably be easier to see which commit caused it, and it also allows for better cherry-picking.
Oh good. I was going to say... half of my commits are 'saving crap', 'working.. I think'. Not sure what value those have.
Merging branches is different.
It feels so good to read a tech blog not written by a junior enthusiast with a degree trying to skip that 5 years learning period.
Thanks mate.
I despise git for always going against my way, and I am looking at alternative VCS workflows. The whole fact that there is room for arguing is so toxic.
We occasionally need to revert a feature from a release branch if it fails UAT. While this isn't too common, it's much easier to do if the feature is squashed into a single commit (pre-push). I have yet to see much of a downside.
You can just use `git revert -m1` to revert a merge commit to the first parent (aka what running `git revert` on the squashed commit would do). `-m1` says "revert to the first parent", aka the one that a git squash preserves.

`git revert` really just resets the checked-out tree to a specific tree hash (at least when the commit being reverted is the most recent one) and prepares a pretty commit message. It doesn't really have much to do with the history itself. You could pretty much get the same result by doing (haven't fully tried this out, just a sketch):
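A minimal sketch comparing `git revert -m 1` on a merge commit with a hand-built commit that simply reuses the first parent's tree via the plumbing command `git commit-tree`. It assumes a scratch repository, hypothetical file and branch names, and that the merge being reverted is the most recent commit (only then is the revert result exactly the first parent's tree).

```sh
set -eu
# Throwaway repository; names, files and identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/main
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add feature file"
echo changed > app.txt && git commit -q -a -m "Change app"

git checkout -q main
git merge -q --no-ff -m "Merge feature" feature
merge=$(git rev-parse HEAD)

# Revert the merge relative to parent 1, the main-line side.
git revert --no-edit -m 1 "$merge" >/dev/null

# The same idea by hand: a commit on top of the merge whose tree is
# simply the first parent's tree (not attached to any branch).
manual=$(git commit-tree -p "$merge" -m "Manual revert sketch" "$merge^1^{tree}")

# All three tree hashes should be identical.
git rev-parse 'HEAD^{tree}' "$manual^{tree}" "$merge^1^{tree}"
```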
How easy is it to do git reverts in unsquashed commit histories?
Just as easy as a revert of a squashed commit. Pass in `git revert -m1 commit`; it will then use the "filesystem state" of the left parent node as the revert result, just as it would if you had a single squash commit.

Cool. I think most people don't use merge + squash, and with the `--first-parent` option it doesn't really matter now whether they do or not.

Awesome article; right from the beginning it was obvious you know what you are doing. Keep it up!
If you have a good reason to squash commit, please post it here. But I don't think you have.
I have some pretty good reasons to squash my commits when working with a team:
There are a couple of tricks to have an easier time rebasing:
I always think "tree state" first, more so than caring about individual commits. I can always link up the graph by manually putting in a parent link when merging, if I do want to show what happened in the history.
Realizing that git only cares about file contents, not diffs or commit or patches, really freed up how I can navigate "complicated" issues.
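Putting in a parent link by hand can be done with the plumbing command `git commit-tree`, which takes a tree and any number of `-p` parents. In this sketch (scratch repository, hypothetical names), a squash-merged snapshot is re-created as a two-parent commit that links the feature branch back into the graph; the new commit is only printed, not moved onto `main`.

```sh
set -eu
# Throwaway repository; names, files and identity are hypothetical.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" && git init -q && git symbolic-ref HEAD refs/heads/main
echo base > app.txt && git add app.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo one > feature.txt && git add feature.txt && git commit -q -m "Step one"
echo two >> feature.txt && git commit -q -a -m "Step two"

# Squash merge: main gets the content, but no link to the feature history.
git checkout -q main
git merge -q --squash feature
git commit -q -m "Add feature (squashed)"

# Same tree, but with the feature branch added as a second parent.
linked=$(git commit-tree -p HEAD~1 -p feature -m "Add feature (linked)" 'HEAD^{tree}')
git cat-file -p "$linked"
```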
For the last two points you raise, my approach is to use `--first-parent` or similar flags to look only at the part of the history I care about (usually one commit per ticket on the main branch) and link it up to product features (the tickets themselves). No need to squash.

I think you assume that people use Git to a high standard and always commit with intent... Oftentimes in practice, a PR/MR is filled with commits like `wip`, `wtf is this?`, `fix`, `wip again`. I highly encourage people to Squash & Merge those PRs ;d If you are being diligent and maintain a clean git history, or clean up your commits like a good lil developer, by all means, rebase those commits onto main ;d
Is this because you think they don't have the time/skills to clean up their own history, and thus delegate it to merge time?
Haven't wanted to press the matter when other things seem more important. Easier to add husky and semantic commit linting with pre-commit hooks imo ;d
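Husky and commit linters are Node tooling; as a rough stand-in, the sketch below is a plain git `commit-msg` hook (message linting runs in `commit-msg`, which receives the message file path, rather than in `pre-commit`). The accepted types and the regex are assumptions covering only a small subset of Conventional Commits. To use it for real, it would be saved as `.git/hooks/commit-msg` and made executable.

```sh
set -eu
# Hypothetical commit-msg hook checking a subset of Conventional Commits.
hook=$(mktemp)
cat > "$hook" <<'EOF'
#!/bin/sh
# git passes the path of the commit message file as $1.
if ! head -n 1 "$1" | grep -Eq '^(feat|fix|docs|refactor|test|chore)(\([a-z0-9-]+\))?!?: .+'; then
  echo "commit-msg: expected something like 'fix(api): handle empty query'" >&2
  exit 1
fi
EOF
chmod +x "$hook"

# Try the hook against a couple of sample messages.
msg=$(mktemp)
printf 'fix(api): handle empty query\n' > "$msg"
sh "$hook" "$msg" && echo accepted
printf 'wip again\n' > "$msg"
sh "$hook" "$msg" 2>/dev/null || echo rejected
```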
Well said. I will be using this. 😅
Absolutely true that `merge` is the heart of git and squashing is a kind of perversion. It just so happens that it's exactly the right kind of perverted for some team workflows.

In my company we have a pretty strict rule about squashing your commits on your private branch into a neat, minimal history before you merge your PR. This makes sense for keeping shared history manageable but causes many problems.

If I run `git rebase -i HEAD~2`, I don't get the last 2 commits, I get all the commits that were merged in. That's one of the reasons our company's policy is to `rebase` on trunk, not `merge` it in. You can see how far we're getting here from the promised land of merge-only purity, and toward the greater pain of fixing merge conflicts with rebase.

And there's also a much more important problem. With every PR, the last commit has passed through an extensive CI pipeline of checks. Ensuring that every commit in the PR passes all checks is infeasible. As a result, all those other commits must be assumed to be broken (especially when they're artificial Frankenstein commits created on the fly in interactive rebase). These unsafe commits lie around like hand grenades with loose pins, strewn all over. Release managers must be extremely careful to avoid them when trying to assemble a good release. As a rule of thumb, that means ignoring everything except merge commits, but what if there's a fast-forward merge? We could work around that, but why should a large team of variously skilled developers be exposed to a history that is mostly made of unsafe commits?
A solution
Outcome
Drawbacks
`git blame` is less helpful for getting granular detail. If we were using git the way its free software advocates intended, as a complete and self-contained source of truth for codebase history, then this would be a dealbreaker. But that's not how we're using it. We have a self-hosted GitLab, and the PRs with all their attached comments and CI runs are a far richer source of historical information than the git repo alone. When I need to understand why something was done, that's always where I look first. The purism of avoiding vendor lock-in is nice, but I've never seen it amount to much for repo hosting.

Conclusion
I agree with these people that the benefit far outweighs the gotchas in practice.
For any developers who actually code professionally in a big team, this is horrible advice: nobody has time to care about your commit history, it's just noise. People don't spend time investigating in which sub-commit of which branch the original issue happened, blah blah; you are busy finding solutions and moving on to the next task.
Pedantic un-pragmatic advice, ignore this article and start coding, you are not an academic or a historian.
IMHO the main problem when you do NOT squash your work (generally it's a feature or a fix) is that, sometimes or often (it depends on how many devs are working on the same repo and how disciplined they are) you have to fix more merge/rebase conflicts.
Let's say you send a PR with 5 commits, and you have made changes to the same line in 3 of those 5 commits. Then you have to rebase your PR (because other PRs were accepted before yours) and, unluckily, you get a conflict on that line... you will have to fix a merge conflict 3 times, but if you squash your work you only have to fix 1 merge conflict. So all those little refactorings made during the development of the same feature/fix can multiply these problems.
Another aspect is that all those sub-feature/fix commits generally make sense only to the author, not to the other devs.
But what I'm saying depends on some assumptions I've made, for example if you're working alone in a repository these issues will probably never happen.
Anyway, although I don't agree with you, I liked the article for the discussion it generated.
Thank you!
Awesome article.
nice stuff
Awesome post
Thanks
Great post!
Really helpful.
“Squashing commits has no purpose other than losing information.”
Cleaning your house has no purpose other than losing dirt. Pretty neat tho…