2016-06-04

Why I prefer merging over rebasing in git

The argument between rebase-based and merge-based workflows is a lot like the argument between spaces and tabs. There are pros and cons on each side, as well as compromises in or near the middle, but at the end of the day it mainly comes down to personal preference, typically held for unspecified reasons. So I'd like to share my preferences, and my reasons, when it comes to git: the (currently very popular) distributed version control system that stores history as a Directed Acyclic Graph (DAG) of commits and has a Command-Line Interface that many people mock or struggle with. If you don't already know what rebasing and merging are, you should probably just look them up yourself, as I am only going to give a brief overview. If you don't even know what git is, this blog post is not for you, and you can safely bookmark it for later when you do eventually learn programming.

On one hand, you have rebasing: you change a commit's parent to the one you want it to be, and maybe fix any conflicts that result from that change. Whether or not there are conflicts, the hash of the commit changes, because in git, a commit's hash is based on its own content and on all the commits that come before it. Changing any commit changes its hash and all the commit hashes after it. (This has the excellent property that simply having a commit's hash is enough to guarantee that it will always be the same way it was when you first found it.) Thus, rebasing has the effect of destroying history by changing the parents of the commits and sometimes even the content of the commits. It can also result in commits appearing in the DAG out of chronological order. But the result is a DAG which is a nice straight line.
A "clean" DAG, typical of a rebase workflow.
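
To make the hash-changing effect of a rebase concrete, here is a minimal sketch in Python. It is a toy model, not git's real object format: the hypothetical `commit_id` function just hashes the parent ids together with a content string, whereas real git hashes a full commit object (tree, parents, author, committer, message). The point it illustrates is the same, though: give a commit a new parent and its id changes, and so does the id of everything built on top of it.

```python
import hashlib

def commit_id(parents, content):
    """Toy stand-in for a git commit hash: digest of parent ids plus content."""
    payload = "\n".join(list(parents) + [content]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()

# Original history: the feature (a1, a2) branched from base, while main moved on to m1.
base = commit_id([], "initial")
m1 = commit_id([base], "main work")
a1 = commit_id([base], "feature part 1")
a2 = commit_id([a1], "feature part 2")

# "Rebase" the feature onto m1: identical content, different parent.
a1_rebased = commit_id([m1], "feature part 1")
a2_rebased = commit_id([a1_rebased], "feature part 2")

print(a1 != a1_rebased)  # True: the new parent changes the id
print(a2 != a2_rebased)  # True: and so does every descendant's id
```

Note that `a2` was not edited at all, yet it still gets a new id, because its parent's id is part of what gets hashed.
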
On the other hand, you have merging: you create a new commit which has multiple parents. It should be pretty obvious what the implications of that are. A merge commit really only describes how to resolve conflicts. If there are no conflicts, you should consider a merge commit to be effectively empty (even if it isn't actually implemented that way). Note that merging does not modify any other commits - it only creates a new one that explains how to combine others properly. Thus, history is preserved, but at the cost of making the DAG look messy.
A "messy" DAG, typical of a merge-based workflow.
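
By contrast, a merge can be sketched as nothing more than adding one new node with two parents. This is again a toy model with hypothetical helper names (`commit_id`, `add_commit`, `reachable`), assuming a commit id is just a hash of parent ids plus a content string. It shows that the existing commits are left byte-for-byte alone and that both lines of history stay reachable from the merge.

```python
import hashlib

def commit_id(parents, content):
    """Toy stand-in for a git commit hash: digest of parent ids plus content."""
    payload = "\n".join(list(parents) + [content]).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()

def add_commit(graph, parents, content):
    """Store a commit in the graph under its id and return that id."""
    cid = commit_id(parents, content)
    graph[cid] = (tuple(parents), content)
    return cid

def reachable(graph, start):
    """Return the ids of start and all of its ancestors."""
    seen, stack = set(), [start]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(graph[cid][0])
    return seen

graph = {}
base = add_commit(graph, [], "initial")
m1 = add_commit(graph, [base], "main work")
a1 = add_commit(graph, [base], "feature part 1")
snapshot = dict(graph)  # the history as it was before merging

merge = add_commit(graph, [m1, a1], "Merge feature into main")

print(all(graph[cid] == snapshot[cid] for cid in snapshot))  # True: old commits untouched
print(len(graph) - len(snapshot))                            # 1: only the merge was added
print(reachable(graph, merge) == {base, m1, a1, merge})      # True: both branches preserved
```
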
For the most part, the difference is aesthetic: you either want a clean DAG and couldn't care less about history, or you want to preserve history and couldn't care less how the DAG looks. However, I believe there are more compelling reasons to prefer one over the other, and that the aesthetics may actually not be what they seem.

Git supports signing commits with GPG for good reason, and GitHub supports it too. When you rebase commits, you change them, and thus it is impossible to keep the signature. Instead you have to either drop the signature, or create a new signature, and 99% of the time it can only be your signature (unless you want to rebase commits one-by-one and pass the responsibility around to each author in turn). This completely defeats the purpose of signing commits. With merging, not only do all of the parent commits keep their signature, but the merge commit itself can be signed too, which is especially useful when conflicts have to be resolved. This alone would be enough for me to avoid rebasing except in special cases.
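
Here is a rough sketch of why a signature cannot survive a rebase. It is only an illustration under loud assumptions: it uses an HMAC with a made-up secret key as a stand-in for a GPG signature (real GPG signatures are asymmetric), and the hypothetical `payload` function stands in for the commit data being signed. What carries over to git is that the signed data includes the commit's parent, so changing the parent means the old signature no longer matches.

```python
import hashlib
import hmac

def payload(parents, content):
    """Toy stand-in for the commit data that a signature covers."""
    return "\n".join(list(parents) + [content]).encode("utf-8")

def sign(key, parents, content):
    """Stand-in for signing a commit (HMAC here, NOT what GPG actually does)."""
    return hmac.new(key, payload(parents, content), hashlib.sha256).hexdigest()

def verify(key, parents, content, signature):
    """Check a signature against the commit data it is supposed to cover."""
    return hmac.compare_digest(sign(key, parents, content), signature)

alice_key = b"alice-secret"  # hypothetical key, for illustration only

# Alice signs her commit on top of "base".
signature = sign(alice_key, ["base"], "feature part 1")
print(verify(alice_key, ["base"], "feature part 1", signature))  # True

# Someone rebases her commit onto "m1": same content, new parent.
print(verify(alice_key, ["m1"], "feature part 1", signature))    # False
```

Only Alice could produce a valid signature for the rebased commit, which is exactly the "pass the responsibility around to each author" problem.
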

But even if you don't bother with signing commits (even though you should), I still think it is important to preserve history. I'm not worried about little things like commits being out of chronological order. I'm concerned about making changes to resolve rebase conflicts. No matter what version control software you use, there will be conflicts that arise from simultaneous development. (Unless you don't allow simultaneous development, which is stupid.) There are even conflicts which don't manifest at rebase or merge time, and are only evident when trying to compile the code or run the program. (For example, an unused variable is deleted in one branch, and another branch adds code that uses the variable - both independently work, and when rebasing or merging git will not detect any conflicts, but the resulting code will not compile without modification - I typically check for this during the merge and make changes in the merge commit.)
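
The unused-variable example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the file is modeled as two named regions instead of lines, `merge_regions` is a deliberately tiny three-way merge and not git's algorithm, and since Python has no compile step the breakage shows up as a `NameError` at run time instead of a compile error.

```python
def merge_regions(base, ours, theirs):
    """Tiny three-way merge: per region, take whichever side changed it."""
    merged = {}
    for name in base:
        if ours[name] == theirs[name]:
            merged[name] = ours[name]
        elif ours[name] == base[name]:
            merged[name] = theirs[name]
        elif theirs[name] == base[name]:
            merged[name] = ours[name]
        else:
            raise ValueError(f"conflict in region {name!r}")
    return merged

def run_program(regions):
    """Assemble the regions into source code, execute it, and call run()."""
    source = "\n".join(regions[name] for name in ("globals", "body"))
    namespace = {}
    exec(source, namespace)
    return namespace["run"]()

base = {"globals": "unused = 0", "body": "def run():\n    return 1"}
# Branch A deletes the unused variable.
branch_a = {"globals": "", "body": base["body"]}
# Branch B adds code that uses the variable.
branch_b = {"globals": base["globals"], "body": "def run():\n    return 1 + unused"}

print(run_program(branch_a))  # 1
print(run_program(branch_b))  # 1

merged = merge_regions(base, branch_a, branch_b)  # no conflict is reported
try:
    run_program(merged)
except NameError as error:
    print("merged code is broken:", error)
```

Each branch touches a different region, so the merge is textually clean, yet the combined program fails. The fix has to live somewhere, and in a merge-based workflow that somewhere is the merge commit.
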

With rebasing, you have to either modify commits to resolve conflicts (and which commit(s) do you modify?) or you have to create a new commit that resolves the conflicts (which is basically a merge commit with one parent - the wrong parent). This also causes trouble when bisecting the DAG - suddenly you run into code that would have compiled under a merge-based workflow but now doesn't compile because of the rebase (unless you modify each commit during the rebase to ensure it compiles). In either case, you are changing the commit authors' original intentions. They originally intended for their commit to be based on the parent(s) it was already based on, and they probably didn't intend for someone to modify their commit. You could credit additional authors by convention in the commit message (git itself records just one author and one committer per commit), but are you really going to go through all that effort? And how will you know which changes are attributed to which authors? This alone is enough of a headache for me to avoid rebasing except in special circumstances.

With a merge-based workflow, these problems do not exist: you never need to modify a commit, at all. No commit hashes are ever changed. Everything remains exactly as the author originally intended, and conflicts are resolved where they should be: in the merge commit. You have preservation of history as a side-effect.

As for the aesthetics, a rebase workflow only hides the complexities of parallel development. It hides them in a way that makes them difficult to rediscover when you need them. I don't believe you will never need to know how or why a change occurred in the past. A merge-based workflow doesn't try to hide the real complexity, and also makes it dead simple to discover it when needed: just look at the DAG. No detective work required, no bothering with commits not in chronological order, or wondering if all the changes in a commit were really made by the commit's author.

If understanding how and why code changes over time wasn't important, we wouldn't be using version control software like git; we'd be using dumb snapshot backups (aka primitive version control). This is a case where hiding complexity makes it harder to understand what has really happened. Again, this alone is reason enough for me to use a merge-based workflow.

Since there are multiple reasons to use merge-based workflows that I have said are reason enough alone, I think it's pretty obvious I almost never touch rebase. But I encourage you to do the research and make decisions for yourself, rather than just listening to the one guy who preferred one over the other for unspecified reasons or because "it makes the DAG look better". Of course, ninjas already know to do that, but you totally aren't a ninja, so I guess you just forgot.
I had a good reason for this, I promise. But I also wanted to see what it would look like.