Ford-Johnson sorting algorithm, or merge-insertion sort, is not a very well known algorithm, it is optimised for making the least amount of comparisons between elements. The reason on why this algorithm is not well known is that it's neither fast, nor the best minimum sort algorithm, which is why there are almost no good resources on this algorithm, between scientific papers, hard to read computer science books, or obscure threads on Reddit or Stack Overflow.
This algorithm has little to do with merge sort but is often confused with it. In the 42 school it's a common mistake, or sometimes a way to cheat, to implement much simpler hybrid sort between merge sort and binary insertion sort. This article aims to give an understanding of the algorithm using simple explanations and step-by-step execution of the algorithm on a set of 22 numbers.
My name is Eldar, and I'm a student at 42 Lyon. Hence, the article will also have the context of one of the school assignments. I decided to write an article on this algorithm because almost no one does this algorithm correctly and understandably since it is hard, and easily one of the more complex sorting algorithms out there.
Steps of the algorithm
I recommend to read first Wikipedia's description of the algorithm, as well as Knuth's. Both of those sources describe the steps of the algorithm a little bit differently. I intend to give my own explanation as well, as I believe that rewording helps to develop understanding; I will also later explain each step individually in depth. Hence, my explanation of the steps for sorting numbers using this algorithm in ascending order:
- Recursively divide into pairs of numbers, pairs of pairs of numbers, pairs of pairs of pairs of numbers... etc, and sort them by the biggest number, until we can't form any pair anymore. If there is an unpaired odd element, leave it be.
- I will also refer to the units on which we operate on as
elements
.elements
can be numbers, pairs of numbers, pairs of pairs of numbers etc... - We will refer to the smallest element in each pair as
b
, and to the biggest asa
. Depending on the index of the pair, we will call themb1
,a1
,b2
,a2
...bx
,ax
.
- I will also refer to the units on which we operate on as
- Create a sequence (also referred to as
S
orthe main chain
in Wikipedia and Knuth's book respectively) out of the smallest element of the smallest pair (b1
) and the rest ofa
s. If the first step was done correctly, this sequence will be sorted. Create the sequence that consists of the rest ofb
s (also referred to asthe pend
).- I will refer to those sequences as
the main
andthe pend
from now on.
- I will refer to those sequences as
- Binary insert the elements from the pend into the main, in the order based on Jacobsthal numbers. I will explain later why and how it works.
- If we can't insert elements into
the main
using Jacobsthal numbers anymore, we insert them normally, in order, using good ol' binary insertion. - Insert the odd element that didn't fit into any pair.
If everything is done correctly, the sequence will be sorted. Of course, I omitted some of the details, for the sake of giving the general description on what we will be doing. If the algorithm is still not clear, it's normal, the algorithm isn't easy to understand. After you see the visualisation of the algorithm using the established nomenclature, you will have a much better idea.
Step 1: the division into the pairs & sorting
This step is pretty confusing to understand, and also tricky to implement in code, but good visualisation helps. The difficulty is that with each recursion call we operate on bigger and bigger groups of elements: pairs of numbers on the first call, pairs of pairs of numbers on the second call, pairs of pairs of pairs of numbers on the third etc, until we can no longer divide our original sequence into a single pair of elements.
On top of this, we also need to preserve the original pairing: we can't break pairs formed on the first recursion call on the fifth one. This is important for this algorithm, as it relies on certain guarantees about the elements: that the biggest element of b2
is smaller than the biggest element of a2
, b3
is smaller than a3
, b4
is smaller than a4
, bx
is smaller than ax
. This is important for the algorithm's nature of a minimum comparison sort. The easiest way to achieve this is to simply swap entire groupes of numbers with each other when we sort them. It'll be much clearer in a minute.
For the visualisation I'll denote at the top, the recursion depth we are currently it. Because the algorithm requires recursive sorting of pairs before the insertion part, I can demonstrate this process in this section.
This level of recursion depth is easy to wrap head around: here we operate just on numbers. If the number in the pair is bigger than the other one, we simply swap them in place. After sorting the pairs, I put the b
and a
labels that I described earlier underneath each element in the new sequence: now it should be clear how the labels work.
This is where it starts to get funky: now our element is a pair of numbers instead of just a number. Also, we have an odd element that is left without its pair. We name it b6
.
On top of the white borders denoting pairs, here there are red borders for the pairs of pairs. Notice also that I placed the labels against the last number of each element: this is no coincedence, the labels reside under the number that we actually use for comparison/pair sorting: 11
against 17
, 16
against 15
etc, but the swapping happens on an entire subsequence of numbers. Green dotted border denotes an odd pair. Here, only one swap occures: of 8 16
and of 6 15
.
Now we have a blue border to denote pairs of pairs of pairs... It's already mouthful to say. But we must go deeper.
Notice that there are numbers without any borders: I chose this way to separate numbers who can't even form an element. At this level, the smallest unit on which we operate consists out of 4 numbers, and the pairs are formed out of 8 numbers. What we are gonna do with those numbers? Simple: we ignore them.
At this level, two swaps occured: 2 11 0 17
and 6 15 8 16
, 3 10 1 21
and 9 18 14 19
.
Now we are at the level of pairs of pairs of pairs of pairs. We have 22 elements, but the element here is formed out of 8 numbers: hence only one pair of two elements.
17
is lesser than 21
. Lucky us! No swaps happened at this level.
There is nothing to be done at the level 5 of recursion: because the amount of numbers in the element doubles with each recursion call, at one point we won't be able to form a pair of elements. This is where we break from this level. Because at level 5 one element consists of 16 numbers, and we have only 22 numbers overall, pair of 2 elements is impossible to form. From now on, we proceed with the steps starting with the level 4, then we go to level 3, 2, 1... until we got our sorted number sequence.
To really understand what's going on at the sorting of pairs level, I invite you to come up with a random sequence of numbers and try to sort them on paper. After doing it a couple of times, you will probably have some ideas on how to implement this step in code.
Steps 2, 3, 4 and 5: the insertion
This is another very confusing step for a lot of people. This is where binary insertion of our merge-insertion algorithm comes in. What do I mean by binary insertion? It means that we use binary search algorithm on our main
, which is a sorted sequence, to find a place for insertion for our pend
elements. Insertion itself works very similarly to how it's done in the binary insertion sort.
Here is where the labels we defined earlier will come in handy.
Step 2 is easy:
The main
is initialised with the elements {b1, a1}
and then with the rest of a
s.
The pend
is initialised with the rest of b
s starting from b2
.
The main
at this point should be sorted, but not entirely. Like with comparing and swapping at the step 1, only the biggest number in each element counts for "being sorted". Let's demonstrate this with the example we already have. This is where we're at after continuing on the level 4 of recursion:
What counts as "sorted" are the biggest numbers in each element in b1
and a1
, 17
and 21
respectively. Reminder, I will place the labels under the biggest numbers of each element to make it clear which numbers actually participate in comparisons.
At this level we don't have any pend
elements, since we don't have b
s and a
s besides b1
and a1
, which both start as a part of the main
. By the way, I remind you: b1
is a part of the main
because we know that a
s are sorted, and b1
is smaller than a1
; so we don't need to do any comparisons, because we already know that b1
is smaller than any of the a
s. That is one of the optimisations of this algorithm to minimize comparisons.
No pend
- no insertions. Move on.
We are done with level 4: nothing else to do here. And here comes the ignored numbers 5 12 4 20 7 13
from before, which can't even form an element. They are still the part of the sequence, they just didn't participate in the insertion.
At this level our element consists of 4 numbers. From the start, we divide elements into b
s and a
s. We initialize the pend
and the main
as I said before: the main
is b1
and the rest of a
s, the pend
is the rest of b
s. Notice that b
s and a
s are changed. We change the labels at each recursion level, and also at the start of insertion. Notice, however, that the actual number pairings remained unbroken.
At this level we actually have one element in the pend
: however, it is not enough elements for insertion in the order of Jacobsthal numbers. Hence, the step 3 is skipped: we go to step 4 (ordered insertion without Jacobsthal numbers order). We still, however, should insert the numbers optimally for minimal comparisons.
From now on, I will mark the current element for insertion with a blue border, the elements we should try to insert our pend
element in with green, and the bound element, corresponding a
of the current b
in the pend
is in red. Bound marks the end of the area of search. This means that we should insert b2
into {b1, a1}
. In other words, we should compare 19
, the biggest number in the element, to {16, 17}
, the biggest numbers in the elements of our search area, and then insert the whole element after that in it's corresponding place.
Since 19
is bigger than 16
or 17
, we insert the entire element b2
(9 18 14 19
) after them.
And now, we need to insert the odd element into the main
. We don't have any guarantees about the odd element, hence the whole main
is our area of search: we compare 20
against {16, 17, 19, 21}
using binary search.
We insert b3
(20
) between b2
(19
) and a2
(21
), with all its numbers.
So, at the moment, our sequence is this: 6 15 8 16 2 11 0 17 9 18 14 19 5 12 4 20 3 10 1 21 7 13
. When we go down a recursion level, the main
and the pend
initialisation, as well as insertion, will go from this sequence.
We divide the resulting sequence into the elements again, and initialize our main and pend. The sequence has changed since the recursion level of 3, and the labels too: but the original pairs (like ((2 11) (0 17))
) remained unbroken. So the guarantee that the (2, 11)
(now b2
) element is smaller than (0, 17)
(now a2
) is still valid.
This is also where the Jacobsthal numbers come with this algorithm! Before we proceed further, let's spend a little bit of time to understand what's going on here. Why Jacobsthal numbers? Why bother with them, what do they do with sorting?
Why Jacobsthal numbers?
Ford-Johnson algorithm uses an important binary search algorithm optimisation that further cuts the number of times we compare our elements to each other. The maximal number of comparisons for the binary search is the same for 2^k
as for 2^(k+1)-1
. That means that the maximal number of comparisons is the same for 4(2^2
) and 7(2^3-1
). Same thing for 8 and 15, for 16 and 31... etc. This algorithm uses this property of the binary search during the insertion phase, if it can. What do I mean by "if it can"? Well, as you remember, we didn't used any Jacobsthal numbers during unrolling of our level 4 and 3 recursions. Wait, but how are those damn numbers related to this optimization? Well, turns out, that if you insert the pend
elements into the main
using a specific order dictated by the Jacobsthal numbers and we respect the bound elements, the surface area for insertions is 2^(k+1)-1
for (nearly) all elements, as opposed to 2^k
for the first element and God knows what for next elements, since we would naively insert numbers as in regular binary insertion (like we did during the insertion at the recursion level 3).
The way Jacobsthal numbers dictate the order of insertion is like this: we start from the Jacobsthal number of 3. We start insertion from element b3
. We insert elements in the reverse order starting from this element, until we hit b
element of number of previous Jacobsthal number. In other words, the amount of inserted elements is current_jacobsthal - previous_jacobsthal
.
For the Jacobsthal number of 3, we insert 2 elements (3 - 1), b3, b2
.
For the Jacobsthal number of 5, we insert 2 elements (5 - 3), b5, b4
.
For the Jacobsthal number of 11, we insert 6 elements (11 - 5), b11, b10, b9, b8, b7, b6
.
I hope that you got the idea. And that you understand now that we can't always insert numbers this way. If there's not enough elemets to insert, for example, the Jacobsthal number is 11 (we should insert 6 elements), but we have only 3 elements in the pend
, then we need to handle it somehow, and the simplest way to do this is to insert the pend
elements in order, as we did in recursion level 3.
Ok, but now again, how exactly this order optimises the amount of comparisons for our binary insertion? Let's look at what happens if we insert pend elements in this specific order! Time for another visual example! Yay!
The elements are ordered by labels in these exapmples, but the actual order with actual numbers might be more mixed, so it's like this in the example for clarity.
As you can see, the order dictated by the Jacobsthal numbers is what allows us to use this optimisation in the algorithm: we can insert large amount pend elements into the main with binary search only on 2^(k+1)-1
elements. This works because due to how the algorithm works, we know that the bound for bx
is ax
. So when we insert, for example b5
instead of searching in {b1, a1, b2, a2, b3, a3, a4, a5, a6, a7, ...}
we can search in {b1, a1, b2, a2, b3, a3, a4}
, because we know that there is no point to include into the search range a5
and anything after that since they are guarenteed to be bigger. After we inserted b5
and it is time to insert b4
, the bound is decreased by one (from a5
to a4
), but the amount of elements is increased by one (we inserted b5
), so the amount of elements in this range stayed the same.
Note: however, sometimes, as we will see soon, the Jacobsthal number insertions guarantee 2^(k+1)-1
in the worst case. In the best case the range of search may be even smaller, if inserted element is going to be bigger than any other number in the search range. Then, for the next number the search range will actually be lower. You will see this in the continuation of the execution.
Now that we understood what purpose the Jacobsthal numbers have here, I want to bring up that there is also an optimal way to calculate the n'th Jacobsthal number. Rounded result of (2^(n+1) + -1^n) / 3
will produce an offsetted for our use case (it normally starts from 0) Jacobsthal number.
Back to where we left
So, we inserted our b3
into the main
, but as you can see, it shrinks the search area for the b2
, as this element turned out bigger than a2
, so the area of search shrank. Remember the note from the "Why Jacobsthal numbers?" part? This is the example. It may happen sometimes, and it's another tricky thing of this algorithm that needs to be accounted for.
Ok, so now, I think, we are fully equipped to comprehend the rest of the algorithm, so let's proceed with its steps without any more yapping!
With this, the second level of the recursion was resolved. The resulting sequence after this level looks like this:
And with this, our sequence became sorted. This algorithm is not an easy one to understand and also to implement, but both swapping and insertion follow specific patterns, peculiarities of which you can calculate, if you do this algorithm on paper/in your favorite text editor again and again, which I encourage you to do, as doing is the only way to really cement theory.
I have implemented this algorithm in the context of doing the last C++ exercise of module 09, in the 42 school; and you may find my take on this algorithm here, in cpp09/ex02.
Thank you epolitze for proofreading this article.
Top comments (2)
Ineresting algorithm, thx for detailed describing.
thanks man keep going