
David G. Pickett


Fastest Sort? Prove me wrong!

I rediscovered the binomial heap when a student said they wanted a Fibonacci min heap, and I created one and tuned it out of Fibonacci into binomial and a bit beyond. It struck me as the fastest sort, as it has extreme locality of reference, pushing larger values into the basement never to be seen again until each winner is removed from the heap as the min (or max), which is great for cache hits and VM page hits. And since many problems are solved with many items still in the heap when it is discarded, the O(1) insert/push is another advantage. It is also very easy to meld two heaps into one, in O(log2(N)) of the smaller heap.
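To make the "sort" claim concrete, here is a minimal sketch of sorting with such a heap, assuming only a push/top/pop interface (a concrete layout is sketched further down; the names are illustrative, not the tuned code):

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch: sorting with the heap is just "push everything,
// then pop the min repeatedly". `Heap` is any type with push/top/pop
// like the layout sketched further down.
template <class Heap>
std::vector<int> heapSort(const std::vector<int>& in) {
    Heap h;
    for (int v : in) h.push(v);              // N pushes, each O(1)
    std::vector<int> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.push_back(h.top());              // current minimum
        h.pop();                             // remove it
    }
    return out;
}
```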

You'd think the fastest sort would be in libraries already, but it seems to have been missed. Maybe there is a faster sort out there that I missed. Maybe there is a faster min heap (priority queue) for cases where the heap is discarded while still not empty. Tell me! I wonder if NASDAQ uses it to sort orders by time and price in the bid and ask books for each symbol? Basic data structures education never mentions it.

Cons: It is not a "stable sort", but it is, for better or worse, tolerant of duplicate keys, with no simple way to discard them before peek/top/pop/delete. If you added a long sequence number to each node, you could make it stable and still keep the great locality-of-reference speed, a good speed/space tradeoff (see the sketch below).
It is not good for search, except for either the min or the max (not both). A copy constructor is not easy either, assuming that means a deep copy, as you have to duplicate all the nodes.
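As a rough illustration of that tradeoff, the stable key could be as simple as a (value, sequence) pair compared value-first, sequence-second (illustrative names, not part of the heap itself):

```cpp
#include <cstdint>

// Illustrative stable key: compare by value first, then by insertion
// order, so equal values pop in FIFO order.
struct StableKey {
    int           value;
    std::uint64_t seq;   // assigned from a running counter at push time

    bool operator<(const StableKey& other) const {
        if (value != other.value) return value < other.value;
        return seq < other.seq;
    }
};
```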

In comparison to reference examples of the binomial heap, mine is very simple: a root pointer to nodes that hold the value, node pointers next and child, and a char for rank. The root is a forest of binomial trees in a forward linked list using next. I keep the root list sorted in unique ascending rank, for speed and simplicity. A newly inserted node has rank 0, and if it collides with a root of rank 0, the two are merged into a rank 1 tree, which may in turn collide again. The loser goes onto the winner's child list. If you insert N elements, the root list will normally have a tree for each 1 bit in N, so on average about 1/2 log2(N) trees. The number of elements at each depth follows Pascal's triangle: 1, 1-1, 1-2-1, 1-3-3-1, etc. (each row totaling a power of 2). The shape of a tree is not important, as everything below a root is ignored until the pop/delete of a min, when the min's children are merged to become the new root forest.
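Roughly, that layout and the insert carry chain look like this minimal C++ sketch (illustrative names, min-heap flavor, not the tuned implementation):

```cpp
// One heap node: value, sibling link (next), first child, and rank.
struct Node {
    int   value;
    Node* next  = nullptr;  // next root in the forest, or next sibling
    Node* child = nullptr;  // head of this node's child list
    char  rank  = 0;        // a normal rank-r tree holds 2^r nodes

    explicit Node(int v) : value(v) {}
};

// Link two trees of equal rank: the smaller value wins, the loser goes
// onto the winner's child list, and the winner's rank goes up by one.
static Node* link(Node* a, Node* b) {
    if (b->value < a->value) { Node* t = a; a = b; b = t; }
    b->next  = a->child;
    a->child = b;
    ++a->rank;
    return a;
}

struct BinomialMinHeap {
    Node* root = nullptr;   // forest, kept in ascending unique rank

    // O(1) amortized insert: the new rank-0 tree "carries" like binary
    // addition for as long as it collides with an equal-rank root.
    void push(int v) {
        Node* t = new Node(v);
        while (root && root->rank == t->rank) {
            Node* rest = root->next;
            t = link(root, t);
            root = rest;
        }
        t->next = root;
        root = t;
    }
};
```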

To improve performance: 1) I do not track the min during insert/push, making it even faster and simpler. 2) When the user does a peek/top of the heap, I do not search for the min among the roots of the root forest; instead, I merge all the trees together into one, whose root value is the min (or max), even though this means the rank may be a "fib" (a fake rank). For instance, a rank 0 tree forced to merge with a normal rank 2 tree becomes a fake rank 3 tree: not as big as a normal rank 3 tree, but bigger than a rank 2 tree and smaller than a rank 4 tree. Merging is real sorting of many values (whole trees) and narrows the root list for many near-future operations; searching, in comparison, only sorts one element.
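Continuing the same sketch, the lazy peek/top collapses the whole forest into one tree (accepting the fake ranks), and pop/delete-min then promotes that tree's children to be the new root forest. The tuned version presumably merges the children back into unique ascending ranks; here they are only reversed:

```cpp
// Continuing the sketch above; these would live inside BinomialMinHeap.

// Force-link two trees of any ranks: the smaller value still wins, and
// the result gets a "fake" rank of max(rank_a, rank_b) + 1.
static Node* forceLink(Node* a, Node* b) {
    if (b->value < a->value) { Node* t = a; a = b; b = t; }
    char r = (a->rank > b->rank) ? a->rank : b->rank;
    b->next  = a->child;
    a->child = b;
    a->rank  = static_cast<char>(r + 1);
    return a;
}

// peek/top (heap must not be empty): collapse the whole root forest
// into a single tree; its root value is the min.
int top() {
    while (root->next) {
        Node* rest = root->next->next;
        root = forceLink(root, root->next);
        root->next = rest;
    }
    return root->value;
}

// pop/delete-min: after top() there is one tree, so the min's children
// simply become the new root forest (reversed back toward ascending rank).
void pop() {
    top();                        // make sure the forest is one tree
    Node* old  = root;
    Node* kids = old->child;
    delete old;
    root = nullptr;
    while (kids) {
        Node* nxt  = kids->next;
        kids->next = root;
        root       = kids;
        kids       = nxt;
    }
}
```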

The purpose of rank, like height in an AVL tree, is to avoid degenerating into one long linked list, the worst case you get from an unbalanced BST fed ordered data. Perfect rank is not necessary. In fact, rank would still serve this purpose even if it overflowed and wrapped around, but that is very hard to reach, as a normal rank 20 tree already has about a million (2^20) items. Rank triggers merges, but values determine the root(s). The binomial heap has no such worst case and does no rotations for balance!

I sent this on to the C++ and Java people, but so far, crickets. One other user has a favorite container that I need to explore! It seems to outperform the Java PriorityQueue.
