A Fast and Simple Approach to Merge Sorting using AVX-512
Abstract
Merging and sorting algorithms are the backbone of many modern computer applications, so efficient implementations are desirable. New architectural advances in CPUs continually create opportunities for algorithmic improvement. This research presents a new approach to merge sorting using SIMD (Single Instruction, Multiple Data). Traditional approaches to SIMD sorting typically use a bitonic sorting network (Batcher’s algorithm), which adds significant overhead. Our approach eliminates that overhead. We start with a branchless merge algorithm and then use the Merge Path algorithm to divide the merge among the SIMD lanes. Testing demonstrates that the algorithm not only surpasses its SIMD-based bitonic counterpart but is also over 2.94 times faster than a standard merge, merging over 300M elements per second. A full sort is over 5x faster than quicksort and 2x faster than the sort in Intel’s IPP library, sorting over 5.3M keys per second. A 256-thread parallel sort reaches over 500M keys per second, a speedup of over 2x relative to a regular merge sort. These results make it the fastest sort on Intel’s KNL processors that we know of.
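To make the two ingredients named above concrete, the following is a minimal scalar sketch in C++, not the AVX-512 implementation evaluated in this work. It assumes 32-bit integer keys and uses hypothetical names (`merge_path_split`, `merge_segment`, `merge_in_lanes`). A Merge Path binary search finds, for each output offset, how many elements come from each input, so every "lane" receives an independent sub-merge that is then carried out with a branch-free select. In a real SIMD version the lanes would advance together inside vector registers; here they simply run one after another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Merge Path split (illustrative): of the first d merged outputs, how many
// come from a? Ties are given to a, matching a stable merge.
std::size_t merge_path_split(const std::int32_t* a, std::size_t na,
                             const std::int32_t* b, std::size_t nb,
                             std::size_t d) {
    std::size_t lo = d > nb ? d - nb : 0;  // must take at least d - nb from a
    std::size_t hi = std::min(d, na);      // cannot take more than na from a
    while (lo < hi) {
        std::size_t i = lo + (hi - lo) / 2;  // candidate count from a
        std::size_t j = d - i;               // matching count from b (j >= 1)
        if (a[i] <= b[j - 1]) {
            lo = i + 1;  // a[i] precedes b[j-1], so take more from a
        } else {
            hi = i;
        }
    }
    return lo;
}

// Merge with a data-independent select: the comparison result is used as an
// index increment instead of steering an if/else. Whether the compiler emits
// conditional moves here is an assumption, not a guarantee.
void merge_segment(const std::int32_t* a, std::size_t na,
                   const std::int32_t* b, std::size_t nb,
                   std::int32_t* out) {
    std::size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        const bool take_a = a[i] <= b[j];
        out[k++] = take_a ? a[i] : b[j];
        i += take_a;
        j += !take_a;
    }
    // One input is exhausted; copy whatever remains of the other.
    std::copy(a + i, a + na, out + k);
    std::copy(b + j, b + nb, out + k + (na - i));
}

// Split the output into `lanes` nearly equal pieces and merge each piece
// independently. 16 lanes would correspond to 32-bit keys in a 512-bit register.
std::vector<std::int32_t> merge_in_lanes(const std::vector<std::int32_t>& a,
                                         const std::vector<std::int32_t>& b,
                                         std::size_t lanes) {
    assert(lanes >= 1);
    const std::size_t total = a.size() + b.size();
    std::vector<std::int32_t> out(total);
    std::size_t prev_d = 0, prev_i = 0;
    for (std::size_t lane = 1; lane <= lanes; ++lane) {
        const std::size_t d = total * lane / lanes;  // end of this lane's output
        const std::size_t i =
            merge_path_split(a.data(), a.size(), b.data(), b.size(), d);
        const std::size_t prev_j = prev_d - prev_i;
        const std::size_t j = d - i;
        merge_segment(a.data() + prev_i, i - prev_i,
                      b.data() + prev_j, j - prev_j,
                      out.data() + prev_d);
        prev_d = d;
        prev_i = i;
    }
    return out;
}
```

Because each split depends only on the two sorted inputs and the output offset, the sub-merges share no state, which is the property that lets the work be spread across SIMD lanes or threads.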