Bitwise Sort By Matt Hannon
What is Bitwise Sort?
It is an algorithm that works on the individual bits of each entry, placing entries in groups from the most significant bit (MSB) to the least significant bit (LSB). By itself it is an in-place algorithm, which minimizes space. More often than not it is used in conjunction with another sorting algorithm to increase speed or decrease memory usage, but it can work by itself.
Bits
A bit is a binary digit representing a 1 or 0. Bits are often arranged in combinations to create representations of numbers. On an Intel x86-32:
Nibble - 4 bits - 16 combinations
Byte - 8 bits - 256 combinations
Word - 16 bits - 65,536 combinations
Double word - 32 bits - 4,294,967,296 combinations
More on bits
Bits work in base 2. Reading from the right, each bit position represents consecutively: 1, 2, 4, 8, 16, 32, etc.
Example:
1011 = 1 + 2 + 0(4) + 8 = 11
001 = 1 + 0(2) + 0(4) = 1
100001 = 1 + 0(2) + 0(4) + 0(8) + 0(16) + 32 = 33
Notice that larger values take more bits to represent, and smaller values take fewer.
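The place-value arithmetic can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `bits_to_int` is made up for this example and is not part of the original slides.

```python
def bits_to_int(bits: str) -> int:
    """Sum the place values (1, 2, 4, 8, ...) of every bit that is set."""
    total = 0
    # Walk from the rightmost (least significant) bit to the leftmost.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(bits_to_int("1011"))    # 1 + 2 + 0(4) + 8 = 11
print(bits_to_int("100001"))  # 1 + 32 = 33
```

Python's built-in `int("1011", 2)` performs the same conversion; the loop is spelled out here only to mirror the sums.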
Bits and sorting
Say you wanted to sort the numbers: 23, 13, 3, 2, 54
Their bit representations are, consecutively: 010111, 001101, 000011, 000010, 110110
Bitwise sort process:
1. Find the MSB of the largest number, which is bit 6 of 110110 (54).
2. Check all values to see if they have bit 6 on.
3. If the bit is on, place the value at the end of the list, swapping it with the value it replaces.
4. Repeat steps 1-3 for the remaining list while decrementing the MSB.
End result is 000011, 000010, 001101, 010111, 110110, which is 3, 2, 13, 23, 54
!OUT OF ORDER! WHAT! PLUS SLOW!
NOTE: Running this technique all the way through on large lists is impractical and slow. Read on and it will become clear how much potential this simple process can have on massive databases that require quick results, as well as its implications for memory conservation.
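To see why 3 and 2 end up in the wrong order, it helps to print each value's MSB position. The sketch below does not perform the in-place swaps from the slide; as a stand-in it orders the values by MSB alone using Python's stable `sorted`, which is an assumption made purely for illustration. The helper `msb_position` is a hypothetical name and counts bits from 1, as the slides do.

```python
values = [23, 13, 3, 2, 54]


def msb_position(n: int) -> int:
    """1-based position of the highest set bit (0 for n == 0)."""
    return n.bit_length()


for v in values:
    print(f"{v:>2} = {v:06b}  MSB at bit {msb_position(v)}")

# Stand-in for grouping by MSB only: sorted() is stable, so values that
# share an MSB (here 3 and 2, both bit 2) keep their input order.
grouped = sorted(values, key=msb_position)
print(grouped)  # [3, 2, 13, 23, 54]
```

Because 3 and 2 share the same MSB, grouping on the MSB alone cannot tell them apart, which is where the out-of-order result comes from.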
Out of order
Fixes:
You can continue with multiple cycles of bit checking within each group of set bits to sort the list correctly. The bit checking operations are costly, so this method is extremely slow, although there are CPU extensions that would improve the performance dramatically.
You can use another sorting algorithm. This can work well if you play on the attributes that bitwise sort offers:
Each group of bits is closely arranged.
There is a smaller portion of the list to work with at a time.
After one group of bits is arranged, you no longer have to deal with it.
Applications
Bitwise sort has been tested effectively with comb sort, increasing its speed on large amounts of data substantially.
Bitwise sort has also been used effectively with bucket sort to decrease memory usage by at least 2^(MSB-1); in optimal situations, over half of the memory used by bucket sort is no longer required. Speed decreases, but not by a substantial amount, and on large databases the memory reduction far outweighs the speed decrease.
Best case lists
Good lists for bitwise sort would be something like: 10001, 01001, 00110, 00011
With one cycle, this entire list would be sorted because each value has a different most significant bit.
A large list is also good for bitwise sort: the individual bit checking code is expensive, and with a large number of items that expense is offset by the progress each cycle makes.
Bad case lists
Bitwise sort would not do well with a list such as: , , , , ,
because it would generate the exact SAME list with one cycle! More cycles would yield a very slow response.
Bitwise sort would also do poorly with lists such as: , , ,
A large range between the MSB and the other bits would mean worthless, time-consuming checks. This issue can be fixed, but you have to take into account the processing time needed to detect these rare occurrences, and weigh that cost to judge whether the additional checks actually save time.
Restrictions
Negative numbers
In two’s complement, the modern method of representing signed numbers, the MSB is the sign bit.
MSB = 0 = Positive
MSB = 1 = Negative
So bitwise sort would give some odd results. This issue can be fixed quite easily, although the fix isn't in my most recent implementation.
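The slides do not say which fix the author has in mind. One common approach, shown here only as an assumption, is to flip the sign bit before sorting so that negative values compare below non-negative ones, then flip it back afterwards. The 32-bit width and the helper names `to_sortable_key` and `from_sortable_key` are choices made for this sketch.

```python
BITS = 32                    # assumed word width
SIGN_BIT = 1 << (BITS - 1)
WORD_MASK = (1 << BITS) - 1


def to_sortable_key(n: int) -> int:
    """Map a signed 32-bit value to an unsigned key with the same ordering."""
    # Masking yields the two's complement bit pattern; flipping the sign
    # bit then places every negative value below every non-negative one.
    return (n & WORD_MASK) ^ SIGN_BIT


def from_sortable_key(key: int) -> int:
    """Undo to_sortable_key and recover the signed value."""
    unsigned = key ^ SIGN_BIT
    return unsigned - (1 << BITS) if unsigned & SIGN_BIT else unsigned


values = [5, -3, 0, -54, 23]
keys = sorted(to_sortable_key(v) for v in values)  # any unsigned sort works here
print([from_sortable_key(k) for k in keys])        # [-54, -3, 0, 5, 23]
```

The built-in `sorted` only stands in for whichever unsigned sort is used; the point is that the keys are non-negative and preserve the signed order.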
Implementation
Process:
1. Loop starting at MSB to 0
2. Loop through items to be sorted
3. Check current bit against current item
4. If on, swap with the value closest to the end that is not yet sorted
5. Repeat current item with steps 3-4
6. Do either recursive bitwise sort or another sort on current group of bits
7. Loop through again
Pseudo code
Function bitwiseSort(Items, amount)
    var maxValue, MSB, numberSorted = amount
    maxValue = findMaxValue(Items, amount)
    MSB = findMSB(maxValue)
    Loop until MSB is 0
        Loop from 0 to numberSorted - 1
            TESTBIT MSB against Items[loop]
            if 1 then
                numberSorted--                          // shrink the unsorted region first
                swap(Items[loop], Items[numberSorted])  // move the item to the end of that region
                loop--                                  // re-test the value that was swapped in
        Repeat loop
        GenericSort(current group)  // either recursive bitwise sort or another sort algorithm
        MSB--                       // move on to the next lower bit
    Repeat loop
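The process and pseudo code can be sketched in runnable Python as follows. This is one possible reading, not the author's implementation: it takes the "recursive bitwise sort" option for every group, which makes it close to what is usually called a binary radix exchange sort. It assumes non-negative integers, and it counts bits from 0 rather than from 1 as the slides do. The names `bitwise_sort` and `_sort_range` are hypothetical.

```python
def bitwise_sort(items: list) -> None:
    """Sort non-negative integers in place, one bit at a time from the MSB down."""
    if items:
        top_bit = max(items).bit_length() - 1  # 0-based index of the highest set bit
        _sort_range(items, 0, len(items), top_bit)


def _sort_range(items: list, lo: int, hi: int, bit: int) -> None:
    # Invariant: every value in items[lo:hi] agrees on all bits above `bit`.
    if hi - lo < 2 or bit < 0:
        return
    mask = 1 << bit
    i, end = lo, hi
    while i < end:
        if items[i] & mask:
            # Bit is on: swap with the value closest to the end not yet placed.
            end -= 1
            items[i], items[end] = items[end], items[i]
            # i is not advanced, so the value swapped in is tested next.
        else:
            i += 1
    # items[lo:end] have the bit clear, items[end:hi] have it set.
    _sort_range(items, lo, end, bit - 1)
    _sort_range(items, end, hi, bit - 1)


data = [23, 13, 3, 2, 54]
bitwise_sort(data)
print(data)  # [2, 3, 13, 23, 54]
```

The recursion depth is bounded by the number of bits in the largest value, not by the length of the list. Replacing the two recursive calls with another sorting algorithm on each group gives the hybrid variant the slides describe.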
Memory usage reducer
Sorts such as bucket sort initialize enough memory for all possible values up to the biggest value. For values such as 111111 (63) and 011111 (31), the highest MSB is bit 6, which is worth 2^(6-1) = 32. For values like 63, bitwise sort already knows that the MSB is set, so there is no reason to initialize memory for it. This saves 32 pieces of memory, about half of what we started with! With a 32-bit type, you can optimally save about 2 billion (2^31) pieces of memory! It slows down the bucket sort, but it is still substantially faster than other sorts.
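A minimal sketch of the memory saving, assuming a counting-style bucket sort in which each possible value gets one counter. Once a group is known to share its MSB, only the lower bits vary, so the table needs 2^(MSB-1) slots instead of one slot per value up to the maximum. The function name `counting_sort_group` and the sample data are invented for this illustration; the MSB is counted from 1, as in the slide.

```python
def counting_sort_group(group: list, msb: int) -> list:
    """Counting-sort values whose highest set bit is bit `msb` (1-based)."""
    top = 1 << (msb - 1)      # value of the shared MSB, e.g. 32 for msb == 6
    counts = [0] * top        # only the lower msb-1 bits vary: 2^(msb-1) slots
    for v in group:
        counts[v - top] += 1  # strip the known MSB to get the slot index
    result = []
    for offset, count in enumerate(counts):
        result.extend([top + offset] * count)
    return result


group = [63, 40, 32, 55, 40]            # every value has bit 6 as its MSB
print(counting_sort_group(group, 6))    # [32, 40, 40, 55, 63]
```

For this group the table has 32 counters, whereas a table covering every value from 0 to 63 would need 64.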
Optimizations
Use algorithms to determine bit ranges so that no useless bit checks are done.
Employ x86 optimized assembly to speed up the bit check:
Intel’s bit scan reverse/forward (BSR/BSF)
And others
Make use of the many “bit twiddling hacks” to optimize all bit operations. Also employ techniques such as bit counting and an optimized swap.
Bit twiddling hacks combined with other intuitive ideas could easily outperform something like Bucket Sort and many of the other sorting approaches.
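The slides do not spell out how bit ranges would be determined. One simple possibility, offered here as an assumption rather than the author's method, is to combine all values with OR and AND: a bit that is set in the OR but not in the AND differs somewhere in the list, and every other bit is identical in all values and can be skipped. The helper name `varying_bits` is hypothetical.

```python
from functools import reduce
from operator import and_, or_


def varying_bits(items: list) -> int:
    """Mask of the bit positions in which at least two items differ."""
    if not items:
        return 0
    all_or = reduce(or_, items)    # bits set in at least one item
    all_and = reduce(and_, items)  # bits set in every item
    return all_or & ~all_and


items = [0b100001, 0b100010, 0b100011]
print(f"{varying_bits(items):06b}")  # 000011
```

In this example only the two lowest bits need checking; the MSB and the three bits below it are the same in every value. The scan costs one extra pass over the list, so whether it pays off depends on how many bit checks it avoids.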
From Here
There is a wealth of information published for free by Intel, with many more optimizations and concepts that could improve this algorithm considerably.
Note: I have seen a similar type of approach to mine in Radix Sort, but the key difference is bytes vs. bits and how they are handled internally by the CPU. If you do benchmarking tests to verify the speed differences, bits are noticeably quicker. Also, with bits, you can not only increase the speed but also decrease the memory usage. With the right implementation, this Bitwise Sort could greatly affect the future of sorting algorithms.