Stupid Columnsort Tricks
Geeta Chaudhry and Tom Cormen
Dartmouth College, Department of Computer Science
What Do We Know About Columnsort?
Sorts $N$ values on an $r \times s$ mesh
Uses 8 steps
– Each step either sorts each column or performs a fixed permutation
Divisibility restriction: $s$ divides $r$
Height restriction: $r \ge 2s^2$, relaxed here to $r \ge 4s^{3/2}$
– Exponent of $s$ goes from 2 to 3/2
– Mesh need not be quite so tall and skinny
– Cost: 2 additional steps
– Can simultaneously remove the divisibility restriction and relax the height restriction to $r \ge 6s^{3/2}$
Why Relax the Conditions?
Columnsort applies in more circumstances
Our motivation: out-of-core sorting
Column height $r$ is limited by the amount of memory
– Either per processor or in the entire system
– $N = rs$ and $r \ge 2s^2 \implies N \le r^{3/2}/2^{1/2}$
– $N = rs$ and $r \ge 4s^{3/2} \implies N \le r^{5/3}/4^{2/3}$
– Reducing the exponent of $s$ in the bound for $r$ lets us sort more values with a given amount of memory
A similar technique works for applying columnsort to in-core sorting
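The two memory bounds can be checked numerically. The following Python sketch (function names are hypothetical, and the divisibility restriction is ignored) finds the largest number of columns $s$ allowed for a given column height $r$ under each height restriction and returns $N = rs$. The relaxed bound is tested in integers as $16s^3 \le r^2$, which is equivalent to $r \ge 4s^{3/2}$.

```python
import math


def max_n_basic(r: int) -> int:
    """Largest N = r*s with r >= 2*s**2 (divisibility ignored)."""
    s = math.isqrt(r // 2)  # largest integer s with s*s <= r/2
    return r * s


def max_n_relaxed(r: int) -> int:
    """Largest N = r*s with r >= 4*s**1.5, checked exactly as 16*s**3 <= r*r."""
    s = 0
    while 16 * (s + 1) ** 3 <= r * r:
        s += 1
    return r * s


r = 2 ** 20  # hypothetical column height: 2^20 values fit in memory
print(max_n_basic(r))    # 759169024  (s = 724)
print(max_n_relaxed(r))  # 4294967296 (s = 4096)
```

With $r = 2^{20}$, the basic restriction allows $s = 724$ columns, while the relaxed one allows $s = 4096$, about 5.7 times as many values for the same column height.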
This Talk
Slabpose columnsort
– $r \ge 4s^{3/2}$
– Requires the divisibility restriction
Also in the paper
– Subblock columnsort
    $r \ge 4s^{3/2}$ with the divisibility restriction
    $r \ge 6s^{3/2}$ without the divisibility restriction
– Proof that the divisibility restriction is unnecessary in the basic columnsort algorithm
Columnsort Steps
1. Sort each column
2. Transpose entire mesh
3. Sort each column
4. Untranspose entire mesh
5. Sort each column
6. Shift down by half a column
7. Sort each column
8. Shift up by half a column
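As a reference point, the eight steps can be sketched in Python on a list of columns. This is a minimal in-memory sketch, not an out-of-core implementation. It assumes $r$ is even, $s$ divides $r$, and $r \ge 2s^2$, and it models "shift down by half a column" as padding the column-major sequence with $r/2$ copies of $-\infty$ in front and $r/2$ copies of $+\infty$ at the end (giving $s+1$ columns), which "shift up" then strips off. All function names are our own.

```python
import math
from typing import List

Mesh = List[List[float]]  # a list of s columns, each holding r values


def from_column_major(seq: List[float], r: int) -> Mesh:
    """Cut a column-major sequence into columns of height r."""
    return [list(seq[j:j + r]) for j in range(0, len(seq), r)]


def column_major(mesh: Mesh) -> List[float]:
    return [x for col in mesh for x in col]


def sort_columns(mesh: Mesh) -> Mesh:
    return [sorted(col) for col in mesh]


def transpose(mesh: Mesh) -> Mesh:
    """Pick up in column-major order, lay down in row-major order."""
    r, s = len(mesh[0]), len(mesh)
    seq = column_major(mesh)
    return [[seq[i * s + j] for i in range(r)] for j in range(s)]


def untranspose(mesh: Mesh) -> Mesh:
    """Inverse of transpose: pick up in row-major order, lay down column-major."""
    r, s = len(mesh[0]), len(mesh)
    return from_column_major([mesh[j][i] for i in range(r) for j in range(s)], r)


def shift_down(mesh: Mesh) -> Mesh:
    """Shift by r/2 in column-major order, padding with -inf/+inf (s+1 columns)."""
    half = len(mesh[0]) // 2
    seq = [-math.inf] * half + column_major(mesh) + [math.inf] * half
    return from_column_major(seq, len(mesh[0]))


def shift_up(mesh: Mesh) -> Mesh:
    """Undo shift_down, discarding the padding (back to s columns)."""
    half = len(mesh[0]) // 2
    seq = column_major(mesh)
    return from_column_major(seq[half:len(seq) - half], len(mesh[0]))


def columnsort(values: List[float], r: int, s: int) -> List[float]:
    assert len(values) == r * s and r % 2 == 0
    assert r % s == 0 and r >= 2 * s * s  # divisibility and height restrictions
    mesh = from_column_major(values, r)
    mesh = sort_columns(mesh)   # 1. sort each column
    mesh = transpose(mesh)      # 2. transpose entire mesh
    mesh = sort_columns(mesh)   # 3. sort each column
    mesh = untranspose(mesh)    # 4. untranspose entire mesh
    mesh = sort_columns(mesh)   # 5. sort each column
    mesh = shift_down(mesh)     # 6. shift down by half a column
    mesh = sort_columns(mesh)   # 7. sort each column
    mesh = shift_up(mesh)       # 8. shift up by half a column
    return column_major(mesh)   # sorted in column-major order


print(columnsort(list(range(54))[::-1], 18, 3) == list(range(54)))  # True
```

Every step either sorts each column or applies a fixed permutation, which is what makes the algorithm oblivious.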
Slabpose Columnsort Steps
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns
5. Slabpose
6. Sort each column
7. Untranspose entire mesh
8. Sort each column
9. Shift down by half a column
10. Sort each column
11. Shift up by half a column
Oblivious!
Slabpose Columnsort Steps
1. Sort each column
2. Slabpose: transpose within vertical slabs
3. Sort each column
4. Shuffle columns + slabpose
5. Sort each column
6. Untranspose entire mesh
7. Sort each column
8. Shift down by half a column
9. Sort each column
10. Shift up by half a column
Oblivious!
Shuffling and slabposing are both fixed permutations, so one step can perform them together: 10 steps in all, 2 more than columnsort
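A compact Python sketch of the 10-step version, under stated assumptions: $s$ is a perfect square, each slab is $\sqrt{s}$ columns wide, $s$ divides $r$, $r$ is even, and $r \ge 4s^{3/2}$. The slides do not spell out the shuffle permutation, so the `shuffle` below is an assumption: column $j$ of slab $k$ becomes column $k$ of slab $j$, so that every new slab receives one column from every old slab. Helper names are hypothetical, and the shift steps pad with $\pm\infty$ as an in-memory stand-in for the extra column.

```python
import math
from typing import List

Mesh = List[List[float]]  # s columns, each of height r


def from_column_major(seq: List[float], r: int) -> Mesh:
    return [list(seq[j:j + r]) for j in range(0, len(seq), r)]


def column_major(mesh: Mesh) -> List[float]:
    return [x for col in mesh for x in col]


def sort_columns(mesh: Mesh) -> Mesh:
    return [sorted(col) for col in mesh]


def transpose(mesh: Mesh) -> Mesh:
    """Pick up in column-major order, lay down in row-major order."""
    r, s = len(mesh[0]), len(mesh)
    seq = column_major(mesh)
    return [[seq[i * s + j] for i in range(r)] for j in range(s)]


def untranspose(mesh: Mesh) -> Mesh:
    """Pick up in row-major order, lay down in column-major order."""
    r, s = len(mesh[0]), len(mesh)
    return from_column_major([mesh[j][i] for i in range(r) for j in range(s)], r)


def slabpose(mesh: Mesh, t: int) -> Mesh:
    """Transpose each vertical slab of t adjacent columns on its own."""
    return [col for k in range(0, len(mesh), t) for col in transpose(mesh[k:k + t])]


def shuffle(mesh: Mesh, t: int) -> Mesh:
    """ASSUMED shuffle: column j of slab k becomes column k of slab j."""
    return [mesh[k * t + j] for j in range(t) for k in range(t)]


def shift_down(mesh: Mesh) -> Mesh:
    half = len(mesh[0]) // 2
    seq = [-math.inf] * half + column_major(mesh) + [math.inf] * half
    return from_column_major(seq, len(mesh[0]))  # s + 1 columns


def shift_up(mesh: Mesh) -> Mesh:
    half = len(mesh[0]) // 2
    seq = column_major(mesh)
    return from_column_major(seq[half:len(seq) - half], len(mesh[0]))


def slabpose_columnsort(values: List[float], r: int, s: int) -> List[float]:
    t = math.isqrt(s)                       # slab width and number of slabs
    assert len(values) == r * s and t * t == s
    assert r % s == 0 and r % 2 == 0        # divisibility restriction
    assert 16 * s ** 3 <= r * r             # r >= 4 s^(3/2), in integers
    mesh = sort_columns(from_column_major(values, r))  # 1. sort each column
    mesh = slabpose(mesh, t)                # 2. slabpose
    mesh = sort_columns(mesh)               # 3. sort each column
    mesh = slabpose(shuffle(mesh, t), t)    # 4. shuffle columns + slabpose
    mesh = sort_columns(mesh)               # 5. sort each column
    mesh = untranspose(mesh)                # 6. untranspose entire mesh
    mesh = sort_columns(mesh)               # 7. sort each column
    mesh = shift_down(mesh)                 # 8. shift down by half a column
    mesh = sort_columns(mesh)               # 9. sort each column
    mesh = shift_up(mesh)                   # 10. shift up by half a column
    return column_major(mesh)


print(slabpose_columnsort(list(range(972))[::-1], 108, 9) == list(range(972)))  # True
```

Here $r = 108$ and $s = 9$ satisfy $r = 4s^{3/2}$ exactly, a mesh that basic columnsort would reject because $2s^2 = 162 > 108$.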
Why Work With Vertical Slabs?
In regular columnsort, the mesh must be tall and skinny
Working with vertical slabs lets us change the aspect ratio so that each slab is tall and skinny
We'll use slabs that are $\sqrt{s}$ columns wide
The mesh will have $\sqrt{s}$ slabs
0-1 Principle
If an oblivious algorithm sorts all input sets consisting solely of 0s and 1s, then it sorts all input sets with arbitrary values
Use the 0-1 Principle by looking at portions of the $r \times s$ mesh
– Clean: all 0s or all 1s
– Dirty: may be mixed 0s and 1s
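To make the clean/dirty vocabulary concrete, here is a small Python helper (hypothetical names) that labels each row of a specific 0-1 mesh. It calls a row dirty only when it actually mixes 0s and 1s; the slides' "may be mixed" describes regions whose cleanliness cannot be guaranteed in advance.

```python
from typing import List

Mesh = List[List[int]]  # s columns of 0s and 1s, each of height r


def classify_rows(mesh: Mesh) -> List[str]:
    """Label row i 'clean' if it is all 0s or all 1s, else 'dirty'."""
    r = len(mesh[0])
    return ["clean" if len({col[i] for col in mesh}) == 1 else "dirty" for i in range(r)]


def dirty_rows(mesh: Mesh) -> int:
    return classify_rows(mesh).count("dirty")


# Two sorted 0-1 columns of height 4 (a hypothetical example)
mesh = [[0, 0, 1, 1], [0, 1, 1, 1]]
print(classify_rows(mesh))  # ['clean', 'dirty', 'clean', 'clean']
```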
Step 1: Sort Each Column
[Figure: $r \times s$ mesh after sorting each column; regions labeled 0, dirty, and 1]
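Step 2: Slabpose
[Figure: $\sqrt{s}$ slabs, each $\sqrt{s}$ columns wide; each column is laid down in row-major order within its slab, leaving $\le \sqrt{s}$ dirty rows per slab]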
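The per-slab claim after Step 2 can be spot-checked in Python. Slabpose is modeled here as columnsort's transpose (pick up in column-major order, lay down in row-major order) applied separately to each slab of $t = \sqrt{s}$ columns. Because each sorted column fills $r/\sqrt{s}$ whole rows of its slab and can mix 0s and 1s in at most one of them, each slab has at most $\sqrt{s}$ dirty rows. Names are hypothetical.

```python
from typing import List

Mesh = List[List[int]]  # s columns, each of height r


def transpose(mesh: Mesh) -> Mesh:
    """Columnsort transpose: pick up column-major, lay down row-major."""
    r, s = len(mesh[0]), len(mesh)
    seq = [x for col in mesh for x in col]
    return [[seq[i * s + j] for i in range(r)] for j in range(s)]


def slabpose(mesh: Mesh, t: int) -> Mesh:
    """Apply transpose separately to each vertical slab of t columns."""
    return [col for k in range(0, len(mesh), t) for col in transpose(mesh[k:k + t])]


def dirty_rows_per_slab(mesh: Mesh, t: int) -> List[int]:
    """Number of mixed 0/1 rows inside each slab of t columns."""
    r = len(mesh[0])
    return [sum(1 for i in range(r) if len({col[i] for col in mesh[k:k + t]}) > 1)
            for k in range(0, len(mesh), t)]


# s = 4 columns (2 slabs of 2), r = 32; each column already sorted (Step 1)
mesh = [[0] * z + [1] * (32 - z) for z in (5, 17, 0, 32)]
print(dirty_rows_per_slab(slabpose(mesh, 2), 2))  # [2, 0]
```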
Step 3: Sort Each Column
[Figure: after sorting, each slab still has $\le \sqrt{s}$ dirty rows]
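Step 4: Shuffle
[Figure: the $\sqrt{s}$ slabs of $\sqrt{s}$ columns before and after shuffling the columns; before the shuffle, each slab has $\le \sqrt{s}$ dirty rows]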
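The slides do not define the shuffle precisely. One permutation consistent with the later dirty-row counts, used here purely as an assumption, moves column $j$ of slab $k$ to column $k$ of slab $j$, so each new slab holds one column from every old slab. A Python sketch with hypothetical names:

```python
from typing import List


def shuffle_order(t: int) -> List[int]:
    """HYPOTHETICAL shuffle for t = sqrt(s) slabs of t columns each:
    column j of old slab k becomes column k of new slab j.
    Entry p is the old column index that moves to new position p."""
    return [k * t + j for j in range(t) for k in range(t)]


def shuffle(mesh: List[List[int]], t: int) -> List[List[int]]:
    """Reorder whole columns according to shuffle_order."""
    return [mesh[old] for old in shuffle_order(t)]


order = shuffle_order(3)  # s = 9 columns, 3 slabs of 3
print(order)              # [0, 3, 6, 1, 4, 7, 2, 5, 8]
```

Because this permutation transposes a $\sqrt{s} \times \sqrt{s}$ grid of column indices, applying it twice restores the original column order.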
Step 5: Slabpose
[Figure: $\sqrt{s}$ slabs, each $\sqrt{s}$ columns wide; each group of $r/\sqrt{s}$ rows contains $\le 2$ dirty rows, for $\sqrt{s}$ sets of dirty rows]
Step 6: Sort Each Column
$\le 2\sqrt{s}$ dirty rows, each with $s$ elements $\implies$ $\le 2s^{3/2}$ dirty elements
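The bound entering Step 7 can be spot-checked on 0-1 inputs: after steps 1 through 6, at most $2\sqrt{s}$ rows should be dirty, and since each row has $s$ elements, at most $2s^{3/2}$ elements. This Python sketch assumes the same shuffle as before (column $j$ of slab $k$ goes to column $k$ of slab $j$), which the slides do not pin down; names are hypothetical.

```python
import math
from typing import List

Mesh = List[List[int]]  # s columns of 0s and 1s, each of height r


def transpose(mesh: Mesh) -> Mesh:
    r, s = len(mesh[0]), len(mesh)
    seq = [x for col in mesh for x in col]
    return [[seq[i * s + j] for i in range(r)] for j in range(s)]


def slabpose(mesh: Mesh, t: int) -> Mesh:
    return [col for k in range(0, len(mesh), t) for col in transpose(mesh[k:k + t])]


def shuffle(mesh: Mesh, t: int) -> Mesh:
    # ASSUMED shuffle: column j of slab k becomes column k of slab j
    return [mesh[k * t + j] for j in range(t) for k in range(t)]


def sort_columns(mesh: Mesh) -> Mesh:
    return [sorted(col) for col in mesh]


def dirty_rows_after_step6(bits: List[int], r: int, s: int) -> int:
    """Run steps 1-6 of slabpose columnsort on a 0-1 input and count mixed rows."""
    t = math.isqrt(s)
    mesh = [bits[j:j + r] for j in range(0, r * s, r)]
    mesh = sort_columns(mesh)   # 1. sort each column
    mesh = slabpose(mesh, t)    # 2. slabpose
    mesh = sort_columns(mesh)   # 3. sort each column
    mesh = shuffle(mesh, t)     # 4. shuffle columns
    mesh = slabpose(mesh, t)    # 5. slabpose
    mesh = sort_columns(mesh)   # 6. sort each column
    return sum(1 for i in range(r) if len({col[i] for col in mesh}) > 1)
```

With $r = 32$ and $s = 4$, every 0-1 input should give at most $2\sqrt{4} = 4$ dirty rows, that is, at most $16 = 2 \cdot 4^{3/2}$ dirty elements.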
Step 7: Untranspose Entire Mesh
Dirty area: $\le 2s^{3/2}$ elements, contiguous in column-major order
$r \ge 4s^{3/2} \implies 2s^{3/2} \le r/2 \implies$ dirty area $\le$ half a column
Once the dirty area is at most half a column, the last four steps finish the sort
Step 8: Sort Each Column
If the dirty area resides in one column $\implies$ done
Step 8: Sort Each Column
If the dirty area resides in two columns $\implies$ no change
Step 9: Shift Down by Half a Column
The dirty area now resides in one column
Step 10: Sort Each Column
The dirty area resides in one column
Step 11: Shift Up by Half a Column
Sorted
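Steps 8 through 11 are exactly columnsort's last four steps. A Python sketch, assuming the input is a column-major sequence whose dirty area is contiguous and at most $r/2$ long; padding with $\pm\infty$ models the extra column created by the shift, and the function name is hypothetical.

```python
import math
from typing import List


def last_four_steps(seq: List[int], r: int) -> List[int]:
    """Sort columns, shift down r/2, sort columns, shift up r/2.
    seq is a column-major sequence whose length is a multiple of r."""
    half = r // 2
    cols = [sorted(seq[j:j + r]) for j in range(0, len(seq), r)]   # sort each column
    flat = [-math.inf] * half + [x for c in cols for x in c] + [math.inf] * half  # shift down
    cols = [sorted(flat[j:j + r]) for j in range(0, len(flat), r)]  # sort each column
    flat = [x for c in cols for x in c]
    return flat[half:len(flat) - half]                              # shift up


# r = 8, s = 4: a dirty window of 4 = r/2 elements straddling columns 0 and 1
seq = [0] * 6 + [1, 0, 1, 0] + [1] * 22
print(last_four_steps(seq, 8) == sorted(seq))  # True
```

In the straddling case, the first sort leaves the dirty window split across two columns; after the shift by $r/2$ the window lies inside a single column, so the second sort cleans it.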
Subblock Columnsort
Adds two steps to columnsort
– Sort each column
– A fixed permutation
The permutation can be any one that distributes all elements of each $\sqrt{s} \times \sqrt{s}$ subblock to all $s$ columns
As in slabpose columnsort, the dirty area entering the last four steps has size $\le 2s^{3/2}$
As long as $2s^{3/2} \le r/2$ (half a column), the last four steps complete the sort
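The slides allow any permutation with the subblock property. One concrete choice, picked here for illustration and not necessarily the one in the paper, assumes $\sqrt{s}$ divides $r$ and sends the element at offset $(a, b)$ of subblock $(p, q)$ to row $p\sqrt{s} + q$, column $a\sqrt{s} + b$: each subblock fills one row, so its $s$ elements land in $s$ distinct columns. A Python sketch with hypothetical names:

```python
import math
from typing import Dict, Tuple

Pos = Tuple[int, int]  # (row, column) in the r x s mesh


def subblock_permutation(r: int, s: int) -> Dict[Pos, Pos]:
    """HYPOTHETICAL example permutation: every sqrt(s) x sqrt(s) subblock
    is sent to a single row, one element per column."""
    t = math.isqrt(s)
    assert t * t == s and r % t == 0
    perm = {}
    for row in range(r):
        for col in range(s):
            p, a = divmod(row, t)  # subblock row index, row offset inside it
            q, b = divmod(col, t)  # subblock column index, column offset inside it
            perm[(row, col)] = (p * t + q, a * t + b)
    return perm


perm = subblock_permutation(12, 4)  # 6 x 2 grid of 2 x 2 subblocks
print(perm[(1, 0)], perm[(0, 2)])   # (0, 2) (1, 0)
```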
Removing the Divisibility Restriction from Columnsort
With the divisibility restriction, the dirty rows after the transpose step have only 0→1 transitions
Without the divisibility restriction, there may also be 1→0 transitions
The proof shows that even with the 1→0 transitions, the size of the dirty area entering the last four steps does not increase
Thus $r \ge 2s^2$ suffices, even without the divisibility restriction
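Why the transitions differ: the transpose lays the column-major sequence down in row-major order, $s$ elements per row. When $s$ divides $r$, every row comes from a single sorted column, so a 0-1 row can only change from 0 to 1; otherwise a row can straddle the end of one column (1s) and the start of the next (0s). A small Python check with hypothetical names:

```python
from typing import List


def transposed_rows(columns: List[List[int]]) -> List[List[int]]:
    """Columnsort's transpose seen row by row: the column-major sequence is
    laid down in row-major order, s elements per row."""
    s = len(columns)
    seq = [x for col in columns for x in col]
    return [seq[i:i + s] for i in range(0, len(seq), s)]


def has_one_zero_transition(rows: List[List[int]]) -> bool:
    """True if some row has a 1 immediately followed by a 0."""
    return any(row[k] == 1 and row[k + 1] == 0
               for row in rows for k in range(len(row) - 1))


print(has_one_zero_transition(transposed_rows([[0, 0, 0, 1, 1, 1]] * 4)))        # r = 6, s = 4: True
print(has_one_zero_transition(transposed_rows([[0, 0, 0, 0, 1, 1, 1, 1]] * 4)))  # r = 8, s = 4: False
```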
Conclusion
We can get around the restrictions of columnsort
Reduce the exponent in the height restriction from 2 to 3/2
– The mesh need not be quite so tall and skinny
– Cost: two extra steps
– In an out-of-core implementation, slabpose columnsort requires no additional I/O
The divisibility restriction is unnecessary
Open question: Can we reduce the exponent further within the columnsort framework?