CSC 213 – Large Scale Programming
Bucket-Sort
Buckets, B, is an array of Sequences
Sorts Collection, C, in two phases:
  1. Remove each element v from C & add to B[v]
  2. Move elements from each bucket back to C
Bucket-Sort Algorithm
Algorithm bucketSort(Sequence C)
  B = new Sequence[10]    // & instantiate each Sequence
  // Phase 1
  for each element v in C
    B[v].addLast(v)       // Assumes each number in C between 0 & 9
  endfor
  // Phase 2
  loc = 0
  for each Sequence b in B
    for each element v in b
      C.set(loc, v)
      loc += 1
    endfor
  endfor
  return C
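The two phases above can be sketched in Python. This is only a minimal sketch: the name `bucket_sort` and the `num_buckets` parameter are assumptions made for illustration, and a Python list stands in for the Sequence ADT.

```python
def bucket_sort(values, num_buckets=10):
    """Sort a list of integers in the range 0..num_buckets-1, in place."""
    # One empty bucket per legal value (the "instantiate each Sequence" step).
    buckets = [[] for _ in range(num_buckets)]

    # Phase 1: drop each element into the bucket matching its value.
    for v in values:
        if not 0 <= v < num_buckets:
            raise ValueError(f"{v} is not a legal bucket index")
        buckets[v].append(v)

    # Phase 2: copy the buckets back, lowest bucket first.
    loc = 0
    for bucket in buckets:
        for v in bucket:
            values[loc] = v
            loc += 1
    return values


sorted_digits = bucket_sort([3, 1, 4, 1, 5, 9, 2, 6])
```

No two elements are ever compared; the value itself picks the bucket.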
Bucket Sort Properties
For this to work, values must be legal indices
  Only non-negative integers smaller than the number of buckets can be used
Sorting occurs without comparing objects
BUCKET-SORT is a stable sort
  Stable sorts preserve relative ordering of objects with same value
  (BUBBLE-SORT & MERGE-SORT are other stable sorts)
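Stability is easiest to see when each element carries extra data beside its key. The sketch below is an illustration only: `bucket_sort_pairs` and the sample records are hypothetical, and each record is assumed to be a (key, label) pair with an integer key from 0 to 9.

```python
def bucket_sort_pairs(pairs, num_buckets=10):
    """Return (key, label) pairs ordered by key; ties keep their input order."""
    buckets = [[] for _ in range(num_buckets)]
    for key, label in pairs:
        # Appending at the end of the bucket is what preserves input order.
        buckets[key].append((key, label))
    return [pair for bucket in buckets for pair in bucket]


records = [(2, "a"), (1, "b"), (2, "c"), (1, "d")]
result = bucket_sort_pairs(records)
# "b" stays ahead of "d", and "a" stays ahead of "c".
```

Adding to the front of each bucket instead would reverse the ties and break stability.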
Bucket Sort Extensions
Use Comparator for BUCKET-SORT
  Get index for v using compare(v, null)
Comparator for booleans could return
  0 when v is false
  1 when v is true
Comparator for US states could return
  Annual per capita consumption of Jello
  Consumption of Jello overall, in cubic feet
  State's ranking by population
Bucket Sort Extensions
State's ranking by population
  1 California
  2 Texas
  3 New York
  4 Florida
  5 Illinois
  6 Pennsylvania
  7 Ohio
  8 Michigan
  9 Georgia
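One way to picture the extension: pass in a function that maps each element to its bucket index. In this sketch the `index_of` function plays the role of compare(v, null); the names `bucket_sort_by`, `index_of`, and `POPULATION_RANK` are assumptions for illustration, and the ranks are copied from the list above.

```python
POPULATION_RANK = {
    "California": 1, "Texas": 2, "New York": 3, "Florida": 4, "Illinois": 5,
    "Pennsylvania": 6, "Ohio": 7, "Michigan": 8, "Georgia": 9,
}


def bucket_sort_by(items, index_of, num_buckets):
    """Bucket sort where index_of(item) supplies the bucket index."""
    buckets = [[] for _ in range(num_buckets)]
    for item in items:
        buckets[index_of(item)].append(item)
    return [item for bucket in buckets for item in bucket]


states = ["Ohio", "Texas", "Georgia", "California", "Florida"]
by_rank = bucket_sort_by(states, lambda s: POPULATION_RANK[s], 10)

# Booleans need only two buckets: int(False) == 0 and int(True) == 1.
flags = bucket_sort_by([True, False, True, False], int, 2)
```

Any type works as long as every value maps to a small, non-negative index.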
Bucket Sort Extensions
Extended BUCKET-SORT works with many types
  Limited set of data needed for this to work
  Need way to enumerate values of the set
    enumerate is subtle hint
d-Tuples
Combination of d values such as (k_1, k_2, …, k_d)
  k_i is i-th dimension of the tuple
A point (x, y, z) is a 3-tuple
  x is 1st dimension's value
  Value of 2nd dimension is y
  z is 3rd dimension's value
Lexicographic Order
Assume a & b are both d-tuples
  a = (a_1, a_2, …, a_d)
  b = (b_1, b_2, …, b_d)
Can say a < b if and only if
  a_1 < b_1 OR
  a_1 = b_1 && (a_2, …, a_d) < (b_2, …, b_d)
Order these 2-tuples using previous definition
  (3 4) (7 8) (3 2) (1 4) (4 8)
Sorted:
  (1 4) (3 2) (3 4) (4 8) (7 8)
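The recursive definition translates almost word for word into code. The sketch below assumes both tuples have the same length d; `lex_less` and `lex_compare` are names chosen here for illustration.

```python
from functools import cmp_to_key


def lex_less(a, b):
    """True if tuple a comes strictly before tuple b (same length assumed)."""
    if not a:                 # nothing left to compare: the tuples are equal
        return False
    if a[0] < b[0]:           # first dimension decides
        return True
    # Tie on the first dimension: compare the remaining dimensions.
    return a[0] == b[0] and lex_less(a[1:], b[1:])


def lex_compare(a, b):
    """Three-way comparison built on lex_less, in the form sorted() expects."""
    if lex_less(a, b):
        return -1
    if lex_less(b, a):
        return 1
    return 0


pairs = [(3, 4), (7, 8), (3, 2), (1, 4), (4, 8)]
ordered = sorted(pairs, key=cmp_to_key(lex_compare))
```

Python's built-in tuple comparison follows the same rule, so `sorted(pairs)` gives the same order.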
Radix-Sort
Very fast sort for data expressed as d-tuple
  Cheats to win; faster than the lower bound for comparison-based sorting
Sort performed using d calls to bucket sort
  Sorts least to most important dimension of tuple
  Relies on each bucket sort being stable
Luckily lots of data are d-tuples
  String is d-tuple of char
  Digits of an int can be used for sorting, also
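Treating a String as a d-tuple of char might look like the sketch below. It rests on assumptions not in the slides: every word has exactly d characters, every character code is below 256, and the function names are made up for this example.

```python
def bucket_sort_by_char(words, position):
    """Stable bucket sort of words on the character at one position."""
    buckets = [[] for _ in range(256)]   # assumes character codes 0..255
    for word in words:
        buckets[ord(word[position])].append(word)
    return [word for bucket in buckets for word in bucket]


def radix_sort_strings(words, d):
    """Sort words of length d with d bucket-sort passes."""
    # The last character is the least important dimension, so start there.
    for position in range(d - 1, -1, -1):
        words = bucket_sort_by_char(words, position)
    return words


animals = radix_sort_strings(["dog", "cat", "ant", "cow", "bat"], 3)
```

Each pass keeps the order set by earlier passes among words that tie on the current character, which is why stability matters.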
Radix-Sort For Integers
Represent int as a d-tuple of digits
  Decimal digits need 10 buckets to use for sorting
  Ordering using their bits needs 2 buckets
O(d∙n) time needed to run RADIX-SORT
  d is length of longest element in input
  In most cases value of d is constant (d = 31 for int)
Radix sort takes O(n) time when d is treated as a constant
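A short sketch of turning an int into a d-tuple, most significant digit first. The helper names `decimal_digits` and `bits` and the sample values are illustrative assumptions; inputs are assumed non-negative and small enough to fit in d digits.

```python
def decimal_digits(x, d):
    """Return x as a d-tuple of decimal digits, padded with leading zeros."""
    return tuple((x // 10 ** i) % 10 for i in range(d - 1, -1, -1))


def bits(x, d):
    """Return x as a d-tuple of bits, padded with leading zeros."""
    return tuple((x >> i) & 1 for i in range(d - 1, -1, -1))


as_decimal = decimal_digits(472, 3)   # three dimensions, 10 possible values each
as_binary = bits(6, 4)                # four dimensions, 2 possible values each
```

With leading zeros included, lexicographic order on these tuples agrees with numeric order on the original values.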
Radix-Sort In Action
List of 4-bit integers sorted using RADIX-SORT
Radix-Sort
Algorithm radixSort(Sequence C)
  // Works from least to most significant value
  for bit = 0 to 30
    C = bucketSort(C, bit)    // Sort C using the specified bit
  endfor
  return C
What is big-Oh complexity for Radix-Sort?
  Call in loop uses each element twice: O(n)
  Loop repeats once per digit (here, 31 bits) to complete sort: * O(1)
  Total: O(n) * O(1) = O(n)
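The loop above can be sketched in Python with a two-bucket sort per bit. This is a sketch under stated assumptions: the values are non-negative, `num_bits` defaults to 31 to match the loop from 0 to 30, and the function names and sample list are invented for the example.

```python
def bucket_sort_by_bit(values, bit):
    """Stable two-bucket sort on a single bit of each value."""
    buckets = [[], []]
    for v in values:
        buckets[(v >> bit) & 1].append(v)   # bucket 0 or bucket 1
    return buckets[0] + buckets[1]


def radix_sort(values, num_bits=31):
    """Sort non-negative integers that fit in num_bits bits."""
    # Least significant bit first; each pass touches every element once
    # going into the buckets and once coming out.
    for bit in range(num_bits):
        values = bucket_sort_by_bit(values, bit)
    return values


four_bit = radix_sort([9, 2, 13, 4, 11, 2, 0, 15], num_bits=4)
```

With 4-bit inputs the loop runs four times, so the work is 4 passes of O(n) each.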
For Next Lecture
Review requirements for program #2
  1st preliminary deadline is Monday
  Spend time working on this: design saves coding
Reading on Graph ADT for Wednesday
  Note: these have nothing to do with bar charts
  What are mathematical graphs?
  Why are they the basis of everything in CS?