Sorting and Runtime Complexity CS255
Sorting
  Different ways to sort:
    – Bubble
    – Exchange
    – Insertion
    – Merge
    – Quick
    – more…
Bubble Sort
  Compare all pairs of adjacent elements from front to back
    – Swap a pair if out of order
  This completes one bubble pass
    – Bubbles the biggest element to the end
  At most N bubble passes are needed (N - 1 are enough, since the last remaining element is already in place)
Bubble Sort – bubble pass
Exchange Sort
  Compare the 1st element to each other element
    – Swap a pair if out of order
  This completes one pass
    – Places the smallest element in the 1st position
  Repeat for the 2nd, 3rd, 4th, etc. elements
Exchange Sort – single pass
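The exchange sort described above can be sketched in Java as follows. This is a minimal illustration rather than the course's own code: the class name `ExchangeSortSketch`, the method name `exchangeSort`, and the use of `List<Integer>` instead of a raw `Vector` are assumptions made for the example. The method also returns the number of comparisons, since that is the quantity counted later in these notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExchangeSortSketch {
    // Sorts the list in place and returns the number of comparisons made.
    static int exchangeSort(List<Integer> v) {
        int comparisons = 0;
        for (int i = 0; i < v.size() - 1; i++) {
            // One pass: compare position i against every later position,
            // so the smallest remaining element ends up at position i.
            for (int j = i + 1; j < v.size(); j++) {
                comparisons++;
                if (v.get(i).compareTo(v.get(j)) > 0) {
                    Integer tmp = v.get(i);
                    v.set(i, v.get(j));
                    v.set(j, tmp);
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<Integer> v = new ArrayList<>(Arrays.asList(5, 2, 4, 1, 3));
        int comparisons = exchangeSort(v);
        System.out.println(v + " after " + comparisons + " comparisons");
    }
}
```

For 5 elements the passes make 4 + 3 + 2 + 1 = 10 comparisons regardless of the values in the list.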
Insertion Sort
  Build a new list in sorted order
    – The previous sorts sorted “in place”
  Start with an empty new list
  Pull each element from the original list, one at a time, and insert it into the new list in the correct position
    – Compare the element to each element in the new list from front to back until we reach the correct position
Insertion Sort
Comparison of running times
  How fast is each of the 3 sorts?
  We would like our analysis to be independent of the type of machine
    – So we can’t just use milliseconds
    – We need something else to count
  For sorting we will count the number of comparisons
Running times
  So how many comparisons does each sort make?
    – Does it depend on the number of elements in the Vector?
    – Does it depend on the specific values in the Vector?
Bubble’s running time
    for (int i = 0; i < v.size(); i++) {
        // One bubble pass over all adjacent pairs
        for (int j = 0; j < v.size() - 1; j++) {
            Integer a = (Integer) v.get(j);
            Integer b = (Integer) v.get(j + 1);
            if (a.compareTo(b) > 0) {
                // Out of order: swap the pair
                v.set(j, b);
                v.set(j + 1, a);
            }
        }
    }
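Is this any different?
    for (int i = 0; i < v.size(); i++) {
        // The last i elements are already in place, so skip them
        for (int j = 0; j < v.size() - i - 1; j++) {
            Integer a = (Integer) v.get(j);
            Integer b = (Integer) v.get(j + 1);
            if (a.compareTo(b) > 0) {
                // Out of order: swap the pair
                v.set(j, b);
                v.set(j + 1, a);
            }
        }
    }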
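To make the difference between the two bubble sort loops concrete, the sketch below counts comparisons for both. The class name `BubbleCountSketch`, the `shrink` flag, and the use of `List<Integer>` are assumptions introduced for this example only. With N elements the first loop makes N(N - 1) comparisons and the second makes N(N - 1)/2, so both grow on the order of N squared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BubbleCountSketch {
    // Bubble sort that returns the number of comparisons made.
    // If shrink is true, the inner loop skips the already-sorted tail.
    static int bubbleSort(List<Integer> v, boolean shrink) {
        int comparisons = 0;
        for (int i = 0; i < v.size(); i++) {
            int limit = shrink ? v.size() - i - 1 : v.size() - 1;
            for (int j = 0; j < limit; j++) {
                comparisons++;
                Integer a = v.get(j);
                Integer b = v.get(j + 1);
                if (a.compareTo(b) > 0) {
                    v.set(j, b);
                    v.set(j + 1, a);
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 1, 4, 2, 8, 3);
        int full = bubbleSort(new ArrayList<>(input), false);
        int shrinking = bubbleSort(new ArrayList<>(input), true);
        // With N = 6: N*(N-1) = 30 versus N*(N-1)/2 = 15
        System.out.println(full + " vs " + shrinking);
    }
}
```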
Insertion Sort running time
    Vector sortedV = new Vector();
    for (int i = 0; i < v.size(); i++) {
        Integer a = (Integer) v.get(i);
        // Scan the new list from the front until we reach a's position
        int j = 0;
        while ((j < sortedV.size()) && (((Integer) sortedV.get(j)).compareTo(a) < 0)) {
            j++;
        }
        sortedV.add(j, a);
    }
Merge Sort
  Break the list into 2 equal pieces (or as close to equal as possible)
  Sort each piece
  Merge the results together
Breaking Apart
    // First half goes into left
    Vector left = new Vector();
    for (int i = 0; i < v.size() / 2; i++) {
        left.add(v.get(i));
    }
    // Second half goes into right
    Vector right = new Vector();
    for (int i = v.size() / 2; i < v.size(); i++) {
        right.add(v.get(i));
    }
Breaking Apart – a better way
    Vector left = new Vector();
    left.addAll(v.subList(0, v.size() / 2));
    Vector right = new Vector();
    right.addAll(v.subList(v.size() / 2, v.size()));
Merging
  Only need to compare the first items from each list
  Put the smaller of the two in the new merged list
  Repeat until one of the lists is empty
    – Transfer all the remaining items from the other list into the merged list
  Write the code in class
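One possible version of the merge step is sketched below. It is not the code written in class; the class name `MergeSketch`, the method name `merge`, and the choice of `List<Integer>` over a raw `Vector` are assumptions for illustration. Instead of removing the first item from each list, it keeps an index into each list, which has the same effect without modifying the inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeSketch {
    // Merges two already-sorted lists into a new sorted list.
    static List<Integer> merge(List<Integer> left, List<Integer> right) {
        List<Integer> merged = new ArrayList<>(left.size() + right.size());
        int i = 0; // index of the current "first item" in left
        int j = 0; // index of the current "first item" in right
        while (i < left.size() && j < right.size()) {
            // Compare the two front items and take the smaller one.
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                merged.add(left.get(i++));
            } else {
                merged.add(right.get(j++));
            }
        }
        // One list is exhausted: transfer whatever remains of the other.
        merged.addAll(left.subList(i, left.size()));
        merged.addAll(right.subList(j, right.size()));
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(merge(Arrays.asList(1, 4, 7), Arrays.asList(2, 3, 9)));
    }
}
```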
Sorting
    Vector sortedLeft = mergeSort(left);
    Vector sortedRight = mergeSort(right);
  This is an example of “recursion” – a function defined in terms of itself
Merge sort recursion
    public Vector mergeSort(Vector v) {
        Vector sortedVector = null;
        if (v.size() <= 1) {
            // Base case: an empty or one-element list is already sorted
            sortedVector = new Vector(v);
        } else {
            // Recursive case
            // 1. Break apart into left and right
            // 2. Recurse
            Vector sortedLeft = mergeSort(left);
            Vector sortedRight = mergeSort(right);
            // 3. Merge sortedLeft and sortedRight into sortedVector
        }
        return sortedVector;
    }
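Quicksort
  Break the list into 2 pieces based on a pivot
    – The pivot is usually the first item in the list
    – All items smaller than the pivot go in the left piece and all items larger go in the right piece (items equal to the pivot can go in either piece, as long as the choice is consistent)
  Sort each piece (recursion again)
  Combine the results together: sorted left piece, then the pivot, then sorted right piece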
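A minimal quicksort sketch following the steps above is given here. The names `QuickSortSketch` and `quickSort` are assumptions for this example, as is the decision to send items equal to the pivot to the right piece. It builds new lists rather than sorting in place, which keeps it close to the slide's description at the cost of extra memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QuickSortSketch {
    // Returns a new sorted list; the input list is not modified.
    static List<Integer> quickSort(List<Integer> v) {
        if (v.size() <= 1) {
            return new ArrayList<>(v); // nothing to split
        }
        Integer pivot = v.get(0); // pivot is the first item
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        for (int i = 1; i < v.size(); i++) {
            Integer item = v.get(i);
            if (item.compareTo(pivot) < 0) {
                left.add(item);
            } else {
                right.add(item); // equal items go right (an assumption)
            }
        }
        // Sort each piece, then combine: left, pivot, right.
        List<Integer> sorted = quickSort(left);
        sorted.add(pivot);
        sorted.addAll(quickSort(right));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(quickSort(Arrays.asList(4, 2, 6, 1, 3, 4, 7)));
    }
}
```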
Running times
  In trying to find the running times of mergesort and quicksort we need to determine where the “work” takes place
    – “Work” is the number of comparisons
  Where does the work occur for merge?
  Where does the work occur for quick?
MergeSort Running Time The work occurs when the merge occurs How much work is done when merging 2 vectors of length L?
MergeSort Running Time
  So we have N amount of work at each level
  If we know how many levels a full merge sort has, we can compute the running time as:
    – O(N * number of levels)
  So how many levels are in a full merge sort?
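One way to work out the number of levels is sketched below, under the simplifying assumption that N is a power of two so every split is exact. Each level halves the piece size, and the recursion stops when the pieces have size 1.

```latex
\begin{align*}
\text{piece size after } k \text{ splits} &= \frac{N}{2^{k}} \\
\frac{N}{2^{k}} = 1 &\iff k = \log_2 N \\
\text{total work} &\le N \cdot \log_2 N \quad\Rightarrow\quad O(N \log_2 N) \\
\text{for } N = 8:\quad 8 \cdot \log_2 8 &= 8 \cdot 3 = 24 \text{ comparisons at most}
\end{align*}
```

When N is not a power of two the pieces are only roughly equal, and the number of levels rounds up to the next whole number.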
MergeSort Running Time Does this big-O running time change depending on the input or is it always the same (best, worst, and average times are all the same)?
QuickSort Running Time
  Let's try to take the same approach as we did with MergeSort…
  The work is done on the split
  How much work is done to split a vector of length L?
QuickSort Running Time
  So we have a problem just counting the work per level and the number of levels
  First, the amount of work per level varies
    – Even in this single example
  Second, the number of levels can vary
  So, does the amount of work change based on the input?
QuickSort Running Time
  What is the best case?
    – What is the best case’s running time?
  What is the worst case?
    – What is the worst case’s running time?
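To explore these questions experimentally, the sketch below counts the comparisons a first-item-pivot quicksort makes on two inputs of the same length. The class name `QuickSortCountSketch` and the method name `countComparisons` are assumptions for this example, and items equal to the pivot are sent to the right piece. An already-sorted input makes every split as lopsided as possible, while the second input is arranged so that each pivot splits its piece evenly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QuickSortCountSketch {
    // Counts the comparisons made by a quicksort that uses the first item as pivot.
    static int countComparisons(List<Integer> v) {
        if (v.size() <= 1) {
            return 0; // nothing to split, no work
        }
        Integer pivot = v.get(0);
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        int comparisons = 0;
        for (int i = 1; i < v.size(); i++) {
            comparisons++; // one comparison against the pivot per item
            if (v.get(i).compareTo(pivot) < 0) {
                left.add(v.get(i));
            } else {
                right.add(v.get(i));
            }
        }
        return comparisons + countComparisons(left) + countComparisons(right);
    }

    public static void main(String[] args) {
        // Already sorted: each split leaves one empty piece, 6+5+4+3+2+1 comparisons.
        System.out.println(countComparisons(Arrays.asList(1, 2, 3, 4, 5, 6, 7)));
        // Every pivot is the median of its piece: 6 + 2 + 2 comparisons.
        System.out.println(countComparisons(Arrays.asList(4, 2, 1, 3, 6, 5, 7)));
    }
}
```

The sorted input costs 21 comparisons (N(N - 1)/2 for N = 7) and the evenly splitting input costs 10, which suggests how far apart the two cases can be.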
Computational Complexity
  So, we have seen 5 different sorts
    – What is the best running time of any sort possible (not just the 5 we have seen)?
  To answer this question we need to prove that any sort would have to do at least some minimal amount of work (a lower bound)
    – To do so we will use decision trees (ch 7.8)
// Sorts an array of 3 items (pseudocode; s is indexed from 1)
void sortthree(int s[]) {
    a = s[1]; b = s[2]; c = s[3];
    if (a < b) {
        if (b < c) { S = a,b,c; }
        else if (a < c) { S = a,c,b; }
        else { S = c,a,b; }
    } else if (b < c) {
        if (a < c) { S = b,a,c; }
        else { S = b,c,a; }
    } else {
        S = c,b,a;
    }
}
[Figure: decision tree for sortthree. Root: a < b. Other comparison nodes: b < c, a < c. Leaves: a,b,c; a,c,b; c,a,b; b,a,c; b,c,a; c,b,a]
Decision Trees
  A decision tree can be created for every comparison-based sorting algorithm
    – The following is a decision tree for a 3-element Exchange sort
  Note that “c < b” means that the Exchange sort compares the array item whose current value is c with the one whose current value is b, not that it compares s[3] to s[2]
[Figure: decision tree for a 3-element Exchange sort. Comparison nodes: b < a, c < a, c < b, a < b. Leaves: c,b,a; b,c,a; b,a,c; c,a,b; a,c,b; a,b,c]
Decision Trees
  So what does this tell us…
    – Note that there are 6 leaves in each of the examples given (each has N = 3)
  In general there will be N! leaves in a decision tree, corresponding to the N! permutations of the array
    – The number of comparisons (“work”) for a given input is equal to the depth of its leaf (the path length from root to leaf)
  Worst-case behavior is the path from the root to the deepest leaf
Decision Trees
  Thus, to get a lower bound on the worst-case behavior we need to find the shortest tree possible that can still hold N! leaves
    – No comparison-based sort could do better
  A binary tree of depth d can hold at most 2^d leaves
    – So, what is the minimal d where 2^d >= N!?
  Solving for d we get d >= log_2(N!)
    – The minimal depth must be at least log_2(N!)
Decision Trees
  According to Lemma 7.4 (p. 291): log_2(N!) >= N log_2(N) - 1.45N
  Putting that together with the previous result, d must be at least N log_2(N) - 1.45N
  In Big-O terms, d must grow at least as fast as N log_2(N) (a lower bound, often written Ω(N log N))
  No comparison-based sorting algorithm can have a worst-case running time better than O(N log_2(N))
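The bound can be checked numerically for small N. The sketch below is only an illustration: the class name `LowerBoundSketch` and both method names are assumptions. `minDepth` finds the smallest d with 2^d >= N! using exact integer arithmetic, and `lemmaBound` evaluates the N log_2(N) - 1.45N expression from the lemma.

```java
import java.math.BigInteger;

class LowerBoundSketch {
    // Smallest d such that 2^d >= N!, computed exactly.
    static int minDepth(int n) {
        BigInteger factorial = BigInteger.ONE;
        for (int k = 2; k <= n; k++) {
            factorial = factorial.multiply(BigInteger.valueOf(k));
        }
        // For f >= 1, (f - 1).bitLength() equals ceil(log2(f)).
        return factorial.subtract(BigInteger.ONE).bitLength();
    }

    // The lower bound N*log2(N) - 1.45*N quoted from the lemma.
    static double lemmaBound(int n) {
        return n * (Math.log(n) / Math.log(2)) - 1.45 * n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 10}) {
            System.out.println("N=" + n + " minDepth=" + minDepth(n)
                    + " lemmaBound=" + lemmaBound(n));
        }
    }
}
```

For N = 3 the minimal depth is 3, which matches the three-level decision tree for `sortthree`: no comparison-based sort of 3 items can guarantee fewer than 3 comparisons in the worst case.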
Radix Sort
  Note that the previous theory applies only to comparison-based sorts
    – That is, sorts based on comparison of keys
    – We know nothing about the keys except that they are ordered
  However, if we have additional information about the keys we can do even better
Radix Sort Running Time
  Because there are no comparisons we need to find something else to count as “work”
    – Moving an element into its correct sub-pile is a good thing to count (think of it as a 10-way comparison)
  At the top level we need to move N items into their sub-piles
  At the next level we need to move N items into their sub-piles
  Etc.
Radix Sort Running Time
  And there are as many levels as digits
  So the complete run time is O(N * D)
    – N is the number of elements
    – D is the number of digits of each element
  Note that D is usually small relative to N, so this usually reduces to O(N)
  But if D is large (about the size of N), this can turn into O(N^2) or higher
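A minimal sketch of a radix sort that works from the most significant digit down, using sub-piles at each level, is shown below. It assumes non-negative integers with at most `digits` decimal digits; the class name `RadixSortSketch` and the method names are hypothetical. Each level moves every item into one of 10 piles without comparing items to each other, and there is one level per digit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RadixSortSketch {
    // Sorts non-negative integers that have at most `digits` decimal digits.
    static List<Integer> radixSort(List<Integer> v, int digits) {
        int divisor = 1;
        for (int k = 1; k < digits; k++) {
            divisor *= 10; // selects the most significant digit
        }
        return sortByDigit(v, divisor);
    }

    // Splits v into 10 piles on one digit, then sorts each pile on the next digit.
    private static List<Integer> sortByDigit(List<Integer> v, int divisor) {
        if (v.size() <= 1 || divisor == 0) {
            return new ArrayList<>(v); // no digits left, or nothing to sort
        }
        List<List<Integer>> piles = new ArrayList<>();
        for (int d = 0; d < 10; d++) {
            piles.add(new ArrayList<>());
        }
        for (Integer item : v) {
            piles.get((item / divisor) % 10).add(item); // one "move" per item
        }
        List<Integer> sorted = new ArrayList<>();
        for (List<Integer> pile : piles) {
            sorted.addAll(sortByDigit(pile, divisor / 10));
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(radixSort(Arrays.asList(170, 45, 75, 90, 802, 24, 2, 66), 3));
    }
}
```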
Radix Sort Running Time
  A radix sort can also be used for sorting Strings: each character position uses 26 buckets (one per letter) rather than 10