Copyright © Curt Hill

Sorting: Ordering an array
Considered Topics
Simple sort schemes
–Algorithms with some code
More complicated sort schemes
Performance considerations
–Time to sort in terms of the number of records N
–How the number of compares and moves relates to the size of N
Selection Sort
Basic idea:
Scan the entire array
Find the smallest element
Move it to the top
Remove the top from further consideration
Repeat until the entire array is sorted
How it works
[Diagram: the top element and the least element are exchanged; the sorted part of the array grows from the top while the unsorted part shrinks.]
Code

void sort(int ar[], int size) {
  int temp;
  for (int i = 0; i < size-1; i++) {
    temp = i;
    for (int j = i+1; j < size; j++)
      if (ar[temp] > ar[j])
        temp = j;
    if (temp != i) {
      int val = ar[i];
      ar[i] = ar[temp];
      ar[temp] = val;
    } // swap
  } // outer for
}
Best and Worst Cases
This is an unusual algorithm in that the best and worst cases are almost the same
Best case
–Already sorted
–No moves are needed
–All the compares are still done
Worst case
–Inversely sorted
–Same number of compares
–N-1 moves
How it performs
The first element is compared with all the other elements
–N-1 compares
The second element is compared with the remaining elements
–N-2 compares
Compares: (N-1)+(N-2)+…+1 = N(N-1)/2, roughly N²/2
Moves: at most N-1
Comparing running times
Mostly we are not concerned with many of the little issues of this analysis
–It is N-1 instead of N
–There is a factor of N²/2 instead of N²
When we have two different factors we always take the most expensive
–N² compares instead of N moves
Thus selection sort is O(N²)
Common Os
Constant time O(c) or O(1)
–Hashing is constant time
Logarithmic time O(log₂ N)
–Binary and tree searches
Linear time O(N)
–File scans, bad searches
N log N, O(N log₂ N) – no other name
–Good sorts
N squared O(N²)
–Bad sorts
Polynomial O(N^x)
–Expensive but doable
Exponential O(e^N)
–Intractable
Bubble Sort
Basic idea:
Start at the top
Compare adjacent elements
Exchange them if out of order
Repeat until a pass has no exchanges
First Pass
Small items bubble up slowly
–One element per pass
Large items sink quickly
–They keep descending until they find a larger item or hit bottom
Code

void sort(int ar[], int size) {
  bool swapped;
  do {
    swapped = false;
    for (int j = 0; j < size-1; j++)
      if (ar[j] > ar[j+1]) {
        int temp = ar[j];
        ar[j] = ar[j+1];
        ar[j+1] = temp;
        swapped = true;
      } // if
  } while (swapped);
}
How it performs
Bubble sort makes many moves, but always over a short distance
It also does many redundant compares
O(N²)
Big O notation makes this comparable with selection
–Usually much worse in practice
–You have to be creative to make a worse sort
Best and Worst Cases
Best case
–Already sorted
–One pass through does no exchanges and quits
Worst case
–Inversely sorted
–The smallest element only moves up one position per pass
–N-1 passes
–The case where all elements are sorted except the first element is in the last slot is almost as bad
Bubble Again
Consider two symmetric cases, each sorted with one exception: the largest or smallest element is as far away as possible
–One takes two passes, the other N-1
The problem is the direction of the scan
–Items going in that direction move fast
–Items going the other direction move slowly
This suggests a fix
Shaker Sort
Basic idea is the same as bubble sort
Scan top to bottom in odd passes
Scan bottom to top in even passes
First and Second Passes
First pass goes top to bottom
Second pass goes bottom to top
How it performs
Insignificantly different from bubble sort
The worst cases occur very infrequently
The extra work to handle them complicates every run
O(N²)
The Previous Problems
The problem with both of these is the short distance things are moved
They usually move in the right direction, but seldom far enough
One fix is to compare non-adjacent elements
How?
Shell Sort
Start with a gap g, where 1 ≤ g ≤ N
Do a sort pass comparing elements separated by the gap, exchanging if needed
Decrease the gap in each pass
–Do not simply divide the size by 2
When the gap is one it is a bubble sort, but most of the long-distance moving has already been done
First Pass
[Example pass at the initial gap:]
–First comparison: 8 and 1 exchanged
–Third comparison: 14 and 2 exchanged
–Fourth comparison: 6 and 14 exchanged
How it performs
The analysis is extremely difficult
Empirically it is about O(N^1.25)
This makes it better than bubble or selection for all but insignificant table sizes
The break-even point between O(N^1.25) and O(N log₂ N) is around size = 65000; however, the constant factor on Shell is large, so the practical break-even point is much smaller
Still inferior to the N log N sorts for large tables
Insertion Sort
Partition the array into two pieces
–The first element and all the rest
The first part of the array is always sorted
Remove the first unsorted item
Insert it into the correct location in the sorted part
How it works
[Diagram: an item is removed from the unsorted part of the array and inserted into its place in the sorted part.]
How it performs
Best case is a sorted array
Worst case is an inversely sorted array
Yet another N² sort
N-1 removals and insertions
Merge Sort
Merge increasingly larger sorted runs into a single much larger run
Start with runs of length 1
Merge two runs into a temporary area
Copy the result back
Pass 1
[Diagram: at the start, each element is a run of 1; at the end of pass 1, the array consists of runs of 2.]
Pass 2
[Diagram: the runs of 2 are merged pairwise into runs of 4.]
Important points on Merge
An item in a run is never compared with any other element in the same run
It generalizes to files nicely
Requires extra copy space equal to the size of the longest run
–In the last pass that is the entire array
First of the O(N log₂ N) sorts
The insertion process generates many moves
Quick Sort
A complicated but very fast sort
–Usually the best of the in-memory sorts
Never compares two items twice
Always moves things in the right direction
Usually moves them a relatively long distance
Algorithm I
The first item is called the pivot
–It will end up as the middle element
From the top look for an item that is larger than the pivot
From the bottom look for an item that is smaller
The two items are respectively in the wrong “half” of the table
–Recall the pivot will be the middle item
Exchange the two
Algorithm II
When the searches collide, move the pivot there
Now there are three partitions:
–Lower – sort it by itself
–Pivot – nothing more needs to be done
–Higher – sort it by itself
Quick Sort
[Example: start with pivot 8 and begin looking from both ends; after the 1st exchange, the 2nd exchange, and the pivot exchange, the searches have collided, the pass is done, and three partitions remain.]
Performance
A pair of distinct keys is never compared twice
The trick is partitioning the array into two separate pieces that never interact again
(½N)² + (½N)² < N²
–20² = 400
–10² + 10² = 200
O(N log₂ N)
More on Performance
A happy accident is that the pivot may be placed in a CPU register
–It is the only value compared with the entire array
–This makes it free and quick to access
Notice the recursive nature of this algorithm
–The array is partitioned into two pieces
–These are different sizes, and the sort is recursively invoked on each
Best and Worst Cases
It does better on an unsorted file than a sorted one
–Counter-intuitive
The worst case is a sorted or inversely sorted file
–The chosen pivot divides the table into two, not three, partitions
–N² in this case
Improvements 1
The worst case makes one think about choosing a different pivot
Any searching for a pivot will slow the average case
The case of an already sorted array is extremely unlikely
–For 10 elements
–2 chances in 10! (3,628,800) for it to be already sorted either way
Improvements 2
The partitioning scheme is complicated enough that it does worse than simple sorts on very small arrays: 6-12 entries
–Recursion to sort a table of length 3 is wasteful in memory and CPU cycles
The only real improvement is to use a simpler sort when the partition size gets small
–If the partition is small just use a simple N² sort
Two More Thoughts
Virtual memory can disrupt sorting when pieces of the array are paged out
–True for any sort
–If possible, fix the pages in memory
Quick sort could use threads
–Spawn a thread for one of the partitions if it is of sufficient size
–It would need to be large to make the thread overhead worthwhile
Heap Sort
Builds a binary tree in the array
The positions of the left and right sub-trees are implicit rather than needing pointers
Also an O(N log₂ N) sort
Rather complicated
Will not be shown
Heap Sort Performance
Slowest of the O(N log₂ N) sorts
Advantages:
–Does not need the recursion of quicksort
–Does not need the extra space of mergesort
–Worst case is still O(N log₂ N), unlike the other two
Summary
Several sorts with varying performance:
–N²: Selection, Bubble, Shaker, Insertion
–N^1.25: Shell
–N log₂ N: Merge, Quick, Heap