Copyright (C) Gal Kaminka Data Structures and Algorithms Sorting II: Divide and Conquer Sorting Gal A. Kaminka Computer Science Department
2 Last week: in-place sorting. Bubble Sort: O(n^2) comparisons, O(n) best-case comparisons, O(n^2) exchanges. Selection Sort: O(n^2) comparisons, O(n^2) best-case comparisons, O(n) exchanges (always). Insertion Sort: O(n^2) comparisons, O(n) best-case comparisons, fewer exchanges than bubble sort, best in practice for small lists (<30).
3 This week. Mergesort: O(n log n) always, O(n) storage. Quick sort: O(n log n) average, O(n^2) worst; good in practice (>30), O(log n) storage.
4 MergeSort: a divide-and-conquer technique. Each unsorted collection is split into 2, then again, then again, ... until we have collections of size 1. Now we merge sorted collections, then again, then again, until we merge the two halves.
5 MergeSort(array a, indexes low, high)
    if (low < high)
        middle ← (low + high)/2
        MergeSort(a, low, middle)        // split 1
        MergeSort(a, middle+1, high)     // split 2
        Merge(a, low, middle, high)      // merge 1+2
6 Merge(array a, indexes low, mid, high)
    b ← empty array, t ← mid+1, i ← low, tl ← low
    while (tl <= mid AND t <= high)
        if (a[tl] <= a[t])
            b[i] ← a[tl]
            i ← i+1, tl ← tl+1
        else
            b[i] ← a[t]
            i ← i+1, t ← t+1
    if tl <= mid copy a[tl…mid] into b[i…]
    else if t <= high copy a[t…high] into b[i…]
    copy b[low…high] onto a[low…high]
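The MergeSort and Merge pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not the course's reference code: the names `merge_sort` and `merge`, the default arguments, and the use of a Python list as the temporary buffer `b` are assumptions made for the example.

```python
def merge(a, low, mid, high):
    """Merge the sorted runs a[low..mid] and a[mid+1..high] (inclusive bounds)."""
    b = []                      # temporary buffer, grows to high - low + 1 items
    tl, t = low, mid + 1        # read positions in the left and right runs
    while tl <= mid and t <= high:
        if a[tl] <= a[t]:       # <= takes from the left run on ties
            b.append(a[tl])
            tl += 1
        else:
            b.append(a[t])
            t += 1
    b.extend(a[tl:mid + 1])     # leftover of the left run, if any
    b.extend(a[t:high + 1])     # leftover of the right run, if any
    a[low:high + 1] = b         # copy the merged run back onto a


def merge_sort(a, low=0, high=None):
    """Sort a[low..high]; called as merge_sort(a) it sorts the whole list."""
    if high is None:
        high = len(a) - 1
    if low < high:
        middle = (low + high) // 2
        merge_sort(a, low, middle)        # split 1
        merge_sort(a, middle + 1, high)   # split 2
        merge(a, low, middle, high)       # merge 1+2
```

Calling `merge_sort(data)` on a list sorts it in place from the caller's point of view, although each merge still allocates a temporary buffer.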
7 An example [figure not reproduced: an array shown at the Initial stage, after the Split, and after each of three Merge steps]
8 The complexity of MergeSort. Every split, we halve the collection. How many times can this be done? We are looking for x, where 2^x = n, so x = log_2 n. So there are a total of log n levels of splitting.
9 The complexity of MergeSort. What is the run-time of each merge step? First merge step: n/2 merges of 2 items → n steps. Second merge step: n/4 merges of 4 items → n steps. Third merge step: n/8 merges of 8 items → n steps. ... How many merge steps? The same as the number of split levels: log n. Total: n log n steps.
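The same count can be written as a recurrence. The derivation below is a sketch under two simplifying assumptions that are not stated on the slide: n is a power of two, and merging a total of n items costs exactly n steps.

```latex
\begin{align*}
T(1) &= 0, \qquad T(n) = 2\,T(n/2) + n \\
T(n) &= 2^{k}\,T\!\left(n/2^{k}\right) + k\,n
        && \text{after expanding $k$ times} \\
     &= n\,T(1) + n\log_2 n = n\log_2 n
        && \text{taking } k = \log_2 n
\end{align*}
```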
10 Storage complexity of MergeSort Every merge, we need to hold the merged array:
11 Storage complexity of MergeSort. So we need temporary storage for merging, which is the same size as the two collections together. To merge the last two sub-arrays (each of size n/2) we need n/2 + n/2 = n temporary storage. Total: O(n) storage.
12 MergeSort summary. O(n log n) runtime (best and worst). O(n) storage (not in-place). Very naturally done using recursion, but note that it can be done without recursion! In practice: it can be improved by combining it with insertion sort. Split down to arrays of size 20-30, sort each with insertion sort, then merge.
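The "split down to small arrays, then insertion-sort, then merge" idea could look like the sketch below. The cutoff value of 24 is an assumption picked from the 20-30 range mentioned above, and the names `CUTOFF`, `insertion_sort`, and `hybrid_merge_sort` are hypothetical.

```python
CUTOFF = 24  # assumed threshold, chosen from the 20-30 range


def insertion_sort(a, low, high):
    """Sort a[low..high] in place by insertion."""
    for i in range(low + 1, high + 1):
        item = a[i]
        j = i - 1
        while j >= low and a[j] > item:
            a[j + 1] = a[j]     # shift larger items one slot to the right
            j -= 1
        a[j + 1] = item


def hybrid_merge_sort(a, low=0, high=None):
    """Merge sort that hands small sub-arrays to insertion sort."""
    if high is None:
        high = len(a) - 1
    if high - low + 1 <= CUTOFF:
        insertion_sort(a, low, high)    # small enough: no further splitting
        return
    middle = (low + high) // 2
    hybrid_merge_sort(a, low, middle)
    hybrid_merge_sort(a, middle + 1, high)
    # Merge the two sorted halves through temporary copies.
    left = a[low:middle + 1]
    right = a[middle + 1:high + 1]
    i = j = 0
    k = low
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1
        k += 1
    a[k:high + 1] = left[i:] + right[j:]   # whichever half still has items
```

The best cutoff depends on the machine and the data, so in practice it would be tuned by measurement rather than fixed in advance.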
13 QuickSort. Key idea: select an item (called the pivot) and put it into its proper FINAL position. Make sure that all greater items are on one side (side 1) and all smaller items are on the other side (side 2). Repeat for side 1. Repeat for side 2.
14 Short example Let’s select 25 as our initial pivot. We move items such that: All left of 25 are smaller All right of 25 are larger As a result 25 is now in its final position
15 Now, repeat (recursively) for the left and right sides. The left side (12) needs no sorting. For the other side, we repeat the process: select a pivot item (let’s take 57) and move items around such that left items are smaller, etc.
16 The right side changes into its partitioned form [figures not reproduced]. And now we repeat the process for the left, and for the right.
17 QuickSort(array a; index low, hi)
    if (low >= hi)
        return                      // a[low..hi] is sorted
    pivot ← find_pivot(a, low, hi)
    p_index ← partition(a, low, hi, pivot)
    QuickSort(a, low, p_index-1)
    QuickSort(a, p_index+1, hi)
18 Key questions. How do we select an item (FindPivot())? If we always select the largest item as the pivot, then this process becomes Selection Sort, which is O(n^2). So this works well only if we select items “in the middle”, since then we will have about log n levels of division. How do we move items around efficiently (Partition())? If moving items is expensive, it offsets the benefit of partitioning.
19 FindPivot. Finding a real median (middle item) takes O(n). In practice, however, we want this to be O(1). So we approximate, in one of two ways: take the first item (a[low]) as the pivot, or take the median of {a[low], a[hi], a[(low+hi)/2]}. The simplest version: FindPivot(array a; index low, high): return a[low]
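The median-of-three approximation can be sketched as below. The function name `find_pivot_median3` is hypothetical, and returning the pivot's index together with its value is an assumption of this example, made so that a partition routine can locate the pivot later.

```python
def find_pivot_median3(a, low, high):
    """Return (pivot value, pivot index): the median of three sampled items."""
    mid = (low + high) // 2
    # Sort the three (value, index) candidates and take the middle one.
    candidates = sorted([(a[low], low), (a[mid], mid), (a[high], high)])
    return candidates[1]
```

This is still O(1) work per call, and on already-sorted input it picks the middle item instead of the smallest one.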
20 Partition (in O(n)). Key idea: keep two indexes into the array. down starts at low and moves right until it points at an item > pivot. up starts at hi and moves left until it points at an item <= pivot. Whenever both have stopped and down is still left of up, the two items are on the wrong sides, so we interchange them. At the end, up marks the final location of the pivot.
21 partition(array a; index low, hi; pivot; index pivot_i)
    1. down ← low, up ← hi
    2. while (down < up)
    3.     while (a[down] <= pivot && down < hi)
    4.         down ← down + 1
    5.     while (a[up] > pivot)
    6.         up ← up - 1
    7.     if (down < up)
    8.         swap(a[down], a[up])
    9. a[pivot_i] ← a[up]
    10. a[up] ← pivot
    11. return up
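A runnable sketch of partition() and the QuickSort driver follows. Two things in it are assumptions of this example rather than part of the slides: the pivot is first swapped to a[low], so that the final two assignments stay valid for any starting `pivot_i`, and the driver uses the first item as the pivot. The names `partition` and `quick_sort` mirror the pseudocode but are otherwise illustrative.

```python
def partition(a, low, hi, pivot, pivot_i):
    """Partition a[low..hi] around pivot and return the pivot's final index."""
    # Assumption of this sketch: park the pivot at a[low] before scanning.
    a[low], a[pivot_i] = a[pivot_i], a[low]
    down, up = low, hi
    while down < up:
        while a[down] <= pivot and down < hi:
            down += 1           # skip items that belong on the left
        while a[up] > pivot:
            up -= 1             # skip items that belong on the right
        if down < up:
            a[down], a[up] = a[up], a[down]   # both are on the wrong side
    a[low] = a[up]              # a[up] <= pivot, so it can take the left end
    a[up] = pivot               # the pivot lands in its final position
    return up


def quick_sort(a, low=0, hi=None):
    """Sort a[low..hi] in place; quick_sort(a) sorts the whole list."""
    if hi is None:
        hi = len(a) - 1
    if low >= hi:
        return                  # zero or one item: already sorted
    pivot, pivot_i = a[low], low        # simplest FindPivot: the first item
    p_index = partition(a, low, hi, pivot, pivot_i)
    quick_sort(a, low, p_index - 1)
    quick_sort(a, p_index + 1, hi)
```

The scan for `up` needs no bounds check because the pivot itself sits at a[low] and stops it there.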
22 Example: partition() with pivot=25 First pass through loop on line 2: down up
23 Example: partition() with pivot=25 First pass through loop on line 2: down up We go into loop in line 3 (while a[down]<=pivot)
24 Example: partition() with pivot=25 First pass through loop on line 2: down up We go into loop in line 5 (while a[up]>pivot)
25 Example: partition() with pivot=25 First pass through loop on line 2: down up We go into loop in line 5 (while a[up]>pivot)
26 Example: partition() with pivot=25 First pass through loop on line 2: down up Now we found an inconsistency!
27 Example: partition() with pivot=25 First pass through loop on line 2: down up So we swap a[down] with a[up]
28 Example: partition() with pivot=25 Second pass through loop on line 2: down up
29 Example: partition() with pivot=25 Second pass through loop on line 2: down up Move down again (increasing) – loop on line 3
30 Example: partition() with pivot=25 Second pass through loop on line 2: down up Now we begin to move up again – loop on line 5
31 Example: partition() with pivot=25 Second pass through loop on line 2: down up Again – loop on line 5
32 Example: partition() with pivot=25 Second pass through loop on line 2: down up down < up? No. So we don’t swap.
33 Example: partition() with pivot=25 Second pass through loop on line 2: down up Instead, we are done. Just put pivot in place.
34 Example: partition() with pivot=25 Second pass through loop on line 2: down up Instead, we are done. Just put pivot in place. (swap it with a[up] – for us a[low] was the pivot)
35 Example: partition() with pivot=25 Second pass through loop on line 2: down up Now we return 2 as the new pivot index
36 Notes. We need the initial pivot_index in partition(). For instance, change FindPivot() to return the pivot (a[low]) as well as the initial pivot_index (low), then use pivot_index in the final swap. QuickSort: average O(n log n), worst case O(n^2). Works very well in practice (collections >30). Space requirements: O(log n) on average, for the recursion.