Algorithms Sorting – Part 3
Sorting Insertion sort Bubble Sort Design approach: Sorts in place: Best case: Worst case: Bubble Sort Running time: incremental Yes (n) (n2) incremental Yes (n2)
Sorting Selection sort Merge Sort Design approach: Sorts in place: Running time: Merge Sort incremental Yes (n2) divide and conquer No Let’s see!!
Algorithm 1: Selection Sort procedure selectionSort(Array X of size n): for i = 1 to n { let minIndex = i; for j = i+1 to n { if X[j] < X[minIndex] minIndex = j } X[i] <--> X[minIndex] (swap in place) return X
SelectionSort Simulation 10 2 37 5 9 55 20 minElement: 2 minIndex: 2
SelectionSort Simulation 2 10 37 5 9 55 20 minElement: 5 minIndex: 4
SelectionSort Simulation 2 5 37 10 9 55 20 minElement: 9 minIndex: 5
SelectionSort Simulation 2 5 9 10 37 55 20 minElement: 10 minIndex: 4
SelectionSort Simulation 2 5 9 10 37 55 20 minElement: 20 minIndex: 7
SelectionSort Simulation 2 5 9 10 20 55 37 minElement: 37 minIndex: 7
SelectionSort Simulation 2 5 9 10 20 37 55 Final Output
Analysis of Selection Sort Q: How much time (# operations) does SelectionSort take? procedure selectionSort(Array X of size n): for i = 1 to n { let minIndex = i; for j = i+1 to n { if X[j] < X[minIndex] minIndex = j } X[i] <--> X[minIndex] return X 1 Op 1 Op 1 Op 1 Op 3 Ops 1 Op 1 Op
**SelectionSort takes (3n2 + 3n)/2 time on an input of size n.** Analysis of Selection Sort Inner Loop: 3(n-1) + 3(n-2) + … + 3 Outer Loop: n Initial Assignment: n Swap: n Total: **SelectionSort takes (3n2 + 3n)/2 time on an input of size n.**
Divide-and-Conquer Divide the problem into a number of sub-problems Similar sub-problems of smaller size Conquer the sub-problems Solve the sub-problems recursively Sub-problem size small enough solve the problems in straightforward manner Combine the solutions of the sub-problems Obtain the solution for the original problem
Merge Sort Approach To sort an array A[p . . r]: Divide Conquer Divide the n-element sequence to be sorted into two subsequences of n/2 elements each Conquer Sort the subsequences recursively using merge sort When the size of the sequences is 1 there is nothing more to do Combine Merge the two sorted subsequences
Merge Sort The merge sort algorithm is defined recursively: If the list is of size 1, it is sorted—we are done; Otherwise: Divide an unsorted list into two sub-lists, Sort each sub-list recursively using merge sort, and Merge the two sorted sub-lists into a single sorted list This is the first significant divide-and-conquer algorithm we will see Question: How quickly can we recombine the two sub-lists into a single sorted list?
Merge Sort Alg.: MERGE-SORT(A, p, r) if p < r Check for base case q r Alg.: MERGE-SORT(A, p, r) if p < r Check for base case then q ← (p + r)/2 Divide MERGE-SORT(A, p, q) Conquer MERGE-SORT(A, q + 1, r) Conquer MERGE(A, p, q, r) Combine Initial call: MERGE-SORT(A, 1, n) 1 2 3 4 5 6 7 8 5 2 4 7 1 3 2 6
Example – n Power of 2 Divide 7 5 5 7 5 4 7 q = 4 1 2 3 4 5 6 7 8 1 2
Example – n Power of 2 Conquer and Merge 7 5 5 7 5 4 7 1 2 3 4 5 6 7 8
Will simulate how the third step is done. MergeSort (Divide & Conquer) Assume n is power of 2. (Doesn’t really matter) procedure mergeSort(Array X of size n): 1. mergeSort(left subarray X[1,…,n/2]) 2. mergeSort(right subarray X[n/2+1,…,n]) 3. combine the left & right sorted halves Will simulate how the third step is done.
Example – n Not a Power of 2 6 2 5 3 7 4 1 8 9 10 11 q = 6 Divide 4 1 6 2 7 3 5 8 9 10 11 q = 9 q = 3 2 7 4 1 3 6 5 8 9 10 11 7 4 1 2 3 6 5 8 9 10 11 4 1 7 2 6 5 3 8
Example – n Not a Power of 2 7 6 5 4 3 2 1 8 9 10 11 Conquer and Merge 7 6 4 2 1 3 5 8 9 10 11 7 4 2 1 3 6 5 8 9 10 11 2 3 4 6 5 9 10 11 1 7 8 7 4 1 2 6 5 3 8
MergeSort Downward-Recursive Steps 105 7 13 8 14 1 19 11 4 10 98 16 31 5 21 12 105 7 13 8 14 1 19 11 4 10 98 16 31 5 21 12 105 7 13 8 14 1 19 11 4 10 98 16 31 5 21 12 105 7 13 8 14 1 19 11 4 10 98 16 31 5 21 12 105 7 13 8 14 1 19 11 4 10 98 16 31 5 21 12
MergeSort Upward-Recursive Steps 1 4 5 7 8 10 11 12 13 14 19 21 31 96 98 105 1 7 8 11 13 14 19 105 4 5 10 12 21 31 96 98 7 8 13 105 1 11 14 19 4 10 96 98 5 12 21 31 7 105 8 13 1 14 11 19 4 10 16 98 5 31 12 21 105 7 13 8 14 1 19 11 4 10 98 16 31 5 21 12
Merging Input: Array A and indices p, q, r such that p ≤ q < r 1 2 3 4 5 6 7 8 p r q Input: Array A and indices p, q, r such that p ≤ q < r Subarrays A[p . . q] and A[q + 1 . . r] are sorted Output: One single sorted subarray A[p . . r]
Merging Idea for merging: Two piles of sorted cards 1 2 3 4 5 6 7 8 p r q Idea for merging: Two piles of sorted cards Choose the smaller of the two top cards Remove it and place it in the output pile Repeat the process until one pile is empty Take the remaining input pile and place it face-down onto the output pile A1 A[p, q] A[p, r] A2 A[q+1, r]
Pseudocode for Merge Subroutine procedure merge(sorted lists L,R of size m/2): Out = empty array of size m i = 1; j = 1; for k = 1 to m: if L[i] < R[j]: Out[k] = L[i]; i++; else: Out[k] = R[j]; j++;
Example: MERGE(A, 9, 12, 16) p r q
Example: MERGE(A, 9, 12, 16)
Example (cont.)
Example (cont.)
Example (cont.) Done!
**MergeSort takes 7nlog2(n) time on an input of size .** Analysis of MergeSort msort(n) ≤ 7n msort(n/2) msort(n/2) ≤ 2(7n/2) = 7n ≤ 4(7n/4) = 7n msort(n/4) msort(n/4) msort(n/4) msort(n/4) …. …. …. …. …. ≤n(7) = 7n msort(1) msort(1) msort(1) msort(1) Total = 7n*#levels Total = 7n*log2(n) **MergeSort takes 7nlog2(n) time on an input of size .**
Merge - Pseudocode Alg.: MERGE(A, p, q, r) Compute n1 and n2 3 4 5 6 7 8 p r q n1 n2 Alg.: MERGE(A, p, q, r) Compute n1 and n2 Copy the first n1 elements into L[1 . . n1 + 1] and the next n2 elements into R[1 . . n2 + 1] L[n1 + 1] ← ; R[n2 + 1] ← i ← 1; j ← 1 for k ← p to r do if L[ i ] ≤ R[ j ] then A[k] ← L[ i ] i ←i + 1 else A[k] ← R[ j ] j ← j + 1 p q 7 5 4 2 6 3 1 r q + 1 L R
Running Time of Merge (assume last for loop) Initialization (copying into temporary arrays): (n1 + n2) = (n) Adding the elements to the final array: - n iterations, each taking constant time (n) Total time for Merge: (n)
Analyzing Divide-and Conquer Algorithms The recurrence is based on the three steps of the paradigm: T(n) – running time on a problem of size n Divide the problem into a subproblems, each of size n/b: takes D(n) Conquer (solve) the subproblems aT(n/b) Combine the solutions C(n) (1) if n ≤ c T(n) = aT(n/b) + D(n) + C(n) otherwise
MERGE-SORT Running Time Divide: compute q as the average of p and r: D(n) = (1) Conquer: recursively solve 2 subproblems, each of size n/2 2T (n/2) Combine: MERGE on an n-element subarray takes (n) time C(n) = (n) (1) if n =1 T(n) = 2T(n/2) + (n) if n > 1
Sorting Challenge 1 Problem: Sort a file of huge records with tiny keys Example application: Reorganize your MP-3 files Which method to use? merge sort, guaranteed to run in time NlgN selection sort bubble sort a custom algorithm for huge records/tiny keys insertion sort
Sorting Files with Huge Records and Small Keys Insertion sort or bubble sort? NO, too many exchanges Selection sort? YES, it takes linear time for exchanges Merge sort or custom method? Probably not: selection sort simpler, does less swaps
Sorting Challenge 2 Problem: Sort a huge randomly-ordered file of small records Application: Process transaction record for a phone company Which sorting method to use? Bubble sort Selection sort Mergesort guaranteed to run in time NlgN Insertion sort
Sorting Huge, Randomly - Ordered Files Selection sort? NO, always takes quadratic time Bubble sort? NO, quadratic time for randomly-ordered keys Insertion sort? Mergesort? YES, it is designed for this problem
Sorting Challenge 3 Problem: sort a file that is already almost in order Applications: Re-sort a huge database after a few changes Doublecheck that someone else sorted a file Which sorting method to use? Mergesort, guaranteed to run in time NlgN Selection sort Bubble sort A custom algorithm for almost in-order files Insertion sort
Sorting Files That are Almost in Order Selection sort? NO, always takes quadratic time Bubble sort? NO, bad for some definitions of “almost in order” Ex: B C D E F G H I J K L M N O P Q R S T U V W X Y Z A Insertion sort? YES, takes linear time for most definitions of “almost in order” Mergesort or custom method? Probably not: insertion sort simpler and faster