Insertion Sort Merge Sort QuickSort
Sorting Problem Definition: a sorting algorithm puts the elements of a list in a certain order, typically numerical order or lexicographical order. Input: a sequence of n elements a1, a2, ..., an. Output: a permutation a'1, a'2, ..., a'n of the input such that a'1 <= a'2 <= ... <= a'n.
Sorting Problem Two general types of sorting: Internal sorting, in which all elements are sorted in main memory; External sorting, in which elements must be sorted on disk or tape. The focus here is on internal sorting techniques.
Sorting Algorithms General sorting algorithms: Bubble sort, Insertion sort, Merge sort, Quicksort, Selection sort.
Sorting Algorithms: Properties
Computational complexity: the number of element comparisons as a function of the size of the list (N).
Recursion: recursive or non-recursive algorithms.
Stability: stable sorting algorithms maintain the relative order of records with equal keys.
Insertion Sort Insertion Sort works by taking elements from the list one by one and inserting each into its correct position in a growing sorted list. Insertion is expensive: it requires shifting all following elements over by one. Efficient for small lists and mostly-sorted lists. Commonly used by humans when sorting playing cards.
Insertion Sort Algorithm We have an n-element sequence; when a pass starts, the first p elements are in the correct order. Steps: 1) Initially p = 1. 2) Assume the first p elements are sorted. 3) Insert the (p+1)th element into its proper place so that the first p+1 elements are sorted. 4) Increment p and go to step (2).
Example (array 8 2 4 9 3 6):
Original:   8 2 4 9 3 6
After p=1:  2 8 4 9 3 6  (1 position moved)
After p=2:  2 4 8 9 3 6  (1 position moved)
After p=3:  2 4 8 9 3 6  (0 positions moved)
After p=4:  2 3 4 8 9 6  (3 positions moved)
After p=5:  2 3 4 6 8 9  (2 positions moved)
Implementation
int j;
for( int p = 1; p < a.size(); p++ )
{
    Comparable tmp = a[ p ];
    for( j = p; j > 0 && tmp < a[ j - 1 ]; j-- )
        a[ j ] = a[ j - 1 ];
    a[ j ] = tmp;
}
(Note: j must be declared outside the inner loop so it is still in scope for the final assignment.) Invariant: a[0..p-1] is the sorted partial result; a[p..] is the unsorted data.
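A complete, compilable version of the loop above might look like this (a sketch using int in place of the textbook's Comparable template; the function name insertionSort is our own):

```cpp
#include <vector>

// In-place insertion sort: grow a sorted prefix one element at a time.
void insertionSort(std::vector<int>& a) {
    for (std::size_t p = 1; p < a.size(); p++) {
        int tmp = a[p];        // the element to insert
        std::size_t j = p;     // declared outside the inner loop on purpose
        for (; j > 0 && tmp < a[j - 1]; j--)
            a[j] = a[j - 1];   // shift larger elements one slot right
        a[j] = tmp;            // drop tmp into its correct position
    }
}
```

Running it on the example array 8 2 4 9 3 6 reproduces the trace shown above.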
Running Time Analysis (Best Case) The input is presorted. When inserting a[p] into the sorted a[0..p-1], we only need to compare a[p] with a[p-1], and there is no data movement: the test in the inner for loop always fails immediately. This gives linear running time, O(N). If the input is almost sorted, insertion sort runs quickly.
Running Time Analysis (Worst Case) Because of the nested loops, each of which can take at most N iterations, insertion sort runs in O(N²) time. Consider an input in reverse sorted order: when a[p] is inserted into the sorted a[0..p-1], we must compare a[p] with all p elements of a[0..p-1] and move each of them one position to the right. The inner loop executes p times, for each p = 1..(N-1), so the total number of steps is at least N(N-1)/2 = Θ(N²). The bound is tight, Θ(N²), which is the worst-case time complexity. The space requirement is O(N) for the array itself; only O(1) auxiliary space is needed.
Merge Sort and Divide-and-Conquer Merge sort follows a classic divide-and-conquer strategy. Divide the problem into a number of sub-problems (similar to the original problem but smaller). Conquer the sub-problems by solving them recursively (if a sub-problem is small enough, just solve it in a straightforward manner). Combine the solutions to the sub-problems into the solution for the original problem.
Divide-and-Conquer Issues
Recursion: divide-and-conquer algorithms are naturally implemented as recursive procedures.
Explicit stack: they can also be implemented by a non-recursive program that stores the partial sub-problems in some explicit data structure, such as a stack, queue, or priority queue.
Sharing repeated sub-problems: branched recursion may end up evaluating the same sub-problem many times over; it may be worth identifying and saving the solutions to overlapping sub-problems.
Merge Sort Algorithm Step 1: Divide the list into two smaller lists of about equal size. Step 2: Sort each smaller list recursively. Step 3: Merge the two sorted lists to get one sorted list. Note: during the recursion, if a subsequence has only one element, do nothing.
Dividing Phase The input list is an array A[0..N-1]. Dividing the array takes O(1) time. We can represent a sublist by two integers left and right: to divide A[left..right], we compute center = (left + right) / 2 and obtain A[left..center] and A[center+1..right].
void mergesort( vector<Comparable> & A, int left, int right )
{
    if ( left < right )
    {
        int center = ( left + right ) / 2;
        mergesort( A, left, center );
        mergesort( A, center + 1, right );
        merge( A, left, center + 1, right );
    }
}
Example (dividing phase)
Example (Merge): recursively merge the two sorted halves together.
Merge Input: two sorted arrays A and B. Output: a sorted array C. Three counters (aptr, bptr, and cptr) are initially set to the beginning of their respective arrays. The smaller of A[aptr] and B[bptr] is copied to the next entry in C, and the appropriate counters are advanced.
Merge (cont.) When the input list A is exhausted, the remainder of list B is copied to C. The time to merge two sorted lists is linear: O(size(A) + size(B)).
Implementation (Merge)
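The merge routine appeared only as a figure on the slide. A sketch consistent with the driver's call merge(A, left, center+1, right) from the dividing-phase slide might be:

```cpp
#include <vector>

// Merge the sorted runs A[left..rightStart-1] and A[rightStart..rightEnd]
// back into A, using a temporary array (linear extra memory).
void merge(std::vector<int>& A, int left, int rightStart, int rightEnd) {
    std::vector<int> tmp;
    tmp.reserve(rightEnd - left + 1);
    int aptr = left, bptr = rightStart;
    // repeatedly copy the smaller front element until one run is exhausted
    while (aptr < rightStart && bptr <= rightEnd)
        tmp.push_back(A[aptr] <= A[bptr] ? A[aptr++] : A[bptr++]);
    while (aptr < rightStart) tmp.push_back(A[aptr++]); // remainder of left run
    while (bptr <= rightEnd)  tmp.push_back(A[bptr++]); // remainder of right run
    for (std::size_t i = 0; i < tmp.size(); i++)
        A[left + static_cast<int>(i)] = tmp[i];         // copy back into A
}
```

Using `<=` when comparing the two front elements keeps equal keys in their original order, which is what makes merge sort stable.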
Analyzing Merge Sort Let T(N) denote the worst-case running time of merge sort to sort N numbers. Merge-Sort A[1..n]:
Divide step, O(1): if n = 1, done; otherwise p = n/2, divide into A[1..p] and A[p+1..n].
Conquer step, 2T(n/2): recursively sort A[1..p] and A[p+1..n].
Combine step, O(n): "merge" the two sorted lists.
Recurrence for T(n) T(1) = 1; for n > 1, T(n) = 2T(n/2) + n. How do we calculate T(n)? By solving the recurrence.
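One way to solve the recurrence is by repeated expansion (assuming n = 2^k for simplicity):

```latex
\begin{aligned}
T(n) &= 2T(n/2) + n \\
     &= 4T(n/4) + 2n \\
     &= 8T(n/8) + 3n \\
     &\;\;\vdots \\
     &= 2^k\,T(1) + kn = n + n\log_2 n = O(n \log n).
\end{aligned}
```

Each expansion halves the sub-problem size and adds another n to the running total; after k = log₂ n expansions we reach T(1).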
Recursion Tree Solve T(n) = 2T(n/2) + n, assuming n = 2^k. The root costs n; the next level has two nodes of cost n/2 (total n); the next has four nodes of cost n/4 (total n); and so on down to n leaves with T(1) = 1. The tree has height h = log n and n leaves, and every level contributes n, so the total is O(n log n).
Comparison O(n log n) grows more slowly than O(n²), so merge sort asymptotically beats insertion sort in the worst case.
Experiment Code from the textbook (using templates); timing with the Unix/Linux time utility.
Disadvantage of Merge Sort Space requirement: merging two sorted lists requires linear extra memory, plus additional work to copy to the temporary array and back. For this reason merge sort is hardly ever used for main-memory sorts, but its merging routine is the cornerstone of most external sorting algorithms.
QuickSort QuickSort, like merge sort, is based on the divide-and-conquer recursive paradigm. It is the fastest known sorting algorithm in practice. Best-case running time: O(N log N). Worst-case running time: O(N²). But the worst case seldom happens.
QuickSort Divide step: pick an element (pivot) v in S, then partition S − {v} into two groups:
S1 = { x ∈ S − {v} | x < v }
S2 = { x ∈ S − {v} | x ≥ v }
Question: is the pivot element already in its FINAL POSITION after this step? (Yes: every element to its left is smaller, and every element to its right is at least as large.)
Conquer step: recursively sort S1 and S2.
Combine step: the sorted S1 (by the time we return from the recursion), followed by v, followed by the sorted S2 (i.e., nothing extra needs to be done).
Pseudocode Input: an array A[p..r]
Quicksort( A, p, r ) {
    if ( p < r ) {                    // Divide
        q = Partition( A, p, r )      // q is the position of the pivot element
        Quicksort( A, p, q-1 )        // Conquer
        Quicksort( A, q+1, r )
    }
}
Two main problems Problem 1: How do we choose the pivot element in each step? Problem 2: Once the pivot element is chosen, how do we perform the partition?
Problem 2: Partition This is the key step of the quicksort algorithm. Say we want to sort an array A[p..r], and assume we select x = A[q] as the pivot. How can we partition the array into two segments S1 and S2?
Easiest Way If we knew the sizes of S1 and S2 and had two additional arrays of those sizes, we could scan the array and allocate each element to S1 or S2 according to how its value compares to x.
Partition In practice, however, we neither know the sizes of S1 and S2 nor have additional space for them. What should we do? Probing: start from both ends and grow S1 and S2 until they meet. Swapping: swap elements that are in the wrong places, which moves two elements to their correct sides simultaneously.
Partitioning Strategy First, get the pivot element out of the way by swapping it with the last element (swap pivot and A[right]). To partition A[left..right-1], let i start at the first element and j start at the next-to-last element (i = left, j = right - 1).
Partitioning Strategy Two requirements: A[p] <= pivot for p < i, and A[p] >= pivot for p > j. While i < j: move i right, skipping over elements smaller than the pivot (in other words, move i right, compare, and stop when A[i] >= pivot); move j left, skipping over elements greater than the pivot (in other words, move j left, compare, and stop when A[j] <= pivot or j is out of range).
Partitioning Strategy When i and j have stopped and i is to the left of j, swap A[i] and A[j]: the large element is pushed to the right and the small element is pushed to the left. Repeat the process until i and j cross.
Partitioning Strategy When i and j have crossed, swap A[i] and the pivot (at this point A[i] >= pivot). Result: A[p] <= pivot for p < i, and A[p] >= pivot for p > i.
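Putting the three partitioning slides together, the strategy can be sketched as follows (assuming the pivot has already been swapped into A[right]; the function name is ours):

```cpp
#include <vector>
#include <algorithm>

// Partition A[left..right] around the pivot stored in A[right].
// Returns the final index of the pivot.
int partition(std::vector<int>& A, int left, int right) {
    int pivot = A[right];
    int i = left, j = right - 1;
    for (;;) {
        while (i < right && A[i] < pivot) i++;  // stop when A[i] >= pivot
        while (j > left && A[j] > pivot) j--;   // stop when A[j] <= pivot
        if (i < j)
            std::swap(A[i++], A[j--]);          // big goes right, small goes left
        else
            break;                              // i and j have crossed
    }
    std::swap(A[i], A[right]);                  // put pivot in its final place
    return i;
}
```

Advancing i and j after each swap avoids an infinite loop when both scans stop on elements equal to the pivot.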
Problem 1: Pivot Selection Pivot selection affects the performance of quicksort remarkably. It can be proved that quicksort achieves its best performance if the median element is chosen as the pivot in each step. Goal: select the median element (too expensive in practice). Approaches: 1) use the first/last element; 2) choose randomly; 3) use the median of three (leftmost, rightmost, center).
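A common way to implement approach 3 is to sort the three sampled elements in place and take the middle one (a sketch; the helper name is ours):

```cpp
#include <vector>
#include <algorithm>

// Median-of-three pivot selection: order the leftmost, center, and
// rightmost elements, then return the median value, which ends up
// at the center position.
int medianOfThree(std::vector<int>& A, int left, int right) {
    int center = (left + right) / 2;
    if (A[center] < A[left])   std::swap(A[center], A[left]);
    if (A[right]  < A[left])   std::swap(A[right],  A[left]);
    if (A[right]  < A[center]) std::swap(A[right],  A[center]);
    return A[center];  // the median of the three samples
}
```

A side benefit: after these swaps A[left] <= pivot <= A[right], which gives the partitioning scans natural sentinels at both ends.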
Pseudocode of Quick Sort
void quicksort( int A[], int p, int r )
{
    int i = p;
    int j = r - 1;
    if ( i <= j ) {
        int pivot = A[r];   // pick the rightmost element as pivot
        // begin partitioning
        for ( ; ; ) {
            while ( A[i] < pivot && i < r ) i++;
            while ( A[j] > pivot && j > p ) j--;
            if ( i < j )
                swap( A[i++], A[j--] );   // advance past the swapped pair
            else
                break;
        }
        swap( A[i], A[r] );       // put the pivot in its final position
        // recursively sort each partition
        quicksort( A, p, i-1 );   // sort left partition
        quicksort( A, i+1, r );   // sort right partition
    }
}
(Advancing i and j after the swap prevents an infinite loop when both scans stop on elements equal to the pivot.)
Running Time Analysis To analyze the running time, we focus on the number of comparisons needed. Let T(n) denote the running time of quicksort. The total running time consists of: pivot selection, O(1) time; partitioning, O(n) time; and the running time of the two recursive calls. Therefore T(n) = T(i) + T(n - i - 1) + cn, where i is the number of elements in the first partition and c is a constant.
Worst Case The worst case occurs when the pivot is the smallest or the largest element (i = 0 or n-1) every time, so the partition is always UNBALANCED. The recurrence then becomes T(n) = T(n-1) + cn, which sums to O(n²).
Sorting Algorithms Comparison

Method                                      | Average Time | Best Time           | Worst Time | Auxiliary Space  | In Place | Stable
Simple sorts (Selection/Insertion/Bubble)   | O(n²)        | O(n²) / O(n) / O(n) | O(n²)      | O(1)             | Yes      | No / Yes / Yes
Quick Sort                                  | O(n log n)   | O(n log n)          | O(n²)      | Θ(log n) (stack) | Yes      | No
Heap Sort                                   | O(n log n)   | O(n log n)          | O(n log n) | O(1)             | Yes      | No
Merge Sort                                  | O(n log n)   | O(n log n)          | O(n log n) | O(n)             | No       | Yes
Timing Comparisons: O(n²) Sorting Algorithms
Timing Comparisons: O(n log n) Sorting Algorithms
How to sort … 1. Distinct Integers in Reverse Order
Radix sort is best, if space is not a factor.
Insertion sort: O(n²) (also its worst case).
Selection sort: always O(n²).
Bubble sort: O(n²) (also its worst case).
Quicksort: simple quicksort is O(n²) (worst case); with median-of-3 pivot picking, O(n log n).
Mergesort: always O(n log n).
How to sort … 2. Distinct Real Numbers in Random Order
Quicksort is best. Mergesort is also good if space is not a factor.
Insertion sort: O(n²).
Selection sort: always O(n²).
Bubble sort: O(n²).
Quicksort: O(n log n) in the average case (unstable).
Mergesort: always O(n log n) (stable).
How to sort … 3. Distinct Integers with One Element Out of Place
Insertion sort is best. If the element is "later" than its proper place, then bubble sort (toward the smaller end) is also good.
How to sort … 3. Distinct Integers with One Element Out of Place (cont.)
Insertion sort: O(n).
Selection sort: always O(n²).
Bubble sort: "later": O(n); "earlier": O(n²).
Quicksort: simple quicksort is O(n²) (close to the worst case); with median-of-3 pivot picking, O(n log n).
Mergesort: always O(n log n).
How to sort … 4. Distinct Real Numbers, "Almost Sorted"
Insertion sort is best; bubble sort is almost as good.
Insertion sort: almost O(n).
Selection sort: always O(n²).
Bubble sort: almost O(n).
Quicksort: depending on the data, somewhere between O(n²) and O(n log n).
Mergesort: always O(n log n).