Sorting
Devon M. Simmonds
University of North Carolina, Wilmington
TIME: Tuesday/Thursday 11:11:50am in 1012 & Thursday 3:30-5:10pm in
Office hours: TR 1-2pm or by appointment.
Office location: CI
Objectives
To introduce basic sort algorithms:
–Exchange/bubble sort
–Insertion sort
–Selection sort
–Quicksort
–Mergesort
Sorting: The Big Picture
Given n comparable elements in an array, sort them in increasing (or decreasing) order.
–Simple algorithms, O(n^2): Insertion sort, Selection sort, Bubble sort, Shell sort, …
–Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort, …
–Comparison lower bound: Ω(n log n)
–Specialized algorithms, O(n): Bucket sort, Radix sort
–Handling huge data sets: External sorting
CSE 326: Data Structures, Sorting, Ben Lerner, Summer 2007
Exchange (Bubble) Sort
Algorithm (for a list a with n elements)
–Repeat until list a is sorted:
    for each i from n-1 down to 1
        if (a[i] < a[i-1])
            exchange(a[i], a[i-1])
–Result: After the kth pass, the first k elements are sorted.
Example
Try it out: Bubble sort
Insert 31, 16, 54, 4, 2, 17, 6
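Bubble Sort Code
def bubbleSort(self, a):
    # After pass i, a[0..i] holds the i+1 smallest items in sorted order,
    # so the inner loop only needs to come down to index i+1.
    for i in range(len(a)):
        for j in range(len(a) - 1, i, -1):
            if a[j] < a[j-1]:
                # exchange items
                a[j-1], a[j] = a[j], a[j-1]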
Insertion Sort: Idea
Algorithm (for a list a[0..n-1] with n elements)
–for each i from 1 to n-1: put a[i] in the correct place among the first i+1 elements, a[0..i]
–At the kth step, put the kth input element in the correct place among the first k elements
–Result: After the kth pass, the first k elements are sorted.
Example: insertion sort (figures not recovered)
Try it out: Insertion sort
Insert 31, 16, 54, 4, 2, 17, 6
Insertion Sort Code
def insertionSort(self, a):
    for i in range(1, len(a)):
        # insert a[i] in its correct position among a[0..i]
        temp = a[i]
        j = i - 1
        # shift every larger item one slot to the right
        while j >= 0 and a[j] > temp:
            a[j+1] = a[j]
            j -= 1
        a[j+1] = temp
Selection Sort: idea
–Find the smallest element, put it 1st
–Find the next smallest element, put it 2nd
–Find the next smallest, put it 3rd
–And so on …
Selection Sort: idea
Algorithm (for a list a with n elements)
–for each i from 1 to n-1: find the smallest remaining element, put it in position i-1
–Result: After the kth pass, the first k elements are sorted.
Selection Sort: Code
void SelectionSort (Array a[0..n-1]) {
    for (i=0; i<n; ++i) {
        j = Find index of smallest entry in a[i..n-1]
        Swap(a[i], a[j])
    }
}
Runtime:
–worst case :
–best case :
–average case :
Try it out: Selection sort
Insert 31, 16, 54, 4, 2, 17, 6
Selection Sort Code
def selectionSort(self, a):
    for i in range(len(a) - 1):
        # find smallest of items i, i+1, i+2, .., len(a)-1 and
        # exchange smallest with item in position i
        sIndex = i
        smallest = a[i]
        for j in range(i + 1, len(a)):
            if a[j] < smallest:
                sIndex = j
                smallest = a[j]
        # exchange items
        a[i], a[sIndex] = a[sIndex], a[i]
Merge Sort
MergeSort (Array [1..n])
1. Split Array in half
2. Recursively sort each half
3. Merge two halves together

Merge (a1[1..n], a2[1..n])
    i1=1, i2=1
    While (i1<=n and i2<=n) {
        if (a1[i1] < a2[i2]) {
            Next is a1[i1]
            i1++
        } else {
            Next is a2[i2]
            i2++
        }
    }
    Now throw in the dregs…
“The 2-pointer method”
Mergesort example (figure not recovered; stages shown: Divide, Merge, Merge, Final)
Mergesort example
Each call ms(list) divides the list into lh and rh, then makes 3 function calls: ms(lh), ms(rh), merge(lh, rh).

def mergeSort(self, alist):
    if len(alist)>1:
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        self.mergeSort(lefthalf)
        self.mergeSort(righthalf)
        self.merge(lefthalf, righthalf, alist)

ms([ ]): Divide, lh=[ ] rh=[ ]
    ms([ ]): Divide, lh=[8 2] rh=[9 4]
        ms([8 2]): Divide, lh=[8] rh=[2], result [2, 8]
        ms([9 4]): Divide, lh=[9] rh=[4], result [4, 9]
Mergesort example (continued)
ms([ ]): Divide, lh=[ ] rh=[ ]
    ms([ ]): Divide, lh=[8 2] rh=[9 4]
        ms([8 2]) gives [2, 8]
        ms([9 4]) gives [4, 9]
        merge([2, 8], [4, 9]) gives [2, 4, 8, 9]
    What is next? ms([ ])
Try it out: Merge sort
Insert 31, 16, 54, 4, 2, 17, 6
MergeSort Code
def mergeSort(self, alist):
    if len(alist) > 1:
        mid = len(alist) // 2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        print("Splitting ", alist, "into", lefthalf, "and", righthalf)
        self.mergeSort(lefthalf)
        self.mergeSort(righthalf)
        self.merge(lefthalf, righthalf, alist)

def merge(self, lefthalf, righthalf, alist):
    print("Merging ", lefthalf, "and", righthalf)
    i = 0  # next unmerged item in lefthalf
    j = 0  # next unmerged item in righthalf
    k = 0  # next slot to fill in alist
    while i < len(lefthalf) and j < len(righthalf):
        # <= takes from the left half on ties, which keeps the sort stable
        if lefthalf[i] <= righthalf[j]:
            alist[k] = lefthalf[i]
            i = i + 1
        else:
            alist[k] = righthalf[j]
            j = j + 1
        k = k + 1
    # copy whatever is left over in either half
    while i < len(lefthalf):
        alist[k] = lefthalf[i]
        i = i + 1
        k = k + 1
    while j < len(righthalf):
        alist[k] = righthalf[j]
        j = j + 1
        k = k + 1
The steps of QuickSort
–Select a pivot value from S
–Partition S into S1 and S2
–QuickSort(S1) and QuickSort(S2)
–Presto! S is sorted
[Weiss]
QuickSort Example
Choose the pivot as the median of three. Place the pivot and the largest at the right and the smallest at the left.
QuickSort Example
1) Move i to the right to the first element larger than the pivot.
2) Move j to the left to the first element smaller than the pivot.
3) Swap the elements at i and j.
4) Repeat until i and j cross.
5) Swap the pivot with the element at i.
6) Repeat steps 1-5 for the two partitions:
    1) Elements > pivot
    2) Elements < pivot
Recursive Quicksort
Quicksort(A[]: integer array, left, right : integer): {
    pivotindex : integer;
    if left + CUTOFF ≤ right then
        pivot := median3(A, left, right);
        pivotindex := Partition(A, left, right-1, pivot);
        Quicksort(A, left, pivotindex - 1);
        Quicksort(A, pivotindex + 1, right);
    else
        Insertionsort(A, left, right);
}
Don’t use quicksort for small arrays. CUTOFF = 10 is reasonable.
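The pseudocode above can be sketched in Python roughly as follows. This is only one possible reading: the helper names insertion_sort_range and median3 are hypothetical, the partition step is written inline instead of as a separate Partition routine, and the scans stop on elements equal to the pivot (a common textbook choice, assumed here).

```python
CUTOFF = 10  # ranges with fewer than CUTOFF + 1 items go to insertion sort


def insertion_sort_range(a, left, right):
    """Sort a[left..right] in place by insertion (hypothetical helper)."""
    for i in range(left + 1, right + 1):
        temp = a[i]
        j = i - 1
        while j >= left and a[j] > temp:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = temp


def median3(a, left, right):
    """Order a[left], a[center], a[right]; park the median at right-1."""
    center = (left + right) // 2
    if a[center] < a[left]:
        a[left], a[center] = a[center], a[left]
    if a[right] < a[left]:
        a[left], a[right] = a[right], a[left]
    if a[right] < a[center]:
        a[center], a[right] = a[right], a[center]
    a[center], a[right - 1] = a[right - 1], a[center]
    return a[right - 1]


def quicksort(a, left, right):
    if left + CUTOFF <= right:
        pivot = median3(a, left, right)
        i, j = left, right - 1
        while True:
            i += 1
            while a[i] < pivot:   # a[right-1] == pivot stops this scan
                i += 1
            j -= 1
            while a[j] > pivot:   # a[left] <= pivot stops this scan
                j -= 1
            if i >= j:
                break
            a[i], a[j] = a[j], a[i]
        a[i], a[right - 1] = a[right - 1], a[i]  # pivot to its final slot
        quicksort(a, left, i - 1)
        quicksort(a, i + 1, right)
    else:
        insertion_sort_range(a, left, right)
```

Calling quicksort(data, 0, len(data) - 1) sorts data in place; a list as short as the 7-item practice input never reaches the partition code and is handled entirely by the insertion sort branch.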
Try it out: Recursive quicksort
Insert 31, 16, 54, 4, 2, 17, 6
QuickSort: Average case complexity
Turns out to be O(n log n)
See Section for an idea of the proof.
Don’t need to know proof details for this course.
QuickSort Code
def quickSort(self, array, start, end):
    left = start
    right = end
    if right - left < 1:
        return
    # at least 2 elements to be sorted
    pivot = array[start]
    while right > left:
        # move left past items that are <= pivot
        while array[left] <= pivot and left < right:
            left += 1
        # move right past items that are > pivot
        while array[right] > pivot and right >= left:
            right -= 1
        if right > left:
            array[left], array[right] = array[right], array[left]
    # put the pivot in its final position: swap array[start] and array[right]
    array[start], array[right] = array[right], array[start]
    self.quickSort(array, start, right - 1)
    self.quickSort(array, right + 1, end)
Features of Sorting Algorithms
–In-place: Sorted items occupy the same space as the original items. (No copying required, only O(1) extra space if any.)
–Stable: Items in input with the same value end up in the same order as when they began.
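A small illustration of stability, using Python's built-in sorted, which the language documents as stable. The record values below are made up for the example.

```python
# Each record is (name, count); we sort by count only.
records = [("pear", 2), ("apple", 1), ("fig", 2), ("kiwi", 1)]

by_count = sorted(records, key=lambda r: r[1])

# Records with equal counts keep their original relative order:
# "apple" stays ahead of "kiwi", and "pear" stays ahead of "fig".
print(by_count)
# [('apple', 1), ('kiwi', 1), ('pear', 2), ('fig', 2)]
```

An unstable sort would be free to emit ("kiwi", 1) before ("apple", 1), which matters whenever the records were already ordered by some other attribute.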
Sort Properties (Your Turn)
Are the following stable? In-place?

                  Stable?              In-place?
Bubble Sort       No / Yes / Can Be    No / Yes
Insertion Sort    No / Yes / Can Be    No / Yes
Selection Sort    No / Yes / Can Be    No / Yes
MergeSort         No / Yes / Can Be    No / Yes
QuickSort         No / Yes / Can Be    No / Yes
How fast can we sort?
Heapsort, Mergesort, and Quicksort all run in O(N log N) best case running time.
Can we do any better?
No, if the basic action is a comparison.
Sorting Model
Recall our basic assumption: we can only compare two elements at a time
–we can only reduce the possible solution space by half each time we make a comparison
Suppose you are given N elements
–Assume no duplicates
How many possible orderings can you get?
–Example: a, b, c (N = 3)
Permutations
How many possible orderings can you get?
–Example: a, b, c (N = 3)
–(a b c), (a c b), (b a c), (b c a), (c a b), (c b a)
–6 orderings = 3·2·1 = 3! (i.e., “3 factorial”)
–All the possible permutations of a set of 3 elements
For N elements
–N choices for the first position, (N-1) choices for the second position, …, (2) choices, 1 choice
–N(N-1)(N-2)⋯(2)(1) = N! possible orderings
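The slides stop at the count of N! orderings. The usual next step, sketched here as an assumption about where the argument is heading, treats a comparison sort as a binary decision tree of height h: each comparison has two outcomes, so the tree has at most 2^h leaves, and it needs at least one leaf per ordering.

```latex
\begin{align*}
2^{h} &\ge N! \\
h &\ge \log_2 N! = \sum_{i=1}^{N} \log_2 i
   \;\ge\; \sum_{i=\lceil N/2 \rceil}^{N} \log_2 i
   \;\ge\; \frac{N}{2}\,\log_2\frac{N}{2}
   \;=\; \Omega(N \log N)
\end{align*}
```

The last inequality holds because the truncated sum has at least N/2 terms, each at least log_2(N/2). So any comparison-based sort needs Ω(N log N) comparisons in the worst case.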
BucketSort (aka BinSort)
If all values to be sorted are known to be between 1 and K, create an array count of size K, increment counts while traversing the input, and finally output the result.
Example: K=5. Input = (5,1,3,4,3,2,1,1,5,4,5)
(count array figure not recovered)
Running time to sort n items?
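A minimal sketch of the idea, assuming integer keys in the range 1..K. The function name bucket_sort and its parameters are hypothetical; the input is the K=5 example from the slide.

```python
def bucket_sort(values, k):
    """Sort integers known to lie in 1..k by counting occurrences."""
    count = [0] * k                  # count[i] holds occurrences of value i+1
    for v in values:                 # one pass over the n inputs
        count[v - 1] += 1
    result = []
    for i, c in enumerate(count):    # one pass over the k counters
        result.extend([i + 1] * c)   # emit value i+1 exactly c times
    return result


data = [5, 1, 3, 4, 3, 2, 1, 1, 5, 4, 5]
print(bucket_sort(data, 5))  # [1, 1, 1, 2, 3, 3, 4, 4, 5, 5, 5]
```

The first loop touches each of the n inputs once and the second walks the K counters while writing n outputs, which is where a bound of the form O(n+K) comes from.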
BucketSort Complexity: O(n+K)
–Case 1: K is a constant: BinSort is linear time
–Case 2: K is variable: not simply linear time
–Case 3: K is constant but large (e.g ): ???
Fixing impracticality: RadixSort
Radix = “The base of a number system”
–We’ll use 10 for convenience, but could be anything
Idea: BucketSort on each digit, least significant to most significant (lsd to msd)
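The idea can be sketched as below, assuming non-negative integers. The name radix_sort and the base parameter are hypothetical. Each pass is a bucket sort on one digit, and appending to the buckets in input order keeps every pass stable, which is what makes the least-significant-first order work.

```python
def radix_sort(values, base=10):
    """LSD radix sort for non-negative integers (a sketch)."""
    if not values:
        return []
    result = list(values)
    largest = max(result)
    place = 1                                  # 1's digit, then 10's, 100's, ...
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for v in result:
            digit = (v // place) % base
            buckets[digit].append(v)           # append keeps the pass stable
        result = [v for bucket in buckets for v in bucket]
        place *= base
    return result


print(radix_sort([126, 328, 636, 341, 416, 131]))
# [126, 131, 328, 341, 416, 636]
```

With three-digit inputs the loop runs three passes, one per digit position.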
Radix Sort Example (1st pass)
Bucket sort by 1’s digit.
(Figure not recovered: input data and the list after the 1st pass.)
This example uses B=10 and base 10 digits for simplicity of demonstration. Larger bucket counts should be used in an actual implementation.
Radix Sort Example (2nd pass)
Bucket sort by 10’s digit.
(Figure not recovered: the list after the 1st pass and after the 2nd pass.)
Radix Sort Example (3rd pass)
Bucket sort by 100’s digit.
(Figure not recovered: the list after the 2nd pass and after the 3rd pass.)
Invariant: after k passes the low order k digits are sorted.
RadixSort (Your Turn)
Input: 126, 328, 636, 341, 416, 131,
–BucketSort on lsd:
–BucketSort on next-higher digit:
–BucketSort on msd:
Radixsort: Complexity
–How many passes?
–How much work per pass?
–Total time?
–Conclusion?
In practice
–RadixSort only good for large number of elements with relatively small values
–Hard on the cache compared to MergeSort/QuickSort
Internal versus External Sorting
So far assumed that accessing A[i] is fast
–Array A is stored in internal memory (RAM)
–Algorithms so far are good for internal sorting
What if A is so large that it doesn’t fit in internal memory?
–Data on disk or tape
–Delay in accessing A[i], e.g. need to spin disk and move head
Internal versus External Sorting (continued)
Need sorting algorithms that minimize disk/tape access time
External sorting, basic idea:
–Load chunk of data into RAM, sort, store this “run” on disk/tape
–Use the Merge routine from Mergesort to merge runs
–Repeat until you have only one run (one sorted chunk)
–Text gives some examples
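A toy sketch of the run-then-merge idea. Two simplifying assumptions: the runs are kept as in-memory lists standing in for files on disk or tape, and the standard library's heapq.merge combines all runs in a single multi-way pass instead of repeating two-way merges. The name external_sort and the chunk_size parameter are hypothetical.

```python
import heapq


def external_sort(items, chunk_size):
    """Sort by building sorted runs of chunk_size items, then merging them."""
    runs = []
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]   # "load a chunk into RAM"
        runs.append(sorted(chunk))                # "store this run"
    # heapq.merge lazily merges already-sorted inputs into one sorted stream
    return list(heapq.merge(*runs))


print(external_sort([31, 16, 54, 4, 2, 17, 6], chunk_size=3))
# [2, 4, 6, 16, 17, 31, 54]
```

In a real external sort each run would be written out and read back sequentially, so only one chunk (or one small buffer per run during the merge) has to fit in memory at a time.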
Summary of sorting
Sorting choices:
–O(N^2): Bubblesort, Insertion Sort
–O(N log N) average case running time:
    Heapsort: In-place, not stable.
    Mergesort: O(N) extra space, stable.
    Quicksort: claimed fastest in practice, but O(N^2) worst case. Needs extra storage for recursion. Not stable.
–O(N): Radix Sort: fast and stable. Not comparison based. Not in-place.
Search Algorithms
We now present several algorithms that can be used for searching and sorting lists
–We first discuss the design of an algorithm,
–We then show its implementation as a Python function, and,
–Finally, we provide an analysis of the algorithm’s computational complexity
To keep things simple, each function processes a list of integers
Fundamentals of Python: From First Programs Through Data Structures
Search for a Minimum
Python’s min function returns the minimum or smallest item in a list
Alternative version:
–n – 1 comparisons for a list of size n
–O(n)
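The slide's own alternative version did not survive extraction. A version consistent with the stated n – 1 comparisons might look like this; the name index_of_min and the choice to return an index instead of the value are assumptions.

```python
def index_of_min(lyst):
    """Return the index of the smallest item in a non-empty list."""
    min_index = 0
    # one comparison per remaining item: n - 1 comparisons in total
    for current in range(1, len(lyst)):
        if lyst[current] < lyst[min_index]:
            min_index = current
    return min_index


print(index_of_min([31, 16, 54, 4, 2, 17, 6]))  # 4
```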
Linear Search of a List
Python’s in operator is implemented as a method named __contains__ in the list class
–Uses a sequential search or a linear search
Python code for a linear search function:
–Analysis is different from previous one
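The code for this slide was lost in extraction. A plausible sketch follows, in which the function name, the parameter order, and the -1 "not found" convention are all assumptions.

```python
def linear_search(target, lyst):
    """Return the position of target in lyst, or -1 if it is absent."""
    for position, item in enumerate(lyst):
        if item == target:
            return position   # stop at the first match
    return -1


data = [31, 16, 54, 4, 2, 17, 6]
print(linear_search(17, data))   # 5
print(linear_search(99, data))   # -1
```

Unlike the search for a minimum, this loop can stop early, which is why its analysis depends on where the target sits in the list.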
Best-Case, Worst-Case, and Average-Case Performance
Analysis of a linear search considers three cases:
–In the worst case, the target item is at the end of the list or not in the list at all: O(n)
–In the best case, the algorithm finds the target at the first position, after making one iteration: O(1)
–Average case: add number of iterations required to find target at each possible position; divide sum by n: O(n)
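Written out, the average-case recipe above gives the following, assuming the target is present and equally likely to be at each of the n positions (finding it at position i takes i iterations):

```latex
\[
\frac{1 + 2 + \cdots + n}{n}
  \;=\; \frac{1}{n}\cdot\frac{n(n+1)}{2}
  \;=\; \frac{n+1}{2}
  \;=\; O(n)
\]
```

So on average about half the list is examined, which is still linear in n.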
Binary Search of a List
A linear search is necessary for data that are not arranged in any particular order
When searching sorted data, use a binary search
Binary Search of a List (continued)
–More efficient than linear search
Additional cost has to do with keeping list in order
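A minimal iterative sketch of binary search, assuming the list is already sorted in increasing order. The function name, parameter order, and -1 "not found" convention are assumptions.

```python
def binary_search(target, sorted_lyst):
    """Return the position of target in sorted_lyst, or -1 if it is absent."""
    left, right = 0, len(sorted_lyst) - 1
    while left <= right:
        midpoint = (left + right) // 2
        if sorted_lyst[midpoint] == target:
            return midpoint
        if target < sorted_lyst[midpoint]:
            right = midpoint - 1      # discard the upper half
        else:
            left = midpoint + 1       # discard the lower half
    return -1


data = [2, 4, 6, 16, 17, 31, 54]
print(binary_search(17, data))   # 4
print(binary_search(5, data))    # -1
```

Each iteration halves the remaining range, so at most about log2(n) + 1 iterations are needed.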
Summary of Searching
Linear versus binary: O(n) vs O(lg n)
______________________
Devon M. Simmonds
Computer Science Department
University of North Carolina Wilmington
_____________________________________________________________
Questions?
Reading from course text:
What is sorting?
Given n elements, arrange them in an increasing or decreasing order by some attribute.
–Simple algorithms, O(n^2): Insertion sort, Selection sort, Bubble sort, Shell sort, …
–Fancier algorithms, O(n log n): Heap sort, Merge sort, Quick sort, …
–Comparison lower bound: Ω(n log n)
–Specialized algorithms, O(n): Bucket sort, Radix sort
–Handling huge data sets: External sorting