Recitation 13 Searching and Sorting
Announcements
  Project 5 is posted
    Milestone due on Thursday, December 3rd
    Final submission due on Thursday, December 10th
Questions?
Searching
  We have a collection of data
    an array of integers
    a list of 1 million names
  We are asked to find a given item
    find the number 10 in the array
    find the name “Nikola Tesla” in the list
  If we don’t know anything about the data
    the numbers are randomly generated
    the names are not in any order
  then we have to check every element
    linear search, or the brute-force approach: O(n)
Linear Search
    public static int find( int[] array, int number ) {
        // Check each element in turn; return the index of the first match.
        for( int i = 0; i < array.length; i++ )
            if( array[i] == number )
                return i;
        return -1; // not found
    }
  Very lucky if the number is at the beginning of the array
  Very unlucky if it is at the end of the array
  On average, we need to look at n/2 elements
    O(n/2) = O(n)
  What if the list is in order, i.e. sorted?
    BINARY SEARCH!
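The same loop works for the list of names mentioned earlier. This is a minimal sketch, assuming the names are stored in a String[]; the class name NameSearch and the method findName are made up for illustration. Note that strings are compared with equals(), not ==.

```java
// Hypothetical helper for illustration; not part of the recitation code.
class NameSearch {
    // Returns the index of the first element equal to target, or -1 if absent.
    static int findName(String[] names, String target) {
        for (int i = 0; i < names.length; i++) {
            // equals() compares contents; == would compare references
            if (names[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] names = { "Ada Lovelace", "Nikola Tesla", "Alan Turing" };
        System.out.println(findName(names, "Nikola Tesla")); // prints 1
        System.out.println(findName(names, "Grace Hopper")); // prints -1
    }
}
```

The cost is still O(n): in the worst case every name is compared once.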
Binary Search
  Assuming the list is in ascending order
    public static int binarySearch( int[] array, int number ) {
        int left = 0;
        int right = array.length - 1;
        while (right >= left) {
            int middle = (left + right) / 2;
            if (array[middle] == number)
                return middle;
            else if (array[middle] > number)
                right = middle - 1;  // continue in the left half
            else // array[middle] < number
                left = middle + 1;   // continue in the right half
        }
        return -1; // right < left: the number is not in the array
    }
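For comparison, Java's standard library already provides a binary search on sorted int arrays, java.util.Arrays.binarySearch. The sketch below calls it on a small sorted array; the wrapper class LibrarySearch and its contains method are made up for illustration. Unlike the method above, the library call does not always return -1 for a missing number: it returns a negative value that encodes where the number would be inserted.

```java
import java.util.Arrays;

// Hypothetical wrapper for illustration; only Arrays.binarySearch is a real library call.
class LibrarySearch {
    // True if number occurs in the sorted array.
    static boolean contains(int[] sorted, int number) {
        return Arrays.binarySearch(sorted, number) >= 0;
    }

    public static void main(String[] args) {
        int[] sorted = { 26, 28, 30, 33, 35, 37, 41, 46 };
        System.out.println(Arrays.binarySearch(sorted, 35)); // prints 4
        // 29 is absent; it would be inserted at index 2, so the result is -(2) - 1.
        System.out.println(Arrays.binarySearch(sorted, 29)); // prints -3
        System.out.println(contains(sorted, 29));            // prints false
    }
}
```

One practical detail: for very large arrays, (left + right) / 2 can overflow int, and left + (right - left) / 2 avoids that.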
Binary Search
  Given a sorted array in ascending order, find a specific number in this array.
  Say, looking for 35.
    index:  0  1  2  3  4  5  6  7
    value: 26 28 30 33 35 37 41 46
    Step 1: left = 0, middle = 3, right = 7; array[3] = 33 < 35, so left = 4
    Step 2: left = 4, middle = 5, right = 7; array[5] = 37 > 35, so right = 4
    Step 3: left = 4, right = 4, so middle = 4; array[4] = 35, found
Binary Search
  What if the number is not in the array?
  Say, looking for 29.
    index:  0  1  2  3  4  5  6  7
    value: 26 28 30 33 35 37 41 46
    Step 1: left = 0, middle = 3, right = 7; array[3] = 33 > 29, so right = 2
    Step 2: left = 0, middle = 1, right = 2; array[1] = 28 < 29, so left = 2
    Step 3: left = 2, middle = 2, right = 2; array[2] = 30 > 29, so right = 1
    Step 4: right = 1, left = 2
  right is smaller than left, so the number is not in this array: return -1
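The two walkthroughs can be reproduced in code. The sketch below runs the same loop as binarySearch but records left, middle and right at every iteration; the class name BinarySearchTrace and the text format of the steps are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracing helper for illustration.
class BinarySearchTrace {
    // Same loop as binarySearch, but records the state of every iteration.
    static List<String> trace(int[] array, int number) {
        List<String> steps = new ArrayList<>();
        int left = 0;
        int right = array.length - 1;
        while (right >= left) {
            int middle = (left + right) / 2;
            steps.add("left=" + left + " middle=" + middle + " right=" + right);
            if (array[middle] == number) {
                steps.add("found at index " + middle);
                return steps;
            } else if (array[middle] > number) {
                right = middle - 1;
            } else {
                left = middle + 1;
            }
        }
        steps.add("not found: right=" + right + " < left=" + left);
        return steps;
    }

    public static void main(String[] args) {
        int[] sorted = { 26, 28, 30, 33, 35, 37, 41, 46 };
        for (String step : trace(sorted, 35)) System.out.println(step);
        for (String step : trace(sorted, 29)) System.out.println(step);
    }
}
```

For 35 the last recorded line is "found at index 4"; for 29 it is "not found: right=1 < left=2".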
Running time of Binary Search
  Each time, we divide the array in two and continue searching in only one half.
  Running time
    how many times can n be divided by 2?
    O(log n) times
  Remember: we mean log base 2 -> O(log2 n)
  If n = 1024, log n = 10
    we can find any number in about 10 iterations (at most 11 in the worst case)
    compare to the average of n/2 = 512 elements for linear search
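To make the comparison concrete, the sketch below counts loop iterations for both searches on a sorted array of 1024 numbers. The class SearchCost and its method names are made up for illustration. Searching for the last element is the worst case for linear search; binary search needs 11 iterations here, one more than log2 n, because the final single candidate still has to be checked.

```java
// Hypothetical step counters for illustration.
class SearchCost {
    // Number of elements linear search inspects before it stops.
    static int linearSteps(int[] array, int number) {
        int steps = 0;
        for (int i = 0; i < array.length; i++) {
            steps++;
            if (array[i] == number) break;
        }
        return steps;
    }

    // Number of loop iterations binary search performs on a sorted array.
    static int binarySteps(int[] array, int number) {
        int steps = 0;
        int left = 0;
        int right = array.length - 1;
        while (right >= left) {
            steps++;
            int middle = (left + right) / 2;
            if (array[middle] == number) break;
            else if (array[middle] > number) right = middle - 1;
            else left = middle + 1;
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] sorted = new int[1024];
        for (int i = 0; i < sorted.length; i++) sorted[i] = i; // 0, 1, ..., 1023
        System.out.println(linearSteps(sorted, 1023)); // prints 1024
        System.out.println(binarySteps(sorted, 1023)); // prints 11
    }
}
```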
Sorting
  Binary search requires a sorted list
  How can we sort an array of numbers?
    bubble sort: O(n^2)
    insertion sort: O(n^2)
    merge sort: O(n log n)
    bucket sort: O(n + m), where m is the size of the range of the numbers
Bubble sort
  Idea: Large elements “bubble” to the end of the list until nothing is out of order
    for( int i = 0; i < array.length - 1; i++ )
        for( int j = 0; j < array.length - 1; j++ )
            if( array[j] > array[j + 1] ) {
                // swap the out-of-order neighbors
                int temp = array[j];
                array[j] = array[j + 1];
                array[j + 1] = temp;
            }
  What is wrong with this code?
    it always goes through the array n-1 times, even if the array becomes sorted much earlier
  What happens if the list is already sorted at the beginning?
Bubble sort v2
  Stops when the list gets sorted
    boolean workToDo = true;
    while (workToDo) {
        workToDo = false;
        for( int j = 0; j < array.length - 1; j++ )
            if( array[j] > array[j + 1] ) {
                int temp = array[j];
                array[j] = array[j + 1];
                array[j + 1] = temp;
                workToDo = true; // a swap happened, so another pass is needed
            }
    }
  Stops going through the array if there were no swaps on the preceding pass, i.e. workToDo == false
  Now, what happens if the list is already sorted at the beginning?
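One way to answer the question is to count the passes. The sketch below is the v2 loop with a pass counter added; the class BubbleSortPasses and its method name are made up for illustration. An already sorted array needs a single pass, so the best case is O(n), while a reversed array of n elements still needs n passes.

```java
// Hypothetical pass counter for illustration.
class BubbleSortPasses {
    // Bubble sort v2; sorts in place and returns how many passes were made.
    static int sortAndCountPasses(int[] array) {
        int passes = 0;
        boolean workToDo = true;
        while (workToDo) {
            workToDo = false;
            passes++;
            for (int j = 0; j < array.length - 1; j++) {
                if (array[j] > array[j + 1]) {
                    int temp = array[j];
                    array[j] = array[j + 1];
                    array[j + 1] = temp;
                    workToDo = true; // another pass is needed
                }
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println(sortAndCountPasses(new int[] { 1, 2, 3, 4, 5 })); // prints 1
        System.out.println(sortAndCountPasses(new int[] { 5, 4, 3, 2, 1 })); // prints 5
    }
}
```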
Insertion sort
  Idea: keep a growing sorted list
  Add the ith number to the sorted numbers
    for( int i = 1; i < array.length; i++ )
        for( int j = i; j > 0; j-- ) // count back
            if( array[j - 1] > array[j] ) {
                int temp = array[j];
                array[j] = array[j - 1];
                array[j - 1] = temp;
            }
            else
                break; // the new number has reached its position
  Before each run of the inner loop, array[0] ... array[i-1] are sorted.
  Add one more number to the sorted numbers by swapping it with larger numbers.
    Swap with larger numbers until it is in the correct position
    Stop when we reach a number that is not larger
  After iteration i, array[0] ... array[i] are in sorted order.
Insertion sort
  ("|" separates the sorted part from the rest)
  45 | 24 13 77 71 58 66 10    1 number is sorted; add 24 to the right place
  24 45 | 13 77 71 58 66 10    2 numbers are sorted; add 13
  13 24 45 | 77 71 58 66 10    77: no need for swapping
  13 24 45 77 | 71 58 66 10    71 should go between 45 and 77
  13 24 45 71 77 | 58 66 10    array[0] ... array[i-1] are sorted; array[i] = 58
  13 24 45 58 71 77 | 66 10    Now, array[0] ... array[i] are sorted
Insertion sort
  We should do this n-1 times to sort n numbers
    13 24 45 58 71 77 66 10
    13 24 45 58 66 71 77 10
    10 13 24 45 58 66 71 77
  Worst-case running time = 1 + 2 + 3 + … + (n-1) = (n-1)n / 2 = O(n^2)
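The sum 1 + 2 + ... + (n-1) can be checked by counting comparisons. The sketch below adds a counter to the insertion sort loop; the class InsertionSortCost and its method name are made up for illustration. A reversed array of 8 numbers is the worst case and needs 7 * 8 / 2 = 28 comparisons, while an already sorted array needs only n - 1 = 7.

```java
// Hypothetical comparison counter for illustration.
class InsertionSortCost {
    // Insertion sort; sorts in place and returns the number of comparisons made.
    static long sortAndCountComparisons(int[] array) {
        long comparisons = 0;
        for (int i = 1; i < array.length; i++) {
            for (int j = i; j > 0; j--) {
                comparisons++;
                if (array[j - 1] > array[j]) {
                    int temp = array[j];
                    array[j] = array[j - 1];
                    array[j - 1] = temp;
                } else {
                    break; // array[j] is in its correct position
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] reversed = { 8, 7, 6, 5, 4, 3, 2, 1 };
        System.out.println(sortAndCountComparisons(reversed)); // prints 28
        int[] sorted = { 1, 2, 3, 4, 5, 6, 7, 8 };
        System.out.println(sortAndCountComparisons(sorted));   // prints 7
    }
}
```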
Insertion sort in descending order
  Swap with smaller numbers this time, i.e. swap when two neighbors are not in the correct order.
  Just change “>” to “<” in the if statement
    77 71 45 24 13 58 66 10    array[0] ... array[i-1] are sorted; add 58 to the right place, between 71 and 45
    77 71 58 45 24 13 66 10    58 is now in place
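Written out in full, the descending version might look like the sketch below. Only the comparison operator differs from the ascending loop; the class DescendingInsertionSort and its method name are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical class for illustration.
class DescendingInsertionSort {
    // Sorts array in place from largest to smallest.
    static void sortDescending(int[] array) {
        for (int i = 1; i < array.length; i++) {
            for (int j = i; j > 0; j--) {
                if (array[j - 1] < array[j]) { // "<" instead of ">"
                    int temp = array[j];
                    array[j] = array[j - 1];
                    array[j - 1] = temp;
                } else {
                    break; // reached a number that is not smaller
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] numbers = { 45, 24, 13, 77, 71, 58, 66, 10 };
        sortDescending(numbers);
        System.out.println(Arrays.toString(numbers));
        // prints [77, 71, 66, 58, 45, 24, 13, 10]
    }
}
```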
Merge Sort
  Divide and conquer
  It has two steps:
    Split: keep dividing in half until each list has one element
    Merge: merge the lists pairwise, keeping the elements in sorted order
  It takes O(n log n) time
    The best possible worst-case time for comparison-based sorting algorithms
Merge Sort
  Split:
    [45 24 13 77 71 58 66 10]
    [45 24 13 77] [71 58 66 10]
    [45 24] [13 77] [71 58] [66 10]
    [45] [24] [13] [77] [71] [58] [66] [10]
  Merge:
    [24 45] [13 77] [58 71] [10 66]
    [13 24 45 77] [10 58 66 71]
    [10 13 24 45 58 66 71 77]
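The slides give no code for merge sort, so here is one possible sketch of the split and merge steps. It is a simple recursive version that copies the halves into new arrays, not necessarily how it would be written for the project; the class MergeSortSketch and its method names are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical class for illustration.
class MergeSortSketch {
    // Returns a sorted array with the same elements; the input is not modified.
    static int[] mergeSort(int[] array) {
        if (array.length <= 1) {
            return array; // a list of zero or one element is already sorted
        }
        // Split: divide in half and sort each half recursively.
        int mid = array.length / 2;
        int[] left = mergeSort(Arrays.copyOfRange(array, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(array, mid, array.length));
        // Merge: combine the two sorted halves.
        return merge(left, right);
    }

    // Merges two sorted arrays into one sorted array.
    static int[] merge(int[] left, int[] right) {
        int[] merged = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            // Take the smaller front element; "<=" keeps equal elements in order.
            if (left[i] <= right[j]) merged[k++] = left[i++];
            else merged[k++] = right[j++];
        }
        while (i < left.length) merged[k++] = left[i++];   // leftovers from left
        while (j < right.length) merged[k++] = right[j++]; // leftovers from right
        return merged;
    }

    public static void main(String[] args) {
        int[] numbers = { 45, 24, 13, 77, 71, 58, 66, 10 };
        System.out.println(Arrays.toString(mergeSort(numbers)));
        // prints [10, 13, 24, 45, 58, 66, 71, 77]
    }
}
```

Each level of splitting does O(n) work in total when merging, and there are about log2 n levels, which gives the O(n log n) running time.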
Bucket Sort
  A better sorting algorithm: O(n)
    If the numbers are in a small range
    For example: exam grades are in [1,100]
    let m be the length of the range, i.e. the number of possible values
  Idea: Count how many times each number occurs in the array.
    (this counting variant is also known as counting sort)
  Algorithm:
    Create an array of length m, the array of values (one count per possible value)
    Traverse your array of integers
      for each value, increment the count at the corresponding index in the array of values
    Traverse the values array and print each index as many times as the count it holds
Bucket Sort
  array of integers to sort:       6 2 3 5 8 9 10 1 7
  array of values
    index (value): 1 2 3 4 5 6 7 8 9 10
    count:         1 1 1 0 1 1 1 1 1 1
  array of integers after sorting: 1 2 3 5 6 7 8 9 10
Bucket Sort
  Here’s bucket sort in code with a range of [min, max]:
    int[] values = new int[max - min + 1]; // one counter per possible value
    for( int i = 0; i < array.length; i++ )
        values[array[i] - min]++;
    int count = 0; // next position to fill in array
    for( int i = 0; i < values.length; i++ ) {
        for( int j = 0; j < values[i]; j++ ) {
            array[count] = i + min;
            count++;
        }
    }
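Wrapped in a method, the code above can be run on the earlier example with the range [1, 10]. This is a sketch only: the class BucketSortSketch, the method name, and the range check are additions for illustration and are not part of the slides.

```java
import java.util.Arrays;

// Hypothetical class for illustration.
class BucketSortSketch {
    // Sorts array in place, assuming every element lies in [min, max].
    static void bucketSort(int[] array, int min, int max) {
        int[] values = new int[max - min + 1]; // one counter per possible value
        for (int number : array) {
            if (number < min || number > max) {
                // Added safety check: the algorithm only works inside the range.
                throw new IllegalArgumentException("out of range: " + number);
            }
            values[number - min]++;
        }
        int count = 0; // next position to fill in array
        for (int i = 0; i < values.length; i++) {
            for (int j = 0; j < values[i]; j++) {
                array[count] = i + min;
                count++;
            }
        }
    }

    public static void main(String[] args) {
        int[] numbers = { 6, 2, 3, 5, 8, 9, 10, 1, 7 };
        bucketSort(numbers, 1, 10);
        System.out.println(Arrays.toString(numbers));
        // prints [1, 2, 3, 5, 6, 7, 8, 9, 10]
    }
}
```

Duplicates are handled by the counts: a value that occurs three times is written back three times.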
Bucket Sort
  Actually, it runs in O(n+m) time
    m is the length of the range; for exam grades m = 100
  When m <= n, in the worst case it is O(2n) = O(n)
    remember, we can drop constants in asymptotic analysis
  When m is much larger than n, the O(m) term dominates
Questions?