Sorting and Searching
CSE116A,B
4/6/2019 B.Ramamurthy
Introduction
The problem of sorting a collection of keys to produce an ordered collection is one of the richest in computer science. The richness derives from the fact that there are many ways of solving the sorting problem. Sorting is a fascinating operation that provides a model for investigating many problems in computer science, for example: analysis and comparison of algorithms, divide and conquer, space and time tradeoffs, and recursion.
Sorting
Simple comparison-based sorts: selection sort, insertion sort (with online animations)
Divide and conquer: merge sort, quick sort (with animations from the supplements of your textbook)
Priority queue/heap based: heap sort
Assume ascending order for all our discussion.
Selection Sort
Select the smallest element and place it in the first location (by exchanging the contents of the two locations); then find the next smallest, place it in the next location, and so on. We will look at an example, an algorithm, an analysis of the algorithm, and a code implementation.
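Selection Sort: Example
Initial: 10 8 16 2 6 4
Pass 1: 2 8 16 10 6 4 (exchange 10 and 2)
Pass 2: 2 4 16 10 6 8 (exchange 8 and 4)
Pass 3: 2 4 6 10 16 8 (exchange 16 and 6)
Pass 4: 2 4 6 8 16 10 (exchange 10 and 8)
Pass 5: 2 4 6 8 10 16 (exchange 16 and 10)
Sorted array: 2 4 6 8 10 16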
Selection Sort Pseudo Code
1. Let cursor point to the first element of an n-element list to be sorted.
2. Repeat while cursor is before the last element:
   2.1 Search for the smallest element in the list starting from cursor. Let it be at location target.
   2.2 Exchange the elements at cursor and target.
   2.3 Advance cursor to point to the next element.
Sort Analysis
Let el be the list and n the number of elements.
cursor = 0; target = 0;
while (cursor < n-1)
   2.1.1 target = cursor;
   2.1.2 for (i = cursor+1; i < n; i++)
            if (el[i] < el[target]) target = i;
   2.2 exchange(el[target], el[cursor]); // takes 3 assignments
   2.3 cursor++;
Operation count:
$(n-1) \cdot 4 + 2\,(n + (n-1) + (n-2) + \dots + 1) = 4n - 4 + 2 \cdot \frac{n(n+1)}{2} = 4n - 4 + n^2 + n = n^2 + 5n - 4$
This has the form $An^2 + Bn + C$ with $A = 1$, $B = 5$, $C = -4$. For large n, drop the constant, the lower-order term in n, and the multiplicative constant to get the big-O bound $O(n^2)$: a quadratic sort.
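A minimal Java sketch of the selection sort analyzed above, assuming the list is an int array sorted in place into ascending order. The class name SelectionSort is illustrative, not taken from the textbook.

```java
// Sketch of selection sort, assuming an int array sorted in place (ascending).
class SelectionSort {
    static void selectionSort(int[] el) {
        int n = el.length;
        // Stop once cursor reaches the last element: it is then already in place.
        for (int cursor = 0; cursor < n - 1; cursor++) {
            // 2.1 find the index of the smallest element in el[cursor..n-1]
            int target = cursor;
            for (int i = cursor + 1; i < n; i++) {
                if (el[i] < el[target]) {
                    target = i;
                }
            }
            // 2.2 exchange el[target] and el[cursor] (3 assignments)
            int temp = el[target];
            el[target] = el[cursor];
            el[cursor] = temp;
        }
    }
}
```

Calling selectionSort on {10, 8, 16, 2, 6, 4} passes through the same states as the example trace and leaves {2, 4, 6, 8, 10, 16}.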
Insertion Sort: Example
Unsorted array: 10 8 16 2 6 4
10 (trivially sorted)
8 10 (insert 8)
8 10 16 (insert 16)
2 8 10 16 (insert 2)
2 6 8 10 16 (insert 6)
2 4 6 8 10 16 (insert 4)
Insertion Sort Pseudo Code
1. A single element is trivially sorted; start with the first element.
2. Repeat for the second through nth elements of the list:
   2.1 cursor = next location; save temp = list[cursor].
   2.2 Find the location at which to insert temp by comparing it with the elements to its left and shifting larger ones one place right; let that location be target.
   2.3 Insert: list[target] = temp.
Insertion Sort Analysis
cursor = 0;
while (cursor < n-1)
   2.1 cursor = cursor + 1;
   2.2.1 temp = list[cursor]; // save element to be inserted
   2.2.2 j = cursor; // find location
   2.2.3 while (j > 0 && list[j-1] > temp)
            list[j] = list[j-1]; // shift right
            j = j - 1;
         // assert: location found or hit left end (j = 0)
   2.3 list[j] = temp;
Worst case: O(n^2), quadratic (for example, when the list is in reverse order).
Best case: O(n), linear (when the list is already sorted).
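The same steps in Java, as a sketch assuming an int array sorted in place; InsertionSort is a hypothetical class name.

```java
// Sketch of insertion sort, assuming an int array sorted in place (ascending).
class InsertionSort {
    static void insertionSort(int[] list) {
        for (int cursor = 1; cursor < list.length; cursor++) {
            int temp = list[cursor];          // save the element to be inserted
            int j = cursor;
            // shift larger elements one place to the right
            while (j > 0 && list[j - 1] > temp) {
                list[j] = list[j - 1];
                j = j - 1;
            }
            // assert: location found or hit the left end (j == 0)
            list[j] = temp;
        }
    }
}
```

On an already sorted array the inner while loop exits immediately on every pass, which is where the linear best case comes from.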
Merge Sort
Divide and conquer:
Divide: split the list S into two halves, S1 and S2.
Recurse: sort S1 and S2 recursively by merge sort.
Conquer: merge the sorted S1 and S2 into one sorted list.
Merge sort is an O(n log n) algorithm.
Merge Sort (S)
mergeSort(S):
if S.size() > 1
   1. (S1, S2) ← partition(S, n/2)
   2. mergeSort(S1);
   3. mergeSort(S2);
   4. S ← merge(S1, S2);
Let's look at some examples.
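A sketch of mergeSort(S) in Java, assuming S is an int array. Here partition(S, n/2) is modelled with Arrays.copyOfRange, and a helper merge writes the merged result back into S; the class and helper names are illustrative.

```java
import java.util.Arrays;

// Sketch of recursive merge sort on an int array (ascending).
class MergeSort {
    static void mergeSort(int[] s) {
        if (s.length > 1) {
            int half = s.length / 2;
            int[] s1 = Arrays.copyOfRange(s, 0, half);         // 1. partition(S, n/2)
            int[] s2 = Arrays.copyOfRange(s, half, s.length);
            mergeSort(s1);                                     // 2.
            mergeSort(s2);                                     // 3.
            merge(s1, s2, s);                                  // 4. S <- merge(S1, S2)
        }
    }

    // Merge sorted s1 and s2 into s, where s.length == s1.length + s2.length.
    private static void merge(int[] s1, int[] s2, int[] s) {
        int i = 0, j = 0, k = 0;
        while (i < s1.length && j < s2.length) {
            // take from s1 on ties so equal keys keep their original order
            s[k++] = (s1[i] <= s2[j]) ? s1[i++] : s2[j++];
        }
        while (i < s1.length) s[k++] = s1[i++];               // copy leftovers of s1
        while (j < s2.length) s[k++] = s2[j++];               // copy leftovers of s2
    }
}
```

With the input 10 8 16 2 6 4, the top-level split gives s1 = 10 8 16 and s2 = 2 6 4. Using <= in the helper (rather than <) makes the sort stable.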
Example: partition
S = 10 8 16 2 6 4
S1 = 10 8 16    S2 = 2 6 4
S11 = 10    S12 = 8 16    S21 = 2    S22 = 6 4
S121 = 8    S122 = 16    S221 = 6    S222 = 4
Example: merge (bottom up)
S121 = 8    S122 = 16    S221 = 6    S222 = 4
S11 = 10    S12 = 8 16    S21 = 2    S22 = 4 6
S1 = 8 10 16    S2 = 2 4 6
S = 2 4 6 8 10 16
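Algorithm list merge(s1, s2)
1. s ← empty list
2. while (!s1.empty() && !s2.empty()) // Assert: both lists non-empty
   2.1 if (s1.firstElem() < s2.firstElem())
          s.insertLast(s1.removeFirst());
       else
          s.insertLast(s2.removeFirst());
3. // Assert: s1 is empty or s2 is empty
   3.1 while (!s1.empty()) s.insertLast(s1.removeFirst());
   3.2 while (!s2.empty()) s.insertLast(s2.removeFirst());
4. return s;
Cost, with n = |s1| + |s2|: if the loop in step 2 runs k times (about 2k operations), step 3 moves the remaining n - k elements, so the total is $2k + (n - k) = n + k \le n + cn = (c+1)n$, which is O(n), where k ≤ cn for a constant c (c = 1 suffices).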
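The merge algorithm maps closely onto java.util.ArrayDeque. This sketch assumes Integer elements and translates firstElem() to peekFirst(), removeFirst() to removeFirst(), and insertLast() to addLast(); QueueMerge is a hypothetical class name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of merge(s1, s2) using deques as the lists; s1 and s2 are emptied.
class QueueMerge {
    static Deque<Integer> merge(Deque<Integer> s1, Deque<Integer> s2) {
        Deque<Integer> s = new ArrayDeque<>();                // 1. empty list
        while (!s1.isEmpty() && !s2.isEmpty()) {              // 2. both non-empty
            if (s1.peekFirst() < s2.peekFirst()) {
                s.addLast(s1.removeFirst());
            } else {
                s.addLast(s2.removeFirst());
            }
        }
        // 3. at most one list still has elements; copy them over in order
        while (!s1.isEmpty()) s.addLast(s1.removeFirst());
        while (!s2.isEmpty()) s.addLast(s2.removeFirst());
        return s;                                             // 4.
    }
}
```

Each element is moved exactly once, so the work is proportional to |s1| + |s2|, matching the O(n) bound.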
Analysis of merge sort
The running time is the time spent merging the nodes at each level of the merge-sort tree.
Number of levels: $1 + \lceil \log_2 n \rceil$
Since the time spent at each level is O(n), we have the following result: algorithm mergeSort sorts a list of size n in O(n log n) time in the worst case.
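Written as a sum over the levels, assuming the merging at each level costs at most cn for some constant c:

```latex
T(n) \le \sum_{i=0}^{\lceil \log_2 n \rceil} c\,n = c\,n\,\bigl(1 + \lceil \log_2 n \rceil\bigr) = O(n \log n)
```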
Quick Sort
A recursive, divide-and-conquer sort.
Divide: select an element called the pivot (typically the last or first element) and partition the list S into three lists:
   L: elements in S less than the pivot
   E: elements in S equal to the pivot (a single element for a list of distinct elements)
   G: elements in S greater than the pivot
Recurse: recursively quicksort the lists L and G.
Conquer: form the sorted list by concatenating L, E, and G.
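A direct, not-in-place Java sketch of the L/E/G description, assuming Integer lists and the last element as the pivot; ListQuickSort is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of quicksort via three lists (L, E, G); returns a new sorted list.
class ListQuickSort {
    static List<Integer> quickSort(List<Integer> s) {
        if (s.size() <= 1) {
            return new ArrayList<>(s);
        }
        int pivot = s.get(s.size() - 1);       // last element as the pivot
        List<Integer> l = new ArrayList<>();   // L: less than pivot
        List<Integer> e = new ArrayList<>();   // E: equal to pivot
        List<Integer> g = new ArrayList<>();   // G: greater than pivot
        for (int x : s) {
            if (x < pivot) l.add(x);
            else if (x == pivot) e.add(x);
            else g.add(x);
        }
        // Recurse on L and G, then concatenate L, E, G.
        List<Integer> result = new ArrayList<>(quickSort(l));
        result.addAll(e);
        result.addAll(quickSort(g));
        return result;
    }
}
```

For example, quickSort(List.of(10, 8, 2, 6, 16, 4)) returns [2, 4, 6, 8, 10, 16]. Building new lists at every level costs extra memory, which the in-place version later in these notes avoids.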
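Quicksort Example: partition around the pivot (last element)
10 8 2 6 16 4: pivot 4, L = 2, G = 10 8 6 16
10 8 6 16: pivot 16, L = 10 8 6, G = null
10 8 6: pivot 6, L = null, G = 10 8
10 8: pivot 8, L = null, G = 10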
Quicksort Example: concatenate {L}{pivot}{G}, bottom up
{null}{8}{10} = 8 10
{null}{6}{8 10} = 6 8 10
{6 8 10}{16}{null} = 6 8 10 16
{2}{4}{6 8 10 16} = 2 4 6 8 10 16
Quicksort
Worst case: when the list is already sorted (with the first or last element as the pivot). Let $S_i$ be the sum of the sizes of all the nodes to be sorted at level i. In the worst case the number of levels is n:
$S_0 = n$, $S_1 = n - 1$ (every element skews to one side), $S_2 = n - 2$, and so on.
Worst-case running time: $O(n + (n-1) + (n-2) + \dots + 1) = O(n^2)$
Best case: O(n log n).
Heap: Definition
A heap is a loosely ordered complete binary tree: a complete binary tree with values stored in its nodes such that no child has a value greater than the value of its parent. Recursively, a heap is a complete binary tree:
1. that is empty, or
2a. whose root contains a search key greater than or equal to the keys in both of its children, and
2b. whose left subtree and right subtree are heaps.
Types of heaps
Heaps can be “max” heaps or “min” heaps. The definition above is for a “max” heap. In a max-heap the root's value is greater than or equal to the values in its left and right children. In a min-heap the root's value is less than or equal to the values in its left and right children.
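A small Java check of the two heap orders, assuming the complete binary tree is stored level by level in an int array with the root at index 0 (the layout described under Heap Sort in place). HeapCheck is a hypothetical name.

```java
// Sketch: check max-heap / min-heap order for a level-order int array.
// Node j (j > 0) has its parent at (j - 1) / 2.
class HeapCheck {
    static boolean isMaxHeap(int[] a) {
        for (int j = 1; j < a.length; j++) {
            if (a[j] > a[(j - 1) / 2]) return false;   // child larger than parent
        }
        return true;
    }

    static boolean isMinHeap(int[] a) {
        for (int j = 1; j < a.length; j++) {
            if (a[j] < a[(j - 1) / 2]) return false;   // child smaller than parent
        }
        return true;
    }
}
```

For instance, 10 9 6 3 2 5 passes isMaxHeap, while the unheaped data 6 3 5 9 2 10 fails it.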
ADT Heap
createHeap()
destroyHeap()
empty()
heapInsert(Object newItem)
Object heapDelete() // always the root
Example
Consider the data set: 6 3 5 9 2 10
1. Implement it as a complete binary tree.
2. Heap the left subtree.
3. Heap the right subtree.
4. Heap the root, the left node, and the right node of the root.
Note: when heaping a “max” heap, choose the larger of the two children to move up.
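A Java sketch of steps 1 to 4, assuming the array layout with the root at index 0 and the children of j at 2j+1 and 2j+2. Working from the last internal node back to the root heaps every subtree before its parent, the bottom-up form of steps 2 to 4. BuildMaxHeap and siftDown are illustrative names.

```java
// Sketch: build a max heap in place from a level-order int array.
class BuildMaxHeap {
    static void buildMaxHeap(int[] a) {
        // last internal node is at a.length/2 - 1; leaves are already heaps
        for (int j = a.length / 2 - 1; j >= 0; j--) {
            siftDown(a, j, a.length);
        }
    }

    // Move a[j] down within a[0..n-1], always swapping with the larger child.
    static void siftDown(int[] a, int j, int n) {
        while (2 * j + 1 < n) {
            int child = 2 * j + 1;                            // left child
            if (child + 1 < n && a[child + 1] > a[child]) {
                child++;                                      // right child is larger
            }
            if (a[j] >= a[child]) return;                     // heap order holds
            int temp = a[j]; a[j] = a[child]; a[child] = temp;
            j = child;
        }
    }
}
```

Applied to 6 3 5 9 2 10, it produces 10 9 6 3 2 5 in level order.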
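Example 1 (trees listed in level order)
Step 1, complete binary tree: 6 3 5 9 2 10
Step 2, heap the left subtree: 6 9 5 3 2 10
Step 3, heap the right subtree: 6 9 10 3 2 5
Step 4, heap the root: 10 9 6 3 2 5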
Delete Root Item
In a max heap the root item is the largest and is the one chosen for deletion.
1. After deletion of the root, two disjoint heaps result.
2. Place the last node at the root to form a semi-heap.
3. Use trickle-down to form a heap.
Running time: 3*log N + 1 = O(log N)
Consider the heap example discussed above and delete its root item.
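A Java sketch of deleting the root, assuming a non-empty max heap stored in an int array with the root at index 0; it returns a new array one element shorter. HeapDeleteRoot is a hypothetical name.

```java
import java.util.Arrays;

// Sketch: delete the root of a max heap and trickle the new root down.
class HeapDeleteRoot {
    static int[] deleteRoot(int[] heap) {
        if (heap.length == 0) throw new IllegalArgumentException("empty heap");
        int n = heap.length - 1;                 // size after deletion
        int[] a = Arrays.copyOf(heap, n);        // keep heap[0..n-1]
        if (n == 0) return a;
        a[0] = heap[n];                          // last node becomes root: semi-heap
        int j = 0;
        // trickle down: swap with the larger child until the heap order holds
        while (2 * j + 1 < n) {
            int child = 2 * j + 1;
            if (child + 1 < n && a[child + 1] > a[child]) child++;
            if (a[j] >= a[child]) break;
            int temp = a[j]; a[j] = a[child]; a[child] = temp;
            j = child;
        }
        return a;
    }
}
```

Deleting the root of 10 9 6 3 2 5 gives 9 5 6 3 2: the last node 5 becomes the root and trickles down past 9. The loop runs at most once per level, hence O(log N).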
Insert An Item
1. Insert the new item as the last node.
2. Trickle up (repeating level by level) to form a heap.
Consider inserting 15 into the heap from the “delete” example. Insert is also an O(log N) operation.
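The insert steps as a Java sketch, under the same assumptions (max heap in an int array, root at index 0, parent of j at (j - 1) / 2); HeapInsert is a hypothetical name.

```java
import java.util.Arrays;

// Sketch: insert into a max heap as the last node, then trickle up.
class HeapInsert {
    static int[] insert(int[] heap, int newItem) {
        int[] a = Arrays.copyOf(heap, heap.length + 1);
        int j = heap.length;
        a[j] = newItem;                                   // 1. insert as last node
        // 2. trickle up while the new item is larger than its parent
        while (j > 0 && a[j] > a[(j - 1) / 2]) {
            int parent = (j - 1) / 2;
            int temp = a[j]; a[j] = a[parent]; a[parent] = temp;
            j = parent;
        }
        return a;
    }
}
```

Inserting 15 into 9 5 6 3 2 gives 15 5 9 3 2 6: 15 is exchanged first with 6 and then with 9.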
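Insert Node: Example (heaps listed in level order)
Heap after deleting the root: 9 5 6 3 2
1. Insert 15 as the last node: 9 5 6 3 2 15
2. Trickle up: 15 > 6, exchange: 9 5 15 3 2 6
   Repeat: 15 > 9, exchange: 15 5 9 3 2 6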
Priority Queue
A priority queue can be implemented as a heap. Inserts are placed according to some criterion known as the “priority”, and deletes remove the head of the queue.
Operations: PQueue constructor, PQueueInsert, PQueueDelete
Data: Heap pQ;
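One way to sketch the PQueue in Java, assuming int items where a larger value means higher priority. It wraps java.util.PriorityQueue, which the JDK documents as based on a priority heap, and uses Collections.reverseOrder() so the largest item is at the head. The method names follow the slide; the class itself is illustrative.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Sketch of a max-priority queue backed by a binary heap.
class PQueue {
    private final PriorityQueue<Integer> pQ =
            new PriorityQueue<>(Collections.reverseOrder());

    void pQueueInsert(int item) {
        pQ.add(item);            // placed according to priority, O(log n)
    }

    int pQueueDelete() {
        return pQ.remove();      // removes the head (highest priority), O(log n)
    }

    boolean isEmpty() {
        return pQ.isEmpty();
    }
}
```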
Heap Sort
Heapsort:
1. Represent the elements as a min-heap.
2. Delete the item at the root and add it to the result, as it is the smallest.
3. Re-heap the remaining items and repeat step 2 until one item is left.
4. Delete the last item and add it to the result.
O(N log N) in the worst case!
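The four steps, sketched in Java with java.util.PriorityQueue (a min-heap by default) standing in for the min-heap; this substitution is an assumption, since the homework asks for your own heap class. PQHeapSort is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of heapsort via a min-heap: repeatedly remove the smallest item.
class PQHeapSort {
    static List<Integer> heapSort(List<Integer> items) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>(items); // 1. min-heap
        List<Integer> result = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            result.add(minHeap.poll());  // 2-4. delete root (smallest), append
        }
        return result;
    }
}
```

Steps 3 and 4 collapse into one loop here, because poll() re-heaps after each removal and simply returns the last item when one is left.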
Heap sort: Example
Input: 10 8 16 2 6 4
Build a min-heap; its root is 2, the smallest item.
Let's do the rest by hand.
Heap Sort in place
Since a heap is a complete binary tree, its nodes can be stored in contiguous storage such as an array. Assuming the root is at index 0:
Parent of node(j) is node((j-1)/2), for j > 0 (integer division)
Left child of node(j) is node(2j+1), if 2j+1 < n
Right child of node(j) is node(2(j+1)), if 2(j+1) < n
Adding a last leaf is equivalent to adding an element as the last element of the contiguous storage.
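An in-place Java sketch using the index formulas above. One plain change from the min-heap description: to sort ascending within the same array, it builds a max-heap and exchanges each deleted root into the slot just past the shrinking heap. InPlaceHeapSort is a hypothetical name.

```java
// Sketch of in-place heap sort on an int array (ascending), using a max-heap.
class InPlaceHeapSort {
    static void heapSort(int[] a) {
        int n = a.length;
        for (int j = n / 2 - 1; j >= 0; j--) {
            siftDown(a, j, n);                              // build the max-heap
        }
        for (int end = n - 1; end > 0; end--) {
            int temp = a[0]; a[0] = a[end]; a[end] = temp;  // move max behind the heap
            siftDown(a, 0, end);                            // re-heap a[0..end-1]
        }
    }

    // Move a[j] down within a[0..n-1], swapping with the larger child.
    private static void siftDown(int[] a, int j, int n) {
        while (2 * j + 1 < n) {
            int child = 2 * j + 1;
            if (child + 1 < n && a[child + 1] > a[child]) child++;
            if (a[j] >= a[child]) return;
            int temp = a[j]; a[j] = a[child]; a[child] = temp;
            j = child;
        }
    }
}
```

No extra array is needed: the heap occupies a[0..end] and the sorted tail grows from the right.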
Priority Queue (PQ)
A queue that maintains its elements ordered by some priority. The element at the front is the one removed; a new element is added as the last element.
Linear-list PQ: insert is O(n), since we have to search through the list to find the right place. Strict ordering.
Heap PQ: insert is O(log n), since we insert the element as a leaf and call siftUp(), which is O(log n). Loose ordering.
Heap and Priority Queue
[Class diagram: "implements" and "extends" relationships between the heap and priority queue classes]
Heap and Priority Queue (code)
The heap will also have support methods such as root(), parent(), rightChild(), and leftChild().
Homework: to reinforce the concepts, implement the heap, then implement the PQ using the class diagram given. Adapt the code in your textbook, pp. 312-313.
Quicksort in place
quickSort(T, left, right)
1. if left < right
   1.1 pivot = partition(T, left, right); // assert: T is partially sorted during partition
   1.2 quickSort(T, left, pivot - 1);
   1.3 quickSort(T, pivot + 1, right);
2. return;
Partition(T, first, last)
1. Initialize: pivot = T(last); i = first; j = last - 1;
2. while (i <= j)
   2.1 Increase i: while ((i <= j) && T(i) <= pivot) i = i + 1;
   2.2 Decrease j: while ((j >= i) && T(j) > pivot) j = j - 1;
   2.3 Exchange: if (i < j) exchange(T(i), T(j));
3. Done: exchange(T(i), T(last));
   return i; // i is the pivot position between the partitions
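A Java sketch of quickSort(T, left, right) with a partition that follows the i/j scanning scheme above, assuming an int array and the last element of each range as the pivot; InPlaceQuickSort is a hypothetical name.

```java
// Sketch of in-place quicksort with last-element pivot partitioning.
class InPlaceQuickSort {
    static void quickSort(int[] t, int left, int right) {
        if (left < right) {
            int pivot = partition(t, left, right);
            quickSort(t, left, pivot - 1);
            quickSort(t, pivot + 1, right);
        }
    }

    static int partition(int[] t, int first, int last) {
        int pivot = t[last];
        int i = first, j = last - 1;
        while (i <= j) {
            while (i <= j && t[i] <= pivot) i++;   // 2.1 increase i
            while (j >= i && t[j] > pivot) j--;    // 2.2 decrease j
            if (i < j) swap(t, i, j);              // 2.3 exchange out-of-place pair
        }
        swap(t, i, last);                          // 3. pivot into its final place
        return i;
    }

    private static void swap(int[] t, int a, int b) {
        int temp = t[a]; t[a] = t[b]; t[b] = temp;
    }
}
```

On 85 24 63 45 17 31 96 50, partition(t, 0, 7) returns 4 and leaves 31 24 17 45 50 85 96 63. The final exchange happens only after the scans have crossed (i > j), so the pivot lands between the two partitions even when every other element is smaller than the pivot.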
Example
Partition 85 24 63 45 17 31 96 50 with pivot 50 (the last element).
Result: 31 24 17 45 50 85 96 63; partition returns i = 4.
Comparison
Small data set: selection sort
Insert into an already sorted list: insertion sort
Medium-size data set to be sorted in place: quick sort
Very large data set on disk/tape: merge sort
Summary of Sorting Algorithms
Algorithm        Time          Notes
selection-sort   O(n^2)        slow; in-place; for small data sets (< 1K)
insertion-sort   O(n^2)        slow; in-place; for small data sets (< 1K)
heap-sort        O(n log n)    fast; in-place; for large data sets (1K to 1M)
merge-sort       O(n log n)    fast; sequential data access; for huge data sets (> 1M)
Source: Goodrich and Tamassia
Search
Linear search: for a small set of data; the data can be unsorted.
Binary search: efficient for a large list of data. Precondition: the data is sorted.
Linear Search
int linearSearch(array, value)
   for (i = 0; i < n; i++)
      if array[i] == value return i;
   end
   return -1;
Analysis: we go through the loop n times in the worst case: O(n).
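The same loop in Java, as a sketch assuming an int array; LinearSearch is a hypothetical class name.

```java
// Sketch of linear search: index of the first match, or -1 if absent.
class LinearSearch {
    static int linearSearch(int[] array, int value) {
        for (int i = 0; i < array.length; i++) {
            if (array[i] == value) {
                return i;
            }
        }
        return -1;   // worst case: all n elements examined, O(n)
    }
}
```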
Binary Search (sorted data)
int binarySearch(List<Integer> list, int target, int first, int last) {
    if (first > last) return -1;              // not found
    int mid = first + (last - first) / 2;     // same as (first + last) / 2, without overflow
    if (list.get(mid).equals(target)) return mid;
    if (list.get(mid) < target)
        return binarySearch(list, target, mid + 1, last);   // search the right half
    return binarySearch(list, target, first, mid - 1);      // search the left half
}
Binary Search
Divide and conquer: O(log n)
Demo from Princeton University
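The O(log n) bound follows from a recurrence, assuming each call does a constant amount of work c and n is a power of 2:

```latex
T(n) = T(n/2) + c = T(n/4) + 2c = \cdots = T(1) + c \log_2 n = O(\log n)
```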
Comparing Linear and Binary Search
16 elements: 69 44 87 38 75 73 67 56 47 53 50 34 36 39 76 97
[Figure: linear search probes the elements one by one (1, 2, 3, ...), while binary search finds the target within about 4 probes.]
Binary Search: divide and conquer
Sorted list (indices 0 to 15): 34 36 38 39 44 47 50 53 56 67 69 73 75 76 87 97
Each probe halves the range still to be searched (here, searching for 97):
56 67 69 73 75 76 87 97
75 76 87 97
87 97
97
The number of halvings is about log2 n.
More on Binary Search
Double the size to 32.
Consider larger values of n: build a tree and check it out.
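To check larger n without drawing the tree, this Java sketch counts how many middle elements binary search examines in the worst case. It assumes sorted int arrays holding 0, 2, 4, ..., so every odd target is absent; BinarySearchProbes and its methods are hypothetical names.

```java
// Sketch: count binary search probes and find the worst case for size n.
class BinarySearchProbes {
    // Number of middle elements examined before finding target or giving up.
    static int probes(int[] a, int target) {
        int first = 0, last = a.length - 1, count = 0;
        while (first <= last) {
            int mid = first + (last - first) / 2;
            count++;
            if (a[mid] == target) return count;
            if (a[mid] < target) first = mid + 1;
            else last = mid - 1;
        }
        return count;
    }

    static int worstCase(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = 2 * i;    // sorted, even values only
        int worst = 0;
        // try every present value and every gap, including both ends
        for (int target = -1; target <= 2 * n; target++) {
            worst = Math.max(worst, probes(a, target));
        }
        return worst;
    }
}
```

worstCase(16) returns 5 and worstCase(32) returns 6: doubling n adds one probe, in line with floor(log2 n) + 1.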
Summary
We studied:
Sorting methods and search methods
Use of recursion
Analysis of algorithms
Divide and conquer method
Your homework is to look into the Java implementation of the heap/priority queue.