CSE 326: Data Structures: Sorting Lecture 13: Wednesday, Feb 5, 2003
Today: finish extensible hash tables; start sorting. Read Chapter 7 (will take several lectures), except Shellsort (7.4).
Hash Tables on Secondary Storage (Disks) Main differences: one bucket = one block, hence a bucket may hold multiple keys. Separate chaining: use overflow blocks when needed. Open addressing (probing) is never used on disk.
Hash Table Example Assume 1 bucket (block) stores 2 keys + pointers. h(e)=0, h(b)=h(f)=1, h(g)=2, h(a)=h(c)=3. [Diagram: bucket 0 holds e; bucket 1 holds b, f; bucket 2 holds g; bucket 3 holds a, c]
Searching in a Hash Table Search for a: compute h(a)=3, read bucket 3: one disk access.
Insertion in Hash Table Place the key in the right bucket, if there is space. E.g. h(d)=2: bucket 2 now holds g, d.
Insertion in Hash Table Create an overflow block if there is no space. E.g. h(k)=1: bucket 1 (b, f) is full, so k goes into an overflow block chained to it. More overflow blocks may be needed.
Hash Table Performance Excellent if there are no overflow blocks. Degrades considerably when the number of keys exceeds the number of buckets (i.e. many overflow blocks).
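The bucket-and-overflow scheme above can be sketched in Python. This is an illustrative in-memory model, not disk code: Block, DiskHashTable, and BLOCK_SIZE are made-up names, and "disk accesses" are simply counted block visits.

```python
BLOCK_SIZE = 2  # keys per block, as in the lecture example

class Block:
    def __init__(self):
        self.keys = []
        self.overflow = None  # next overflow block in the chain, if any

class DiskHashTable:
    def __init__(self, n_buckets, h):
        self.buckets = [Block() for _ in range(n_buckets)]
        self.h = h

    def insert(self, key):
        block = self.buckets[self.h(key)]
        while len(block.keys) == BLOCK_SIZE:   # walk chain to a block with space
            if block.overflow is None:
                block.overflow = Block()       # allocate a new overflow block
            block = block.overflow
        block.keys.append(key)

    def lookup(self, key):
        # returns (found, number of "disk accesses", i.e. blocks read)
        block = self.buckets[self.h(key)]
        reads = 0
        while block is not None:
            reads += 1
            if key in block.keys:
                return True, reads
            block = block.overflow
        return False, reads
```

With no overflow blocks every lookup is exactly one block read; once keys outnumber bucket slots, chains grow and so does the read count.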
Extensible Hash Table Allows the hash table to grow, to avoid performance degradation. Assume a hash function h that returns numbers in {0, …, 2^k − 1}. Start with n = 2^i << 2^k buckets; only look at the first i most significant bits.
Extensible Hash Table E.g. i=1, n=2^i=2, k=4. Note: we only look at the first bit (0 or 1). [Diagram: directory with i=1; entry 0 points to a block holding 0(010); entry 1 points to a block holding 1(011); each block has local depth 1]
Insertion in Extensible Hash Table Insert 1110: bucket 1 has space, so it now holds 1(011) and 1(110). [Diagram: directory unchanged, i=1]
Insertion in Extensible Hash Table Now insert 1010. Bucket 1 is full: we need to extend the table and split blocks; i becomes 2. [Diagram: bucket 1 would have to hold 1(011), 1(110), 1(010)]
Insertion in Extensible Hash Table [Diagram: after doubling, i=2; directory entries 00 and 01 both point to the block holding 0(010) (local depth 1); entry 10 points to the block holding 10(11), 10(10) (local depth 2); entry 11 points to the block holding 11(10) (local depth 2)]
Insertion in Extensible Hash Table Now insert 0000, then 0101. The shared block fills up on 0101 and needs to split; i stays 2. [Diagram: i=2; entries 00 and 01 share the block holding 0(010), 0(000) (local depth 1); entry 10 → 10(11), 10(10); entry 11 → 11(10)]
Insertion in Extensible Hash Table After splitting the block: [Diagram: i=2; entry 00 → block holding 00(10), 00(00) (local depth 2); entry 01 → block holding 01(01) (local depth 2); entry 10 → 10(11), 10(10); entry 11 → 11(10)]
Extensible Hash Table How many buckets (blocks) do we need to touch after an insertion? How many entries in the hash table (directory) do we need to touch? Only one block: the one that overflowed. But when the directory doubles, we need to copy all hash table entries from the old table to the new table.
Performance of Extensible Hash Tables No overflow blocks: access is always O(1). More precisely: exactly one disk I/O. BUT: extensions can be costly and disruptive, and after an extension the table may no longer fit in memory.
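A minimal in-memory sketch of the extensible hashing scheme, following the slides' example (block capacity 2, k=4 hash bits; EBlock, ExtensibleHashTable, and the field names are illustrative). The rare case where a split fails to separate the keys is noted in a comment but not handled.

```python
BLOCK_CAPACITY = 2   # keys per block, as in the example
K = 4                # h returns k-bit values

class EBlock:
    def __init__(self, local_i):
        self.local_i = local_i   # number of leading bits this block uses
        self.keys = []

class ExtensibleHashTable:
    def __init__(self):
        self.i = 1
        self.dir = [EBlock(1), EBlock(1)]   # start with n = 2^i = 2 buckets

    def _index(self, hval):
        return hval >> (K - self.i)         # first i bits of the hash value

    def lookup(self, hval):
        return hval in self.dir[self._index(hval)].keys

    def insert(self, hval):
        block = self.dir[self._index(hval)]
        if len(block.keys) < BLOCK_CAPACITY:
            block.keys.append(hval)
            return
        if block.local_i == self.i:
            # directory is too coarse: double it (copy every entry), i += 1
            self.dir = [b for b in self.dir for _ in range(2)]
            self.i += 1
        # split the overflowing block on one more bit
        depth = block.local_i + 1
        new0, new1 = EBlock(depth), EBlock(depth)
        for hv in block.keys + [hval]:
            bit = (hv >> (K - depth)) & 1
            (new1 if bit else new0).keys.append(hv)
        # repoint every directory entry that pointed at the old block
        for idx in range(len(self.dir)):
            if self.dir[idx] is block:
                bit = (idx >> (self.i - depth)) & 1
                self.dir[idx] = new1 if bit else new0
        # (if all keys land in one half, a real implementation splits again)
```

Note that doubling the directory copies every entry, while only the one overflowing block is rewritten, matching the question on the previous slide.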
Sorting Perhaps the most common operation in programs The authoritative text: D. Knuth, The Art of Computer Programming, Vol. 3
Material to be Covered Sorting by comparison: Bubble Sort, Selection Sort, Merge Sort, QuickSort. Efficient list-based implementations. Formal analysis. Theoretical limitations on sorting by comparison. Sorting without comparing elements. Sorting and the memory hierarchy.
Bubble Sort Idea We want A[1] ≤ A[2] ≤ … ≤ A[N]. Bubble sort idea: if A[i-1] > A[i] then swap A[i-1] and A[i]. Do this for i = 1, …, N-1. Repeat until the array is sorted.
Bubble Sort
procedure BubbleSort (Array A, int N)
  repeat {
    isSorted = true;
    for (i = 1 to N-1) {
      if ( A[i-1] > A[i] ) {
        swap( A[i-1], A[i] );
        isSorted = false;
      }
    }
  } until isSorted
Bubble Sort Improvements After the 1st iteration: the largest element is in A[N-1]. After the 2nd iteration: the second largest element is in A[N-2]. Question: what is the maximum number of iterations, and hence the worst-case running time? Improvement: stop the iterations earlier: for (i=1 to N-1), then for (i=1 to N-2), ..., then for (i=1 to 1). In fact we may be lucky and be able to decrease i more aggressively.
Bubble Sort
procedure BubbleSort (Array A, int N)
  m = N;
  while (m > 1) {
    newM = 1;
    for (i = 1 to m-1) {
      if ( A[i-1] > A[i] ) {
        swap( A[i-1], A[i] );
        newM = i;   /* everything from position i onward is already in place */
      }
    }
    m = newM;
  }
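The improved bubble sort above translates into a runnable Python sketch: the scan bound m shrinks to the position of the last swap, so an already-sorted tail is never re-scanned, and a sorted input finishes in one pass.

```python
def bubble_sort(a):
    """Bubble sort with the last-swap optimization; sorts a in place."""
    m = len(a)
    while m > 1:
        new_m = 1
        for i in range(1, m):
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]   # swap out-of-order neighbors
                new_m = i          # elements from i onward are in final position
        m = new_m                  # next pass only scans the unsorted prefix
    return a
```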
Bubble Sort So the worst-case running time is T(n) = O(n²). Is the worst-case running time also Ω(n²)? You need to find a worst-case input of size n for which the running time is proportional to n² (e.g. an array in reverse order).
Selection Sort
procedure SelectSort (Array A, int N)
  for (i = 0 to N-2) {
    /* find the minimum among A[i], ..., A[N-1] and place it in A[i] */
    m = i;
    for (j = i+1 to N-1)
      if ( A[m] > A[j] ) m = j;
    swap( A[i], A[m] );
  }
[Diagram: A[0], ..., A[i-1] finished; find the minimum among A[i], ..., A[N-1] and move it to A[i]]
Selection Sort Worst-case running time: T(n) = O( ?? ). T(n) = Ω( ?? ).
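A Python version of the selection sort above. Note that the inner loop always scans the whole unsorted suffix regardless of the input, which is the key fact for answering both questions: the running time is Θ(n²) on every input.

```python
def selection_sort(a):
    """Selection sort: repeatedly move the minimum of a[i..n-1] to a[i]."""
    n = len(a)
    for i in range(n - 1):
        m = i                        # index of the minimum of a[i..n-1]
        for j in range(i + 1, n):
            if a[m] > a[j]:
                m = j
        a[i], a[m] = a[m], a[i]      # move the minimum into position i
    return a
```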
Insertion Sort
procedure InsertSort (Array A, int N)
  for (i = 1 to N-1) {
    /* A[0], A[1], ..., A[i-1] is sorted */
    /* now insert A[i] in the right place */
    x = A[i];
    for (j = i-1; j >= 0 && A[j] > x; j--)
      A[j+1] = A[j];
    A[j+1] = x;
  }
[Diagram: A[0], ..., A[i-1] sorted, but not necessarily finished; insert A[i] to the left]
Insertion Sort Worst-case running time: T(n) = O( ?? ). T(n) = Ω( ?? ).
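A Python version of the insertion sort above. At step i the inner loop shifts up to i elements, so a reverse-sorted input costs Θ(n²), while an already-sorted input costs only linear time.

```python
def insertion_sort(a):
    """Insertion sort: grow a sorted prefix by inserting a[i] into place."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:   # shift larger elements one slot right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x                 # drop a[i] into the hole
    return a
```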
Merge Sort The Merge Operation: given two sorted sequences A[0] ≤ A[1] ≤ ... ≤ A[m-1] and B[0] ≤ B[1] ≤ ... ≤ B[n-1], construct another sorted sequence that is their union.
Merge (A[0..m-1], B[0..n-1])
  i1 = 0, i2 = 0
  While i1 < m and i2 < n
    If A[i1] < B[i2]
      Next is A[i1]; i1++
    Else
      Next is B[i2]; i2++
    End If
  End While
  Copy the remaining elements of A or B
[Photo: merging cars by key (aggressiveness of driver); most aggressive goes first. Photo from http://www.nrma.com.au/inside-nrma/m-h-m/road-rage.html]
Merge Sort
Function MergeSort (Array A[0..n-1])
  if n <= 1 return A
  return Merge( MergeSort(A[0..n/2-1]), MergeSort(A[n/2..n-1]) )
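The two slides above translate directly into a list-based Python sketch. The merge loop is extended to copy whichever input has leftovers once the other is exhausted.

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i1, i2 = [], 0, 0
    while i1 < len(a) and i2 < len(b):
        if a[i1] < b[i2]:
            out.append(a[i1]); i1 += 1
        else:
            out.append(b[i2]); i2 += 1
    return out + a[i1:] + b[i2:]     # one side may still have leftovers

def merge_sort(a):
    """Split in half, sort each half recursively, merge the results."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```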
Merge Sort Running Time Any difference best / worst case? T(1) = b; T(n) = 2T(n/2) + cn for n > 1.
T(n) = 2T(n/2) + cn
     = 4T(n/4) + cn + cn            (substitute)
     = 8T(n/8) + cn + cn + cn       (substitute)
     = 2^k T(n/2^k) + kcn           (inductive leap)
     = nT(1) + cn log n             (select k = log n)
     = Θ(n log n)                   (simplify)
The general strategy: keep expanding the recurrence until you see a pattern, write the general form, then choose k so that T(n/2^k) reaches the known base case T(1) and simplify. Tip: look for powers or multiples of the numbers that appear in the original equation.
Merge Sort Works great with lists or files. Problem with arrays: we need a scratch array; we cannot sort 'in situ' (in place).
Heap Sort Recall: a heap is a tree where the min is at the root A heap is stored in an array A[1], ..., A[n]
Heap Sort Start with an unsorted array A[1], ..., A[n]. Build a heap: how much time does it take? Get the minimum, store it in an output array; repeat n times. [Diagram: the heap in A shrinks while the sorted output B[0], ..., B[i] grows]
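With Python's heapq module, the two-array version above is a few lines: heapify builds the heap in O(n), and each heappop is the "get minimum" step (the function name is illustrative).

```python
import heapq

def heap_sort_with_output_array(a):
    """Heap sort into a separate output array (min-heap version)."""
    heap = list(a)
    heapq.heapify(heap)          # build-heap: O(n)
    # n delete-mins into the output array: O(n log n)
    return [heapq.heappop(heap) for _ in range(len(a))]
```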
Heap Sort But then we need an extra array ! How can we do it ‘in situ’ ?
Heap Sort Input: unordered array A[1..N]. Build a max heap (largest element is A[1]). For i = 1 to N-1: A[N-i+1] = Delete_Max(). [Diagram: successive array states as the heap shrinks from the front and the sorted suffix grows from the back]
Properties of Heap Sort Worst-case time complexity O(n log n): Build_heap is O(n), and the n Delete_Max's cost O(n log n). In-place sort: only constant storage beyond the array is needed.
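An in-place heapsort sketch matching the slide: build a max heap, then repeatedly swap the max into its final slot and restore the heap. The sift-down (percolate-down) helper is written out here since the slides assume it rather than give it; _sift_down and heap_sort are illustrative names, and 0-based indexing is used instead of the slide's A[1..N].

```python
def _sift_down(a, start, end):
    """Restore the max-heap property for the subtree rooted at start,
    considering only a[0..end-1]."""
    root = start
    while 2 * root + 1 < end:
        child = 2 * root + 1
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1                      # pick the larger child
        if a[root] >= a[child]:
            return                          # heap property holds
        a[root], a[child] = a[child], a[root]
        root = child

def heap_sort(a):
    """In-place heapsort: O(n) build, then n-1 delete-max steps."""
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):  # build max heap, bottom up
        _sift_down(a, start, n)
    for end in range(n - 1, 0, -1):          # delete-max n-1 times
        a[0], a[end] = a[end], a[0]          # max goes to its final slot
        _sift_down(a, 0, end)                # re-heapify the shrunken prefix
    return a
```

Only the constant space for the two cursors is used beyond the array itself, which is the "in-place" property claimed above.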
QuickSort Pick a "pivot". Divide the list into two lists: one with elements less than or equal to the pivot value, one with elements greater than the pivot. Sort each sub-problem recursively. The answer is the concatenation of the two solutions. [Picture from PhotoDisc.com]
QuickSort: Array-Based Version Pick the pivot: 7, in the array 7 2 8 3 5 9 6. Partition with two cursors (< from the left, > from the right); 2 goes to the less-than side.
QuickSort Partition (cont'd) 6 and 8 swap across the less-than/greater-than boundary: 7 2 6 3 5 9 8. Then 3 and 5 join the less-than side and 9 the greater-than side. Partition done: 7 2 6 3 5 9 8.
QuickSort Partition (cont'd) Put the pivot into its final position: 5 2 6 3 7 9 8. Recursively sort each side: 2 3 5 6 7 8 9.
QuickSort Complexity QuickSort is fast in practice, but has Θ(N²) worst-case complexity. Friday we will see why.
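A short list-based QuickSort sketch of the scheme described above, using the first element as the pivot as in the array example. This naive pivot choice is exactly what triggers the quadratic worst case: on an already-sorted input, every partition puts all remaining elements on one side.

```python
def quick_sort(a):
    """List-based QuickSort with the first element as pivot (naive choice)."""
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    less = [x for x in rest if x <= pivot]      # less-than-or-equal side
    greater = [x for x in rest if x > pivot]    # greater-than side
    return quick_sort(less) + [pivot] + quick_sort(greater)
```

Real implementations partition in place with cursors, as on the previous slides, and pick better pivots (e.g. median-of-three) to avoid the bad case.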