CSE 326: Data Structures: Sorting

Lecture 13: Wednesday, Feb 5, 2003

2 Today Finish extensible hash tables. Sorting: will take several lectures.
Read Chapter 7, except Shellsort (7.4).

3 Hash Tables on Secondary Storage (Disks)
Main differences: One bucket = one block, hence may hold multiple keys. Open chaining: use overflow blocks when needed. Closed chaining is never used.

4 Hash Table Example Assume 1 bucket (block) stores 2 keys + pointers
h(e)=0, h(b)=h(f)=1, h(g)=2, h(a)=h(c)=3
[Figure: buckets 0 to 3 holding {e}, {b, f}, {g}, {a, c}]

5 Searching in a Hash Table
Search for a: compute h(a)=3, then read bucket 3. Cost: 1 disk access.

6 Insertion in Hash Table
Place the key in its bucket if there is space. E.g. h(d)=2: d goes into bucket 2.
[Figure: buckets 0 to 3 now hold {e}, {b, f}, {g, d}, {a, c}]

7 Insertion in Hash Table
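Create an overflow block if there is no space. E.g. h(k)=1: bucket 1 is full, so k goes into a new overflow block. More overflow blocks may be needed.
[Figure: bucket 1 holds {b, f} and links to an overflow block holding {k}]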

8 Hash Table Performance
Excellent, if there are no overflow blocks. Degrades considerably when the number of keys exceeds the number of buckets (i.e. many overflow blocks).
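The block-based table from the last few slides can be sketched as follows. This is a minimal in-memory model, not a disk implementation: the names Block, BlockHashTable and BLOCK_CAPACITY are made up for illustration, and the number of blocks visited stands in for the number of disk accesses. The hash values are the ones from the slides' example.

```python
BLOCK_CAPACITY = 2  # keys per block, as in the slides' example


class Block:
    """One disk block: up to BLOCK_CAPACITY keys plus a link to an overflow block."""

    def __init__(self):
        self.keys = []
        self.overflow = None


class BlockHashTable:
    """Hypothetical model of a hash table whose buckets are disk blocks."""

    def __init__(self, num_buckets, hash_fn):
        self.buckets = [Block() for _ in range(num_buckets)]
        self.hash_fn = hash_fn

    def insert(self, key):
        block = self.buckets[self.hash_fn(key)]
        # Walk the chain until a block with free space is found,
        # creating a new overflow block at the end if necessary.
        while len(block.keys) >= BLOCK_CAPACITY:
            if block.overflow is None:
                block.overflow = Block()
            block = block.overflow
        block.keys.append(key)

    def search(self, key):
        """Return (found, blocks_read); blocks_read models disk accesses."""
        block = self.buckets[self.hash_fn(key)]
        reads = 0
        while block is not None:
            reads += 1
            if key in block.keys:
                return True, reads
            block = block.overflow
        return False, reads


# Hash values taken from the slides' example.
H = {"e": 0, "b": 1, "f": 1, "g": 2, "a": 3, "c": 3, "d": 2, "k": 1}
table = BlockHashTable(4, H.__getitem__)
for key in "ebfgacdk":
    table.insert(key)

print(table.search("a"))  # found in the primary block of bucket 3
print(table.search("k"))  # found only after following one overflow link
```

Searching for k costs one more block read than searching for a, which is the degradation the performance slide refers to.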

9 Extensible Hash Table Allows the hash table to grow, to avoid performance degradation. Assume a hash function h that returns numbers in {0, …, 2^k – 1}. Start with n = 2^i << 2^k buckets, and look only at the i most significant bits.

10 Extensible Hash Table E.g. i=1, n=2^i=2, k=4
Note: we only look at the first bit (0 or 1).
[Figure: i=1; entry 0 points to the block {0(010)}, entry 1 to the block {1(011)}; both blocks labeled 1]

11 Insertion in Extensible Hash Table
[Figure: i=1; entry 0 points to the block {0(010)}, entry 1 to the block {1(011), 1(110)}; both blocks labeled 1]

12 Insertion in Extensible Hash Table
Now insert 1010. Need to extend the table and split blocks; i becomes 2.
[Figure: i=1; block {0(010)} labeled 1; block {1(011), 1(110)} labeled 1, with 1(010) not fitting]

13 Insertion in Extensible Hash Table
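[Figure: i=2; entries 00 and 01 both point to the block {0(010)} labeled 1; entry 10 points to {10(11), 10(10)} labeled 2; entry 11 points to {11(10)} labeled 2]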

14 Insertion in Extensible Hash Table
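Now insert 0000, then 0101. Need to split the block.
[Figure: i=2; entries 00 and 01 point to the block {0(010), 0(000)} labeled 1, with 0(101) not fitting; entry 10 points to {10(11), 10(10)} labeled 2; entry 11 points to {11(10)} labeled 2]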

15 Insertion in Extensible Hash Table
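After splitting the block:
[Figure: i=2; entry 00 points to {00(10), 00(00)} labeled 2; entry 01 to {01(01)} labeled 2; entry 10 to {10(11), 10(10)} labeled 2; entry 11 to {11(10)} labeled 2]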

16 Extensible Hash Table How many buckets (blocks) do we need to touch after an insertion? Only one block: the one that overflowed. How many entries in the hash table do we need to touch after an insertion? When the table is extended, we need to copy all hash table entries from the old table to the new table.

17 Performance Extensible Hash Table
No overflow blocks: access is always O(1). More precisely: exactly one disk I/O. BUT: extensions can be costly and disruptive, and after an extension the table may no longer fit in memory.
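The insertion sequence from the preceding slides can be replayed with a small in-memory sketch. The class and method names (Bucket, ExtensibleHashTable, _split) are illustrative assumptions, keys are represented directly by their k-bit hash values, and each block holds 2 keys as in the example.

```python
K = 4               # hash values are K-bit numbers, as in the slides
BLOCK_CAPACITY = 2  # keys per block


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # number of leading bits shared by all keys here
        self.keys = []


class ExtensibleHashTable:
    """Hypothetical sketch; the directory is indexed by the i leading bits."""

    def __init__(self):
        self.i = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _bucket(self, h):
        return self.directory[h >> (K - self.i)]

    def insert(self, h):
        bucket = self._bucket(h)
        while len(bucket.keys) >= BLOCK_CAPACITY:
            if bucket.depth == K:
                raise OverflowError("bucket full and no hash bits left")
            if bucket.depth == self.i:
                # Extend the table: every old entry is copied twice.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.i += 1
            self._split(bucket)
            bucket = self._bucket(h)
        bucket.keys.append(h)

    def _split(self, old):
        depth = old.depth + 1
        halves = [Bucket(depth), Bucket(depth)]
        # Redistribute the keys on the next leading bit.
        for h in old.keys:
            halves[(h >> (K - depth)) & 1].keys.append(h)
        # Repoint only the directory entries that referred to the old bucket.
        for idx, b in enumerate(self.directory):
            if b is old:
                self.directory[idx] = halves[(idx >> (self.i - depth)) & 1]


table = ExtensibleHashTable()
for h in [0b0010, 0b1011, 0b1110, 0b1010, 0b0000, 0b0101]:
    table.insert(h)

for idx, b in enumerate(table.directory):
    print(format(idx, "02b"), [format(h, "04b") for h in b.keys], b.depth)
```

Inserting 1010 doubles the directory (i goes from 1 to 2), while inserting 0101 only splits one block, because that block was using fewer bits than the directory.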

18 Sorting Perhaps the most common operation in programs
The authoritative text: D. Knuth, The Art of Computer Programming, Vol. 3

19 Material to be Covered
Sorting by comparison: Bubble Sort, Selection Sort, Merge Sort, QuickSort
Efficient list-based implementations
Formal analysis
Theoretical limitations on sorting by comparison
Sorting without comparing elements
Sorting and the memory hierarchy

20 Bubble Sort Idea We want A[0] ≤ A[1] ≤ … ≤ A[N-1]. Bubble sort idea:
If A[i-1] > A[i] then swap A[i-1] and A[i]. Do this for i = 1, …, N-1. Repeat this until it’s sorted.

21 Bubble Sort
procedure BubbleSort (Array A, int N)
  repeat {
    isSorted = true;
    for (i=1 to N-1) {
      if ( A[i-1] > A[i] ) {
        swap( A[i-1], A[i] );
        isSorted = false;
      }
    }
  } until isSorted

22 Bubble Sort Improvements
After the 1st iteration: the largest element is in A[N-1]. After the 2nd iteration: the second largest element is in A[N-2]. Question: what is the max number of iterations, and hence the worst-case running time?
Improvement: stop each iteration earlier: for (i=1 to N-1), then for (i=1 to N-2), ..., then for (i=1 to 1). In fact we may be lucky, and be able to decrease the upper bound more aggressively.

23 Bubble Sort
procedure BubbleSort (Array A, int N)
  m = N;
  repeat {
    newM = 1;
    for (i=1 to m-1) {
      if ( A[i-1] > A[i] ) {
        swap( A[i-1], A[i] );
        newM = i;   /* position of the last swap: A[i],...,A[N-1] will be final */
      }
    }
    m = newM;
  } while m > 1
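A runnable sketch of bubble sort with the shrinking upper bound, using 0-based Python lists. One assumption differs from a literal reading of the pseudocode: the new bound is set to the index i of the last swap (not i-1), since after a pass only the elements from that index onward are known to be in their final positions. The pass counter is added purely for illustration.

```python
def bubble_sort(a):
    """Sort list a in place; return the number of passes made."""
    m = len(a)  # a[m:] is already in its final position
    passes = 0
    while m > 1:
        new_m = 0
        for i in range(1, m):
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]
                new_m = i  # remember the last swap position of this pass
        m = new_m  # nothing at or beyond the last swap needs another look
        passes += 1
    return passes


data = [2, 3, 1]
bubble_sort(data)
print(data)
```

On an already sorted input the first pass makes no swap, so the loop ends after a single pass.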

24 Bubble Sort So the worst-case running time is T(n) = O(n^2).
Is the worst-case running time also Ω(n^2)? You need to find a worst-case input of size n for which the running time is proportional to n^2.

25 Selection Sort
procedure SelectSort (Array A, int N)
  for (i=0 to N-2) {
    /* find the minimum among A[i],...,A[N-1] */
    /* place it in A[i] */
    m = i;
    for (j=i+1 to N-1)
      if ( A[m] > A[j] ) m = j;
    swap(A[i], A[m]);
  }
[Figure: array A[0] ... A[N-1]; A[0..i-1] is finished; find the minimum of the rest, move it to A[i]]
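The selection sort pseudocode translates almost line for line into Python. This is a sketch with 0-based lists; the function name is chosen here for illustration.

```python
def selection_sort(a):
    """Sort list a in place and return it."""
    n = len(a)
    for i in range(n - 1):
        # Find the index of the minimum among a[i], ..., a[n-1].
        m = i
        for j in range(i + 1, n):
            if a[m] > a[j]:
                m = j
        # Place the minimum in a[i]; a[0..i] is now finished.
        a[i], a[m] = a[m], a[i]
    return a


print(selection_sort([7, 2, 8, 3, 5, 9, 6]))
```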

26 Selection Sort Worst case running time: T(n) = O( ?? ), T(n) = Ω( ?? )

27 Insertion Sort
procedure InsertSort (Array A, int N)
  for (i=1 to N-1) {
    /* A[0], A[1], ..., A[i-1] is sorted */
    /* now insert A[i] in the right place */
    x = A[i];
    for (j=i-1; j>=0 && A[j] > x; j--)
      A[j+1] = A[j];
    A[j+1] = x;
  }
[Figure: array A[0] ... A[N-1]; A[0..i-1] is sorted, but not necessarily finished; insert A[i] to the left]
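A runnable sketch of insertion sort in Python. Two details matter for correctness: the inner loop has to be allowed to reach index 0, and x is written one position to the right of where the loop stops. The function name is an illustrative choice.

```python
def insertion_sort(a):
    """Sort list a in place and return it."""
    for i in range(1, len(a)):
        # a[0..i-1] is sorted; insert a[i] in the right place.
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]  # shift larger elements one slot to the right
            j -= 1
        a[j + 1] = x
    return a


print(insertion_sort([7, 2, 8, 3, 5, 9, 6]))
```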

28 Insertion Sort Worst case running time: T(n) = O( ?? ), T(n) = Ω( ?? )

29 Merge Sort The Merge Operation: given two sorted sequences
  A[0] ≤ A[1] ≤ ... ≤ A[m-1]
  B[0] ≤ B[1] ≤ ... ≤ B[n-1]
construct another sorted sequence that is their union.
Merge (A[0..m-1], B[0..n-1])
  i1=0, i2=0
  While i1<m and i2<n
    If A[i1] < B[i2]
      Next is A[i1]
      i1++
    Else
      Next is B[i2]
      i2++
    End If
  End While
  Append the remaining elements of A or B
[Photo: merging cars by key (aggressiveness of driver). Most aggressive goes first.]
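The merge operation can be sketched in Python as below. Two choices here are assumptions rather than something the slide states: the comparison uses <= so that equal keys from the first sequence come first (which keeps the merge stable), and whatever is left over in either input after the main loop is appended at the end.

```python
def merge(a, b):
    """Merge two sorted lists into a new sorted list."""
    out = []
    i1 = i2 = 0
    while i1 < len(a) and i2 < len(b):
        if a[i1] <= b[i2]:
            out.append(a[i1])
            i1 += 1
        else:
            out.append(b[i2])
            i2 += 1
    # At most one of these two slices is non-empty.
    out.extend(a[i1:])
    out.extend(b[i2:])
    return out


print(merge([2, 5, 9], [1, 5, 6, 10]))
```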

30 Merge Sort Function MergeSort (Array A[0..n-1])
  if n ≤ 1 return A
  return Merge(MergeSort(A[0..n/2-1]), MergeSort(A[n/2..n-1]))
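A compact Python sketch of the recursive structure. To keep the block self-contained, the standard library's heapq.merge stands in for the Merge operation; that substitution is a choice made here, not part of the lecture.

```python
from heapq import merge


def merge_sort(a):
    """Return a new sorted list with the elements of list a."""
    n = len(a)
    if n <= 1:
        return a
    # Sort each half recursively, then merge the two sorted halves.
    left = merge_sort(a[: n // 2])
    right = merge_sort(a[n // 2 :])
    return list(merge(left, right))


print(merge_sort([7, 50, 22, 15, 4, 40, 20, 10, 35, 25]))
```

Note that each level of recursion builds new lists, which is the scratch space that an array-based merge sort cannot avoid.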

31 Merge Sort Running Time
Any difference between best and worst case?
T(1) = b
T(n) = 2T(n/2) + cn for n>1
T(n) = 2T(n/2) + cn
     = 4T(n/4) + cn + cn              substitute
     = 8T(n/8) + cn + cn + cn         substitute
     = 2^k T(n/2^k) + kcn             inductive leap
     = nT(1) + cn log n               select value for k: k = log n (base 2)
     = Θ(n log n)                     simplify
This is the same sort of analysis as seen before: here’s a function defined in terms of itself. Answer: O(n log n). Generally, the strategy is to keep expanding the recurrence until you see a pattern, then write the general form. Finally, substitute for the series bounds to make T(?) come out to a known value and solve all the series. Tip: look for powers/multiples of the numbers that appear in the original equation.
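The closed form can be checked numerically for powers of two. The sketch below evaluates the recurrence directly and compares it with nT(1) + cn log n, i.e. bn + cn log n with the logarithm taken base 2; the default constants b = c = 1 and the second pair tried in the loop are arbitrary choices.

```python
def T(n, b=1, c=1):
    """Evaluate T(1) = b, T(n) = 2 T(n/2) + c n for n a power of two."""
    if n == 1:
        return b
    return 2 * T(n // 2, b, c) + c * n


def closed_form(n, b=1, c=1):
    """b*n + c*n*log2(n), with log2 computed exactly for powers of two."""
    k = n.bit_length() - 1  # k = log2(n) when n is a power of two
    return b * n + c * n * k


for k in range(11):
    n = 2 ** k
    assert T(n) == closed_form(n)
    assert T(n, b=3, c=5) == closed_form(n, b=3, c=5)

print(T(1024), closed_form(1024))
```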

32 Merge Sort Works great with lists or files. Problems with arrays:
We need a scratch array; we cannot sort ‘in situ’.

33 Heap Sort Recall: a heap is a tree where the min is at the root
A heap is stored in an array A[1], ..., A[n]

34 Heap Sort Start with an unsorted array A[1], ..., A[n]. Build a heap.
How much time does it take? Get the minimum and store it in the output array B; repeat n times.
[Figure: input array A[0] ... A[n-1] and output array B[0] ... B[i]]

35 Heap Sort But then we need an extra array !
How can we do it ‘in situ’ ?

36 Heap Sort Input: unordered array A[1..N]
Build a max heap (largest element is A[1])
For i = 1 to N-1: A[N-i+1] = Delete_Max()
Example:
  input:               7 50 22 15  4 40 20 10 35 25
  max heap:           50 40 20 25 35 15 10 22  4  7
  after 1 Delete_Max: 40 35 20 25  7 15 10 22  4 | 50
  after 2 Delete_Max: 35 25 20 22  7 15 10  4 | 40 50
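An in-place heap sort can be sketched in Python as follows. It uses 0-based indexing (children of index r are 2r+1 and 2r+2) rather than the 1-based A[1..N] of the slide, and it builds the heap bottom-up, so the intermediate heap layout may differ from the one drawn in the example even though the sorted result is the same. The helper name sift_down is an illustrative choice.

```python
def sift_down(a, root, end):
    """Sift a[root] down within the max heap stored in a[0:end]."""
    while True:
        child = 2 * root + 1
        if child >= end:
            return
        # Pick the larger of the two children.
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a):
    """Sort list a in place and return it."""
    n = len(a)
    # Build a max heap bottom-up.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(a, root, n)
    # Repeatedly move the max to the end of the shrinking heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a


print(heap_sort([7, 50, 22, 15, 4, 40, 20, 10, 35, 25]))
```

The swap with a[end] plays the role of Delete_Max: the slot freed at the end of the heap is exactly where the deleted maximum belongs.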

37 Properties of Heap Sort
Worst-case time complexity O(n log n): Build_heap is O(n), and the n Delete_Max’s take O(n log n). In-place sort: only constant storage beyond the array is needed.

38 QuickSort Pick a “pivot”. Divide the list into two lists:
One less-than-or-equal-to the pivot value. One greater than the pivot. Sort each sub-problem recursively. The answer is the concatenation of the two solutions.

39 QuickSort: Array-Based Version
Pick pivot:                7 2 8 3 5 9 6
Partition with cursors:    7 2 8 3 5 9 6    (< starts at 2, > starts at 6)
2 goes to less-than:       7 2 8 3 5 9 6    (< advances to 8)

40 QuickSort Partition (cont’d)
6, 8 swap less/greater-than:       7 2 6 3 5 9 8
3, 5 less-than; 9 greater-than:    7 2 6 3 5 9 8
Partition done:                    7 2 6 3 5 9 8

41 QuickSort Partition (cont’d)
Put pivot into final position:  5 2 6 3 7 9 8
Recursively sort each side:     2 3 5 6 7 8 9
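The array-based partition traced on these slides can be sketched in Python as below. The assumptions are that the pivot is the first element of the range, that the left cursor skips elements less than or equal to the pivot, and that the right cursor skips elements greater than it; the function names are illustrative.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around pivot a[lo]; return the pivot's final index."""
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        while i <= j and a[i] <= pivot:  # left cursor: skip the less-than side
            i += 1
        while i <= j and a[j] > pivot:   # right cursor: skip the greater-than side
            j -= 1
        if i > j:
            break
        a[i], a[j] = a[j], a[i]          # both cursors are stuck: swap
    a[lo], a[j] = a[j], a[lo]            # put pivot into final position
    return j


def quicksort(a, lo=0, hi=None):
    """Sort list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)
    return a


data = [7, 2, 8, 3, 5, 9, 6]
print(partition(data, 0, len(data) - 1), data)
print(quicksort(data))
```

With this pivot rule, the first call to partition reproduces the slides' trace: 6 and 8 are swapped, and the pivot 7 ends up between 3 and 9.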

42 QuickSort Complexity QuickSort is fast in practice, but has Θ(N^2) worst-case complexity. Friday we will see why.

