CSE 326: Data Structures: Sorting

CSE 326: Data Structures: Sorting Lecture 13: Wednesday, Feb 5, 2003

Today Finish extensible hash tables. Start sorting (will take several lectures). Read Chapter 7! (Except Shellsort, Section 7.4.)

Hash Tables on Secondary Storage (Disks) Main differences from in-memory hash tables: One bucket = one block, hence a bucket may hold multiple keys. Open chaining: use overflow blocks when needed. Closed chaining is never used.

Hash Table Example Assume 1 bucket (block) stores 2 keys + pointers. h(e)=0, h(b)=h(f)=1, h(g)=2, h(a)=h(c)=3

  bucket 0: e
  bucket 1: b, f
  bucket 2: g
  bucket 3: a, c

Searching in a Hash Table Search for a: Compute h(a)=3 and read bucket 3. Cost: 1 disk access.

Insertion in Hash Table Place the key in the right bucket, if there is space. E.g. h(d)=2: bucket 2 becomes g, d.

Insertion in Hash Table Create an overflow block if there is no space. E.g. h(k)=1: bucket 1 (b, f) is full, so k goes into an overflow block chained to bucket 1. More overflow blocks may be needed.

Hash Table Performance Excellent, if there are no overflow blocks. Degrades considerably when the number of keys exceeds the number of buckets (i.e. many overflow blocks).

Extensible Hash Table Allows the hash table to grow, to avoid performance degradation. Assume a hash function h that returns numbers in {0, …, 2^k – 1}. Start with n = 2^i << 2^k buckets; only look at the first i most significant bits.

Extensible Hash Table E.g. i=1, n=2^i=2, k=4. Note: we only look at the first bit (0 or 1). Each block records how many bits it uses (its local depth, here 1); the bits a block looks at are written outside the parentheses.

  directory (i=1):
  0 -> [0(010)]        depth 1
  1 -> [1(011)]        depth 1

Insertion in Extensible Hash Table Insert 1(110): its first bit is 1, and the block for prefix 1 has room.

  directory (i=1):
  0 -> [0(010)]            depth 1
  1 -> [1(011), 1(110)]    depth 1

Insertion in Extensible Hash Table Now insert 1010. The block for prefix 1 would have to hold 1(011), 1(110), 1(010): more keys than fit in one block. Need to extend the table and split the block; i becomes 2.

  directory (i=1):
  0 -> [0(010)]                    depth 1
  1 -> [1(011), 1(110), 1(010)]    overfull

Insertion in Extensible Hash Table After doubling the directory and splitting the block:

  directory (i=2):
  00 -> [0(010)]           depth 1 (shared by prefixes 00 and 01)
  01 -> (same block)
  10 -> [10(11), 10(10)]   depth 2
  11 -> [11(10)]           depth 2

Insertion in Extensible Hash Table Now insert 0000, then 0101. The block for prefix 0 would have to hold 0(010), 0(000), 0(101): need to split the block (the directory already uses 2 bits, so i stays 2).

  directory (i=2):
  00 -> [0(010), 0(000), 0(101)]   overfull (shared by 00 and 01)
  01 -> (same block)
  10 -> [10(11), 10(10)]           depth 2
  11 -> [11(10)]                   depth 2

Insertion in Extensible Hash Table After splitting the block:

  directory (i=2):
  00 -> [00(10), 00(00)]   depth 2
  01 -> [01(01)]           depth 2
  10 -> [10(11), 10(10)]   depth 2
  11 -> [11(10)]           depth 2

Extensible Hash Table How many buckets (blocks) do we need to touch after an insertion? Only one block: the one that overflowed. How many entries in the hash table (directory) do we need to touch? When the directory doubles, we must copy every entry from the old directory to the new one.

Performance of the Extensible Hash Table No overflow blocks: access is always O(1). More precisely: exactly one disk I/O. BUT: extensions can be costly and disruptive, and after an extension the directory may no longer fit in memory.
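The directory-doubling and block-splitting steps above can be sketched in Python. This is a minimal in-memory model, not disk code; the names (ExtensibleHashTable, Block, BLOCK_SIZE) are ours, and splitting is handled by re-inserting the displaced keys:

```python
BLOCK_SIZE = 2  # keys per block, as in the slides

class Block:
    def __init__(self, depth):
        self.depth = depth   # local depth: number of hash bits this block uses
        self.keys = []

class ExtensibleHashTable:
    def __init__(self, k=4):
        self.k = k           # h returns k-bit numbers in {0, ..., 2**k - 1}
        self.i = 1           # global depth: directory has 2**i entries
        self.directory = [Block(1), Block(1)]

    def _prefix(self, h, bits):
        return h >> (self.k - bits)   # first `bits` most significant bits

    def insert(self, h):
        block = self.directory[self._prefix(h, self.i)]
        if len(block.keys) < BLOCK_SIZE:
            block.keys.append(h)
            return
        if block.depth == self.i:
            # double the directory: new entry j points where old entry j>>1 did
            self.i += 1
            self.directory = [self.directory[j >> 1]
                              for j in range(2 ** self.i)]
        # split the full block on one more bit ...
        d = block.depth + 1
        b0, b1 = Block(d), Block(d)
        for j in range(2 ** self.i):
            if self.directory[j] is block:
                self.directory[j] = b1 if (j >> (self.i - d)) & 1 else b0
        # ... and redistribute by re-inserting (may split again if unlucky)
        for key in block.keys:
            self.insert(key)
        self.insert(h)
```

Running the slide's example (insert 0010, 1011, 1110, 1010, 0000, 0101 with k=4) reproduces the final four-entry directory shown above.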

Sorting Perhaps the most common operation in programs. The authoritative text: D. Knuth, The Art of Computer Programming, Vol. 3.

Material to be Covered Sorting by comparison: Bubble Sort, Selection Sort, Merge Sort, QuickSort. Efficient list-based implementations. Formal analysis. Theoretical limitations on sorting by comparison. Sorting without comparing elements. Sorting and the memory hierarchy.

Bubble Sort Idea We want A[0] ≤ A[1] ≤ … ≤ A[N-1]. Bubble sort idea: if A[i-1] > A[i], then swap A[i-1] and A[i]. Do this for i = 1, …, N-1. Repeat until the array is sorted.

Bubble Sort

procedure BubbleSort (Array A, int N)
  repeat {
    isSorted = true;
    for (i = 1 to N-1) {
      if (A[i-1] > A[i]) {
        swap(A[i-1], A[i]);
        isSorted = false;
      }
    }
  } until isSorted
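A direct Python translation of the pseudocode above (the function name bubble_sort is ours):

```python
def bubble_sort(a):
    """Sweep the array, swapping adjacent out-of-order pairs,
    until a full pass makes no swaps."""
    is_sorted = False
    while not is_sorted:
        is_sorted = True
        for i in range(1, len(a)):
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]
                is_sorted = False
    return a
```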

Bubble Sort Improvements After the 1st iteration: the largest element is in A[N-1]. After the 2nd iteration: the second-largest element is in A[N-2]. Question: what is the maximum number of iterations, and hence the worst-case running time? Improvement: stop the inner loop earlier each iteration: for (i=1 to N-1), then for (i=1 to N-2), ..., then for (i=1 to 1). In fact we may be lucky and be able to decrease the bound even more aggressively.

Bubble Sort

procedure BubbleSort (Array A, int N)
  m = N;
  repeat {
    newM = 1;
    for (i = 1 to m-1) {
      if (A[i-1] > A[i]) {
        swap(A[i-1], A[i]);
        newM = i;   /* everything at or beyond index i is now in final position */
      }
    }
    m = newM;
  } while m > 1
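In Python, the shrinking bound looks like this (a sketch; new_m records the position of the last swap):

```python
def bubble_sort_adaptive(a):
    """Bubble sort that shrinks the scan bound to the last swap position:
    everything at or beyond that index is already in final position."""
    m = len(a)
    while m > 1:
        new_m = 1
        for i in range(1, m):
            if a[i - 1] > a[i]:
                a[i - 1], a[i] = a[i], a[i - 1]
                new_m = i
        m = new_m
    return a
```

On an already-sorted array this makes a single pass, which is the "lucky" case mentioned above.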

Bubble Sort So the worst-case running time is T(n) = O(n²). Is the worst-case running time also Ω(n²)? You need to find a worst-case input of size n for which the running time is Ω(n²).

Selection Sort Invariant: A[0..i-1] is finished (sorted and in final position). At each step, find the minimum of A[i..N-1] and move it to A[i].

procedure SelectSort (Array A, int N)
  for (i = 0 to N-2) {
    /* find the minimum among A[i], ..., A[N-1] */
    /* place it in A[i] */
    m = i;
    for (j = i+1 to N-1)
      if (A[m] > A[j]) m = j;
    swap(A[i], A[m]);
  }
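The same procedure in Python (function name ours):

```python
def selection_sort(a):
    """For each position i, find the minimum of a[i:] and swap it into a[i]."""
    n = len(a)
    for i in range(n - 1):
        m = i
        for j in range(i + 1, n):
            if a[m] > a[j]:
                m = j
        a[i], a[m] = a[m], a[i]
    return a
```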

Selection Sort Worst-case running time: T(n) = O(??), T(n) = Ω(??).

Insertion Sort Invariant: A[0..i-1] is sorted, but not necessarily finished (its elements may still move); insert A[i] into its place to the left.

procedure InsertSort (Array A, int N)
  for (i = 1 to N-1) {
    /* A[0], A[1], ..., A[i-1] is sorted */
    /* now insert A[i] in the right place */
    x = A[i];
    for (j = i-1; j >= 0 && A[j] > x; j--)
      A[j+1] = A[j];
    A[j+1] = x;
  }
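In Python (a sketch; insertion_sort is our name):

```python
def insertion_sort(a):
    """Keep a[0..i-1] sorted; shift larger elements right
    and drop a[i] into its place."""
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a
```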

Insertion Sort Worst-case running time: T(n) = O(??), T(n) = Ω(??).

Merge Sort The Merge Operation: given two sorted sequences
A[0] ≤ A[1] ≤ ... ≤ A[m-1]
B[0] ≤ B[1] ≤ ... ≤ B[n-1]
construct another sorted sequence that is their union.

Merge (A[0..m-1], B[0..n-1])
  i1 = 0, i2 = 0
  While i1 < m and i2 < n
    If A[i1] < B[i2]
      Next is A[i1]; i1++
    Else
      Next is B[i2]; i2++
    End If
  End While
  Copy the rest of whichever sequence is not exhausted

Analogy: merging cars by key (aggressiveness of the driver); the most aggressive goes first.
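The merge operation in Python (merge is our name):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i1, i2 = [], 0, 0
    while i1 < len(a) and i2 < len(b):
        if a[i1] < b[i2]:
            out.append(a[i1]); i1 += 1
        else:
            out.append(b[i2]); i2 += 1
    out.extend(a[i1:])  # at most one of these two is non-empty
    out.extend(b[i2:])
    return out
```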

Merge Sort

Function MergeSort (Array A[0..n-1])
  if n ≤ 1 return A
  return Merge(MergeSort(A[0..n/2-1]), MergeSort(A[n/2..n-1]))
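The whole algorithm in Python, with the merge step inlined so the sketch is self-contained:

```python
def merge_sort(a):
    """Split in half, recursively sort each half, merge the results."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```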

Merge Sort Running Time Any difference best / worst case?
T(1) = b
T(n) = 2T(n/2) + cn, for n > 1

T(n) = 2T(n/2) + cn
     = 4T(n/4) + cn + cn         (substitute)
     = 8T(n/8) + cn + cn + cn    (substitute)
     = 2^k T(n/2^k) + kcn        (inductive leap)
     = nT(1) + cn log n          (select k = log n)
     = Θ(n log n)                (simplify)

This is the same sort of analysis as seen before: a function defined in terms of itself. The general strategy is to keep expanding until you see a pattern, write the general form, then choose a value of k that makes T(·) come out to a known value and solve. Tip: look for powers/multiples of the numbers that appear in the original equation.

Merge Sort Works great with lists, or files. Problem with arrays: we need a scratch array; we cannot sort ‘in situ’.

Heap Sort Recall: a heap is a tree where the min is at the root. The heap is stored in an array A[1], ..., A[n].

Heap Sort Start with an unsorted array A[1], ..., A[n]. Build a heap. How much time does that take? Then repeat n times: get the minimum and store it in the next position of an output array B.

Heap Sort But then we need an extra array! How can we do it ‘in situ’?

Heap Sort Input: unordered array A[1..N]. Build a max heap (largest element is A[1]). For i = 1 to N-1: A[N-i+1] = Delete_Max()

  input:                 7 50 22 15  4 40 20 10 35 25
  after Build_Heap:     50 40 20 25 35 15 10 22  4  7
  after 1st Delete_Max: 40 35 20 25  7 15 10 22  4 | 50
  after 2nd Delete_Max: 35 25 20 22  7 15 10  4 | 40 50
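A 0-indexed Python sketch of the same in-place scheme (heap_sort and sift_down are our names):

```python
def heap_sort(a):
    """In-place heap sort: build a max heap, then repeatedly swap the
    max (root) to the end and restore the heap on the shrunken prefix."""
    n = len(a)

    def sift_down(i, size):
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < size and a[l] > a[largest]:
                largest = l
            if r < size and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):  # Build_Heap: O(n)
        sift_down(i, n)
    for end in range(n - 1, 0, -1):      # n Delete_Max's: O(n log n)
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a
```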

Properties of Heap Sort Worst-case time complexity O(n log n): Build_Heap is O(n), and the n Delete_Max's cost O(n log n). In-place sort: only constant storage beyond the array is needed.

QuickSort Pick a “pivot”. Divide the list into two lists: one with elements less than or equal to the pivot value, one with elements greater than the pivot. Sort each sub-problem recursively. The answer is the concatenation of the two solutions.
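The list-based version is short in Python (a sketch; pivoting on the first element, as in the array example that follows):

```python
def quick_sort(xs):
    """List-based quicksort: pivot on the first element, partition into
    less-than-or-equal and greater-than, recurse, and concatenate."""
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    less = [x for x in rest if x <= pivot]
    greater = [x for x in rest if x > pivot]
    return quick_sort(less) + [pivot] + quick_sort(greater)
```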

QuickSort: Array-Based Version Pick the pivot: the first element, 7.

  7 | 2 8 3 5 9 6

Partition with two cursors, < moving right from the left end and > moving left from the right end:

  7 | 2 8 3 5 9 6
      <         >

2 goes to the less-than side; advance <:

  7 | 2 8 3 5 9 6
        <       >

QuickSort Partition (cont’d) 6 and 8 are on the wrong sides: swap them.

  7 | 2 6 3 5 9 8
        <     >

3 and 5 go to the less-than side, 9 to the greater-than side:

  7 | 2 6 3 5 9 8
            < >

Partition done:

  7 | 2 6 3 5 | 9 8

QuickSort Partition (cont’d) Put the pivot into its final position by swapping it with the last element of the less-than side:

  5 2 6 3 | 7 | 9 8

Recursively sort each side:

  2 3 5 6 7 8 9
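The array-based scheme sketched in Python (function name ours); the two inner while loops play the role of the < and > cursors:

```python
def quick_sort_inplace(a, lo=0, hi=None):
    """Array-based quicksort: pivot on a[lo], partition in place with
    two cursors, put the pivot in its final position, recurse."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[lo]
    i, j = lo + 1, hi
    while True:
        while i <= j and a[i] <= pivot:  # cursor < moves right
            i += 1
        while i <= j and a[j] > pivot:   # cursor > moves left
            j -= 1
        if i > j:
            break
        a[i], a[j] = a[j], a[i]          # swap misplaced pair
    a[lo], a[j] = a[j], a[lo]            # pivot into final position
    quick_sort_inplace(a, lo, j - 1)
    quick_sort_inplace(a, j + 1, hi)
    return a
```

On the slide's input [7, 2, 8, 3, 5, 9, 6], the first partition produces exactly the state shown above: [5, 2, 6, 3, 7, 9, 8] with the pivot 7 in its final position.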

QuickSort Complexity QuickSort is fast in practice, but has Θ(N²) worst-case complexity. Friday we will see why.