Quicksort and Randomized Algs


Quicksort and Randomized Algs David Kauchak cs161 Summer 2009

Administrative: Homework 2 is out. Staple your homework, and put your name on each page. The Homework 1 solution will be up soon.

Partition: what does it do?

A[r] is called the pivot. Partition rearranges the elements A[p…r-1] into two sets, those ≤ pivot and those > pivot, and operates in place. Final result: between p and r, A looks like [ ≤ pivot | pivot | > pivot ].
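As a sketch, the partition step described above (Lomuto-style, with the pivot at A[r]) might look like this in Python; the function name and 0-based inclusive indices are my own choices, not from the slides:

```python
def partition(A, p, r):
    """Partition A[p..r] in place around the pivot A[r] (0-based, inclusive).

    Afterwards, elements <= pivot precede it and elements > pivot follow it;
    returns the pivot's final index.
    """
    pivot = A[r]
    i = p - 1                        # end of the "<= pivot" region so far
    for j in range(p, r):            # j scans the unprocessed region
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]  # grow the "<= pivot" region
    A[i + 1], A[r] = A[r], A[i + 1]  # put the pivot between the two regions
    return i + 1
```

On the slides' example array [5, 7, 1, 2, 8, 4, 3, 6] this produces [5, 1, 2, 4, 3, 6, 8, 7] and returns index 5, matching the trace.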

Partition trace on the subarray … 5 7 1 2 8 4 3 6 … (p points at the 5, r at the pivot 6). i marks the end of the ≤ pivot region; j scans the unprocessed elements:

5 7 1 2 8 4 3 6
5 1 7 2 8 4 3 6   (1 ≤ 6: swapped into the ≤ pivot region)
5 1 2 7 8 4 3 6   (what's happening? so far: ≤ pivot | > pivot | unprocessed)
5 1 2 4 8 7 3 6
5 1 2 4 3 7 8 6
5 1 2 4 3 6 8 7   (final swap puts the pivot between the two regions)


Is Partition correct? Does it partition the elements A[p…r-1] into two sets, those ≤ pivot and those > pivot? Loop Invariant: A[p…i] ≤ A[r] and A[i+1…j-1] > A[r]

Proof by induction. Loop Invariant: A[p…i] ≤ A[r] and A[i+1…j-1] > A[r]. Base case: A[p…i] and A[i+1…j-1] are empty, so the invariant holds trivially. Inductive step: assume the invariant holds for j-1; there are two cases. Case 1: A[j] > A[r]. A[p…i] remains unchanged, and A[i+1…j] contains one additional element, A[j], which is > A[r].

Proof by induction, case 2: A[j] ≤ A[r]. i is incremented and A[i] is swapped with A[j], so A[p…i] contains one additional element, which is ≤ A[r], and A[i+1…j-1] contains the same elements as before, except that its last element is now its old first element.

Partition running time? Θ(n): a single pass over the n = r - p + 1 elements, with O(1) work per element.

Quicksort

Quicksort trace on 8 5 1 3 6 2 7 4:

8 5 1 3 6 2 7 4   (partition around the pivot 4)
1 3 2 4 6 8 7 5   (4 is in its final position; recurse on each side)
1 2 3 4 6 8 7 5   (left side 1 3 2 sorted via pivot 2)
1 2 3 4 5 8 7 6   (what happens here? the right side is partitioned around 5)
1 2 3 4 5 6 7 8   (8 7 6 partitioned around 6, then 8 7 around 7: done)

Some observations. Divide and conquer, but different from MergeSort: the work is done before recursing. How many times can an element be selected as a pivot? At most once: after an element is selected as a pivot, it is placed in its final position and never appears in a recursive call (e.g. the 4 in 1 3 2 4 6 8 7 5).

Is Quicksort correct? Assume Partition is correct. Proof by induction. Base case: Quicksort works on a list of 1 element. Inductive case: assume Quicksort correctly sorts all arrays of fewer than n elements; show that it sorts arrays of n elements. If Partition works correctly, then A = [ elements ≤ pivot | pivot | elements > pivot ], and by the inductive assumption each side is sorted by its recursive call, so the whole array is sorted.
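Putting Partition together with the recursion, a minimal Python sketch (0-based inclusive indices; the names are mine, not the slides') is:

```python
def partition(A, p, r):
    # Lomuto partition around the pivot A[r]; returns the pivot's final index.
    pivot, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def quicksort(A, p=0, r=None):
    # Sort A[p..r] in place: do the partition work first, then recurse.
    if r is None:
        r = len(A) - 1
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)   # pivot A[q] is already in its final spot
        quicksort(A, q + 1, r)
```

Running it on the trace's input [8, 5, 1, 3, 6, 2, 7, 4] sorts the array in place.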

Running time of Quicksort? Worst case? Each call to Partition splits the array into an empty array and an array of n-1 elements.

Quicksort: worst-case running time. Which is? Θ(n²), since T(n) = T(n-1) + Θ(n). When does this happen? Sorted, reverse-sorted, or nearly sorted/reverse-sorted data.

Quicksort best case? Each call to Partition splits the array into two equal parts, giving O(n log n). When does this happen? Random data?

Quicksort average case? How close to "even" do the splits need to be to maintain an O(n log n) running time? Say the Partition procedure always splits the array in some constant ratio b-to-a, e.g. 9-to-1. What is the recurrence? T(n) = T(9n/10) + T(n/10) + cn.

Recursion tree for the 9-to-1 split, T(n) = T(9n/10) + T(n/10) + cn:

Level 0: cn
Level 1: (9/10)cn + (1/10)cn = cn
Level 2: cn
…
Level d: at most cn

What is the depth of the tree? The leaves have different heights, so we want the deepest leaf. Assume a < b: the deepest path repeatedly keeps the larger b/(a+b) fraction (9/10 here), so its depth d satisfies (9/10)^d · n = 1, i.e. d = log_{10/9} n.

Cost of the tree: each level costs ≤ cn, times the maximum depth log_{10/9} n, gives O(n log n). Why not exactly cn per level? Because the deeper levels are not full — some branches bottom out early — so cn is only an upper bound on a level's cost.

Quicksort average case: take 2. What would happen if half the time Partition produced a "bad" (n-1 to 0) split and the other half a "good" 50/50 split? A bad split of partition cost cn is immediately followed by a good split, so the pair of levels together still costs Θ(n) partition work and halves the subproblems: we absorb the "bad" partition into the "good" one. In general, we can absorb any constant number of "bad" partitions and keep O(n log n).

How can we avoid the worst case? Inject randomness into the algorithm: e.g. randomly permute the data, or pick the pivot position at random, so that no fixed input is always bad.
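One way to sketch the randomized variant in Python (random pivot choice via the standard library's `random.randrange`; 0-based inclusive indices and names are my own):

```python
import random

def rpartition(A, p, r):
    # Swap a uniformly random element of A[p..r] into the pivot slot,
    # then run the ordinary Lomuto partition around A[r].
    k = random.randrange(p, r + 1)
    A[k], A[r] = A[r], A[k]
    pivot, i = A[r], p - 1
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def rquicksort(A, p=0, r=None):
    # Same recursion as before; only the pivot choice is randomized.
    if r is None:
        r = len(A) - 1
    if p < r:
        q = rpartition(A, p, r)
        rquicksort(A, p, q - 1)
        rquicksort(A, q + 1, r)
```

Because the pivot is random, a reverse-sorted input is no longer a guaranteed worst case.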

What is the running time of randomized Quicksort? Worst case? We could still get very unlucky and pick "bad" partitions at every step: O(n²).

Randomized Quicksort: expected running time. How many calls are made to Partition for an input of size n? What is the cost of a call to Partition? The cost is proportional to the number of iterations of its for loop, i.e. to the number of comparisons it makes, so the total number of comparisons gives us a bound on the running time.

Counting the number of comparisons. Let zi be the i-th smallest element of the input, so z1, z2, …, zn is the sorted order. Let Zij be the set of elements Zij = {zi, zi+1, …, zj}. Example: A = [3, 9, 7, 2] gives z1 = 2, z2 = 3, z3 = 7, z4 = 9, and Z24 = {3, 7, 9}.

Counting comparisons. Let Xij be the indicator random variable for the event that zi is compared to zj. How many times can zi be compared to zj? At most once. Why? Elements are only ever compared to the current pivot, and once the partition finishes, the pivot is never compared to anything again. The total number of comparisons is X = Σ_{i<j} Xij.

Counting comparisons: average running time. E[X] = E[Σ_{i<j} Xij] = Σ_{i<j} E[Xij] = Σ_{i<j} Pr[zi is compared to zj], since the expectation of a sum is the sum of the expectations (linearity of expectation) — and remember, the expectation of an indicator variable is the probability of its event.

When will zi and zj be compared? The pivot element separates the set of numbers into two sets (those less than the pivot and those larger), and elements from one set are never compared to elements of the other set. So if a pivot x with zi < x < zj is chosen before either zi or zj, then zi and zj will never be compared. The only time zi and zj are compared is when one of them is the first element of Zij to be chosen as a pivot.

Pr[zi is compared to zj] = Pr[zi is the first pivot chosen from Zij] + Pr[zj is the first pivot chosen from Zij], since these two events are disjoint (probabilities of disjoint events add). The pivot is chosen uniformly at random over the j-i+1 elements of Zij, so each probability is 1/(j-i+1), giving Pr[zi is compared to zj] = 2/(j-i+1).

E[X] = Σ_{i<j} 2/(j-i+1). Let k = j-i to simplify the inner sum.
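Carrying out the substitution, the bound works out as follows (H_n denotes the n-th harmonic number, H_n = O(log n)):

```latex
E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1}
     = \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k+1}
     < \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k}
     = \sum_{i=1}^{n-1} 2 H_n
     = O(n \log n).
```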

Memory usage? Quicksort uses only O(1) additional memory beyond the recursion stack. How does randomized Quicksort compare to Mergesort, which needs Θ(n) auxiliary space?

Medians. The median of a set of numbers is the number such that half of the numbers are larger and half are smaller. How might we calculate the median of a set? Sort the numbers, then pick the element at position n/2: A = [50, 12, 1, 97, 30] → sorted A = [1, 12, 30, 50, 97] → median 30. Cost: Θ(n log n).
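The sort-then-index approach above is a one-liner in Python (the function name is mine):

```python
def median_by_sorting(A):
    # Theta(n log n) median: sort a copy, then take the element at
    # position n // 2 (for even n this is the upper middle element).
    return sorted(A)[len(A) // 2]

print(median_by_sorting([50, 12, 1, 97, 30]))  # the slide's example: 30
```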

Medians: can we do better? By sorting the data, it seems like we're doing more work than we need to; Partition takes Θ(n) time and performs a similar operation. The selection problem: find the k-th smallest element of A. If we could solve the selection problem faster than sorting, the median is just the special case k = n/2.

Selection: divide and conquer. Partition splits the data into three parts: the elements before A[q] (all ≤ A[q]), A[q] itself, and the elements after A[q] (all > A[q]). With each recursive call, we narrow the search to one of these parts — like binary search on unsorted data.

Selection(A, k, p, r)
  q ← Partition(A, p, r)
  if k = q
    return A[q]
  else if k < q
    return Selection(A, k, p, q-1)
  else  // k > q
    return Selection(A, k, q+1, r)
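A runnable Python version of this pseudocode might look as follows; note that it uses 0-based indices, so k is the 0-based rank here (a shift from the slides' 1-based convention), and the inlined partition is the Lomuto scheme from earlier:

```python
def selection(A, k, p=0, r=None):
    """Return the k-th smallest element of A[p..r] (k is a 0-based rank).

    Mutates A in place, like Partition does.
    """
    if r is None:
        r = len(A) - 1
    pivot, i = A[r], p - 1           # Lomuto partition around A[r]
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    q = i + 1
    if k == q:
        return A[q]                  # pivot landed exactly at rank k
    elif k < q:
        return selection(A, k, p, q - 1)   # discard everything right of q
    else:
        return selection(A, k, q + 1, r)   # discard everything left of q
```

For example, `selection([5, 7, 1, 4, 8, 3, 2, 6], 2)` returns 3, the third-smallest element.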

Selection(A, 3, 1, 8) trace on 5 7 1 4 8 3 2 6 (find the 3rd smallest element):

5 7 1 4 8 3 2 6   (partition around the pivot 6)
5 1 4 3 2 6 8 7   (q = 6; k = 3 < q, so recurse on A[1…5] and discard the right part)
1 2 4 3 5 6 8 7   (partition A[1…5] around 2 → q = 2; k = 3 > q, recurse on A[3…5])
1 2 3 4 5 6 8 7   (partition A[3…5] around 5 → q = 5; then A[3…4] around 3 → q = 3 = k)

Return A[3] = 3. At each call, we discard part of the array.

Running time of Selection? Best case? Each call to Partition throws away half the data. Recurrence: T(n) = T(n/2) + cn = O(n).

Running time of Selection? Worst case? Each call to Partition only reduces our search by one element. Recurrence: T(n) = T(n-1) + cn = O(n²).

How can randomness help us?

RSelection(A, k, p, r)
  q ← RPartition(A, p, r)
  if k = q
    return A[q]
  else if k < q
    return RSelection(A, k, p, q-1)
  else  // k > q
    return RSelection(A, k, q+1, r)
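A Python sketch of RSelection, mirroring the earlier randomized Quicksort (random pivot via `random.randrange`; 0-based rank k and the names are my own conventions):

```python
import random

def rselection(A, k, p=0, r=None):
    # Randomized selection: identical to Selection, except the pivot
    # position is chosen uniformly at random before each partition.
    if r is None:
        r = len(A) - 1
    m = random.randrange(p, r + 1)
    A[m], A[r] = A[r], A[m]          # move the random pivot to the last slot
    pivot, i = A[r], p - 1           # Lomuto partition around A[r]
    for j in range(p, r):
        if A[j] <= pivot:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    q = i + 1
    if k == q:
        return A[q]
    elif k < q:
        return rselection(A, k, p, q - 1)
    else:
        return rselection(A, k, q + 1, r)
```

The answer is the same for any pivot choices; only the running time depends on the randomness.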

Running time of RSelection? Best case: O(n). Worst case: still O(n²) — as with Quicksort, we can get unlucky. Average case?

Average case Depends on how much data we throw away at each step

Average case. We'll call a partition "good" if the pivot falls between the 25th and 75th percentiles — or equivalently, if each of the partitions contains at least 25% of the data. What is the probability of a "good" partition? Half of the elements lie within this range and half outside, so 1/2.

Average case. Recall that, like Quicksort, we can absorb the cost of a constant number of "bad" partitions.

Average case. On average, how many times will Partition need to be called before we get a good partition? Let E be the expected number of calls. Recurrence: half the time we get a good partition on the first try, and half the time we have to start over: E = (1/2)·1 + (1/2)·(1 + E), so E = 2.

Average case, another look. Let p be the probability of success and let X be the number of calls required; X is a geometric random variable, so E[X] = 1/p — with p = 1/2, an expected 2 calls per good partition.
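For completeness, the geometric expectation appealed to here can be derived directly (using Σ k x^{k-1} = 1/(1-x)² for |x| < 1):

```latex
E[X] = \sum_{k=1}^{\infty} k \, p \, (1-p)^{k-1}
     = p \cdot \frac{1}{\bigl(1-(1-p)\bigr)^{2}}
     = \frac{1}{p}.
```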

Average case. Each good partition throws away at least 1/4 of the data, and we roll in the cost of the expected constant number of "bad" partitions (2 calls of cost cn each) into the per-level constant: T(n) ≤ T(3n/4) + cn = O(n).

Randomized algorithms. We used randomness to minimize the possibility of worst-case running time. Other uses: approximation algorithms (e.g. random walks); initialization (e.g. K-means clustering); contention resolution; reinforcement learning / interacting with an ill-defined world.