Probabilistic (Average-Case) Analysis and Randomized Algorithms Two different approaches –Probabilistic analysis of a deterministic algorithm –Randomized.

Presentation transcript:

Probabilistic (Average-Case) Analysis and Randomized Algorithms Two different approaches –Probabilistic analysis of a deterministic algorithm –Randomized algorithm Example Problems/Algorithms –Hiring problem (Chapter 5) –Sorting/Quicksort (Chapter 7) Tools from probability theory –Indicator variables –Linearity of expectation

Probabilistic (Average-Case) Analysis Algorithm is deterministic; for a fixed input, it will run the same every time Analysis Technique –Assume a probability distribution for your inputs –Analyze item of interest over probability distribution Caveats –Specific inputs may have much worse performance –If distribution is wrong, analysis may give misleading picture

Randomized Algorithm “Randomize” the algorithm; for a fixed input, it will run differently depending on the result of random “coin tosses” Randomization examples/techniques –Randomize the order in which candidates arrive –Randomly select a pivot element –Randomly select from a collection of deterministic algorithms Key points –Works well with high probability on every input –May perform badly on any given input, but only with low probability

Common Tools Indicator variables –Suppose we want to study random variable X that represents a composite of many random events –Define a collection of “indicator” variables $X_i$ that focus on individual events; typically $X = \sum_i X_i$ Linearity of expectations –Let X, Y, and Z be random variables s.t. X = Y + Z –Then E[X] = E[Y+Z] = E[Y] + E[Z] Recurrence Relations

Hiring Problem Input –A sequence of n candidates for a position –Each has a distinct quality rating that we can determine in an interview Algorithm –Current = 0; –For k = 1 to n If candidate(k) is better than Current, hire(k) and Current = k; Cost: –Number of hires Worst-case cost is n
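The hiring loop above can be sketched in Python as follows. This is a minimal illustration, not the slide author's code: the function name `hire_assistant` and the use of a plain list of numeric ratings are assumptions made for the example, and positions are 0-based.

```python
def hire_assistant(ratings):
    """Return the 0-based positions at which a hire happens.

    `ratings` is a hypothetical list of distinct quality ratings,
    one per candidate, in interview order.
    """
    hires = []
    best = None  # rating of the current hire; None means nobody hired yet
    for k, rating in enumerate(ratings):
        if best is None or rating > best:
            hires.append(k)  # candidate k beats the current hire
            best = rating
    return hires


print(hire_assistant([3, 1, 4, 2, 5]))    # [0, 2, 4]
print(len(hire_assistant([1, 2, 3, 4])))  # 4
```

The second call shows the worst case from the slide: when candidates arrive in increasing order of quality, every one of the n candidates is hired.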

Analyze Hiring Problem Assume a probability distribution –Each of the n! permutations is equally likely Analyze item of interest over probability distribution –Define random variables Let X = random variable corresponding to # of hires Let $X_i$ = “indicator variable” that the ith interviewed candidate is hired –Value 0 if not hired, 1 if hired $X = \sum_{i=1}^{n} X_i$ –$E[X_i] = ?$ Explain why: –$E[X] = E[\sum_{i=1}^{n} X_i] = \sum_{i=1}^{n} E[X_i] = ?$ Key observation: linearity of expectations
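One way to check an answer to the questions above: under the uniform-permutation assumption, the ith interviewed candidate is hired exactly when it is the best of the first i, which happens with probability 1/i, so the expected number of hires is the harmonic number $H_n$. The sketch below verifies this by brute force for small n. The helper names are made up for the example.

```python
from fractions import Fraction
from itertools import permutations


def count_hires(order):
    """Number of hires made by the hiring loop on one interview order."""
    hires, best = 0, None
    for rating in order:
        if best is None or rating > best:
            hires += 1
            best = rating
    return hires


def expected_hires_exact(n):
    """Average number of hires over all n! equally likely orders."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_hires(p) for p in perms), len(perms))


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


for n in range(1, 7):
    assert expected_hires_exact(n) == harmonic(n)
print(expected_hires_exact(4))  # 25/12
```

Exact fractions are used so that the comparison with $H_n$ is an equality rather than a floating-point approximation.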

Alternative analysis Analyze item of interest over probability distribution –Let $X_i$ = indicator random variable that the ith best candidate is hired 0 if not hired, 1 if hired –Questions Relationship of X to $X_i$? $E[X_i] = ?$ $\sum_{i=1}^{n} E[X_i] = ?$

Questions What is the probability you will hire n times? What is the probability you will hire exactly twice? Biased Coin –Suppose you want to output 0 with probability ½ and 1 with probability ½ –You have a coin that outputs 1 with probability p and 0 with probability 1-p for some unknown 0 < p < 1 –Can you use this coin to output 0 and 1 fairly? –What is the expected running time to produce the fair output as a function of p?
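For the biased-coin question, one well-known answer is von Neumann's trick: flip the coin twice, output the first flip if the two results differ, and retry otherwise. The outcomes (1, 0) and (0, 1) both occur with probability p(1-p), so the output is fair. Each round succeeds with probability 2p(1-p), giving an expected 1/(p(1-p)) flips. The sketch below assumes a hypothetical `make_biased_coin` helper standing in for the unknown coin.

```python
import random


def fair_bit(biased_flip):
    """Von Neumann's trick: flip twice, keep the first flip if they differ."""
    while True:
        a, b = biased_flip(), biased_flip()
        if a != b:
            return a  # (1,0) and (0,1) are equally likely


def make_biased_coin(p, rng):
    """Hypothetical coin for the demo: returns 1 with probability p."""
    return lambda: 1 if rng.random() < p else 0


rng = random.Random(0)
coin = make_biased_coin(0.8, rng)
samples = [fair_bit(coin) for _ in range(20000)]
print(sum(samples) / len(samples))  # should be close to 0.5
```

Note that `fair_bit` never needs to know p, which matches the slide's condition that p is unknown.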

Quicksort Algorithm Overview –Choose a pivot element –Partition the elements to be sorted around the pivot element –Recursively sort the smaller and larger elements

Quicksort Walkthrough

Pseudocode
Sort(A) { Quicksort(A, 1, n); }
Quicksort(A, low, high) {
  if (low < high) {
    pivotLocation = Partition(A, low, high);
    Quicksort(A, low, pivotLocation - 1);
    Quicksort(A, pivotLocation + 1, high);
  }
}

Pseudocode
int Partition(A, low, high) {
  pivot = A[high];
  leftwall = low - 1;
  for i = low to high-1 {
    if (A[i] < pivot) then {
      leftwall = leftwall + 1;
      swap(A[i], A[leftwall]);
    }
  }
  swap(A[high], A[leftwall+1]);
  return leftwall + 1;
}
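A runnable Python sketch of the two pseudocode slides, using 0-based indices instead of the slides' 1-based ones. The final swap that moves the pivot into place happens once, after the scanning loop finishes. The lowercase function names are chosen for the example.

```python
def partition(a, low, high):
    """Partition a[low..high] around the pivot a[high]; return its final index."""
    pivot = a[high]
    leftwall = low - 1  # last index of the "smaller than pivot" region
    for i in range(low, high):
        if a[i] < pivot:
            leftwall += 1
            a[i], a[leftwall] = a[leftwall], a[i]
    # Place the pivot just right of the smaller region (after the loop).
    a[high], a[leftwall + 1] = a[leftwall + 1], a[high]
    return leftwall + 1


def quicksort(a, low=0, high=None):
    """Sort the list `a` in place."""
    if high is None:
        high = len(a) - 1
    if low < high:
        p = partition(a, low, high)
        quicksort(a, low, p - 1)
        quicksort(a, p + 1, high)


data = [7, 2, 9, 4, 1, 8]
quicksort(data)
print(data)  # [1, 2, 4, 7, 8, 9]
```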

Worst Case for Quicksort

Average Case for Quicksort?

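Intuitive Average Case Analysis (Figure: number line from 0 to n marked at n/4, n/2, and 3n/4.) Anywhere in the middle half is a decent partition: the larger side keeps at most 3n/4 of the elements. After h decent partitions at most $(3/4)^h n$ elements remain, so $(3/4)^h n = 1 \Rightarrow n = (4/3)^h \Rightarrow \log n = h \log(4/3) \Rightarrow h = \log n / \log(4/3) \approx 2.41 \log_2 n$

How many steps? About $2.41 \log_2 n$ decent partitions along any recursion path suffice to sort an array of n elements. But if we just take arbitrary pivot points, how often will they, in fact, be decent? Since any element ranked between n/4 and 3n/4 makes a decent pivot, half the pivots on average are decent. Therefore, on average, a path needs about $2 \times 2.41 \log_2 n \approx 4.8 \log_2 n$ partitions, i.e. $O(\log n)$ levels of $O(n)$ work each.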

Formal Average-case Analysis Let X denote the random variable that represents the total number of comparisons performed Let indicator variable $X_{ij}$ = the event that the ith smallest element and jth smallest element are compared –0 if not compared, 1 if compared $X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}$ $E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}]$

Computing $E[X_{ij}]$ $E[X_{ij}]$ = probability that i and j are compared Observation –All comparisons are between a pivot element and another element –If an item k is chosen as pivot where i < k < j, then items i and j will not be compared $E[X_{ij}] = 2/(j-i+1)$ –Item i or j must be chosen as a pivot before any item in the interval (i..j); each of the j-i+1 items in [i..j] is equally likely to be the first one chosen

Computing E[X] $E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} = \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k+1} \le \sum_{i=1}^{n-1} 2 H_{n-i+1} \le \sum_{i=1}^{n-1} 2 H_n = 2(n-1) H_n$
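The double sum and its bound can be evaluated exactly for concrete n. The following sketch, with function names chosen for the example, computes the sum of 2/(j-i+1) over all pairs i < j and compares it with 2(n-1)H_n.

```python
from fractions import Fraction


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def expected_comparisons(n):
    """Sum of 2/(j-i+1) over all pairs 1 <= i < j <= n."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))


for n in (2, 5, 10, 50):
    exact = expected_comparisons(n)
    bound = 2 * (n - 1) * harmonic(n)
    assert exact <= bound
    print(n, float(exact), float(bound))
```

For n = 3 the sum is 8/3, which agrees with a direct count: a middle pivot (probability 1/3) costs 2 comparisons and an extreme pivot (probability 2/3) costs 3.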

Alternative Average-case Analysis Let T(n) denote the average time required to sort an n-element array –“Average” assuming each permutation equally likely Write a recurrence relation for T(n) –T(n) = Using induction, we can then prove that T(n) = O(n log n) –Requires some simplification and summation manipulation
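One standard form of such a recurrence is sketched below. It is not taken from the slide. It rests on two assumptions: partitioning an n-element array costs n-1 comparisons, and the pivot's rank q is equally likely to be any of 1..n. The symbol q is introduced only for this illustration.

```latex
T(0) = T(1) = 0, \qquad
T(n) = (n-1) + \frac{1}{n} \sum_{q=1}^{n} \bigl[ T(q-1) + T(n-q) \bigr]
     = (n-1) + \frac{2}{n} \sum_{k=0}^{n-1} T(k)
```

The second equality holds because each T(k) for k = 0..n-1 appears exactly twice in the sum, once as T(q-1) and once as T(n-q).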

Avoiding the worst-case Understanding quicksort’s worst-case Methods for avoiding it –Pivot strategies –Randomization

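Understanding the worst case (Figure: the sorted array A B D F H J K shrinking by one element per partition: A B D F H J, then A B D F H, A B D F, A B D, A B, A.) The worst case occurs on already-sorted input, which is a likely case in many applications.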

Pivot Strategies Use the middle element of the sub-array as the pivot. Use the median of (first, middle, last) as the pivot, so that presorted input no longer triggers the worst case. What is the worst-case performance for these pivot selection mechanisms?

Randomized Quicksort Make chance of worst-case run time equally small for all inputs Methods –Choose pivot element randomly from range [low..high] –Initially permute the array
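The first method, choosing the pivot uniformly at random from [low..high], can be sketched by swapping a random element into the last position and then partitioning as before. The function name and the optional `rng` parameter are assumptions made for the example.

```python
import random


def randomized_quicksort(a, low=0, high=None, rng=random):
    """Sort `a` in place, picking each pivot uniformly from a[low..high]."""
    if high is None:
        high = len(a) - 1
    if low >= high:
        return
    r = rng.randint(low, high)     # randint is inclusive on both ends
    a[r], a[high] = a[high], a[r]  # move the random pivot to the end
    pivot = a[high]
    wall = low - 1
    for i in range(low, high):
        if a[i] < pivot:
            wall += 1
            a[i], a[wall] = a[wall], a[i]
    a[wall + 1], a[high] = a[high], a[wall + 1]
    randomized_quicksort(a, low, wall, rng)       # elements below the pivot
    randomized_quicksort(a, wall + 2, high, rng)  # elements above the pivot


already_sorted = list(range(2000))
randomized_quicksort(already_sorted, rng=random.Random(42))
print(already_sorted[:5])  # [0, 1, 2, 3, 4]
```

With a last-element pivot, a sorted input of this size would recurse about 2000 levels deep. With random pivots the expected depth is only logarithmic in n, on this input and on every other.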

Hat Check Problem N people give their hats to a hat-check person at a restaurant The hat-check person returns the hats to the people randomly What is the expected number of people who get their own hat back?
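Indicator variables settle this quickly: let $X_i$ be 1 if person i gets their own hat back. Each hat is equally likely to go to any of the N people, so $E[X_i] = 1/N$, and by linearity of expectation the expected total is $N \cdot 1/N = 1$. The sketch below confirms this by enumerating all hat assignments for small N. The function name is made up for the example.

```python
from fractions import Fraction
from itertools import permutations


def expected_own_hats(n):
    """Average number of people who get their own hat, over all n! returns."""
    perms = list(permutations(range(n)))
    total = sum(
        sum(1 for person, hat in enumerate(p) if person == hat)
        for p in perms
    )
    return Fraction(total, len(perms))


for n in range(1, 8):
    assert expected_own_hats(n) == 1
print(expected_own_hats(5))  # 1
```

The answer does not depend on N, even though the individual events are not independent. Linearity of expectation needs no independence assumption.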

Online Hiring Problem Input –A sequence of n candidates for a position –Each has a distinct quality rating that we can determine in an interview We know total ranking of interviewed candidates, but not with respect to candidates left to interview –We can hire only once Online Algorithm Hire(k,n) –Current = 0; –For j = 1 to k If candidate(j) is better than Current, Current = j; –For j = k+1 to n If candidate(j) is better than Current, hire(j) and return Questions: –What is probability we hire the best qualified candidate given k? –What is best value of k to maximize above probability?
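A minimal Python sketch of Hire(k, n), assuming the ratings are given as a list in interview order and using 0-based indices. Returning `None` when nobody after the first k beats them is a choice made for this example. The slide does not say what happens in that case.

```python
def online_hire(ratings, k):
    """Observe the first k candidates, then hire the first one who beats them.

    Returns the 0-based index of the hired candidate, or None if no
    later candidate is better than the best of the first k.
    """
    best_seen = max(ratings[:k], default=float("-inf"))
    for j in range(k, len(ratings)):
        if ratings[j] > best_seen:
            return j
    return None


print(online_hire([3, 1, 4, 2, 5], 2))  # 2  (rating 4 beats 3)
print(online_hire([5, 1, 2], 1))        # None (the best was in the observed prefix)
```

The first call illustrates the trade-off: with k = 2 the algorithm hires the candidate rated 4 and never sees the candidate rated 5.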

Online Hiring Analysis Let $S_i$ be the probability that we successfully hire the best qualified candidate AND this candidate was the ith one interviewed Let M(j) = the candidate in 1 through j with highest score What needs to happen for this to occur? –Best candidate is in position i (probability $B_i$) –No candidate in positions k+1 through i-1 is hired (probability $O_i$) –These two events are independent, so we can multiply their probabilities to get $S_i$

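Computing S $B_i = 1/n$ $O_i = k/(i-1)$, since the best of the first i-1 candidates must be among the first k $S_i = k/(n(i-1))$ $S = \sum_{i>k} S_i = \frac{k}{n} \sum_{i=k+1}^{n} \frac{1}{i-1} = \frac{k}{n}(H_{n-1} - H_{k-1})$ is the probability of success, roughly $\frac{k}{n}(\ln n - \ln k)$ Maximized when k = n/e Leads to probability of success of 1/e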
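The success probability can be evaluated exactly for a concrete n to see the n/e rule emerge. In this sketch the function names are chosen for the example, and the k = 0 case (hire the first candidate outright) is handled separately as an assumption, since the formula above only covers k >= 1.

```python
import math
from fractions import Fraction


def success_probability(n, k):
    """Exact S = (k/n) * sum_{i=k+1..n} 1/(i-1) for cutoff k."""
    if k == 0:
        return Fraction(1, n)  # assumed: with no prefix, candidate 1 is hired
    return Fraction(k, n) * sum(Fraction(1, i - 1) for i in range(k + 1, n + 1))


def best_cutoff(n):
    """Cutoff k in 0..n-1 that maximizes the success probability."""
    return max(range(n), key=lambda k: success_probability(n, k))


n = 100
k = best_cutoff(n)
print(k, float(success_probability(n, k)))  # compare with the next line
print(n / math.e, 1 / math.e)
```

For n = 3 and k = 1 the formula gives (1/3)(1 + 1/2) = 1/2, which can be checked by hand against the six possible interview orders.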