Sorting and selection – Part 2 Prof. Noah Snavely CS1114

Administrivia  Assignment 1 due tomorrow by 5pm  Assignment 2 will be out tomorrow –Two parts: smaller part due next Friday, larger part due in two weeks  Quiz 2 next Thursday 2/12 –Coverage through next Tuesday (topics include running time, sorting) –Closed book / closed note

Recap from last time: sorting  If we sort an array, we can find the kth largest element in constant (O(1)) time –For all k, even for the median (k = n/2)  Sorting algorithm 1: Selection sort –Running time: O(n^2)  Sorting algorithm 2: Quicksort –Running time: O(?)

Quicksort 1. Pick an element (the pivot) 2. Compare every element to the pivot, and partition the array into elements less than the pivot and elements greater than the pivot 3. Quicksort these smaller arrays separately
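These three steps can be sketched in Python (an illustrative out-of-place version; a real implementation would partition the array in place):

```python
def quicksort(a):
    """Sort a list: pick a pivot, partition around it, recurse."""
    if len(a) <= 1:
        return a                                  # nothing left to sort
    pivot = a[0]                                  # step 1: pick a pivot
    smaller = [x for x in a[1:] if x <= pivot]    # step 2: partition...
    larger = [x for x in a[1:] if x > pivot]      # ...around the pivot
    # step 3: quicksort the smaller arrays separately
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # → [1, 1, 2, 3, 4, 5, 6, 9]
```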

Quicksort: worst case  With a bad pivot this algorithm does quite poorly –Degrades to selection sort –Number of comparisons will be O(n^2)  The worst case occurs when the array is already sorted and we always pick the first element as the pivot –We could choose the middle element instead of the first element

Quicksort: best case  With a good choice of pivot the algorithm does quite well  What is the best possible case? –Selecting the median  How many comparisons will we do? –Every time quicksort is called, we have to compare all elements to the pivot

How many comparisons? (best case)  Suppose length(A) == n  Round 1: Compare n elements to the pivot … now break the array in half, quicksort the two halves …  Round 2: For each half, compare n / 2 elements to each pivot (total # comparisons = n) … now break each half into halves …  Round 3: For each quarter, compare n / 4 elements to each pivot (total # comparisons = n)

How many comparisons? (best case) How many rounds will this run for? …

How many comparisons? (best case)  During each round, we do a total of n comparisons  There are log n rounds  The total number of comparisons is n log n  In the best case quicksort is O(n log n)
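This count can be checked by instrumenting quicksort to cheat and always use the true median as the pivot (a sketch for illustration only; `statistics.median_low` does work a real quicksort could not do for free):

```python
import math
import statistics

def count_comparisons(a):
    """Comparisons quicksort makes if the pivot is always the median."""
    if len(a) <= 1:
        return 0
    pivot = statistics.median_low(a)      # best case: array splits in half
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    # every element is compared to the pivot, then recurse on both halves
    return len(a) + count_comparisons(smaller) + count_comparisons(larger)

n = 1024
print(count_comparisons(list(range(n))), "vs n log n =", n * int(math.log2(n)))
```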

Can we expect to be lucky?  Performance depends on the input  “Unlucky pivots” (worst case) give O(n^2) performance  “Lucky pivots” give O(n log n) performance  For random inputs we get “lucky enough” –The expected runtime on a random array is O(n log n)  Can we do better?

Back to the selection problem  Can solve with sorting  Is there a better way?  Rev. Charles L. Dodgson’s problem –Based on how to run a tennis tournament –Specifically, how to award 2nd prize fairly

How many teams were in the tournament? How many games were played? Which is the second-best team?

Finding the second-best team  Could use quicksort to sort the teams  Step 1: Choose one team as a pivot (say, Arizona)  Step 2: Arizona plays every team  Step 3: Put all teams worse than Arizona in Group 1, all teams better than Arizona in Group 2 (no ties allowed)  Step 4: Recurse on Groups 1 and 2  … eventually will rank all the teams …

Quicksort Tournament  Step 1: Choose one team (say, Arizona)  Step 2: Arizona plays every team  Step 3: Put all teams worse than Arizona in Group 1, all teams better than Arizona in Group 2 (no ties allowed)  Step 4: Recurse on Groups 1 and 2  … eventually will rank all the teams …  (Note this is a bit silly – Arizona plays 63 games)  This gives us a ranking of all teams –What if we just care about finding the 2nd-best team?

Modifying quicksort to select  Suppose Arizona beats 36 teams, and loses to 27 teams  If we just want to know the 2nd-best team, how can we save time? [Diagram: the 36 teams worse than Arizona form Group 1; the 27 teams better than Arizona form Group 2]

Modifying quicksort to select – Finding the 2nd-best team [Diagram: Group 2 (the 27 teams better than Arizona) is partitioned into Group 2.1 (16 teams worse than its pivot) and Group 2.2 (10 teams better than its pivot); Group 2.2 is partitioned again into 7 teams and 2 teams]

Modifying quicksort to select – Finding the 32nd-best team [Diagram: Group 1 (the 36 teams worse than Arizona) is partitioned into Group 1.1 (20 teams) and Group 1.2 (15 teams)] Q: Which group do we visit next? –The 32nd-best team overall is the 4th-best team in Group 1

Find the kth largest element in A (the element smaller than exactly k-1 others) MODIFIED QUICKSORT(A, k):  Pick an element in A as the pivot, call it x  Divide A into A1 (elements < x), A2 (elements = x), and A3 (elements > x)  If k ≤ length(A3) –MODIFIED QUICKSORT(A3, k)  If k > length(A2) + length(A3) –Let j = k – [length(A2) + length(A3)] –MODIFIED QUICKSORT(A1, j)  Otherwise, return x

Modified quicksort  We’ll call this quickselect  Let’s consider the running time… MODIFIED QUICKSORT(A, k):  Pick an element in A as the pivot, call it x  Divide A into A1 (elements < x), A2 (elements = x), and A3 (elements > x)  If k ≤ length(A3) –Find the element smaller than k-1 others in A3  If k > length(A2) + length(A3) –Let j = k – [length(A2) + length(A3)] –Find the element smaller than j-1 others in A1  Otherwise, return x
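The pseudocode translates almost line for line into Python (a sketch, with k = 1 meaning the largest element; the three list comprehensions stand in for an in-place partition):

```python
def quickselect(a, k):
    """Return the kth largest element of a (k = 1 is the maximum)."""
    x = a[0]                            # pick a pivot, call it x
    a1 = [v for v in a if v < x]        # elements smaller than the pivot
    a2 = [v for v in a if v == x]       # elements equal to the pivot
    a3 = [v for v in a if v > x]        # elements larger than the pivot
    if k <= len(a3):                    # answer is among the larger elements
        return quickselect(a3, k)
    if k > len(a2) + len(a3):           # answer is among the smaller elements
        return quickselect(a1, k - len(a2) - len(a3))
    return x                            # the pivot itself is the kth largest

print(quickselect([31, 41, 59, 26, 53], 2))  # 2nd largest → 53
```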

What is the running time of:  Finding the 1st element? –O(1) (effort doesn’t depend on input)  Finding the biggest element? –O(n) (constant work per input element)  Finding the median by repeatedly finding and removing the biggest element? –O(n^2) (linear work per input element)  Finding the median using quickselect? –Worst case? O(n^2) –Best case? O(n)

Quickselect – “medium” case  Suppose we split the array in half each time (i.e., we happen to choose the median as the pivot)  How many comparisons will there be?

How many comparisons? (“medium” case)  Suppose length(A) == n  Round 1: Compare n elements to the pivot … now break the array in half, quickselect one half …  Round 2: For the remaining half, compare n / 2 elements to the pivot (total # comparisons = n / 2) … now break that half in half …  Round 3: For the remaining quarter, compare n / 4 elements to the pivot (total # comparisons = n / 4)

How many comparisons? (“medium” case) Number of comparisons = n + n / 2 + n / 4 + n / 8 + … + 1 < 2n (a geometric series)  The “medium” case is O(n)!
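The sum can be verified directly; the series never reaches 2n, which is why the total work is linear:

```python
def total_comparisons(n):
    """Sum n + n/2 + n/4 + ... + 1, for n a power of two."""
    total, work = 0, n
    while work >= 1:
        total += work      # comparisons done in this round
        work //= 2         # only one half is recursed into
    return total

print(total_comparisons(1024))  # → 2047, just under 2 * 1024
```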

Quickselect  For random input this method actually runs in linear time (the proof is beyond the scope of this class)  The worst case is still bad  Quickselect gives us a way to find the kth element without actually sorting the array!

Quickselect  It’s possible to select in guaranteed linear time (discovered in 1973) –This settles Rev. Dodgson’s problem –But the code is a little messy, and the analysis is messier  Beyond the scope of this course

Back to the lightstick  By using quickselect we can find the 5% largest (or smallest) element –This allows us to efficiently compute the trimmed mean
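As a sketch of the idea (the `trimmed_mean` helper and the 25% trim below are made up for illustration; it sorts for brevity, whereas the slide’s point is that quickselect can find the cutoff elements without a full sort):

```python
import statistics

def trimmed_mean(values, trim):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    s = sorted(values)          # quickselect could find the cutoffs instead
    k = int(len(s) * trim)      # how many values to drop at each end
    return statistics.mean(s[k:len(s) - k])

# one wild outlier barely moves the trimmed mean
print(trimmed_mean([1, 2, 3, 4, 5, 6, 7, 100], trim=0.25))  # → 4.5
```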

What about the median?  Another way to avoid our bad data points: –Use the median instead of the mean  Example: 12 kids, avg. weight = 40 lbs; 1 Arnold, weight = 236 lbs –Median: 40 lbs –Mean: (12 × 40 + 236) / 13 ≈ 55 lbs
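The example can be reproduced with Python’s statistics module:

```python
import statistics

weights = [40] * 12 + [236]             # 12 kids at 40 lbs, plus Arnold
print(statistics.median(weights))       # → 40 (ignores the outlier)
print(round(statistics.mean(weights)))  # → 55 (dragged up by Arnold)
```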

Median vector  Mean, like median, was defined in 1D –For a 2D mean we used the centroid –Mean of the x coordinates and y coordinates separately –Call this the “mean vector” –Does this work for the median also?

What is the median vector?  In 1900, statisticians wanted to find the “geographical center of the population” to quantify the westward shift  Why not the centroid? –Someone being born in San Francisco changes the centroid much more than someone being born in Indiana  What about the “median vector”? –Take the median of the x coordinates and the median of the y coordinates separately
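Coordinate-wise, that computation looks like this (a sketch; note the result need not be one of the input points):

```python
import statistics

def median_vector(points):
    """Median of the x coordinates and y coordinates, taken separately."""
    xs = [x for x, y in points]
    ys = [y for x, y in points]
    return (statistics.median(xs), statistics.median(ys))

# (2, 1) is not one of the input points
print(median_vector([(0, 0), (10, 1), (2, 8)]))  # → (2, 1)
```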


Median vector  A little thought will show you that this doesn’t really make a lot of sense –Nonetheless, it’s a common solution, and we will implement it for CS1114 –In situations like ours it works pretty well  It’s almost never an actual data point  It depends upon rotations!

Can we do even better?  None of what we described works that well if we have widely scattered red pixels –And we can’t figure out the lightstick orientation  Is it possible to do even better? –Yes!  We will focus on: –Finding “blobs” (connected red pixels) –Summarizing the shape of a blob –Computing orientation from this  We’ll need brand new tricks!

Next time: graphs