Data Structures, Spring 2006 © L. Joskowicz. Lecture 4: Comparison-based sorting.

Data Structures – LECTURE 4
Comparison-based sorting
– Why sorting?
– Formal analysis of Quick-Sort
– Comparison sorting: lower bound
– Summary of comparison-sorting algorithms

Sorting
Definition
Input: a sequence of n numbers A = (a1, a2, …, an)
Output: a permutation (reordering) (a'1, a'2, …, a'n) of the input such that a'1 ≤ … ≤ a'n
Why sorting?
– A fundamental problem in Computer Science
– Many algorithms use it as a key subroutine
– A wide variety of algorithms with a rich set of techniques
– Known lower bounds, asymptotically optimal algorithms
– Many programming and implementation issues come up!

Sorting algorithms
Two types of sorting algorithms:
1. Comparison sorting: the basic operation is the comparison between two elements, ai ≤ aj
– Merge-Sort, Insertion-Sort, Bubble-Sort
– Quick-Sort: analysis with recurrence equations
– Lower bounds for comparison sorting: T(n) = Ω(n lg n) and S(n) = Ω(n)
– Heap-Sort with priority queues (later)
2. Non-comparison sorting: does not use comparisons!
– Requires additional assumptions about the input
– Sorting in linear time: T(n) = Θ(n) and S(n) = Θ(n)
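As a concrete illustration of the second family, here is a minimal counting-sort sketch (my own illustration, not from the slides): it sorts n integers without any comparisons between elements, in Θ(n + k) time, by assuming every key lies in a known range 0..k.

```python
def counting_sort(a, k):
    """Sort a list of integers in the range 0..k without element comparisons.

    Runs in Theta(n + k) time, which is linear when k = O(n).
    """
    counts = [0] * (k + 1)
    for x in a:                          # count occurrences of each key
        counts[x] += 1
    out = []
    for key, c in enumerate(counts):     # emit each key c times, in increasing order
        out.extend([key] * c)
    return out

print(counting_sort([4, 1, 3, 4, 3], k=4))  # [1, 3, 3, 4, 4]
```

The extra assumption (a bounded integer range) is exactly what lets the algorithm beat the Ω(n lg n) comparison lower bound discussed later.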

Quick-Sort
Uses a divide-and-conquer strategy:
– Split A[Left..Right] into A[Left..Middle–1] and A[Middle+1..Right] around the pivot A[Middle]
– Every element of A[Left..Middle–1] is less than or equal to every element of A[Middle+1..Right]
– Sort each part recursively
Quick-Sort(A, Left, Right)
1. if Left < Right then do
2.   Middle ← Partition(A, Left, Right)
3.   Quick-Sort(A, Left, Middle–1)
4.   Quick-Sort(A, Middle+1, Right)

Partition
Rearranges the array in place and returns the partitioning index: the final position of the pivot, which is chosen as the last element A[Right]
Partition(A, Left, Right)
1. Pivot ← A[Right]
2. i ← Left – 1
3. for j ← Left to Right–1
4.   do if (A[j] ≤ Pivot)
5.     then i ← i + 1
6.       Exchange(A[i], A[j])
7. Exchange(A[i+1], A[Right])  /* put the pivot in place
8. return i + 1
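The pseudocode above translates almost line by line into Python; this sketch (function and variable names mine) uses the same last-element pivot:

```python
def partition(a, left, right):
    """Partition a[left..right] around the pivot a[right]; return its final index."""
    pivot = a[right]
    i = left - 1
    for j in range(left, right):              # scan everything except the pivot
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]           # grow the "<= pivot" region
    a[i + 1], a[right] = a[right], a[i + 1]   # put the pivot in place
    return i + 1

def quick_sort(a, left=0, right=None):
    if right is None:
        right = len(a) - 1
    if left < right:
        middle = partition(a, left, right)
        quick_sort(a, left, middle - 1)
        quick_sort(a, middle + 1, right)

data = [2, 8, 7, 1, 3, 5, 6, 4]
quick_sort(data)
print(data)  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Note that the sort works in place, matching the S(n) = Ω(n) space bound quoted on slide 3.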

Example (1)
[Figure: first iteration of Partition on the sample array; the markers L, i, j, R advance and 2 is swapped with itself.]

Example (2)
[Figure: subsequent iterations of Partition; 1 and 8 are swapped, then 3 and 7 are swapped.]

Example (3)
[Figure: second iteration and the general pattern of Partition: a left list of elements with A[i] ≤ Pivot, a right list of elements with A[i] > Pivot, and an unrestricted list of elements not yet examined, with the pivot at the end.]

Quick-Sort complexity
The complexity of Quick-Sort depends on whether the partitioning is balanced or unbalanced, which depends on which elements are used as pivots:
1. Unbalanced partition: the split is as uneven as possible, so the sub-problems are of sizes n – 1 and 0.
2. Perfect partition: the split is always in the middle, so the sub-problems are both of size ≤ n/2.
3. Balanced partition: the split is somewhere in between, so the sub-problems are of sizes n – k and k.
Let us study each case separately!

Unbalanced partition
The recurrence equation is:
T(n) = T(n – 1) + T(0) + Θ(n)
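The slide leaves the solution implicit; unrolling the recurrence term by term (a standard expansion, not shown on the slide) gives the quadratic bound:

```latex
T(n) = T(n-1) + \Theta(n)
     = \Theta(n) + \Theta(n-1) + \cdots + \Theta(1)
     = \Theta\!\Bigl(\sum_{k=1}^{n} k\Bigr)
     = \Theta(n^2)
```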

Perfect partition
The recurrence equation is:
T(n) ≤ T(n/2) + T(n/2) + Θ(n)
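Again the solution is left implicit; by the recursion tree (or case 2 of the Master theorem, an argument the slides take as known), there are Θ(lg n) levels each costing Θ(n):

```latex
T(n) \le 2\,T(n/2) + \Theta(n)
\;\Longrightarrow\;
T(n) = O(n \lg n)
```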

General case
The recurrence equation for a split into sub-problems of sizes k and n – k is:
T(n) = T(k) + T(n – k) + Θ(n)
The average case is somewhere between the unbalanced and the perfect partition: which one dominates?

Example: 9-to-1 proportional split
Suppose that the partitioning algorithm always produces a 9-to-1 proportional split. The recurrence is:
T(n) = T(n/10) + T(9n/10) + Θ(n)
Every level of the recursion tree costs at most Θ(n). The shallowest branch reaches the boundary condition at depth log10 n; the recursion terminates at depth log10/9 n = Θ(lg n).
Therefore, the complexity is T(n) = O(n lg n).
In fact, this is true for any proportional split!
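The claim can be checked numerically; this sketch (my illustration, taking the per-level cost to be exactly n) evaluates the 9-to-1 recurrence and compares it to n lg n:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def t(n):
    """T(n) = T(n/10) + T(9n/10) + n with T(n) = 0 for n <= 1 (integer division)."""
    if n <= 1:
        return 0
    return t(n // 10) + t(9 * n // 10) + n

# The ratio T(n) / (n lg n) should stay bounded as n grows.
for n in (100, 1000, 10000):
    print(n, t(n), round(t(n) / (n * math.log2(n)), 2))
```

The roughly constant ratio in the last column is the numerical signature of T(n) = Θ(n lg n).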

Worst-case analysis: proof (1)
Claim: T(n) = O(n²), i.e., T(n) ≤ cn² for some constant c > 0, where
T(n) = max 1≤q≤n–1 (T(q) + T(n – q)) + Θ(n)
Proof:
Base of induction: true for n = 1.
Induction step: assume the claim for all n < n', and prove it for n'.

Worst-case analysis: proof (2)
By the induction hypothesis, T(n') ≤ max 1≤q≤n'–1 (cq² + c(n' – q)²) + Θ(n').
To prove the claim, we need to show that this is at most cn'², or equivalently, since q² + (n' – q)² = n'² – 2q(n' – q), that:
2c·q(n' – q) ≥ Θ(n')
Since q(n' – q) is always at least n'/2, as can easily be verified by checking the two cases q ≤ n'/2 and q > n'/2, we can pick c such that the inequality holds.

Average case complexity
We must first define what an average case is.
The behavior is determined by the relative ordering of the elements, not by the elements themselves. Thus, we are interested in the average over all permutations of the input, where each permutation is equally likely to appear (uniformly random input).
The average complexity is the number of steps averaged over a uniformly random input.
The complexity is determined by the number of “bad splits” and the number of “good splits”.
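This definition can be tested directly for small n: the sketch below (instrumentation mine) counts the comparisons A[j] ≤ Pivot made by the Partition of slide 5 on every permutation of n = 6 distinct elements and averages them.

```python
from itertools import permutations

def quicksort_comparisons(a):
    """Run Quick-Sort with last-element pivots; return the number of A[j] <= Pivot comparisons."""
    count = 0
    def sort(left, right):
        nonlocal count
        if left < right:
            pivot, i = a[right], left - 1
            for j in range(left, right):
                count += 1                        # one comparison per loop iteration
                if a[j] <= pivot:
                    i += 1
                    a[i], a[j] = a[j], a[i]
            a[i + 1], a[right] = a[right], a[i + 1]
            sort(left, i)                         # Middle - 1 = i
            sort(i + 2, right)                    # Middle + 1 = i + 2
    sort(0, len(a) - 1)
    return count

n = 6
totals = [quicksort_comparisons(list(p)) for p in permutations(range(n))]
worst = max(totals)                       # n(n-1)/2 = 15, hit e.g. by sorted input
print(worst, sum(totals) / len(totals))  # worst case vs. average over all 720 inputs
```

The average lands well below the n(n – 1)/2 worst case, which is the whole point of the average-case analysis.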

Bad splits and good splits – intuition
[Figure] Compare two scenarios: (a) a bad split of n into sub-problems of sizes 0 and n – 1, followed by a good split of n – 1 into sizes (n – 1)/2 – 1 and (n – 1)/2; (b) a single good split of n into two sub-problems of size (n – 1)/2.
In both cases, the partitioning cost is Θ(n). Thus the bad split was “absorbed” by a good one!

Randomization and average complexity
One way of studying the average case is to analyze the performance of a randomized version of the algorithm.
In the randomized version, choices are made with uniform probability. This mimics input generality – essentially, we reduce the chances of hitting the worst input!
Randomization ensures that the performance is good without making assumptions on the input.
Randomness is one of the most important concepts and tools in modern Computer Science!

Randomized Quick-Sort
Randomized complexity: the number of steps, for the WORST input, averaged over the random choices of the algorithm.
For Quick-Sort, the pivot determines the number of good and bad splits.
So far we chose the last element as the pivot. What if we instead choose any element at random?
In Partition, use Pivot ← A[Random(Left, Right)] instead of Pivot ← A[Right].
Note that the algorithm remains correct!
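One possible implementation of the randomized variant (the random pivot is swapped into the last position and then the usual Partition runs; naming mine):

```python
import random

def randomized_quick_sort(a, left=0, right=None):
    """Quick-Sort with a uniformly random pivot choice in each Partition call."""
    if right is None:
        right = len(a) - 1
    if left < right:
        k = random.randint(left, right)          # Pivot <- A[Random(Left, Right)]
        a[k], a[right] = a[right], a[k]          # move the pivot to the last position
        pivot, i = a[right], left - 1
        for j in range(left, right):             # the usual Partition of slide 5
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[right] = a[right], a[i + 1]
        randomized_quick_sort(a, left, i)
        randomized_quick_sort(a, i + 2, right)

data = [3, 1, 4, 1, 5, 9, 2, 6]
randomized_quick_sort(data)
print(data)  # [1, 1, 2, 3, 4, 5, 6, 9]
```

Correctness is unchanged because Partition still works for any choice of pivot; only the probability of bad splits changes.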

Randomized complexity
Randomized-case recurrence: the pivot is equally likely to end up in any of the n positions, so each split occurs with probability 1/n. We get:
T(n) = Θ(n) + (1/n) Σ q=1..n [T(q – 1) + T(n – q)], which simplifies to T(n) = Θ(n) + (2/n) Σ k=0..n–1 T(k)
This is a “recurrence with full history”, since it depends on all previous sizes of the problem. It can be proven, using methods which we will not get into this time, that the solution of this recurrence satisfies:
T(n) = O(n lg n)
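The full-history recurrence can be evaluated numerically and compared against the known closed form 2(n + 1)Hn – 4n for the expected number of comparisons (a standard result, stated here only as a check; the partitioning cost is taken as exactly n – 1 comparisons):

```python
import math

def expected_comparisons(n_max):
    """C(n) = (n - 1) + (2/n) * sum_{k=0}^{n-1} C(k), with C(0) = C(1) = 0."""
    c = [0.0] * (n_max + 1)
    total = 0.0                      # running sum C(0) + ... + C(n-1)
    for n in range(1, n_max + 1):
        total += c[n - 1]
        if n >= 2:
            c[n] = (n - 1) + 2.0 * total / n
    return c

c = expected_comparisons(20)
for n in (5, 10, 20):
    h = sum(1.0 / k for k in range(1, n + 1))   # harmonic number H_n
    closed = 2 * (n + 1) * h - 4 * n
    print(n, c[n], closed)
```

Since Hn = Θ(lg n), the closed form confirms T(n) = O(n lg n).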

Sorting with comparisons
The basic operation of all the sorting algorithms we have seen so far is the comparison between two elements: ai ≤ aj.
The sorted order they determine is based only on comparisons between the input elements!
We would like to prove that any comparison sorting algorithm must make Ω(n lg n) comparisons in the worst case to sort n elements (a lower bound).
Sorting without comparisons takes Ω(n) in the worst case, but we must make assumptions about the input.

Comparison sorting – lower bound
We want to prove a lower bound (Ω) on the worst-case complexity of ANY sorting algorithm that uses comparisons.
We will use the decision tree model to evaluate the number of comparisons that are needed in the worst case.
Every algorithm A has its own decision tree T, depending on how it performs the comparisons between elements.
The length of the longest path from the root to a leaf in this tree T determines the maximum number of comparisons that the algorithm must perform.

Decision tree for 3 elements
[Figure: the decision tree for sorting three elements; not reproduced in the transcript.]

Decision tree for 3 elements
[Figure] Example: on input A = (7, 9, 6), the comparisons 7 ≤ 9, 9 > 6 and 7 > 6 trace a root-to-leaf path ending at the sorted output π(A) = (6, 7, 9). Longest path: 3.

Decision trees
A decision tree is a full binary tree that represents the comparisons between elements that are performed by a particular algorithm. The tree has internal nodes, leaves, and branches:
– Internal node: a pair of indices i:j, for 1 ≤ i, j ≤ n
– Leaf: a permutation of the input π(1), …, π(n)
– Branches: the result of a comparison ai ≤ aj (left) or ai > aj (right)

Paths in decision trees
The execution of sorting algorithm A on input I corresponds to tracing a path in T from the root to a leaf.
Each internal node is associated with a yes/no question about the input, and the two edges coming out of it are associated with the two possible answers to the question.
The leaves are associated with the possible outcomes, and no edge comes out of them.
At the leaf, the permutation π is the one that sorts the elements!


Decision tree computation
The computation for an input starts at the root and progresses down the tree from one node to the next according to the answers to the questions at the nodes. The computation ends when we reach a leaf.
ANY correct algorithm MUST be able to produce every permutation of the input. There are n! permutations, and each of them must appear in the leaves of the tree.
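Since every permutation must appear at a leaf, and a binary tree of depth d has at most 2^d leaves, the worst case needs at least ⌈lg(n!)⌉ comparisons. This bound is easy to compute (the small-n check is my addition):

```python
import math

def comparison_lower_bound(n):
    """Minimum worst-case number of comparisons to sort n elements: ceil(lg(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

for n in range(2, 8):
    print(n, comparison_lower_bound(n))
# For n = 3: ceil(lg 6) = 3 comparisons, matching the longest path
# of the 3-element decision tree on slide 24.
```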

Worst case complexity
The worst-case number of comparisons is the length of the longest root-to-leaf path in the decision tree.
A lower bound on the length of the longest path for a given algorithm gives a lower bound on the worst-case number of comparisons the algorithm requires.
Thus, finding a lower bound on the length of the longest path of any comparison-based decision tree provides a lower bound on the worst-case complexity of comparison-based sorting algorithms!

Comparison-based sorting algorithms
Any comparison-based sorting algorithm can be described by a decision tree T.
The number of leaves in the tree of any comparison-based sorting algorithm must be at least n!, since the algorithm must give a correct answer to every possible input, and there are n! possible answers.
Why “at least”? Because there might be more than one leaf with the same answer, corresponding to different ways the algorithm treats different inputs.

Length of the longest path (1)
There are n! different possible answers. Consider all full binary trees with at least n! leaves, and in each one consider the longest path. Let d be the depth (height) of the tree.
A binary tree of depth d has at most 2^d leaves, so the minimum length of the longest path must satisfy n! ≤ 2^d.
Therefore, log(n!) ≤ log(2^d) = d.
Quick check: (n/2)^(n/2) ≤ n! ≤ n^n
(n/2) log(n/2) ≤ log(n!) ≤ n log n
log(n!) = Θ(n log n)
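The “quick check” inequalities can be verified numerically for small n (an illustration, using the exact bounds quoted on the slide):

```python
import math

# (n/2)^(n/2) <= n! <= n^n for every n >= 2:
# the top n/2 factors of n! are each >= n/2, and all n factors are <= n.
for n in range(2, 40):
    lo = (n / 2) ** (n / 2)
    hi = float(n) ** n
    assert lo <= math.factorial(n) <= hi, n
print("holds for n = 2..39")
```

Taking logarithms of the two outer expressions sandwiches log(n!) between (n/2) log(n/2) and n log n, giving log(n!) = Θ(n log n).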

Length of the longest path (2)
Claim: log(n!) = Ω(n log n) for n ≥ 2.
Proof: n! ≥ (n/2)^(n/2), so log(n!) ≥ (n/2) log(n/2) = Ω(n log n).
On the other hand: n! ≤ n^n, so log(n!) ≤ n log n.
This is the lower bound on the number of comparisons in any comparison-based sorting algorithm: every such algorithm requires Ω(n log n) comparisons in the worst case.

Complexity of comparison-sorting algorithms

Algorithm        Space   Worst case   Best case   Average case   Randomized case
Bubble-Sort      O(n)    O(n^2)       O(n)        O(n^2)         O(n^2)
Insertion-Sort   O(n)    O(n^2)       O(n)        O(n^2)         O(n^2)
Merge-Sort       O(n)    O(n lg n)    O(n lg n)   O(n lg n)      O(n lg n)
Quick-Sort       O(n)    O(n^2)       O(n lg n)   O(n lg n)      O(n lg n)

The lower bound for comparison sorting is T(n) = Ω(n lg n) and S(n) = Ω(n), for worst and average case, for deterministic and randomized algorithms.