Fundamental Data Structures and Algorithms

15-211 Fundamental Data Structures and Algorithms: Sorting. Peter Lee, February 20, 2003.

Announcements Homework #4 is out. Get started today! Reading: Chapter 8. Quiz #2 available on Tuesday.

Objects in calendar are closer than they appear.

Introduction to Sorting

Comparison-based sorting We assume: items are stored in an array; items can be moved around in the array; any two array elements can be compared. A comparison has 3 possible outcomes: <, =, >.
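
In Java, this three-outcome comparison is conventionally encoded as a single int that is negative, zero, or positive; a minimal sketch (the helper name is ours, not from the slides):

    // Returns <0 if x < y, 0 if x == y, >0 if x > y.
    static int compare(int x, int y) {
        return Integer.compare(x, y);  // safer than x - y, which can overflow
    }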

Flips and inversions An unsorted array: 24 47 13 99 105 222. An inversion is a pair of elements that is out of order, here (24, 13) and (47, 13); an adjacent out-of-order pair such as (47, 13) is a flip.
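
Counting inversions directly by checking every pair gives a simple quadratic baseline; a small Java sketch (the method name is ours):

    // Counts the pairs (i, j) with i < j and a[i] > a[j], i.e. the inversions.
    static int countInversions(int[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] > a[j]) count++;
        return count;
    }
    // countInversions(new int[]{24, 47, 13, 99, 105, 222}) returns 2.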

Naïve sorting algorithms Bubble sort: 24 47 13 99 105 222 → 24 13 47 99 105 222 → 13 24 47 99 105 222. Keep scanning for flips, until all are fixed. What is the running time?
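
One way to realize "keep scanning for flips until all are fixed" in Java; a sketch under the slides' assumptions, not the course's official code:

    // Bubble sort: repeatedly swap adjacent out-of-order pairs (flips)
    // until a full pass makes no swap. Worst case O(n^2).
    static void bubbleSort(int[] a) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 0; i + 1 < a.length; i++) {
                if (a[i] > a[i + 1]) {
                    int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                    swapped = true;
                }
            }
        }
    }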

Insertion sort Grow a sorted sublist at the front of the array: 105 47 13 99 30 222 → 47 105 13 99 30 222 → …

Insertion sort for i = 2 to n do insert a[i] in the proper place in a[1:i-1]
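
The pseudocode translates to Java roughly as follows (0-indexed, so the loop starts at 1; a sketch, not the lecture's code):

    // Insertion sort: the prefix a[0..i-1] is kept sorted; a[i] is inserted
    // into its proper place. Each shift removes exactly one inversion.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int x = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > x) {
                a[j + 1] = a[j];  // shift larger elements one slot right
                j--;
            }
            a[j + 1] = x;
        }
    }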

How fast is insertion sort? Takes O(n + #inversions) steps, which is very fast if the array is nearly sorted to begin with, e.g. 3 2 1 6 5 4 9 8 7 …

How many inversions? Consider the worst case, the reversed array: n n-1 … 3 2 1. In this case, there are (n-1) + (n-2) + … + 1 = n(n-1)/2 inversions, or O(n²).

How many inversions? What about the average case? Consider a permutation p = x1 x2 x3 … xn-1 xn, and let rev(p) be its reversal. Then any pair (xi, xj) is an inversion in exactly one of p or rev(p). There are n(n-1)/2 pairs (xi, xj), hence the average number of inversions in a permutation is n(n-1)/4, or O(n²).

How long does it take to sort? Can we do better than O(n²)? In the worst case? In the average case?

Heapsort Remember heaps: buildHeap has O(n) worst-case running time; deleteMin has O(log n) worst-case running time. Heapsort: build a heap, O(n); deleteMin until empty, O(n log n). Total worst case: O(n log n).
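
The two phases can be sketched in Java with java.util.PriorityQueue, a binary min-heap; this illustrates the analysis and is not an in-place heapsort (which would build a max-heap inside the array itself):

    import java.util.*;

    // buildHeap: the collection constructor heapifies in O(n).
    // deleteMin: each poll() is O(log n); n of them give O(n log n).
    static int[] heapSort(int[] a) {
        List<Integer> boxed = new ArrayList<>();
        for (int x : a) boxed.add(x);
        PriorityQueue<Integer> heap = new PriorityQueue<>(boxed);  // buildHeap
        int[] sorted = new int[a.length];
        for (int i = 0; i < sorted.length; i++) sorted[i] = heap.poll();  // deleteMin
        return sorted;
    }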

N² vs. N log N [chart comparing the two growth rates]

Sorting in O(n log n) Heapsort establishes the fact that sorting can be accomplished in O(n log n) worst-case running time.

Heapsort in practice The average-case analysis for heapsort is somewhat complex. In practice, heapsort consistently tends to use nearly n log n comparisons. So, while the worst case is better than n², other algorithms sometimes work better.

Shellsort Shellsort, like insertion sort, is based on swapping inverted pairs. It achieves O(n^(4/3)) running time with a suitable gap sequence. [See your book for details.]

Shellsort Example with gap sequence 3, 1. Gap 3: 105 47 13 99 30 222 → 99 47 13 105 30 222 → 99 30 13 105 47 222. Several inverted pairs are fixed in one exchange. Gap 1: 30 99 13 105 47 222 → 30 13 99 105 47 222 → …
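
A Java sketch of shellsort; here the gap simply halves each round (the slide's example uses gaps 3, 1; the O(n^(4/3)) bound needs a better-designed sequence such as Sedgewick's):

    // Shellsort: gapped insertion sorts with shrinking gaps. The final
    // gap-1 pass is plain insertion sort on an almost-sorted array.
    static void shellSort(int[] a) {
        for (int gap = a.length / 2; gap >= 1; gap /= 2) {
            for (int i = gap; i < a.length; i++) {
                int x = a[i];
                int j = i;
                while (j >= gap && a[j - gap] > x) {
                    a[j] = a[j - gap];  // one move can fix several inversions
                    j -= gap;
                }
                a[j] = x;
            }
        }
    }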

Recursive Sorting

Recursive sorting Intuitively, divide the problem into pieces and then recombine the results. If array is length 1, then done. If array is length N>1, then split in half and sort each half. Then combine the results. An example of a divide-and-conquer algorithm.

Divide-and-conquer [diagram: split the input, solve the pieces recursively, combine the results]

Why divide-and-conquer works Suppose the amount of work required to divide and recombine is linear, that is, O(n). Suppose also that solving a problem directly requires more than linear work. Then each dividing step reduces the amount of work by more than a linear amount, while requiring only linear work to do so.

Divide-and-conquer is big We will see several examples of divide-and-conquer in this course.

Recursive sorting If array is length 1, then done. If array is length N>1, then split in half and sort each half. Then combine the results.

Analysis of recursive sorting Suppose it takes time T(N) to sort N elements. Suppose also it takes time N to combine the two sorted arrays. Then: T(1) = 1 T(N) = 2T(N/2) + N, for N>1 Solving for T gives the running time for the recursive sorting algorithm.

Remember recurrence relations? Systems of equations such as T(1) = 1 T(N) = 2T(N/2) + N, for N>1 are called recurrence relations (or sometimes recurrence equations).

A solution A solution for T(N) = 2T(N/2) + N is given by T(N) = N log N + N, which is O(N log N). How do we solve such equations?

Recurrence relations There are several methods for solving recurrence relations. It is also useful sometimes to check that a solution is valid. This is done by induction.

Checking a solution Base case: T(1) = 1 log 1 + 1 = 1. Inductive case: assume T(M) = M log M + M for all M < N. Then T(N) = 2T(N/2) + N = 2((N/2) log(N/2) + N/2) + N = N(log N - log 2) + 2N = N log N - N + 2N = N log N + N.
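
The induction can also be spot-checked numerically; a tiny Java sketch (purely illustrative, not from the lecture) that evaluates the recurrence at powers of 2:

    // T(1) = 1; T(N) = 2 T(N/2) + N. Claimed closed form: N log2 N + N.
    static long T(long n) {
        return (n == 1) ? 1 : 2 * T(n / 2) + n;
    }
    // T(8) = 2*T(4) + 8 = 2*(2*T(2) + 4) + 8 = 32, and 8*log2(8) + 8 = 32.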

Logarithms Some useful equalities: x^A = B iff log_x B = A; log 1 = 0; log(AB) = log A + log B, if A, B > 0; log(A/B) = log A - log B, if A, B > 0; log(A^B) = B log A.

Upper bounds for rec. relations Divide-and-conquer algorithms are very useful in practice. Furthermore, they all tend to generate similar recurrence relations. As a result, approximate upper-bound solutions are well-known for recurrence relations derived from divide-and-conquer algorithms.

Divide-and-Conquer Theorem Theorem: Let a, b, c > 0. The recurrence relation T(1) = b; T(N) = aT(N/c) + bN, for any N which is a power of c, has upper-bound solutions: T(N) = O(N) if a < c; T(N) = O(N log N) if a = c; T(N) = O(N^(log_c a)) if a > c. For recursive sorting: a = 2, b = 1, c = 2.

Upper-bounds Corollary: Dividing a problem into p pieces, each of size N/p, using only a linear amount of work, results in an O(N log N) algorithm.

Upper-bounds Proof of this theorem later in the semester.

Exact solutions Recall from earlier in the semester that it is sometimes possible to derive closed-form solutions to recurrence relations. Several methods exist for doing this. As an example, consider again our current equations: T(1) = 1 T(N) = 2T(N/2) + N, for N>1

Repeated substitution method One technique is to use repeated substitution: T(N) = 2T(N/2) + N; 2T(N/2) = 2(2T(N/4) + N/2) = 4T(N/4) + N, so T(N) = 4T(N/4) + 2N; 4T(N/4) = 4(2T(N/8) + N/4) = 8T(N/8) + N, so T(N) = 8T(N/8) + 3N; in general, T(N) = 2^k T(N/2^k) + kN.

Repeated substitution, cont'd We end up with T(N) = 2^k T(N/2^k) + kN, for all k ≥ 1. Let's use k = log N. Note that 2^(log N) = N. So: T(N) = N T(1) + N log N = N log N + N.

Other methods There are also other methods for solving recurrence relations. For example, the “telescoping method”…

Telescoping method We start with T(N) = 2T(N/2) + N. Divide both sides by N to get T(N)/N = (2T(N/2) + N)/N = T(N/2)/(N/2) + 1. This is valid for any N that is a power of 2, so we can write the following:

Telescoping method, cont'd Additional equations: T(N)/N = T(N/2)/(N/2) + 1; T(N/2)/(N/2) = T(N/4)/(N/4) + 1; T(N/4)/(N/4) = T(N/8)/(N/8) + 1; …; T(2)/2 = T(1)/1 + 1. What happens when we sum all the left-hand and right-hand sides?

Telescoping method, cont'd Summing all the left-hand sides and all the right-hand sides, each intermediate term T(N/2^i)/(N/2^i) appears once on each side and cancels. Since there are log N equations, we are left with: T(N)/N = T(1)/1 + log(N).

Telescoping method, cont’d We are left with T(N)/N = T(1)/1 + log(N) Multiplying both sides by N gives T(N) = N log(N) + N

Mergesort

Mergesort Mergesort is the most basic recursive sorting algorithm. Divide the array into halves A and B. Recursively mergesort each half. Combine A and B by repeatedly comparing the first remaining elements of A and B and moving the smaller one to the result array. Note: be careful to avoid creating lots of result arrays.

Mergesort [diagram: the array is split recursively and the sorted halves are merged back together]

Mergesort But we don't actually want to create all of these intermediate arrays!

Mergesort Use simple indexes to perform the split. Use a single extra array to hold each intermediate result.
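
A Java sketch of this index-based scheme, with one temporary array shared by every merge (method names are ours, not the course's):

    // Mergesort: split by indexes, merge into tmp, copy back.
    static void mergeSort(int[] a) {
        mergeSort(a, new int[a.length], 0, a.length - 1);
    }

    static void mergeSort(int[] a, int[] tmp, int lo, int hi) {
        if (lo >= hi) return;                    // 0 or 1 element: done
        int mid = (lo + hi) / 2;
        mergeSort(a, tmp, lo, mid);              // sort left half
        mergeSort(a, tmp, mid + 1, hi);          // sort right half
        int i = lo, j = mid + 1, k = lo;
        while (i <= mid && j <= hi)              // move the smaller front element
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i <= mid) tmp[k++] = a[i++];      // drain leftovers
        while (j <= hi)  tmp[k++] = a[j++];
        System.arraycopy(tmp, lo, a, lo, hi - lo + 1);
    }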

Analysis of mergesort Mergesort generates almost exactly the same recurrence relations shown before: T(1) = 1; T(N) = 2T(N/2) + N - 1, for N > 1. Thus, mergesort is O(N log N).

Quicksort

Quicksort Quicksort was invented in 1960 by Tony Hoare. Although it has O(N²) worst-case performance, on average it is O(N log N). More importantly, it is the fastest known comparison-based sorting algorithm in practice.

Quicksort idea Choose a pivot. Rearrange so that the pivot is in the "right" spot. Recurse on each half and conquer!

Quicksort algorithm If array A has 1 (or 0) elements, then done. Choose a pivot element x from A. Divide A - {x} into two arrays: B = {y ∈ A | y ≤ x} and C = {y ∈ A | y ≥ x}. Quicksort arrays B and C. Result is B + {x} + C.
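
Read literally, the slide's B/C formulation gives this out-of-place Java sketch (duplicates are sent to B here, a choice the slide leaves open; the in-place version appears later):

    import java.util.*;

    // Quicksort as stated: partition A - {x} into B (<= x) and C (>= x),
    // sort both recursively, and concatenate B + {x} + C.
    static List<Integer> quickSort(List<Integer> a) {
        if (a.size() <= 1) return a;               // 0 or 1 elements: done
        int x = a.get(0);                          // pivot choice is unspecified
        List<Integer> b = new ArrayList<>(), c = new ArrayList<>();
        for (int i = 1; i < a.size(); i++) {
            if (a.get(i) <= x) b.add(a.get(i)); else c.add(a.get(i));
        }
        List<Integer> result = new ArrayList<>(quickSort(b));
        result.add(x);
        result.addAll(quickSort(c));
        return result;
    }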

Quicksort algorithm Example: 105 47 13 17 30 222 5 19. Pick pivot 19; partition into 5 17 13 (≤ 19) and 47 30 222 105 (≥ 19); recurse on each part (e.g. with pivots 13 and 47), and so on. In practice, insertion sort is used once the arrays get "small enough".

Doing quicksort in place Start: 85 24 63 50 17 31 96 45. Swap the pivot 50 to the end: 85 24 63 45 17 31 96 50. Scan L rightward for an element ≥ the pivot and R leftward for an element ≤ the pivot: L stops at 85, R stops at 31. Swap them: 31 24 63 45 17 85 96 50.

Doing quicksort in place Continue scanning: L stops at 63, R stops at 17; swap them: 31 24 17 45 63 85 96 50. Now L and R cross, so swap the pivot into place at L: 31 24 17 45 50 85 96 63. The pivot 50 is in its final position.
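
The L/R scan above, as a hedged Java sketch; it assumes the chosen pivot has already been swapped to a[hi], as on the slide:

    // Partition a[lo..hi] around the pivot stored at a[hi]; returns the
    // pivot's final index. L scans right past elements < pivot; R scans
    // left past elements > pivot; out-of-place pairs are swapped.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int L = lo, R = hi - 1;
        while (true) {
            while (L <= R && a[L] < pivot) L++;
            while (R >= L && a[R] > pivot) R--;
            if (L >= R) break;                    // pointers crossed
            int t = a[L]; a[L] = a[R]; a[R] = t;
            L++; R--;
        }
        int t = a[L]; a[L] = a[hi]; a[hi] = t;    // pivot into its final spot
        return L;
    }
    // On 85 24 63 45 17 31 96 50 this yields 31 24 17 45 50 85 96 63,
    // matching the frames above.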

Quicksort is fast but hard to do Quicksort, in the early 1960s, was famous for being incorrectly implemented many times. (More about invariants next time.) Quicksort is very fast in practice, faster than mergesort, because quicksort can be done "in place".

Informal analysis If there are duplicate elements, then the algorithm does not specify which subarray B or C should get them. Ideally, they split down the middle. Also, it is not specified how to choose the pivot. Ideally, it would be the median value of the array, but this would be expensive to compute. As a result, it is possible that quicksort will show O(N²) behavior.
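
A common compromise between "first element" and "true median" is median-of-three; a hedged Java sketch (not necessarily the pivot rule this course adopts):

    // Median-of-three: sort a[lo], a[mid], a[hi] in place and use the middle
    // one as the pivot. A sorted input then no longer triggers the worst case.
    static int medianOfThree(int[] a, int lo, int hi) {
        int mid = (lo + hi) / 2;
        if (a[mid] < a[lo]) { int t = a[lo]; a[lo] = a[mid]; a[mid] = t; }
        if (a[hi] < a[lo]) { int t = a[lo]; a[lo] = a[hi]; a[hi] = t; }
        if (a[hi] < a[mid]) { int t = a[mid]; a[mid] = a[hi]; a[hi] = t; }
        return a[mid];  // now a[lo] <= a[mid] <= a[hi]
    }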

Worst-case behavior Example: 105 47 13 17 30 222 5 19. If the smallest element is always chosen as the pivot (5, then 13, then 17, then 19, …), one subarray is always empty, so each step removes only a single element.

Analysis of quicksort Assume a random pivot. T(0) = 1; T(1) = 1; T(N) = T(i) + T(N-i-1) + cN, for N > 1, where i is the size of the left subarray.

Worst-case analysis If the pivot is always the smallest element, then: T(N) = T(0) + T(N-1) + cN, for N > 1, which is approximately T(N-1) + cN, giving O(N²). See the book for details on this solution.

Best-case analysis In the best case, the pivot is always the median element. In that case, the splits are always "down the middle". Hence, same behavior as mergesort. That is, O(N log N).

Average-case analysis Consider the quicksort tree [diagram: recursion tree for 105 47 13 17 30 222 5 19].

Average-case analysis The time spent at each level of the tree is O(N). So, on average, how many levels are there? That is, what is the expected height of the tree? If on average there are O(log N) levels, then quicksort is O(N log N) on average.

Average-case analysis We’ll answer this question next time…

Summary of quicksort A fast sorting algorithm in practice. Can be implemented in-place. But it is O(N²) in the worst case. Average-case performance?