Sorting: Importance of Sorting and Quicksort


Sorting: importance of sorting; quicksort; lower bounds for comparison-based methods; heapsort; non-comparison-based sorting.

Why don't CS profs ever stop talking about sorting?! Computers spend more time sorting than on anything else; historically it was 25% of computing time on mainframes. Sorting is the best-studied problem in computer science, with a wide variety of algorithms known. Most of the interesting ideas we encounter in the course are taught in the context of sorting, such as divide-and-conquer, randomized algorithms, and lower bounds. You should have seen most of the algorithms before; we will concentrate on their analysis.

Applications of Sorting: closest pair, element uniqueness, frequency distribution, selection of the kth largest element, and convex hulls (see next slide!).

Convex Hulls

Huffman Codes If you are trying to minimize the amount of space a text file takes up, it is silly to assign every letter the same fixed-length (i.e., one-byte) code. Example: e is more common than q, and a is more common than z. If we were storing English text, we would want a and e to have shorter codes than q and z. To design the best possible code, the first and most important step is to sort the characters in order of frequency of use! We might do this for groups of 2 or 3 letters together. (Or whole words?)

Example Problems a. You are given a pile of thousands of telephone bills and thousands of checks sent in to pay the bills. Find out who did not pay. b. You are given a list containing the title, author, call number and publisher of all the books in a school library and another list of 30 publishers. Find out how many of the books in the library were published by each of those 30 companies. c. You are given all the book checkout cards used in the campus library during the past year, each of which contains the name of the person who took out the book. Determine how many distinct people checked out at least one book. Assume sorting takes O(n log n)

Quicksort Although mergesort is O(n log n), it is awkward to implement on arrays, since merging requires extra space. In practice, Quicksort is the fastest sorting algorithm. Example, pivoting about 10:
  17 12 6 23 19 8 5 10   (before)
  6 8 5 10 17 12 23 19   (after)
The pivot is now in its correct sorted position, and every other element is on the correct side of it, before or after.

Quicksort Walkthrough
  17 12 6 23 19 8 5 10
  partition about 10:          [6 8 5]  10  [17 12 23 19]
  partition about 5 and 19:    5 [8 6]  10  [17 12] 19 [23]
  partition about 6 and 12:    5 6 [8]  10  12 [17] 19 23
  sorted:                      5 6 8 10 12 17 19 23

Pseudocode
  Sort(A) { Quicksort(A, 1, n); }
  Quicksort(A, low, high) {
    if (low < high) {
      pivotLocation = Partition(A, low, high);
      Quicksort(A, low, pivotLocation - 1);
      Quicksort(A, pivotLocation + 1, high);
    }
  }

Pseudocode
  int Partition(A, low, high) {
    pivot = A[high];
    leftwall = low - 1;
    for i = low to high - 1 {
      if (A[i] < pivot) {
        leftwall = leftwall + 1;
        swap(A[i], A[leftwall]);
      }
    }
    swap(A[high], A[leftwall + 1]);
    return leftwall + 1;
  }
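The pseudocode above translates directly into runnable Python (a sketch using the same Lomuto-style, pivot-at-the-end partition; the names mirror the slides):

```python
def partition(A, low, high):
    """Place the pivot A[high] into its sorted position and return
    that position; smaller elements end up to its left."""
    pivot = A[high]
    leftwall = low - 1              # boundary of the "< pivot" region
    for i in range(low, high):
        if A[i] < pivot:
            leftwall += 1
            A[i], A[leftwall] = A[leftwall], A[i]
    A[high], A[leftwall + 1] = A[leftwall + 1], A[high]
    return leftwall + 1

def quicksort(A, low=0, high=None):
    if high is None:
        high = len(A) - 1
    if low < high:
        p = partition(A, low, high)
        quicksort(A, low, p - 1)
        quicksort(A, p + 1, high)

data = [17, 12, 6, 23, 19, 8, 5, 10]
quicksort(data)
# data is now [5, 6, 8, 10, 12, 17, 19, 23]
```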

Best Case for Quicksort: every pivot splits its sub-array in half, so the recursion depth is lg n and the total work is O(n log n).

Worst Case for Quicksort: every pivot is the smallest (or largest) element of its sub-array, so the recursion depth is n - 1 and the total work is O(n^2).

Intuition: The Average Case. Picture the sorted positions 0 ... n/4 ... n/2 ... 3n/4 ... n: a pivot landing anywhere in the middle half gives a decent partition, because the larger side has at most 3n/4 elements. After h decent partitions, the sub-problem size is at most (3/4)^h * n. Setting (3/4)^h * n = 1 gives h = log(n) / log(4/3), which is about 2.4 lg n, i.e., O(log n).

What have we shown? O(log n) decent partitions suffice to sort an array of n elements. But if we just take arbitrary pivot points, how often will they, in fact, be decent? Since any element ranked between n/4 and 3n/4 makes a decent pivot, we get one half the time on average. Therefore, on average, we expect about twice as many partitions along any path of the recursion, which is still O(log n).

Quicksort in the real world…

Average-case Analysis Let X denote the random variable for the total number of comparisons performed. Let Xij be the indicator random variable for the event that the ith smallest element and the jth smallest element are compared. Then E[X] = sum_{i=1}^{n-1} sum_{j=i+1}^{n} E[Xij].

Computing E[Xij] Observation: E[Xij] = 2/(j-i+1). All comparisons are between a pivot element and another element. If some item k with i < k < j is chosen as a pivot before either i or j, then items i and j land in different sub-arrays and are never compared. So i and j are compared exactly when one of them is the first pivot chosen from the interval i..j, which happens with probability 2/(j-i+1).

Computing E[X]
  E[X] = sum_{i=1}^{n-1} sum_{j=i+1}^{n} 2/(j-i+1)
       = sum_{i=1}^{n-1} sum_{k=1}^{n-i} 2/(k+1)
      <= sum_{i=1}^{n-1} 2 H_{n-i+1}
      <= sum_{i=1}^{n-1} 2 H_n = 2 (n-1) H_n = O(n log n)
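The derivation above can be checked numerically (an illustrative sketch, not from the slides): compute the exact double sum for the expected number of comparisons and verify it sits below the 2(n-1)H_n bound.

```python
from fractions import Fraction

def expected_comparisons(n):
    """Exact value of sum_{i<j} 2/(j-i+1), using rational arithmetic."""
    return sum(Fraction(2, j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

n = 100
exact = expected_comparisons(n)
bound = 2 * (n - 1) * harmonic(n)
print(float(exact), float(bound))  # exact expectation and its upper bound
```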

Avoiding worst-case Understanding quicksort’s worst-case Methods for avoiding it Pivot strategies Randomization

Understanding the worst case On an already-sorted input, the last element is always the largest, so each partition peels off just one element: A B D F H J K, then A B D F H J, then A B D F H, then A B D F, then A B D, then A B, then A. That is n partitions of linear cost each, i.e., O(n^2) overall, and the already-sorted (or nearly sorted) case is a likely input in many applications.

Pivot Strategies Use the middle element of the sub-array as the pivot. Or use the median of (first, middle, last) to avoid being fooled by any kind of pre-sorted input. What is the worst-case performance for these pivot-selection mechanisms?
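One way to implement the median-of-three rule (a sketch; the helper name is mine, not the slides'): pick the index holding the median of the first, middle, and last values, then swap that element to the end before partitioning as usual.

```python
def median_of_three(A, low, high):
    """Return the index (low, mid, or high) holding the median of the
    three values A[low], A[mid], A[high]."""
    mid = (low + high) // 2
    # sort the three (value, index) pairs and take the middle one
    candidates = sorted([(A[low], low), (A[mid], mid), (A[high], high)])
    return candidates[1][1]
```

To use it with the earlier Partition pseudocode, swap A[median_of_three(A, low, high)] with A[high] so the chosen pivot sits at the end.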

Randomization Techniques Goal: make the chance of worst-case run time equally small for all inputs. Methods: choose the pivot element randomly from the range [low..high], or randomly permute the array before sorting.

Is Quicksort really faster than Mergesort? Since Quicksort is Theta(n log n) and Selection Sort is Theta(n^2), there isn't any debate about which of those is faster. But how can we compare two Theta(n log n) algorithms to know which one is faster? Using the RAM model and big-Oh notation, we can't! When all of the algorithms are well implemented, Quicksort is typically 2-3 times faster than the others, but this comes down to implementation details, not asymptotics.

Comparisons
  Algorithm        Best          Worst         Average
  Insertion Sort   O(n)          O(n^2)        O(n^2)
  Selection Sort   O(n^2)        O(n^2)        O(n^2)
  Bubble Sort      O(n)          O(n^2)        O(n^2)
  Merge Sort       O(n log n)    O(n log n)    O(n log n)
  Heap Sort        O(n log n)    O(n log n)    O(n log n)
  Quick Sort       O(n log n)    O(n^2)        O(n log n)

Possible reasons for not choosing quicksort What do you know about the input data? Is the data already partially sorted? Do we know the distribution of the keys? Are your keys very long or hard to compare? Is the range of possible keys very small?

Optimizing Quicksort Using randomization: guarantees that no particular input can force worst-case time. Median of three: can be slightly faster than randomization for somewhat-sorted data. Leave small sub-arrays for insertion sort: insertion sort is faster, in practice, for small values of n. Do the smaller partition first: minimizes run-time stack memory.
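A sketch combining three of these optimizations (the cutoff value 16 is an illustrative choice, not from the slides): a random pivot, an insertion-sort finish for small sub-arrays, and recursing into the smaller partition first while looping on the larger one, which bounds the stack depth at O(log n).

```python
import random

CUTOFF = 16  # sub-arrays at or below this size go to insertion sort

def insertion_sort(A, low, high):
    for i in range(low + 1, high + 1):
        key, j = A[i], i - 1
        while j >= low and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key

def partition(A, low, high):
    # move a randomly chosen pivot to the end, then Lomuto partition
    r = random.randint(low, high)
    A[r], A[high] = A[high], A[r]
    pivot, wall = A[high], low - 1
    for i in range(low, high):
        if A[i] < pivot:
            wall += 1
            A[i], A[wall] = A[wall], A[i]
    A[high], A[wall + 1] = A[wall + 1], A[high]
    return wall + 1

def quicksort(A, low=0, high=None):
    if high is None:
        high = len(A) - 1
    while high - low > CUTOFF:
        p = partition(A, low, high)
        if p - low < high - p:       # recurse on the smaller side only
            quicksort(A, low, p - 1)
            low = p + 1
        else:
            quicksort(A, p + 1, high)
            high = p - 1
    insertion_sort(A, low, high)
```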

Is Linear Sorting Possible? Any comparison-based sorting program can be thought of as defining a decision tree of possible executions: a binary tree in which each internal node asks a TRUE/FALSE comparison question.

Example Decision Tree Running the same program twice on the same permutation causes it to do exactly the same thing, but running it on different permutations of the same data causes a different sequence of comparisons to be made on each. Claim: the height of this decision tree is the worst-case complexity of sorting.

How big is the decision tree? Since different permutations of n elements require different sequences of steps to sort, there must be at least n! different paths from the root to leaves in the decision tree, i.e., at least n! different leaves in the tree. Since a binary tree of height h has at most 2^h leaves, we know that n! <= 2^h, or h >= log2(n!). By inspection, n! > (n/2)^(n/2), since the last n/2 factors of the product are each greater than n/2. Thus h > (n/2) log(n/2), so no comparison-based sort can beat Omega(n log n) in the worst case.
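A quick numeric sanity check of the two inequalities above (an illustrative sketch): n! is at least (n/2)^(n/2), so the tree height lower bound ceil(log2(n!)) dominates (n/2) log2(n/2).

```python
import math

for n in (4, 16, 64, 256):
    # n! > (n/2)^(n/2): the last n/2 factors each exceed n/2
    assert math.factorial(n) >= (n / 2) ** (n / 2)
    # the decision tree must have height at least ceil(log2(n!))
    height_lower_bound = math.ceil(math.log2(math.factorial(n)))
    assert height_lower_bound >= (n / 2) * math.log2(n / 2)
```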

Heaps: definition; operations (insertion, heap construction, heap extract-max); heapsort.

Definition A binary heap is defined to be a binary tree with a key in each node such that: (1) all leaves are on, at most, two adjacent levels; (2) all leaves on the lowest level occur to the left, and all levels except the lowest one are completely filled; (3) the key in the root is greater than or equal to all its children's keys, and the left and right subtrees are again binary heaps. Conditions 1 and 2 specify the shape of the tree (it is almost perfectly balanced); condition 3 specifies the labeling, i.e., the content.
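On the usual array representation (root at index 0, children of node i at indices 2i+1 and 2i+2) the two shape conditions hold automatically, so only the key condition needs checking. A small illustrative checker (the function name is mine, not the slides'):

```python
def is_max_heap(A):
    """True if every parent key in the array is >= its children's keys."""
    n = len(A)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and A[i] < A[child]:
                return False
    return True
```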

Example Heap

Are these legal?

Partial Order Property The ancestor relation in a heap defines a partial order on its elements: it is reflexive (x is an ancestor of itself), anti-symmetric (if x is an ancestor of y and y is an ancestor of x, then x = y), and transitive (if x is an ancestor of y and y is an ancestor of z, then x is an ancestor of z). Partial orders can be used to model hierarchies with incomplete information or equal-valued elements; here, "ancestor of" plays the role of "greater than or equal to". The partial order defined by the heap structure is weaker than a total order, which explains both why a heap is easier to build and why it is less useful than sorting (but still important). Questions: What are the minimum and maximum numbers of elements in a heap of height h? What is the height of a heap with n elements? Where in a heap might the smallest node reside? Is an array in reverse sorted order a heap?

Insertion Operation Heaps can be constructed incrementally, by inserting each new element into the left-most open spot in the array. If the new element is greater than its parent, swap their positions and recur; at each step we replace the root of a subtree by a larger key, so the heap order is preserved. The height h of an n-element heap is bounded because a heap of height h holds between 2^h and 2^(h+1) - 1 elements, so h = floor(lg n), and insertions take O(log n) time.
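Insertion as described above, on the array representation (a sketch; the function name is mine): append at the left-most open spot, i.e., the end of the array, then swap upward while the new key beats its parent.

```python
def heap_insert(A, key):
    """Insert key into max-heap A (a Python list), sifting it up."""
    A.append(key)                     # left-most open spot
    i = len(A) - 1
    while i > 0 and A[(i - 1) // 2] < A[i]:
        parent = (i - 1) // 2
        A[i], A[parent] = A[parent], A[i]   # swap with smaller parent
        i = parent
```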

Heap Construction The bottom-up insertion algorithm gives an O(n log n) way to build a heap, but Robert Floyd found a better way, using a merge procedure called heapify. Given two heaps and a fresh element, they can be merged into one by making the new element the root and trickling it down. To convert an array of integers into a heap, view it as a binary tree and call heapify on each internal node, bottom-up. How long would this take? Only O(n), since most nodes sit near the leaves and trickle down a short distance; the cost is bounded using sum_k k x^k = x/(1-x)^2 for x < 1.
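Floyd's construction sketched in Python (sift_down plays the role of the slides' heapify merge step; names are mine): calling it on every internal node, bottom-up, builds a max-heap in O(n) total time.

```python
def sift_down(A, i, n):
    """Trickle A[i] down until both children are smaller (max-heap)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and A[left] > A[largest]:
            largest = left
        if right < n and A[right] > A[largest]:
            largest = right
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def build_heap(A):
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):   # internal nodes, bottom-up
        sift_down(A, i, n)

A = [5, 3, 17, 10, 84, 19, 6, 22, 9]
build_heap(A)
# A[0] is now 84, the maximum
```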

Heapify Example Try to create a heap with the entries: 5, 3, 17, 10, 84, 19, 6, 22, 9

Heap Extract Max
  HeapExtractMax(A) {
    if (heap-size(A) < 1) then error "Heap Underflow";
    max = A[1];
    A[1] = A[heap-size(A)];
    heap-size(A) = heap-size(A) - 1;
    Heapify(A, 1);
    return max;
  }

Heap Sort To sort using the heap data structure, we first build the heap, and then just repeatedly extract the maximum. Build heap: O(n). Each extract-max: O(log n). Therefore heap sort = O(n) + n * O(log n) = O(n log n).
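The two steps just described, build the heap and then extract the maximum n-1 times, in one in-place sketch (each extraction swaps the root to the end of a shrinking prefix and re-heapifies):

```python
def sift_down(A, i, n):
    """Trickle A[i] down within A[0:n] to restore the max-heap order."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and A[left] > A[largest]:
            largest = left
        if right < n and A[right] > A[largest]:
            largest = right
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def heapsort(A):
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):   # build heap: O(n)
        sift_down(A, i, n)
    for end in range(n - 1, 0, -1):       # n-1 extract-max steps
        A[0], A[end] = A[end], A[0]       # move current max into place
        sift_down(A, 0, end)
```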

Non-comparison Based Sorting All the sorting algorithms we have seen assume binary comparisons as the basic primitive: questions of the form "is x before y?". Suppose you were given a deck of playing cards to sort. Most likely you would set up 13 piles, one per rank (A 2 3 4 5 6 7 8 9 10 J Q K), and put all cards of the same rank in one pile. With only a constant number of cards left in each pile, you can use insertion sort to order each pile by suit and concatenate everything together. If we can find the correct pile for each card in constant time, and each pile gets O(1) cards, this algorithm takes O(n) time.

Bucketsort Suppose we are sorting n numbers from 1 to m, where we know the numbers are approximately uniformly distributed. We can set up n buckets, each responsible for an interval of m/n numbers: [1 .. m/n], [m/n+1 .. 2m/n], [2m/n+1 .. 3m/n], and so on. If we use an array of buckets, each item gets mapped to the right bucket in O(1) time. With uniformly distributed keys, the expected number of items per bucket is 1, so sorting each bucket takes O(1) expected time! The total effort of bucketing, sorting buckets, and concatenating the sorted buckets together is O(n). What happened to our lower bound??
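Bucketsort as just described, sketched for keys assumed uniform in [1, m] (the function name and the use of sorted() in place of a per-bucket insertion sort are my choices, not the slides'):

```python
def bucketsort(keys, m):
    """Sort keys drawn from [1, m], assumed roughly uniform."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for k in keys:
        # map a key in [1, m] to a bucket index in [0, n-1] in O(1)
        idx = min((k - 1) * n // m, n - 1)
        buckets[idx].append(k)
    result = []
    for b in buckets:
        result.extend(sorted(b))   # O(1) expected work per bucket
    return result
```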

Bucketsort We can use bucketsort effectively whenever we understand the distribution of the data. However, bad things happen when we assume the wrong distribution: everything may land in one bucket, and we have spent linear time distributing our items and learned nothing. Perhaps we could split the big bucket recursively, but it is not certain that we will ever win unless we understand the distribution. Problems like this are why we worry about the worst-case performance of algorithms! Such distribution techniques can be used on strings instead of just numbers, with the buckets corresponding to letter ranges instead of number ranges. The worst case "shouldn't" happen if we understand the distribution of our data.

Real World Distributions Consider the distribution of names in a telephone book. Will there be a lot of Ofrias? Will there be a lot of Smiths? Will there be a lot of Zuckers? Make sure you understand your data, or use a good worst-case or randomized algorithm!