Sorting

We have already seen two efficient ways to sort:


A kind of “insertion” sort
Insert the elements into a red-black tree one by one.
Traverse the tree in in-order and collect the keys.
Takes O(n log n) time.

Heapsort (Williams, Floyd, 1964)
Put the elements in an array.
Make the array into a heap.
Repeatedly do a deletemin and put the deleted element at the last position of the heap's part of the array (the slot the shrinking heap has just freed).
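The three steps above can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the names `sift_down` and `heapsort_descending` are made up here, and indices are 0-based. Because a min-heap is used and each deleted minimum goes to the end, the array ends up in decreasing order.

```python
def sift_down(a, i, size):
    """Restore the min-heap property for the subtree rooted at index i."""
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < size and a[left] < a[smallest]:
            smallest = left
        if right < size and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest


def heapsort_descending(a):
    """Sort a in place using a min-heap; the result is in decreasing order."""
    n = len(a)
    # Make the array into a heap, bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # deletemin: move the minimum to the last heap position, then shrink the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```

Using a max-heap instead (flip the two comparisons in `sift_down`) would give increasing order with the same structure.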

Quicksort (Hoare 1961)

Quicksort
Input: an array A[p..r]

Quicksort(A, p, r)
    if p < r then
        q = Partition(A, p, r)    // q is the position of the pivot element
        Quicksort(A, p, q-1)
        Quicksort(A, q+1, r)

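Partition example on A = 2 8 7 1 3 5 6 4 (p is the first position, r the last, and the pivot is A[r] = 4). While j scans from p to r-1 the array changes as follows:
2 8 7 1 3 5 6 4    (start)
2 1 7 8 3 5 6 4    (after exchanging 8 and 1)
2 1 3 8 7 5 6 4    (after exchanging 7 and 3)
2 1 3 4 7 5 6 8    (after the final exchange that puts the pivot in place)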

Example input: 2 8 7 1 3 5 6 4 (p marks the first position, r the last)

Partition(A, p, r)
    x ← A[r]
    i ← p-1
    for j ← p to r-1 do
        if A[j] ≤ x then
            i ← i+1
            exchange A[i] ↔ A[j]
    exchange A[i+1] ↔ A[r]
    return i+1
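A direct Python transcription of the Quicksort and Partition pseudocode, as a sketch. The only liberties taken are 0-based indices and a default argument for `r`, both of which are choices made here rather than anything in the slides.

```python
def partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):          # j runs from p to r-1
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place (the whole list by default)."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)     # q is the position of the pivot element
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
    return a
```

On the example array 2 8 7 1 3 5 6 4 the first call to `partition` returns index 3 and leaves the array as 2 1 3 4 7 5 6 8.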

Analysis
Running time is proportional to the number of comparisons.
Each pair is compared at most once ⇒ O(n²).
In fact, for each n there is an input of size n on which quicksort takes Ω(n²) time.

But assume that the split is even in each iteration. Then

T(n) = 2T(n/2) + n

How do we solve recurrences like this? (Read Chapter 4.)

Recurrence tree
The root costs n and has two children T(n/2). Each of those costs n/2 and has two children T(n/4), and so on. The tree has log n levels.
In every level we do at most n comparisons, so the total number of comparisons is O(n log n).
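As a small sanity check on the recurrence-tree argument, the recurrence T(n) = 2T(n/2) + n can be evaluated directly. The base case T(1) = 0 is an assumption made here (the slides do not state one); with it, T(n) comes out to exactly n log₂ n for powers of two.

```python
def T(n):
    """T(n) = 2 T(n/2) + n for n a power of two, with the assumed base case T(1) = 0."""
    return 0 if n == 1 else 2 * T(n // 2) + n


# Each of the log2(n) levels contributes n, so T(2**k) equals k * 2**k.
for k in range(6):
    print(2 ** k, T(2 ** k), k * 2 ** k)
```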

Observations
We can’t guarantee good splits.
But intuitively, on random inputs we will get good splits.

Randomized quicksort
Use Randomized-partition rather than Partition.

Randomized-partition(A, p, r)
    i ← random(p, r)
    exchange A[r] ↔ A[i]
    return Partition(A, p, r)

On the same input we may get a different running time in each run! For one particular input, look at the average of all these running times.
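To see the run-to-run variation concretely, here is a sketch of randomized quicksort that also counts key comparisons. The function name, the comparison counter and the optional `rng` parameter are additions for illustration only; `random(p, r)` is taken to mean a uniform integer in p..r inclusive.

```python
import random


def randomized_quicksort(a, rng=random):
    """Sort a in place; return the number of key comparisons performed."""
    comparisons = 0

    def partition(p, r):
        nonlocal comparisons
        k = rng.randint(p, r)              # random(p, r), both ends inclusive
        a[r], a[k] = a[k], a[r]            # move the random pivot to the end
        x, i = a[r], p - 1
        for j in range(p, r):
            comparisons += 1
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        return i + 1

    def sort(p, r):
        if p < r:
            q = partition(p, r)
            sort(p, q - 1)
            sort(q + 1, r)

    sort(0, len(a) - 1)
    return comparisons
```

Calling it several times on copies of the same list generally gives different counts, always between n-1 and n(n-1)/2.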

Expected # of comparisons
Let X be the # of comparisons. This is a random variable.
We want to know E(X).

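Expected # of comparisons
Let z_1, z_2, ..., z_n be the elements in sorted order.
Let X_ij = 1 if z_i is compared to z_j and 0 otherwise. So,

\[ X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij} \]

and by linearity of expectation

\[ E(X) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E(X_{ij}) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr\{ z_i \text{ is compared to } z_j \} \]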

Consider Z_ij ≡ {z_i, z_{i+1}, ..., z_j}.
Claim: z_i and z_j are compared ⇔ either z_i or z_j is the first pivot chosen in Z_ij.
Proof: 3 cases, according to the first pivot chosen from Z_ij:
    The pivot is z_i: z_i and z_j are compared on this partition, and never again.
    The pivot is z_j: the same.
    The pivot is some z_k with i < k < j: z_i and z_j are not compared on this partition. The partition separates them, so no future partition uses both.

Pr{z_i is compared to z_j}
    = Pr{z_i or z_j is the first pivot chosen from Z_ij}    (just explained)
    = Pr{z_i is the first pivot chosen from Z_ij} + Pr{z_j is the first pivot chosen from Z_ij}    (mutually exclusive possibilities)
    = 1/(j-i+1) + 1/(j-i+1)
    = 2/(j-i+1)

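Simplify with a change of variable, k = j-i+1:

\[ E(X) = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} = \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} \]

Simplify and overestimate, by adding terms:

\[ E(X) \le \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k} = \sum_{i=1}^{n-1} O(\log n) = O(n \log n) \]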
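The double sum can also be evaluated exactly for small n, which gives a feel for the constants hidden in O(n log n). The sketch below groups the pairs (i, j) by k = j-i+1 (there are n-k+1 pairs for each k) and uses exact rational arithmetic. The function name is made up for this illustration.

```python
from fractions import Fraction


def expected_comparisons(n):
    """Exact value of the sum over all i < j of 2 / (j - i + 1)."""
    # For each k = j - i + 1 in 2..n there are n - k + 1 pairs (i, j).
    return sum(Fraction(2 * (n - k + 1), k) for k in range(2, n + 1))


for n in (2, 3, 10, 100):
    print(n, float(expected_comparisons(n)))
```

For n = 3 this gives 8/3: the first pivot costs two comparisons, and a third comparison is needed only when the pivot is the smallest or the largest element, which happens with probability 2/3.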

Lower bound for sorting in the comparison model

A lower bound
Comparison model: we assume that the only operations from which we deduce order among keys are comparisons.
Then we prove that we need Ω(n log n) comparisons in the worst case.

Model the algorithm as a decision tree

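[Figure: decision tree of insertion sort on three keys x, y, z. The root compares positions 1:2, the next level compares 2:3, and branches are labeled < and >.]

[Figure: decision tree of quicksort on three keys x, y, z. The root compares positions 1:3, inner nodes compare 2:3 and 1:2, and the leaves are permutations of x, y, z.]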

Important observations
Every comparison-based algorithm can be represented as a (binary) tree like this.
For every node v there is an input on which the algorithm reaches v.
The # of leaves is at least n!, one for each possible ordering of the input.

Important observations Each path corresponds to a run on some input The worst case # of comparisons corresponds to the longest path

The lower bound
Let d be the length of the longest path.
n! ≤ #leaves ≤ 2^d ⇒ log₂(n!) ≤ d
Since n! ≥ (n/2)^(n/2), we get d ≥ (n/2) log₂(n/2) = Ω(n log n).

Lower bound for sorting
Any sorting algorithm based on comparisons between elements requires Ω(n log n) comparisons in the worst case.
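For concrete values of n, the bound d ≥ log₂(n!) can be computed exactly with integer arithmetic. The sketch below returns the smallest integer d with 2^d ≥ n!; the function name is chosen here for illustration.

```python
import math


def min_worst_case_comparisons(n):
    """Smallest d with 2**d >= n!, i.e. ceil(log2(n!)), computed with exact integers."""
    # For an integer m >= 1, ceil(log2(m)) equals (m - 1).bit_length().
    return (math.factorial(n) - 1).bit_length()


for n in (3, 4, 5, 12):
    print(n, min_worst_case_comparisons(n))
```

For n = 3 this gives 3, so no comparison-based algorithm can sort three keys with only two comparisons in the worst case.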

Beating the lower bound
We can beat the lower bound if we can deduce order relations between keys by means other than comparisons.
Examples: count sort, radix sort.

Count sort
Assume that keys are integers between 0 and k.
Allocate a temporary array C of size k+1: cell x counts the # of keys = x.
Scan the input array A once and increment C[x] for every key x.
In the example (k = 5, n = 9) the counts end up as
C = 2 0 2 2 0 3    (for x = 0, 1, ..., 5)

Count sort
Compute prefix sums of C: cell x holds the # of keys ≤ x (rather than = x).
In the example, C = 2 0 2 2 0 3 becomes C = 2 2 4 6 6 9.

Count sort
Move items to the output array B. Scan A from the last key to the first. For each key x, C[x] gives the position of x in B (there are C[x] keys ≤ x), and C[x] is then decremented.
In the example C starts as 2 2 4 6 6 9 and B starts empty. The last key, 5, goes to the last cell of B, and C[5] drops from 9 to 8.
[Figure: step-by-step states of A, C and B while the keys are moved.]

Count sort
Complexity: O(n+k).
The sort is stable.
Note that count sort does not perform any comparison.
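Putting the three phases together (count, prefix sums, move), a count sort might look like the sketch below. The `key` parameter is an addition made here so that stability can be observed on records; indices are 0-based, so each counter is decremented before it is used as a position.

```python
def counting_sort(a, k, key=lambda x: x):
    """Return a stably sorted copy of a; key(item) must be an integer in 0..k."""
    count = [0] * (k + 1)
    for item in a:                      # cell x counts the keys equal to x
        count[key(item)] += 1
    for x in range(1, k + 1):           # prefix sums: cell x counts the keys <= x
        count[x] += count[x - 1]
    out = [None] * len(a)
    for item in reversed(a):            # right-to-left scan keeps equal keys in order
        count[key(item)] -= 1
        out[count[key(item)]] = item
    return out
```

A hypothetical usage example: `counting_sort([(1, "a"), (0, "b"), (1, "c")], 1, key=lambda t: t[0])` keeps `(1, "a")` ahead of `(1, "c")`, which is what stability means.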

Radix sort
Say we have numbers with d digits, each digit between 0 and k.
Use a stable sort (e.g. count sort) to sort by the least significant digit, then by the next digit, and so on up to the most significant digit.

Example with d = 4 (digit 1 is the least significant):

Input    After digit 1    After digit 2    After digit 3    After digit 4
2871     2871             1301             7022             1301
4591     4591             7022             1301             2472
6572     1301             3536             8394             2871
1301     6572             4844             2472             3536
2472     2472             3555             3536             3555
3555     7022             2871             3555             4591
7022     8394             6572             6572             4844
8394     4844             2472             4591             6572
4844     3555             4591             4844             7022
3536     3536             8394             2871             8394

Radix sort Complexity O(d(n+k)) if we use count sort and have d digits each between 0 and k
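A least-significant-digit radix sort built on a stable counting pass can be sketched as follows. It assumes non-negative integers with at most d digits in the given base; the function name and the `base` parameter are choices made for this illustration.

```python
def radix_sort(nums, d, base=10):
    """Return a sorted copy of nums (non-negative ints with at most d base-`base` digits)."""
    for pos in range(d):                        # least significant digit first
        div = base ** pos
        count = [0] * base
        for x in nums:
            count[(x // div) % base] += 1
        for v in range(1, base):                # prefix sums
            count[v] += count[v - 1]
        out = [0] * len(nums)
        for x in reversed(nums):                # stable move into the output array
            digit = (x // div) % base
            count[digit] -= 1
            out[count[digit]] = x
        nums = out
    return nums
```

Each of the d passes costs O(n + base), which matches the O(d(n+k)) bound stated above.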

Assume something about the input
For example, that it is random, or that it is “almost sorted”.
For such inputs we want to sort faster.

Sorting an almost sorted input
Suppose we know that the input is “almost” sorted.
Let I be the number of “inversions” in the input: the number of pairs a_i, a_j such that i < j and a_i > a_j.

Example
1, 4, 5, 8, 3    I = 3
8, 7, 5, 3, 1    I = 10
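The definition of I translates directly into a brute-force count. This quadratic sketch is only meant to check small examples such as the two above; the function name is made up here.

```python
def count_inversions(a):
    """Number of pairs (i, j) with i < j and a[i] > a[j]."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])


print(count_inversions([1, 4, 5, 8, 3]))   # pairs (4,3), (5,3), (8,3)
print(count_inversions([8, 7, 5, 3, 1]))   # every pair is inverted
```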

Insertion sort
Think of “insertion sort”.
How long does it take to insert a_k?
Time proportional to the number of inversions a_i, a_k with i < k. Let's call this number I_k.

Analysis
The running time is:

\[ \sum_{k=1}^{n} O(1 + I_k) = O(n + I) \]
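The link between insertion sort and inversions can be checked with a short sketch that counts how many elements are shifted. Inserting the k-th item shifts exactly the earlier elements that are larger than it, so the total number of shifts equals the number of inversions. The shift counter is an addition made for this illustration.

```python
def insertion_sort(a):
    """Sort a in place; return the total number of element shifts."""
    shifts = 0
    for k in range(1, len(a)):
        item = a[k]
        j = k - 1
        while j >= 0 and a[j] > item:   # each shift removes one inversion with item
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = item
    return shifts
```

On 1, 4, 5, 8, 3 this performs 3 shifts, and on 8, 7, 5, 3, 1 it performs 10.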

Thoughts
When I = Ω(n²) the running time is Ω(n²).
But we would like it to be O(n log n) for any input, and faster when I is small.

Finger red black trees

Finger tree
Take a regular search tree and reverse the direction of the pointers on the rightmost spine.
To search, we go up from the last leaf until we find the subtree containing the item, and then we descend into it.

Finger trees
Say we search for a position at distance d from the end.
Then we go up to height O(1 + log d), so the search for the d-th position takes O(1 + log d) time.
Insertions and deletions still take O(log n) worst-case time but O(1 + log d) amortized time.

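Back to sorting
Suppose we implement the insertion sort using a finger search tree.
When we insert item k, d = O(I_k + 1) and the insertion takes O(1 + log(I_k + 1)) time.
The (I_k + 1) terms sum to I + n, so by concavity of log the sum of log(I_k + 1) over all k is at most n log((I+n)/n).
Total time is therefore bounded by O(n + n log((I+n)/n)).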