2IL50 Data Structures, Fall 2015. Lecture 4: Sorting in linear time

Building a heap: one more time …

Building a heap

Build-Max-Heap(A)
  heap-size = A.length
  for i = A.length downto 1
     do Max-Heapify(A, i)
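
For reference, a minimal Python sketch of this bottom-up build (0-based indexing; the names max_heapify and build_max_heap are mine, not from the slides). Calling Max-Heapify on the leaves is a no-op, so looping all the way down from A.length is harmless:

def max_heapify(A, i, heap_size):
    # Sift A[i] down until the subtree rooted at i is a max-heap.
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < heap_size and A[left] > A[largest]:
        largest = left
    if right < heap_size and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, largest, heap_size)

def build_max_heap(A):
    # Mirrors the slide: for i = A.length downto 1 do Max-Heapify(A, i).
    for i in range(len(A) - 1, -1, -1):
        max_heapify(A, i, len(A))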

Building a heap

Build-Max-Heap2(A)
  heap-size[A] = 1
  for i = 2 to A.length
     do Max-Heap-Insert(A, A[i])

Max-Heap-Insert(A, key)
  heap-size[A] = heap-size[A] + 1
  A[heap-size[A]] = -∞
  Increase-Key(A, heap-size[A], key)

Lower bound on the worst case running time?

[heap figure: A[i] moves up until it reaches its correct position]

Building a heap

Build-Max-Heap2(A)
  heap-size[A] = 1
  for i = 2 to A.length
     do Max-Heap-Insert(A, A[i])

Running time: Θ(1) + ∑_{2≤i≤n} (time for Max-Heap-Insert(A, A[i]))

Max-Heap-Insert(A, A[i]) takes O(log i) = O(log n) time
➨ worst case running time is O(n log n)

If A is sorted in increasing order, then A[i] is always the largest element when Max-Heap-Insert(A, A[i]) is called and must move all the way up the tree
➨ Max-Heap-Insert(A, A[i]) takes Ω(log i) time.

Worst case running time:
  Θ(1) + ∑_{2≤i≤n} Ω(log i) = Ω(1 + ∑_{2≤i≤n} log i) = Ω(n log n)
since ∑_{2≤i≤n} log i ≥ ∑_{n/2≤i≤n} log(n/2) = (n/2) log(n/2)
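
A matching Python sketch of the insertion-based build (again my own names, 0-based indexing); on an increasing input every inserted key climbs all the way to the root, which is exactly the Ω(n log n) worst case derived above:

def max_heap_insert(A, heap_size, key):
    # Place key in the first free slot and sift it up past smaller parents.
    A[heap_size] = key
    i = heap_size
    while i > 0 and A[(i - 1) // 2] < A[i]:
        A[i], A[(i - 1) // 2] = A[(i - 1) // 2], A[i]
        i = (i - 1) // 2

def build_max_heap2(A):
    # Grow the heap A[0..i-1] by inserting A[i], for i = 1, ..., n-1.
    for i in range(1, len(A)):
        max_heap_insert(A, i, A[i])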

Quiz
  log² n = Θ(log n²)?   no   (log n² = 2 log n, which grows slower than (log n)²)
  √n = Ω(log⁴ n)?       yes  (every polynomial power of n outgrows every power of log n)
  2^lg n = Ω(n²)?       no   (2^lg n = n, and n is not Ω(n²))

Sorting in linear time

The sorting problem

Input: a sequence of n numbers ‹a_1, a_2, …, a_n›
Output: a permutation ‹a_{i1}, …, a_{in}› of the input such that a_{i1} ≤ … ≤ a_{in}

Why do we care so much about sorting?
  sorting is used by many applications
  sorting is a (first) step of many algorithms
  many techniques can be illustrated by studying sorting

Can we sort faster than Θ(n log n)?

Worst case running time of sorting algorithms:
  InsertionSort: Θ(n²)
  MergeSort: Θ(n log n)
  HeapSort: Θ(n log n)

Can we do this faster? Θ(n loglog n)? Θ(n)?

Upper and lower bounds

Upper bound: how do you show that a problem (for example sorting) can be solved in Θ(f(n)) time?
➨ give an algorithm that solves the problem in Θ(f(n)) time.

Lower bound: how do you show that a problem (for example sorting) cannot be solved faster than Θ(f(n)) time?
➨ prove that every possible algorithm that solves the problem needs Ω(f(n)) time.

Lower bounds

Lower bound: how do you show that a problem (for example sorting) cannot be solved faster than Θ(f(n)) time?
➨ prove that every possible algorithm that solves the problem needs Ω(f(n)) time.

Model of computation: which operations is the algorithm allowed to use?
  bit manipulations?
  random access (array indexing) vs. pointer machines?

Comparison-based sorting

InsertionSort(A)
  for j = 2 to A.length
     do key = A[j]
        i = j - 1
        while i > 0 and A[i] > key
           do A[i+1] = A[i]
              i = i - 1
        A[i+1] = key

Which steps precisely the algorithm executes, and hence which element ends up where, depends only on the results of comparisons between the input elements.
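
To make this concrete, here is a small Python illustration (my own, not from the slides): every decision the sort makes is funneled through a single comparison function, so counting its calls captures everything the algorithm ever learns about the input.

def insertion_sort(A, less):
    # The only way this code inspects the elements is through less().
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and less(key, A[i]):   # A[i] > key in the pseudocode
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key

comparisons = 0
def counting_less(x, y):
    global comparisons
    comparisons += 1
    return x < y

data = [5, 2, 4, 6, 1, 3]
insertion_sort(data, counting_less)
print(data, comparisons)   # the sorted list, plus the number of comparisons made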

Decision tree for comparison-based sorting

[decision-tree figure: every internal node is a comparison A[·] < A[·] (or ≤, =, >, ≥); between comparisons the algorithm may exchange elements, do assignments, etc.; every leaf corresponds to one possible output]

Proving a comparison-based lower bound

Proving a lower bound of f(n) comparisons, by contradiction:
  assume an algorithm with worst case f(n) – 1 comparisons
  show two different inputs with the same comparison results
  ➨ both inputs follow the same path in the decision tree
  ➨ the algorithm cannot be correct

Easy approach (bound the height of the decision tree):
  count the number of different inputs (requiring different outputs)
  every different input must correspond to a distinct leaf

Hard approach:
  maintain the set of possible inputs corresponding to the comparisons made so far
  show that at least two inputs remain after f(n) – 1 comparisons
  (we cannot choose the comparisons, but we can choose their results)

Comparison-based sorting

every permutation of the input follows a different path in the decision tree
➨ the decision tree has at least n! leaves

the height of a binary tree with n! leaves is at least log(n!)

worst case running time
  ≥ longest path from root to leaf
  ≥ log(n!) = Ω(n log n)
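
The last step can be verified without Stirling's formula, using the same splitting trick as in the heap lower bound earlier:

\[
\log(n!) \;=\; \sum_{i=2}^{n} \log i \;\ge\; \sum_{i=\lceil n/2\rceil}^{n} \log i \;\ge\; \frac{n}{2}\,\log\frac{n}{2} \;=\; \Omega(n \log n).
\]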

Lower bound for comparison-based sorting

Theorem: Any comparison-based sorting algorithm requires Ω(n log n) comparisons in the worst case.

➨ The worst case running time of MergeSort and HeapSort is optimal.

Sorting in linear time …

Three algorithms that are faster:
  CountingSort
  RadixSort
  BucketSort

(not comparison-based; they make assumptions on the input)

CountingSort

Input: array A[1..n] of numbers
Assumption: the input elements are integers in the range 0 to k, for some k

Main idea: count for every A[i] the number of elements less than A[i]
➨ this gives the position of A[i] in the output array

Beware of elements that have the same value!

position(i) = ( number of elements less than A[i] in A[1..n] )
            + ( number of elements equal to A[i] in A[1..i] )

CountingSort

position(i) = ( number of elements less than A[i] in A[1..n] )
            + ( number of elements equal to A[i] in A[1..i] )

Example:
  input:   5  3  10  5  4  5  7  7  9  3  10  8  5  3  3  8
  output:  3  3  3  3  4  5  5  5  5  7  7  8  8  9  10  10

The third 5 from the left in the input goes to position (# numbers < 5) + 3 of the output.

CountingSort

position(i) = ( number of elements less than A[i] in A[1..n] )
            + ( number of elements equal to A[i] in A[1..i] )

Lemma: If every element A[i] is placed at position(i), then the array is sorted and the sorted order is stable.

(stable: numbers with the same value appear in the same order in the output array as they do in the input array)

CountingSort

C[i] will contain the number of elements ≤ i

CountingSort(A, k)
► Input: array A[1..n] of integers in the range 0..k
► Output: array B[1..n] which contains the elements of A, sorted
1  for i = 0 to k do C[i] = 0
2  for j = 1 to A.length do C[A[j]] = C[A[j]] + 1
3  ► C[i] now contains the number of elements equal to i
4  for i = 1 to k do C[i] = C[i] + C[i-1]
5  ► C[i] now contains the number of elements less than or equal to i
6  for j = A.length downto 1
7     do B[C[A[j]]] = A[j]; C[A[j]] = C[A[j]] – 1

CountingSort

(pseudocode as on the previous slide)

Correctness of lines 6/7: Invariant
  Inv(j): for j+1 ≤ i ≤ n: B[position(i)] contains A[i]
          for 0 ≤ i ≤ k: C[i] = ( # numbers smaller than i ) + ( # numbers equal to i in A[1..j] )

Inv(j) holds before the loop body is executed; Inv(j – 1) holds afterwards.

CountingSort: running time

(pseudocode as above)

line 1:    ∑_{0≤i≤k} Θ(1) = Θ(k)
line 2:    ∑_{1≤j≤n} Θ(1) = Θ(n)
line 4:    ∑_{1≤i≤k} Θ(1) = Θ(k)
lines 6/7: ∑_{1≤j≤n} Θ(1) = Θ(n)

Total: Θ(n + k) ➨ Θ(n) if k = O(n)

CountingSort

Theorem: CountingSort is a stable sorting algorithm that sorts an array of n integers in the range 0..k in Θ(n + k) time.
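
A direct Python transcription of the pseudocode, as a minimal sketch (0-based arrays, so the decrement happens just before the placement):

def counting_sort(A, k):
    """Stably sort a list of integers in the range 0..k in Θ(n + k) time."""
    C = [0] * (k + 1)
    for x in A:                   # line 2: C[x] = # elements equal to x
        C[x] += 1
    for i in range(1, k + 1):     # line 4: C[x] = # elements <= x
        C[i] += C[i - 1]
    B = [None] * len(A)
    for x in reversed(A):         # lines 6/7: the backwards pass makes it stable
        C[x] -= 1                 # 0-based position for this occurrence of x
        B[C[x]] = x
    return B

print(counting_sort([5, 3, 10, 5, 4, 5, 7, 7, 9, 3, 10, 8, 5, 3, 3, 8], 10))
# [3, 3, 3, 3, 4, 5, 5, 5, 5, 7, 7, 8, 8, 9, 10, 10], the example from before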

RadixSort

Input: array A[1..n] of numbers
Assumption: the input elements are integers with d digits
  example (d = 4): 3288, 1193, 9999, 0654, 7243, 4321

RadixSort(A, d)
  for i = 1 to d
     do use a stable sort to sort array A on digit i

(digit 1 is the least significant digit, digit d the most significant)

RadixSort: example

  329     720     720     329
  457     355     329     355
  657     436     436     436
  839     457     839     457
  436     657     355     657
  720     329     457     720
  355     839     657     839

      sort on    sort on    sort on
     1st digit  2nd digit  3rd digit

Correctness: Assignment 3

RadixSort

Running time: if we use CountingSort as the stable sorting algorithm
➨ Θ(n + k) per digit (each digit is an integer in the range 0..k)

Theorem: Given n d-digit numbers in which each digit can take up to k possible values, RadixSort correctly sorts these numbers in Θ(d (n + k)) time.
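
A minimal Python sketch of RadixSort for non-negative base-10 integers, with a counting sort on each digit as the stable subroutine (the helper names are mine):

def radix_sort(A, d):
    """Sort non-negative integers of at most d decimal digits in Θ(d(n + 10)) time."""
    for e in range(d):                    # e = 0 is the least significant digit
        digit = lambda x: (x // 10 ** e) % 10
        C = [0] * 10
        for x in A:
            C[digit(x)] += 1
        for i in range(1, 10):
            C[i] += C[i - 1]
        B = [None] * len(A)
        for x in reversed(A):             # stable placement, as in CountingSort
            C[digit(x)] -= 1
            B[C[digit(x)]] = x
        A = B
    return A

print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
# [329, 355, 436, 457, 657, 720, 839], the example above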

BucketSort

Input: array A[1..n] of numbers
Assumption: the input elements lie in the interval [0..1) (no integers!)

BucketSort is fast if the elements are uniformly distributed in [0..1)

BucketSort

Throw the input elements into "buckets", sort the buckets, concatenate …

input array A[1..n]; numbers in [0..1)
auxiliary array B[0..n-1]; bucket B[i] contains the numbers in [i/n .. (i+1)/n)

[figure: A = 0.792, 0.1, 0.287, 0.15, 0.346, 0.734, 0.5, 0.13, 0.256, 0.53 is distributed over the buckets: B[1] = 0.1, 0.15, 0.13; B[2] = 0.287, 0.256; B[3] = 0.346; B[5] = 0.5, 0.53; B[7] = 0.792, 0.734]

BucketSort

[figure, continued: each bucket is sorted (B[1] becomes 0.1, 0.13, 0.15; B[2] becomes 0.256, 0.287; …) and the buckets are concatenated in order, giving 0.1, 0.13, 0.15, 0.256, 0.287, 0.346, 0.5, 0.53, 0.734, 0.792]

BucketSort

BucketSort(A)
► Input: array A[1..n] of numbers with 0 ≤ A[i] < 1
► Output: sorted list, which contains the elements of A
  n = A.length
  initialize auxiliary array B[0..n-1]; each B[i] is a linked list of numbers
  for i = 1 to n
     do insert A[i] into list B[⌊n∙A[i]⌋]
  for i = 0 to n-1
     do sort list B[i], for example with InsertionSort
  concatenate the lists B[0], B[1], …, B[n-1] together in order
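
A minimal Python sketch of the pseudocode (Python lists stand in for the linked lists; the per-bucket sort is an inline insertion sort, matching the Θ(nᵢ²) per-bucket cost assumed in the analysis on the next slides):

def bucket_sort(A):
    """Sort numbers in [0, 1); expected Θ(n) time for uniformly distributed input."""
    n = len(A)
    B = [[] for _ in range(n)]            # bucket B[i] holds numbers in [i/n, (i+1)/n)
    for x in A:
        B[int(n * x)].append(x)
    out = []
    for bucket in B:
        for j in range(1, len(bucket)):   # insertion sort within the bucket
            key = bucket[j]
            i = j - 1
            while i >= 0 and bucket[i] > key:
                bucket[i + 1] = bucket[i]
                i -= 1
            bucket[i + 1] = key
        out.extend(bucket)                # concatenate the buckets in order
    return out

print(bucket_sort([0.792, 0.1, 0.287, 0.15, 0.346, 0.734, 0.5, 0.13, 0.256, 0.53]))
# [0.1, 0.13, 0.15, 0.256, 0.287, 0.346, 0.5, 0.53, 0.734, 0.792]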

BucketSort

Running time? Define nᵢ = number of elements in bucket B[i]
➨ running time = Θ(n) + ∑_{0≤i≤n-1} Θ(nᵢ²)

worst case: all numbers fall into the same bucket ➨ Θ(n²)
best case: all numbers fall into different buckets ➨ Θ(n)
expected running time if the numbers are randomly distributed?

BucketSort: expected running time

Define nᵢ = number of elements in bucket B[i]
➨ running time = Θ(n) + ∑_{0≤i≤n-1} Θ(nᵢ²)

Assumption: Pr { A[j] falls in bucket B[i] } = 1/n for each i

E[ running time ] = E[ Θ(n) + ∑_{0≤i≤n-1} Θ(nᵢ²) ] = Θ( n + ∑_{0≤i≤n-1} E[nᵢ²] )

What is E[nᵢ²]? We have E[nᵢ] = 1 … but E[nᵢ²] ≠ E[nᵢ]²
(some math with indicator random variables – see the book for details)
➨ E[nᵢ²] = 2 – 1/n = Θ(1)
➨ expected running time = Θ(n)
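
The indicator-variable computation the slide alludes to, written out; here X_{ij} is my notation for the indicator of the event that A[j] falls in bucket B[i]:

\[
n_i = \sum_{j=1}^{n} X_{ij}, \qquad
E[n_i^2] = \sum_{j=1}^{n} E[X_{ij}^2] + \sum_{j \neq k} E[X_{ij}X_{ik}]
         = n \cdot \tfrac{1}{n} + n(n-1) \cdot \tfrac{1}{n^2}
         = 2 - \tfrac{1}{n},
\]

using E[X_{ij}²] = E[X_{ij}] = 1/n and, since distinct elements land in buckets independently, E[X_{ij}X_{ik}] = 1/n² for j ≠ k.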

Linear time sorting

Sorting in linear time: only if the assumptions hold!

CountingSort
  Assumption: input elements are integers in the range 0 to k
  Running time: Θ(n + k) ➨ Θ(n) if k = O(n)

RadixSort
  Assumption: input elements are integers with d digits
  Running time: Θ(d (n + k)); can be Θ(n) for bounded integers with a good choice of base

BucketSort
  Assumption: input elements lie in the interval [0..1)
  Running time: expected Θ(n) for uniform input, not for the worst case!

Announcements

Today 7+8: nothing …
Wednesday 3+4: next data structures lecture
Thursday 17:00 – 18:00: office hours