© 2007 Pearson Education, Inc. All rights reserved.

16 Sorting: A Deeper Look
With sobs and tears he sorted out
Those of the largest size …
— Lewis Carroll

'Tis in my memory lock'd,
And you yourself shall keep the key of it.
— William Shakespeare

It is an immutable law in business that words are words, explanations are explanations, promises are promises — but only performance is reality.
— Harold S. Geneen
OBJECTIVES

In this chapter you will learn:
– To sort an array using the selection sort algorithm.
– To sort an array using the insertion sort algorithm.
– To sort an array using the recursive merge sort algorithm.
– To determine the efficiency of searching and sorting algorithms and express it in "Big O" notation.
– To explore (in the chapter exercises) additional recursive sorts, including quicksort and a recursive version of selection sort.
– To explore (in the chapter exercises) the bucket sort, which achieves very high performance, but by using considerably more memory than the other sorts we have studied; an example of the so-called "space–time trade-off."

16.1 Introduction
16.2 Big O Notation
16.3 Selection Sort
16.4 Insertion Sort
16.5 Merge Sort
16.1 Introduction

Sorting data
– Place data in order, typically ascending or descending, based on one or more sort keys.
– Algorithms:
    Insertion sort
    Selection sort
    Merge sort (more efficient, but more complex)

16.1 Introduction (Cont.)

Big O notation
– Estimates worst-case runtime for an algorithm, i.e., how hard an algorithm must work to solve a problem.

16.2 Big O Notation

Big O notation
– Measures runtime growth of an algorithm relative to the number of items processed.
– Highlights dominant terms.
– Ignores terms that become unimportant as n grows.
– Ignores constant factors.
16.2 Big O Notation (Cont.)

Constant runtime
– The number of operations performed by the algorithm is constant; it does not grow as the number of items increases.
– Represented in Big O notation as O(1), pronounced "on the order of 1" or "order 1".
– Example: test whether the first element of an n-element array is equal to the second element. This always takes one comparison, no matter how large the array.
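The O(1) example above can be sketched in C. This is an illustrative sketch, not code from the book's figures; the function name `first_two_equal` is my own:

```c
#include <stddef.h>

/* O(1) check: compare the first two elements of an n-element array.
   Exactly one comparison is performed, no matter how large n is. */
int first_two_equal(const int a[], size_t n)
{
    if (n < 2)
        return 0;            /* fewer than two elements: trivially not equal */
    return a[0] == a[1];     /* always one comparison */
}
```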
16.2 Big O Notation (Cont.)

Linear runtime
– The number of operations performed by the algorithm grows linearly with the number of items.
– Represented in Big O notation as O(n), pronounced "on the order of n" or "order n".
– Example: test whether the first element of an n-element array is equal to any other element. This takes n − 1 comparisons; the n term dominates, and the −1 is ignored.
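The O(n) example can likewise be sketched in C (again an illustrative sketch with a name of my own choosing, not from the book):

```c
#include <stddef.h>

/* O(n) check: is a[0] equal to any other element?
   The worst case (no match) performs n - 1 comparisons. */
int first_equals_any_other(const int a[], size_t n)
{
    for (size_t i = 1; i < n; ++i)   /* up to n - 1 comparisons */
        if (a[0] == a[i])
            return 1;
    return 0;
}
```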
16.2 Big O Notation (Cont.)

Quadratic runtime
– The number of operations performed by the algorithm grows as the square of the number of items.
– Represented in Big O notation as O(n²), pronounced "on the order of n²" or "order n²".
– Example: test whether any element of an n-element array is equal to any other element. This takes n²/2 − n/2 comparisons; the n² term dominates, the constant 1/2 is ignored, and the −n/2 is ignored.
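And the O(n²) example, as a sketch in the same style (illustrative, not from the book's figures):

```c
#include <stddef.h>

/* O(n^2) check: does any element equal any other element?
   Comparing every pair once takes n*(n-1)/2 = n^2/2 - n/2 comparisons. */
int has_duplicate(const int a[], size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)   /* each pair exactly once */
            if (a[i] == a[j])
                return 1;
    return 0;
}
```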
Polynomial-Time

Brute force. For many non-trivial problems, there is a natural brute-force search algorithm that checks every possible solution.
– Typically takes 2ᴺ time or worse for inputs of size N (n! for stable matching with n men and n women).
– Unacceptable in practice.

Desirable scaling property. When the input size doubles, the algorithm should slow down by only some constant factor C.

Def. An algorithm is poly-time if the above scaling property holds: there exist constants c > 0 and d > 0 such that on every input of size N, its running time is bounded by c·Nᵈ steps (choose C = 2ᵈ).

Worst-Case Analysis

Worst-case running time. Obtain a bound on the largest possible running time of the algorithm on any input of a given size N.
– Generally captures efficiency in practice.
– A draconian view, but it is hard to find an effective alternative.

Average-case running time. Obtain a bound on the running time of the algorithm on a random input, as a function of the input size N.
– Hard (or impossible) to accurately model real instances by random distributions.
– An algorithm tuned for a certain distribution may perform poorly on other inputs.

Worst-Case Polynomial-Time

Def. An algorithm is efficient if its running time is polynomial.

Justification: it really works in practice!
– Although 6.02 × 10²³ · N²⁰ is technically poly-time, it would be useless in practice.
– In practice, the poly-time algorithms that people develop almost always have low constants and low exponents.
– Breaking through the exponential barrier of brute force typically exposes some crucial structure of the problem.

Exceptions.
– Some poly-time algorithms do have high constants and/or exponents, and are useless in practice.
– Some exponential-time (or worse) algorithms are widely used because the worst-case instances seem to be rare (e.g., the simplex method, Unix grep).
Why It Matters

2.2 Asymptotic Order of Growth

Complexity Measures

Problem size: n.
– Worst-case complexity: the maximum number of steps the algorithm takes on any input of size n.
– Best-case complexity: the minimum number of steps the algorithm takes on any input of size n.
– Average-case complexity: the average number of steps the algorithm takes on inputs of size n.

Best case: unrealistic. Average case: over what probability distribution? The analysis is often hard. Worst case: a fast algorithm comes with a comforting guarantee, though it may be too pessimistic.
Asymptotic Order of Growth

Upper bounds. T(n) is O(f(n)) if there exist constants c > 0 and n₀ ≥ 0 such that for all n ≥ n₀ we have T(n) ≤ c·f(n).
Lower bounds. T(n) is Ω(f(n)) if there exist constants c > 0 and n₀ ≥ 0 such that for all n ≥ n₀ we have T(n) ≥ c·f(n).
Tight bounds. T(n) is Θ(f(n)) if T(n) is both O(f(n)) and Ω(f(n)).

Ex: T(n) = 32n² + 17n + 32.
– T(n) is O(n²), O(n³), Ω(n²), Ω(n), and Θ(n²).
– T(n) is not O(n), Ω(n³), Θ(n), or Θ(n³).

Asymptotic Order of Growth in Words

A way of comparing functions that ignores constant factors and small input sizes:
– O(g(n)): the class of functions f(n) that grow no faster than g(n).
– Θ(g(n)): the class of functions f(n) that grow at the same rate as g(n).
– Ω(g(n)): the class of functions f(n) that grow at least as fast as g(n).
Notation

Slight abuse of notation: T(n) = O(f(n)).
– Asymmetric: f(n) = 5n³; g(n) = 3n²; then f(n) = O(n³) = g(n), but f(n) ≠ g(n).
– Better notation: T(n) ∈ O(f(n)).

Meaningless statement: "Any comparison-based sorting algorithm requires at least O(n log n) comparisons."
– The statement doesn't "type-check."
– Use Ω for lower bounds.

Properties

Transitivity.
– If f = O(g) and g = O(h), then f = O(h).
– If f = Ω(g) and g = Ω(h), then f = Ω(h).
– If f = Θ(g) and g = Θ(h), then f = Θ(h).

Additivity.
– If f = O(h) and g = O(h), then f + g = O(h).
– If f = Ω(h) and g = Ω(h), then f + g = Ω(h).
– If f = Θ(h) and g = O(h), then f + g = Θ(h).

Asymptotic Bounds for Some Common Functions

Polynomials. a₀ + a₁n + … + a_d·nᵈ is Θ(nᵈ) if a_d > 0.
Polynomial time. Running time is O(nᵈ) for some constant d independent of the input size n.
Logarithms. O(log_a n) = O(log_b n) for any constants a, b > 0, so we can avoid specifying the base.
Logarithms. For every x > 0, log n = O(nˣ): log grows slower than every polynomial.
Exponentials. For every r > 1 and every d > 0, nᵈ = O(rⁿ): every exponential grows faster than every polynomial.
More Examples

10n² − 16n + 100 is O(n²) (also O(n³)):
    10n² − 16n + 100 ≤ 11n² for all n ≥ 10.
10n² − 16n + 100 is Ω(n²) (also Ω(n)):
    10n² − 16n + 100 ≥ 9n² for all n ≥ 16.
Therefore 10n² − 16n + 100 is also Θ(n²).
10n² − 16n + 100 is not O(n) and not Ω(n³).
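The witness constants above can be spot-checked numerically. This sketch (my own helper, `bounds_hold`, not part of the slides) just brute-force verifies the two inequalities over a range of n:

```c
/* Spot-check the witness constants for f(n) = 10n^2 - 16n + 100:
   f(n) <= 11n^2 for n >= 10, and f(n) >= 9n^2 for n >= 16.
   Returns 1 if both inequalities hold throughout [from, to]. */
int bounds_hold(long from, long to)
{
    for (long n = from; n <= to; ++n) {
        long f = 10*n*n - 16*n + 100;
        if (n >= 10 && !(f <= 11*n*n))
            return 0;   /* upper-bound witness fails */
        if (n >= 16 && !(f >= 9*n*n))
            return 0;   /* lower-bound witness fails */
    }
    return 1;
}
```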
Time Efficiency of Nonrecursive Algorithms

General plan for analysis:
– Decide on a parameter n indicating input size.
– Identify the algorithm's basic operation.
– Determine the worst, average, and best cases for input of size n.
– Set up a sum for the number of times the basic operation is executed.
– Simplify the sum using standard formulas and rules.

Useful Summation Formulas and Rules

Σ_{i=l..u} 1 = 1 + 1 + … + 1 = u − l + 1; in particular, Σ_{i=1..n} 1 = n ∈ Θ(n)
Σ_{i=1..n} i = 1 + 2 + … + n = n(n+1)/2 ≈ n²/2 ∈ Θ(n²)
Σ_{i=1..n} i² = 1² + 2² + … + n² = n(n+1)(2n+1)/6 ≈ n³/3 ∈ Θ(n³)
Σ_{i=0..n} aⁱ = 1 + a + … + aⁿ = (aⁿ⁺¹ − 1)/(a − 1) for any a ≠ 1; in particular, Σ_{i=0..n} 2ⁱ = 2⁰ + 2¹ + … + 2ⁿ = 2ⁿ⁺¹ − 1 ∈ Θ(2ⁿ)
Σ(aᵢ ± bᵢ) = Σaᵢ ± Σbᵢ;   Σ c·aᵢ = c·Σaᵢ;   Σ_{i=l..u} aᵢ = Σ_{i=l..m} aᵢ + Σ_{i=m+1..u} aᵢ
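The closed forms above are easy to spot-check for small n. This sketch (my own helper, `sums_match`, not from the slides) compares each sum computed by a loop against its formula:

```c
/* Spot-check three closed forms for a given n:
   sum of i, sum of i^2, and sum of 2^i. Returns 1 if all match. */
int sums_match(long n)
{
    long s1 = 0, s2 = 0, p = 0;
    for (long i = 1; i <= n; ++i) {
        s1 += i;          /* 1 + 2 + ... + n       */
        s2 += i * i;      /* 1^2 + 2^2 + ... + n^2 */
    }
    for (long i = 0; i <= n; ++i)
        p += 1L << i;     /* 2^0 + 2^1 + ... + 2^n */
    return s1 == n * (n + 1) / 2
        && s2 == n * (n + 1) * (2 * n + 1) / 6
        && p  == (1L << (n + 1)) - 1;
}
```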
Example Problem

Problem: given N integers stored in an array X (int X[N]), find the sum of the numbers. How can you design an algorithm for this problem?
– Iterative (non-recursive) solution: use a for or while loop and add the numbers one by one.
– Recursive solution: a solution that calls itself on smaller problems to solve the big problem.

Finding the Sum of a Set of Numbers: Non-Recursive Algorithm

    int X[N];
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum = sum + X[i];

Finding the Sum of a Set of Numbers: Recursive Algorithm

    int sum(int A[], int N)
    {
        if (N == 0)
            return 0;                        /* stopping rule */
        else
            return sum(A, N - 1) + A[N - 1]; /* key step */
    } /* end-sum */

Why recursion? It can simplify the code drastically.
Analyzing Running Time

RT: the amount of time it takes for the algorithm to finish execution on a particular input size. More precisely, the RT of an algorithm on a particular input is the number of primitive operations or steps executed. We define a step to be a unit of work that can be executed in a constant amount of time on a machine.

Finding the Sum of a Set of Numbers: Iterative Algorithm and Its Analysis

Assume int X[N] is our data set.

                                        Cost    Times
    sum = 0;                            C0      1
    for (int i = 0; i < N; i++)         C1      N
        sum = sum + X[i];               C2      N

T(n) = C0 + C1·N + C2·N. Since C0, C1, and C2 are constants, T(n) can be expressed as a linear function of n, i.e., T(n) = a + b·n for some constants a and b.
Another Example: Searching for a Number in an Array of Numbers

Assume int X[N] is our data set and we are searching for "key".

                                        Cost    Times
    int found = 0;                      C0      1
    int i = 0;                          C1      1
    while (!found && i < N) {           C2      1 ≤ L ≤ N
        if (key == X[i]) found = 1;     C3      1 ≤ L ≤ N
        i++;                            C4      1 ≤ L ≤ N
    }

T(n) = C0 + C1 + L·(C2 + C3 + C4), where 1 ≤ L ≤ N is the number of times the loop iterates.
Example 2: Searching for a Number in an Array of Numbers (Continued)

What's the best case? The loop iterates just once, so
– T(n) = C0 + C1 + C2 + C3 + C4.
What's the average (expected) case? The loop iterates N/2 times, so
– T(n) = C0 + C1 + N/2·(C2 + C3 + C4).
– Notice that this can be written as T(n) = a + b·n, where a and b are constants.
What's the worst case? The loop iterates N times, so
– T(n) = C0 + C1 + N·(C2 + C3 + C4).
– Notice that this can also be written as T(n) = a + b·n, where a and b are constants.

Worst-Case Analysis of Algorithms

We will only look at the WORST-CASE running time of an algorithm. Why?
– The worst case is an upper bound on the running time. It gives us a guarantee that the algorithm will never take any longer.
– For some algorithms, the worst case happens fairly often. As in this search example, the searched item is typically not in the array, so the loop will iterate N times.
– The average case is often roughly as bad as the worst case. In our search algorithm, both the average case and the worst case are linear functions of the input size n.

Asymptotic Notation

We will study the asymptotic efficiency of algorithms.
– To do so, we look at input sizes large enough to make only the order of growth of the running time relevant.
– That is, we are concerned with how the running time of an algorithm increases with the size of the input in the limit, as the size of the input increases without bound.
– Usually an algorithm that is asymptotically more efficient will be the best choice for all but very small inputs.

Three asymptotic notations: Big O, Ω, and Θ.
Big-Oh Notation: Asymptotic Upper Bound

T(n) = f(n) = O(g(n)) if f(n) ≤ c·g(n) for all n ≥ n₀, where c and n₀ are constants > 0.

Example: T(n) = 2n + 5 is O(n). Why?
    2n + 5 ≤ 3n for all n ≥ 5.
T(n) = 5n² + 3n + 15 is O(n²). Why?
    5n² + 3n + 15 ≤ 6n² for all n ≥ 6.

Ω Notation: Asymptotic Lower Bound

T(n) = f(n) = Ω(g(n)) if f(n) ≥ c·g(n) for all n ≥ n₀, where c and n₀ are constants > 0.

Example: T(n) = 2n + 5 is Ω(n). Why?
    2n + 5 ≥ 2n for all n > 0.
T(n) = 5n² − 3n is Ω(n²). Why?
    5n² − 3n ≥ 4n² for all n ≥ 4.

Θ Notation: Asymptotic Tight Bound

T(n) = f(n) = Θ(g(n)) if c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n₀, where c1, c2, and n₀ are constants > 0.

Example: T(n) = 2n + 5 is Θ(n). Why?
    2n ≤ 2n + 5 ≤ 3n for all n ≥ 5.
T(n) = 5n² − 3n is Θ(n²). Why?
    4n² ≤ 5n² − 3n ≤ 5n² for all n ≥ 4.
Big-Oh, Theta, Omega

Tips to guide your intuition:
– Think of O(f(N)) as "less than or equal to" f(N). Upper bound: "grows slower than or at the same rate as" f(N).
– Think of Ω(f(N)) as "greater than or equal to" f(N). Lower bound: "grows faster than or at the same rate as" f(N).
– Think of Θ(f(N)) as "equal to" f(N). "Tight" bound: the same growth rate.
(True for large N and ignoring constant factors.)

Common Functions We Will Encounter (in increasing cost)

Name          Big-Oh          Comment
Constant      O(1)            Can't beat it!
Log log       O(log log N)    Extrapolation search
Logarithmic   O(log N)        Typical time for good searching algorithms
Linear        O(N)            About the fastest an algorithm can run, given that we need O(N) just to read the input
N log N       O(N log N)      Most sorting algorithms
Quadratic     O(N²)           Acceptable when the data size is small (N < 1000)
Cubic         O(N³)           Acceptable when the data size is small (N < 1000)
Exponential   O(2ᴺ)           Only good for really small input sizes (N ≤ 20)

Everything up through the cubic row is polynomial time.
Time and Space Trade-offs

It turns out that in most algorithm design there is a trade-off between time and space:
– To make an algorithm faster, you might have to use more space.
– Trade space away (use less space), and the algorithm will run slower.

2.4 A Survey of Common Running Times

Linear Time: O(n)

Linear time. The running time is at most a constant factor times the size of the input.

Computing the maximum. Compute the maximum of n numbers a₁, …, aₙ.

    max ← a₁
    for i = 2 to n {
        if (aᵢ > max)
            max ← aᵢ
    }
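The maximum-of-n-numbers pseudocode above translates directly to C; this is a sketch (`array_max` is my own name) that assumes n ≥ 1:

```c
#include <stddef.h>

/* Maximum of n numbers: one pass, n - 1 comparisons, so O(n).
   Assumes n >= 1 (a maximum of zero numbers is undefined). */
int array_max(const int a[], size_t n)
{
    int max = a[0];                 /* max <- a1 */
    for (size_t i = 1; i < n; ++i)  /* for i = 2 to n */
        if (a[i] > max)
            max = a[i];
    return max;
}
```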
Linear Time: O(n)

Merge. Combine two sorted lists A = a₁, a₂, …, aₙ and B = b₁, b₂, …, bₙ into one sorted whole.

    i = 1, j = 1
    while (both lists are nonempty) {
        if (aᵢ ≤ bⱼ) append aᵢ to output list and increment i
        else         append bⱼ to output list and increment j
    }
    append remainder of nonempty list to output list

Claim. Merging two lists of size n takes O(n) time.
Pf. After each comparison, the length of the output list increases by 1.
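The merge pseudocode above can be sketched in C. This version (my own `merge_sorted`, not from the book's figures) allows the two inputs to have different lengths; `out` must have room for n + m elements:

```c
#include <stddef.h>

/* Merge sorted a[0..n-1] and sorted b[0..m-1] into out[0..n+m-1].
   Each comparison appends exactly one element, so the total work
   is O(n + m). */
void merge_sorted(const int a[], size_t n,
                  const int b[], size_t m, int out[])
{
    size_t i = 0, j = 0, k = 0;
    while (i < n && j < m)                       /* both lists nonempty */
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < n) out[k++] = a[i++];             /* remainder of a */
    while (j < m) out[k++] = b[j++];             /* remainder of b */
}
```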
O(n log n) Time

O(n log n) time (also referred to as linearithmic time) arises in divide-and-conquer algorithms.

Sorting. Mergesort and heapsort are sorting algorithms that perform O(n log n) comparisons.

Largest empty interval. Given n time-stamps x₁, …, xₙ at which copies of a file arrive at a server, what is the largest interval of time during which no copies of the file arrive?

O(n log n) solution. Sort the time-stamps, then scan the sorted list in order, identifying the maximum gap between successive time-stamps.
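The largest-empty-interval solution can be sketched with the standard library's `qsort` for the O(n log n) sort, followed by the O(n) scan. `largest_gap` is my own name; the function sorts its input in place:

```c
#include <stdlib.h>

/* Comparator for qsort over doubles. */
static int cmp_double(const void *p, const void *q)
{
    double a = *(const double *)p, b = *(const double *)q;
    return (a > b) - (a < b);
}

/* Sort the n time-stamps (O(n log n)), then one linear scan to find
   the maximum gap between successive time-stamps. */
double largest_gap(double x[], size_t n)
{
    qsort(x, n, sizeof x[0], cmp_double);
    double best = 0.0;
    for (size_t i = 1; i < n; ++i)
        if (x[i] - x[i - 1] > best)
            best = x[i] - x[i - 1];
    return best;
}
```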
Quadratic Time: O(n²)

Quadratic time. Enumerate all pairs of elements.

Closest pair of points. Given a list of n points in the plane (x₁, y₁), …, (xₙ, yₙ), find the pair that is closest.

O(n²) solution. Try all pairs of points (comparing squared distances, so we don't need to take square roots):

    min ← (x₁ − x₂)² + (y₁ − y₂)²
    for i = 1 to n {
        for j = i + 1 to n {
            d ← (xᵢ − xⱼ)² + (yᵢ − yⱼ)²
            if (d < min)
                min ← d
        }
    }

Remark. Ω(n²) seems inevitable, but this is just an illusion (see Chapter 5).
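The brute-force closest-pair loop can be sketched in C (my own `closest_pair_sq`, which assumes n ≥ 2 and returns the squared distance, matching the no-square-roots remark):

```c
#include <stddef.h>

/* Brute-force closest pair: try all n(n-1)/2 pairs -- O(n^2).
   Returns the SQUARED distance of the closest pair; comparing
   squared distances avoids square roots. Assumes n >= 2. */
double closest_pair_sq(const double x[], const double y[], size_t n)
{
    double min = (x[0] - x[1]) * (x[0] - x[1])
               + (y[0] - y[1]) * (y[0] - y[1]);
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j) {
            double dx = x[i] - x[j], dy = y[i] - y[j];
            double d = dx * dx + dy * dy;
            if (d < min)
                min = d;
        }
    return min;
}
```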
Cubic Time: O(n³)

Cubic time. Enumerate all triples of elements.

Set disjointness. Given n sets S₁, …, Sₙ, each a subset of {1, 2, …, n}, is there some pair of them that is disjoint?

O(n³) solution. For each pair of sets, determine whether they are disjoint:

    foreach set Sᵢ {
        foreach other set Sⱼ {
            foreach element p of Sᵢ {
                determine whether p also belongs to Sⱼ
            }
            if (no element of Sᵢ belongs to Sⱼ)
                report that Sᵢ and Sⱼ are disjoint
        }
    }

Polynomial Time: O(nᵏ) Time (k is a constant)

Independent set of size k. Given a graph, are there k nodes such that no two are joined by an edge?

O(nᵏ) solution. Enumerate all subsets of k nodes:

    foreach subset S of k nodes {
        check whether S is an independent set
        if (S is an independent set)
            report S is an independent set
    }

– Checking whether S is an independent set takes O(k²) time.
– The number of k-element subsets is at most nᵏ/k!.
– Total: O(k² · nᵏ / k!) = O(nᵏ).

Poly-time for k = 17, but not practical.

Exponential Time

Independent set. Given a graph, what is the maximum size of an independent set?

O(n² · 2ⁿ) solution. Enumerate all subsets:

    S* ← ∅
    foreach subset S of nodes {
        check whether S is an independent set
        if (S is the largest independent set seen so far)
            update S* ← S
    }
16.3 Selection Sort

Selection sort
– On the ith iteration, swaps the ith-smallest element into position i.
– After the ith iteration, the smallest i elements are sorted in increasing order in the first i positions.
– Requires a total of (n² − n)/2 comparisons:
    Iterates n − 1 times; on the ith iteration, locating the ith-smallest element requires n − i comparisons.
– Has a Big O of O(n²).
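The description above can be sketched in C. This is modeled on the idea behind the book's fig16_01.c but is not that figure's code; `selection_sort` is my own name:

```c
#include <stddef.h>

/* Selection sort: on pass i, find the smallest remaining element
   and swap it into position i. (n^2 - n)/2 comparisons total: O(n^2). */
void selection_sort(int a[], size_t n)
{
    for (size_t i = 0; i + 1 < n; ++i) {        /* n - 1 passes */
        size_t smallest = i;                    /* index of smallest so far */
        for (size_t j = i + 1; j < n; ++j)      /* n - i - 1 comparisons */
            if (a[j] < a[smallest])
                smallest = j;
        int tmp = a[i];                         /* swap into place */
        a[i] = a[smallest];
        a[smallest] = tmp;
    }
}
```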
Outline: fig16_01.c (1 of 4)

Outline: fig16_01.c (2 of 4)
– Store the index of the smallest element in the remaining array.
– Iterate through the whole array length − 1 times.
– Initialize the index of the smallest element to the current item.
– Determine the index of the smallest remaining element.
– Place the smallest remaining element in the next spot.

Outline: fig16_01.c (3 of 4)
– Swap two elements.

Outline: fig16_01.c (4 of 4)
16.4 Insertion Sort

Insertion sort
– On the ith iteration, inserts the (i + 1)th element into its correct position with respect to the first i elements.
– After the ith iteration, the first i elements are sorted.
– Requires a worst case of n² inner-loop iterations:
    The outer loop iterates n − 1 times; the inner loop requires n − 1 iterations in the worst case.
    For determining Big O, nested loops mean multiplying the numbers of iterations.
– Has a Big O of O(n²).
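The description above can be sketched in C. Again this is modeled on the idea behind the book's fig16_02.c, not that figure's exact code; `insertion_sort` is my own name:

```c
#include <stddef.h>

/* Insertion sort: insert a[i] into its correct position among the
   already-sorted a[0..i-1]. Worst case ~n^2 inner-loop iterations: O(n^2). */
void insertion_sort(int a[], size_t n)
{
    for (size_t i = 1; i < n; ++i) {
        int insert = a[i];                      /* element to insert */
        size_t j = i;
        while (j > 0 && a[j - 1] > insert) {    /* shift larger elements right */
            a[j] = a[j - 1];
            --j;
        }
        a[j] = insert;                          /* drop it into place */
    }
}
```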
Outline: fig16_02.c (1 of 4)

Outline: fig16_02.c (2 of 4)
– Holds the element to be inserted while the other elements are moved.
– Iterate through length − 1 items in the array.
– Store the value of the element that will be inserted into the sorted portion of the array.
– Loop to locate the correct position to insert the element.
– Move an element to the right and decrement the position at which to insert the next element.
– Keep track of where to insert the element.

Outline: fig16_02.c (3 of 4)
– Insert the element in place.

Outline: fig16_02.c (4 of 4)
16.5 Merge Sort

Merge sort
– Sorts an array by:
    Splitting it into two equal-size subarrays (if the array size is odd, one subarray will be one element larger than the other).
    Sorting each subarray.
    Merging them into one larger, sorted array: repeatedly compare the smallest elements in the two subarrays; the smaller element is removed and placed into the larger, combined array.

16.5 Merge Sort (Cont.)

Our recursive implementation
– Base case: an array with one element is already sorted.
– Recursion step:
    Split the array (of ≥ 2 elements) into two equal halves (if the array size is odd, one subarray will be one element larger than the other).
    Recursively sort each subarray.
    Merge them into one larger, sorted array.

16.5 Merge Sort (Cont.)

Sample merging step, with smaller, sorted arrays:
    A: 4 10 34 56 77
    B: 5 30 51 52 93
Compare the smallest element in A to the smallest element in B:
– 4 (A) is less than 5 (B), so 4 becomes the first element in the merged array.
– 5 (B) is less than 10 (A), so 5 becomes the second element in the merged array.
– 10 (A) is less than 30 (B), so 10 becomes the third element in the merged array.
– Etc.
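The recursive scheme above can be sketched compactly in C. This is modeled on the idea behind the book's fig16_03.c but is not that figure's code; `merge_sort` and `sort_sub` are my own names, and the sketch uses a C99 variable-length array as the scratch buffer:

```c
#include <stddef.h>
#include <string.h>

/* Sort a[lo..hi-1]: split, recursively sort each half, merge via tmp. */
static void sort_sub(int a[], int tmp[], size_t lo, size_t hi)
{
    if (hi - lo < 2)                    /* base case: <= 1 element */
        return;
    size_t mid = lo + (hi - lo) / 2;    /* split point */
    sort_sub(a, tmp, lo, mid);          /* sort left half  */
    sort_sub(a, tmp, mid, hi);          /* sort right half */
    size_t i = lo, j = mid, k = lo;     /* merge the two sorted halves */
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];  /* remainder of left half  */
    while (j < hi)  tmp[k++] = a[j++];  /* remainder of right half */
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof a[0]);
}

/* Recursive merge sort: O(n log n) comparisons. */
void merge_sort(int a[], size_t n)
{
    if (n < 2)
        return;
    int tmp[n];                         /* C99 VLA scratch buffer */
    sort_sub(a, tmp, 0, n);
}
```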
Outline: fig16_03.c (1 of 8)

Outline: fig16_03.c (2 of 8)
– Call function sortSubArray with 0 and length − 1 as the beginning and ending indices.
– Test the base case.
– Split the array in two.

Outline: fig16_03.c (3 of 8)
– Recursively call function sortSubArray on the two subarrays.
– Combine the two sorted arrays into one larger, sorted array.

Outline: fig16_03.c (4 of 8)
– Loop until the end of either subarray is reached.
– Test which element at the beginning of the arrays is smaller.
– Place the smaller element in the combined array.
– Fill the combined array with the remaining elements of the right array, or else fill it with the remaining elements of the left array.
– Copy the combined array into the original array.

Outline: fig16_03.c (5 of 8)

Outline: fig16_03.c (6 of 8)

Outline: fig16_03.c (7 of 8)

Outline: fig16_03.c (8 of 8)
16.5 Merge Sort (Cont.)

Efficiency of merge sort
– O(n log n) runtime:
    Repeatedly halving size-n arrays means log₂ n levels to reach the base case.
    - Doubling the size of the array requires one more level.
    - Quadrupling the size of the array requires two more levels.
    O(n) comparisons are required at each level:
    - Calling sortSubArray with a size-n array results in two sortSubArray calls with size-n/2 subarrays and a merge operation with n − 1 (order n) comparisons.
    - So, always order n total comparisons at each level.
– Represented in Big O notation as O(n log n), pronounced "on the order of n log n" or "order n log n".
Fig. 16.4 | Searching and sorting algorithms with Big O values.

Fig. 16.5 | Approximate number of comparisons for common Big O notations.