When is O(n lg n) Really O(n lg n)? A Comparison of the Quicksort and Heapsort Algorithms
Gerald Kruse, Juniata College, Huntingdon, PA
Outline
Analyzing Sorting Algorithms
Quicksort
Heapsort
Experimental Results
Observations
(this is a fun, open-ended student project)
How Fast is my Sorting Algorithm? “A nice blend of Math and CS”
The Sorting Problem, from Cormen et al.:
Input: a sequence of n numbers (a₁, a₂, …, aₙ).
Output: a permutation (reordering) (a₁′, a₂′, …, aₙ′) of the input sequence such that a₁′ ≤ a₂′ ≤ … ≤ aₙ′.
Note: this definition can be expanded to include sorting primitive data such as characters or strings, alpha-numeric data, and data records with key values.
Sorting algorithms are analyzed using many different metrics: expected run-time, memory usage, communication bandwidth, implementation complexity, …
Expected running time is given using “Big-O” notation:
O(g(n)) = { f(n) : there exist positive constants c and n₀ such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n₀ }.
While O-notation describes an asymptotic upper bound on a function, it is frequently used to describe asymptotically tight bounds.
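As a small worked example of this definition (not from the slides, just an illustration): to show that f(n) = 3n + 2 is O(n), exhibit the constants directly.

```latex
f(n) = 3n + 2 \in O(n):\quad
\text{choose } c = 4,\ n_0 = 2.\quad
\text{Then for all } n \ge 2:\;
0 \le 3n + 2 \le 3n + n = 4n = c \cdot n.
```

The same f(n) is also O(n²), which is why O-notation is, strictly speaking, only an upper bound even though it is commonly used for tight bounds.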
Algorithm analysis also requires a model of the implementation technology to be used.
The most commonly used model is the RAM, or Random-Access Machine. This should NOT be confused with Random-Access Memory.
Each instruction requires an equal amount of processing time.
The memory hierarchy (cache, virtual memory) is NOT modeled.
The RAM model is relatively straightforward and “usually an excellent predictor of performance on actual machines.”
Quicksort
“Good” partitioning means the partitions are usually roughly equal in size.
After a partition, the element partitioned around is in its correct (sorted) position.
There are n compares per level and lg n levels, so the algorithm should run in time proportional to n lg n, under the assumptions of the RAM model.
Quicksort
Pathological data leads to “bad” (unbalanced) partitions and the worst case for Quicksort.
The element partitioned around still ends up in sorted position, but nearly all remaining elements fall on one side.
Such data is sorted in O(n²) time, since there are still up to n compares per level, but now there are n − 1 levels.
Heaps
A heap can be seen as a complete binary tree.
In practice, heaps are usually implemented as arrays.
[Figure omitted: a heap drawn as a binary tree alongside its array representation A.]
Heaps, continued
Heaps satisfy the heap property: A[Parent(i)] ≥ A[i] for all nodes i > 1.
In other words, the value of a node is at most the value of its parent.
By the way, eBay uses a “heap-like” data structure to track bids.
Heapsort
    Heapsort(A) {
        BuildHeap(A);
        for (i = length(A) downto 2) {
            Swap(A[1], A[i]);
            heap_size(A) -= 1;
            Heapify(A, 1);
        }
    }
When the heap property is violated at just one node (whose subtrees are valid heaps), Heapify “floats down” that node to fix the heap.
Remembering the tree structure of the heap, each Heapify call takes O(lg n) time.
Since there are n − 1 calls to Heapify, Heapsort’s expected execution time is O(n lg n), just like Quicksort.
Counting Comparisons
Timing Results
Observations
Implementation: run on Windows- and Unix-based machines, implemented in C, C++, and Java, and based on pseudocode from Cormen et al., Sedgewick, and Joyce et al.
Heapsort does not run in O(n lg n) time even for the relatively small values of n tested, while Quicksort does exhibit O(n lg n) behavior.
Consider the memory access patterns: for very large n, we would expect a slowdown for ANY algorithm as the data no longer fits in memory. For the sizes of n run here, the partitions in Quicksort consist of elements which are contiguous in memory, while “floating down” a heap requires accessing elements which are not close in memory.
This is a fun exploration for students, appealing to those with an interest in the mathematics or the computer science.
Bibliography
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, Second Edition. Cambridge, MA: The MIT Press/McGraw-Hill, 2001.
N. Dale, C. Weems, and D. T. Joyce, Object-Oriented Data Structures Using Java. Boston, MA: Jones and Bartlett.
M. T. Goodrich and R. Tamassia, Algorithm Design: Foundations, Analysis, and Internet Examples. New York: Wiley.
D. E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, Second Edition. Redwood City, CA: Addison-Wesley-Longman, 1998.
C. C. McGeoch, “Analyzing algorithms by simulation: Variance reduction techniques and simulation speedups,” ACM Computing Surveys, vol. 24, no. 2, pp. 195–212, 1992.
C. C. McGeoch, D. Precup, and P. R. Cohen, “How to find the Big-Oh of your data set (and how not to),” Advances in Intelligent Data Analysis, Lecture Notes in Computer Science, pp. 41–52. Springer-Verlag.
R. Sedgewick, Algorithms in C, Parts 1–4: Fundamentals, Data Structures, Sorting, Searching, Third Edition. Boston, MA: Addison-Wesley, 1997.