1
Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004
2
Announcements Homework #5 Midterm March 4 Review: March 2
3
Today - Recall Sorting - Implementation Issues - Average case running time for quicksort - Timing Results
4
Total Recall: Sorting Algorithms
5
The Bible Robert Sedgewick Algorithms in C Parts 1-4 Fundamentals, Data Structures, Sorting, Searching Addison-Wesley 1998
6
Multiple Keys We could use a special comparator function (this would require a special function for each combination of keys). It is often easier to - first sort by name - then stable-sort by year Done!
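A minimal C sketch of the trick, assuming a record type with name and year fields (the type, the helpers, and the pairing of qsort with a hand-written stable insertion sort are illustrations, not course code): sort on the secondary key first, then run a stable sort on the primary key.

  #include <stdlib.h>
  #include <string.h>

  typedef struct { char name[32]; int year; } Record;   /* hypothetical record type */

  static int by_name(const void *a, const void *b) {
      return strcmp(((const Record *)a)->name, ((const Record *)b)->name);
  }

  /* Insertion sort on year. Insertion sort is stable, so the earlier
     ordering by name survives within blocks of equal years. */
  static void stable_sort_by_year(Record *r, int n) {
      for (int i = 1; i < n; i++) {
          Record tmp = r[i];
          int j = i;
          while (j > 0 && r[j-1].year > tmp.year) { r[j] = r[j-1]; j--; }
          r[j] = tmp;
      }
  }

  void sort_by_year_then_name(Record *r, int n) {
      qsort(r, n, sizeof(Record), by_name);   /* pass 1: any sort on the secondary key */
      stable_sort_by_year(r, n);              /* pass 2: stable sort on the primary key */
  }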
7
Sorting Review Several simple, quadratic algorithms (worst case and average). - Bubble Sort - Selection Sort - Insertion Sort Only Insertion Sort is of practical interest: its running time is linear in the number of inversions of the input sequence. Constants small. Also stable.
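For reference, a plain insertion sort in C (a sketch of mine, not the course code). Each element shifts left past exactly the elements it forms inversions with, so the running time is O(n + I) for I inversions, and equal elements never move past each other, which is why the sort is stable.

  /* Insertion sort: O(n + I) time for I inversions; stable. */
  void insertion_sort(int A[], int n) {
      for (int i = 1; i < n; i++) {
          int v = A[i];                   /* element to insert into A[0..i-1] */
          int j = i;
          while (j > 0 && A[j-1] > v) {   /* one shift per inversion involving A[i] */
              A[j] = A[j-1];
              j--;
          }
          A[j] = v;
      }
  }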
8
Sorting Review Asymptotically optimal O(n log n) algorithms (worst case and average). - Merge Sort - Heap Sort Merge Sort purely sequential and stable. But requires extra memory: 2n + O(log n).
9
Quick Sort Overall fastest. In place. BUT: Worst case quadratic. Not stable. Implementation details messy.
10
Picking An Algorithm First Question: Is the input short? Short means something like n < 500. In this case Insertion Sort is probably the best choice. Don't bother with asymptotically faster methods.
11
Picking An Algorithm Second Question: Does the input have special properties? E.g., if the number of inversions is small, Insertion Sort may be the best choice. Or linear sorting methods may be appropriate.
12
Otherwise: Quick Sort Large inputs, comparison based method, stability not required (recall our stabilizer trick, though). Quick Sort is worst case quadratic, so why should it be the default candidate? On average, Quick Sort is O(n log n), and the constants are quite small.
13
Average ??? Average case analysis requires a probability distribution on the inputs: we have to average the running times. t(n) = Σ_x p_x t(x), where the sum is over all instances x of size n and p_x is the probability of getting instance x. Often we simply assume the uniform distribution: every instance (of a certain size) is equally likely.
14
A Computation Can we write down a recurrence equation? Can we solve the equation? At least approximately? Is the solution (if any) practically relevant? (see handout from last time)
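A sketch of the computation under the uniform-distribution assumption (each of the n elements is equally likely to be the pivot, and partitioning a block of size n costs about n-1 comparisons); this is the standard textbook derivation, not necessarily the one in the handout:

  T(n) = (n-1) + \frac{1}{n}\sum_{k=0}^{n-1}\bigl(T(k) + T(n-1-k)\bigr)
       = (n-1) + \frac{2}{n}\sum_{k=0}^{n-1} T(k)

Multiplying by n, subtracting the same equation for n-1, and telescoping gives

  T(n) \approx 2n \ln n \approx 1.39\, n \log_2 n

so the average is O(n log n) with a small constant.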
15
Implementing Quick Sort
16
Pivot Selection Ideally, the pivot would be the true median, but computing it is much too slow to be of practical value. Instead, either - pick the pivot at random, or - take the median of a small sample.
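A possible C sketch of the second option, median-of-three (the function name and the choice of sampling the first, middle, and last positions are mine):

  /* Return the index of the median of A[lo], A[mid], A[hi],
     where mid is the middle position of the block. */
  int median_of_three(const int A[], int lo, int hi) {
      int mid = lo + (hi - lo) / 2;
      int i = lo, j = mid, k = hi;                   /* order the three indices by value */
      if (A[j] < A[i]) { int t = i; i = j; j = t; }
      if (A[k] < A[j]) { int t = j; j = k; k = t; }
      if (A[j] < A[i]) { int t = i; i = j; j = t; }
      return j;                                      /* index holding the median value */
  }

A random pivot would instead be lo + rand() % (hi - lo + 1). Either way, the chosen element is then swapped to the right end of the block so the partitioning loop can use p = A[hi], as in the trace below.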
17
Partitioning Partitioning is easy if we use extra scratch space, but we would like to partition in place: elements must be moved within the given block of the big array. Basic idea: use two pointers that sweep across the block from the left and from the right until each encounters an out-of-place element, then swap the two.
18
1. Doing quicksort in place Trace of one in-place partitioning step (pivot 50; L and R are the left and right sweep pointers):
  85 24 63 50 17 31 96 45   initial block
  85 24 63 45 17 31 96 50   pivot 50 swapped to the right end
  31 24 63 45 17 85 96 50   L stops at 85, R stops at 31: swap
19
1. Doing quicksort in place
  31 24 63 45 17 85 96 50   (continued)
  31 24 17 45 63 85 96 50   L stops at 63, R stops at 17: swap
  31 24 17 45 50 85 96 63   pointers have crossed: swap the pivot into place
20
Pseudo Code Partition A[lo..hi], with the pivot p stored in A[hi]:
  i = lo - 1; j = hi;
  while( true ) {
    while( A[++i] < p );                        // sweep right for an element >= p
    while( p < A[--j] ) if( j == lo ) break;    // sweep left for an element <= p
    if( i >= j ) break;                         // pointers have crossed
    swap( i, j );
  }
  swap( i, hi );                                // move the pivot into its final position
  return i;
21
Getting Out Using Quick Sort on very short arrays is a bad idea: the overhead becomes too large. So, when the block becomes short we should exit Quick Sort and switch to Insertion Sort. But not locally, as in
  quicksort( A, lo, hi ) {
    if( hi - lo < magic_number )
      insertionsort( A, lo, hi );
    else …
22
Getting Out Just do nothing when the block is short. Then do one global cleanup pass with insertion sort:
  quicksort( A, 0, n );
  insertionsort( A, 0, n );
The cleanup is linear, since the number of remaining inversions is linear.
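A hedged sketch of this organization in C: the cutoff constant and helper names are assumptions, partition is the pseudo code above wrapped as int partition(int A[], int lo, int hi) with the pivot in A[hi] (hi inclusive), and insertion_sort is the one sketched earlier.

  #define MAGIC_NUMBER 10   /* cutoff; something in the 5..20 range */

  /* Quick sort that simply leaves short blocks untouched. */
  void quicksort(int A[], int lo, int hi) {
      if (hi - lo < MAGIC_NUMBER) return;   /* do nothing on short blocks */
      int i = partition(A, lo, hi);
      quicksort(A, lo, i - 1);
      quicksort(A, i + 1, hi);
  }

  void sort(int A[], int n) {
      quicksort(A, 0, n - 1);
      insertion_sort(A, n);   /* one global cleanup: every element sits within
                                 MAGIC_NUMBER of its final slot, so this is O(n) */
  }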
23
Magic Number The best way to determine the magic number is to run real-world tests. It seems that for current architectures, some value in the range 5 to 20 will work best.
24
Equal Elements Note that ideally pivoting should produce three sub-blocks: left:< p middle:== p right:> p Then the recursion could ignore the middle part, possibly omitting many elements.
25
Equal Elements Three natural strategies: Both pointers stop. Only one pointer stops. Neither pointer stops. Fact: The first strategy works best overall.
26
Equal Elements There are clever implementations that partition into three sub-blocks. This is amazingly hard to get both right and fast. Try it!
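One way to try it is the Dutch-national-flag scheme below (a sketch of mine; it is simpler, but not as fast in practice as the split-end tricks used in engineered implementations such as Bentley and McIlroy's). A single left-to-right pass maintains four regions: < p, == p, unexamined, > p.

  /* Three-way partition of A[lo..hi] around p = A[lo].
     On return: A[lo..*lt-1] < p, A[*lt..*gt] == p, A[*gt+1..hi] > p. */
  void partition3(int A[], int lo, int hi, int *lt, int *gt) {
      int p = A[lo];
      int i = lo, less = lo, greater = hi;
      while (i <= greater) {
          if (A[i] < p) {
              int t = A[less]; A[less] = A[i]; A[i] = t;
              less++; i++;
          } else if (A[i] > p) {
              int t = A[greater]; A[greater] = A[i]; A[i] = t;
              greater--;                    /* swapped-in element is unexamined: do not advance i */
          } else {
              i++;                          /* equal to the pivot: leave it in the middle */
          }
      }
      *lt = less; *gt = greater;
  }

The recursion then only has to visit A[lo..*lt-1] and A[*gt+1..hi].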
27
Application: Quick Select
28
Selection (Order Statistics) A classical problem: given a list, find the k-th element in the ordered list. The brute-force approach sorts the whole list first, and thus produces more information than required. Can we get away with less than n log n work (in a comparison based world)?
29
Easy Cases Needless to say, when k is small there are easy answers. - Scan the array and keep track of the k smallest. - Use a Selection Sort approach. But how about general k?
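A sketch of the first idea (the function name and interface are mine): keep the k smallest elements seen so far in a small sorted buffer; each new element costs O(k), for O(nk) total, which is fine when k is small.

  /* Return the k-th smallest element of A[0..n-1] (k is 1-based, k <= n).
     buf must have room for k elements. */
  int kth_smallest_by_scan(const int A[], int n, int k, int buf[]) {
      int filled = 0;                                    /* how much of buf is in use */
      for (int i = 0; i < n; i++) {
          if (filled == k && A[i] >= buf[k-1]) continue; /* too large to matter */
          int j = (filled < k) ? filled++ : k - 1;       /* slot to insert into */
          while (j > 0 && buf[j-1] > A[i]) { buf[j] = buf[j-1]; j--; }
          buf[j] = A[i];                                 /* buf stays sorted */
      }
      return buf[k-1];
  }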
30
Selection and Partitioning
  qselect( A, lo, hi, k ) {
    if( hi <= lo ) return;
    i = partition( A, lo, hi );
    if( i > k ) qselect( A, lo, i-1, k );
    if( i < k ) qselect( A, i+1, hi, k );
  }
This looks like a typo. What’s really going on here?
31
Quick Select What should we expect as running time? As usual, if there is a ghost in the machine, it could force quadratic behavior. But on average this algorithm is linear. Don’t get any ideas about using this to find the median in the pivoting step of Quick Sort!
32
Some Timing Results
33
The Real World Beyond asymptotic analysis, it is always a good idea to do some real-world testing. Construct a small test-bed: - automate the testing - keep it flexible but simple - organize the data in a useful way
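A minimal test-bed sketch in C (the sizes, trial count, the name of the sort under test, and the CSV output format are all placeholders): time each run with clock() and emit one line per (size, trial) pair so the results can be collected and plotted later.

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  extern void sort(int A[], int n);   /* the routine under test */

  int main(void) {
      int sizes[] = { 1000, 10000, 100000, 1000000 };   /* placeholder input sizes */
      srand((unsigned) time(NULL));
      printf("n,trial,seconds\n");
      for (int s = 0; s < 4; s++) {
          int n = sizes[s];
          int *A = malloc(n * sizeof *A);
          for (int trial = 0; trial < 5; trial++) {
              for (int i = 0; i < n; i++) A[i] = rand();   /* fresh random input */
              clock_t start = clock();
              sort(A, n);
              double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
              printf("%d,%d,%.4f\n", n, trial, secs);
          }
          free(A);
      }
      return 0;
  }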