
1 Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 24 February 2004

2 Announcements Homework 5. Midterm: March 4. Review: March 2.

3 Total Recall: Sorting Algorithms

4 Stable Sorting Algorithms An important notion is stability: a sorting algorithm is stable if it does not change the relative order of equal elements. That is, if a[i] = a[j] and i < j, then the elements end up at final positions f(i) < f(j). Stability is useful when sorting with respect to multiple keys. item: (name, year, … ) Suppose we want to sort by year, and lexicographically by name within each year.

5 Multiple Keys We could use a special comparator function (this would require a special function for each combination of keys). It is often easier to first sort by name, then stable-sort by year, as in the sketch below. Done!
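A minimal sketch of the two-pass trick in Java. The Item record and its fields are made up for illustration; the point is that Arrays.sort on objects is a stable merge sort, which is what makes the second pass safe:

import java.util.Arrays;
import java.util.Comparator;

class MultiKeyDemo {
    record Item(String name, int year) {}   // hypothetical item type

    static void sortByYearThenName(Item[] items) {
        Arrays.sort(items, Comparator.comparing(Item::name));     // pass 1: by name
        Arrays.sort(items, Comparator.comparingInt(Item::year));  // pass 2: stable, by year
        // Within each year, the name order from pass 1 survives.
    }
}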

6 Sorting Review Several simple, quadratic algorithms (worst case and average):
- Bubble Sort
- Insertion Sort
- Shell Sort (sub-quadratic)
Only Insertion Sort is of practical interest: its running time is linear in the number of inversions of the input sequence, and the constants are small. Stable?

7 Sentinels: Small constants
insertionSort( A, n ) {    // sorts A[1..n]; A[0] is a sentinel slot
  for( i = 2; i <= n; i++ ) {
    x = A[i];
    A[0] = x;              // sentinel: the inner loop needs no j > 1 bound check
    for( j = i; x < A[j-1]; j-- )
      A[j] = A[j-1];
    A[j] = x;
  }
}

8 Sorting Review Asymptotically optimal O(n log n) algorithms (worst case and average):
- Merge Sort
- Heap Sort
Merge Sort is purely sequential and stable. But it requires extra memory: 2n + O(log n).

9 Quick Sort Overall fastest. In place. BUT: Worst case quadratic. Not stable. Implementation details messy.

10 IBM Type 82 Sorter (1949)

11 Radix Sort Used by old computer-card-sorting machines. Linear time: b passes on b-bit elements, or b/m passes with m bits per pass. Each pass must be stable. BUT: uses 2n + 2^m space. May only beat Quick Sort for very large arrays.
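As a concrete illustration, a hedged sketch of LSD radix sort for non-negative 32-bit ints (b = 32, m = 8, so b/m = 4 passes; my code, not the machine's). The counting-sort pass is what makes each pass stable:

class RadixSortDemo {
    static void radixSort(int[] a) {            // non-negative keys only in this sketch
        final int M = 8, BUCKETS = 1 << M;      // m bits per pass, 2^m buckets
        int[] out = new int[a.length];          // the extra n words of space
        for (int shift = 0; shift < 32; shift += M) {
            int[] count = new int[BUCKETS + 1]; // the extra 2^m counters
            for (int x : a) count[((x >>> shift) & (BUCKETS - 1)) + 1]++;
            for (int i = 0; i < BUCKETS; i++) count[i + 1] += count[i];
            for (int x : a)                     // stable: equal digits keep their order
                out[count[(x >>> shift) & (BUCKETS - 1)]++] = x;
            System.arraycopy(out, 0, a, 0, a.length);
        }
    }
}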

12 Picking An Algorithm First Question: Is the input short? Short means something like n < 500. In this case Insertion Sort is probably the best choice. Don't bother with asymptotically faster methods.

13 Picking An Algorithm Second Question: Does the input have special properties? E.g., if the number of inversions is small, Insertion Sort may be the best choice. Or linear sorting methods may be appropriate.

14 Otherwise: Quick Sort Large inputs, comparison-based method, not stable (though the stabilizer trick sketched below recovers stability). Quick Sort is worst case quadratic, so why should it be the default candidate? Because on average Quick Sort is O(n log n), and the constants are quite small.
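The stabilizer trick is only named on the slide; one common way to realize it (an assumption, not the lecture's code) is to tag each element with its original index and break ties on the tag, which makes any sorting algorithm behave stably:

class StabilizerDemo {
    static void stableSortInts(int[] a) {       // hypothetical helper name
        long[] tagged = new long[a.length];
        for (int i = 0; i < a.length; i++)      // key in high 32 bits, index in low 32
            tagged[i] = ((long) a[i] << 32) | i;
        java.util.Arrays.sort(tagged);          // equal keys now ordered by original index
        for (int i = 0; i < a.length; i++)
            a[i] = (int) (tagged[i] >>> 32);
    }
}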

15 Why divide-and-conquer works Suppose the amount of work required to divide and recombine is linear, that is, O(n), while solving a problem directly requires more than linear work, say quadratic. Then splitting a problem in two strictly reduces the total work (for quadratic work, 2(n/2)^2 = n^2/2 < n^2), while requiring only linear work to do so.

16 Two Major Approaches 1. Make the split trivial, but perform some work when the pieces are combined → Merge Sort. 2. Work during the split, but then do nothing in the combination step → Quick Sort. In either case, the overhead should be linear with small constants.

17 Divide-and-conquer


19 Merge Sort

20 The Algorithm The main function.
List MergeSort( List L ) {
  if( length(L) <= 1 ) return L;
  A = first half of L;
  B = second half of L;
  return merge( MergeSort(A), MergeSort(B) );
}

21 The Algorithm Merging the two sorted parts is responsible for the overhead.
merge( nil, B ) = B;
merge( A, nil ) = A;
merge( a:A, b:B ) =
  if( a <= b ) prepend( merge( A, b:B ), a )
  else prepend( merge( a:A, B ), b )

22 Harsh Reality In reality, the items are always given in an array. The first and second half can be found by index arithmetic: the halves are a[low..mid] and a[mid+1..high], with mid = (low + high) / 2.

23 But Note … We cannot perform the merge operation in place. Rather, we need another array as scratch space. The total space requirement for Merge Sort is therefore 2n + O(log n), assuming the recursive implementation; a sketch follows below.
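A sketch of the array-based version in Java, with one shared scratch array (names and details are mine; this follows the standard textbook scheme, n extra words plus the O(log n) recursion stack):

class MergeSortDemo {
    static void mergeSort(int[] a) {
        int[] tmp = new int[a.length];              // the extra n words of scratch
        mergeSort(a, tmp, 0, a.length - 1);
    }

    static void mergeSort(int[] a, int[] tmp, int low, int high) {
        if (low >= high) return;                    // blocks of length <= 1 are sorted
        int mid = (low + high) / 2;                 // halves found by index arithmetic
        mergeSort(a, tmp, low, mid);
        mergeSort(a, tmp, mid + 1, high);
        int i = low, j = mid + 1, k = low;          // merge a[low..mid], a[mid+1..high]
        while (i <= mid && j <= high)
            tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];  // <= keeps the sort stable
        while (i <= mid)  tmp[k++] = a[i++];
        while (j <= high) tmp[k++] = a[j++];
        System.arraycopy(tmp, low, a, low, high - low + 1);
    }
}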

24 Running Time Solving the recurrence T(n) = 2 T(n/2) + O(n) for Merge Sort (unrolled below), one sees that the running time is O(n log n). Since Merge Sort reads the data strictly sequentially, it is sometimes useful when the data reside on slow external media. But overall it is no match for Quick Sort.
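For the record, the standard unrolling (with T(1) = c and n = 2^k; a textbook argument, not spelled out on the slide):

$$T(n) = 2T(n/2) + cn = 4T(n/4) + 2cn = \cdots = 2^k T(1) + k\,cn = cn + cn\log_2 n = O(n \log n).$$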

25 Implementing Quick Sort

26 Quicksort Performance: Worst case: O(N^2). Average case: O(N log N). Space: in place, plus the recursion stack. More importantly, it is the fastest known comparison-based sorting algorithm in practice. But it is fragile: mistakes can go unnoticed.

27 Quicksort - Basic form Divide and Conquer
quicksort( Comparable [] a, int low, int high ) {
  int i;
  if( high > low ) {
    i = partition( a, low, high );
    quicksort( a, low, i-1 );
    quicksort( a, i+1, high );
  }
}

28 Partitioning Partitioning is easy if we use extra scratch space, but we would like to partition in place, moving elements within a chunk of the big array. Basic idea: use two pointers that sweep across the chunk from the left and from the right until two out-of-place elements are encountered. Swap them.

29 Doing quicksort in place
85 24 63 50 17 31 96 45    (pick pivot 50 and swap it to the end)
85 24 63 45 17 31 96 50    (L sweeps right and stops at 85; R sweeps left and stops at 31)
31 24 63 45 17 85 96 50    (swap them)

30 Doing quicksort in place
31 24 63 45 17 85 96 50    (L stops at 63, R stops at 17; swap them)
31 24 17 45 63 85 96 50    (now L and R have crossed)
31 24 17 45 50 85 96 63    (swap a[L] with the pivot at the end; 50 is in its final position)

31 Partition: In-Place
p = random();                  // a random index in [low, high]
pivot = a[p];
swap( a, p, high );            // park the pivot at the end
for( i = low - 1, j = high; ; ) {
  while( a[++i] < pivot )
    ;
  while( (j > low) && (pivot < a[--j]) )
    ;
  if( i >= j ) break;
  swap( a, i, j );
}
swap( a, i, high );            // move the pivot into its final position
return i;

32 Pivot Selection Ideally, the pivot would be the true median, but computing it is much too slow to be of practical value. Instead either
- pick the pivot at random, or
- take the median of a small sample.

33 Median of Three Take the median of the elements at low, mid, high:
mid = ( low + high ) / 2;
if( a[mid] < a[low] ) swap( a, low, mid );
if( a[high] < a[low] ) swap( a, low, high );
if( a[high] < a[mid] ) swap( a, mid, high );
pivot = a[mid];

34 Partition: Median-of-Three
swap( a, mid, high - 1 );      // stash the pivot at high - 1
for( i = low, j = high - 1; ; ) {
  while( a[++i] < pivot )
    ;
  while( pivot < a[--j] )
    ;
  if( i >= j ) break;
  swap( a, i, j );
}
swap( a, i, high - 1 );
return i;
Now we have sentinels for the left and right scans: a[low] <= pivot and a[high] >= pivot.

35 Getting Out Using Quick Sort on very short arrays is a bad idea: the overhead becomes too large. So, when the block becomes short we should exit Quick Sort and switch to Insertion Sort. But not locally, as in:
quicksort( a, low, high ) {
  if( high - low < magic_number )
    insertionSort( a, low, high );
  else …

36 Getting Out Better: just do nothing when the block is short. Then do one global cleanup with Insertion Sort:
quicksort( A, 0, n );
insertionSort( A, 0, n );
The cleanup is linear, since the number of remaining inversions is linear in n. Caveat: InsertionSort may hide errors in quicksort. (See the end-to-end sketch below.)
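Putting slides 33 through 36 together, a hedged end-to-end sketch in Java. The class name, MAGIC_NUMBER value, and int specialization are mine; the lecture's generic Comparable version would look analogous:

class HybridQuickSortDemo {
    static final int MAGIC_NUMBER = 10;          // a guess in the 5-20 range (slide 37)

    static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
        insertionSort(a, 0, a.length - 1);       // one global cleanup; linear, few inversions left
    }

    static void quicksort(int[] a, int low, int high) {
        if (high - low < MAGIC_NUMBER) return;   // just do nothing on short blocks
        int mid = (low + high) / 2;              // median-of-three pivot (slide 33)
        if (a[mid]  < a[low]) swap(a, low, mid);
        if (a[high] < a[low]) swap(a, low, high);
        if (a[high] < a[mid]) swap(a, mid, high);
        swap(a, mid, high - 1);                  // stash pivot; a[low], a[high] are sentinels
        int pivot = a[high - 1];
        int i = low, j = high - 1;
        for (;;) {                               // partition (slide 34)
            while (a[++i] < pivot) ;
            while (pivot < a[--j]) ;
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, i, high - 1);                    // pivot into its final position
        quicksort(a, low, i - 1);
        quicksort(a, i + 1, high);
    }

    static void insertionSort(int[] a, int low, int high) {
        for (int k = low + 1; k <= high; k++) {
            int x = a[k];
            int j = k;
            while (j > low && x < a[j - 1]) { a[j] = a[j - 1]; j--; }
            a[j] = x;
        }
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}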

37 Magic Number The best way to determine the magic number is to run real-world tests. It seems that for current architectures, some value in the range 5 to 20 will work best.

38 Equal Elements Note that ideally pivoting should produce three sub-blocks:
left: < p
middle: == p
right: > p
Then the recursion could ignore the middle part, possibly skipping many elements.

39 Equal Elements Three natural strategies for when a scanning pointer meets an element equal to the pivot:
- Both pointers stop.
- Only one pointer stops.
- Neither pointer stops.
Fact: the first strategy works best overall, as it tends to keep the partitions balanced.

40 Equal Elements There are clever implementations that partition into three sub-blocks. This is amazingly hard to get both correct and fast. Try it! (One classical scheme is sketched below.)
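For reference, one classical three-way scheme is Dijkstra's "Dutch national flag" partition, sketched here under my own naming (Bentley-McIlroy's split-end scheme is the other well-known approach):

class ThreeWayDemo {
    static void quicksort3(int[] a, int low, int high) {
        if (high <= low) return;
        int pivot = a[low];
        int lt = low, i = low + 1, gt = high;
        while (i <= gt) {                            // invariant: a[low..lt-1] < pivot,
            if      (a[i] < pivot) swap(a, lt++, i++);   // a[lt..i-1] == pivot,
            else if (a[i] > pivot) swap(a, i, gt--);     // a[gt+1..high] > pivot
            else                   i++;
        }
        quicksort3(a, low, lt - 1);                  // the recursion ignores the middle block
        quicksort3(a, gt + 1, high);
    }

    static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}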

41 Application: Quick Select

42 Selection (Order Statistics) A classical problem: given a list, find the k-th element of the ordered list. The brute-force approach sorts the whole list first, and thus produces more information than required. Can we get away with less than O(n log n) work (in a comparison-based world)?

43 Easy Cases Needless to say, when k is small there are easy answers:
- Scan the array and keep track of the k smallest.
- Use a Selection Sort approach.
But how about general k?

44 Selection and Partitioning
qselect( a, low, high, k ) {
  if( high <= low ) return;
  i = partition( a, low, high );
  if( i > k ) qselect( a, low, i-1, k );
  if( i < k ) qselect( a, i+1, high, k );
}
This looks like a typo. What’s really going on here?
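A runnable sketch (the names are mine, and partition here is a simple Lomuto stand-in for the slide-31 code). Note that at most one of the two ifs can fire, so each step descends into a single side; the missing i == k case simply means we are done:

class QuickSelectDemo {
    static int qselect(int[] a, int low, int high, int k) {
        if (high <= low) return a[k];
        int i = partition(a, low, high);             // pivot ends up at index i
        if (i > k) return qselect(a, low, i - 1, k);
        if (i < k) return qselect(a, i + 1, high, k);
        return a[k];                                 // i == k: the pivot is the k-th element
    }

    static int partition(int[] a, int low, int high) {
        int pivot = a[high], i = low;
        for (int j = low; j < high; j++)
            if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
        int t = a[i]; a[i] = a[high]; a[high] = t;
        return i;
    }
}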

45 Quick Select What should we expect as running time? As usual, if there is a ghost in the machine, it could force quadratic behavior. But on average this algorithm is linear. Don’t get any ideas about using this to find the median in the pivoting step of Quick Sort!

46 Divide-and-conquer

