Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 24 February 2004.

Announcements Homework 5. Midterm: March 4. Review: March 2.

Total Recall: Sorting Algorithms

Stable Sorting Algorithms An important notion is stability: a sorting algorithm is stable if it does not change the relative order of equal elements. That is, if a[i] = a[j] and i < j, then f(i) < f(j), where f maps input positions to output positions. Stability is useful when sorting with respect to multiple keys. item: (name, year, ... ) Suppose we want to sort by year, and lexicographically by name within each year.

Multiple Keys We could use a special comparator function (this would require a special function for each combination of keys). It is often easier to - first sort by name, - then stable-sort by year. Done!
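The two-pass idea above can be sketched in Python, whose built-in sort is guaranteed stable (the names and years below are made-up examples):

```python
# Records to sort: (name, year). The data is illustrative.
items = [("Ives", 1874), ("Bach", 1685), ("Adams", 1947),
         ("Cage", 1912), ("Brahms", 1833), ("Handel", 1685)]

# Pass 1: sort by the secondary key (name).
items.sort(key=lambda rec: rec[0])

# Pass 2: stable sort by the primary key (year).
# Records with equal years keep their name order from pass 1.
items.sort(key=lambda rec: rec[1])

# items is now ordered by year, ties broken lexicographically by name.
```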

Sorting Review Several simple, quadratic algorithms (worst case and average): - Bubble Sort - Insertion Sort - Shell Sort (sub-quadratic) Only Insertion Sort is of practical interest: its running time is linear in the number of inversions of the input sequence, and the constants are small. Stable?

Sentinels: Small constants

insertionSort( A, n ) {
    for( i = 2; i <= n; i++ ) {
        x = A[i];
        A[0] = x;                          // sentinel: the inner scan cannot run off the left end
        for( j = i; x < A[j-1]; j-- )
            A[j] = A[j-1];
        A[j] = x;
    }
}

Sorting Review Asymptotically optimal O(n log n) algorithms (worst case and average): - Merge Sort - Heap Sort Merge Sort is purely sequential and stable, but requires extra memory: 2n + O(log n).

Quick Sort Overall fastest. In place. BUT: Worst case quadratic. Not stable. Implementation details messy.

IBM Type 82 Sorter (1949)

Radix Sort Used by old computer-card-sorting machines. Linear time: b passes on b-bit elements, or b/m passes at m bits per pass. Each pass must be stable. BUT: uses 2n + 2^m space, and may only beat Quick Sort for very large arrays.
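A minimal Python sketch of LSD radix sort at m bits per pass (function name and defaults are illustrative; each pass is a stable bucket distribution):

```python
def radix_sort(a, bits=16, m=4):
    """Sort non-negative integers of at most `bits` bits using
    bits/m stable passes, m bits (2^m buckets) per pass."""
    mask = (1 << m) - 1
    for shift in range(0, bits, m):
        buckets = [[] for _ in range(1 << m)]        # 2^m buckets
        for x in a:
            buckets[(x >> shift) & mask].append(x)   # appending keeps each pass stable
        a = [x for bucket in buckets for x in bucket]
    return a
```

Note the 2^m bucket space (plus a copy of the data), matching the 2n + 2^m figure above.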

Picking An Algorithm First Question: Is the input short? Short means something like n < 500. In this case Insertion Sort is probably the best choice. Don't bother with asymptotically faster methods.

Picking An Algorithm Second Question: Does the input have special properties? E.g., if the number of inversions is small, Insertion Sort may be the best choice. Or linear sorting methods may be appropriate.

Otherwise: Quick Sort Large inputs, comparison-based method, stability not required (though the stabilizer trick, appending each element's original index to its key, can recover stability). Quick Sort is worst-case quadratic, so why should it be the default candidate? Because on average Quick Sort is O(n log n), and the constants are quite small.

Why divide-and-conquer works Suppose the amount of work required to divide and recombine is linear, that is, O(n). Suppose also that the amount of work to solve a problem directly grows faster than linearly. Then each dividing step reduces the total work by more than a linear amount, while requiring only linear work to do so.

Two Major Approaches 1. Make the split trivial, but perform some work when the pieces are combined: Merge Sort. 2. Work during the split, but then do nothing in the combination step: Quick Sort. In either case, the overhead should be linear with small constants.

Divide-and-conquer

Merge Sort

The Algorithm The main function.

List MergeSort( List L ) {
    if( length(L) <= 1 ) return L;
    A = first half of L;
    B = second half of L;
    return merge( MergeSort(A), MergeSort(B) );
}

The Algorithm Merging the two sorted parts is responsible for the overhead.

merge( nil, B )   = B
merge( A, nil )   = A
merge( a:A, b:B ) = if( a <= b ) prepend( merge( A, b:B ), a )
                    else         prepend( merge( a:A, B ), b )

Harsh Reality In reality, the items are always given in an array. The first and second half can be found by index arithmetic.

But Note … We cannot perform the merge operation in place. Rather, we need another array as scratch space. The total space requirement for Merge Sort is 2n + O(log n), assuming the recursive implementation.
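An array-based sketch in Python, assuming one auxiliary buffer of size n allocated up front (names are illustrative):

```python
def merge_sort(a):
    """Sort list a in place; uses one scratch buffer of size n
    plus O(log n) recursion stack, i.e. 2n + O(log n) total space."""
    aux = a[:]                          # scratch space, allocated once
    def sort(lo, hi):                   # sorts the slice a[lo:hi]
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid, hi)
        aux[lo:hi] = a[lo:hi]           # copy the chunk, then merge back into a
        i, j = lo, mid
        for k in range(lo, hi):
            if j >= hi or (i < mid and aux[i] <= aux[j]):
                a[k] = aux[i]; i += 1   # take from the left run (<= keeps the sort stable)
            else:
                a[k] = aux[j]; j += 1   # take from the right run
    sort(0, len(a))
    return a
```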

Running Time Solving the recurrence equation for Merge Sort, one can see that the running time is O(n log n). Since Merge Sort reads the data strictly sequentially, it is sometimes useful when the data reside on slow external media. But overall it is no match for Quick Sort.

Implementing Quick Sort

Quicksort Performance: Worse case: O(N 2 ) Average case: O(Nlog N). Space: In-place plus stack More importantly, it is the fastest known comparison-based sorting algorithm in practice. But it is fragile: Mistakes can go unnoticed.

Quicksort - Basic form Divide and Conquer

quicksort( Comparable [] a, int low, int high ) {
    int i;
    if( high > low ) {
        i = partition( a, low, high );
        quicksort( a, low, i-1 );
        quicksort( a, i+1, high );
    }
}

Partitioning Partitioning is easy if we use extra scratch space. But we would like to partition in place. Need to move elements within a chunk of the big array. Basic idea: use two pointers, sweep across chunk from left and from right until two out-of-place elements are encountered. Swap them.

Doing quicksort in place [diagram: the L and R pointers sweep toward each other from the two ends of the block, swapping out-of-place elements as they meet them]

Partition: In-Place

p = random();                         // random index in [low, high]
pivot = a[p];
swap( a, p, high );                   // park the pivot at the right end
for( i = low - 1, j = high; ; ) {
    while( a[++i] < pivot )
        ;                             // scan right for an element >= pivot
    while( j > low && pivot < a[--j] )
        ;                             // scan left for an element <= pivot
    if( i >= j ) break;
    swap( a, i, j );
}
swap( a, i, high );                   // pivot into its final position
return i;
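A runnable Python sketch of randomized quicksort; for brevity it uses the simpler single-sweep (Lomuto-style) partition rather than the two-pointer sweep shown above, but the interface is the same: partition returns the pivot's final index.

```python
import random

def partition(a, low, high):
    """In-place partition of a[low..high] around a random pivot;
    returns the pivot's final index."""
    p = random.randint(low, high)
    a[p], a[high] = a[high], a[p]      # park the pivot at the right end
    pivot = a[high]
    i = low
    for j in range(low, high):         # single left-to-right sweep
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[high] = a[high], a[i]      # pivot into its final position
    return i

def quicksort(a, low=0, high=None):
    if high is None:
        high = len(a) - 1
    if low < high:
        i = partition(a, low, high)
        quicksort(a, low, i - 1)
        quicksort(a, i + 1, high)
    return a
```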

Pivot Selection Ideally, the pivot should be the median, but computing the exact median is much too slow to be of practical value. Instead, either - pick the pivot at random, or - take the median of a small sample.

Median of Three Take the median of the elements at low, mid, high:

mid = ( low + high ) / 2;
if( a[mid]  < a[low] )  swap( a, low, mid );
if( a[high] < a[low] )  swap( a, low, high );
if( a[high] < a[mid] )  swap( a, mid, high );
pivot = a[mid];
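The same three-comparison ordering as a Python sketch; after the call, a[low] <= a[mid] <= a[high], so the endpoints can serve as sentinels:

```python
def median_of_three(a, low, high):
    """Order a[low], a[mid], a[high] so the median lands at mid;
    returns the pivot value."""
    mid = (low + high) // 2
    if a[mid] < a[low]:
        a[low], a[mid] = a[mid], a[low]
    if a[high] < a[low]:
        a[low], a[high] = a[high], a[low]
    if a[high] < a[mid]:
        a[mid], a[high] = a[high], a[mid]
    return a[mid]
```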

Partition: Median-of-Three

swap( a, mid, high - 1 );             // stash the pivot at high - 1
for( i = low, j = high - 1; ; ) {
    while( a[++i] < pivot )
        ;
    while( pivot < a[--j] )
        ;
    if( i >= j ) break;
    swap( a, i, j );
}
swap( a, i, high - 1 );
return i;

We now have sentinels for the left and right scans: a[low] <= pivot and a[high] >= pivot, so neither inner loop can run off the end.

Getting Out Using Quick Sort on very short arrays is a bad idea: the overhead becomes too large. So, when the block becomes short we should exit Quick Sort and switch to Insertion Sort. But not locally, like this:

quicksort( a, low, high ) {
    if( high - low < magic_number )
        insertionSort( a, low, high );
    else …

Getting Out Instead, just do nothing when the block is short. Then do one global cleanup with Insertion Sort:

quicksort( A, 0, n );
insertionSort( A, 0, n );

The cleanup is linear, since the number of remaining inversions is linear. Caveat: the Insertion Sort pass may hide errors in quicksort.
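A Python sketch of this hybrid: quicksort leaves any block shorter than a cutoff untouched, and a single global insertion-sort pass finishes the job. The cutoff value and names are illustrative placeholders.

```python
CUTOFF = 10   # the "magic number"; to be tuned empirically

def hybrid_sort(a):
    """Quicksort that skips blocks shorter than CUTOFF, followed by
    one global insertion-sort cleanup (linear: few inversions remain)."""
    def qsort(lo, hi):
        if hi - lo < CUTOFF:
            return                      # leave short blocks unsorted
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:                   # classic two-pointer partition
            while a[i] < pivot: i += 1
            while a[j] > pivot: j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1; j -= 1
        qsort(lo, j)
        qsort(i, hi)
    qsort(0, len(a) - 1)
    for i in range(1, len(a)):          # global cleanup pass
        x, j = a[i], i
        while j > 0 and a[j-1] > x:
            a[j] = a[j-1]; j -= 1
        a[j] = x
    return a
```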

Magic Number The best way to determine the magic number is to run real-world tests. It seems that for current architectures, some value in the range 5 to 20 will work best.

Equal Elements Note that ideally pivoting should produce three sub-blocks: left: < p, middle: == p, right: > p. Then the recursion could ignore the middle part, possibly skipping many elements.

Equal Elements Three natural strategies when a scan pointer reaches an element equal to the pivot: Both pointers stop. Only one pointer stops. Neither pointer stops. Fact: the first strategy works best overall, as it tends to keep partitions balanced.

Equal Elements There are clever implementations that partition into three sub-blocks. This is amazingly hard to get both correct and fast. Try it!
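One correct (if not maximally tuned) way to get the three sub-blocks is Dijkstra's Dutch-national-flag partition; here is a Python sketch:

```python
def quicksort3(a, lo=0, hi=None):
    """Three-way quicksort: partition into < pivot, == pivot, > pivot
    and recurse only on the outer blocks, so runs of equal elements
    cost nothing extra."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    pivot = a[lo]
    lt, i, gt = lo, lo + 1, hi          # a[lo:lt] < pivot, a[lt:i] == pivot, a[gt+1:hi+1] > pivot
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]; lt += 1; i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]; gt -= 1
        else:
            i += 1                      # equal to pivot: stays in the middle
    quicksort3(a, lo, lt - 1)
    quicksort3(a, gt + 1, hi)
    return a
```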

Application: Quick Select

Selection (Order Statistics) A classical problem: given a list, find the k-th element of the ordered list. The brute-force approach sorts the whole list first, and thus produces more information than required. Can we get away with less than O(n log n) work (in a comparison-based world)?

Easy Cases Needless to say, when k is small there are easy answers. - Scan the array and keep track of the k smallest. - Use a Selection Sort approach. But how about general k?

Selection and Partitioning

qselect( a, low, high, k ) {
    if( high <= low ) return;
    i = partition( a, low, high );
    if( i > k ) qselect( a, low, i-1, k );
    if( i < k ) qselect( a, i+1, high, k );
}

This looks like a typo: there is no case for i == k. What's really going on here? If i == k, the pivot has landed in position k, so a[k] is already the answer and nothing remains to be done.
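An iterative Python sketch of Quick Select, using a random pivot and a Lomuto-style partition (names are illustrative):

```python
import random

def qselect(a, k):
    """Return the k-th smallest element of a (0-based); expected O(n)."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]       # park a random pivot at the end
        pivot, i = a[hi], lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]; i += 1
        a[i], a[hi] = a[hi], a[i]       # pivot into its final position i
        if i == k:
            break                       # pivot landed on position k: done
        elif i > k:
            hi = i - 1                  # answer is in the left block only
        else:
            lo = i + 1                  # answer is in the right block only
    return a[k]
```

Unlike quicksort, only one side of each partition is pursued, which is why the expected work is linear rather than O(n log n).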

Quick Select What should we expect as running time? As usual, if there is a ghost in the machine, it could force quadratic behavior. But on average this algorithm is linear. Don’t get any ideas about using this to find the median in the pivoting step of Quick Sort!
