CSE 326 Sorting David Kaplan Dept of Computer Science & Engineering Autumn 2001.

Slides:



Advertisements
Similar presentations
Sorting in Linear Time Introduction to Algorithms Sorting in Linear Time CSE 680 Prof. Roger Crawfis.
Advertisements

Introduction to Algorithms Quicksort
Non-Comparison Based Sorting
Linear Sorts Counting sort Bucket sort Radix sort.
CSE 3101: Introduction to the Design and Analysis of Algorithms
MS 101: Algorithms Instructor Neelima Gupta
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Dan Grossman Spring 2010.
Quick Sort, Shell Sort, Counting Sort, Radix Sort AND Bucket Sort
CSCE 3110 Data Structures & Algorithm Analysis
Using Divide and Conquer for Sorting
CSE 373: Data Structures and Algorithms
25 May Quick Sort (11.2) CSE 2011 Winter 2011.
1 Sorting Problem: Given a sequence of elements, find a permutation such that the resulting sequence is sorted in some order. We have already seen: –Insertion.
Comp 122, Spring 2004 Lower Bounds & Sorting in Linear Time.
1 CSE 326: Sorting Henry Kautz Autumn Quarter 2002.
Divide and Conquer Sorting
CSE 326: Data Structures Sorting Ben Lerner Summer 2007.
Analysis of Algorithms CS 477/677
CSC 2300 Data Structures & Algorithms March 20, 2007 Chapter 7. Sorting.
1 CSE 326: Data Structures: Sorting Lecture 16: Friday, Feb 14, 2003.
1 CSE 326: Data Structures: Sorting Lecture 17: Wednesday, Feb 19, 2003.
CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University1 Sorting - 3 CS 202 – Fundamental Structures of Computer Science II.
CS 146: Data Structures and Algorithms July 14 Class Meeting Department of Computer Science San Jose State University Summer 2015 Instructor: Ron Mak
David Luebke 1 8/17/2015 CS 332: Algorithms Linear-Time Sorting Continued Medians and Order Statistics.
Sorting in Linear Time Lower bound for comparison-based sorting
CSE 373 Data Structures Lecture 15
1 Time Analysis Analyzing an algorithm = estimating the resources it requires. Time How long will it take to execute? Impossible to find exact value Depends.
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
Merge Sort. What Is Sorting? To arrange a collection of items in some specified order. Numerical order Lexicographical order Input: sequence of numbers.
David Luebke 1 10/13/2015 CS 332: Algorithms Linear-Time Sorting Algorithms.
CSC 41/513: Intro to Algorithms Linear-Time Sorting Algorithms.
Introduction to Algorithms Jiafen Liu Sept
The Selection Problem. 2 Median and Order Statistics In this section, we will study algorithms for finding the i th smallest element in a set of n elements.
Sorting Fun1 Chapter 4: Sorting     29  9.
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Dan Grossman Spring 2012.
Analysis of Algorithms CS 477/677
1 Joe Meehean.  Problem arrange comparable items in list into sorted order  Most sorting algorithms involve comparing item values  We assume items.
1 CSE 326: Data Structures Sorting It All Out Henry Kautz Winter Quarter 2002.
Bucket & Radix Sorts. Efficient Sorts QuickSort : O(nlogn) – O(n 2 ) MergeSort : O(nlogn) Coincidence?
CSE 326: Data Structures Lecture 24 Spring Quarter 2001 Sorting, Part B David Kaplan
Mudasser Naseer 1 11/5/2015 CSC 201: Design and Analysis of Algorithms Lecture # 8 Some Examples of Recursion Linear-Time Sorting Algorithms.
1 Sorting Algorithms Sections 7.1 to Comparison-Based Sorting Input – 2,3,1,15,11,23,1 Output – 1,1,2,3,11,15,23 Class ‘Animals’ – Sort Objects.
CSE373: Data Structure & Algorithms Lecture 22: More Sorting Linda Shapiro Winter 2015.
CSE 326: Data Structures Lecture #16 Sorting Things Out Bart Niswonger Summer Quarter 2001.
1 CSE 326: Data Structures A Sort of Detour Henry Kautz Winter Quarter 2002.
Review 1 Selection Sort Selection Sort Algorithm Time Complexity Best case Average case Worst case Examples.
CS 146: Data Structures and Algorithms July 14 Class Meeting Department of Computer Science San Jose State University Summer 2015 Instructor: Ron Mak
COSC 3101A - Design and Analysis of Algorithms 6 Lower Bounds for Sorting Counting / Radix / Bucket Sort Many of these slides are taken from Monica Nicolescu,
Data Structures Haim Kaplan & Uri Zwick December 2013 Sorting 1.
Sorting Fundamental Data Structures and Algorithms Aleks Nanevski February 17, 2004.
1 CSE 326: Data Structures Sorting in (kind of) linear time Zasha Weinberg in lieu of Steve Wolfman Winter Quarter 2000.
Sorting divide and conquer. Divide-and-conquer  a recursive design technique  solve small problem directly  divide large problem into two subproblems,
CSE 326: Data Structures Lecture 23 Spring Quarter 2001 Sorting, Part 1 David Kaplan
Chapter 4, Part II Sorting Algorithms. 2 Heap Details A heap is a tree structure where for each subtree the value stored at the root is larger than all.
SORTING AND ASYMPTOTIC COMPLEXITY Lecture 13 CS2110 – Fall 2009.
Intro. to Data Structures Chapter 7 Sorting Veera Muangsin, Dept. of Computer Engineering, Chulalongkorn University 1 Chapter 7 Sorting Sort is.
Sorting & Lower Bounds Jeff Edmonds York University COSC 3101 Lecture 5.
CS6045: Advanced Algorithms Sorting Algorithms. Sorting So Far Insertion sort: –Easy to code –Fast on small inputs (less than ~50 elements) –Fast on nearly-sorted.
INTRO2CS Tirgul 8 1. Searching and Sorting  Tips for debugging  Binary search  Sorting algorithms:  Bogo sort  Bubble sort  Quick sort and maybe.
1 CSE 326: Data Structures Sorting by Comparison Zasha Weinberg in lieu of Steve Wolfman Winter Quarter 2000.
Lower Bounds & Sorting in Linear Time
Sorting.
Quick Sort (11.2) CSE 2011 Winter November 2018.
Lower Bounds & Sorting in Linear Time
Linear-Time Sorting Algorithms
CSE 326: Data Structures Sorting
CSE 326: Data Structures: Sorting
The Selection Problem.
Presentation transcript:

CSE 326 Sorting David Kaplan Dept of Computer Science & Engineering Autumn 2001

SortingCSE 326 Autumn Outline Sorting: The Problem Space Sorting by Comparison  Lower bound for comparison sorts  Insertion Sort  Heap Sort  Merge Sort  Quick Sort External Sorting Comparison of Sorting by Comparison

SortingCSE 326 Autumn Sorting: The Problem Space General problem Given a set of N orderable items, put them in order Without (significant) loss of generality, assume:  Items are integers  Ordering is  (Most sorting problems map to the above in linear time.)

SortingCSE 326 Autumn Lower Bound for Sorting by Comparison Sorting by Comparison  Only information available to us is the set of N items to be sorted  Only operation available to us is pairwise comparison between 2 items What is the best running time we can possibly achieve?

SortingCSE 326 Autumn Decision Tree Analysis of Sorting by Comparison

SortingCSE 326 Autumn Max depth of decision tree  How many permutations are there of N numbers?  How many leaves does the tree have?  What’s the shallowest tree with a given number of leaves?  What is therefore the worst running time (number of comparisons) by the best possible sorting algorithm?

SortingCSE 326 Autumn Lower Bound for Comparison Sort Stirling’s approximation:  Any comparison sort of n items must have a decision tree with n! leaves, of height log(n!)  n log(n) is a lower bound for log(n!) Thus, no comparison sort can be faster than O(n log(n))

SortingCSE 326 Autumn Insertion Sort Basic idea After k th pass, ensure that first k+1 elements are sorted On k th pass, swap (k+1) th element to left as necessary Start After Pass 1 After Pass After Pass What if array is initially sorted? Reverse sorted?

SortingCSE 326 Autumn Why Insertion Sort is Slow Inversion: a pair (i,j) such that i Array[j] Array of size N can have  (N 2 ) inversions  average number of inversions in a random set of elements is N(N-1)/4 Insertion Sort only swaps adjacent elements  only removes 1 inversion!

SortingCSE 326 Autumn HeapSort Sorting via Priority Queue (Heap) Shove items into a priority queue, take them out smallest to largest Worst Case: Best Case:

SortingCSE 326 Autumn MergeSort Merging Cars by key [Aggressiveness of driver]. Most aggressive goes first. MergeSort (Table [1..n]) Split Table in half Recursively sort each half Merge two sorted halves together

SortingCSE 326 Autumn MergeSort Analysis Running Time  Worst case?  Best case?  Average case? Other considerations besides running time?

SortingCSE 326 Autumn QuickSort Basic idea:  Pick a “pivot”  Divide into less-than & greater-than pivot  Sort each side recursively Picture from PhotoDisc.com

SortingCSE 326 Autumn QuickSort Partition Pick pivot: Partition with cursors <> <> 2 goes to less-than

SortingCSE 326 Autumn QuickSort Partition (cont’d) <> 6, 8 swap less/greater-than ,5 less-than 9 greater-than Partition done. Recursively sort each side.

SortingCSE 326 Autumn Analyzing QuickSort  Picking pivot: constant time  Partitioning: linear time  Recursion: time for sorting left partition (say of size i) + time for right (size N-i-1) T(1) = b T(N) = T(i) + T(N-i-1) + cN where i is the number of elements smaller than the pivot

SortingCSE 326 Autumn QuickSort Worst case Pivot is always smallest element. T(N) = T(i) + T(N-i-1) + cN T(N)= T(N-1) + cN = T(N-2) + c(N-1) + cN = T(N-k) + = O(N 2 )

SortingCSE 326 Autumn Optimizing QuickSort Choosing the Pivot  Randomly choose pivot Good theoretically and practically, but call to random number generator can be expensive  Pick pivot cleverly “Median-of-3” rule takes Median(first, middle, last). Works well in practice. Cutoff  Use simpler sorting technique below a certain problem size (Weiss suggests using insertion sort, with a cutoff limit of 5-20)

SortingCSE 326 Autumn QuickSort Best Case Pivot is always median element. T(N) = T(i) + T(N-i-1) + cN T(N)= 2T(N/2 - 1) + cN

SortingCSE 326 Autumn QuickSort Average Case Assume all size partitions equally likely, with probability 1/N details: Weiss pg

SortingCSE 326 Autumn External Sorting When you just ain’t got enough RAM …  e.g. Sort 10 billion numbers with 1 MB of RAM.  Databases need to be very good at this MergeSort Good for Something!  Basis for most external sorting routines  Can sort any number of records using a tiny amount of main memory in extreme case, keep only 2 records in memory at once!

SortingCSE 326 Autumn External MergeSort  Split input into two tapes  Each group of 1 records is sorted by definition, so merge groups of 1 to groups of 2, again split between two tapes  Merge groups of 2 into groups of 4  Repeat until data entirely sorted log N passes

SortingCSE 326 Autumn Better External MergeSort Suppose main memory can hold M records:  Read in groups of M records and sort them (e.g. with QuickSort)  Number of passes reduced to log(N/M) k-way mergesort reduces number of passes to log k (N/M)  Requires 2k output devices (e.g. mag tapes) But wait, there’s more … Polyphase merge does a k-way mergesort using only k+1 output devices (plus k th -order Fibonacci numbers!)

SortingCSE 326 Autumn Sorting by Comparison Summary Sorting algorithms that only compare adjacent elements are  (N 2 ) worst case – but may be  (N) best case HeapSort -  (N log N) both best and worst case  Suffers from two test-ops per data move MergeSort -  (N log N) running time  Suffers from extra-memory problem QuickSort -  (N 2 ) worst case,  (N log N) best and average case  In practice, median-of-3 almost always gets us  (N log N)  Big win comes from {sorting in place, one test-op, few swaps}!  Any comparison-based sorting algorithm is  (N log N)  External sorting: MergeSort with  (log N/M) passes

SortingCSE 326 Autumn Sorting: The Problem Space Revisited  General problem Given a set of N orderable items, put them in order  Without (significant) loss of generality, assume:  Items are integers  Ordering is  Most sorting problems can be mapped to the above in linear time But what if we have more information available to us?

SortingCSE 326 Autumn Sorting in Linear Time Sorting by Comparison  Only information available to us is the set of N items to be sorted  Only operation available to us is pairwise comparison between 2 items  Best running time is O(N log(N)), given these constraints What if we relax the constraints?  Know something in advance about item values

SortingCSE 326 Autumn BinSort (a.k.a. BucketSort)  If keys are known to be in {1, …, K}  Have array of size K  Put items into correct bin (cell) of array, based on key

SortingCSE 326 Autumn BinSort example K=5 list=(5,1,3,4,3,2,1,1,5,4,5) Bins in array key = 11,1,1 key = 22 key = 33,3 key = 44,4 key = 55,5,5 Sorted list: 1,1,1,2,3,3,4,4,5,5,5

SortingCSE 326 Autumn BinSort Pseudocode procedure BinSort (List L,K) LinkedList bins[1..K] // Each element of array bins is linked list. // Could also BinSort with array of arrays. For Each number x in L bins[x].Append(x) End For For i = 1..K For Each number x in bins[i] Print x End For

SortingCSE 326 Autumn BinSort Running Time K is a constant  BinSort is linear time K is variable  Not simply linear time K is large (e.g )  Impractical

SortingCSE 326 Autumn BinSort is “stable” Stable Sorting algorithm  Items in input with the same key end up in the same order as when they began.  Important if keys have associated values  Critical for RadixSort

SortingCSE 326 Autumn Mr. Radix Herman Hollerith invented and developed a punch-card tabulation machine system that revolutionized statistical computation. Born in Buffalo, New York, the son of German immigrants, Hollerith enrolled in the City College of New York at age 15 and graduated from the Columbia School of Mines with distinction at the age of 19. His first job was with the U.S. Census effort of Hollerith successively taught mechanical engineering at the Massachusetts Institute of Technology and worked for the U.S. Patent Office. The young engineer developed an electrically actuated brake system for the railroads, but the Westinghouse steam-actuated brake prevailed. Hollerith began working on the tabulating system during his days at MIT, filing for the first patent in He developed a hand-fed 'press' that sensed the holes in punched cards; a wire would pass through the holes into a cup of mercury beneath the card closing the electrical circuit. This process triggered mechanical counters and sorter bins and tabulated the appropriate data. Hollerith's system-including punch, tabulator, and sorter-allowed the official 1890 population count to be tallied in six months, and in another two years all the census data was completed and defined; the cost was $5 million below the forecasts and saved more than two years' time. His later machines mechanized the card-feeding process, added numbers, and sorted cards, in addition to merely counting data. In 1896 Hollerith founded the Tabulating Machine Company, forerunner of Computer Tabulating Recording Company (CTR). He served as a consulting engineer with CTR until retiring in In 1924 CTR changed its name to IBM- the International Business Machines Corporation. Herman Hollerith Born February 29, Died November 17, 1929 Art of Compiling Statistics; Apparatus for Compiling Statistics Patent Nos. 395,781; 395,782; 395,783 Inducted 1990 Source: National Institute of Standards and Technology (NIST) Virtual Museum -

SortingCSE 326 Autumn RadixSort  Radix = “The base of a number system” (Webster’s dictionary)  alternate terminology: radix is number of bits needed to represent 0 to base-1; can say “base 8” or “radix 3”  Idea: BinSort on each digit, bottom up.

SortingCSE 326 Autumn RadixSort – magic! It works.  Input list: 126, 328, 636, 341, 416, 131, 328  BinSort on lower digit: 341, 131, 126, 636, 416, 328, 328  BinSort result on next-higher digit: 416, 126, 328, 328, 131, 636, 341  BinSort that result on highest digit: 126, 131, 328, 328, 341, 416, 636

SortingCSE 326 Autumn RadixSort – how it works

SortingCSE 326 Autumn Not magic. It provably works. Keys  K-digit numbers  base B Claim: after i th BinSort, least significant i digits are sorted  e.g. B=10, i=3, keys are 1776 and comes before 1776 for last 3 digits.

SortingCSE 326 Autumn RadixSort Proof by Induction Base case:  i=0. 0 digits are sorted (that wasn’t hard!) Induction step:  assume for i, prove for i+1.  consider two numbers: X, Y. Say X i is i th digit of X (from the right)  X i+1 < Y i+1 then i+1 th BinSort will put them in order  X i+1 > Y i+1, same thing  X i+1 = Y i+1, order depends on last i digits. Induction hypothesis says already sorted for these digits. (Careful about ensuring that your BinSort preserves order aka “stable”…)

SortingCSE 326 Autumn What data types can you RadixSort?  Any type T that can be BinSorted  Any type T that can be broken into parts A and B, such that:  You can reconstruct T from A and B  A can be RadixSorted  B can be RadixSorted  A is always more significant than B, in ordering

SortingCSE 326 Autumn RadixSort Examples  1-digit numbers can be BinSorted  2 to 5-digit numbers can be BinSorted without using too much memory  6-digit numbers, broken up into A=first 3 digits, B=last 3 digits.  A and B can reconstruct original 6-digits  A and B each RadixSortable as above  A more significant than B

SortingCSE 326 Autumn RadixSorting Strings  1 Character can be BinSorted  Break strings into characters  Need to know length of biggest string (or calculate this on the fly).  Null-pad shorter strings  Running time:  N is number of strings  L is length of longest string  RadixSort takes O(N*L)

SortingCSE 326 Autumn Evaluating Sorting Algorithms  What factors other than asymptotic complexity could affect performance?  Suppose two algorithms perform exactly the same number of instructions. Could one be better than the other?

SortingCSE 326 Autumn Memory Hierarchy Stats (made up 1 ) CPU cyclesSize L1 (on chip) cache032 KB L2 cache8512 KB RAM35256 MB Hard Drive500,0008 GB 1 But plausible : - )

SortingCSE 326 Autumn Memory Hierarchy Exploits Locality of Reference Idea: small amount of fast memory Keep frequently used data in the fast memory LRU replacement policy  Keep recently used data in cache  To free space, remove Least Recently Used data

SortingCSE 326 Autumn Cache Details (simplified) Main Memory Cache Cache line size (4 adjacent memory cells)

SortingCSE 326 Autumn Traversing an Array One miss for every 4 accesses in a traversal cache misses cache hits

SortingCSE 326 Autumn Iterative MergeSort Poor Cache Performance Cache Size no temporal locality!

SortingCSE 326 Autumn Recursive MergeSort Better Cache Performance Cache Size

SortingCSE 326 Autumn Quicksort Pretty Good Cache Performance  Initial partition causes a lot of cache misses  As subproblems become smaller, they fit into cache  Generally good cache performance

SortingCSE 326 Autumn Radix Sort Lousy Cache Performance On each BinSort  Sweep through input list – cache misses along the way (bad!)  Append to output list – indexed by pseudorandom digit (ouch!)  Truly evil for large Radix (e.g ), which reduces # of passes

SortingCSE 326 Autumn Sorting Summary  Linear-time sorting is possible if we know more about the input (e.g. that all input values fall within a known range)  BinSort  Moderately useful O(n) categorizer  Most useful as building block for RadixSort  RadixSort  O(n), but input must conform to fairly narrow profile  Poor cache performance  Memory Hierarchy and Cache  Cache locality can have major impact on performance of sorting, and other, algorithms