1 CSE 326: Data Structures Sorting It All Out Henry Kautz Winter Quarter 2002.


1 CSE 326: Data Structures Sorting It All Out Henry Kautz Winter Quarter 2002

2 Calendar
Today: Finish Sorting
–Read Weiss Ch 7 (skip 7.8)
Friday, Feb. 15th: Disjoint Sets & Union Find
–Read Weiss Ch 8
–Some written homework problems to be due Wednesday, Feb. 20th
Monday, Feb. 18th: President’s Day, no class
Wednesday, Feb. 20th: Graph Algorithms
–Weiss Ch 9 + additional material from lecture notes
–Several lectures
Monday, Feb. 25th: Word-counting project due
Various specialized data structures & algorithms
–Mergeable heaps, quad-trees, Huffman codes, …
Friday, March 8th: final written homework due
Friday, March 15th: Last day of class
–Final programming project (building and solving mazes) due

3 Sorting HUGE Data Sets
US Telephone Directory:
–300,000,000 records
–96 bytes per record
 –Name: 32 characters
 –Address: 54 characters
 –Telephone number: 10 characters
–About 29 gigabytes of data (300,000,000 × 96 bytes)
–Sort this on a machine with 128 MB RAM…
Other examples?

4 MergeSort Good for Something!
Basis for most external sorting routines
Can sort any number of records using a tiny amount of main memory
–in the extreme case, only need to keep 2 records in memory at any one time!

5 External MergeSort
Split input into two “tapes” (or areas of disk)
Merge tapes so that each group of 2 records is sorted
Split again
Merge tapes so that each group of 4 records is sorted
Repeat until data entirely sorted
log N passes (log base 2, since the sorted group size doubles on every pass)

6 Better External MergeSort
Suppose main memory can hold M records.
Initially read in groups of M records and sort them in memory (e.g. with QuickSort).
Number of merge passes reduced to log(N/M), because merging starts from N/M sorted groups instead of N single records
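The run-then-merge scheme above can be sketched in Python. This is only an illustration under stated assumptions: an in-memory list stands in for the on-disk input, each sorted run stands in for a region of tape or disk, and the names `external_merge_sort`, `records`, and `m` are hypothetical. The standard-library `heapq.merge` is used because it streams its inputs rather than loading them whole.

```python
import heapq


def external_merge_sort(records, m):
    """Simulate external MergeSort when main memory holds only m records.

    Assumption: `records` is a list standing in for the on-disk input, and
    each run is a list standing in for a sorted area of tape or disk.
    Returns (sorted_records, number_of_merge_passes).
    """
    # Initial pass: read m records at a time and sort each group in memory.
    runs = [sorted(records[i:i + m]) for i in range(0, len(records), m)]
    passes = 0
    # Each merge pass combines runs pairwise, halving the number of runs.
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), 2):
            pair = runs[i:i + 2]
            # heapq.merge consumes its inputs lazily, one record at a time.
            merged.append(list(heapq.merge(*pair)))
        runs = merged
        passes += 1
    return (runs[0] if runs else []), passes
```

With 1,000 records and `m = 125` there are 8 initial runs, so 3 merge passes suffice; with `m = 1` the same input needs 10 passes, matching log(N/M) rounded up.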

7 Sorting by Comparison: Summary
Sorting algorithms that only compare adjacent elements are Θ(N^2) worst case, but may be Θ(N) best case
HeapSort and MergeSort: Θ(N log N) both best and worst case
QuickSort: Θ(N^2) worst case but Θ(N log N) best and average case
Any comparison-based sorting algorithm is Ω(N log N) worst case
External sorting: MergeSort with Θ(log(N/M)) passes
but not quite the end of the story…

8 BucketSort
If all keys are 1…K
Have array of K buckets (linked lists)
Put keys into correct bucket of array
–linear time!
BucketSort is a stable sorting algorithm:
–Items in input with the same key end up in the same order as when they began
Impractical for large K…
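A minimal Python sketch of the idea, assuming integer keys in 1…K. The function name `bucket_sort` and its `key` parameter are illustrative choices, and Python lists stand in for the linked lists mentioned on the slide.

```python
def bucket_sort(items, k, key=lambda x: x):
    """Stable BucketSort for items whose key(item) is an integer in 1..k."""
    # Index 0 is unused; Python lists stand in for the slide's linked lists.
    buckets = [[] for _ in range(k + 1)]
    for item in items:
        # Appending keeps items with equal keys in input order (stability).
        buckets[key(item)].append(item)
    out = []
    for bucket in buckets:
        # Every bucket is visited, even empty ones, hence O(N + K) overall.
        out.extend(bucket)
    return out
```

For example, `bucket_sort([(2, 'a'), (1, 'b'), (2, 'c')], 2, key=lambda p: p[0])` keeps `(2, 'a')` ahead of `(2, 'c')`, which is the stability property RadixSort relies on.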

9 RadixSort
Radix = “The base of a number system” (Webster’s dictionary)
–alternate terminology: radix is number of bits needed to represent 0 to base-1; can say “base 8” or “radix 3”
Used in 1890 U.S. census by Hollerith
Idea: BucketSort on each digit, bottom up (least significant digit first).

10 The Magic of RadixSort
Input list: 126, 328, 636, 341, 416, 131, 328
BucketSort on lowest digit: 341, 131, 126, 636, 416, 328, 328
BucketSort that result on next-higher digit: 416, 126, 328, 328, 131, 636, 341
BucketSort that result on highest digit: 126, 131, 328, 328, 341, 416, 636
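The three passes above can be reproduced with a short Python sketch. It assumes non-negative integer keys with a known number of digits. The name `radix_sort` and the returned list of intermediate passes are illustrative additions, not part of the original slides.

```python
def radix_sort(keys, num_digits, base=10):
    """LSD RadixSort: one stable BucketSort per digit, lowest digit first.

    Returns (sorted_keys, passes), where passes[d] is the list after
    bucketing on digit d (d = 0 is the least significant digit).
    """
    passes = []
    for d in range(num_digits):
        buckets = [[] for _ in range(base)]
        for key in keys:
            digit = (key // base ** d) % base
            buckets[digit].append(key)  # append preserves order: stable
        # Concatenate the buckets in digit order to form the next input.
        keys = [key for bucket in buckets for key in bucket]
        passes.append(keys)
    return keys, passes
```

Calling `radix_sort([126, 328, 636, 341, 416, 131, 328], 3)` yields the same intermediate lists as the slide, ending with 126, 131, 328, 328, 341, 416, 636.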

11 Inductive Proof that RadixSort Works
Keys: K-digit numbers, base B
Claim: after the i-th BucketSort, the keys are sorted by their least significant i digits.
–Base case: i=0. 0 digits are sorted (that wasn’t hard!)
–Inductive step: Assume for i, prove for i+1. Consider two numbers X, Y. Say X_i is the i-th digit of X:
 –X_{i+1} < Y_{i+1}: the (i+1)-th BucketSort will put them in order
 –X_{i+1} > Y_{i+1}: same thing
 –X_{i+1} = Y_{i+1}: order depends on the last i digits. The induction hypothesis says they are already sorted by these digits, and they stay in that order because BucketSort is stable

12 Running time of Radixsort
N items, K digit keys in base B
How many passes?
How much work per pass?
Total time?

13 Running time of Radixsort
N items, K digit keys in base B
How many passes? K
How much work per pass? N + B
–just in case B>N, need to account for time to empty out buckets between passes
Total time? O( K(N+B) )
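To make the K(N+B) formula concrete, the following Python sketch evaluates it for fixed-width binary keys. The helper name `radix_sort_cost` and its parameters are hypothetical, and the result counts abstract bucket operations, not measured running time.

```python
import math


def radix_sort_cost(n, key_bits, digit_bits):
    """Return (K, K * (N + B)) for n keys of key_bits bits each.

    Assumption: each digit is digit_bits wide, so the base is
    B = 2 ** digit_bits and the number of passes is
    K = ceil(key_bits / digit_bits).
    """
    base = 2 ** digit_bits
    passes = math.ceil(key_bits / digit_bits)
    return passes, passes * (n + base)
```

For one million 32-bit keys, 8-bit digits give K = 4 and a cost of 4,001,024, while 16-bit digits give K = 2 and a cost of 2,131,072. A larger base means fewer passes, but each pass pays more to sweep the B buckets.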

14 RadixSorting Strings example
          5th pass  4th pass  3rd pass  2nd pass  1st pass
String 1  z         i         p         p         y
String 2  z         a         p         NULL      NULL
String 3  a         n         t         s         NULL
String 4  f         l         a         p         s
NULLs are just like fake characters
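A hedged Python sketch of the same idea: shorter strings are treated as if padded on the right with NULL, which sorts before every real character. It assumes ASCII strings with no embedded NUL characters, and the function name `radix_sort_strings` is illustrative.

```python
def radix_sort_strings(strings):
    """LSD RadixSort on strings, last character position first.

    Assumption: all characters are ASCII (codes 1..127). A position past
    the end of a string counts as NULL (code 0), a "fake character".
    """
    width = max((len(s) for s in strings), default=0)
    for pos in range(width - 1, -1, -1):
        buckets = [[] for _ in range(128)]
        for s in strings:
            code = ord(s[pos]) if pos < len(s) else 0
            buckets[code].append(s)  # stable within each bucket
        strings = [s for bucket in buckets for s in bucket]
    return strings
```

Applied to zippy, zap, ants, flaps, this takes five passes (one per character position of the longest string) and returns ants, flaps, zap, zippy.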

15 Evaluating Sorting Algorithms
What factors other than asymptotic complexity could affect performance?
Suppose two algorithms perform exactly the same number of instructions. Could one be better than the other?

16 Example Memory Hierarchy Statistics
Name                Extra CPU cycles used to access  Size
L1 (on chip) cache  0                                32 KB
L2 cache            8                                512 KB
RAM                 35                               256 MB
Hard Drive          500,000                          8 GB

17 The Memory Hierarchy Exploits Locality of Reference
Idea: small amount of fast memory
Keep frequently used data in the fast memory
LRU replacement policy
–Keep recently used data in cache
–To free space, remove Least Recently Used data
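The LRU policy can be illustrated with a small Python simulation. This is a sketch only: the class name `LRUCache` and its counters are hypothetical, and real hardware caches typically approximate LRU rather than track it exactly. It uses `collections.OrderedDict`, whose `move_to_end` and `popitem(last=False)` methods keep entries ordered from least to most recently used.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that holds `capacity` addresses and evicts the LRU one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[address] = True
```

With a capacity of 2, the access sequence 1, 2, 1, 3, 2 produces one hit (the second access to 1) and four misses, because loading 3 evicts 2 just before it is needed again.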

18 So what?
Optimizing use of cache can make programs way faster
One TA made RadixSort 2x faster, rewriting to use cache better!
Not just for sorting

19 Cache Details (simplified)
[Figure: main memory and cache; cache line size is 4 adjacent memory cells]

20 Traversing an Array
One miss for every 4 accesses in a traversal
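The one-miss-per-four-accesses figure can be checked with a small Python simulation. The function `count_misses`, the 8-line cache size, and the LRU eviction are illustrative assumptions; only the line size of 4 cells comes from the slides.

```python
from collections import OrderedDict


def count_misses(addresses, line_size=4, num_lines=8):
    """Count cache misses for a sequence of memory cell addresses.

    Assumptions: a fully associative cache of num_lines lines with LRU
    replacement; each line holds line_size adjacent cells.
    """
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // line_size  # which cache line the cell belongs to
        if line in cache:
            cache.move_to_end(line)
        else:
            misses += 1
            if len(cache) >= num_lines:
                cache.popitem(last=False)  # evict least recently used line
            cache[line] = True
    return misses
```

A sequential traversal of 64 cells misses 16 times, one miss per line. Touching every fourth cell instead misses on every access. Traversing the 64 cells twice gives 32 misses, since the array is larger than the cache and nothing survives until the second pass.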

21 Iterative MergeSort
[Figure: merge passes drawn against the cache size, with cache misses and cache hits marked]

22 Iterative MergeSort – cont’d
[Figure: later merge passes drawn against the cache size]
no temporal locality!

23 “Tiled” MergeSort – better
[Figure: merge passes drawn against the cache size]

24 “Tiled” MergeSort – cont’d
[Figure: merge passes drawn against the cache size]
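The slides show tiled MergeSort only as figures, so the following Python sketch is one common reading of the technique, not necessarily the exact variant presented in lecture. It first sorts each cache-sized tile completely, then runs the remaining bottom-up merge passes over the whole array. The names `merge`, `merge_passes`, `tiled_merge_sort`, and the `tile` parameter are hypothetical, and Python itself does not expose cache behavior; the sketch only shows the order in which memory is touched.

```python
def merge(a, lo, mid, hi):
    """Merge sorted a[lo:mid] and a[mid:hi] back into a[lo:hi]."""
    left, right = a[lo:mid], a[mid:hi]
    i = j = 0
    k = lo
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            a[k] = right[j]
            j += 1
        else:
            a[k] = left[i]  # ties take the left element: stable
            i += 1
        k += 1
    a[k:hi] = left[i:] + right[j:]  # copy whichever side has leftovers


def merge_passes(a, lo, hi, width):
    """Bottom-up merge passes over a[lo:hi], starting from sorted runs
    of length `width` aligned at lo."""
    while width < hi - lo:
        for start in range(lo, hi, 2 * width):
            mid = min(start + width, hi)
            end = min(start + 2 * width, hi)
            merge(a, start, mid, end)
        width *= 2


def tiled_merge_sort(a, tile):
    """Sort list a in place; `tile` is the assumed number of records
    that fit in cache (must be at least 1)."""
    n = len(a)
    # Phase 1: finish every merge pass inside one tile before moving on,
    # so all of that work touches data that is already in cache.
    for lo in range(0, n, tile):
        merge_passes(a, lo, min(lo + tile, n), 1)
    # Phase 2: ordinary iterative merge passes over the sorted tiles.
    merge_passes(a, 0, n, tile)
```

Plain iterative MergeSort corresponds to `merge_passes(a, 0, len(a), 1)`, which sweeps the entire array on every pass. The tiled version performs the same merges in a different order, so the early passes reuse data while it is still cached.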

25 QuickSort
Initial partition causes a lot of cache misses
As subproblems become smaller, they fit into cache
Good cache performance

26 Radix Sort – Very Naughty
On each BucketSort
–Sweep through input list – cache misses along the way (bad!)
–Append to output list – indexed by pseudo-random digit (ouch!)

27 Instruction Count

28 Cache Misses

29 Sorting Execution Time

30 Conclusions
Speed of cache, RAM, and external memory has a huge impact on sorting (and other algorithms as well)
Algorithms with same asymptotic complexity may be best for different kinds of memory
Tuning algorithm to improve cache performance can offer large improvements (iterative vs. tiled mergesort)