CSE373: Data Structure & Algorithms Lecture 22: Beyond Comparison Sorting Aaron Bauer Winter 2014.

Slides:



Advertisements
Similar presentations
Sorting in Linear Time Introduction to Algorithms Sorting in Linear Time CSE 680 Prof. Roger Crawfis.
Advertisements

Analysis of Algorithms
Counting the bits Analysis of Algorithms Will it run on a larger problem? When will it fail?
Non-Comparison Based Sorting
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Dan Grossman Spring 2010.
Programming Languages Marjan Sirjani 2 2. Language Design Issues Design to Run efficiently : early languages Easy to write correctly : new languages.
Lower bound for sorting, radix sort COMP171 Fall 2005.
Quick Sort, Shell Sort, Counting Sort, Radix Sort AND Bucket Sort
CSCE 3110 Data Structures & Algorithm Analysis
Chapter 5: Elementary Data Types Properties of types and objects –Data objects, variables and constants –Data types –Declarations –Type checking –Assignment.
Quicksort CS 3358 Data Structures. Sorting II/ Slide 2 Introduction Fastest known sorting algorithm in practice * Average case: O(N log N) * Worst case:
What is an Algorithm? (And how do we analyze one?)
CSE332: Data Abstractions Lecture 12: Introduction to Sorting Tyler Robison Summer
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Tyler Robison Summer
CS 330 Programming Languages 10 / 16 / 2008 Instructor: Michael Eckmann.
FALL 2006CENG 351 Data Management and File Structures1 External Sorting.
CSCE 121, Sec 200, 507, 508 Fall 2010 Prof. Jennifer L. Welch.
CSE 326: Data Structures Sorting Ben Lerner Summer 2007.
Introduction to a Programming Environment
1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.
1 CSE 326: Data Structures: Sorting Lecture 17: Wednesday, Feb 19, 2003.
CSE 373: Data Structure & Algorithms Lecture 25: Programming Languages Nicki Dell Spring 2014.
Sorting in Linear Time Lower bound for comparison-based sorting
CSE 373 Data Structures Lecture 15
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
Introduction CSE 1310 – Introduction to Computers and Programming Vassilis Athitsos University of Texas at Arlington 1.
CSE373: Data Structure & Algorithms Lecture 20: More Sorting Lauren Milne Summer 2015.
HKOI 2006 Intermediate Training Searching and Sorting 1/4/2006.
Jessie Zhao Course page: 1.
David Luebke 1 10/13/2015 CS 332: Algorithms Linear-Time Sorting Algorithms.
CSC 41/513: Intro to Algorithms Linear-Time Sorting Algorithms.
CSE373: Data Structures & Algorithms Lecture 20: Beyond Comparison Sorting Dan Grossman Fall 2013.
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Dan Grossman Spring 2012.
1 CSE 326: Data Structures Sorting It All Out Henry Kautz Winter Quarter 2002.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Algorithms & Flowchart
Survey of Sorting Ananda Gunawardena. Naïve sorting algorithms Bubble sort: scan for flips, until all are fixed Etc...
Sorting CS 110: Data Structures and Algorithms First Semester,
CSE373: Data Structure & Algorithms Lecture 22: More Sorting Linda Shapiro Winter 2015.
Introduction to Compilers. Related Area Programming languages Machine architecture Language theory Algorithms Data structures Operating systems Software.
Introduction CPSC 388 Ellen Walker Hiram College.
CSE373: Data Structure & Algorithms Lecture 21: More Sorting and Other Classes of Algorithms Lauren Milne Summer 2015.
1 Radix Sort. 2 Classification of Sorting algorithms Sorting algorithms are often classified using different metrics:  Computational complexity: classification.
Concepts of programming languages Chapter 5 Names, Bindings, and Scopes Lec. 12 Lecturer: Dr. Emad Nabil 1-1.
CSCI1600: Embedded and Real Time Software Lecture 33: Worst Case Execution Time Steven Reiss, Fall 2015.
How to execute Program structure Variables name, keywords, binding, scope, lifetime Data types – type system – primitives, strings, arrays, hashes – pointers/references.
CSE332: Data Abstractions Lecture 12: Introduction to Sorting Dan Grossman Spring 2010.
CSE373: Data Structure & Algorithms Lecture 22: More Sorting Nicki Dell Spring 2014.
CS 330 Programming Languages 10 / 23 / 2007 Instructor: Michael Eckmann.
Program Performance 황승원 Fall 2010 CSE, POSTECH. Publishing Hwang’s Algorithm Hwang’s took only 0.1 sec for DATASET1 in her PC while Dijkstra’s took 0.2.
19 March More on Sorting CSE 2011 Winter 2011.
Chapter 1: Preliminaries Lecture # 2. Chapter 1: Preliminaries Reasons for Studying Concepts of Programming Languages Programming Domains Language Evaluation.
CS6045: Advanced Algorithms Sorting Algorithms. Sorting So Far Insertion sort: –Easy to code –Fast on small inputs (less than ~50 elements) –Fast on nearly-sorted.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CSE373: Data Structures & Algorithms
Component 1.6.
Zuse’s Plankalkül – 1945 Never implemented Problems Zuse Solved
May 26th –non-comparison sorting
CSE373: Data Structure & Algorithms Lecture 23: More Sorting and Other Classes of Algorithms Catie Baker Spring 2015.
May 17th – Comparison Sorts
CSE373: Data Structures & Algorithms
Instructor: Lilian de Greef Quarter: Summer 2017
CSCI1600: Embedded and Real Time Software
CSCE Fall 2013 Prof. Jennifer L. Welch.
Objective of This Course
CSE373: Data Structure & Algorithms Lecture 22: More Sorting
CSE373: Data Structure & Algorithms Lecture 21: Comparison Sorting
CSCE Fall 2012 Prof. Jennifer L. Welch.
Compilers Principles, Techniques, & Tools Taught by Jing Zhang
CSCI1600: Embedded and Real Time Software
Presentation transcript:

CSE373: Data Structure & Algorithms Lecture 22: Beyond Comparison Sorting Aaron Bauer Winter 2014

The Big Picture Surprising amount of juicy computer science: 2-3 lectures… Winter 20142CSE373: Data Structures & Algorithms Simple algorithms: O(n 2 ) Fancier algorithms: O(n log n) Comparison lower bound:  (n log n) Specialized algorithms: O(n) Handling huge data sets Insertion sort Selection sort Shell sort … Heap sort Merge sort Quick sort (avg) … Bucket sort Radix sort External sorting How??? Change the model – assume more than “compare(a,b)”

BucketSort (a.k.a. BinSort) If all values to be sorted are known to be integers between 1 and K (or any small range): –Create an array of size K –Put each element in its proper bucket (a.k.a. bin) –If data is only integers, no need to store more than a count of how times that bucket has been used Output result via linear pass through array of buckets Winter 20143CSE373: Data Structures & Algorithms count array Example: K=5 input (5,1,3,4,3,2,1,1,5,4,5) output: 1,1,1,2,3,3,4,4,5,5,5

Analyzing Bucket Sort Overall: O(n+K) –Linear in n, but also linear in K –  (n log n) lower bound does not apply because this is not a comparison sort Good when K is smaller (or not much larger) than n –We don’t spend time doing comparisons of duplicates Bad when K is much larger than n –Wasted space; wasted time during linear O(K) pass For data in addition to integer keys, use list at each bucket Winter 20144CSE373: Data Structures & Algorithms

Bucket Sort with Data Most real lists aren’t just keys; we have data Each bucket is a list (say, linked list) To add to a bucket, insert in O(1) (at beginning, or keep pointer to last element) count array Example: Movie ratings; scale 1-5;1=bad, 5=excellent Input= 5: Casablanca 3: Harry Potter movies 5: Star Wars Original Trilogy 1: Rocky V Rocky V Harry Potter CasablancaStar Wars Result: 1: Rocky V, 3: Harry Potter, 5: Casablanca, 5: Star Wars Easy to keep ‘stable’; Casablanca still before Star Wars Winter CSE373: Data Structures & Algorithms

Radix sort Radix = “the base of a number system” –Examples will use 10 because we are used to that –In implementations use larger numbers For example, for ASCII strings, might use 128 Idea: –Bucket sort on one digit at a time Number of buckets = radix Starting with least significant digit Keeping sort stable –Do one pass per digit –Invariant: After k passes (digits), the last k digits are sorted Aside: Origins go back to the 1890 U.S. census Winter 20146CSE373: Data Structures & Algorithms

Example Radix = 10 Input: Winter 20147CSE373: Data Structures & Algorithms First pass: bucket sort by ones digit Order now:

Example Winter 20148CSE373: Data Structures & Algorithms Second pass: stable bucket sort by tens digit Order now: Radix = 10 Order was:

Example Winter 20149CSE373: Data Structures & Algorithms Third pass: stable bucket sort by 100s digit Order now: Radix = Order was:

Analysis Input size: n Number of buckets = Radix: B Number of passes = “Digits”: P Work per pass is 1 bucket sort: O(B+n) Total work is O(P(B+n)) Compared to comparison sorts, sometimes a win, but often not –Example: Strings of English letters up to length 15 Run-time proportional to: 15*(52 + n) This is less than n log n only if n > 33,000 Of course, cross-over point depends on constant factors of the implementations –And radix sort can have poor locality properties Winter CSE373: Data Structures & Algorithms

Sorting massive data Need sorting algorithms that minimize disk/tape access time: –Quicksort and Heapsort both jump all over the array, leading to expensive random disk accesses –Mergesort scans linearly through arrays, leading to (relatively) efficient sequential disk access Mergesort is the basis of massive sorting Mergesort can leverage multiple disks 11 CSE373: Data Structures & Algorithms Fall 2013

External Merge Sort Sort 900 MB using 100 MB RAM –Read 100 MB of data into memory –Sort using conventional method (e.g. quicksort) –Write sorted 100MB to temp file –Repeat until all data in sorted chunks (900/100 = 9 total) Read first 10 MB of each sorted chuck, merge into remaining 10MB –writing and reading as necessary –Single merge pass instead of log n –Additional pass helpful if data much larger than memory Parallelism and better hardware can improve performance Distribution sorts (similar to bucket sort) are also used Winter CSE373: Data Structures & Algorithms

Last Slide on Sorting Simple O(n 2 ) sorts can be fastest for small n –Selection sort, Insertion sort (latter linear for mostly-sorted) –Good for “below a cut-off” to help divide-and-conquer sorts O(n log n) sorts –Heap sort, in-place but not stable nor parallelizable –Merge sort, not in place but stable and works as external sort –Quick sort, in place but not stable and O(n 2 ) in worst-case Often fastest, but depends on costs of comparisons/copies  (n log n) is worst-case and average lower-bound for sorting by comparisons Non-comparison sorts –Bucket sort good for small number of possible key values –Radix sort uses fewer buckets and more phases Best way to sort? It depends! Winter CSE373: Data Structures & Algorithms

What is a Programming Language? A set of symbols and associated tools that translate (if necessary) collections of symbols into instructions to a machine –Compiler, execution platform (e.g. Java Virtual Machine) –Designed by someone or some people Can have flaws, poor decisions, mistakes Syntax –What combinations of symbols are allowed Semantics –What those combinations mean These can be defined in different ways for different languages There are a lot of languages –Wikipedia lists 675 excluding dialects of BASIC and esoteric languages Winter CSE373: Data Structures & Algorithms

Before High-Level Languages Everything done machine code or an assembly language –Arithmetic operations (add, multiply, etc.) –Memory operations (storing, loading) –Control operations (jump, branch) Example: move 8-bit value into a register –1101 is binary code for move followed by 3-bit register id – –B0 61 –MOV AL, 61h; Load AL with 97 decimal (61 hex) Winter CSE373: Data Structures & Algorithms

A Criminally Brief History of Features First compiled high-level language: 1952 (Autocode) Math notation, subroutines, arrays: 1955 (Fortran) Recursion, higher-order functions, garbage collection: 1960 (LISP) Nested block structure, lexical scoping: 1960 (ALGOL) Object-orientated programming: 1967 (Simula) Generic programming: 1973 (ML) Winter CSE373: Data Structures & Algorithms

Language timeline C: 1973 C++: 1980 MATLAB: 1984 Objective-C: 1986 Mathematic (Wolfram): 1988 Python: 1991 Ruby: 1993 Java: 1995 Javascript: 1995 PHP: 1995 C#: 2001 Scala: 2003 Winter CSE373: Data Structures & Algorithms

What do we want from a Language? Performant Expressive Readable Portable Make dumb things difficult … Winter CSE373: Data Structures & Algorithms

Type System Collection of rules to assign types to elements of the language –Values, variables, functions, etc. The goal is to reduce bugs –Logic errors, memory errors (maybe) Governed by type theory, an incredibly deep and complex topic The type safety of a language is the extent to which its type system prevents or discourages relevant type errors –Via type checking We’ll cover the following questions: –When does the type system check? –What does the type system check? –What do we have to tell the type system? Winter CSE373: Data Structures & Algorithms

When Does It Check? Static type-checking (check at compile-time) –Based on source code (program text) –If program passes, it’s guaranteed to satisfy some type- safety properties on all possible inputs –Catches bugs early (program doesn’t have to be run) –Possibly better run-time performance Less (or no) checking to do while program runs Compiler can optimize based on type –Inherently conservative if then else –Not all useful features can be statically checked Many languages use both static and dynamic checking Winter CSE373: Data Structures & Algorithms

When Does it Check? Dynamic type-checking (check at run-time) –Performed as the program is executing –Often “tag” objects with their type information –Look up type information when performing operations –Possibly faster development time edit-compile-test-debug cycle –Fewer guarantees about program correctness Winter CSE373: Data Structures & Algorithms

What Does it Check? Nominal type system (name-based type system) –Equivalence of types based on declared type names –Objects are only subtypes if explicitly declared so –Can be statically or dynamically checked Structural type system (property-based type system) –Equivalence of types based on structure/definition –An element A is compatible with an element B if for each feature in B’s type, there’s an identical feature in A’s type Not symmetric, subtyping handled similarly Duck typing –Type-checking only based on features actually used –Only generates run-time errors Winter CSE373: Data Structures & Algorithms

How Much do we Have to Tell it? Type Inference –Automatically determining the type of an expression –Programmer can omit type annotations Instead of (in C++) std::vector ::const_iterator itr = myvec.cbegin() use (in C++11) auto itr = myvec.cbegin() –Can make programming tasks easier –Only happens at compile-time Otherwise, types must be manifest (always written out) Winter CSE373: Data Structures & Algorithms

What does it all mean? Most of these distinctions are not mutually exclusive –Languages that do static type-checking often have to do some dynamic type-checking as well –Some languages use a combination of nominal and duck typing Terminology useful shorthand for describing language characteristics The terms “strong” or “weak” typing are often applied –These lack any formal definition –Use more precise, informative descriptors instead Next lecture: –Overview of other important language attributes –Comparisons of common languages Winter CSE373: Data Structures & Algorithms