Priority Queues and Heapsort (9.1-9.4)
Priority queues are used for many purposes, such as job scheduling, shortest paths, file compression …
Recall the definition of a Priority Queue: operations insert(), delete_max(); also max(), change_priority(), join().
How would I sort a list, using a priority queue?
  for (i=0; i<n; i++) insert(A[i]);
  for (i=0; i<n; i++) cout << delete_max();
How would I implement a priority queue? How fast a sorting algorithm would your implementation yield? Can we do better?

Priority Queue Implementations
                    insert    max    delete   change priority   join
  sorted array        n        1       n             n            n
  unsorted array      1        n       1             1            n
  heap               lg n     lg n    lg n          lg n          n
  binomial queue     lg n     lg n    lg n          lg n         lg n
  (best)              1       lg n    lg n           1            1

Heaps
How can we build a data structure to do this? Hints:
  we want to find the largest (highest-priority) element quickly
  we want to be able to remove an element quickly
A tree of some sort?
Heap: a complete binary tree (every level full except possibly the last, which fills from the left); each element is at least as large as its children. (Note: this is not a BST!)
How to delete the maximum? How to add a number to a heap? How to build a heap out of a list of numbers?
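A minimal sketch (not from the slides) of the implicit 1-based array representation used below; isMaxHeap() is an illustrative helper:

// With 1-based indexing, the tree structure is implicit in the array positions.
inline int parent(int i)     { return i / 2; }
inline int leftChild(int i)  { return 2 * i; }
inline int rightChild(int i) { return 2 * i + 1; }

// Check the (max-)heap property for a[1..items]: every node is >= its children.
template <class Item>
bool isMaxHeap(const Item a[], int items) {
    for (int i = 2; i <= items; i++)
        if (a[parent(i)] < a[i]) return false;   // a child is larger than its parent
    return true;
}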

Insert
Implicit representation: X T O G S M N A E R A I (children of node i at positions 2i and 2i+1).
How would I do an insert()?
  add the new item to the end of the array
  repeat: if it is larger than its parent, swap

template <class Item>
void insert(Item a[], Item newItem, int& items) {   // items by reference so the caller sees the new size
    int n = ++items;                                 // place the new item in the next free slot
    a[n] = newItem;
    while (n > 1 && a[n/2] < a[n]) {                 // sift up while larger than its parent
        exch(a[n], a[n/2]);
        n /= 2;
    }
}

Inserting P (append, then sift up):
  X T O G S M N A E R I P → X T O G S P N A E R I M → X T P G S O N A E R I M
Runtime? Θ(log n)

DeleteMax
How would I delete X?
  Move the last element to the root.
  While it is smaller than one of its children, swap it with the larger child.

template <class Item> void reHeapify(Item a[], int items);   // defined below

template <class Item>
Item DeleteMax(Item a[], int& items) {   // items by reference so the caller sees the new size
    exch(a[1], a[items--]);              // move the last element to the root
    reHeapify(a, items);                 // sift it down to restore heap order
    return a[items+1];                   // the old maximum now sits just past the heap
}

template <class Item>
void reHeapify(Item a[], int items) {
    int n = 1;
    while (2*n <= items) {
        int j = 2*n;
        if (j < items && a[j] < a[j+1]) j++;   // pick the larger child
        if (a[n] >= a[j]) break;
        exch(a[n], a[j]);
        n = j;
    }
}

Deleting X:
  X T O G S M N A E R A I → I T O G S M N A E R A X → T I O G S M N A E R A X → T S O G I M N A E R A X → T S O G R M N A E I A X
Runtime? Θ(log n)

BuildHeap (top down)
Given an array, e.g. A S O R T I N G E X A M P L E, how do I make it a heap?
Top-down: for (i=2; i<=items; i++) insert(a, a[i], i-1);
Runtime: Θ(n log n). Can we do better?

BuildHeap (bottom up)
Suppose we use the reHeapify() function instead and work bottom-up:
  for (i = items/2; i >= 1; i--) sift a[i] down (reHeapify generalized to start at node i)
Example, on A S O R T I N G E X:
  A S O R T I N G E X → A S O R X I N G E T → A X O R S I N G E T → A X O R T I N G E S → X A O R T I N G E S → X T O R A I N G E S → X T O R S I N G E A
Runtime? n/4 + 2(n/8) + 3(n/16) + 4(n/32) + … = n(1/4 + 2/8 + 3/16 + 4/32 + …) = n · 1 = Θ(n)
Top-down was Θ(n log n); bottom-up is Θ(n)! Cool!
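A minimal sketch of the bottom-up construction, assuming a siftDown() helper, i.e. reHeapify() generalized to start at an arbitrary node (the names are illustrative, not the lecture's):

#include <utility>

// Restore heap order in a[1..items] by sifting a[n] down to its place.
template <class Item>
void siftDown(Item a[], int n, int items) {
    while (2 * n <= items) {
        int j = 2 * n;
        if (j < items && a[j] < a[j+1]) j++;   // pick the larger of the two children
        if (!(a[n] < a[j])) break;             // heap order already holds here
        std::swap(a[n], a[j]);
        n = j;
    }
}

// Bottom-up BuildHeap: the leaves are already heaps, so start at the last internal node.
template <class Item>
void buildHeap(Item a[], int items) {
    for (int i = items / 2; i >= 1; i--)
        siftDown(a, i, items);                 // Θ(n) total, as computed above
}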

Heapsort
BuildHeap();
for (i=1; i<=n; i++) DeleteMax();
Runtime? Θ(n log n). Almost competitive with quicksort.
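Putting the two phases together, a sketch of heapsort on a 1-based array; the local siftDown helper is the same routine as in the sketch above, and the names are illustrative:

#include <utility>

template <class Item>
void heapSort(Item a[], int items) {
    // siftDown: restore heap order below node n in a[1..m].
    auto siftDown = [&](int n, int m) {
        while (2 * n <= m) {
            int j = 2 * n;
            if (j < m && a[j] < a[j+1]) j++;   // larger child
            if (!(a[n] < a[j])) break;
            std::swap(a[n], a[j]);
            n = j;
        }
    };
    for (int i = items / 2; i >= 1; i--)       // Phase 1: BuildHeap, bottom-up, Θ(n)
        siftDown(i, items);
    while (items > 1) {                        // Phase 2: repeated DeleteMax, Θ(n log n)
        std::swap(a[1], a[items--]);           // move the current max just past the heap
        siftDown(1, items);
    }
}

// Usage (a[0] is unused because of 1-based indexing):
//   int a[] = {0, 5, 3, 8, 1, 9};
//   heapSort(a, 5);   // a[1..5] is now sorted in ascending order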

Priority Queue Operations
insert(), max(), deleteMax()
Could implement with a heap. Runtime for each operation?
  insert(), deleteMax() – O(log n)
  max() – O(1)

Example Application
Suppose you have a text, abracadabra, and want to compress it. How many bits are required? At 3 bits per letter, 33 bits (11 letters × 3 bits). Can we do better? How about variable-length codes?
In order to be able to decode the file again, we need a prefix code: no code is the prefix of another.
How do we make a prefix code that compresses the text?

Huffman Coding
Note: put the letters at the leaves of a binary tree, with left = 0 and right = 1. Voila! A prefix code.
Huffman coding: an optimal prefix code. Algorithm: use a priority queue.
  insert all letters according to frequency
  repeat: if there is only one tree left, done
  else: a = deleteMin(); b = deleteMin(); make a tree t out of a and b with weight a.weight() + b.weight(); insert(t)
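A sketch of the algorithm above in C++, using std::priority_queue as the min-queue; Node, buildHuffman, and assignCodes are illustrative names, not from the lecture:

#include <map>
#include <queue>
#include <string>
#include <vector>

struct Node {
    long weight;
    char letter;        // meaningful only at leaves
    Node *left, *right;
};

struct ByWeight {       // std::priority_queue is a max-queue; invert the comparison
    bool operator()(const Node* a, const Node* b) const { return a->weight > b->weight; }
};

Node* buildHuffman(const std::map<char, long>& freq) {
    std::priority_queue<Node*, std::vector<Node*>, ByWeight> pq;
    for (const auto& p : freq)                 // insert all letters according to frequency
        pq.push(new Node{p.second, p.first, nullptr, nullptr});
    while (pq.size() > 1) {                    // until only one tree is left
        Node* a = pq.top(); pq.pop();          // the two lightest trees
        Node* b = pq.top(); pq.pop();
        pq.push(new Node{a->weight + b->weight, 0, a, b});
    }
    return pq.top();
}

// Walk the tree: left edge = 0, right edge = 1 (assumes at least two distinct letters).
void assignCodes(const Node* t, const std::string& prefix, std::map<char, std::string>& code) {
    if (!t->left && !t->right) { code[t->letter] = prefix; return; }
    assignCodes(t->left,  prefix + "0", code);
    assignCodes(t->right, prefix + "1", code);
}

// Example: buildHuffman({{'a',5},{'b',2},{'c',1},{'d',1},{'r',2}}) yields a tree whose
// code lengths match the ones on the next slide (the 0/1 labels may be swapped at nodes).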

Huffman coding example
abracadabra. Frequencies: a: 5, b: 2, c: 1, d: 1, r: 2
Huffman code: a: 0, b: 100, c: 1010, d: 1011, r: 11
bits: 5·1 + 2·3 + 1·4 + 1·4 + 2·2 = 23
Finite automaton to decode – Θ(n)
Time to encode?
  compute frequencies – O(n)
  build heap – O(1), assuming the alphabet has constant size
  encode – O(n)

Huffman coding summary
Huffman coding is very frequently used. (You use it every time you watch HDTV or listen to an mp3, for example.)
Text files often compress to 60% of their original size.
In real life, Huffman coding is usually used in conjunction with a modeling algorithm:
  jpeg compression: DCT, quantization, and Huffman coding
  text compression: dictionary + Huffman coding

Finite Automata and Regular Expressions
How can I decode some Huffman-encoded text efficiently? (Hand-design a DFA to recognize the code.)
Or: how can I find all instances of aardvark, aaardvark, aaaardvark, etc., or zyzzyva, zyzzzyva, zyzzzzyva, etc., in Microsoft Word? In Unix? (grep) All words with 2 or more As or Zs?
Important topic: regular expressions and finite automata.
  theoretician: regular expressions are grammars that define regular languages
  programmer: compact patterns for matching and replacing

DFA for abracadabra
Huffman code: A = 0, B = 100, C = 1010, D = 1011, R = 11
DFA (one state per internal node of the code tree):
  state  read  output  new state
    0      0     A         0
    0      1     –         1
    1      0     –         2
    1      1     R         0
    2      0     B         0
    2      1     –         3
    3      0     C         0
    3      1     D         0
(Actually, this looks just like the original tree, doesn't it?)
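A minimal sketch of table-driven decoding with this DFA, assuming the bits arrive as a string of '0'/'1' characters (the function name and representation are illustrative):

#include <iostream>
#include <string>

// Transition table for the code a=0, b=100, c=1010, d=1011, r=11.
// next[state][bit] is the next state; out[state][bit] is the decoded letter,
// or 0 when the transition has not reached a leaf yet.
std::string decode(const std::string& bits) {
    static const int  next[4][2] = {{0, 1}, {2, 0}, {0, 3}, {0, 0}};
    static const char out [4][2] = {{'a', 0}, {0, 'r'}, {'b', 0}, {'c', 'd'}};
    std::string text;
    int state = 0;
    for (char c : bits) {
        int bit = c - '0';
        if (out[state][bit]) text += out[state][bit];  // reached a leaf: emit a letter
        state = next[state][bit];                      // leaves restart at the root (state 0)
    }
    return text;
}

int main() {
    // "abracadabra" encoded with the code above (23 bits).
    std::cout << decode("01001101010010110100110") << "\n";
}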

Regular Expressions
A regular expression is one of:
  a literal character
  a (regular expression) – in parentheses
  a concatenation of two R.E.s
  the alternation ("or") of two R.E.s, denoted +
  the closure of an R.E., denoted * (i.e., 0 or more occurrences)
Regular expressions define regular languages. Examples:
  abracadabra
  abra(cadabra)* = {abra, abracadabra, abracadabracadabra, …}
  (a*b + ac)d
  (a(a+b)b*)*

RE Variants
Different programming languages, text editors, etc. use different syntaxes. Perl regexps:
  .              any character
  [1-4a-c^&*]    any of 1, 2, 3, 4, a, b, c, ^, &, *
  [^abc]         any character other than a, b, or c
  |              alternation (or)
  *              0 or more occurrences (maximal)
  +              1 or more occurrences
  *?, +?         same, but use minimal matching
  ?              0 or 1 occurrences
  $1, $2         back references
  \s             any white-space character
  \w             any "word character" ([a-zA-Z0-9_])
  \d             any digit ([0-9])

RE Examples (perl syntax)
  s/s/th/g
  s/\s+/ /g
  s/aa+/aa/
  s/(\w+)\s+(\w+)/$2 $1/
  s/(\w+?)\s+(\w+?)/$2 $1/
  m/ ]*)class\s*=\s*(“’)(.*?)\2([^>]*)>/
  `date` =~ m/(\d+):(\d+):(\d+)/; $hours = $1; $minutes = $2; $seconds = $3;
  if (`ls` =~ m/index.htm/) …
  if (`cat myname.txt` =~ m/Joe Smith/) …
  stat program

Finite Automata
Finite automata: machines that recognize regular languages.
Deterministic Finite Automaton (DFA):
  a set of states, including a start state and one or more accepting states
  a transition function: given the current state and input letter, what's the new state?
Non-deterministic Finite Automaton (NDFA): like a DFA, but
  there may be more than one transition out of a state on the same letter (pick the right one non-deterministically, i.e., via lucky guess!)
  epsilon-transitions, i.e., optional transitions on no input letter

RE -> NDFA
Given a Regular Expression, how can I build an NDFA? Work bottom-up, with one small machine for each case: a single letter, concatenation, alternation (or), and closure.

RE -> NDFA Example
Construct an NDFA for the RE (A*B + AC)D, building up from subexpressions:
  A → A* → A*B → A*B + AC → (A*B + AC)D

NDFA -> DFA
Keep track of the set of states you are in. On each new input letter, compute the new set of states you could be in.
The set of states for the DFA is the power set of the NDFA states, i.e. up to 2^n states, where there were n in the NDFA.
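A minimal sketch of the underlying idea, simulating the NDFA directly by tracking the set of reachable states; epsilon-transitions are omitted for brevity, and all names are illustrative:

#include <map>
#include <set>
#include <string>
#include <utility>

// An NDFA given by its transition relation: (state, letter) -> set of states.
using State = int;
using NFA   = std::map<std::pair<State, char>, std::set<State>>;

bool accepts(const NFA& delta, State start, const std::set<State>& accepting,
             const std::string& input) {
    std::set<State> current = {start};            // the set of states we could be in
    for (char c : input) {
        std::set<State> next;
        for (State s : current) {                 // union of all transitions on c
            auto it = delta.find({s, c});
            if (it != delta.end()) next.insert(it->second.begin(), it->second.end());
        }
        current = std::move(next);                // at most 2^n distinct sets can ever arise
    }
    for (State s : current)                       // accept if any reachable state accepts
        if (accepting.count(s)) return true;
    return false;
}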

Recognizing Regular Languages
Suppose your language is given by a DFA. How do we recognize it?
  Build a table: one row for every (state, input letter) pair, giving the resulting state.
  For each letter of the input string, compute the new state.
  When done, check whether the last state is an accepting state.
Runtime? O(n), where n is the number of input letters.
Another approach: use a C program to simulate the NDFA with backtracking. Less space, more time. (perl, egrep vs. fgrep)
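A sketch of the table-driven approach for a toy DFA over {a, b} that accepts strings containing the substring "ab"; the table and names are illustrative, not from the slides:

#include <iostream>
#include <string>

// One row per state, one column per input letter ('a' = column 0, 'b' = column 1).
// State 2 is accepting: we have seen "ab" somewhere in the input.
bool recognize(const std::string& input) {
    static const int table[3][2] = {
        {1, 0},   // state 0: have not seen a useful 'a' yet
        {1, 2},   // state 1: just saw 'a'
        {2, 2},   // state 2: already saw "ab" (accepting, absorbing)
    };
    int state = 0;
    for (char c : input) {
        if (c != 'a' && c != 'b') return false;   // letter outside the alphabet
        state = table[state][c == 'b'];            // one table lookup per letter: O(n)
    }
    return state == 2;
}

int main() {
    std::cout << recognize("bbabb") << " " << recognize("bbba") << "\n";  // prints "1 0"
}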

Examples
Unix: grep
Perl:
  $input =~ s/t[wo]?o/2/;
  $input =~ s| ]*>\s*||gs;
  $input =~ s|\s*mso-[^>"]*"|"|gis;
  $input =~ s/([^ ]+) +([^ ]+)/$2 $1/;
  $input =~ m/^[0-9]+\.?[0-9]*|\.[0-9]+$/;
  ($word1,$word2,$rest) = ($foo =~ m/^ *([^ ]+) +([^ ]+) +(.*)$/);
  $input =~ s| ]*>\s* ]*>\s* | |gis;