Scoreboard What else might we want to do with a data structure?

Presentation transcript:

Scoreboard
What else might we want to do with a data structure?
Table columns: Algorithm | Insertion | Deletion | Search
Table rows: unsorted vector/array, sorted vector/array, linked list, hash maps, binary search tree, AVL tree

Priority Queues Basic operations Insert Remove largest What properties must the data have? Applications Event-driven simulation: Colliding particles AI A* - Best-first search Operating systems Load balancing & scheduling Statistics Maintain largest m values Graph searching Dijkstra's algorithm Data Compression: Huffman coding

Data Compression Compression is a high-profile application .zip, .mp3, .jpg, .gif, .gz, … What property of MP3 was a significant factor in what made Napster work (why did Napster ultimately fail?) Why do we care? Secondary storage capacity doubles every year Disk space fills up quickly on every computer system More data to compress than ever before

More on Compression
What’s the difference between compression techniques? .mp3 files and .zip files? .gif and .jpg? Lossless and lossy. Is it possible to compress (lossless) every file? Why?
  Lossy methods: good for pictures, video, and audio (JPEG, MPEG, etc.)
  Lossless methods: run-length encoding, Huffman, LZW, …

Priority Queue
Compression motivates the study of the ADT priority queue, which supports two basic operations:
  insert: add an element to the priority queue
  delete: remove the minimal element from the priority queue
Implementations may allow getmin separate from delete, analogous to top/pop in stacks and front/dequeue in queues.
See PQDemo.java and UsePQ.java. The code below sorts its input; what is its complexity?
  Scanner s = new Scanner(System.in);   // input source assumed; the slide only declares s
  PriorityQueue<String> pq = new PriorityQueue<String>();
  while (s.hasNext()) pq.add(s.next());
  while (pq.size() > 0) { System.out.println(pq.remove()); }

Priority Queue implementations Implementing priority queues: average and worst case Insert average Getmin (delete) worst Unsorted vector Sorted vector Search tree Balanced tree Heap Heap has O(1) find-min (no delete) and O(n) build heap

PriorityQueue.java (Java 5) What about objects inserted into pq? If deletemin is supported, what properties must inserted objects have, e.g., insert non-comparable? Change what minimal means? Implementation uses heap If we use a Comparator for comparing entries we can make a min-heap act like a max-heap, see PQDemo Where is class Comparator declaration? How used? What's a static inner class? A non-static inner class? In Java 5 there is a Queue interface and PriorityQueue class The PriorityQueue class also uses a heap

Sorting w/o Collections.sort(…)
  public static void sort(ArrayList<String> a) {
      PriorityQueue<String> pq = new PriorityQueue<String>();
      for (int k = 0; k < a.size(); k++) pq.add(a.get(k));
      for (int k = 0; k < a.size(); k++) a.set(k, pq.remove());
  }
How does this work, regardless of pqueue implementation? What is the complexity of this method? add O(1), remove O(log n)? If add O(log n)? Heapsort uses the array itself as the priority queue rather than a separate pq object. From a big-Oh perspective there is no difference: O(n log n). Is there a difference? What’s hidden with O notation?

Priority Queue implementation
PriorityQueue uses heaps, fast and reasonably simple. Why not use an inheritance hierarchy as was used with Map? Trade-offs when using HashMap and TreeMap: time, space, and ordering properties, e.g., what does TreeMap support?
Changing the method of comparison when calculating priority? Create an object to replace, or to use in lieu of, compareTo.
  Comparable interface: compareTo() compares this to the passed object
  Comparator interface: compare() compares two passed objects
Both comparison methods return a negative value, 0, or a positive value depending on <, ==, > (often -1, 0, +1, but only the sign matters).
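The Comparable/Comparator distinction above can be sketched with java.util.PriorityQueue. This is a minimal illustration, not the course's PQDemo: the class name ComparatorSketch and the helper drain are hypothetical, and Comparator.reverseOrder() assumes Java 8 or later rather than the Java 5 API the slides refer to.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class ComparatorSketch {
    // Hypothetical helper: empties the queue and lists items in removal order.
    public static String drain(PriorityQueue<String> pq) {
        StringBuilder sb = new StringBuilder();
        while (!pq.isEmpty()) {
            sb.append(pq.remove()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Natural ordering (String.compareTo): remove() yields the smallest.
        PriorityQueue<String> minFirst = new PriorityQueue<>();
        // Reversed ordering supplied as a Comparator: remove() yields the largest.
        PriorityQueue<String> maxFirst = new PriorityQueue<>(Comparator.reverseOrder());
        for (String s : new String[] {"pear", "apple", "fig"}) {
            minFirst.add(s);
            maxFirst.add(s);
        }
        System.out.println(drain(minFirst)); // apple fig pear
        System.out.println(drain(maxFirst)); // pear fig apple
    }
}
```

The queue implementation is the same in both cases; only the definition of "minimal" changes.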

Creating Heaps
A heap is an array-based implementation of a binary tree used for implementing priority queues. It supports insert, findmin, deletemin: complexities? Using an array minimizes storage (no explicit pointers) and is faster too: children are located by index/position in the array.
A heap is a binary tree with a shape property and a heap/value property:
  shape: tree filled at all levels (except perhaps the last) and filled left-to-right (complete binary tree)
  value: each node has a value no larger than either of its children

Array-based heap
Store “node values” in an array beginning at index 1. For the node with index k:
  left child: index 2*k
  right child: index 2*k+1
Why is this conducive to maintaining heap shape? What about the heap property? Is the heap a search tree? Where is the minimal node? Where are nodes added? Deleted?
Example heap (array indices 1-9): 6 10 7 17 13 9 21 19 25
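The index arithmetic can be checked on the slide's nine-value example. The sketch below is illustrative only; the class and method names are assumptions, and slot 0 of the array is a placeholder so that the root sits at index 1 as on the slide.

```java
public class HeapIndexSketch {
    // 1-based layout: index 0 is unused.
    public static int parent(int k) { return k / 2; }
    public static int left(int k)   { return 2 * k; }
    public static int right(int k)  { return 2 * k + 1; }

    // Checks the min-heap value property for heap[1..size]:
    // no node may be smaller than its parent.
    public static boolean isMinHeap(int[] heap, int size) {
        for (int k = 2; k <= size; k++) {
            if (heap[parent(k)] > heap[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] heap = {0, 6, 10, 7, 17, 13, 9, 21, 19, 25}; // slot 0 is a placeholder
        System.out.println(isMinHeap(heap, 9));                    // true
        System.out.println(heap[left(2)] + " " + heap[right(2)]);  // 17 13
    }
}
```

Because children are found by arithmetic alone, a complete tree needs no pointers and no gaps in the array.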

Thinking about heaps
Where is the minimal element? Root, why? Where is the maximal element? Leaves, why? How many leaves are there in an N-node heap (big-Oh)? O(n), but exact? What is the complexity of find max in a minheap? Why? O(n), but ½ N? Where is the second smallest element? Why? Near root?
Example heap (array indices 1-9): 6 10 7 17 13 9 21 19 25

Adding values to heap
To maintain heap shape, a new value must be added in left-to-right order of the last level. This could violate the heap property, so move the value “up” if it is too small:
  change places with the parent if the heap property is violated
  stop when the parent is smaller
  stop when the root is reached
Pull the parent down; swapping isn’t necessary (optimization).
Example: insert 8 into the heap 6 10 7 17 13 9 21 19 25. It starts at index 10 and bubbles up past 13 and 10, giving 6 8 7 17 10 9 21 19 25 13.

Adding values, details (pseudocode)
  void add(Object elt) {
      // add elt to heap in myList (index 0 unused, root at index 1)
      myList.add(elt);
      int loc = myList.size() - 1;        // index of the newly added slot
      while (1 < loc && elt < myList[loc/2]) {
          myList[loc] = myList[loc/2];    // pull parent down
          loc = loc/2;                    // go to parent
      }
      // what’s true here?
      myList.set(loc, elt);
  }
Example: adding 8 to the heap 6 10 7 17 13 9 21 19 25 (tvector myList, indices 1-9) places it at index 10 before it moves up.
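A runnable Java version of the add pseudocode might look as follows. It is a sketch under assumptions: values are plain ints, the class name HeapAddSketch is hypothetical, and slot 0 of the list holds an unused placeholder so the root is at index 1.

```java
import java.util.ArrayList;

public class HeapAddSketch {
    // myList.get(0) is an unused placeholder so that the root sits at index 1.
    private final ArrayList<Integer> myList = new ArrayList<>();

    public HeapAddSketch() {
        myList.add(null);
    }

    public void add(int elt) {
        myList.add(elt);                  // new slot at the end of the last level
        int loc = myList.size() - 1;      // index of that slot
        while (loc > 1 && elt < myList.get(loc / 2)) {
            myList.set(loc, myList.get(loc / 2)); // pull parent down
            loc = loc / 2;                        // go to parent
        }
        // Here loc is the root, or the parent of loc is no larger than elt.
        myList.set(loc, elt);
    }

    @Override
    public String toString() {
        return myList.subList(1, myList.size()).toString();
    }

    public static void main(String[] args) {
        HeapAddSketch heap = new HeapAddSketch();
        for (int v : new int[] {6, 10, 7, 17, 13, 9, 21, 19, 25}) {
            heap.add(v);
        }
        heap.add(8);
        System.out.println(heap); // [6, 8, 7, 17, 10, 9, 21, 19, 25, 13]
    }
}
```

Each add moves along one root-to-leaf path, so the loop runs at most about log2(n) times.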

Removing minimal element
Where is the minimal element? If we remove it, what changes: shape or property? How can we maintain shape? The “last” element moves to the root. What property is violated? After moving the last element, the subtrees of the root are heaps, why? Move the root down (pull the child up); does it matter where? When can we stop “re-heaping”?
  when the value is less than both children
  when we reach a leaf
Example, removing 6 (arrays in index order):
  6 10 7 17 13 9 21 19 25   original heap
  25 10 7 17 13 9 21 19     last element moved to root
  7 10 25 17 13 9 21 19     25 exchanged with smaller child 7
  7 10 9 17 13 25 21 19     25 exchanged with smaller child 9
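The removal steps can be sketched in Java as below. The names are hypothetical, the list uses slot 0 as a placeholder (root at index 1), and the code pulls the smaller child up instead of swapping, mirroring the optimization mentioned for add.

```java
import java.util.ArrayList;
import java.util.Arrays;

public class HeapRemoveSketch {
    // Removes and returns the minimum of a non-empty 1-based min-heap.
    public static int removeMin(ArrayList<Integer> heap) {
        int min = heap.get(1);
        int last = heap.remove(heap.size() - 1); // shape: drop the last slot
        int size = heap.size() - 1;              // elements remaining
        if (size == 0) {
            return min;
        }
        int loc = 1;
        while (2 * loc <= size) {                // while loc has at least one child
            int child = 2 * loc;
            if (child < size && heap.get(child + 1) < heap.get(child)) {
                child++;                         // right child is the smaller one
            }
            if (last <= heap.get(child)) {
                break;                           // no larger than both children: stop
            }
            heap.set(loc, heap.get(child));      // pull smaller child up
            loc = child;
        }
        heap.set(loc, last);
        return min;
    }

    public static void main(String[] args) {
        ArrayList<Integer> heap =
            new ArrayList<>(Arrays.asList(0, 6, 10, 7, 17, 13, 9, 21, 19, 25));
        System.out.println(removeMin(heap));                 // 6
        System.out.println(heap.subList(1, heap.size()));    // [7, 10, 9, 17, 13, 25, 21, 19]
    }
}
```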

Text Compression
Input: String S
Output: String S′, shorter than S, such that S can be reconstructed from S′

Text Compression: Examples
“abcde” in the different formats:
  ASCII: 01100001011000100110001101100100…
  Fixed: 000001010011100
  Var:   000110100110
Encodings: ASCII uses 8 bits/character, Unicode 16 bits/character.
  Symbol  ASCII     Fixed length  Var.
  a       01100001  000
  b       01100010  001           11
  c       01100011  010           01
  d       01100100  011
  e       01100101  100           10

Huffman coding: go go gophers
Encoding uses the tree: 0 left/1 right. How many bits? 37!! Savings? Worth it?
  Char  ASCII  ASCII (binary)  3 bits  Huffman
  g     103    1100111         000     00
  o     111    1101111         001     01
  p     112    1110000         010     1100
  h     104    1101000         011     1101
  e     101    1100101         100     1110
  r     114    1110010         101     1111
  s     115    1110011         110     100
  sp.   32     0100000         111     101
Character counts in “go go gophers”: g 3, o 3, space 2, and p, h, e, r, s 1 each (13 characters in total).
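The 37-bit figure can be checked by summing codeword lengths. The sketch below is only a check, with hypothetical names; it assumes the codeword for s is 100, the only 3-bit codeword that stays distinct from the space's 101 and prefix-free with the others, which is what a 37-bit total requires.

```java
import java.util.HashMap;
import java.util.Map;

public class GophersBitsSketch {
    // Total number of bits needed to encode text with the given code table.
    public static int encodedBits(String text, Map<Character, String> codes) {
        int bits = 0;
        for (char c : text.toCharArray()) {
            bits += codes.get(c).length();
        }
        return bits;
    }

    // Huffman column of the slide's table (s assumed to be 100).
    public static Map<Character, String> slideCodes() {
        Map<Character, String> codes = new HashMap<>();
        codes.put('g', "00");
        codes.put('o', "01");
        codes.put('p', "1100");
        codes.put('h', "1101");
        codes.put('e', "1110");
        codes.put('r', "1111");
        codes.put('s', "100");
        codes.put(' ', "101");
        return codes;
    }

    public static void main(String[] args) {
        String text = "go go gophers";
        System.out.println(8 * text.length());                 // 104 bits at 8 bits/char
        System.out.println(3 * text.length());                 // 39 bits with the 3-bit code
        System.out.println(encodedBits(text, slideCodes()));   // 37 bits with Huffman
    }
}
```

The saving over the 3-bit fixed code is small here because the message is short and the table itself would also have to be stored.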

Huffman Coding
D. A. Huffman, early 1950s. Before compressing data, analyze the input stream. Represent data using variable-length codes. Variable-length codes through prefix codes:
  Each letter is assigned a codeword
  The codeword for a given letter is produced by traversing the Huffman tree
  Property: no codeword produced is the prefix of another
  Letters appearing frequently have short codewords, while those that appear rarely have longer ones
Huffman coding is an optimal per-character coding method.

Building a Huffman tree Begin with a forest of single-node trees (leaves) Each node/tree/leaf is weighted with character count Node stores two values: character and count There are n nodes in forest, n is size of alphabet? Repeat until there is only one node left: root of tree Remove two minimally weighted trees from forest Create new tree with minimal trees as children, New tree root's weight: sum of children (character ignored) Does this process terminate? How do we get minimal trees? Remove minimal trees, hummm……
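The forest-merging loop can be sketched with java.util.PriorityQueue doing the "remove two minimal trees" step. This is an illustration under assumptions, not the course's code: the class and method names are hypothetical, the input is assumed non-empty, and ties between equal weights are broken arbitrarily, so the tree shape may differ from the slides while the total encoded length stays the same.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanBuildSketch {
    static class Node implements Comparable<Node> {
        final char ch;          // meaningful only for leaves
        final int weight;       // character count, or sum of the children
        final Node left, right;

        Node(char ch, int weight, Node left, Node right) {
            this.ch = ch;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        @Override
        public int compareTo(Node other) {
            return Integer.compare(weight, other.weight);
        }
    }

    // Builds a Huffman tree for a non-empty text and returns its root.
    public static Node build(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        PriorityQueue<Node> forest = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            forest.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        while (forest.size() > 1) {
            Node a = forest.remove();   // two minimally weighted trees
            Node b = forest.remove();
            forest.add(new Node('\0', a.weight + b.weight, a, b)); // character ignored
        }
        return forest.remove();
    }

    // Sum over leaves of weight * depth: the number of bits in the encoded text.
    public static int weightedPathLength(Node n, int depth) {
        if (n.isLeaf()) {
            return n.weight * depth;
        }
        return weightedPathLength(n.left, depth + 1)
             + weightedPathLength(n.right, depth + 1);
    }

    public static void main(String[] args) {
        Node root = build("A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS");
        System.out.println(root.weight);                                        // 60
        System.out.println(weightedPathLength(build("go go gophers"), 0));      // 37
    }
}
```

Every pass through the loop removes two trees and adds one, so the forest shrinks by one each time and the process terminates after n - 1 merges.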

Building a tree
“A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS”
Initial forest (count and character): 11 space, 6 I, 5 E, 5 N, 4 S, 4 M, 3 A, 3 B, 3 O, 3 T, 2 G, 2 D, 2 L, 2 R, 2 U, 1 P, 1 F, 1 C
The following slides repeat the step of removing the two minimal trees and adding their sum until one tree is left. Weights of the new trees, in order:
  2 (F, C), 3 (P, U), 4 (L, R), 4 (G, D), 5 (O and the 2), 6 (T and the 3), 6 (A, B), 8 (S, M), 8 (the two 4s), 10 (E and the 5), 11 (N and a 6), 12 (I and a 6), 16 (the two 8s), 21 (10 and an 11), 23 (12 and an 11), 37 (21 and 16), 60 (37 and 23, the root)

Encoding
  Count occurrences of every character: O( )
  Build priority queue: O( )
  Build Huffman tree: O( )
  Create table of codes from tree: O( )
  Write Huffman tree and coded data to file: O( )
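The first four encoding steps can be sketched end to end as follows; writing to a file is left out. All names are hypothetical, the text is assumed non-empty, and the exact codewords depend on how ties are broken, so only the encoded length is stable across implementations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanEncodeSketch {
    static class Node {
        final char ch;
        final int weight;
        final Node left, right;

        Node(char ch, int weight, Node left, Node right) {
            this.ch = ch;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
    }

    // Step 1: count occurrences of every character.
    static Map<Character, Integer> countChars(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    // Steps 2 and 3: build the priority queue, then the Huffman tree.
    static Node buildTree(Map<Character, Integer> counts) {
        PriorityQueue<Node> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            pq.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        while (pq.size() > 1) {
            Node a = pq.remove();
            Node b = pq.remove();
            pq.add(new Node('\0', a.weight + b.weight, a, b));
        }
        return pq.remove();
    }

    // Step 4: each root-to-leaf path (0 = left, 1 = right) is a codeword.
    static void fillTable(Node n, String path, Map<Character, String> table) {
        if (n.left == null) {
            // A one-character alphabet has an empty path; give it a 1-bit code.
            table.put(n.ch, path.isEmpty() ? "0" : path);
            return;
        }
        fillTable(n.left, path + "0", table);
        fillTable(n.right, path + "1", table);
    }

    // Encodes a non-empty text as a string of '0' and '1' characters.
    public static String encode(String text) {
        Map<Character, String> table = new HashMap<>();
        fillTable(buildTree(countChars(text)), "", table);
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            bits.append(table.get(c));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("go go gophers").length()); // 37
    }
}
```

A String of '0' and '1' characters is used only to keep the sketch readable; a real encoder would pack the bits.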

Properties of Huffman coding
Want to minimize the weighted path length L(T) of tree T:
  $L(T) = \sum_{i \in \mathrm{Leaf}(T)} d_i w_i$
  w_i is the weight or count of each codeword i
  d_i is the depth of the leaf corresponding to codeword i
How do we calculate character (codeword) frequencies? Huffman coding creates pretty full, bushy trees? When would it produce a “bad” tree? How do we produce coded compressed data from input efficiently?

Writing code out to file How do we go from characters to encodings? Build Huffman tree Root-to-leaf path generates encoding Need way of writing bits out to file Platform dependent? Complicated to write bits and read in same ordering See BitInputStream and BitOutputStream classes Depend on each other, bit ordering preserved How do we know bits come from compressed file? Store a magic number
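The idea of writing individual bits can be sketched by packing them into bytes. This is not the course's BitOutputStream, whose interface is not shown here; BitWriterSketch and its methods are hypothetical, bits are packed most significant first, and the final partial byte is padded with zeros.

```java
import java.io.ByteArrayOutputStream;

public class BitWriterSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current = 0; // bits collected so far, most significant first
    private int count = 0;   // how many bits are held in current

    // Appends one bit (0 or 1); emits a byte once eight bits are collected.
    public void writeBit(int bit) {
        current = (current << 1) | (bit & 1);
        count++;
        if (count == 8) {
            out.write(current);
            current = 0;
            count = 0;
        }
    }

    // Appends every character of a string of '0' and '1'.
    public void writeBits(String bits) {
        for (char c : bits.toCharArray()) {
            writeBit(c - '0');
        }
    }

    // Pads the last partial byte with zero bits and returns all bytes.
    public byte[] finish() {
        while (count != 0) {
            writeBit(0);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BitWriterSketch w = new BitWriterSketch();
        w.writeBits("10101101001");          // 11 bits
        byte[] bytes = w.finish();
        System.out.println(bytes.length);    // 2
        System.out.printf("%02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF); // AD 20
    }
}
```

Because of the padding, a reader cannot tell from the bytes alone where the real data ends, so a format built this way also has to record the bit count or reserve an end marker.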

Decoding a message
Bit string: 01100000100001001101
The following slides consume the bit string one bit at a time, walking the Huffman tree built above from the root (0 left, 1 right). Each time a leaf is reached, its character is appended to the output and the walk restarts at the root. The output grows G, GO, GOO, GOOD.

Decoding Read in tree data O( ) Decode bit string with tree O( )
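The tree walk used for decoding can be sketched as below. To stay self-contained the sketch rebuilds a decoding tree from the "go go gophers" code table rather than reading tree data from a file; the names are hypothetical, s is assumed to be 100, and the input is assumed to be a valid bit string for that table.

```java
import java.util.HashMap;
import java.util.Map;

public class HuffmanDecodeSketch {
    static class Node {
        Node zero, one;   // children for bit 0 (left) and bit 1 (right)
        Character ch;     // non-null only at leaves
    }

    // Rebuilds the decoding tree from a prefix-free code table.
    public static Node treeFrom(Map<Character, String> codes) {
        Node root = new Node();
        for (Map.Entry<Character, String> e : codes.entrySet()) {
            Node cur = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (cur.zero == null) cur.zero = new Node();
                    cur = cur.zero;
                } else {
                    if (cur.one == null) cur.one = new Node();
                    cur = cur.one;
                }
            }
            cur.ch = e.getKey();
        }
        return root;
    }

    // Follows one edge per bit; at a leaf, emits its character and restarts.
    public static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.zero : cur.one;
            if (cur.ch != null) {
                out.append(cur.ch);
                cur = root;
            }
        }
        return out.toString();
    }

    // Huffman codes for "go go gophers" (s assumed to be 100).
    public static Map<Character, String> gopherCodes() {
        Map<Character, String> codes = new HashMap<>();
        codes.put('g', "00");   codes.put('o', "01");
        codes.put('p', "1100"); codes.put('h', "1101");
        codes.put('e', "1110"); codes.put('r', "1111");
        codes.put('s', "100");  codes.put(' ', "101");
        return codes;
    }

    public static void main(String[] args) {
        Map<Character, String> codes = gopherCodes();
        StringBuilder bits = new StringBuilder();
        for (char c : "go go gophers".toCharArray()) {
            bits.append(codes.get(c));
        }
        System.out.println(decode(treeFrom(codes), bits.toString())); // go go gophers
    }
}
```

No separators are needed between codewords: since no codeword is a prefix of another, reaching a leaf is unambiguous, and the work is one step per input bit.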

Huffman Tree 2 “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS” E.g. “ A SIMPLE” → “10101101001000101001110011100000”


Other methods
Adaptive Huffman coding
Lempel-Ziv algorithms
  Build the coding table on the fly while reading the document
  Coding table changes dynamically
  Protocol between encoder and decoder so that everyone is always using the right coding scheme
  Works well in practice (compress, gzip, etc.)
More complicated methods
  Burrows-Wheeler (bzip2)
  PPM statistical methods