CompSci 100e 8.1 Plan for the Course! l Understand Huffman Coding  Data compression  Priority Queues  Bits and Bytes  Greedy Algorithms l Algorithms + Data Structures = Programs  What does this mean and who said it? l Graphs & the Oracle of Bacon

CompSci 100e 8.2 Scoreboard l What else might we want to do with a data structure? l Compare Insertion, Deletion, and Search for: unsorted ArrayList, sorted ArrayList, linked list, hash table/map, binary search tree

CompSci 100e 8.3 Text Compression l Input: String S l Output: String S'  Shorter than S  S can be reconstructed from S'

CompSci 100e 8.4 Text Compression: Examples l Example: encode the symbols a, b, c, d, e and the string “abcde” in ASCII, in a fixed-length code, and in a variable-length code l Encodings  ASCII: 8 bits/character  Unicode: 16 or 32 bits/character

CompSci 100e 8.5 Huffman coding: go go gophers l Encoding uses tree:  0 left/1 right  How many bits? 37!!  Savings? Worth it? l [Slide shows a table of ASCII, 3-bit fixed-length, and Huffman codes for the characters g, o, p, h, e, r, s, and space, together with the Huffman tree built from their counts]

CompSci 100e 8.6 Huffman Coding l D. A. Huffman in the early 1950s l Before compressing data, analyze the input stream l Represent data using variable-length codes l Variable-length codes via prefix codes  Each letter is assigned a codeword  The codeword for a given letter is produced by traversing the Huffman tree  Property: no codeword produced is the prefix of another  Letters appearing frequently have short codewords, while those that appear rarely have longer ones l Huffman coding is an optimal per-character coding method

CompSci 100e 8.7 Building a Huffman tree l Begin with a forest of single-node trees (leaves)  Each node/tree/leaf is weighted with character count  Node stores two values: character and count  There are n nodes in forest, n is size of alphabet? l Repeat until there is only one node left: root of tree  Remove two minimally weighted trees from forest  Create new tree with minimal trees as children, New tree root's weight: sum of children (character ignored) l Does this process terminate? How do we get minimal trees?  Remove minimal trees, need to order based on what?

CompSci 100e 8.8 Priority Queue l Stacks: Last-in First-out  java.util.Stack, java.util.Deque l Queues: First-in First-out  java.util.LinkedList, java.util.Deque l Priority Queues: Highest-priority first-out  java.util.PriorityQueue  Supports two basic operations: insert an element into the priority queue, and delete the minimal element from the priority queue  Code below sorts. Complexity?

    public static void sort(ArrayList<String> a) {
        PriorityQueue<String> pq = new PriorityQueue<String>();
        pq.addAll(a);
        for (int k = 0; k < a.size(); k++) {
            a.set(k, pq.remove());   // elements come out in increasing order
        }
    }

CompSci 100e 8.9 Priority Queues l Basic operations  Insert  Remove extremal l What properties must the data have? l Applications  Event-driven simulation: colliding particles  AI: A* best-first search  Operating systems: load balancing & scheduling  Statistics: maintain largest m values  Graph searching: Dijkstra's algorithm  Data compression: Huffman coding  Physics: molecular dynamics simulation

CompSci 100e 8.10 PriorityQueue.java l What about objects inserted into pq?  If deletemin is supported, what properties must inserted objects have, e.g., insert non-comparable?  Change what minimal means?  Implementation uses heap l If we use a Comparator for comparing entries we can make a min-heap act like a max-heap, see PQDemo  Where is class Comparator declaration? How used?  What's a static inner class? A non-static inner class? l In Java 5/6 there is a Queue interface and PriorityQueue class  The PriorityQueue class also uses a heap
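A minimal sketch of the Comparator idea above (illustrative code, not the course's PQDemo): handing java.util.PriorityQueue a reversed Comparator makes the default min-heap behave like a max-heap.

    import java.util.Collections;
    import java.util.PriorityQueue;

    public class PQOrderDemo {
        public static void main(String[] args) {
            PriorityQueue<Integer> minPQ = new PriorityQueue<Integer>();
            PriorityQueue<Integer> maxPQ =
                new PriorityQueue<Integer>(11, Collections.reverseOrder());
            for (int v : new int[]{5, 1, 9, 3}) { minPQ.add(v); maxPQ.add(v); }
            System.out.println(minPQ.remove());   // 1: default ordering (min-heap behavior)
            System.out.println(maxPQ.remove());   // 9: reversed comparator (max-heap behavior)
        }
    }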

CompSci 100e 8.11 Heap implements PriorityQueue l Heap is an array-based implementation of a binary tree used for implementing priority queues, supports:  insert, findmin, deletemin: complexities? l Using array minimizes storage (no explicit pointers), faster too --- children are located by index/position in array l Heap is a binary tree with shape property, heap/value property  shape: tree filled at all levels (except perhaps last) and filled left-to-right (complete binary tree)  each node has value smaller than both children

CompSci 100e 8.12 Array-based heap l store “node values” in array beginning at index 1 l for node with index k  left child: index 2*k  right child: index 2*k+1 l why is this conducive for maintaining heap shape? l what about heap property? l is the heap a search tree? l where is minimal node? l where are nodes added? deleted?
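A tiny sketch of that index arithmetic, assuming values are stored starting at index 1 with index 0 unused:

    int parent(int k)     { return k / 2; }        // integer division
    int leftChild(int k)  { return 2 * k; }
    int rightChild(int k) { return 2 * k + 1; }
    // e.g., the node at index 5 has its parent at index 2 and children at 10 and 11;
    // in a min-heap the minimal node is always at index 1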

CompSci 100e 8.13 Thinking about heaps l Where is minimal element?  Root, why? l Where is maximal element?  Leaves, why? l How many leaves are there in an N-node heap (big-Oh)?  O(n), but exact? l What is complexity of find max in a minheap? Why?  O(n), but ½ N? l Where is second smallest element? Why?  Near root?

CompSci 100e 8.14 Adding values to heap l to maintain heap shape, must add new value in left-to-right order of last level  could violate heap property  move value “up” if too small l change places with parent if heap property violated  stop when parent is smaller  stop when root is reached l pull parent down, swapping isn’t necessary (optimization) l Example: insert 8, then bubble 8 up

CompSci 100e 8.15 Adding values, details (pseudocode)

    // myList is a List<Comparable> holding the heap; index 0 is unused
    // so that the parent of index loc is loc/2
    void add(Comparable elt) {              // add elt to heap in myList
        myList.add(elt);                    // new value starts in the last slot
        int loc = myList.size() - 1;        // index of that slot
        while (1 < loc && elt.compareTo(myList.get(loc/2)) < 0) {
            myList.set(loc, myList.get(loc/2));   // pull parent down, no swap needed
            loc = loc/2;                    // go to parent
        }
        // what’s true here?
        myList.set(loc, elt);
    }

CompSci 100e 8.16 Removing minimal element l Where is minimal element?  If we remove it, what changes, shape/property? l How can we maintain shape?  “last” element moves to root  What property is violated? l After moving last element, subtrees of root are heaps, why?  Move root down (pull child up), does it matter where? l When can we stop “re-heaping”?  Less than both children  Reach a leaf
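A sketch of the corresponding remove, in the same style as the add pseudocode on slide 8.15 (the List<Comparable> myList with the unused slot at index 0 is an assumption carried over from there):

    Comparable removeMin() {
        Comparable min = myList.get(1);                   // minimal element is at the root
        Comparable last = myList.remove(myList.size() - 1);  // "last" element leaves its slot
        int n = myList.size() - 1;                        // index of the last remaining element
        if (n >= 1) {
            int loc = 1;                                  // re-heap from the root down
            while (2 * loc <= n) {                        // at least a left child exists
                int child = 2 * loc;
                if (child + 1 <= n && myList.get(child + 1).compareTo(myList.get(child)) < 0) {
                    child++;                              // choose the smaller child
                }
                if (last.compareTo(myList.get(child)) <= 0) break;  // heap property restored
                myList.set(loc, myList.get(child));       // pull child up
                loc = child;
            }
            myList.set(loc, last);                        // "last" lands in its final slot
        }
        return min;
    }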

CompSci 100e 8.17 Priority Queue implementations l Implementing priority queues: compare average and worst case costs of insert and getmin (peek and delete) for an unsorted vector, a sorted vector, a heap, and a balanced binary search tree l Heap has O(1) find-min (no delete) and O(n) build heap

CompSci 100e 8.18 How do we create Huffman Tree/Trie? l Insert weighted values into priority queue  What are initial weights? Why? l Remove minimal nodes, weight by sums, re-insert  Total number of nodes?

    PriorityQueue<TreeNode> pq = new PriorityQueue<TreeNode>();
    for (int k = 0; k < freq.length; k++) {
        pq.add(new TreeNode(k, freq[k], null, null));     // one leaf per character
    }
    while (pq.size() > 1) {
        TreeNode left = pq.remove();                      // two minimally weighted trees
        TreeNode right = pq.remove();
        pq.add(new TreeNode(0, left.weight + right.weight, left, right));
    }
    TreeNode root = pq.remove();
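The loop above assumes a TreeNode class; a minimal sketch of what it could look like (field and constructor layout inferred from how the loop uses it, not the course's actual class), made Comparable so the priority queue orders nodes by weight:

    public class TreeNode implements Comparable<TreeNode> {
        int value;        // character/chunk stored at a leaf (ignored for internal nodes)
        int weight;       // count for leaves; sum of children's weights for internal nodes
        TreeNode left, right;

        TreeNode(int value, int weight, TreeNode left, TreeNode right) {
            this.value = value;
            this.weight = weight;
            this.left = left;
            this.right = right;
        }

        @Override
        public int compareTo(TreeNode other) {
            return this.weight - other.weight;   // smaller weight = higher priority
        }
    }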

CompSci 100e 8.19 – 8.36 Building a tree “A SIMPLE STRING TO BE ENCODED USING A MINIMAL NUMBER OF BITS” l [These slides step through the construction pictorially: the forest starts as single-node trees weighted by character counts (E, N, C, F, P, U, R, L, D, G, T, O, B, A, M, S, I, and space), and at each step the two minimally weighted trees are merged, until a single Huffman tree remains]

CompSci 100e 8.37 Encoding 1. Count occurrences of all occurring characters O( ) 2. Build priority queue O( ) 3. Build Huffman tree O( ) 4. Create table of codes from tree O( ) 5. Write Huffman tree and coded data to file O( )

CompSci 100e 8.38 Properties of Huffman coding l Want to minimize the weighted path length L(T) of tree T: L(T) = Σ w_i · d_i  w_i is the weight or count of each codeword i  d_i is the depth of the leaf corresponding to codeword i l How do we calculate character (codeword) frequencies? l Huffman coding creates pretty full bushy trees?  When would it produce a “bad” tree? l How do we produce coded compressed data from input efficiently?
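As a small made-up example (not the slide's data): with counts a:5, b:2, c:1, d:1, Huffman merging gives leaf depths 1, 2, 3, 3, so L(T) = 5·1 + 2·2 + 1·3 + 1·3 = 15 bits, versus 2 bits/character × 9 characters = 18 bits for a fixed-length code.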

CompSci 100e 8.39 Writing code out to file l How do we go from characters to encodings?  Build Huffman tree  Root-to-leaf path generates encoding l Need way of writing bits out to file  Platform dependent?  Complicated to write bits and read in same ordering l See BitInputStream and BitOutputStream classes  Depend on each other, bit ordering preserved l How do we know bits come from compressed file?  Store a magic number
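A sketch of that root-to-leaf code generation (it assumes the TreeNode sketch above; the array size of 257 is an assumption allowing 256 chunk values plus a PSEUDO_EOF marker):

    // record the 0/1 path from the root to every leaf
    void extractCodes(TreeNode node, String path, String[] codes) {
        if (node.left == null && node.right == null) {    // leaf: path is this value's code
            codes[node.value] = path;
            return;
        }
        extractCodes(node.left, path + "0", codes);        // 0 means "go left"
        extractCodes(node.right, path + "1", codes);       // 1 means "go right"
    }

    // usage: String[] codes = new String[257];
    //        extractCodes(root, "", codes);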

CompSci 100e 8.40 Creating compressed file l Once we have new encodings, read every character  Write encoding, not the character, to compressed file  Why does this save bits?  What other information needed in compressed file? l How do we uncompress?  How do we know foo.hf represents compressed file?  Is suffix sufficient? Alternatives? l Why is Huffman coding a two-pass method?  Alternatives?

CompSci 100e 8.41 Uncompression with Huffman l We need the trie to uncompress  l As we read a bit, what do we do?  Go left on 0, go right on 1  When do we stop? What to do? l How do we get the trie?  How did we get it originally? Store 256 int/counts How do we read counts?  How do we store a trie? Traversal order? Reading a trie? Leaf indicator? Node values?
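A sketch of that decoding walk, assuming the TreeNode sketch from slide 8.18 and a hypothetical bit source (readBit() returning 0, 1, or -1 at end of input) rather than the course's actual BitInputStream API:

    // walk from the root: go left on 0, go right on 1; output a value at each leaf
    void decode(TreeNode root, BitSource in, java.io.OutputStream out) throws java.io.IOException {
        TreeNode current = root;
        int bit;
        while ((bit = in.readBit()) != -1) {          // -1 assumed to mean "no more bits"
            current = (bit == 0) ? current.left : current.right;
            if (current.left == null && current.right == null) {   // reached a leaf
                if (current.value == PSEUDO_EOF) return;            // stop on the EOF marker
                out.write(current.value);
                current = root;                        // restart for the next codeword
            }
        }
    }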

CompSci 100e 8.42 – 8.62 Decoding a message l [These slides trace a bit string through the Huffman tree built above: starting at the root, each bit chooses left (0) or right (1); when a leaf is reached its character is emitted and the walk restarts at the root, spelling out G, GO, GOO, GOOD, …]

CompSci 100e 8.63 Decoding 1. Read in tree data O( ) 2. Decode bit string with tree O( )

CompSci 100e 8.64 Other Huffman Issues l What do we need to decode?  How did we encode? How will we decode?  What information needed for decoding? l Reading and writing bits: chunks and stopping  Can you write 3 bits? Why not? Why?  PSEUDO_EOF  BitInputStream and BitOutputStream: API l What should happen when the file won’t compress?  Silently compress bigger? Warn user? Alternatives?

CompSci 100e 8.65 Good Compsci 100(e) Assignment? l Array of character/chunk counts, or is this a map?  Map character/chunk to count, why array? l Priority Queue for generating tree/trie  Do we need a heap implementation? Why? l Tree traversals for code generation, uncompression  One recursive, one not, why and which? l Deal with bits and chunks rather than ints and chars  The good, the bad, the ugly l Create a working compression program  How would we deploy it? Make it better? l Benchmark for analysis  What’s a corpus ?

CompSci 100e 8.66 Other methods l Adaptive Huffman coding l Lempel-Ziv algorithms  Build the coding table on the fly while reading document  Coding table changes dynamically  Protocol between encoder and decoder so that everyone is always using the right coding scheme  Works well in practice ( compress, gzip, etc.) l More complicated methods  Burrows-Wheeler ( bunzip2 )  PPM statistical methods

CompSci 100e 8.67 Data Compression l Table of compression schemes and the bits/char each achieves: ASCII (1967), Huffman, Lempel-Ziv (LZ77), Lempel-Ziv-Welch (LZW) / Unix compress, LZH (used by zip and unzip), move-to-front, gzip, Burrows-Wheeler, BOA (statistical data compression, 1.99 bits/char) l Why is data compression important? l How well can you compress files losslessly?  Is there a limit?  How to compare? l How do you measure how much information?

CompSci 100e 8.68 Views of programming l Writing code from the method/function view is pretty similar across languages  Organizing methods is different, organizing code is different, not all languages have classes  Loops, arrays, arithmetic, … l Program using abstractions and high level concepts  Do we need to understand 32-bit twos-complement storage to understand x = x+1?  Do we need to understand how arrays map to contiguous memory to use ArrayLists?  Top-down vs. bottom-up?

CompSci 100e 8.69 From bit to byte to char to int to long l Ultimately everything is stored as either a 0 or 1  A bit is a binary digit; a byte is a binary term (8 bits)  We should be grateful we can deal with Strings rather than sequences of 0's and 1's  We should be grateful we can deal with an int rather than the 32 bits that comprise an int l If R, G, and B each have values in the range 0-255, how can we pack them into an int?  Why should we care, can’t we use one int per color?  How do we do the packing and unpacking?
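One way the packing might look; this mirrors the unpacking layout shown on slide 8.75, and putting alpha in the top byte is an assumption:

    // pack alpha/red/green/blue, each 0-255, into one int:
    // bits 24-31 alpha, 16-23 red, 8-15 green, 0-7 blue
    int packPixel(int alpha, int red, int green, int blue) {
        return (alpha << 24) | (red << 16) | (green << 8) | blue;
    }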

CompSci 100e 8.70 Signed, unsigned, and why we care l Some applications require attention to memory-use  Differences: one million bytes, chars, and ints; the first requires a megabyte, the last requires four megabytes  When do we care about these differences? l int values are stored as two's complement numbers with 32 bits; for 64 bits use the type long; a char is 16 bits l Java byte, int, long are signed values, char unsigned  Java signed byte: -128..127, # bits?  What if we only want 0-255? (Huff, pixels, …)  Convert negative values or use char, trade-offs?  Java char unsigned: 0..65,535, # bits?  Why is char unsigned? Why not as in C++/C?
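A two-line example of the conversion hinted at in "Convert negative values or use char":

    byte b = (byte) 200;       // Java stores this signed byte as -56
    int unsigned = b & 0xff;   // masking with 0xff recovers the value 200 (range 0-255)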

CompSci 100e 8.71 More details about bits l How is 13 represented?  … 0 0 1 1 0 1  Total is 8 + 4 + 1 = 13 l What is bit representation of 32? Of 15? Of 1023?  What is bit-representation of 2^n - 1?  What is bit-representation of 0? Of -1? Study later, but -1 is all 1’s, left-most bit determines < 0 l Determining what bits are on? How many on?  Understanding, problem-solving
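A small sketch for the "what bits are on, how many are on" questions (Integer.bitCount gives the count directly; the loop shows the idea):

    // is bit i of x on?
    boolean isSet(int x, int i) {
        return ((x >> i) & 1) == 1;
    }

    // how many bits of x are on?
    int countOnes(int x) {
        int count = 0;
        for (int i = 0; i < 32; i++) {
            if (((x >> i) & 1) == 1) count++;
        }
        return count;
    }
    // e.g., countOnes(13) == 3, since 13 is 1101 in binary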

CompSci 100e 8.72 How are data stored? l To facilitate Huffman coding we need to read/write one bit  Why do we need to read one bit?  Why do we need to write one bit?  When do we read 8 bits at a time? 32 bits? l We can't actually write one bit at a time. We can't really write one char at a time either.  Output and input are buffered, minimizing memory accesses and disk accesses  Why do we care about this when we talk about data structures and algorithms? Where does data come from?

CompSci 100e 8.73 How do we buffer char output? l Done for us as part of InputStream and Reader classes  InputStreams are for reading bytes  Readers are for reading char values  Why do we have both and how do they interact? Reader r = new InputStreamReader(System.in);  Do we need to flush our buffers? l In the past Java IO has been notoriously slow  Do we care about I? About O?  This is changing, and the java.nio classes help Map a file to a region in memory in one operation

CompSci 100e 8.74 Buffer bit output l To buffer bits we store bits in a buffer (duh)  When the buffer is full, we write it.  The buffer might overflow, e.g., in process of writing 10 bits to 32-bit capacity buffer that has 29 bits in it  How do we access bits, add to buffer, etc.? l We need to use bit operations  Mask bits -- access individual bits  Shift bits – to the left or to the right  Bitwise and/or/negate bits
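A minimal sketch of such a buffer (illustrative names, not the course's BitOutputStream): bits accumulate in an int and each completed byte is handed to the underlying stream.

    import java.io.IOException;
    import java.io.OutputStream;

    public class SimpleBitWriter {
        private final OutputStream out;
        private int buffer = 0;        // bits collected so far, right-aligned
        private int bitsInBuffer = 0;

        public SimpleBitWriter(OutputStream out) { this.out = out; }

        // append the low 'howManyBits' bits of 'value', most significant bit first
        public void writeBits(int howManyBits, int value) throws IOException {
            for (int i = howManyBits - 1; i >= 0; i--) {
                buffer = (buffer << 1) | ((value >> i) & 1);   // shift next bit into buffer
                bitsInBuffer++;
                if (bitsInBuffer == 8) {                        // a full byte: write it out
                    out.write(buffer & 0xff);
                    buffer = 0;
                    bitsInBuffer = 0;
                }
            }
        }

        // pad the final partial byte with zeros and write it
        public void flush() throws IOException {
            if (bitsInBuffer > 0) {
                out.write((buffer << (8 - bitsInBuffer)) & 0xff);
                buffer = 0;
                bitsInBuffer = 0;
            }
            out.flush();
        }
    }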

CompSci 100e 8.75 Representing pixels l Pixel typically stores RGB and alpha/transparency values  Each RGB is a value in the range 0 to 255  The alpha value is also in range 0 to 255

    Pixel red = new Pixel(255, 0, 0, 0);
    Pixel white = new Pixel(255, 255, 255, 0);

l A picture is simply an array of int values

    void process(int pixel) {
        int blue  = pixel & 0xff;
        int green = (pixel >> 8) & 0xff;
        int red   = (pixel >> 16) & 0xff;
    }

CompSci 100e 8.76 Bit masks and shifts

    void process(int pixel) {
        int blue  = pixel & 0xff;
        int green = (pixel >> 8) & 0xff;
        int red   = (pixel >> 16) & 0xff;
    }

l Hexadecimal digits: 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f  f is 15, in binary this is 1111, one less than 16 (10000)  The hex number 0xff is an 8 bit number, all ones l Bitwise & operator creates an 8 bit value, 0-255  Must use an int/char, what happens with byte?  1&1 == 1, otherwise we get 0, like logical and  Similarly we have |, bitwise or