Huffman Compression
Compress + Expand
Compress Expand A Tabulate frequency counts Build trie Build codebook (write trie) Encode input Build codebook (read trie) Decode compressed message Compress: compute freq, buildTrie() A Trie Compress: writeTrie() Expand: readTrie() Compress: write length Expand: read length Compress: encode Expand: decode Compressed file format: Codebook Input length Compressed message
Compute frequency counts Eerie eyes seen near lake. Contain letters: ‘E’ ‘e’ ‘r’ ‘i’ ‘y’ ‘s’ ‘n’ ‘a’ ‘r’ ‘l’ ‘k’ ‘.’ space Generate frequency counts: E: 1, e: 8, r: 2, i: 1, y: 1, s: 2, n: 2, a: 2, l: 1, k: 1, .: 1, space: 4
Build trie (buildTrie()) Merge nodes based on frequency until there is only one left Special case: only one node Priority queue delMin: Removes and returns a smallest key on this priority queue. insert: adds a new key to the priority queue. Node{ char ch int freq Node left, right }
Build trie private static Node buildTrie(int[] freq) { MinPQ<Node> pq = new MinPQ<Node>(); for (char ch = 0; ch < R; ++ch if (freq[ch] > 0) pq.insert(new Node(c, freq[c], null, null)); while (pq.size() > 1) { Node x=pq.delMin(); Node y=pq.delMin(); Node parent = new Node(‘\0’, x.freq + y.freq, x, y); pq.insert(parent); } return pq.delMin();
E 1 i y l k . r 2 s n a sp 4 e 8 private static Node buildTrie(int[] freq) { MinPQ<Node> pq = new MinPQ<Node>(); for (char ch = 0; ch < R; ++ch if (freq[ch] > 0) pq.insert(new Node(c, freq[c], null, null)); while (pq.size() > 1) { Node x=pq.delMin(); Node y=pq.delMin(); Node parent = new Node(‘\0’, x.freq + y.freq, x, y); pq.insert(parent); } return pq.delMin();
y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 sp 4 e 8 private static Node buildTrie(int[] freq) { MinPQ<Node> pq = new MinPQ<Node>(); for (char ch = 0; ch < R; ++ch if (freq[ch] > 0) pq.insert(new Node(c, freq[c], null, null)); while (pq.size() > 1) { Node x=pq.delMin(); Node y=pq.delMin(); Node parent = new Node(‘\0’, x.freq + y.freq, x, y); pq.insert(parent); } return pq.delMin(); 2 E 1 i 1
An Introduction to Huffman Coding March 21, 2000 y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 2 sp 4 e 8 E 1 i 1 private static Node buildTrie(int[] freq) { MinPQ<Node> pq = new MinPQ<Node>(); for (char ch = 0; ch < R; ++ch if (freq[ch] > 0) pq.insert(new Node(c, freq[c], null, null)); while (pq.size() > 1) { Node x=pq.delMin(); Node y=pq.delMin(); Node parent = new Node(‘\0’, x.freq + y.freq, x, y); pq.insert(parent); } return pq.delMin(); Mike Scott
An Introduction to Huffman Coding March 21, 2000 k 1 . 1 r 2 s 2 n 2 a 2 2 sp 4 e 8 private static Node buildTrie(int[] freq) { MinPQ<Node> pq = new MinPQ<Node>(); for (char ch = 0; ch < R; ++ch if (freq[ch] > 0) pq.insert(new Node(c, freq[c], null, null)); while (pq.size() > 1) { Node x=pq.delMin(); Node y=pq.delMin(); Node parent = new Node(‘\0’, x.freq + y.freq, x, y); pq.insert(parent); } return pq.delMin(); E 1 i 1 2 y 1 l 1 Mike Scott
An Introduction to Huffman Coding March 21, 2000 2 k 1 . 1 r 2 s 2 n 2 a 2 2 sp 4 e 8 y 1 l 1 E 1 i 1 private static Node buildTrie(int[] freq) { MinPQ<Node> pq = new MinPQ<Node>(); for (char ch = 0; ch < R; ++ch if (freq[ch] > 0) pq.insert(new Node(c, freq[c], null, null)); while (pq.size() > 1) { Node x=pq.delMin(); Node y=pq.delMin(); Node parent = new Node(‘\0’, x.freq + y.freq, x, y); pq.insert(parent); } return pq.delMin(); Mike Scott
An Introduction to Huffman Coding March 21, 2000 r 2 s 2 n 2 a 2 2 2 sp 4 e 8 y 1 l 1 E 1 i 1 private static Node buildTrie(int[] freq) { MinPQ<Node> pq = new MinPQ<Node>(); for (char ch = 0; ch < R; ++ch if (freq[ch] > 0) pq.insert(new Node(c, freq[c], null, null)); while (pq.size() > 1) { Node x=pq.delMin(); Node y=pq.delMin(); Node parent = new Node(‘\0’, x.freq + y.freq, x, y); pq.insert(parent); } return pq.delMin(); 2 k 1 . 1 Mike Scott
An Introduction to Huffman Coding March 21, 2000 r 2 s 2 n 2 a 2 2 sp 4 e 8 2 2 k 1 . 1 E 1 i 1 y 1 l 1 private static Node buildTrie(int[] freq) { MinPQ<Node> pq = new MinPQ<Node>(); for (char ch = 0; ch < R; ++ch if (freq[ch] > 0) pq.insert(new Node(c, freq[c], null, null)); while (pq.size() > 1) { Node x=pq.delMin(); Node y=pq.delMin(); Node parent = new Node(‘\0’, x.freq + y.freq, x, y); pq.insert(parent); } return pq.delMin(); Mike Scott
An Introduction to Huffman Coding March 21, 2000 n 2 a 2 2 sp 4 e 8 2 2 E 1 i 1 y 1 l 1 k 1 . 1 4 r 2 s 2 Mike Scott
An Introduction to Huffman Coding March 21, 2000 4 4 4 2 sp 4 e 8 2 2 r 2 s 2 n 2 a 2 k 1 . 1 E 1 i 1 y 1 l 1 Mike Scott
An Introduction to Huffman Coding March 21, 2000 4 4 4 e 8 2 2 r 2 s 2 n 2 a 2 E 1 i 1 y 1 l 1 6 2 sp 4 k 1 . 1 Mike Scott
An Introduction to Huffman Coding March 21, 2000 4 4 6 4 e 8 2 sp 4 2 2 r 2 s 2 n 2 a 2 k 1 . 1 E 1 i 1 y 1 l 1 Mike Scott
An Introduction to Huffman Coding March 21, 2000 4 6 e 8 2 2 2 sp 4 k 1 . 1 E 1 i 1 y 1 l 1 8 4 4 r 2 s 2 n 2 a 2 Mike Scott
An Introduction to Huffman Coding March 21, 2000 4 6 e 8 8 2 2 2 sp 4 4 4 k 1 . 1 E 1 i 1 y 1 l 1 r 2 s 2 n 2 a 2 Mike Scott
An Introduction to Huffman Coding March 21, 2000 8 e 8 4 4 10 r 2 s 2 n 2 a 2 4 6 2 2 2 sp 4 E 1 i 1 y 1 l 1 k 1 . 1 Mike Scott
An Introduction to Huffman Coding March 21, 2000 8 e 8 10 4 4 4 6 2 2 r 2 s 2 n 2 a 2 2 sp 4 E 1 i 1 y 1 l 1 k 1 . 1 Mike Scott
An Introduction to Huffman Coding March 21, 2000 10 16 4 6 2 2 e 8 8 2 sp 4 E 1 i 1 y 1 l 1 k 1 . 1 4 4 r 2 s 2 n 2 a 2 Mike Scott
An Introduction to Huffman Coding March 21, 2000 10 16 4 6 e 8 8 2 2 2 sp 4 4 4 E 1 i 1 y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 Mike Scott
An Introduction to Huffman Coding March 21, 2000 26 16 10 4 e 8 8 6 2 2 2 sp 4 4 4 E 1 i 1 y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 Mike Scott
An Introduction to Huffman Coding March 21, 2000 26 16 10 4 e 8 8 6 2 2 2 sp 4 4 4 E 1 i 1 y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 Mike Scott
Generate codebook (buildCode()) Perform a traversal of the tree to obtain new code words Going left is a 0 going right is a 1 code word is only completed when a leaf node is reached private static void buildCode(String[] st, Node x, String s) { if (!x.isLeaf()) { buildCode(st, x.left, s + '0'); buildCode(st, x.right, s + '1'); } else { st[x.ch] = s; st is the code book, in which st[ch] stores the code for character ch, for example: st[‘E’]=“0000“ st[‘s’]=“1101” String s records the path to the current node x, for example, “left, left, left, right” is encoded as “0001”
An Introduction to Huffman Coding March 21, 2000 Generate codebook Char Code E 0000 i 0001 y 0010 l 0011 k 0100 . 0101 space 011 e 10 r 1100 s 1101 n 1110 a 1111 E 1 i sp 4 e 8 2 y l k . r s n a 6 10 16 26 Mike Scott
Encode message Eerie eyes seen near lake. Char Code E 0000 i 0001 y 0010 l 0011 k 0100 . 0101 space 011 e 10 r 1100 s 1101 n 1110 a 1111 Eerie eyes seen near lake. 000010110000011001110001010 110110100111110101111110001 1001111110100100101 Codebook Input length Compressed message
Expand Compress A Tabulate frequency counts Build trie Build codebook (write trie) Encode input Build codebook (read trie) Decode compressed message Compress: compute freq, buildTrie() A Trie Compress: writeTrie() Expand: readTrie() Compress: write length Expand: read length Compress: encode Expand: decode Compressed file format: Codebook Input length Compressed message
Build codebook (readTrie()) readTrie / writeTrie How to deserialize (read) trie ? Boolean flag + data If flag == 1 (leaf), then the next 8bits represent a character If flag == 0 (internal node), then recursively desterilize left and right children Codebook Input length Compressed message
Build codebook readTrie / writeTrie sp e - y l k . r s n a readTrie / writeTrie 00001E(8bits)1i(8bits)01y(8bits) 1l(8bits)... // write bitstring-encoded trie to standard output private static void writeTrie(Node x) { if (x.isLeaf()) { BinaryStdOut.write(true); BinaryStdOut.write(x.ch, 8); return; } BinaryStdOut.write(false); writeTrie(x.left); writeTrie(x.right);
Build codebook readTrie / writeTrie sp e - y l k . r s n a readTrie / writeTrie 00001E(8bits)1i(8bits)01y(8bits) 1l(8bits)... private static Node readTrie() { boolean isLeaf = BinaryStdIn.readBoolean(); if (isLeaf) { return new Node(BinaryStdIn.readChar(), -1, null, null); } else { return new Node('\0', -1, readTrie(), readTrie());
Decode message E i sp e - y l k . r s n a Scan the bit string while traversal the tree: 0 go left, 1 go right end at leaf node 000010110000011001110001010 110110100111110101111110001 1001111110100100101 Eerie eyes seen near lake.