
Presentation on theme: "File Compression Techniques Alex Robertson. Outline History Lossless vs Lossy Basics Huffman Coding Getting Advanced Lossy Explained Limitations Future."— Presentation transcript:

1 File Compression Techniques Alex Robertson

2 Outline History Lossless vs Lossy Basics Huffman Coding Getting Advanced Lossy Explained Limitations Future

3 History: where this all started. The problem! 1940s: Shannon-Fano coding. Properties: Different codes have different numbers of bits. Codes for symbols with low probabilities have more bits, and codes for symbols with high probabilities have fewer bits. Though the codes are of different bit lengths, they can be uniquely decoded.

4 Lossless vs Lossy. Lossless (e.g. DEFLATE): every little detail of the data is important. Lossy (e.g. JPEG, MP3): data can be lost without being noticed.

5 Understanding the Basics. Properties: Different codes have different numbers of bits. Codes for symbols with low probabilities have more bits, and codes for symbols with high probabilities have fewer bits. Though the codes are of different bit lengths, they can be uniquely decoded. Encode “SATA” with S = 10, A = 0, T = 11.
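With a fixed code table, encoding is just concatenating each symbol's bits. A minimal sketch in Python, using the slide's codes (the helper name is illustrative):

```python
# Encode a string with a fixed variable-length code table.
# Codes from the slide: S = 10, A = 0, T = 11.
codes = {"S": "10", "A": "0", "T": "11"}

def encode(text, table):
    """Concatenate the codeword for each symbol in order."""
    return "".join(table[ch] for ch in text)

print(encode("SATA", codes))  # -> 100110
```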

6 Prefix Rule. With S = 01, A = 0, T = 00, the bit string 010000 decodes ambiguously: it could be SATA, SAAAA, or STT. No code can be the prefix of another code: if 0 is a code, 0* can't be a code.
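The ambiguity can be checked by brute force. A short sketch (assuming the slide's non-prefix-free codes) that enumerates every way to split the bit string into codewords:

```python
# Why the prefix rule matters: with S = 01, A = 0, T = 00,
# the code 0 is a prefix of both 00 and 01, so decodings collide.
codes = {"S": "01", "A": "0", "T": "00"}

def decodings(bits, codes):
    """Return every way to split `bits` into a sequence of codewords."""
    if not bits:
        return [""]
    results = []
    for sym, code in codes.items():
        if bits.startswith(code):
            results += [sym + rest for rest in decodings(bits[len(code):], codes)]
    return results

print(sorted(decodings("010000", codes)))
# -> ['SAAAA', 'SAAT', 'SATA', 'STAA', 'STT']
```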

7 Make a Tree: A = 010, B = 11, C = 00, D = 10, R = 011.

8 Decode 01011011010000101001011011010 with A = 010, B = 11, C = 00, D = 10, R = 011. This code works, but it violates the property that codes for symbols with low probabilities have more bits and codes for symbols with high probabilities have fewer bits.
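Because the table on this slide is a prefix code, decoding needs no separators: read bits until they match a codeword, emit the symbol, and continue. A sketch using the slide's table (the function name is illustrative):

```python
# Greedy prefix decoding with the slide's code table:
# A = 010, B = 11, C = 00, D = 10, R = 011.
codes = {"A": "010", "B": "11", "C": "00", "D": "10", "R": "011"}
decode_table = {v: k for k, v in codes.items()}

def decode(bits, table):
    """Accumulate bits until they match a codeword, emit, repeat."""
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in table:
            out.append(table[buf])
            buf = ""
    return "".join(out)

print(decode("01011011010000101001011011010", decode_table))  # -> ABRACADABRA
```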

9 Huffman Coding. Create a tree to encode “ABRACADABRA”. Determine the frequencies, then: 1. Locate the two least frequent nodes. 2. Create a parent node from those two nodes, with a weight equal to the sum of the two child nodes' frequencies. 3. Give one of the child nodes the 0 bit and the other the 1 bit. 4. Repeat the above steps until only one node is left.
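The four steps above can be sketched with a priority queue; this is one common way to implement them, not necessarily the presenter's. On “ABRACADABRA” it reproduces the 23-bit total shown on a later slide:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a code table by repeatedly merging the two least
    frequent nodes, as in the steps above."""
    freq = Counter(text)
    # Heap entries: (weight, tiebreaker, {symbol: code-so-far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # least frequent node
        w2, _, c2 = heapq.heappop(heap)  # second least frequent node
        # Parent node: prefix one child's codes with 0, the other's with 1;
        # its weight is the sum of the two children's weights.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        count += 1
        heapq.heappush(heap, (w1 + w2, count, merged))
    return heap[0][2]

codes = huffman_codes("ABRACADABRA")
encoded = "".join(codes[ch] for ch in "ABRACADABRA")
print(len(encoded))  # -> 23
```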

10 Does it work? Re-encode 01011011010000101001011011010 (29 bits).

11 It Works! 01011001110011110101100 = 23 bits, versus uncompressed ABRACADABRA = 11 characters × 7 bits each = 77 bits. But…

12 It Works… With Issues. The header must include the probability table. Not the best in certain cases. Example: ‘A’ repeated 100 times. Huffman only reduces this to 100 bits (not counting the header).

13 Moving Forward: the arithmetic method. No specific code per symbol; instead, a single floating-point output number is continuously refined. Example:

14 “BILL GATES”

Character   Probability   Range
SPACE       1/10          0.0 <= r < 0.1
A           1/10          0.1 <= r < 0.2
B           1/10          0.2 <= r < 0.3
E           1/10          0.3 <= r < 0.4
G           1/10          0.4 <= r < 0.5
I           1/10          0.5 <= r < 0.6
L           2/10          0.6 <= r < 0.8
S           1/10          0.8 <= r < 0.9
T           1/10          0.9 <= r < 1.0
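Arithmetic coding narrows an interval once per symbol using the table's ranges. A minimal encoder sketch with exact fractions (to sidestep floating-point rounding); any number inside the final interval identifies the message:

```python
from fractions import Fraction as F

# Cumulative ranges from the table: each symbol owns [lo, hi) of [0, 1).
ranges = {
    " ": (F(0, 10), F(1, 10)), "A": (F(1, 10), F(2, 10)),
    "B": (F(2, 10), F(3, 10)), "E": (F(3, 10), F(4, 10)),
    "G": (F(4, 10), F(5, 10)), "I": (F(5, 10), F(6, 10)),
    "L": (F(6, 10), F(8, 10)), "S": (F(8, 10), F(9, 10)),
    "T": (F(9, 10), F(10, 10)),
}

def arith_encode(text, ranges):
    """Narrow the interval [low, high) once per symbol."""
    low, high = F(0), F(1)
    for ch in text:
        lo, hi = ranges[ch]
        width = high - low
        low, high = low + width * lo, low + width * hi
    return low  # any number in [low, high) identifies `text`

print(float(arith_encode("BILL GATES", ranges)))  # -> 0.2572167752
```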

15 Dictionary Based. Implemented in the late 70s. Uses previously seen words as a dictionary. Examples: “the quick brown fox jumped over the lazy dog”; “I bought a Mississippi Banana in Mississippi.”
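A minimal dictionary-based sketch in the spirit of the late-70s LZ78 scheme (an assumption; the slide doesn't name a specific algorithm): each output pair references a previously seen phrase plus one new character, so repeats like “Mississippi” compress well.

```python
def lz78_compress(text):
    """Emit (phrase index, next char) pairs; each pair also adds a
    new, longer phrase to the dictionary."""
    dictionary = {"": 0}
    phrase, out = "", []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending a known phrase
        else:
            out.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                    # flush a trailing known phrase
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out

def lz78_decompress(pairs):
    """Rebuild the same dictionary to expand each pair."""
    dictionary, out = [""], []
    for idx, ch in pairs:
        entry = dictionary[idx] + ch
        dictionary.append(entry)
        out.append(entry)
    return "".join(out)

sentence = "I bought a Mississippi Banana in Mississippi."
pairs = lz78_compress(sentence)
print(lz78_decompress(pairs) == sentence)  # -> True
```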

16 Lossy Compression Lossy Formula Lossless Formula My Sound!

17 Mathematical Limitations Claude E. Shannon http://www.data-compression.com/theory.html

18 Example DEFLATE http://en.wikipedia.org/wiki/DEFLATE

19 Future. Hardware is getting better; the theories stay the same.

20 Thank You. Questions?

