Squishin’ Stuff: Huffman Compression
Data Compression
- Begin with a computer file (text, picture, movie, sound, executable, etc.).
- Most files contain extra information or redundancy.
- Goal: reorganize the file to remove the excess information and redundancy.
- Lossless compression: compress the file in such a way that none of the information is lost (good for text files and executables); see the round-trip sketch after this list.
- Lossy compression: allow some information to be thrown away in order to get a better level of compression (good for pictures, movies, or sounds).
- There are many, many, many algorithms out there to compress files, and different types of files work best with different algorithms (you need to consider the structure of the file and how things are connected).
- We’re going to focus on Huffman compression, which is used in many compression programs, most notably WinZip. We’re just going to play with text files.
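To get a concrete feel for "lossless", here is a minimal sketch using Python's standard zlib module (a DEFLATE implementation, the same family of algorithms the zip format builds on); the sample text is made up for illustration:

```python
# A round trip through a lossless compressor: the restored bytes must
# equal the original bytes exactly.
import zlib

text = b"the quick brown fox jumps over the lazy dog " * 50

compressed = zlib.compress(text)
restored = zlib.decompress(compressed)

print(len(text), "bytes before,", len(compressed), "bytes after")
assert restored == text  # lossless: nothing was thrown away
```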
Text Files
- Each character is represented by one byte, a sequence of 8 bits (1s and 0s): the ASCII code, an international standard for how a character is represented.
  A  01000001
  B  01000010
  ~  01111110
  3  00110011
- Most text files use fewer than 128 characters; this code has room for 256. Extra information!!
- Goal: use shorter codes to represent more frequent characters (a frequency count is sketched below). You have seen this before…
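As a quick check of that claim, a minimal sketch that counts character frequencies in a text file (the filename "sample.txt" is a placeholder):

```python
# Count how often each character appears; the most frequent ones are
# the ones that deserve the shortest codes. "sample.txt" is a placeholder.
from collections import Counter

with open("sample.txt") as f:
    counts = Counter(f.read())

for ch, n in counts.most_common(10):
    print(repr(ch), n)
print("distinct characters:", len(counts))  # typically well under 128
```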
Morse Code

A .-      B -...    C -.-.    D -..     E .       F ..-.
G --.     H ....    I ..      J .---    K -.-     L .-..
M --      N -.      O ---     P .--.    Q --.-    R .-.
S ...     T -       U ..-     V ...-    W .--     X -..-
Y -.--    Z --..
0 -----   1 .----   2 ..---   3 ...--   4 ....-   5 .....
6 -....   7 --...   8 ---..   9 ----.
Full stop .-.-.-    Comma --..--    Query ..--..
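Morse already gives shorter codes to more frequent letters (E is a single dot). Here is a minimal sketch of encoding with a few entries from this table, assuming the codes are simply concatenated with no separators — the flaw the next slides expose:

```python
# Encode a word by concatenating Morse codes with no separators.
# Only a handful of letters are included here for brevity.
MORSE = {"E": ".", "T": "-", "A": ".-", "I": "..", "M": "--", "N": "-."}

def encode(word):
    return "".join(MORSE[ch] for ch in word)

# Two different words produce the same string, so the receiver
# cannot tell where one letter stops and the next begins.
print(encode("EAT"))  # ..--
print(encode("IM"))   # ..--
```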
Example
RAWA AWIS RINBABBE — that didn’t work. With no separators, the same string of dots and dashes can be broken into letters in more than one way, so we need a way to know when a letter stops. Huffman coding provides this, though we’ll lose some compression.

Huffman Coding
- Named after some guy called Huffman (David Huffman, 1952).
- Use a tree to construct the code, and then use the tree to interpret the code (sketched below).
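A minimal sketch of the construction, assuming we already have character counts: repeatedly merge the two least frequent nodes into a new node, then read each character's code off the root-to-leaf path (left edge 0, right edge 1):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Heap entries are (frequency, tie_breaker, node); a leaf node is a
    # character, an internal node is a (left, right) pair.
    heap = [(n, i, ch) for i, (ch, n) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))
        tie += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):        # leaf: record this character's code
            codes[node] = prefix or "0"  # lone-character edge case
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

print(huffman_codes("squishin stuff"))
```

Because every character sits at a leaf, no code is a prefix of another — exactly the "when does a letter stop" property Morse was missing.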
Huffman Chart
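The chart on this slide was not captured in the transcript, but it presumably tabulated each character's code. Here is a sketch of using such a table (a toy prefix-free code, not the actual chart) to encode and then decode a message:

```python
# `codes` is any prefix-free table, e.g. the output of huffman_codes above.
codes = {"a": "0", "b": "10", "c": "11"}   # toy example
decode_map = {bits: ch for ch, bits in codes.items()}

message = "abacab"
encoded = "".join(codes[ch] for ch in message)

# Prefix-freeness means we always know when a letter stops: read bits
# until the buffer matches some code, emit the character, and reset.
decoded, buf = [], ""
for bit in encoded:
    buf += bit
    if buf in decode_map:
        decoded.append(decode_map[buf])
        buf = ""

print(encoded)              # 0100110010
assert "".join(decoded) == message
```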
Issues and Problems
What’s the best you can do?
- Obviously, there is a limit to how far down you can compress a file.
- Assume your file has n different characters in it, say $a_1, \dots, a_n$, with probabilities $p_1, \dots, p_n$ (so $p_1 + p_2 + \cdots + p_n = 1$).
- The entropy of the file is defined as $H = -\sum_{i=1}^{n} p_i \log_2(p_i)$. It measures the least number of bits, on average, needed to represent a character (computed in the sketch below).
- For my name, the entropy is 3.12: it takes at least 3.12 bits per character to represent my name. Huffman gave an average of 3.19 bits per character.
- Huffman compression will always give an average that is within one bit of the entropy.
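A minimal sketch of the entropy calculation; the sample string stands in for the name used on the slide:

```python
# Entropy H = -sum over characters of p * log2(p), where p is the
# character's relative frequency: a lower bound on the average number
# of bits per character any lossless code can achieve.
from collections import Counter
from math import log2

def entropy(text):
    n = len(text)
    return -sum((c / n) * log2(c / n) for c in Counter(text).values())

print(round(entropy("huffman compression"), 2))  # bits per character
```

On this scale, Huffman's guarantee reads: if H is the entropy, the average code length L satisfies H ≤ L < H + 1.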