Compression & Huffman Codes


Compression & Huffman Codes
Nelson Padua-Perez, Chau-Wen Tseng
Department of Computer Science, University of Maryland, College Park

Compression

Definition
- Reduce size of data (number of bits needed to represent data)

Benefits
- Reduce storage needed
- Reduce transmission cost / latency / bandwidth

Compression Examples

Tools
- winzip, pkzip, compress, gzip

Formats
- Images: .jpg, .gif
- Audio: .mp3, .wav
- Video: mpeg1 (VCD), mpeg2 (DVD), mpeg4 (DivX)
- General: .zip, .gz

Sources of Compressibility

Redundancy
- Recognize repeating patterns
- Exploit using
  - Dictionary
  - Variable-length encoding

Human perception
- Less sensitive to some information
- Can discard less important data

Types of Compression

Lossless
- Preserves all information
- Exploits redundancy in data
- Applied to general data

Lossy
- May lose some information
- Exploits redundancy & human perception
- Applied to audio, image, video

Effectiveness of Compression

Metrics
- Bits per byte (8 bits)
  - 2 bits / byte → ¼ original size
  - 8 bits / byte → no compression
- Percentage
  - 75% compression → ¼ original size

Effectiveness of Compression

Depends on data
- Random data → hard
  - Example: 1001110100 → ? (no pattern to exploit)
- Organized data → easy
  - Example: 1111111111 → 110 (symbol 1, run length 10)

Corollary
- No universally best compression algorithm

Effectiveness of Compression

Compression is not guaranteed
- Pigeonhole principle
  - Reducing size by 1 bit → can only represent ½ as many values
  - Example: 000, 001, 010, 011, 100, 101, 110, 111 → 00, 01, 10, 11

If compression were always possible (alternative view)
- Compress file (reduce size by 1 bit)
- Recompress the output
- Repeat (until we could store any data with 0 bits — a contradiction)
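The pigeonhole counting behind this slide can be checked directly (a small illustrative sketch, not code from the slides): there are 2^n strings of length n, but only 2^n − 1 strings of any shorter length, so no lossless scheme can shorten every input.

```python
# Pigeonhole check: more n-bit inputs than strictly-shorter outputs.
n = 3
inputs = 2 ** n                                  # 8 three-bit strings
shorter_outputs = sum(2 ** k for k in range(n))  # 1 + 2 + 4 = 7 shorter strings
print(inputs, shorter_outputs)                   # 8 > 7: two inputs must share an output
```

Since 8 inputs map into at most 7 shorter outputs, some pair collides and cannot be decompressed unambiguously.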

Lossless Compression Techniques

- LZW (Lempel-Ziv-Welch) compression
  - Build pattern dictionary
  - Replace patterns with index into dictionary
- Burrows-Wheeler transform
  - Block-sort data to improve compression
- Run-length encoding
  - Find & compress repetitive sequences
- Huffman coding
  - Use variable-length codes based on frequency
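Run-length encoding is the simplest of these to sketch. A minimal Python version (function names are mine, not from the slides) collapses each run of repeated symbols into a (symbol, count) pair:

```python
from itertools import groupby

def rle_encode(s):
    """Run-length encode: each maximal run of a repeated
    symbol becomes one (symbol, count) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs):
    """Invert rle_encode by repeating each symbol count times."""
    return "".join(ch * n for ch, n in pairs)

print(rle_encode("1111111111"))   # the slide's "easy" example: one run of ten 1s
```

Note how this matches the slide's intuition: organized data ("1111111111") collapses to a single pair, while random data produces many length-1 runs and no savings.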

Huffman Code Approach

Approach
- Variable-length encoding of symbols
- Exploit statistical frequency of symbols
- Efficient when symbol probabilities vary widely

Principle
- Use fewer bits to represent frequent symbols
- Use more bits to represent infrequent symbols
- Example: A A B A A A B A — A is frequent, so A should get the shorter code

Huffman Code Example

Symbol             Dog    Cat    Bird   Fish
Frequency          1/8    1/4    1/2    1/8
Original Encoding  00     01     10     11    (2 bits each)
Huffman Encoding   110    10     0      111   (3, 2, 1, 3 bits)

Expected size
- Original: 1/8·2 + 1/4·2 + 1/2·2 + 1/8·2 = 2 bits / symbol
- Huffman: 1/8·3 + 1/4·2 + 1/2·1 + 1/8·3 = 1.75 bits / symbol
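The expected-size arithmetic above is just a frequency-weighted sum of code lengths, which a few lines of Python can verify (variable names are mine):

```python
# Expected code length = sum over symbols of frequency * code length.
freqs = {"Dog": 1/8, "Cat": 1/4, "Bird": 1/2, "Fish": 1/8}
fixed_len = {"Dog": 2, "Cat": 2, "Bird": 2, "Fish": 2}   # 2-bit fixed codes
huff_len = {"Dog": 3, "Cat": 2, "Bird": 1, "Fish": 3}    # lengths from the slide

def expected_bits(lengths):
    """Average bits per symbol under the given code lengths."""
    return sum(freqs[s] * lengths[s] for s in freqs)

print(expected_bits(fixed_len))  # 2.0 bits / symbol
print(expected_bits(huff_len))   # 1.75 bits / symbol
```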

Huffman Code Data Structures

Binary (Huffman) tree
- Represents the Huffman code
- Edge → code bit (0 or 1)
- Leaf → symbol
- Path from root to leaf → encoding
- Example: A = "11", H = "10", C = "0"

Priority queue
- To efficiently build the binary tree

Huffman Code Algorithm Overview — Encoding

- Calculate frequency of symbols in file
- Create binary tree representing "best" encoding
- Use binary tree to encode compressed file
  - For each symbol, output path from root to leaf
  - Size of encoding = length of path
- Save binary tree (needed for decoding)

Huffman Code — Creating the Tree

Algorithm
- Place each symbol in a leaf
  - Weight of leaf = symbol frequency
- Select two trees L and R (initially leaves)
  - such that L and R have the lowest frequencies among all trees
- Create a new (internal) node
  - Left child → L
  - Right child → R
  - New frequency → frequency(L) + frequency(R)
- Repeat until all nodes are merged into one tree
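The merge loop above maps directly onto Python's heapq priority queue. This is a sketch (function names and tuple-based tree representation are mine), using frequencies chosen to match the construction example that follows:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman tree with a priority queue and return each
    symbol's code. Internal nodes are (left, right) tuples."""
    tick = count()  # tie-breaker so the heap never compares trees directly
    heap = [(w, next(tick), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        wl, _, left = heapq.heappop(heap)    # lowest-frequency tree
        wr, _, right = heapq.heappop(heap)   # second-lowest-frequency tree
        # Merge into an internal node whose weight is the sum.
        heapq.heappush(heap, (wl + wr, next(tick), (left, right)))
    _, _, root = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")      # left edge  = 0
            walk(node[1], prefix + "1")      # right edge = 1
        else:
            codes[node] = prefix or "0"      # leaf: root-to-leaf path is the code
    walk(root, "")
    return codes

print(huffman_codes({"A": 3, "C": 5, "E": 8, "H": 2, "I": 7}))
```

The exact codewords depend on tie-breaking, but the code lengths (and hence the expected size) are the same as in the slides: 2 bits for C, E, I and 3 bits for the rarer A and H.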

Huffman Tree Construction

[Figure series: symbols with frequencies H = 2, A = 3, C = 5, I = 7, E = 8]

1. Start with one leaf per symbol (weights 2, 3, 5, 7, 8)
2. Merge the two lowest-weight trees, H (2) and A (3), into a node of weight 5
3. Merge that node (5) with C (5) into a node of weight 10
4. Merge I (7) and E (8) into a node of weight 15
5. Merge the weight-10 and weight-15 nodes into the root (total weight 25)

Huffman Coding Example

Huffman code
- E = 01
- I = 00
- C = 10
- A = 111
- H = 110

Encoding
- Input: ACE
- Output: (111)(10)(01) = 1111001

Huffman Code Algorithm Overview — Decoding

- Read compressed file & binary tree
- Use binary tree to decode file
  - Follow path from root to leaf
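The root-to-leaf walk can be sketched in a few lines (a sketch; the function name and the nested-tuple tree are mine, built to match the slide's codes E=01, I=00, C=10, A=111, H=110):

```python
def huffman_decode(bits, root):
    """Decode a bit string against a Huffman tree: follow one edge
    per bit, emit the symbol at each leaf, then restart at the root."""
    out, node = [], root
    for b in bits:
        node = node[0] if b == "0" else node[1]   # 0 → left, 1 → right
        if not isinstance(node, tuple):           # reached a leaf
            out.append(node)
            node = root                           # next symbol starts at root
    return "".join(out)

# Internal nodes are (left, right); leaves are symbols.
tree = (("I", "E"), ("C", ("H", "A")))
print(huffman_decode("1111001", tree))   # → ACE, as in the slides
```

Because the code is prefix-free, hitting a leaf unambiguously ends a symbol; no separator bits are needed between codes.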

Huffman Decoding

[Figure series: the tree built above, with codes E = 01, I = 00, C = 10, A = 111, H = 110]

Decoding input 1111001, one bit at a time:
1. Read 1, 1, 1 → reach leaf A → output "A", return to root
2. Read 1, 0 → reach leaf C → output "AC", return to root
3. Read 0, 1 → reach leaf E → output "ACE"

Huffman Code Properties

Prefix code
- No code is a prefix of another code
- Example
  - Huffman("dog") → ab
  - Huffman("cat") → abc  // not a legal prefix code, since ab is a prefix of abc
- Decoder can stop as soon as a complete code is found
  - No need for an end-of-code marker

Nondeterministic
- Multiple Huffman codings are possible for the same input
  - Whenever more than two trees share the same minimal weight
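The prefix property is easy to test mechanically (a small illustrative check; the function name is mine):

```python
def is_prefix_code(codes):
    """True iff no codeword is a prefix of another (prefix-free code)."""
    words = list(codes.values())
    return all(not words[j].startswith(words[i])
               for i in range(len(words))
               for j in range(len(words)) if i != j)

# The slide's Huffman code is prefix-free...
print(is_prefix_code({"E": "01", "I": "00", "C": "10", "A": "111", "H": "110"}))
# ...but the dog/cat example is not: "01" is a prefix of "011".
print(is_prefix_code({"dog": "01", "cat": "011"}))
```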

Huffman Code Properties

Greedy algorithm
- Chooses the best local solution at each step
  - Combines the 2 trees with the lowest frequencies
- Still yields the overall best solution
  - Optimal prefix code, based on statistical frequency

Better compression possible (depends on data)
- Using other approaches (e.g., pattern dictionary)