Data Compression Gabriel Laden CS146 – Dr. Sin-Min Lee Spring 2004.



What is Data Compression? Compression is either lossless or lossy; either way, file size is reduced, which saves both time and space (both at a premium). Compression algorithms are more successful when they are based on a statistical analysis of the frequency of the data and on the accuracy needed to represent it.

Examples in computers: JPEG is a compressed image format, MP3 is a compressed audio format, and ZIP is a compressed archive of files. There are many encoding algorithms; we will look at Huffman's algorithm (see our textbook, pp. ).

What is a Greedy Algorithm? Solve a problem in stages, making a locally optimal decision at each stage. The algorithm is good if the local optimum is equal to the global optimum.

Examples of Greedy Algorithms: Dijkstra's shortest paths, Prim's and Kruskal's minimum spanning trees, the Bin Packing problem, Huffman coding.

Problem with Greedy: a greedy algorithm does not always produce the best answer for every data set; there can be conflicts. What if all characters are equally distributed? What if characters are very unequally distributed? A problem from our textbook: if we had such a thing as a 12-cent coin and were asked to make 15 cents in change, the greedy algorithm would produce 1 (12-cent) + 3 (pennies) = 15 cents using four coins, a suboptimal answer, whereas 1 (dime) + 1 (nickel) = 15 cents uses only two coins.
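The coin example above can be demonstrated with a minimal sketch of greedy change-making (the function name and coin sets are illustrative, not from the textbook):

```python
def greedy_change(amount, coins):
    """Make change greedily: always take the largest coin that still fits."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            used.append(coin)
    return used

# With a hypothetical 12-cent coin, greedy uses four coins for 15 cents:
print(greedy_change(15, [1, 5, 10, 12]))  # [12, 1, 1, 1]
# Without it, greedy finds the two-coin optimum:
print(greedy_change(15, [1, 5, 10]))      # [10, 5]
```

The local optimum (grab the 12-cent coin first) is not the global optimum, which is exactly the failure mode the slide describes.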

David Huffman published his paper in 1952: "A Method for the Construction of Minimum-Redundancy Codes". What we call "data compression" is what he termed "minimum redundancy".

ASCII Code: 128 characters, including punctuation. log2 128 = 7 bits, and 1 byte = 8 bits; all characters are 8 bits long, a "fixed-length encoding". "Etaoin shrdlu" — the most common letters in English!

Intro to Huffman Algorithm: a method of constructing an encoding tree. It uses a full binary tree representation; each edge of the tree has a value (0 for the left child, 1 for the right child). Data is stored at the leaves, not at internal nodes. The result is an encoding tree that gives a "variable-length encoding".

Huffman Algorithm (English): 1. Maintain a forest of trees. 2. The weight of a tree is the sum of the frequencies of its leaves. 3. Repeat N−1 times: select the two smallest-weight trees and merge them into a new tree.

Huffman Algorithm (Technical): n ← |C|; Q ← C; for i ← 1 to n−1 do: z ← AllocateNode(); x ← left[z] ← ExtractMin(Q); y ← right[z] ← ExtractMin(Q); f[z] ← f[x] + f[y]; Insert(Q, z). Return ExtractMin(Q).
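The pseudocode above can be run as a short Python sketch, using `heapq` as the priority queue Q. The frequencies and all names here are illustrative assumptions; a leaf is a character and an internal node is a (left, right) tuple:

```python
import heapq

def build_huffman(freqs):
    """Build a Huffman tree from a dict mapping character -> frequency.
    Returns the root: a leaf is a character, an internal node a (left, right) tuple."""
    # Heap entries are (weight, tiebreak, node); the counter breaks ties deterministically.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        fx, _, x = heapq.heappop(heap)  # x <- ExtractMin(Q)
        fy, _, y = heapq.heappop(heap)  # y <- ExtractMin(Q)
        heapq.heappush(heap, (fx + fy, tiebreak, (x, y)))  # f[z] = f[x] + f[y]
        tiebreak += 1
    return heap[0][2]

def code_table(node, prefix=""):
    """Walk the tree: 0 labels the left edge, 1 the right; data sits at leaves."""
    if isinstance(node, tuple):
        return {**code_table(node[0], prefix + "0"),
                **code_table(node[1], prefix + "1")}
    return {node: prefix or "0"}

freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
table = code_table(build_huffman(freqs))
print(table)
```

Whatever the tie-breaking, Huffman's construction always yields a code with the minimum total encoded length for the given frequencies.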

Ambiguity in using a code? If you have an encoded bit string, how do you know where to break it up? The Prefix Coding Rule: no code is a prefix of another. The way the tree is built disallows this; if there is a "00" code, there cannot be a "0" code.
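The prefix rule is what makes decoding unambiguous: scanning left to right, at most one code can ever match. A minimal sketch, using a small hypothetical code table:

```python
def decode(bits, table):
    """Decode a prefix-free bit string by matching codes greedily left to right."""
    inverse = {code: ch for ch, code in table.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # prefix rule: at most one code can match here
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

# Hypothetical prefix-free table: no code is a prefix of another.
print(decode("010110", {"a": "0", "b": "10", "c": "11"}))  # "abca"
```

If "0" and "00" were both codes, the matching above would be ambiguous, which is exactly why the tree construction forbids it.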

[Slide diagram: Step 0 starts with a forest of seven single-leaf trees, one for each of the characters q, w, e, r, t, y, u. Steps 1–3 repeatedly merge the two smallest-weight trees into a new tree.]

[Slide diagram: Steps 3–6 continue merging the two smallest-weight trees until Step 6 leaves a single tree of total weight 92, with all seven characters at the leaves.]

Step 6 yields the final encoding tree (total weight 92). When the tree is used to encode a file, it is written as a header above the body of the encoded bits of text. 0 labels a left edge and 1 a right edge; use a stack to traverse the tree. Table: y = 0000, e = 0001, q = 001, w = 01, t = 10, r = 110, u = 111. Header: 0000y0001e001q01w10t110r111u
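Encoding with the table, and writing the header, can be sketched in a few lines. Serializing codes in lexicographic order is an assumption of mine that happens to match the ordering shown on the slide:

```python
# The code table from the slide's final tree.
table = {"y": "0000", "e": "0001", "q": "001", "w": "01",
         "t": "10", "r": "110", "u": "111"}

def encode(text, table):
    """Concatenate the variable-length code of each character."""
    return "".join(table[ch] for ch in text)

def header(table):
    """Serialize code/character pairs, sorted by code string."""
    return "".join(code + ch
                   for ch, code in sorted(table.items(), key=lambda kv: kv[1]))

print(header(table))   # 0000y0001e001q01w10t110r111u
print(encode("tu", table))  # 10111
```

A decoder reads the header first to rebuild the table, then decodes the body using the prefix rule.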

Proof, part 1. Lemma: let C be an alphabet in which each character c in C has frequency f[c], and let x and y be two characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.

Proof, part 2. Lemma: let T be a full binary tree representing an optimal prefix code over an alphabet C, and let z be the parent of two leaves x and y. Then T′ = T − {x, y} represents an optimal prefix code for C′ = C − {x, y} ∪ {z}.

Lengths of an Encoding Set. [Slide diagram: a perfectly balanced binary tree with 8 leaves, each at depth 3.] The length of the set is (8 leaves) × (3 edges each) = 24 bits. This is what you would get if the symbols are mostly random and equal in probability.

Lengths of an Encoding Set. [Slide diagram: a maximally skewed binary tree with 8 leaves at depths 1, 2, 3, 4, 5, 6, 7, and 7.] The length of the set is 1 + 2 + 3 + 4 + 5 + 6 + 7 + 7 = 35 bits. This is what you would get if the symbols vary the most in probability.

Expected Value per character. In example 1: 8 × (1/2^3) × 3 = 3 bits. In example 2: 2 × (1/2^7 × 7) + (1/2^6 × 6) + (1/2^5 × 5) + (1/2^4 × 4) + (1/2^3 × 3) + (1/2^2 × 2) + (1/2^1 × 1) ≈ 1.98 bits.
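Both expected values can be checked directly; the variable names below are my own:

```python
# Example 1: 8 equally likely symbols (probability 1/8 each), 3-bit codes.
uniform = 8 * (1 / 2**3) * 3

# Example 2: skewed probabilities 1/2, 1/4, ..., 1/64, plus two symbols of
# probability 1/128 that share depth 7 in the skewed tree.
skewed = 2 * (1 / 2**7 * 7) + sum(1 / 2**k * k for k in range(1, 7))

print(uniform)           # 3.0
print(round(skewed, 2))  # 1.98
```

So the skewed distribution averages under 2 bits per character against 3 bits for the uniform one, which is the whole payoff of variable-length coding.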

Main Point: statistical methods work better when the symbols in the data set have varying probabilities. Otherwise you need a different method of compression (for example, JPEG).

Image Compression: "lossy", meaning details are lost. An approximation of the original image is made, in which large areas of similar color are combined into a single block. This introduces a certain amount of error, which is the tradeoff.

Steps to Image Compression: 1. Specify the requested output file size. 2. Divide the image into several areas. 3. Divide the file size budget by the number of areas. 4. Quantize each area (information is lost here). 5. Encode each area separately and write it to the file.
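The quantization step can be illustrated with a toy sketch that averages each 2×2 block of a grayscale image, collapsing areas of similar color to a single value. This is only an illustration of the idea; real codecs such as JPEG quantize DCT coefficients rather than raw pixels:

```python
def quantize_blocks(img, size=2):
    """Replace each size x size block of a grayscale image (list of rows)
    with its average value; fine detail inside each block is lost."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(0, h, size):
        for j in range(0, w, size):
            block = [img[y][x] for y in range(i, min(i + size, h))
                               for x in range(j, min(j + size, w))]
            avg = sum(block) // len(block)  # information is lost here
            for y in range(i, min(i + size, h)):
                for x in range(j, min(j + size, w)):
                    out[y][x] = avg
    return out

img = [[10, 12, 200, 202],
       [11, 13, 201, 203],
       [90, 92,  50,  52],
       [91, 93,  51,  53]]
print(quantize_blocks(img))
```

After quantization each block holds a single repeated value, so the subsequent encoding step compresses it far better than the original pixels.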

Image Decomposition

References: Data Structures & Algorithm Analysis, Mark Allen Weiss; Introduction to Algorithms, Thomas H. Cormen.