Greedy Algorithms (Huffman Coding)

Huffman Coding
- A technique to compress data effectively (original file → compressed file)
- Usually achieves between 20% and 90% compression
- Lossless compression: no information is lost
- When you decompress, you get the original file back

Huffman Coding: Applications
- Saving space: store compressed files instead of original files
- Transmitting files or data: send compressed data to save transmission time and power
- Encryption and decryption: the compressed file cannot be read without knowing the "key"

Main Idea: Frequency-Based Encoding
Assume only 6 characters appear in this file: E, A, C, T, K, N. The frequencies are:

Character   Frequency
E           10,000
A           4,000
C           300
T           200
K           100
N           100

Option 1 (no compression)
Each character = 1 byte (8 bits)
Total file size = 14,700 * 8 = 117,600 bits

Option 2 (fixed-size compression)
We have 6 characters, so 3 bits suffice to encode each one

Character   Fixed Encoding
E           000
A           001
C           010
T           100
K           110
N           111

Total file size = 14,700 * 3 = 44,100 bits

Main Idea: Frequency-Based Encoding (Cont'd)
Same six characters and frequencies as before.

Option 3 (Huffman compression)
Variable-length compression: assign shorter codes to more frequent characters and longer codes to less frequent characters

Character   Huffman Encoding
E           0
A           10
C           110
T           1110
K           11110
N           11111

Total file size:
(10,000 x 1) + (4,000 x 2) + (300 x 3) + (200 x 4) + (100 x 5) + (100 x 5) = 20,700 bits
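The size arithmetic for the three options can be double-checked with a short script (a Python sketch; the frequency and code-length figures are taken straight from the tables above):

```python
# Frequencies from the table (the file has 14,700 characters in total)
freq = {'E': 10_000, 'A': 4_000, 'C': 300, 'T': 200, 'K': 100, 'N': 100}
total_chars = sum(freq.values())                      # 14,700

fixed_8 = total_chars * 8                             # Option 1: 8 bits per character
fixed_3 = total_chars * 3                             # Option 2: 3 bits per character

# Option 3: Huffman code length per character
code_len = {'E': 1, 'A': 2, 'C': 3, 'T': 4, 'K': 5, 'N': 5}
huffman = sum(freq[c] * code_len[c] for c in freq)

print(fixed_8, fixed_3, huffman)   # 117600 44100 20700
```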

Huffman Coding
- A variable-length coding for characters
- More frequent characters → shorter codes
- Less frequent characters → longer codes
- Unlike ASCII coding, where every character has the same code length (8 bits)
Two main questions:
- How to assign codes? (the encoding process)
- How to decode, i.e., regenerate the original file from the compressed file? (the decoding process)

Decoding fixed-length codes is easy
Encoded: 010001100110111000

Character   Fixed-length Encoding
E           000
A           001
C           010
T           100
K           110
N           111

Divide into groups of 3: 010 001 100 110 111 000
Decode: C A T K N E
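Fixed-length decoding is just chunking: split the bit string into 3-bit groups and look each one up (a minimal Python sketch of the example above):

```python
table = {'000': 'E', '001': 'A', '010': 'C', '100': 'T', '110': 'K', '111': 'N'}
bits = '010001100110111000'

# Split into 3-bit chunks and look each chunk up in the table
decoded = ''.join(table[bits[i:i + 3]] for i in range(0, len(bits), 3))
print(decoded)   # CATKNE
```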

Decoding variable-length codes is not that easy…
Encoded: 000001

Character   Variable-length Encoding
E           0
A           00
C           001
…

What does it mean? EEEC? EAC? AEC?
Huffman encoding avoids this ambiguity: the encoded file always has a single decoding.
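The ambiguity can be demonstrated by enumerating every possible parse of the bit string (a Python sketch; the `decodings` helper is illustrative, not from the slides):

```python
def decodings(bits, codes, prefix=()):
    """Yield every way to split `bits` into codewords from `codes`."""
    if not bits:
        yield prefix
        return
    for ch, code in codes.items():
        if bits.startswith(code):
            yield from decodings(bits[len(code):], codes, prefix + (ch,))

codes = {'E': '0', 'A': '00', 'C': '001'}   # not prefix-free!
results = {''.join(p) for p in decodings('000001', codes)}
print(results)   # three valid decodings: EEEC, EAC, AEC
```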

Huffman Algorithm
Step 1: Get frequencies
- Scan the file to be compressed and count the occurrences of each character
- Sort the characters by frequency
Step 2: Build tree & assign codes
- Build a Huffman-code tree (a binary tree)
- Traverse the tree to assign codes
Step 3: Encode (compress)
- Scan the file again and replace each character by its code
Step 4: Decode (decompress)
- The Huffman tree is the key to decompressing the file

Step 1: Get Frequencies
Input file: Eerie eyes seen near lake.
Count the occurrences of each character in the file.
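Counting occurrences is a one-liner in Python (a sketch using `collections.Counter` on the example file):

```python
from collections import Counter

# Tally every character in the input file, including spaces and punctuation
freq = Counter("Eerie eyes seen near lake.")
print(freq['e'], freq[' '], freq['E'])   # 8 4 1
print(sum(freq.values()))                # 26 characters in total
```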

Step 2: Build Huffman Tree & Assign Codes
- The Huffman tree is a binary tree in which each character is a leaf node
- Initially, each node is a separate root
- At each step, select the two roots with the smallest frequencies and connect them to a new parent, breaking ties arbitrarily [the greedy choice]
- The parent gets the sum of the frequencies of its two child nodes
- Repeat until a single root remains
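The merge loop above maps naturally onto a min-heap (a Python sketch; the tuple-based tree representation and the `build_tree` name are illustrative choices, not from the slides):

```python
import heapq
from collections import Counter

def build_tree(text):
    """Return (total_frequency, root) of a Huffman tree for `text`.
    A leaf is a single character; an internal node is a (left, right) pair."""
    freq = Counter(text)
    # Heap entries carry a unique tie-breaker index so tuple comparison
    # never falls through to comparing the nodes themselves.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # smallest frequency
        f2, _, right = heapq.heappop(heap)   # second smallest
        heapq.heappush(heap, (f1 + f2, next_id, (left, right)))
        next_id += 1
    total, _, root = heap[0]
    return total, root

total, root = build_tree("Eerie eyes seen near lake.")
print(total)   # 26 -- the root holds the number of characters in the file
```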

Example: each character starts as a leaf node holding its frequency.

Find the two smallest frequencies and replace them with their parent. [The original slides repeat this merge step, one slide per merge, until only one root remains.]

Now we have a single root: this is the Huffman tree.

Let's Analyze the Huffman Tree
- All characters are at the leaf nodes
- The number at the root = the number of characters in the file
- High-frequency characters (e.g., "e") are near the root
- Low-frequency characters are far from the root

Let's Assign Codes
- Traverse the tree
- Any left edge → add label 0
- Any right edge → add label 1
- The code for each character is its root-to-leaf label sequence
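The labeling rule can be sketched as a recursive traversal (Python; the tuple tree representation and the `assign_codes` name are illustrative assumptions):

```python
def assign_codes(node, prefix=''):
    """Walk the tree: a left edge appends '0', a right edge appends '1'."""
    if isinstance(node, str):            # leaf: a single character
        return {node: prefix or '0'}     # `or '0'` handles a one-character file
    left, right = node
    codes = assign_codes(left, prefix + '0')
    codes.update(assign_codes(right, prefix + '1'))
    return codes

tree = (('T', 'K'), ('A', 'E'))          # a hand-built toy tree
table = assign_codes(tree)
print(table)   # {'T': '00', 'K': '01', 'A': '10', 'E': '11'}
```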

Reading off the root-to-leaf labels for every leaf gives the coding table.

Step 3: Encode (Compress)
Input file: Eerie eyes seen near lake.
Using the coding table, scan the file and replace each character by its code to generate the encoded file: 0000 10 1100 0001 10 …
Notice that no code is a prefix of any other code. This guarantees a unique decoding, unlike the ambiguous variable-length example earlier.
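Encoding is then a straight table lookup; the sketch below also checks the prefix-free property the slide points out (Python; the toy code table matches Option 3 from earlier):

```python
# Toy coding table matching Option 3 earlier (prefix-free by construction)
codes = {'E': '0', 'A': '10', 'C': '110', 'T': '1110', 'K': '11110', 'N': '11111'}

def encode(text):
    """Replace each character by its code and concatenate the bits."""
    return ''.join(codes[ch] for ch in text)

encoded = encode('EAT')
print(encoded)   # 0101110

# No code is a prefix of any other -> decoding will be unique
prefix_free = all(not b.startswith(a)
                  for a in codes.values() for b in codes.values() if a != b)
print(prefix_free)   # True
```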

Step 4: Decode (Decompress)
Decoding requires the encoded file plus the coding tree.
Scan the encoded file:
- For each 0, move left in the tree
- For each 1, move right
- On reaching a leaf node, emit that character and return to the root
This regenerates the original file: 0000 10 1100 0001 … → Eerie …
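The bit-by-bit tree walk can be sketched as follows (Python; the tuple tree representation and the `decode` name are illustrative):

```python
def decode(bits, tree):
    """0 -> go left, 1 -> go right; emit the character at each leaf, restart at the root."""
    out, node = [], tree
    for b in bits:
        node = node[0] if b == '0' else node[1]
        if isinstance(node, str):        # reached a leaf
            out.append(node)
            node = tree                  # go back to the root
    return ''.join(out)

tree = (('T', 'K'), ('A', 'E'))          # toy tree: T=00, K=01, A=10, E=11
print(decode('001011', tree))            # TAE
```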
