Bits and Huffman Encoding

Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Presentation transcript:

Bits and Huffman Encoding: Please get a piece of paper and a pen and put your name and netid on it. Make sure you can turn it in after class without losing your notes.

How is data stored in a computer? Several different ways are possible. In an ASCII-encoded text file (henceforth just a “text file”), each character is stored as an 8-bit chunk. What’s a bit?

Bits Mean What You Want Them To Mean. [Figure: M I K E] But why 8-bit chunks? Couldn’t we interpret the same string as 4-bit chunks? With 8-bit chunks there are 2^8 (256) possible characters. With 4-bit chunks there are 2^4 (16) possible characters.
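A minimal sketch of the chunking idea, assuming the bit string in question is the standard ASCII encoding of the letters MIKE (the helper name `chunks` is made up for this illustration). It reads the same 32 bits first as 8-bit chunks and then as 4-bit chunks.

```python
# Build the bit string for "MIKE" using 8 bits per ASCII character.
# Assumption: the slide's example is the ASCII encoding of these four letters.
bits = "".join(format(ord(ch), "08b") for ch in "MIKE")

def chunks(bit_string, size):
    """Split a bit string into consecutive chunks of `size` bits."""
    return [bit_string[i:i + size] for i in range(0, len(bit_string), size)]

# Interpretation 1: 8-bit chunks, each one an ASCII character code.
as_text = "".join(chr(int(c, 2)) for c in chunks(bits, 8))

# Interpretation 2: 4-bit chunks, each one a number from 0 to 15.
as_nibbles = [int(c, 2) for c in chunks(bits, 4)]

print(bits)
print(as_text)
print(as_nibbles)
print(2 ** 8, 2 ** 4)  # number of distinct values per chunk size
```

The bits never change; only the agreed chunk size and lookup table decide what they mean.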

What if we want to save memory? We could just use fewer bits per chunk, but that limits the number of characters we can represent. But my document contains a lot more ‘a’s than ‘%’s. What if the code for ‘a’ was “01” and the code for ‘%’ was a much longer bit string?

A Problem: Mike decides he wants to compress his vast collection of Harry Potter fanfic (all stored as text files). He notices that the character ‘H’ occurs much more frequently in the text files than the character ‘%’, but they both use 8 bits. So he changes the encoding so that ‘H’ is represented by ‘01’ (2 bits) and ‘%’ is represented by a 16-bit code. Assuming all other characters stay at 8 bits (a bit unlikely, but just pretend), how much more frequent do ‘H’s need to be than ‘%’s for there to be a space savings?
1. There must be more than 3 Hs for every %
2. There must be more than 3 Hs for every 2 %s
3. There must be more than 4 Hs for every 3 %s
4. As long as there are more Hs than %s you get a savings
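One way to check your answer is to count bits directly. The sketch below is only an illustration (the function name and parameters are made up here): each ‘H’ shrinks from 8 bits to 2, and each ‘%’ grows from 8 bits to 16.

```python
def net_bits_saved(h_count, pct_count, h_bits=2, pct_bits=16, original_bits=8):
    """Bits saved overall when 'H' and '%' switch to their new code lengths.

    A positive result means the file got smaller, zero means break-even,
    and a negative result means the file got larger.
    """
    saved = h_count * (original_bits - h_bits)      # each H gets shorter
    added = pct_count * (pct_bits - original_bits)  # each % gets longer
    return saved - added

# Try a few ratios of Hs to %s and compare them with the four options.
for h, pct in [(1, 1), (3, 2), (4, 3), (5, 3), (4, 1)]:
    print(h, pct, net_bits_saved(h, pct))
```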

We can get savings if we know what characters occur frequently. So imagine you look at some set of data with character frequencies. How much can you save?

Consider the string: AACCCAAABAADAE
Letter  Frequency
A       8
C       3
B       1
D       1
E       1
What’s the best encoding?
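The frequency table can be reproduced with a few lines of Python. This is just a convenience sketch using the standard library's `Counter`; it also computes the size of the string at 8 bits per character as a baseline to beat.

```python
from collections import Counter

text = "AACCCAAABAADAE"

# Count how many times each letter occurs in the string.
frequencies = Counter(text)
for letter, count in frequencies.most_common():
    print(letter, count)

# Baseline: the size of the string as an ordinary 8-bit-per-character text file.
fixed_width_bits = 8 * len(text)
print(fixed_width_bits)
```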

Why will this encoding not work?
Letter  Frequency  Encoding
A       8          0
C       3          1
B       1          10
D       1          01
E       1          11
What is the encoding for AAE? What is the encoding for ADC?
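A short sketch of what goes wrong, using the table above (the names `bad_code` and `encode` are made up for this illustration): encode both messages and compare the resulting bit strings.

```python
# The encoding from the slide, where some codes are prefixes of others.
bad_code = {"A": "0", "C": "1", "B": "10", "D": "01", "E": "11"}

def encode(message, code):
    """Concatenate the code word for each character of the message."""
    return "".join(code[ch] for ch in message)

aae = encode("AAE", bad_code)  # 0 + 0 + 11
adc = encode("ADC", bad_code)  # 0 + 01 + 1

print(aae, adc, aae == adc)
```

Because two different messages produce the same bits, a decoder has no way to tell which one was meant.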

Consider the string: AACCCAAABAADAE
The rule: one character’s encoding cannot be the prefix of another. So if A=01, no other encoding can begin with 01.
Letter  Frequency  Encoding
A       8          ?
C       3          ?
B       1          ?
D       1          ?
E       1          ?
Work with your row to find the most efficient possible encoding for each character. I was able to encode the entire string in 25 bits. See if you can do it that efficiently, but be careful with prefixes. Write your encoding, plus your computation of its total length, on your handin sheet.

My Encoding: AACCCAAABAADAE
The rule: one character’s encoding cannot be the prefix of another. So if A=01, no other encoding can begin with 01.
Letter  Frequency  Encoding
A       8          0
C       3          11
B       1          100
D       1          1010
E       1          1011
There are several possible encodings that can give you 25. I’m pretty sure there’s no more efficient encoding, at least as far as compressing individual characters.
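The encoding above can be checked mechanically. The sketch below (helper names are made up for this illustration) verifies the prefix rule, encodes the string, counts the bits, and decodes the result to confirm nothing is lost.

```python
code = {"A": "0", "C": "11", "B": "100", "D": "1010", "E": "1011"}
text = "AACCCAAABAADAE"

def is_prefix_free(code):
    """True if no code word is a prefix of a different entry's code word."""
    words = list(code.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )

def decode(bits, code):
    """Read bits left to right, emitting a letter whenever a code word completes."""
    reverse = {word: letter for letter, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

encoded = "".join(code[ch] for ch in text)
print(is_prefix_free(code))           # the prefix rule holds
print(len(encoded))                   # total bits used
print(decode(encoded, code) == text)  # round trip recovers the string
```

The greedy decoder works only because the code is prefix-free: as soon as the bits read so far match a code word, that match cannot be the start of a longer one.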

Given a set of frequencies, how should you select the character encoding for maximum efficiency? Huffman Encoding

The basic idea: We will build a “Huffman tree” by repeatedly combining the two nodes with the smallest frequencies into a “meta node” whose frequency is their sum. A letter’s position in the tree will tell us what its encoded form is.
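The combining step can be sketched with a priority queue. This is one possible implementation, not necessarily the one in the linked tutorial: it assumes symbols are single-character strings and the frequency table is non-empty, and the choice of which branch gets 0 and which gets 1 is arbitrary.

```python
import heapq
from itertools import count

def huffman_code(frequencies):
    """Return {symbol: bit string} for a non-empty {symbol: frequency} table."""
    tiebreak = count()  # unique counter so equal frequencies never compare trees
    # Heap entries are (frequency, tiebreak, tree); a tree is either a symbol
    # (a leaf) or a (left, right) pair (a "meta node").
    heap = [(freq, next(tiebreak), sym) for sym, freq in frequencies.items()]
    heapq.heapify(heap)

    # Repeatedly merge the two lowest-frequency nodes into one meta node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    _, _, root = heap[0]

    # A symbol's code is the path from the root: 0 for left, 1 for right.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # lone symbol still needs one bit
    walk(root, "")
    return codes

frequencies = {"A": 8, "C": 3, "B": 1, "D": 1, "E": 1}
codes = huffman_code(frequencies)
total_bits = sum(frequencies[s] * len(codes[s]) for s in frequencies)
print(codes)
print(total_bits)
```

Run on the frequencies of AACCCAAABAADAE, this uses 25 bits in total. The individual bit patterns may differ from a hand-built answer, since ties and the 0/1 labels can be resolved in more than one way.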

Generate a Huffman tree and encodings for this circumstance. Follow the tutorial linked off the resources section in Sakai.
Letter  Frequency  Encoding
A       12         ?
C       9          ?
B       8          ?
D       8          ?
E       5          ?
F       5          ?
Write the Huffman tree and the resultant encoding on your handin sheet.

The Header: The Magic Number. Counts (or a tree, for extra credit).
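As a rough illustration of a count-based header, the sketch below writes a magic number followed by one count per possible byte value. Everything specific here is an assumption made for the example: the magic value, the 32-bit big-endian layout, the 256-entry table, and the function names. Use whatever format your assignment actually specifies.

```python
import struct

MAGIC_NUMBER = 0xFACE8200  # hypothetical value, chosen only for this sketch
ALPHABET_SIZE = 256        # assumption: one count per possible byte value
HEADER_FORMAT = f">I{ALPHABET_SIZE}I"  # big-endian: magic, then the counts

def write_header(counts):
    """Pack the magic number followed by the character counts."""
    return struct.pack(HEADER_FORMAT, MAGIC_NUMBER, *counts)

def read_header(data):
    """Check the magic number and return the list of counts."""
    values = struct.unpack_from(HEADER_FORMAT, data)
    magic, counts = values[0], list(values[1:])
    if magic != MAGIC_NUMBER:
        raise ValueError("not a file written by this compressor")
    return counts

counts = [0] * ALPHABET_SIZE
for ch in "AACCCAAABAADAE":
    counts[ord(ch)] += 1

header = write_header(counts)
print(len(header))                    # header size in bytes
print(read_header(header) == counts)  # decoder recovers the same counts
```

With the counts in hand, the decoder can rebuild exactly the same tree the encoder used, provided both sides break ties the same way.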

The Pseudo-EOF character: A made-up character written at the end of the encoded data in your file, so the decoder knows where to stop.
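A sketch of why the pseudo-EOF is needed, under assumptions made for this example: files are written in whole bytes, so the encoded bits get padded, and without a marker the padding would decode as extra characters. The marker name, the code table, and the helper names below are all hypothetical.

```python
PSEUDO_EOF = "<EOF>"  # hypothetical marker; any value that is not a real character

# A prefix-free code with one extra entry for the pseudo-EOF (made up here).
code = {"A": "0", "C": "11", "B": "100", "D": "1010",
        "E": "10110", PSEUDO_EOF: "10111"}

def encode_with_eof(text, code):
    """Encode the text, append the pseudo-EOF, then pad with 0s to a whole byte."""
    bits = "".join(code[ch] for ch in text) + code[PSEUDO_EOF]
    padding = (-len(bits)) % 8
    return bits + "0" * padding

def decode_until_eof(bits, code):
    """Decode bit by bit and stop as soon as the pseudo-EOF is read."""
    reverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        symbol = reverse.get(current)
        if symbol == PSEUDO_EOF:
            return "".join(out)  # ignore any padding bits that follow
        if symbol is not None:
            out.append(symbol)
            current = ""
    raise ValueError("ran out of bits before reaching the pseudo-EOF")

bits = encode_with_eof("AACCCAAABAADAE", code)
print(len(bits))
print(decode_until_eof(bits, code))
```

In this table the padding bit 0 happens to be the code for A, so a decoder with no stop marker would tack an extra A onto the output.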

Please turn your sheet in at the back of the room (try for a neat pile)