File Compression


File Compression

Even though disks have gotten bigger, we are still running short on disk space. A common technique is to compress files so that they take up less space on the disk. We can save space by taking advantage of the fact that most files have a relatively low "information content".

4/7/2019 Compression

Run Length Encoding

The simplest type of redundancy in a file is long runs of repeated characters:

AAAABBBAABBBBBCCCCCCCC

This string can be represented more compactly by replacing each run with a single occurrence of the character and a count:

4A3B2A5B8C

For binary files, a refined version of this method can yield dramatic savings.
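The run-replacement idea above can be sketched in a few lines; the function name `rle_encode` is my own, not from the slides.

```python
# Run-length encoding sketch: replace each run of a repeated
# character with its length followed by the character.
def rle_encode(s: str) -> str:
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:  # scan to the end of the run
            j += 1
        out.append(f"{j - i}{s[i]}")        # emit count + character
        i = j
    return "".join(out)

print(rle_encode("AAAABBBAABBBBBCCCCCCCC"))  # 4A3B2A5B8C
```

Note that on text with few repeats this scheme can *expand* the input (every isolated character costs two output characters), which is why it shines mainly on binary or image data with long runs.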

Variable Length Encoding

Suppose we wish to encode ABRACADABRA. Instead of using the standard 8 (or 16) bits to represent these letters, why not use 3?

A = 000
B = 001
C = 010
D = 011
R = 100

ABRACADABRA then becomes 000001100000010000011000001100000 (33 bits instead of 88).
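A minimal sketch of this fixed-length, 3-bit encoding, using the code table from the slide (the helper name `encode_fixed` is assumed):

```python
# Fixed-length 3-bit codes from the slide.
CODES = {"A": "000", "B": "001", "C": "010", "D": "011", "R": "100"}

def encode_fixed(msg: str) -> str:
    """Concatenate the 3-bit code of each character."""
    return "".join(CODES[ch] for ch in msg)

bits = encode_fixed("ABRACADABRA")
print(bits)       # 000001100000010000011000001100000
print(len(bits))  # 33 bits, versus 88 with 8-bit ASCII
```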

We can do better!!

Why use the same number of bits for each letter? Simply assigning shorter bit strings to common letters is not enough, though: the result is not really a code, because decoding it depends on the blanks between the codewords:

011100101001110
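The ambiguity can be made concrete. The code assignment below (A = 0, B = 1, R = 01) is an illustrative assumption, not the slide's table; it is not prefix-free, so one bit string admits several readings unless blanks mark the boundaries.

```python
# With a non-prefix code, one bit string can decode to several messages.
CODES = {"A": "0", "B": "1", "R": "01"}  # assumed, ambiguous assignment

def decodings(bits: str) -> list[str]:
    """Return every message whose encoding is exactly `bits`."""
    if not bits:
        return [""]
    out = []
    for ch, code in CODES.items():
        if bits.startswith(code):  # try every codeword that could come first
            out.extend(ch + rest for rest in decodings(bits[len(code):]))
    return out

print(decodings("01"))  # both 'AB' and 'R' fit, so blanks would be needed
```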

Consider this Tree

[Slide shows a binary code tree with leaves A, B, C, D, R: reading 0 for each left branch and 1 for each right branch, a letter's code is the path from the root to its leaf.]
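Decoding with such a tree needs no blanks: follow branches until a leaf is reached, emit the letter, and restart at the root. The tree shape below is a hypothetical stand-in (nested tuples, leaves are letters), since the slide's exact diagram is not reproduced here.

```python
# Assumed code tree: (left, right) pairs, letters at the leaves.
# Codes implied here: B=00, D=01, A=10, C=110, R=111.
TREE = (("B", "D"), ("A", ("C", "R")))

def decode(bits: str, tree) -> str:
    out, node = [], tree
    for b in bits:
        node = node[int(b)]        # 0 = left branch, 1 = right branch
        if isinstance(node, str):  # reached a leaf: emit and restart
            out.append(node)
            node = tree
    return "".join(out)

print(decode("0010", TREE))  # 'BA'
```

Because every letter sits at a leaf, no codeword is a prefix of another, so the decoding is unambiguous.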

More Formally

Start with a frequency table (for ABRACADABRA):

A  5
B  2
R  2
C  1
D  1
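The frequency table is one line with the standard library:

```python
# Build the frequency table for ABRACADABRA.
from collections import Counter

freq = Counter("ABRACADABRA")
print(freq)  # Counter({'A': 5, 'B': 2, 'R': 2, 'C': 1, 'D': 1})
```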

More Formally

Create a binary tree out of the two elements with the lowest frequencies. The new node's frequency is the sum of the two frequencies. Add the new node to the frequency table:

   2
  / \
C,1  D,1

More Formally

Repeat until only one element is left in the table. For ABRACADABRA the merges are:

C,1 + D,1  ->  2
B,2 + R,2  ->  4
2   + 4    ->  6
A,5 + 6    ->  11
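The repeat-until-one loop above maps naturally onto a min-heap. This is a minimal sketch under that assumption (the slides do not prescribe a data structure); a counter breaks frequency ties so tuples never compare the tree nodes themselves.

```python
# Huffman merge loop sketched with a min-heap.
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text: str) -> dict[str, str]:
    tie = count()  # tie-breaker for equal frequencies
    heap = [(f, next(tie), ch) for ch, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two lowest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):
            codes[node] = prefix or "0"     # single-symbol edge case
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("ABRACADABRA")
bits = sum(len(codes[ch]) for ch in "ABRACADABRA")
print(bits)  # 23 bits, down from 33 with the fixed 3-bit code
```

Tie-breaking can change which letters get which codewords, but the total of 23 bits is the same for any valid Huffman tree on these frequencies.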

Huffman Coding

The general method for finding this code was developed by D. Huffman in 1952.