Lossless Compression Multimedia Systems (Module 2 Lesson 3)


Lossless Compression Multimedia Systems (Module 2 Lesson 3)

Summary:
- Dictionary-based compression
- Adaptive mechanism
- Lempel-Ziv-Welch (LZW) mechanism

Sources:
- The Data Compression Book, 2nd Ed., Mark Nelson and Jean-Loup Gailly.
- LZW Compression article from Dr. Dobb's Journal: Implementing LZW compression using Java, by Laurence Vanhelsuwé.

We will first look at what constitutes compression with loss and without loss. We will then delve into Shannon's information theory, which forms the basis for the first class of lossless compression mechanisms. As a demonstration of this theory, Shannon and Fano independently developed a coding mechanism which we refer to as Shannon-Fano coding. Later, Huffman devised a coding mechanism which, in addition to being an improvement, was proven to be optimal in terms of code lengths. The material covered here is derived partly from the Data Compression Book by Mark Nelson and Jean-Loup Gailly, and from Dr. Ze-Nian Li's course material at Simon Fraser University in Canada.

Dictionary-Based Compression

The compression algorithms we have studied so far use a statistical model to encode single symbols: they encode each symbol into a bit string that uses fewer bits.

Dictionary-based algorithms do not encode single symbols as variable-length bit strings; they encode variable-length strings of symbols as single tokens. The tokens form an index into a phrase dictionary. If the tokens are smaller than the phrases they replace, compression occurs.

Dictionary-based compression is easy to understand because it uses a strategy that programmers are already familiar with: using indexes into databases to retrieve information from large amounts of storage. Everyday examples include telephone numbers and postal codes.

Dictionary-Based Compression: Example

Consider the Random House Dictionary of the English Language, Second Edition, Unabridged. Using this dictionary, the string:

    A good example of how dictionary based compression works

can be coded as:

    1/1 822/3 674/4 1343/60 928/75 550/32 173/46 421/2

Coding: the dictionary is used as a simple lookup table. Each word is coded as x/y, where x gives the page in the dictionary and y gives the number of the word on that page.

The dictionary has 2,200 pages with fewer than 256 entries per page. Therefore x requires 12 bits and y requires 8 bits, i.e., 20 bits per word (2.5 bytes per word).

Using ASCII coding, the string above requires 48 bytes, whereas our encoding requires only 20 bytes (2.5 bytes × 8 words): better than 50% compression.
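This page/word lookup scheme can be sketched in a few lines of Python. The toy dictionary below is hypothetical: the (page, position) pairs are invented for illustration, not the real Random House entries.

```python
# Toy page/word dictionary: word -> (page, position on page).
# All positions here are made up for illustration.
toy_dict = {
    "a": (1, 1), "good": (822, 3), "example": (674, 4),
    "of": (1343, 60), "how": (928, 75), "dictionary": (550, 32),
    "based": (173, 46), "compression": (421, 2), "works": (300, 7),
}

PAGE_BITS = 12  # enough for 2,200 pages (2^12 = 4096)
WORD_BITS = 8   # fewer than 256 entries per page

def encode(text):
    """Encode each word as an (x, y) = (page, position) pair."""
    return [toy_dict[w.lower()] for w in text.split()]

codes = encode("A good example of how dictionary based compression works")
bits_per_word = PAGE_BITS + WORD_BITS        # 20 bits = 2.5 bytes per word
encoded_bytes = len(codes) * bits_per_word / 8
print(codes[1])        # (822, 3)
print(encoded_bytes)   # 9 words * 2.5 bytes = 22.5 bytes
```

The point of the sketch is only the size arithmetic: every word costs a fixed 2.5 bytes regardless of its length.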

Adaptive Dictionary-Based Compression

Build the dictionary adaptively. This is necessary when the source data is not plain text, say audio or video data, and it tailors the dictionary better to the specific source.

The original methods are due to Ziv and Lempel in 1977 (LZ77) and 1978 (LZ78). Terry Welch improved the scheme in 1984 (called LZW compression). It is used in UNIX compress and in GIF.

- LZ77: a sliding-window technique in which the dictionary consists of a set of fixed-length phrases found in a window into the previously processed text.
- LZ78: instead of using fixed-length phrases from a window into the text, it builds phrases up one symbol at a time, adding a new symbol to an existing phrase when a match occurs.

LZW Algorithm

Preliminaries:
- A dictionary that is indexed by "codes" is used.
- The dictionary is assumed to be initialized with 256 entries (indexed with ASCII codes 0 through 255) representing the ASCII table.
- The compression algorithm assumes that its output is either a file or a communication channel, and that its input is a file or buffer. Conversely, the decompression algorithm assumes that its input is a file or a communication channel and its output is a file or buffer.

[Diagram: file/buffer -> Compression -> compressed file / communication channel -> Decompression -> file/buffer]

LZW Algorithm

LZW Compression:

    set w = NIL
    loop
        read a character k
        if wk exists in the dictionary
            w = wk
        else
            output the code for w
            add wk to the dictionary
            w = k
    endloop

The program reads one character at a time. If the current work string extended by the new character (wk) is in the dictionary, it appends the character to the work string and waits for the next one. This occurs on the first character as well. If the extended work string is not in the dictionary (such as when the second character comes across), it adds the extended work string to the dictionary and sends over the wire (or writes to a file) the code assigned to the work string without the new character. It then sets the work string to the new character.
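The pseudocode above translates almost line-for-line into Python. A minimal sketch, initializing the dictionary with the 256 single-byte ASCII entries as the preliminaries slide assumes:

```python
def lzw_compress(text):
    """LZW compression: emit one integer code per dictionary phrase."""
    dictionary = {chr(i): i for i in range(256)}  # ASCII table, codes 0-255
    next_code = 256
    w = ""          # current work string (the pseudocode's w = NIL)
    output = []
    for k in text:  # read one character at a time
        wk = w + k
        if wk in dictionary:
            w = wk                        # keep growing the match
        else:
            output.append(dictionary[w])  # output the code for w
            dictionary[wk] = next_code    # add wk to the dictionary
            next_code += 1
            w = k
    if w:
        output.append(dictionary[w])      # flush the final match
    return output

print(lzw_compress("^WED^WE^WEE^WEB^WET"))
# [94, 87, 69, 68, 256, 69, 260, 261, 257, 66, 260, 84]
```

The twelve output tokens match the trace on the next slide: seven plain characters (their ASCII values, e.g. 94 for '^') and five dictionary codes (256 and above).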

Example of LZW: Compression

Input string: ^WED^WE^WEE^WEB^WET

    w     k     Output   Index   Symbol
    NIL   ^
    ^     W     ^        256     ^W
    W     E     W        257     WE
    E     D     E        258     ED
    D     ^     D        259     D^
    ^     W
    ^W    E     <256>    260     ^WE
    E     ^     E        261     E^
    ^     W
    ^W    E
    ^WE   E     <260>    262     ^WEE
    E     ^
    E^    W     <261>    263     E^W
    W     E
    WE    B     <257>    264     WEB
    B     ^     B        265     B^
    ^     W
    ^W    E
    ^WE   T     <260>    266     ^WET
    T     EOF   T

    set w = NIL
    loop
        read a character k
        if wk exists in the dictionary
            w = wk
        else
            output the code for w
            add wk to the dictionary
            w = k
    endloop

LZW Algorithm

LZW Decompression:

    read a fixed-length token k (code or char)
    output k
    w = k
    loop
        read a fixed-length token k
        entry = dictionary entry for k
        output entry
        add w + first char of entry to the dictionary
        w = entry
    endloop

The nice thing is that the decompressor builds its own dictionary on its side, one that matches the compressor's exactly, so that only the codes need to be sent. (Note: a complete implementation must also handle the corner case where k is the code just about to be defined; in that case, entry = w + first character of w.)
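A Python sketch of the decompressor, mirroring the pseudocode. It also handles the one corner case the slide's pseudocode omits (a code that refers to the phrase currently being defined, sometimes called the "KwKwK" case):

```python
def lzw_decompress(codes):
    """Rebuild the dictionary on the fly, mirroring the compressor."""
    dictionary = {i: chr(i) for i in range(256)}  # ASCII table, codes 0-255
    next_code = 256
    w = dictionary[codes[0]]
    result = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == next_code:
            # Corner case: the code refers to the phrase being defined
            # right now (the "KwKwK" case), so entry = w + first char of w.
            entry = w + w[0]
        else:
            raise ValueError("bad compressed code: %d" % k)
        result.append(entry)
        dictionary[next_code] = w + entry[0]  # add w + first char of entry
        next_code += 1
        w = entry
    return "".join(result)

print(lzw_decompress([94, 87, 69, 68, 256, 69, 260, 261, 257, 66, 260, 84]))
# ^WED^WE^WEE^WEB^WET
```

Feeding the decompressor the twelve tokens produced on the compression slide recovers the original string exactly; no dictionary ever crosses the wire.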

Example of LZW: Decompression

Input string (to decode): ^WED<256>E<260><261><257>B<260>T

    w     k       Output   Index   Symbol
          ^       ^
    ^     W       W        256     ^W
    W     E       E        257     WE
    E     D       D        258     ED
    D     <256>   ^W       259     D^
    ^W    E       E        260     ^WE
    E     <260>   ^WE      261     E^
    ^WE   <261>   E^       262     ^WEE
    E^    <257>   WE       263     E^W
    WE    B       B        264     WEB
    B     <260>   ^WE      265     B^
    ^WE   T       T        266     ^WET

    read a fixed-length token k (code or char)
    output k
    w = k
    loop
        read a fixed-length token k
        entry = dictionary entry for k
        output entry
        add w + first char of entry to the dictionary
        w = entry
    endloop

LZW Algorithm: Discussion

Where is the compression?
- Original string: ^WED^WE^WEE^WEB^WET
- Encoded token stream: ^WED<256>E<260><261><257>B<260>T
- Plain ASCII coding of the string: 19 × 8 bits = 152 bits
- LZW coding of the string: 12 × 9 bits = 108 bits (7 symbols and 5 codes, each of 9 bits)

Why 9 bits?
- An ASCII character has a value ranging from 0 to 255.
- All tokens have fixed length, and there has to be a distinction in representation between an ASCII character and a code (assigned to strings of length 2 or more).
- Codes can therefore only have values 256 and above: each 9-bit token is either an ASCII character (0 to 255) or a code (256 to 511).
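The token-count arithmetic above can be checked mechanically. A small sketch, assuming the 9-bit fixed-length tokens described on this slide:

```python
original = "^WED^WE^WEE^WEB^WET"
tokens = [94, 87, 69, 68, 256, 69, 260, 261, 257, 66, 260, 84]

ascii_bits = len(original) * 8                    # 19 characters * 8 bits
lzw_bits = len(tokens) * 9                        # 12 tokens * 9 bits each
num_chars = sum(1 for t in tokens if t < 256)     # plain ASCII characters
num_codes = sum(1 for t in tokens if t >= 256)    # dictionary codes

print(ascii_bits, lzw_bits, num_chars, num_codes)  # 152 108 7 5
```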

LZW Algorithm: Discussion (continued)

With 9 bits we can only have a maximum of 256 codes for strings of length 2 or more (codes 256 to 511, with the first 256 values reserved for the ASCII characters).

The original LZW uses a dictionary with 4K (4096) entries, with each symbol/code being 12 bits long: values 0 to 255 are the ASCII characters, and values 256 to 4095 are codes. With 12 bits, we can have a maximum of 2^12 − 256 = 3840 codes.

Practical implementations of the LZW algorithm follow one of two approaches:

- Flush the dictionary periodically, so no codes are wasted on stale phrases.
- Grow the length of the codes as the algorithm proceeds: first start with a length of 9 bits for the codes; once we run out of 9-bit codes (at 511), increase the length to 10 bits; when we run out of 10-bit codes (at 1023), increase the code length to 11 bits, and so on. This approach is more efficient.

[Diagram: 9-bit tokens cover ASCII plus codes 256-511; 10-bit tokens extend the range with codes 512-1023; 11-bit tokens extend it further with codes 1024-2047; each widening is triggered by running out of codes at the current width.]
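The second approach (growing the code length as the dictionary fills) can be sketched as a small helper that picks the current token width. A sketch of the idea only, not tied to any particular LZW implementation:

```python
def code_width(next_code, start_bits=9):
    """Bits needed for tokens while `next_code` is the next free code.

    The width grows by one bit each time the current code space
    (2^bits values) runs out.
    """
    bits = start_bits
    while next_code >= (1 << bits):  # out of codes at this width
        bits += 1
    return bits

# Width stays at 9 bits through code 511, then grows:
print(code_width(256))   # 9
print(code_width(511))   # 9
print(code_width(512))   # 10
print(code_width(1024))  # 11
```

The compressor and decompressor both track `next_code`, so they widen their tokens at exactly the same point in the stream without any extra signaling.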