Text Compression
Spring 2007, CSE, POSTECH


Data Compression
Deals with reducing the size of data
  – Reduce storage space and hence storage cost
      Compression ratio = original data size / compressed data size (larger is better)
  – Reduce time to retrieve and transmit data
File coding is done by a compressor and decoding by a decompressor

Lossless and Lossy Compression
compressedData = compress(originalData)
decompressedData = decompress(compressedData)
When originalData = decompressedData, the compression is lossless.
When originalData != decompressedData, the compression is lossy.
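The lossless condition above can be checked directly. The sketch below is only an illustration: it uses Python's standard zlib module as a stand-in compressor (an assumption, since the slides do not name a library), and the helper name round_trip is hypothetical.

```python
import zlib


def round_trip(original_data: bytes):
    """Compress then decompress; report whether the round trip was lossless."""
    compressed_data = zlib.compress(original_data)
    decompressed_data = zlib.decompress(compressed_data)
    # Lossless means the decompressed bytes equal the original bytes exactly.
    is_lossless = decompressed_data == original_data
    return is_lossless, len(original_data), len(compressed_data)


if __name__ == "__main__":
    sample = b"abababbabaabbabbaabba" * 100  # highly repetitive sample text
    lossless, original_size, compressed_size = round_trip(sample)
    print(lossless, original_size, compressed_size)
```

A lossy compressor would fail the equality check by design, which is why it is unsuitable for text.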

Lossless and Lossy Compression
Lossless compression is essential in applications such as text file compression.
  – e.g., ZIP
Lossy compressors generally obtain much higher compression ratios than do lossless compressors.
  – e.g., JPEG, MPEG
Lossy compression is acceptable in many imaging applications.
  – In video transmissions, a slight loss in the transmitted video is not noticed by the human eye.

Text Compression
Lossless compression is essential in text compression.
Popular text compressors such as Unix compress are based on the LZW (Lempel-Ziv-Welch) method; zip belongs to the same Lempel-Ziv family.
  – The method is simple and employs hashing for storing the code table

LZW Compression
Character strings in the original text are replaced by codes that are mapped dynamically.
The mapping between character strings and their codes is stored in a dictionary (the code table).
Each dictionary entry has two fields: key and code.
The code table is not encoded in the compressed data
  – because it can be reconstructed from the compressed text during decompression

LZW Compression Algorithm
Scan the text from left to right.
Find the longest prefix p of the remaining (not yet compressed) text for which there is a code in the code table.
Represent p by its code pCode.
Assign the next available code number to pc, where c is the next character in the text that is to be compressed.
Repeat until the whole text has been encoded.
See Programs 7.16, 7.17, 7.18, 7.19
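The steps above can be sketched as follows. This is a minimal illustration, not the textbook's Programs 7.16 to 7.19: the function name lzw_compress and its parameters are assumptions, string keys are stored directly in a Python dict, and every character of the text is assumed to belong to the given alphabet.

```python
def lzw_compress(text, alphabet):
    """Return the list of LZW codes for text (a sketch, not a tuned encoder)."""
    # Initial code table: one code per alphabet character, starting at 0.
    code_table = {ch: code for code, ch in enumerate(alphabet)}
    next_code = len(code_table)
    codes = []
    i = 0
    while i < len(text):
        # Find the longest prefix p of the remaining text that has a code.
        p = text[i]
        j = i + 1
        while j < len(text) and p + text[j] in code_table:
            p += text[j]
            j += 1
        # Represent p by its code.
        codes.append(code_table[p])
        # Assign the next available code to pc, if a next character c exists.
        if j < len(text):
            code_table[p + text[j]] = next_code
            next_code += 1
        i = j
    return codes


if __name__ == "__main__":
    print(lzw_compress("abababbabaabbabbaabba", "ab"))
```

For the text abababbabaabbabbaabba over the alphabet {a, b}, this yields the codes 0 1 2 2 3 3 5 8 8.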

LZW Compression Example
Compress abababbabaabbabbaabba
Assume the letters in the text are limited to {a, b}.
  – In practice, the alphabet may be the 256-character ASCII set.
The characters in the alphabet are assigned code numbers beginning at 0.
The initial code table is:
  code  0  1
  key   a  b

LZW Compression Example
Original text = abababbabaabbabbaabba
p = a, pCode = 0, c = b
Represent “a” by 0 and enter “ab” into the code table (code 2).
Compressed text = 0

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 0
p = b, pCode = 1, c = a
Represent “b” by 1 and enter “ba” into the code table (code 3).
Compressed text = 01

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 01
p = ab, pCode = 2, c = a
Represent “ab” by 2 and enter “aba” into the code table (code 4).
Compressed text = 012

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 012
p = ab, pCode = 2, c = b
Represent “ab” by 2 and enter “abb” into the code table (code 5).
Compressed text = 0122

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 0122
p = ba, pCode = 3, c = b
Represent “ba” by 3 and enter “bab” into the code table (code 6).
Compressed text = 01223

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 01223
p = ba, pCode = 3, c = a
Represent “ba” by 3 and enter “baa” into the code table (code 7).
Compressed text = 012233

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 012233
p = abb, pCode = 5, c = a
Represent “abb” by 5 and enter “abba” into the code table (code 8).
Compressed text = 0122335

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 0122335
p = abba, pCode = 8, c = a
Represent “abba” by 8 and enter “abbaa” into the code table (code 9).
Compressed text = 01223358

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text = 01223358
p = abba, pCode = 8, c = null (no characters remain)
Represent “abba” by 8.
Compressed text = 012233588

Code Table Representation
Dictionary
  – Pairs are (key, element) = (key, code).
  – Operations are: get(key) and put(key, code).
Use a hash table
  – But the key has a variable size
  – It takes time to generate a hash value from a variable-length key and to compare it with the actual key
Can we have fixed-length keys? If so, how?

Code Table Representation
Use a hash table
  – Convert variable-length keys into fixed-length keys
  – Each key has the form pc, where the string p is a key that is already in the table
  – Replace the key pc with (pCode)c, e.g., the key aba becomes (2)a because ab has code 2
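A rough sketch of compression with fixed-length keys follows. It assumes the pair (pCode, c) is stored as a Python tuple in a dict, which plays the role of the hash table; the function name lzw_compress_fixed_keys is hypothetical. An implementation in a lower-level language might instead pack the pair into a single integer.

```python
def lzw_compress_fixed_keys(text, alphabet):
    """LZW compression where each table key is the pair (pCode, c)."""
    # Single characters map directly to their initial codes.
    char_code = {ch: code for code, ch in enumerate(alphabet)}
    table = {}  # (pCode, c) -> code assigned to the string pc
    next_code = len(char_code)
    codes = []
    if not text:
        return codes
    p_code = char_code[text[0]]
    for c in text[1:]:
        key = (p_code, c)  # fixed-length key standing for the string pc
        if key in table:
            # pc already has a code, so extend the current prefix.
            p_code = table[key]
        else:
            # p is the longest prefix with a code: emit it and register pc.
            codes.append(p_code)
            table[key] = next_code
            next_code += 1
            p_code = char_code[c]
    codes.append(p_code)  # code for the final prefix
    return codes


if __name__ == "__main__":
    print(lzw_compress_fixed_keys("abababbabaabbabbaabba", "ab"))
```

Because every key is one integer plus one character, hashing and comparing a key takes constant time regardless of how long the string it stands for has grown.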

LZW Decompression
Compressed text = 012233588
Convert codes to text from left to right.
0 represents a.
Decompressed text = a
pCode = 0 and p = a
p = a followed by the next text character (c) is entered into the code table once c is known.

LZW Decompression
Compressed text = 012233588
1 represents b.
Decompressed text = ab
pCode = 1 and p = b
lastP = a followed by the first character of p is entered into the code table (ab gets code 2).

LZW Decompression
Compressed text = 012233588
2 represents ab.
Decompressed text = abab
pCode = 2 and p = ab
lastP = b followed by the first character of p is entered into the code table (ba gets code 3).

LZW Decompression
Compressed text = 012233588
2 represents ab.
Decompressed text = ababab
pCode = 2 and p = ab
lastP = ab followed by the first character of p is entered into the code table (aba gets code 4).

LZW Decompression
Compressed text = 012233588
3 represents ba.
Decompressed text = abababba
pCode = 3 and p = ba
lastP = ab followed by the first character of p is entered into the code table (abb gets code 5).

LZW Decompression
Compressed text = 012233588
3 represents ba.
Decompressed text = abababbaba
pCode = 3 and p = ba
lastP = ba followed by the first character of p is entered into the code table (bab gets code 6).

LZW Decompression
Compressed text = 012233588
5 represents abb.
Decompressed text = abababbabaabb
pCode = 5 and p = abb
lastP = ba followed by the first character of p is entered into the code table (baa gets code 7).

LZW Decompression
Compressed text = 012233588
8 represents ???
When a code is not in the table, its key is lastP followed by the first character of lastP.
lastP = abb, so 8 represents abba.
Decompressed text = abababbabaabbabba
abba is entered into the code table with code 8.

LZW Decompression
Compressed text = 012233588
8 represents abba.
Decompressed text = abababbabaabbabbaabba
pCode = 8 and p = abba
lastP = abba followed by the first character of p is entered into the code table (abbaa gets code 9).
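The decompression walk-through can be sketched in code as below. This is an illustrative version only: the function name lzw_decompress is an assumption, the table stores full strings instead of (pCode, c) pairs, and the one code that may be missing from the table is handled with the lastP rule from the slides.

```python
def lzw_decompress(codes, alphabet):
    """Rebuild the original text from a list of LZW codes (a sketch)."""
    # Initial code table: code -> single character, as in compression.
    code_table = {code: ch for code, ch in enumerate(alphabet)}
    next_code = len(code_table)
    if not codes:
        return ""
    last_p = code_table[codes[0]]
    pieces = [last_p]
    for code in codes[1:]:
        if code in code_table:
            p = code_table[code]
        elif code == next_code:
            # Code not yet in the table: its key is lastP + first char of lastP.
            p = last_p + last_p[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        # Enter lastP followed by the first character of p into the table.
        code_table[next_code] = last_p + p[0]
        next_code += 1
        pieces.append(p)
        last_p = p
    return "".join(pieces)


if __name__ == "__main__":
    print(lzw_decompress([0, 1, 2, 2, 3, 3, 5, 8, 8], "ab"))
```

Applied to the codes 0 1 2 2 3 3 5 8 8 with the alphabet {a, b}, the sketch returns abababbabaabbabbaabba, matching the slides.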

Code Table Representation
Dictionary
  – Pairs are (key, element) = (code, what the code represents) = (code, codeKey).
  – Operations are: get(code) and put(code, codeKey).
Keys are integers 0, 1, 2, …
Use a 1D array codeTable.
  – codeTable[code] = codeKey
  – Each code key has the form pc, where the string p is a code key that is already in the table.
  – Replace pc with (pCode)c.
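With entries stored as (pCode)c, the text for a code is recovered by following pCode links back to a single-character entry. The sketch below assumes a Python list as the 1D array; the names EXAMPLE_TABLE and expand are hypothetical, and the table contents are the entries built in the example for the alphabet {a, b}.

```python
# codeTable for the example: entries 0 and 1 are single characters,
# every later entry is a pair (pCode, c) standing for the string pc.
EXAMPLE_TABLE = [
    "a",        # 0
    "b",        # 1
    (0, "b"),   # 2 = ab
    (1, "a"),   # 3 = ba
    (2, "a"),   # 4 = aba
    (2, "b"),   # 5 = abb
    (3, "b"),   # 6 = bab
    (3, "a"),   # 7 = baa
    (5, "a"),   # 8 = abba
]


def expand(code_table, code, alphabet_size):
    """Rebuild the string for code by walking (pCode, c) entries backwards."""
    chars = []
    while code >= alphabet_size:
        p_code, c = code_table[code]
        chars.append(c)   # characters are collected last-to-first
        code = p_code
    chars.append(code_table[code])  # reached a single-character entry
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(expand(EXAMPLE_TABLE, 8, 2))
```

Each entry occupies a fixed amount of space, and expanding a code costs time proportional to the length of the string it represents.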

Time Complexity
Compression
  – O(n) expected time, where n is the length of the text that is being compressed.
Decompression
  – O(n) time, where n is the length of the decompressed text.

READING
See Programs 7.20, 7.21, 7.22, 7.23, 7.24
Read Section 7.5
Useful site -