Data Compression Reduce the size of data.

Presentation transcript:

Data Compression
Reduce the size of data.
Reduces storage space and hence storage cost. Compression ratio = original data size / compressed data size.
Reduces time to retrieve and transmit data, whether from disk or over the Internet. It can be faster to compress and store (or retrieve and decompress) than to store or retrieve uncompressed data. For example: 1 hour to download from the Internet plus a few seconds to decompress, versus 2 hours to download with no decompression.
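The ratio and the download comparison above can be worked through as a small sketch. The file sizes and the 5-second decompression time are illustrative assumptions, not figures from the slides.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Compression ratio = original data size / compressed data size."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


# Hypothetical sizes in bytes: a file that shrinks to half its size.
ratio = compression_ratio(2_000_000_000, 1_000_000_000)

# Timing from the slide: 2 hours uncompressed vs 1 hour plus decompression.
uncompressed_seconds = 2 * 3600
compressed_seconds = 1 * 3600 + 5  # "a few seconds" taken as 5 s (assumption)
saved_seconds = uncompressed_seconds - compressed_seconds

print(ratio, saved_seconds)
```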

Lossless And Lossy Compression
compressedData = compress(originalData)
decompressedData = decompress(compressedData)
When originalData = decompressedData, the compression is lossless.
When originalData != decompressedData, the compression is lossy.
Lossless compression is a class of data compression algorithms that allows the original data to be perfectly reconstructed from the compressed data. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though this usually improves compression rates (and therefore reduces file sizes). [https://en.wikipedia.org/wiki/Lossless_compression]
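The two definitions can be checked with a short sketch. The lossless half uses Python's standard zlib module (DEFLATE, not LZW) purely as a convenient stand-in compressor. The lossy half uses a toy quantizer; `lossy_compress`, `lossy_decompress`, and the step size are hypothetical names and values chosen for illustration.

```python
import zlib

# Lossless: the round trip reproduces the original bytes exactly.
original_data = b"abababbabaabbabbaabba" * 50
compressed_data = zlib.compress(original_data)
decompressed_data = zlib.decompress(compressed_data)
is_lossless = decompressed_data == original_data


# Lossy (toy example): quantize each sample to a multiple of `step`.
def lossy_compress(samples, step=10):
    return [s // step for s in samples]


def lossy_decompress(codes, step=10):
    return [c * step for c in codes]


samples = [3, 14, 15, 92, 65]
approx = lossy_decompress(lossy_compress(samples))
# approx is close to samples but not equal to it, so this scheme is lossy.
```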

Lossless And Lossy Compression
Lossy compressors generally obtain much higher compression ratios than lossless compressors do: say, 100 versus 2.
Lossless compression is essential in applications such as the compression of text files, source code, and executables.
Lossy compression is acceptable in many imaging, video, and audio (multimedia) applications. In video transmission, a slight loss in the transmitted video is not noticed by the human eye.
In some applications, such as medical imaging, local laws may forbid the use of lossy compression even though you may not be able to see the difference between the original and the lossy image.

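Text Compression
Lossless compression is essential.
Unix's compress is based on the LZW (Lempel-Ziv-Welch) method. Other popular text compressors such as zip use methods from the same Lempel-Ziv family (zip's usual DEFLATE method combines LZ77 with Huffman coding).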

LZW Compression
Character sequences in the original text are replaced by codes that are determined dynamically.
The code table is not encoded into the compressed text, because it can be reconstructed from the compressed text during decompression.

LZW Compression
Assume the letters in the text are limited to {a, b}. In practice, the alphabet may be the 256-character extended ASCII set.
The characters in the alphabet are assigned code numbers beginning at 0.
The initial code table is:
code   key
0      a
1      b

LZW Compression
Original text = abababbabaabbabbaabba
Code table: 0 = a, 1 = b
Compression is done by scanning the original text from left to right.
Find the longest prefix p of the not-yet-compressed text for which there is a code in the code table.
Represent p by its code pCode and assign the next available code number to pc, where c is the character that follows p in the text being compressed.

LZW Compression
Original text = abababbabaabbabbaabba
p = a, pCode = 0, c = b
Represent a by 0 and enter ab into the code table.
Code table: 0 = a, 1 = b, 2 = ab
Compressed text = 0

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text so far = 0
p = b, pCode = 1, c = a
Represent b by 1 and enter ba into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba
Compressed text = 01

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text so far = 01
p = ab, pCode = 2, c = a
Represent ab by 2 and enter aba into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba
Compressed text = 012

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text so far = 012
p = ab, pCode = 2, c = b
Represent ab by 2 and enter abb into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb
Compressed text = 0122

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text so far = 0122
p = ba, pCode = 3, c = b
Represent ba by 3 and enter bab into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab
Compressed text = 01223

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text so far = 01223
p = ba, pCode = 3, c = a
Represent ba by 3 and enter baa into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab, 7 = baa
Compressed text = 012233

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text so far = 012233
p = abb, pCode = 5, c = a
Represent abb by 5 and enter abba into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab, 7 = baa, 8 = abba
Compressed text = 0122335

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text so far = 0122335
p = abba, pCode = 8, c = a
Represent abba by 8 and enter abbaa into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab, 7 = baa, 8 = abba, 9 = abbaa
Compressed text = 01223358

LZW Compression
Original text = abababbabaabbabbaabba
Compressed text so far = 01223358
p = abba, pCode = 8, c = null (end of text)
Represent abba by 8. Nothing is added to the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab, 7 = baa, 8 = abba, 9 = abbaa
Compressed text = 012233588
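The scan traced in the steps above can be sketched in a few lines. This is a minimal illustration, assuming a plain Python dict as the code table and string keys; the function name `lzw_compress` and its `alphabet` parameter are hypothetical, and no limit on the number of codes is applied.

```python
def lzw_compress(text: str, alphabet: str = "ab") -> list[int]:
    """Return the list of LZW codes for text over the given alphabet."""
    if not text:
        return []
    # Initial code table: one code per alphabet character, starting at 0.
    table = {ch: code for code, ch in enumerate(alphabet)}
    codes = []
    p = text[0]                      # current longest matched prefix
    for c in text[1:]:
        if p + c in table:
            p += c                   # keep extending the match
        else:
            codes.append(table[p])   # represent p by pCode
            table[p + c] = len(table)  # next available code goes to pc
            p = c
    codes.append(table[p])           # final prefix, c = null
    return codes


print(lzw_compress("abababbabaabbabbaabba"))
```

On the example text this yields the codes 0, 1, 2, 2, 3, 3, 5, 8, 8, matching the compressed text 012233588 reached in the last step.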

Code Table Representation
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab, 7 = baa, 8 = abba, 9 = abbaa
Dictionary. Pairs are (key, element) = (key, code). Operations are find(key) and insert(key, code).
Limit the number of codes to 2^12.
Use a hash table.
Convert variable-length keys into fixed-length keys. Each key has the form pc, where the string p is a key that is already in the table. Replace pc with (pCode)c.

Code Table Representation
code   key     fixed-length key (pCode)c
0      a
1      b
2      ab      0b
3      ba      1a
4      aba     2a
5      abb     2b
6      bab     3b
7      baa     3a
8      abba    5a
9      abbaa   8a
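A compressor that stores the fixed-length (pCode)c keys instead of whole strings might look like the sketch below. A Python dict stands in for the hash table. The names `lzw_compress_pairs` and `MAX_CODES` are hypothetical, and simply ceasing to add entries once 2^12 codes exist is an assumption, since the slides do not say what happens when the table fills.

```python
MAX_CODES = 2 ** 12  # code limit from the slide


def lzw_compress_pairs(text, alphabet="ab"):
    """Return (codes, table) where table maps (pCode, c) -> code."""
    base = {ch: code for code, ch in enumerate(alphabet)}
    table = {}                        # hash table keyed by (pCode, c)
    next_code = len(alphabet)
    codes = []
    if not text:
        return codes, table
    p_code = base[text[0]]
    for c in text[1:]:
        key = (p_code, c)             # fixed-length stand-in for string pc
        if key in table:
            p_code = table[key]       # pc is known: extend the match
        else:
            codes.append(p_code)
            if next_code < MAX_CODES:  # assumption: stop adding when full
                table[key] = next_code
                next_code += 1
            p_code = base[c]
    codes.append(p_code)
    return codes, table


codes, table = lzw_compress_pairs("abababbabaabbabbaabba")
```

The keys never grow with the length of the matched string, which is what keeps each hash computation constant-time.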

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
Code table: 0 = a, 1 = b
Convert codes to text from left to right.
0 represents a.
Decompressed text = a
pCode = 0 and p = a. The key formed by p = a followed by the next text character (c) is entered into the code table once c is known.

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
1 represents b.
Decompressed text = ab
pCode = 1 and p = b. lastP = a followed by the first character of p is entered into the code table.
Code table: 0 = a, 1 = b, 2 = ab

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
2 represents ab.
Decompressed text = abab
pCode = 2 and p = ab. lastP = b followed by the first character of p is entered into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
2 represents ab.
Decompressed text = ababab
pCode = 2 and p = ab. lastP = ab followed by the first character of p is entered into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
3 represents ba.
Decompressed text = abababba
pCode = 3 and p = ba. lastP = ab followed by the first character of p is entered into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
3 represents ba.
Decompressed text = abababbaba
pCode = 3 and p = ba. lastP = ba followed by the first character of p is entered into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
5 represents abb.
Decompressed text = abababbabaabb
pCode = 5 and p = abb. lastP = ba followed by the first character of p is entered into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab, 7 = baa

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
8 represents ??? Code 8 is not yet in the table.
When a code is not in the table, its key is lastP followed by the first character of lastP. This case arises only when the compressor used a code immediately after creating it, so the key must start with lastP and end with the first character of lastP.
lastP = abb, so 8 represents abba.
Decompressed text = abababbabaabbabba
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab, 7 = baa, 8 = abba

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
8 represents abba.
Decompressed text = abababbabaabbabbaabba
pCode = 8 and p = abba. lastP = abba followed by the first character of p is entered into the code table.
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab, 7 = baa, 8 = abba, 9 = abbaa
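The decompression steps, including the case of a code that is not yet in the table, can be sketched as follows. This is a minimal illustration, assuming the code table is a Python list of strings indexed by code; the name `lzw_decompress` is hypothetical.

```python
def lzw_decompress(codes, alphabet="ab"):
    """Rebuild the text from a list of LZW codes."""
    if not codes:
        return ""
    table = list(alphabet)           # table[code] = key
    last_p = table[codes[0]]
    out = [last_p]
    for code in codes[1:]:
        if code < len(table):
            p = table[code]
        elif code == len(table):
            # Code not in the table yet: key is lastP + first char of lastP.
            p = last_p + last_p[0]
        else:
            raise ValueError(f"invalid code {code}")
        # Enter lastP followed by the first character of p.
        table.append(last_p + p[0])
        out.append(p)
        last_p = p
    return "".join(out)


print(lzw_decompress([0, 1, 2, 2, 3, 3, 5, 8, 8]))
```

Applied to the codes 0, 1, 2, 2, 3, 3, 5, 8, 8, the sketch reproduces the original text abababbabaabbabbaabba, with the first 8 resolved through the special case.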

Code Table Representation
Code table: 0 = a, 1 = b, 2 = ab, 3 = ba, 4 = aba, 5 = abb, 6 = bab, 7 = baa, 8 = abba, 9 = abbaa
Dictionary. Pairs are (key, element) = (code, what the code represents) = (code, codeKey). Operations are find(code) and insert(code, codeKey).
Keys are integers 0, 1, 2, …
Use a 1D array codeTable, with codeTable[code] = codeKey.
Each code key has the form pc, where the string p is a code key that is already in the table. Replace pc with (pCode)c.
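One way to picture the array form is sketched below: alphabet entries hold a single character and every later entry holds a (pCode, c) pair, so a code is expanded by following pCode links back to an alphabet entry. The layout and the name `expand` are illustrative assumptions rather than the slides' implementation.

```python
# codeTable for the example: entries 0 and 1 are the alphabet,
# the rest are (pCode, c) pairs taken from the table above.
code_table = [
    "a", "b",
    (0, "b"), (1, "a"), (2, "a"), (2, "b"),
    (3, "b"), (3, "a"), (5, "a"), (8, "a"),
]
ALPHABET_SIZE = 2


def expand(code, table=code_table, alphabet_size=ALPHABET_SIZE):
    """Return the string that a code represents."""
    chars = []
    while code >= alphabet_size:
        code, c = table[code]        # step back to the prefix code
        chars.append(c)
    chars.append(table[code])        # alphabet entry: a single character
    return "".join(reversed(chars))  # characters were collected backwards


print([expand(k) for k in range(len(code_table))])
```

Each entry takes constant space, at the cost of producing a code's characters in reverse order.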

Time Complexity
Compression: O(n) expected time, where n is the length of the text that is being compressed.
Decompression: O(n) time, where n is the length of the decompressed text.
Compression takes O(n) expected time because the expected complexity of each hash table operation is O(1). Also, since the keys are of the form <code>c, they have a fixed, finite length, so computing the home bucket takes O(1) time.
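The linear bound for compression can be made concrete by counting hash table probes. In the sketch below, which assumes the (pCode)c key scheme and uses a Python dict as the hash table, every character after the first triggers exactly one probe. The name `count_probes` is hypothetical.

```python
def count_probes(text, alphabet="ab"):
    """Count hash table lookups made while compressing text."""
    if not text:
        return 0
    base = {ch: code for code, ch in enumerate(alphabet)}
    table = {}                       # (pCode, c) -> code
    probes = 0
    p_code = base[text[0]]
    for c in text[1:]:
        probes += 1                  # one lookup per remaining character
        key = (p_code, c)
        if key in table:
            p_code = table[key]
        else:
            table[key] = len(base) + len(table)
            p_code = base[c]
    return probes


text = "abababbabaabbabbaabba"
probes = count_probes(text)
```

With n characters there are n - 1 probes, each O(1) in expectation, which gives the O(n) expected total.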