CS 1501: Algorithm Implementation


CS 1501: Algorithm Implementation LZW Data Compression

Data Compression
Reduce the size of data.
Reduces storage space and hence storage cost.
Reduces the time to retrieve and transmit data (retrieval time from disk or over the Internet).
Compression ratio = original data size / compressed data size

Lossless and Lossy Compression
compressedData = compress(originalData)
decompressedData = decompress(compressedData)
Lossless compression: originalData = decompressedData
Lossy compression: originalData ≠ decompressedData

Lossless and Lossy Compression
Lossy compressors generally achieve much higher compression ratios than lossless compressors, e.g. 100 vs. 2.
Lossless compression is essential in applications such as text file compression.
Lossy compression is acceptable in many audio and imaging applications.
In video transmission, a slight loss in the transmitted video is not noticeable to the human eye.
In some applications, such as medical imaging, local laws may forbid lossy compression even when the difference between the original and the lossy image is not visible.

Text Compression
Lossless compression is essential for text.
Popular text compressors such as zip and Unix's compress are based on the LZW (Lempel-Ziv-Welch) method.
LZW Compression
Character sequences in the original text are replaced by codes that are determined dynamically.
The code table is not stored with the compressed text, because it can be reconstructed from the compressed text during decompression.

LZW Compression
Assume the letters in the text are limited to {a, b}. In practice, the alphabet may be the 256-character ASCII set.
The characters in the alphabet are assigned code numbers beginning at 0.
The initial code table is:
code  key
0     a
1     b

Compression
// LZW compress() as presented in lecture; it relies on BinaryStdIn, BinaryStdOut, and TST
// from Sedgewick and Wayne's algs4 library, with R = 256 (alphabet size), W = 12 (codeword
// width in bits), and L = 4096 = 2^W (number of codewords).
public static void compress() {
    String input = BinaryStdIn.readString();
    TST<Integer> st = new TST<Integer>();
    for (int i = 0; i < R; i++)
        st.put("" + (char) i, i);
    int code = R+1;  // R is codeword for EOF

    while (input.length() > 0) {
        String s = st.longestPrefixOf(input);  // Find max prefix match s.
        BinaryStdOut.write(st.get(s), W);      // Print s's encoding.
        int t = s.length();
        if (t < input.length() && code < L)    // Add s to symbol table.
            st.put(input.substring(0, t + 1), code++);
        input = input.substring(t);            // Scan past s in input.
    }
    BinaryStdOut.write(R, W);                  // Write EOF codeword.
    BinaryStdOut.close();
}
Compression takes O(n) expected time, because the expected complexity of the hash table operations is O(1).

LZW Compression
Original text = abababbabaabbabbaabba
code  key
0     a
1     b
Compression is done by scanning the original text from left to right.
Find the longest prefix p for which there is a code in the code table.
Represent p by its code pCode and assign the next available code number to pc, where c is the next character in the text to be compressed.
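To make this left-to-right scan concrete, here is a minimal self-contained sketch of the same loop in plain Java (the class name LZWToy, the HashMap-based table, and the List<Integer> output format are choices made for this illustration, not part of the lecture code). On the walkthrough text below it emits the code sequence 0 1 2 2 3 3 5 8 8.

    import java.util.*;

    public class LZWToy {
        // Compress text over the alphabet {a, b}; codes 0 and 1 are preassigned.
        public static List<Integer> compress(String text) {
            Map<String, Integer> table = new HashMap<>();
            table.put("a", 0);
            table.put("b", 1);
            int nextCode = 2;

            List<Integer> output = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                // Find the longest prefix p of the remaining text that is in the table.
                int j = i + 1;
                while (j < text.length() && table.containsKey(text.substring(i, j + 1)))
                    j++;
                String p = text.substring(i, j);
                output.add(table.get(p));                       // emit pCode
                if (j < text.length())                          // c = next character after p
                    table.put(p + text.charAt(j), nextCode++);  // enter pc into the table
                i = j;                                          // scan past p
            }
            return output;
        }

        public static void main(String[] args) {
            System.out.println(compress("abababbabaabbabbaabba"));
            // prints [0, 1, 2, 2, 3, 3, 5, 8, 8]
        }
    }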

LZW Compression
Original text = abababbabaabbabbaabba
code  key
0     a
1     b
2     ab
p = a, pCode = 0
c = b
Compressed text = 0
Enter pc = ab into the code table.

Original text = abababbabaabbabbaabba
code  key
0     a
1     b
2     ab
3     ba
p = b, pCode = 1
c = a
Compressed text = 01
Enter pc = ba into the code table.

Original text = abababbabaabbabbaabba
code  key
0     a
1     b
2     ab
3     ba
4     aba
p = ab, pCode = 2
c = a
Compressed text = 012
Enter aba into the code table.

Original text = abababbabaabbabbaabba
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
p = ab, pCode = 2
c = b
Compressed text = 0122
Enter abb into the code table.

Original text = abababbabaabbabbaabba
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
p = ba, pCode = 3
c = b
Compressed text = 01223
Enter bab into the code table.

Original text = abababbabaabbabbaabba
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
7     baa
p = ba, pCode = 3
c = a
Compressed text = 012233
Enter baa into the code table.

Original text = abababbabaabbabbaabba
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
7     baa
8     abba
p = abb, pCode = 5
c = a
Compressed text = 0122335
Enter abba into the code table.

Original text = abababbabaabbabbaabba
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
7     baa
8     abba
9     abbaa
p = abba, pCode = 8
c = a
Compressed text = 01223358
Enter abbaa into the code table.

Original text = abababbabaabbabbaabba
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
7     baa
8     abba
9     abbaa
p = abba, pCode = 8
c = null (end of text)
Compressed text = 012233588
No need to enter anything into the table.

Code Table Representation
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
7     baa
8     abba
9     abbaa
The code table is a dictionary. Pairs are (key, element) = (key, code). Operations are get(key) and put(key, code).
Limit the number of codes to 2^12.
Use a hash table.
Convert variable-length keys into fixed-length keys: each key has the form pc, where the string p is a key that is already in the table, so pc may be replaced with (pCode)c.
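As a rough sketch of this fixed-length-key idea (the class name PairKeyTable, the int packing scheme, and the HashMap are illustrative assumptions, not from the slides): pack the pair (pCode, c) into a single int and use that as the hash key.

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: store each multi-character key pc as the fixed-length pair (pCode, c),
    // packed into one int. High bits hold pCode (at most 12 bits here), low 8 bits hold c.
    public class PairKeyTable {
        private final Map<Integer, Integer> table = new HashMap<>();

        private static int pack(int pCode, char c) {
            return (pCode << 8) | (c & 0xFF);
        }

        // put(key, code), where key = pc is represented as (pCode, c)
        public void put(int pCode, char c, int code) {
            table.put(pack(pCode, c), code);
        }

        // get(key): returns the code for pc, or -1 if pc is not in the table
        public int get(int pCode, char c) {
            return table.getOrDefault(pack(pCode, c), -1);
        }
    }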

Code Table Representation
code  key    (pCode)c
0     a
1     b
2     ab     0b
3     ba     1a
4     aba    2a
5     abb    2b
6     bab    3b
7     baa    3a
8     abba   5a
9     abbaa  8a

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
code  key
0     a
1     b
Convert codes to text from left to right.
pCode = 0 represents p = a
DecompressedText = a

Expand
// LZW expand() (decompression) as presented in lecture; it uses BinaryStdIn and BinaryStdOut
// from Sedgewick and Wayne's algs4 library, with R = 256, W = 12, and L = 4096 as in compress().
public static void expand() {
    String[] st = new String[L];
    int i; // next available codeword value

    // initialize symbol table with all 1-character strings
    for (i = 0; i < R; i++)
        st[i] = "" + (char) i;
    st[i++] = "";                        // (unused) lookahead for EOF

    int codeword = BinaryStdIn.readInt(W);
    if (codeword == R) return;           // expanded message is empty string
    String val = st[codeword];

    while (true) {
        BinaryStdOut.write(val);
        codeword = BinaryStdIn.readInt(W);
        if (codeword == R) break;
        String s = st[codeword];
        if (i == codeword) s = val + val.charAt(0);   // special case hack
        if (i < L) st[i++] = val + s.charAt(0);
        val = s;
    }
    BinaryStdOut.close();
}
Decompression takes O(n) time, where n is the length of the decompressed text.

LZW Decompression
Original text = abababbabaabbabbaabba
Compressed text = 012233588
code  key
0     a
1     b
2     ab
pCode = 1 represents p = b
DecompressedText = ab
lastP = a followed by the first character of p is entered into the code table (ab).

Original text = abababbabaabbabbaabba
Compressed text = 012233588
code  key
0     a
1     b
2     ab
3     ba
pCode = 2 represents p = ab
DecompressedText = abab
lastP = b followed by the first character of p is entered into the code table (ba).

Original text = abababbabaabbabbaabba
Compressed text = 012233588
code  key
0     a
1     b
2     ab
3     ba
4     aba
pCode = 2 represents p = ab
DecompressedText = ababab
lastP = ab followed by the first character of p is entered into the code table (aba).

Original text = abababbabaabbabbaabba
Compressed text = 012233588
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
pCode = 3 represents p = ba
DecompressedText = abababba
lastP = ab followed by the first character of p is entered into the code table (abb).

Original text = abababbabaabbabbaabba
Compressed text = 012233588
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
pCode = 3 represents p = ba
DecompressedText = abababbaba
lastP = ba followed by the first character of p is entered into the code table (bab).

Original text = abababbabaabbabbaabba
Compressed text = 012233588
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
7     baa
pCode = 5 represents p = abb
DecompressedText = abababbabaabb
lastP = ba followed by the first character of p is entered into the code table (baa).

Original text = abababbabaabbabbaabba
Compressed text = 012233588
code  key
0     a
1     b
2     ab
3     ra
4     aba
5     abb
6     bab
7     baa
8     abba
What does 8 represent? When a code is not yet in the table, it is the code the compressor entered most recently, and its key is lastP followed by the first character of lastP.
lastP = abb, so 8 represents p = abba.
Decompressed text = abababbabaabbabba

Original text = abababbabaabbabbaabba
Compressed text = 012233588
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
7     baa
8     abba
9     abbaa
pCode = 8 represents p = abba
DecompressedText = abababbabaabbabbaabba
lastP = abba followed by the first character of p is entered into the code table (abbaa).

Code Table Representation
code  key
0     a
1     b
2     ab
3     ba
4     aba
5     abb
6     bab
7     baa
8     abba
9     abbaa
For decompression the dictionary pairs are (code, what the code represents) = (code, key). Operations are get(code) and put(code, key).
Code is an integer 0, 1, 2, ...
Use a 1D array codeTable, e.g. codeTable[code] = key.
Each key has the form pc, where the string p is a key that is already in the table; therefore you may replace pc with (pCode)c.
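Below is a minimal self-contained decoder built on this codeTable idea (the class name LZWToyExpand, the List<Integer> input, and the two-letter starting alphabet are choices made for this illustration, not the lecture code). It includes the special case from the "8" step above: a code not yet in the table must decode to lastP followed by the first character of lastP. On the sequence 0 1 2 2 3 3 5 8 8 it reproduces abababbabaabbabbaabba.

    import java.util.*;

    public class LZWToyExpand {
        // Decompress codes produced over the alphabet {a, b}; codes 0 and 1 are preassigned.
        public static String decompress(List<Integer> codes) {
            List<String> codeTable = new ArrayList<>(Arrays.asList("a", "b"));  // codeTable[code] = key
            StringBuilder out = new StringBuilder();

            String lastP = codeTable.get(codes.get(0));  // the first code is always in the table
            out.append(lastP);

            for (int i = 1; i < codes.size(); i++) {
                int code = codes.get(i);
                String p;
                if (code < codeTable.size()) {
                    p = codeTable.get(code);
                } else {
                    // Special case: code not yet in the table; it must be lastP + first char of lastP.
                    p = lastP + lastP.charAt(0);
                }
                out.append(p);
                codeTable.add(lastP + p.charAt(0));  // enter lastP followed by the first character of p
                lastP = p;
            }
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(decompress(Arrays.asList(0, 1, 2, 2, 3, 3, 5, 8, 8)));
            // prints abababbabaabbabbaabba
        }
    }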

Time Complexity
Compression: expected time O(n), where n is the length of the text being compressed. The time is O(n) expected because the expected complexity of the hash table operations is O(1).
Decompression: O(n), where n is the length of the decompressed text.