Efficient encoding methods
- Coding theory is the study of the properties of codes and their suitability for specific applications.
- Efficient codes are used, e.g., in data compression, cryptography, error correction, and group testing.
- Codes play a central part in information theory, in particular in the design of efficient and reliable data transmission methods.
- Encoding methods focus on redundancy: reducing it in data compression, and using it cleverly in error detection and correction mechanisms.

Data compression
- Data compression is the process of encoding information using fewer bits or other information-bearing units.
- Compression is possible where the input data have statistical redundancy (e.g., in text files) or when relatively minor changes leading to a smaller representation do not affect the quality/fidelity of the input (e.g., in pictures, video, or audio files).
- Popular instances of data compression that many computer users are familiar with are the ZIP file format (text), the JPEG format (pictures) and the MPEG formats (audio and video).

Data compression
- Some compression schemes are reversible, so that the original data can be reconstructed (lossless data compression), while others accept some loss of data in order to achieve higher compression (lossy data compression).
- Compression is important because it helps reduce the consumption of expensive resources, such as disk space or connection bandwidth. However, compression requires increased processing power, which can also be expensive.

Data compression - simple example
- Run-Length Encoding. Data files frequently contain the same character repeated many times in a row. For example, text files use multiple spaces to separate sentences, indent paragraphs, format tables & charts, etc. Digitized signals can also have runs of the same value, indicating that the signal is not changing. For example, an image of the night-time sky would contain long runs of the character or characters representing the black background.

Data compression - simple example
- Run-Length Encoding. In this scheme we focus on long runs of characters. Each time a long run is encountered in the input data, two values are written to the output file: the first is the character itself, which also acts as a flag indicating that run-length compression is starting; the second is the number of characters in the run.
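A minimal sketch of this idea in Python (as a simplification, every maximal run is encoded as a (symbol, count) pair, rather than only sufficiently long runs as on the slide; the function names are illustrative):

    def rle_encode(data):
        # Scan the input and emit one (symbol, run_length) pair per maximal run.
        out = []
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1
            out.append((data[i], j - i))
            i = j
        return out

    def rle_decode(pairs):
        # Expand each (symbol, run_length) pair back into a run.
        return "".join(symbol * count for symbol, count in pairs)

For example, rle_encode("aaab   bb") yields [('a', 3), ('b', 1), (' ', 3), ('b', 2)], and rle_decode recovers the original string.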

Move to Front Transform
- The Move to Front (MTF) transform is an encoding of data (typically a stream of bytes) designed to improve the performance of entropy encoding techniques (coding schemes that assign codes to symbols so as to match code lengths with the probabilities of the symbols).
- When properly implemented, it is fast enough that its benefits usually justify including it as an extra step in data compression algorithms.

Move to Front Transform
- In MTF, each byte value is encoded by its index in a list, which changes over the course of the algorithm.
- The list is initially stored, e.g., in order of byte value (0, 1, 2, 3, ..., 255). Therefore, the first byte is always encoded by its own value.
- However, after a byte is encoded, its value is moved to the front of the list before continuing with the next byte.

Move to Front Transform - example
- Let S = 9, 8, 1, 9 be an input sequence and let the initial content of the queue Q be [0,1,2,3,4,5,6,7,8,9].
- The encoding process transforms S as follows, where each output value is the position of the current symbol in the previous instance of Q:
  encode 9 -> output 9, Q becomes [9,0,1,2,3,4,5,6,7,8]
  encode 8 -> output 9, Q becomes [8,9,0,1,2,3,4,5,6,7]
  encode 1 -> output 3, Q becomes [1,8,9,0,2,3,4,5,6,7]
  encode 9 -> output 2, Q becomes [9,1,8,0,2,3,4,5,6,7]
- The encoded sequence is therefore 9, 9, 3, 2.
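The transform and its inverse can be sketched in a few lines of Python (the alphabet_size parameter and the list-based queue are illustrative choices, not part of any particular implementation):

    def mtf_encode(symbols, alphabet_size=256):
        # Queue Q initially holds 0, 1, ..., alphabet_size-1.
        queue = list(range(alphabet_size))
        output = []
        for s in symbols:
            index = queue.index(s)       # position of s in the current queue
            output.append(index)
            queue.pop(index)             # move s to the front
            queue.insert(0, s)
        return output

    def mtf_decode(indices, alphabet_size=256):
        queue = list(range(alphabet_size))
        output = []
        for index in indices:
            s = queue[index]             # symbol currently stored at that position
            output.append(s)
            queue.pop(index)
            queue.insert(0, s)
        return output

With the example above, mtf_encode([9, 8, 1, 9], alphabet_size=10) returns [9, 9, 3, 2], and mtf_decode inverts it.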

Burrows-Wheeler Transform
- The Burrows-Wheeler transform (BWT), a.k.a. block-sorting compression, is one of the most popular methods in data compression.
- It was invented by Michael Burrows and David Wheeler in the 1990s.
- When a character string is transformed by the BWT, none of its characters change value; the transform merely rearranges the order of the characters in the string in a clever way.
- If the original string has several substrings that occur frequently, then the transformed string will have several places where a single character is repeated multiple times in a row.
- This is useful for compression, since it tends to be easy to compress a string that has runs of repeated characters with techniques such as the move-to-front transform and run-length encoding.

Cyclic rotations
- For 0 ≤ k ≤ n-1, the k-th cyclic rotation of the string w = w[0..n-1] is the string v = v[0..n-1] such that v[i] = w[(i+k) mod n].
- (Figure: if w = x·y with |x| = k, then v = y·x.)
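In Python the k-th cyclic rotation is simply a slice-and-concatenate (a small illustrative helper, not part of the original slides):

    def rotation(w, k):
        # v[i] = w[(i + k) mod n] for a string w of length n
        return w[k:] + w[:k]

For instance, rotation("babbabab", 3) == "bababbab".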

Burrows-Wheeler Transform
- The Burrows-Wheeler transform of the string w = w[0..n-1] is defined as follows:
  1. Create a square matrix M[n x n] in which the k-th row contains the k-th cyclic rotation of w.
  2. Sort the rows of M in lexicographic order.
  3. Store the string given by the last column of M, together with the index of the row that contains the original string w (i.e., the 0-th cyclic rotation).

Burrows-Wheeler Transform (example)
- Consider the 5th Fibonacci word f_5 = babbabab. Sorting its cyclic rotations lexicographically gives the matrix M (the rotation number is shown on the right):
  [0]  a b a b b a b b   (rotation [4])
  [1]  a b b a b a b b   (rotation [1])
  [2]  a b b a b b a b   (rotation [6])
  [3]  b a b a b b a b   (rotation [3])
  [4]  b a b b a b a b   (rotation [0] - the original string)
  [5]  b a b b a b b a   (rotation [5])
  [6]  b b a b a b b a   (rotation [2])
  [7]  b b a b b a b a   (rotation [7])
- The output is the last column read top to bottom, i.e. the string bbbbbaaa, together with the position [4] of the original string.
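A direct Python sketch of this definition (quadratic, for illustration only; the function name is ad hoc):

    def bwt(w):
        # Build all cyclic rotations, sort them, and read off the last column.
        n = len(w)
        rotations = sorted(w[k:] + w[:k] for k in range(n))
        last_column = "".join(row[-1] for row in rotations)
        index = rotations.index(w)       # row holding the 0-th rotation
        return last_column, index

For the Fibonacci word of the example, bwt("babbabab") returns ("bbbbbaaa", 4).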

Burrows-Wheeler Transform
- The Burrows-Wheeler transform can be computed by the algorithms that construct suffix arrays, which means that it can be computed in linear time.
- The Burrows-Wheeler transform is reversible, and the original string can be recovered efficiently, e.g., via generation of consecutive columns of the matrix M.
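The connection to suffix arrays can be sketched as follows. Note the assumptions: this is the sentinel-terminated variant of the BWT (a '$' smaller than all other symbols is appended), which differs slightly from the pure rotation matrix above, and a naive sort stands in for a true linear-time suffix array construction such as SA-IS:

    def bwt_from_suffix_array(s):
        s = s + "$"                                   # unique, smallest sentinel
        n = len(s)
        sa = sorted(range(n), key=lambda i: s[i:])    # suffix array (naive construction)
        # BWT[i] is the character cyclically preceding the i-th smallest suffix.
        return "".join(s[i - 1] if i > 0 else s[-1] for i in sa)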

Burrows-Wheeler (reverse) Transform -- hard way
- The matrix M can be rebuilt column by column. Start with the BWT string bbbbbaaa as a single column and sort its characters; this gives the first column of M.
- Then repeatedly prepend the BWT string as a new leftmost column and sort the rows lexicographically. This successively yields the sorted length-2 prefixes of the rotations (ab, ba, bb, ...), then the length-3 prefixes (aba, abb, bab, bba, ...), and so on, up to length n.
- After n rounds the rows of M are exactly the sorted cyclic rotations of the original string, and the row at the stored index [4] is the original string babbabab.
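The "hard way" translates almost literally into Python; it is clearly inefficient (quadratic space and repeated sorting) and is shown only to make the idea concrete:

    def inverse_bwt_naive(last_column, row_index):
        # Repeatedly prepend the BWT string as a new column and re-sort the rows.
        n = len(last_column)
        rows = [""] * n
        for _ in range(n):
            rows = sorted(last_column[i] + rows[i] for i in range(n))
        return rows[row_index]           # the stored row is the original string

Here inverse_bwt_naive("bbbbbaaa", 4) returns "babbabab".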

Burrows-Wheeler (reverse) Transform -- easy way based on the stable sorting property
- Write the BWT string (the last column) next to its stably sorted copy (the first column). The j-th occurrence of a character in the first column and the j-th occurrence of that character in the last column stand for the same character of the original string.
- This correspondence defines a permutation of the rows. Starting from the stored row index and just following the cycle of this permutation reads off the original string b a b b a b a b.
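The "follow the cycle" idea fits in a few lines of Python: a single stable sort of the positions of the last column gives the correspondence with the first column, and following it from the stored row index reads off the original string (a sketch, with illustrative names):

    def inverse_bwt(last_column, row_index):
        n = len(last_column)
        # Stable sort of positions by character: order[j] is the row of the last
        # column that corresponds to row j of the first (sorted) column.
        order = sorted(range(n), key=lambda i: last_column[i])
        out = []
        i = row_index
        for _ in range(n):
            i = order[i]
            out.append(last_column[i])
        return "".join(out)

Again, inverse_bwt("bbbbbaaa", 4) returns "babbabab".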

Lempel-Ziv-Welch Compression
- The Lempel-Ziv-Welch (LZW) compression algorithm is an example of the dictionary-based methods, in which longer fragments of the input text are replaced by much shorter references to code words stored in a special set called the dictionary.
- LZW is an implementation of a lossless data compression algorithm developed by Abraham Lempel and Jacob Ziv.
- It was published by Terry Welch in 1984 as an improved version of the LZ78 dictionary coding algorithm developed by Lempel and Ziv.

LZW Compression
- The key insight of the method is that it is possible to automatically build a dictionary of previously seen strings in the text being compressed.
- The dictionary starts off with 256 entries, one for each possible character (single-byte strings).
- Every time a string not already in the dictionary is seen, a longer string, consisting of that string appended with the single character following it in the text, is stored in the dictionary.

LZW Compression
- The output consists of integer indices into the dictionary. These are initially 9 bits each and, as the dictionary grows, can increase up to 16 bits.
- A special symbol is reserved for "flush the dictionary", which takes the dictionary back to the original 256 entries and 9-bit indices. This is useful when compressing a text with variable characteristics, since a dictionary built from early material may not be of much use later in the text.
- This use of variably increasing index sizes is one of Welch's contributions. Another was to specify an efficient data structure to store the dictionary.

LZW Compression - example
- Fibonacci language: w_{-1} = a, w_{-2} = b, and w_i = w_{i-1}·w_{i-2} for subsequent i. For example, w_6 = babbababbabba.
- We show how LZW compresses babbababbabba. (Figure: the word, extended by a short virtual part past its end, is parsed into the code words CW_0, CW_1, CW_2, CW_3, CW_4, CW_5 created during compression.)
- In general, CW_i = CW_j ∘ First(CW_{i+1}) for some j < i; in particular, CW_4 = CW_3 ∘ First(CW_5).

LZW Compression - example
- The code words created while compressing babbababbabba are:
  cw_{-2} = b, cw_{-1} = a (the initial single-character entries),
  cw_0 = ba, cw_1 = ab, cw_2 = bb, cw_3 = bab, cw_4 = babb, cw_5 = babba.
- (Figure: the dictionary drawn as a trie with edges labelled a and b, leading from cw_{-2} and cw_{-1} down to cw_5.)

LZW Compression - compression stage

    cw ← ε;
    while ( read next symbol s from IN )
        if cw·s exists in the dictionary then
            cw ← cw·s;
        else
            add cw·s to the dictionary;
            save the index of cw in OUT;
            cw ← s;
    save the index of cw in OUT;   // flush the last code word
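The same compression stage, written out as a runnable Python sketch (a growing dict plays the role of the dictionary; the variable-width 9-16 bit packing and the flush symbol from the earlier slide are left out):

    def lzw_compress(text):
        dictionary = {chr(i): i for i in range(256)}   # single-character code words
        next_code = 256
        cw = ""
        out = []
        for s in text:
            if cw + s in dictionary:
                cw = cw + s                            # extend the current code word
            else:
                out.append(dictionary[cw])             # save the index of cw
                dictionary[cw + s] = next_code         # add cw·s to the dictionary
                next_code += 1
                cw = s
        if cw:
            out.append(dictionary[cw])                 # flush the last code word
        return out

For the Fibonacci word, lzw_compress("babbababbabba") returns [98, 97, 98, 256, 259, 260, 97], i.e. the phrases b, a, b, ba, bab, babb, a.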

Decompression stage
Input IN - compressed file of integers; |IN| = Z is the size of the compressed file.
Output OUT - decompressed file of characters.
- Copy all numbers from file IN to a vector V[256 .. Z+255]
- Create a vector F[256 .. Z+255] containing the first character of each code word
- Create a vector CW[256 .. Z+255] of all code words:
    for i = 256 to Z+255 do
        if V[i] < 256 then CW[i] ← Concatenate(char(V[i]), F[i+1])
        else CW[i] ← Concatenate(CW[V[i]], F[i+1])
- Write to the output file OUT all code words without their last symbols
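For comparison, here is a standard online LZW decompressor in Python; it is a different but equivalent formulation of the vector-based reconstruction above, rebuilding the dictionary while reading the codes instead of in a separate pass:

    def lzw_decompress(codes):
        if not codes:
            return ""
        dictionary = {i: chr(i) for i in range(256)}
        next_code = 256
        prev = dictionary[codes[0]]
        out = [prev]
        for code in codes[1:]:
            if code in dictionary:
                entry = dictionary[code]
            elif code == next_code:
                entry = prev + prev[0]      # the only code that may not be known yet
            else:
                raise ValueError("invalid LZW code")
            out.append(entry)
            dictionary[next_code] = prev + entry[0]   # previous phrase + first char of this one
            next_code += 1
            prev = entry
        return "".join(out)

A quick round trip: lzw_decompress(lzw_compress("babbababbabba")) gives back "babbababbabba".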

LZW text compression
- Theorem: For any input string S, the LZW algorithm computes its compressed counterpart in time O(n), where n is the length of S.
- Sketch of proof: The most complex operations are those performed on the dictionary. With the help of hash tables, all of them can be performed in linear time overall.
- The decompression stage is also linear.