EE465: Introduction to Digital Image Processing - Binary Image Compression


Binary Image Compression
- The art of modeling an image source: image pixels are NOT independent events
- Run-length coding of binary and graphic images (applications: BMP, TIF/TIFF)
- Lempel-Ziv coding*: how can the idea of building a dictionary achieve data compression?

From Theory to Practice
- So far, we have discussed the problem of data compression under the assumption that the source distribution is known (i.e., the probabilities of all possible events are given).
- In practice, the source distribution (probability model) is unknown and can only be estimated approximately, e.g., by counting relative frequencies; the estimate is further complicated by the inherent dependency among source data.

An Image Example
Binary image of size 100 x 100 (approximately 1000 dark pixels)

A Bad Model
- Each pixel in the image is observed as an independent event
- All pixels can be characterized by a single discrete random variable (a binary Bernoulli source)
- We can estimate the probabilities by counting relative frequencies

Synthesized Image by the Bad Model
(Figure: an image synthesized from the i.i.d. binary Bernoulli distribution with P(X = black) = 0.1)
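As a concrete illustration of the bad model, the sketch below estimates P(X = black) by relative frequency counting and then synthesizes an i.i.d. Bernoulli image. The function names and the toy block-shaped original are my own illustrations, not from the slides; the point is that the synthesized image has roughly the right fraction of dark pixels but none of the spatial structure of the original.

```python
import numpy as np

def fit_bernoulli(binary_image):
    """Estimate P(X = black) by relative frequency counting (black = 1)."""
    return binary_image.mean()

def synthesize_bernoulli(p_black, shape, seed=0):
    """Draw every pixel independently from the same Bernoulli distribution."""
    rng = np.random.default_rng(seed)
    return (rng.random(shape) < p_black).astype(np.uint8)

# Toy 100x100 original with ~1000 dark pixels arranged as a block (structured),
# standing in for the binary image example on the slide.
original = np.zeros((100, 100), dtype=np.uint8)
original[40:65, 30:70] = 1                      # 25 * 40 = 1000 black pixels

p_hat = fit_bernoulli(original)                 # about 0.1
synthetic = synthesize_bernoulli(p_hat, original.shape)
print(f"estimated P(black) = {p_hat:.3f}")
print(f"black pixels: original = {original.sum()}, synthetic = {synthetic.sum()}")
```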

Why Does It Fail?
- Roughly speaking
  - Pixels in an image are not independent events (the source is not memoryless but spatially correlated)
  - Pixels in an image do not obey the same probability model (the source is not stationary but spatially varying)
- Fundamentally speaking
  - Pixels are the projection of meaningful objects in the real world (e.g., characters, lady, flowers, cameraman, etc.)
  - Our sensory processing system has learned to understand meaningful patterns in sensory input through evolution

Similar Examples in 1D
- Scenario I: think of a paragraph of English text. The letters are NOT independent, due to semantic structure. For example, the probability of seeing a "u" is typically small; however, if a "q" appears, the probability that the next letter is "u" is large.
- Scenario II: think of a piece of human speech. It consists of silent and voiced segments. The silent segments can be modeled by white Gaussian noise, while the voiced segments cannot (e.g., they contain pitch).

Data Compression in Practice
(Block diagram: discrete source X -> source modeling -> Y -> entropy coding -> binary bit stream, with a probability estimate P(Y) driving the entropy coder)
- The art of data compression is the art of source modeling
- Probabilities can be estimated by counting relative frequencies, either online or offline

Source Modeling Techniques
- Transformation: transform the source into an equivalent yet more convenient representation
- Prediction: predict the future based on the causal past
- Pattern matching: identify and represent repeated patterns

Non-image Examples
- Transformation in audio coding (MP3): audio samples are transformed into the frequency domain
- Prediction in speech coding (CELP): the human vocal tract can be approximated by an auto-regressive model
- Pattern matching in text compression: search for repeated patterns in the text

How to Build a Good Model
- Study the source characteristics
  - The origin of the data: computer-generated, recorded, scanned ...
  - It involves linguistics, physics, material science and so on ...
- Choose or invent the appropriate tool (modeling technique)
Q: why is pattern matching suitable for text, but not for speech or audio?

Image Modeling
- Binary images: scanned documents (e.g., FAX) or computer-generated images (e.g., line drawings in WORD)
- Graphic images: Windows icons, web banners, cartoons
- Photographic images: acquired by digital cameras
- Others: fingerprints, CT images, astronomical images ...

Lossless Image Compression
- No information loss: the decoded image is mathematically identical to the original image
  - For some sensitive data such as documents or medical images, any information loss is unacceptable
  - For others such as photographic images, we only care about the subjective quality of the decoded image (not the fidelity to the original)

Binary Image Compression
- The art of modeling an image source: image pixels are NOT independent events
- Run-length coding of binary and graphic images (applications: BMP, TIF/TIFF)
- Lempel-Ziv coding*: how can the idea of building a dictionary achieve data compression?

Run-Length Coding (RLC)
What is a run length? A run length is defined as the number of consecutively identical symbols.
Examples:
- Coin flips: HHHHH T HHHHHHH -> run lengths 5, 1, 7
- Random walk: SSSS EEEE NNNN WWWW -> run lengths 4, 4, 4, 4
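A minimal run-length encoder/decoder (illustrative Python, names are mine and not from the slides) reproduces the two examples above:

```python
from itertools import groupby

def run_length_encode(symbols):
    """Turn a sequence into (symbol, run_length) pairs."""
    return [(sym, sum(1 for _ in run)) for sym, run in groupby(symbols)]

def run_length_decode(pairs):
    """Invert the encoding: expand each (symbol, run_length) pair."""
    return "".join(sym * length for sym, length in pairs)

print(run_length_encode("HHHHHTHHHHHHH"))      # [('H', 5), ('T', 1), ('H', 7)]
print(run_length_encode("SSSSEEEENNNNWWWW"))   # [('S', 4), ('E', 4), ('N', 4), ('W', 4)]
assert run_length_decode(run_length_encode("HHHHHTHHHHHHH")) == "HHHHHTHHHHHHH"
```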

Run-Length Coding (Con't)
(Block diagram: discrete source X -> transformation by run-length counting -> Y -> entropy coding -> binary bit stream, with a probability estimate P(Y) driving the entropy coder)
Y is the sequence of run-lengths, from which X can be recovered losslessly.

RLC of 1D Binary Source
X (binary sequence) -> Y (sequence of run-lengths) -> Huffman coding -> compressed data
(one extra bit is needed to denote what the starting symbol is)
Properties:
- "0" run-lengths and "1" run-lengths alternate
- run-lengths are positive integers
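A sketch of this binary RLC front end (illustrative code, not the course's reference implementation): the output is the starting symbol plus the alternating run-lengths, which a Huffman coder would then compress.

```python
from itertools import groupby

def binary_rlc_encode(bits):
    """Return (starting_symbol, run_lengths) for a binary string such as '0001101'."""
    runs = [sum(1 for _ in run) for _, run in groupby(bits)]
    return bits[0], runs

def binary_rlc_decode(start, runs):
    """Rebuild the binary string; runs alternate between the two symbols."""
    out, sym = [], start
    for r in runs:
        out.append(sym * r)
        sym = "1" if sym == "0" else "0"
    return "".join(out)

start, runs = binary_rlc_encode("000001100000")
print(start, runs)                    # 0 [5, 2, 5]
assert binary_rlc_decode(start, runs) == "000001100000"
```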

Variation of 1D Binary RLC
When P(X=0) is close to 1, we can record the run-lengths of the dominant symbol ("0") only.
Example: run-lengths 5, 7, 4, 0, 8, ... (the number of "0"s preceding each "1"; a run-length of 0 means two "1"s are adjacent)
Properties:
- all coded symbols are "0" run-lengths
- a run-length is a nonnegative integer
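Under this reading of the variant (count the zeros before each "1"), a minimal sketch looks as follows; the function names are illustrative:

```python
def zero_run_encode(bits):
    """Record only the lengths of '0' runs that precede each '1'."""
    runs, count = [], 0
    for b in bits:
        if b == "0":
            count += 1
        else:            # hit a '1': emit the zero-run before it
            runs.append(count)
            count = 0
    return runs          # trailing zeros after the last '1' are ignored in this sketch

def zero_run_decode(runs):
    return "".join("0" * r + "1" for r in runs)

src = "000001" + "00000001" + "00001" + "1" + "000000001"
runs = zero_run_encode(src)
print(runs)              # [5, 7, 4, 0, 8]
assert zero_run_decode(runs) == src
```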

Modeling Run-Lengths
Geometric source: P(X=k) = (1/2)^k, k = 1, 2, ...
Run-length k:  1    2    3    4     5     ...
Probability:   1/2  1/4  1/8  1/16  1/32  ...

Golomb Codes
Optimal VLC for the geometric source P(X=k) = (1/2)^k, k = 1, 2, ...
k:        1  2   3    4     ...
codeword: 0  10  110  1110  ... (the unary code: k-1 ones followed by a zero)
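A small sketch (bit convention assumed: k-1 ones then a zero) that generates these codewords and checks that the expected code length matches the source entropy, so the code is indeed optimal for this geometric distribution:

```python
import math

def unary_codeword(k):
    """Golomb code with parameter 1 (unary): k-1 ones followed by a zero."""
    return "1" * (k - 1) + "0"

# Geometric source: P(X=k) = (1/2)^k, k = 1, 2, ...
ks = range(1, 30)
probs = [0.5 ** k for k in ks]
avg_len = sum(p * len(unary_codeword(k)) for k, p in zip(ks, probs))
entropy = -sum(p * math.log2(p) for p in probs)

for k in range(1, 5):
    print(k, unary_codeword(k))        # 1 0 / 2 10 / 3 110 / 4 1110
print(f"average length ~ {avg_len:.4f} bits, entropy ~ {entropy:.4f} bits")
# both approach 2 bits/symbol, so the unary code meets the entropy bound here
```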

From 1D to 2D
(Figure: white run-lengths and black run-lengths on successive scan lines of a binary image)
Question: what distinguishes 2D from 1D coding?
Answer: the inter-line dependency of run-lengths

Relative Address Coding (RAC)*
(Figure: transitions on the current line are coded relative to those on the previous line, e.g., d1 = 1, d2 = -2, NS with run = 2; NS = New Start)
A variation of RAC was adopted by CCITT for fax transmission.

Image Example
CCITT test image No. 1, size 1728 x ... pixels, raw data at 1 bit per pixel
filesize of ccitt1.pbm (raw) vs. filesize of ccitt1.tif (compressed): compression ratio = 13.65

Graphic Images (Cartoons)
Observations:
- a dominant background color (e.g., white)
- objects contain only a few other colors
A total of 12 different colors (black, white, red, green, blue, yellow, ...)

Palette-based Image Representation
index  color
0      white
1      black
2      red
3      green
4      blue
5      yellow
...    ...
Any 24-bit (R,G,B) color can be represented by its index in the palette.

RLC of 1D M-ary Source
Basic idea: record not only the run-lengths but also the color indexes.
Example color sequence: WWWWW GGG WWWWWWW RRRR B ...  (W = white, G = green, R = red, B = black)
run-lengths: 5, 3, 7, 4, 1, 8, ...
(color, run) representation: (0,5) (3,3) (0,7) (2,4) (1,1) (0,8) ...
Note: run-length is a positive integer
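A sketch of (color, run) encoding for a palette-indexed row; the index assignments follow the palette slide above, and the code names are my own:

```python
from itertools import groupby

PALETTE = {"W": 0, "B": 1, "R": 2, "G": 3}   # white, black, red, green (subset of the palette)

def color_run_encode(pixels):
    """Encode a row of palette symbols as (color_index, run_length) pairs."""
    return [(PALETTE[sym], sum(1 for _ in run)) for sym, run in groupby(pixels)]

row = "WWWWW" + "GGG" + "WWWWWWW" + "RRRR" + "B" + "WWWWWWWW"
print(color_run_encode(row))
# [(0, 5), (3, 3), (0, 7), (2, 4), (1, 1), (0, 8)]
```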

Variation of 1D M-ary RLC
When P(X=0) is close to 1, we can record the run-lengths of the dominant symbol ("0") only.
Example:
run:   5, 7, 4, 0, 8, ...
level: 1, 4, 3, 2, 1, ...
(run, level) representation: (5,1) (7,4) (4,3) (0,2) (8,1) ...
Properties:
- "0" run-lengths only
- run-length is a nonnegative integer
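Under the same reading as the binary variant (count the zeros before each nonzero symbol, and record that symbol's value as the level), a short illustrative sketch:

```python
def run_level_encode(symbols):
    """Encode a sequence dominated by 0s as (zero_run, level) pairs."""
    pairs, run = [], 0
    for s in symbols:
        if s == 0:
            run += 1
        else:
            pairs.append((run, s))   # level = the nonzero value ending the zero run
            run = 0
    return pairs

def run_level_decode(pairs):
    out = []
    for run, level in pairs:
        out.extend([0] * run + [level])
    return out

seq = [0]*5 + [1] + [0]*7 + [4] + [0]*4 + [3] + [2] + [0]*8 + [1]
print(run_level_encode(seq))   # [(5, 1), (7, 4), (4, 3), (0, 2), (8, 1)]
assert run_level_decode(run_level_encode(seq)) == seq
```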

Image Example
Raw data: party8.ppm, 526 x 286 pixels
Compressed: party8.bmp
Compression ratio = 26.4

Binary Image Compression
- The art of modeling an image source: image pixels are NOT independent events
- Run-length coding of binary and graphic images (applications: BMP, TIF/TIFF)
- Lempel-Ziv coding*: how can the idea of building a dictionary achieve data compression?

History of Lempel-Ziv Coding
- Invented by Lempel and Ziv (LZ77 in 1977, LZ78 in 1978); numerous variations and improvements since then
- Widely used in different applications
  - Unix systems: the compress command
  - WinZip software (LZW algorithm)
  - TIF/TIFF image format
  - Dial-up modems (to speed up transmission)

Dictionary-based Coding
- Use a dictionary: think about the evolution of an English dictionary
  - It is structured: if every random combination of letters formed a word, the dictionary would not exist
  - It is dynamic: more and more words are added to the dictionary as time moves on
- Data compression is similar in the sense that redundancy reveals itself as repeated patterns, just like English words in a dictionary

Toy Example
I took a walk in town one day / And met a cat along the way. / What do you think that cat did say? / Meow, Meow, Meow
I took a walk in town one day / And met a pig along the way. / What do you think that pig did say? / Oink, Oink, Oink
I took a walk in town one day / And met a cow along the way. / What do you think that cow did say? / Moo, Moo, Moo
- from "Wee Sing for Baby"
Dictionary (entry : pattern):
1 : I took a walk in town one day
2 : And met a
3 : What do you think that
4 : did say?
5 : along the way
6 : cat
7 : meow
...

Basic Ideas
- Build up the dictionary on the fly (based on the causal past, so that the decoder can duplicate the process)
- Achieve compression by replacing a repeated string with a reference to an earlier occurrence
- Unlike VLC (fixed-to-variable), LZ parsing goes the other way (variable-to-fixed)

Lempel-Ziv Parsing
- Initialization:
  - D = {all single-symbol strings}
  - L is the smallest integer such that all codewords whose length is smaller than L have appeared in D (L = 1 at the beginning)
- Iterations: w <- next parsed block of L symbols
  - Rule A: if w is in D, represent w by its entry in D and update D by adding a new entry, w concatenated with its next input symbol
  - Rule B: if w is not in D, represent the first L-1 symbols of w by their entry in D and update D by adding w as a new entry

Example of Parsing a Binary Stream
Initial dictionary (entry : pattern): 1 : 0, 2 : 1
Illustration (variable-length input blocks w -> fixed-length output of dictionary entries):
Step 1: w = 0, Rule A, output 1, add 01 to D, L = 2
Step 2: w = 11, Rule B, output 2, add 11 to D
Step 3: w = 10, Rule B, output 2, add 10 to D
Step 4: w = 00, Rule B, output 1, add 00 to D
Step 5: w = 01, Rule A, output 3, add 011 to D, L = 3
Step 6: w = 110, Rule B, output 4, add 110 to D
Parsed blocks w: 0, 11, 10, 00, 01, 110, 011, ...
Rules applied: A, B, B, B, A, B, A, ...
Output (entries in D): 1, 2, 2, 1, 3, 4, 7, ...
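The Python sketch below is one reading of the parsing rules above: it assumes the symbol peeked in Rule A is not consumed, that L grows by one after each Rule A step, and that the Rule B prefix is already in D. Under those assumptions, and with an input string of my own choosing (the slide's input was not recovered), it reproduces the six illustrated steps; all names are illustrative.

```python
def lz_parse(bits):
    """Variable-to-fixed LZ parsing sketch following the slide's Rule A / Rule B."""
    D = {"0": 1, "1": 2}                       # initial dictionary: all single symbols
    next_entry, L, pos, output = 3, 1, 0, []
    while pos + L <= len(bits):
        w = bits[pos:pos + L]
        if w in D:                             # Rule A
            output.append(D[w])
            if pos + L < len(bits):            # peek (but do not consume) the next symbol
                D[w + bits[pos + L]] = next_entry
                next_entry += 1
            pos += L
            L += 1                             # assumed update rule for L
        else:                                  # Rule B (assumes w[:L-1] is already in D)
            output.append(D[w[:L - 1]])
            D[w] = next_entry
            next_entry += 1
            pos += L - 1
    return output, D

out, D = lz_parse("011001110")
print(out)   # [1, 2, 2, 1, 3, 4]  -- matches Steps 1-6 on the slide
print(D)     # gains 01, 11, 10, 00, 011, 110 as entries 3..8
```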

Binary Image Compression Summary
- Theoretical aspect
  - Shannon's source entropy formula, H(X) = -sum_i p_i log2 p_i, gives the lower bound on the bit rate for coding a memoryless discrete source
  - To achieve the source entropy, we need variable-length codes (i.e., long codewords assigned to small-probability events and vice versa)
  - Huffman's algorithm (which generates the optimal prefix codes) offers a systematic solution
- Practical aspect
  - Data in the real world contains memory (dependency, correlation, ...)
  - We do not know the probability distribution of the source
  - Tricky point: the goal of image compression (processing) is not to make your algorithm work for one image, but for a class of images
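Tying the summary back to the earlier bad model, a quick illustrative check of the memoryless entropy bound for a Bernoulli source with P(black) = 0.1:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits per symbol for a memoryless discrete source."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

p_black = 0.1
h = entropy([p_black, 1 - p_black])
print(f"H = {h:.3f} bits/pixel")        # about 0.469 bits/pixel
# A memoryless coder can do no better than ~0.469 bits/pixel under this model,
# yet the CCITT example above reaches roughly 1/13.65 ~ 0.073 bits/pixel by
# exploiting spatial structure -- which is exactly why source modeling matters.
```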

The Art of Source Modeling
How do I know whether a model matches the data?
(Figure: hypothesized model, a Bernoulli distribution with P(X = black) = 0.1, compared against the observed data)

A Matched Example
(Figure: a model and the image synthesized from it)

Good Models for Binary Images
(Block diagram: discrete source X -> transformation by run-length counting or by Lempel-Ziv parsing -> Y -> Huffman coding -> binary bit stream, with a probability estimate P(Y) driving the entropy coder)
- With run-length counting, Y is the sequence of run-lengths from which X can be recovered losslessly
- With Lempel-Ziv parsing, Y is the sequence of dictionary entries from which X can be recovered losslessly

Remaining Questions
- How do we choose among compression techniques, say RLC vs. LZ coding?
  - Look at the source statistics
  - For instance, compare a binary image (good for RLC) with English text (good for LZ coding)
- What is the best binary image compression technique?
  - It is JBIG2, an enhanced version of JBIG
- What is the entropy of a binary image source?
  - We don't know, and the question itself is questionable
  - A more relevant question: what is the entropy of a probabilistic model that we believe generates the source?