Huffman Coding, Arithmetic Coding, and JBIG2


Huffman Coding, Arithmetic Coding, and JBIG2 Illustrations Arber Borici 2010 University of Northern British Columbia

Huffman Coding Entropy encoder for lossless compression Input: symbols and their corresponding probabilities Output: a prefix-free code with minimum expected length Prefix property: no code word in the output is a prefix of another code word Optimal among symbol-by-symbol prefix codes

Huffman Coding: Algorithm
1. Create a forest of leaf nodes, one for each symbol.
2. Take the two nodes with the lowest probabilities and make them siblings. The new internal node is assigned a probability equal to the sum of the probabilities of its two children.
3. Treat the new internal node as any other node in the forest.
Repeat steps 2-3 until only a single tree remains.
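
A minimal sketch of this procedure in Python (not from the original slides). The heap plays the role of the forest; rather than storing an explicit tree, the sketch prepends one code bit to every symbol under each merged node:

    import heapq
    import itertools

    def huffman_codes(probs):
        # Build Huffman codes from a dict {symbol: probability}.
        counter = itertools.count()  # tie-breaker so equal-probability entries compare
        # The forest: one (probability, tie-break, symbols-under-node) entry per leaf.
        heap = [(p, next(counter), (sym,)) for sym, p in probs.items()]
        heapq.heapify(heap)
        codes = {sym: "" for sym in probs}
        while len(heap) > 1:
            # Take the two nodes with the lowest probabilities...
            p1, _, left = heapq.heappop(heap)
            p2, _, right = heapq.heappop(heap)
            # ...prepend one more bit to every symbol under each child...
            for sym in left:
                codes[sym] = "0" + codes[sym]
            for sym in right:
                codes[sym] = "1" + codes[sym]
            # ...and push the merged internal node back into the forest.
            heapq.heappush(heap, (p1 + p2, next(counter), left + right))
        return codes

    print(huffman_codes({"A": 0.2, "B": 0.2, "E": 0.2, "R": 0.4}))
    # -> {'A': '00', 'B': '01', 'E': '10', 'R': '11'} with this tie-breaking;
    # the slides break ties differently and obtain A=000, B=001, E=01, R=1.
    # Both trees are optimal: the expected length is 2.0 bits per symbol either way.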

Huffman Coding: Example Consider the string ARBER. The frequencies and probabilities of the symbols A, B, E, and R are:

    Symbol       A     B     E     R
    Frequency    1     1     1     2
    Probability  20%   20%   20%   40%

The initial forest thus comprises four leaf nodes. Now, we apply the Huffman algorithm.

Generating Huffman Codes
[Figure sequence: building the Huffman tree for ARBER. The leaves A (0.2) and B (0.2) merge into an internal node with probability 0.4; that node and E (0.2) merge into a node with probability 0.6; finally, that node and R (0.4) merge into the root, with probability 1. Labelling each left edge 0 and each right edge 1 yields the code table below.]

    Symbol   Code
    A        000
    B        001
    E        01
    R        1

Huffman Codes: Decoding
[Figure sequence: decoding the bit stream 0001001011 by repeatedly walking the tree from the root: 000 -> A, 1 -> R, 001 -> B, 01 -> E, 1 -> R, recovering ARBER.] The prefix property ensures unique decodability.
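
A sketch of the same decoding in Python, using the code table instead of an explicit tree walk (the two are equivalent for a prefix-free code):

    def huffman_decode(bits, codes):
        # Decode a bit string using a prefix-free code table {symbol: code}.
        inverse = {code: sym for sym, code in codes.items()}
        decoded, buffer = [], ""
        for bit in bits:
            buffer += bit
            # The prefix property guarantees the first match is the only one.
            if buffer in inverse:
                decoded.append(inverse[buffer])
                buffer = ""
        return "".join(decoded)

    codes = {"A": "000", "B": "001", "E": "01", "R": "1"}
    print(huffman_decode("0001001011", codes))  # -> ARBER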

Arithmetic Coding Entropy coder for lossless compression Encodes the entire input as a single subinterval of the real interval [0, 1) Typically slightly more efficient than Huffman coding Harder to implement: practical implementation variants have been proposed

Arithmetic Coding: Algorithm
Assign each symbol an interval [Low, High) based on cumulative probabilities. Start with the current interval [L, H) = [0, 1). For each symbol of the input string, narrow the current interval:
New Low = L + CumLow(symbol) * (H - L)
New High = L + CumHigh(symbol) * (H - L)
where CumLow and CumHigh are the cumulative probabilities just below and just above the symbol.
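
A short sketch of this loop in Python (floating point is used purely for illustration; see the note on integer implementations later in the slides):

    def arithmetic_interval(message, ranges):
        # Narrow [0, 1) to the final interval for the whole message.
        # ranges maps each symbol to its (CumLow, CumHigh) pair.
        low, high = 0.0, 1.0
        for sym in message:
            width = high - low
            cum_low, cum_high = ranges[sym]
            low, high = low + cum_low * width, low + cum_high * width
        return low, high

    ranges = {"A": (0.0, 0.2), "B": (0.2, 0.4), "E": (0.4, 0.6), "R": (0.6, 1.0)}
    print(arithmetic_interval("ARBER", ranges))
    # -> approximately (0.14432, 0.1456), matching the worked example below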

Arithmetic Coding: Example Consider the string ARBER. The intervals of the symbols A, B, E, and R are:

    Symbol   Low    High
    A        0      0.2
    B        0.2    0.4
    E        0.4    0.6
    R        0.6    1

Arithmetic Coding: Example
[Figure sequence: the interval narrows with each symbol of ARBER. Start: [0, 1). After A: [0, 0.2). After R: [0.12, 0.2). After B: [0.136, 0.152). After E: [0.1424, 0.1456). After R: [0.14432, 0.1456). At every step, the new interval is the symbol's share (20% for A, B, or E; 40% for R) of the current interval.]

Arithmetic Coding: Example The final interval for the input string ARBER is [0.14432, 0.1456). To produce output bits, one chooses a number in the interval and encodes its fractional part in binary. For the sample interval, one may choose the point 0.14432, which in binary is: 001001001111 001000100111 110100000010 100010100001 111 (51 bits). In practice, only enough leading bits to single out the interval would be emitted.

Arithmetic Coding Practical implementations work with absolute frequencies (integers) and fixed-precision arithmetic, since the low and high interval values quickly become too small for floating-point precision. An END-OF-STREAM symbol (with a very small probability) is usually required. Decoding is straightforward: find the symbol interval that contains the encoded value, output that symbol, and rescale the intervals proportionally to the symbol probabilities. Proceed until the END-OF-STREAM symbol is reached.
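
A sketch of this decoding loop in Python; for brevity it stops after a known symbol count instead of an END-OF-STREAM symbol, and 0.145 is an arbitrary value chosen inside the final interval of the example:

    def arithmetic_decode(value, ranges, n_symbols):
        # Recover symbols from a value lying inside the final interval.
        decoded = []
        for _ in range(n_symbols):
            for sym, (cum_low, cum_high) in ranges.items():
                if cum_low <= value < cum_high:  # which symbol interval holds it?
                    decoded.append(sym)
                    # Rescale that interval back onto [0, 1) and continue.
                    value = (value - cum_low) / (cum_high - cum_low)
                    break
        return "".join(decoded)

    ranges = {"A": (0.0, 0.2), "B": (0.2, 0.4), "E": (0.4, 0.6), "R": (0.6, 1.0)}
    print(arithmetic_decode(0.145, ranges, 5))  # 0.145 is in [0.14432, 0.1456) -> ARBER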

JBIG2 Lossless and lossy bi-level image compression standard Developed by the Joint Bi-level Image Experts Group as the successor to JBIG1 Supports three coding modes: generic, halftone, and text The image is segmented into regions, which can be encoded using different methods

JBIG2: Segmentation The example image is segmented into a binary region, a text region, and a grayscale region. [Figure: the original image alongside its binary, text, and grayscale segments.]

JBIG2: Encoding Arithmetic coding (the MQ coder) Context-based prediction, with larger context templates than JBIG1 Progressive compression (display): the predictive context uses previously coded information Adaptive coder [Figure: context template; X = pixel to be coded, A = adaptive pixels, which can be moved.]
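
A toy illustration of context modelling in Python (this is not the actual JBIG2 template or the MQ coder; the template shape and the probability estimate are simplified, hypothetical choices):

    def context_of(img, x, y):
        # Pack five previously coded neighbour pixels (in raster order) into
        # a context index; out-of-bounds neighbours are treated as 0.
        neighbours = [(x - 2, y - 1), (x - 1, y - 1), (x, y - 1),
                      (x + 1, y - 1), (x - 1, y)]
        ctx = 0
        for nx, ny in neighbours:
            inside = 0 <= ny < len(img) and 0 <= nx < len(img[0])
            ctx = (ctx << 1) | (img[ny][nx] if inside else 0)
        return ctx  # one of 2**5 = 32 contexts

    # An adaptive coder keeps counts[ctx] = [zeros_seen, ones_seen] and feeds
    # p(1 | ctx) = (ones + 1) / (zeros + ones + 2) to the arithmetic coder,
    # updating the counts after each coded pixel.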

JBIG2: Halftone and Text Halftone images are coded as multi-level images, along with pattern and grid parameters Each text symbol is encoded once in a symbol dictionary and placed using relative coordinates [Figure: symbol dictionary and text-region placement example.]

Color Separation Images comprising discrete colors can be treated as multi-layered binary images: each color and the image background form one binary layer. If there are N colors, where one color represents the image background, then there will be N-1 binary layers: a map with a white background and four other colors thus yields 4 binary layers (see the sketch below).
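
A minimal sketch of the separation in Python with NumPy (it assumes the image is already an indexed-colour array, with index 0 denoting the background):

    import numpy as np

    def color_layers(indexed_img, background=0):
        # One binary layer per non-background colour: N colours, background
        # included, yield N-1 binary layers.
        colors = [c for c in np.unique(indexed_img) if c != background]
        return {c: (indexed_img == c).astype(np.uint8) for c in colors}

    # A white (0) background plus four colours yields 4 binary layers:
    img = np.array([[0, 1, 1, 2],
                    [3, 0, 2, 2],
                    [4, 4, 0, 1]])
    print(sorted(color_layers(img)))  # -> [1, 2, 3, 4]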

Color Separation: Example [Figure: an Excel chart comprising 34 colors plus the white background.]

[Figures: three of the resulting binary layers, numbers 1, 5, and 12.]

Comparison with JBIG2 and JPEG Compression results for two test images:

                  Our Method   JBIG2   JPEG
    First image   96%          94%     91%
    Second image  98%          97%     92%

Encoding Example Original size: 64 * 3 = 192 bits. [Figure: the codebook and the encoded rows; some rows are uncompressible.] The encoded stream totals 1 + 20 + 64 = 85 bits, so the space saving, one minus the encoded size over the original size, is 1 - (1 + 20 + 64) / 192 ≈ 56%.
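
A quick check of this arithmetic in Python (the 1 + 20 + 64 breakdown is taken directly from the slide):

    original_bits = 64 * 3       # 192 bits
    encoded_bits = 1 + 20 + 64   # 85 bits
    saving = 1 - encoded_bits / original_bits
    print(f"{saving:.0%}")       # -> 56%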

Definitions (cont.) Compression ratio is defined as the number of bits after a coding scheme has been applied to the source data, divided by the original source-data size Expressed as a percentage, or as bits per pixel (bpp) when the source data is an image JBIG2 is the standard binary-image compression scheme Based mainly on arithmetic coding with context modeling Other methods in the literature are designed for specific classes of binary images Our objective: design a coding method that works regardless of the nature of a binary image