ELG5126 Source Coding and Data Compression Eric Dubois
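Context: signal transmission
[Figure: block diagram. Information Source -> (signal, aka 'data') -> Encoder -> (binary stream) -> Channel -> (binary stream) -> Decoder -> (signal) -> Information Receiver. An error measure relates the source signal to the decoded signal.]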
Examples of information sources
- Speech
- Image
- Video
- Text file
- Music
- Radiograph
- Binary executable computer program
- Computer graphics primitives
- Weather radar map
Examples of channels
- Airwaves (EM radiation)
- Cable
- Telephone line
- Hard disk
- CD, DVD
- Flash memory device
- Optical path
- Internet
Examples of information receivers
- TV screen and viewer
- Audio system and listener
- Computer file
- Image printer and viewer
- Compute engine
Possible error measures
- No errors permitted (lossless coding)
- Numerical measures of error, e.g. mean-squared error (MSE), signal-to-noise ratio (SNR)
- Numerical measures of perceptual difference
- Mean opinion scores from human users
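The two numerical error measures named above can be sketched as follows. This is a minimal illustration, not a definition taken from the course: the function names `mse` and `snr_db`, the sample values, and the choice of SNR as mean signal power divided by MSE (in dB) are all assumptions.

```python
import math

def mse(source, decoded):
    """Mean-squared error between a source sequence and its reconstruction."""
    if len(source) != len(decoded) or not source:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(source, decoded)) / len(source)

def snr_db(source, decoded):
    """SNR in dB, assumed here to be mean signal power over MSE."""
    signal_power = sum(a * a for a in source) / len(source)
    noise_power = mse(source, decoded)
    if noise_power == 0:
        return math.inf  # lossless reconstruction: no error at all
    return 10 * math.log10(signal_power / noise_power)

# Hypothetical source samples and a reconstruction with one wrong sample.
source = [1.0, 2.0, 3.0, 4.0]
decoded = [1.0, 2.0, 3.0, 5.0]

print(mse(source, decoded))     # squared error 1 spread over 4 samples
print(snr_db(source, decoded))  # signal power 7.5 against noise power 0.25
```

A lossless coder corresponds to `mse(...) == 0`, in which case the sketch reports an infinite SNR.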
Measures of ‘channel rate’
- Data rate (bits per second)
- Transmission time (seconds)
- File size (bytes)
- Average number of bits per source symbol
What is compression?
There is usually a ‘natural’ representation for the source data at a given level of fidelity and sampling rate. Examples:
- 8 bits per character in ASCII data
- 24 bits per RGB color pixel
- 16 bits per audio signal sample
This natural representation leads to a certain raw channel rate (which is generally too high).
Compression involves reducing the channel rate for a given level of distortion (which may be zero for lossless coding).
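Compression ratio
The compression ratio is the raw channel rate divided by the compressed channel rate.
Example: HDTV, 1080i
- Raw channel rate: 1493 Mbit/s (1920 × 1080 pixels × 30 frames/s × 24 bits/pixel)
- Compressed channel rate: ~20 Mbit/s
- Compression ratio: ~75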
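The arithmetic in the HDTV example can be checked with a few lines of Python. The function name `raw_rate_bits_per_second` is a hypothetical helper introduced only for this sketch; the numbers are the ones given on the slide.

```python
def raw_rate_bits_per_second(width, height, frames_per_second, bits_per_pixel):
    """Raw channel rate of uncompressed video, in bits per second."""
    return width * height * frames_per_second * bits_per_pixel

raw_rate = raw_rate_bits_per_second(1920, 1080, 30, 24)
compressed_rate = 20e6  # approximate compressed rate from the example, bit/s

compression_ratio = raw_rate / compressed_rate

print(f"raw rate: {raw_rate / 1e6:.0f} Mbit/s")
print(f"compression ratio: {compression_ratio:.1f}")
```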
Sources
Categories of sources:
- continuous time or domain: $x(t)$, $x(h,v)$
- discrete time or domain: $x[n]$, $x[m,n]$
- continuous amplitude or value: $x \in \mathbb{R}$
- discrete amplitude or value: $x \in A = \{a_1, a_2, \ldots, a_M\}$
We will only consider discrete domain sources. We assume that continuous domain signals can be sampled with negligible loss; sampling itself is not covered in this course.
We will mainly concentrate on one-dimensional signals such as text, speech, audio, etc. Extensions to images are covered in ELG5378.
A source signal is a sequence of values drawn from a source alphabet $A$: $x[1], x[2], \ldots, x[n] \in A$
Source Coder
A source coder transforms a source sequence into a coded sequence whose values are drawn from a code alphabet $G$: $u[1], u[2], \ldots, u[i] \in G$
Normally $G = \{0,1\}$, and we will limit ourselves to this case.
Note that the time indices of the source sequence $x[n]$ and the coded sequence $u[i]$ do not correspond.
The decoder must estimate the source signal on the basis of the received coded sequence $\hat{u}[i]$, which may differ from $u[i]$ if there are transmission errors. We will generally assume that there are no transmission errors.
Categories of compression coders
- Lossless coding: The source sequence has discrete values, and these must be reproduced without error. Examples where this is required are text, data, executables, and some quantized signals such as X-rays.
- Lossy coding: The source sequence may be either continuous or discrete valued. There exists a distortion criterion. The decoded sequence may be mathematically different from the source sequence, but the distortion should be kept sufficiently small. Examples are speech and images. Often a perceptual distortion criterion is desired.
Lossless coding methods are often a component of a lossy coding system.
The compression problem
There are two variants of the compression problem:
1. For a given source and distortion measure, minimize the channel rate for a given level of distortion D0 (which can be zero).
2. For a given source and distortion measure, minimize the distortion (or maximize the quality) for a given channel rate R0.
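The two variants can be written as constrained optimization problems. The notation below is an assumption made for this sketch and does not come from the slides: $c$ ranges over the set $\mathcal{C}$ of candidate coders, and $R(c)$ and $D(c)$ denote the channel rate and distortion that coder $c$ achieves on the given source.

```latex
% Variant 1: minimize rate subject to a distortion target D_0
\min_{c \in \mathcal{C}} R(c) \quad \text{subject to} \quad D(c) \le D_0

% Variant 2: minimize distortion subject to a rate budget R_0
\min_{c \in \mathcal{C}} D(c) \quad \text{subject to} \quad R(c) \le R_0
```

Setting $D_0 = 0$ in the first variant gives the lossless coding problem.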
Rate versus distortion performance
In a coding system, there is typically a tradeoff between rate and distortion.
[Figure: curve of rate R versus distortion D. Successive builds of the slide mark the target distortion D0 and the target rate R0 on the curve.]
When can we compress?
When there is statistical redundancy.
- For example, for a sequence of outcomes of a fair 16-sided die, we need 4 bits to represent each outcome. No compression is possible.
- In English text, some letters occur far more often than others. We can assign shorter codes to the common ones and longer codes to the uncommon ones and achieve compression (e.g., Morse code).
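The idea of giving shorter codes to common symbols can be sketched with a small variable-length code. Everything specific here is an assumption made for illustration: the four-symbol alphabet, its probabilities, and the prefix code in `code` are hypothetical and not taken from the course.

```python
# Hypothetical source: four symbols with unequal probabilities.
probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Hypothetical prefix code: no codeword is a prefix of another,
# and more probable symbols get shorter codewords.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

def average_length(probabilities, code):
    """Expected number of code bits per source symbol."""
    return sum(p * len(code[s]) for s, p in probabilities.items())

def encode(message, code):
    """Concatenate the codewords of the symbols in the message."""
    return "".join(code[s] for s in message)

def decode(bits, code):
    """Decode by growing a candidate codeword until it matches."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            symbols.append(inverse[word])
            word = ""
    if word:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(symbols)

message = "aabacada"
bits = encode(message, code)

print(average_length(probabilities, code))  # compare with 2 bits for a fixed-length code
print(len(bits), "bits instead of", 2 * len(message))
print(decode(bits, code) == message)
```

If all four symbols were equally likely, as with the fair die, this code would average 2.25 bits per symbol and would be worse than the 2-bit fixed-length code.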
Statistical redundancy
There are many types of statistical redundancy. For example, in English text, the letter after a Q is almost always a U, so a coder that takes the previous letter into account needs very few bits for that U.
The key to successful compression will be to formulate models that capture the statistical redundancy in the source.
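A simple model of this kind of redundancy counts which letter follows which. The sketch below is only an illustration: the helper name `next_letter_counts` and the sample sentence are made up for this example, and a real model would be estimated from a much larger text.

```python
from collections import Counter, defaultdict

def next_letter_counts(text):
    """Count, for each letter, how often each letter follows it.

    Non-letters are dropped first, so pairs may span word boundaries.
    """
    letters = [c for c in text.lower() if c.isalpha()]
    counts = defaultdict(Counter)
    for previous, following in zip(letters, letters[1:]):
        counts[previous][following] += 1
    return counts

# Made-up sample text chosen to contain several occurrences of 'q'.
sample = "the quick queen quietly questioned the quality of the quartz"
counts = next_letter_counts(sample)

print(dict(counts["q"]))  # successors of 'q' in this sample
print(dict(counts["t"]))  # successors of 't' are far less predictable
```

In this sample every Q is followed by a U, while T has several different successors, which is the kind of context dependence a source model can exploit.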
When can we compress? (2)
When there is irrelevancy.
- In many cases, the data is specified more precisely than it needs to be for the intended purpose.
- The data may be oversampled, or quantized more finely than it needs to be, either everywhere or in some parts of the signal.
- This particularly applies to data meant only for consumption and not further processing.
Exploiting irrelevancy
- To exploit irrelevancy, we need a good model of the requirements of the receiver, e.g., human vision, hearing, etc.
- We also need a suitable representation of the data, e.g., transform or wavelet representations.
- Again, the key to success will be the formulation of appropriate models.
The elements of a source coder
1. Change of representation
2. Quantization (not for lossless coding)
3. Binary code assignment
All will depend on good models of the source and the receiver.
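The three elements can be sketched as a toy pipeline. All concrete choices here are assumptions made for illustration and are not the course's method: the change of representation is a first-order difference, the quantizer is a uniform one with step size `step`, and the binary code is a hypothetical zigzag-plus-unary code. With `step = 1` and integer input the quantizer changes nothing and the pipeline is lossless; with a larger step this open-loop arrangement lets quantization errors accumulate in the decoder.

```python
def to_differences(x):
    """Change of representation: first sample, then sample-to-sample differences."""
    return [x[0]] + [b - a for a, b in zip(x, x[1:])]

def from_differences(d):
    """Inverse representation: running sum of the differences."""
    out, total = [], 0
    for value in d:
        total += value
        out.append(total)
    return out

def quantize(values, step):
    """Uniform quantization to integer indices."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    return [k * step for k in indices]

def zigzag(k):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * k if k >= 0 else -2 * k - 1

def unzigzag(n):
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

def encode_indices(indices):
    """Binary code assignment: unary code of the zigzag value, terminated by '0'."""
    return "".join("1" * zigzag(k) + "0" for k in indices)

def decode_indices(bits):
    indices, run = [], 0
    for bit in bits:
        if bit == "1":
            run += 1
        else:
            indices.append(unzigzag(run))
            run = 0
    return indices

source = [4, 4, 5, 5, 5, 4, 4, 4]  # hypothetical slowly varying signal
step = 1

bits = encode_indices(quantize(to_differences(source), step))
decoded = from_differences(dequantize(decode_indices(bits), step))

print(bits)
print(decoded == source)
```

The differences of a slowly varying signal are mostly zero, so a code that spends one bit on a zero is a reasonable match to this representation.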
The Course Outline
Professor Eric Dubois CBY A-512 Tel: 562-5800 X 6400 edubois@uottawa.ca www.eecs.uottawa.ca/~edubois/courses/ELG5126
Textbook Textbook: K. Sayood, Introduction to Data Compression, third edition, Morgan Kaufmann Publishers, 2006.
Prerequisite
Basic probability and signal processing as typically obtained in an undergraduate Electrical Engineering program (e.g., at uOttawa, ELG3125 Signal and System Analysis and ELG3126 Random Signals and Systems).
Objective The objective of this course is to present the fundamental principles underlying data and waveform compression. The course begins with the study of lossless compression of discrete sources. These techniques are applicable to compression of text, data, programs and any other type of information where no loss is tolerable. They also form an integral part of schemes for lossy compression of waveforms such as audio and video signals, which is the topic of the second part of the course.
Objective The main goal of the course is to provide an understanding of the basic techniques and theories underlying popular compression systems and standards such as ZIP, FAX, MP3, JPEG, MPEG and so on, as well as the principles underlying future systems. Some of the applications will be addressed in student projects.
Course Outline
- Lossless coding: Discrete sources, binary codes, entropy, Huffman and related codes, Markov models, adaptive coding.
- Arithmetic coding: Principles, coding and decoding techniques, implementation issues.
- Dictionary techniques: Principles, static dictionary, adaptive dictionary.
- Waveform coding: Distortion measures, rate-distortion theory and bounds, models.
Course Outline (2)
- Quantization: Formulation, performance, uniform and non-uniform quantizers, quantizer optimization, vector quantization.
- Predictive coding: Prediction theory, differential coding (DPCM), adaptive coding.
- Transform and subband coding: Change of basis, block transforms and filter banks, bit allocation and quantization.
- Applications (student projects)
Grading
- 20% Assignments: Several assignments, to be handed in during class on the due date specified. There will be a 5% penalty for each day late, and no assignment will be accepted after one week.
- 30% Project: An individual project on an application of data compression involving some experimental work. A project report and presentation at the end of the course will be required. More details will follow early in the course.
- 20% Midterm exam: Closed-book exam, 80 minutes in length.
- 30% Final exam: Closed-book exam, 3 hours in length, covering the whole course.
Enjoy the course