Concepts of Multimedia Processing and Transmission IT 481, Lecture #4 Dennis McCaughey, Ph.D. 25 September, 2006
08/28/2006 IT 481, Fall
Introduction to Linear Systems
  The Modified Discrete Cosine Transform (MDCT) was introduced in the lecture on MP3 encoding.
  How does it relate to the Discrete Cosine Transform (DCT), and why are we concerned?
  The DCT and MDCT are important enablers of data compression for both audio and video.
  The DCT is a special case of the Discrete Fourier Transform (DFT), a key component of digital signal processing.
  The Fast Fourier Transform (FFT) is a computationally efficient algorithm for computing the DFT.
Linear System Definition
Linear System Response to a Series of Sampled Data Inputs
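Linear System Input/Output
  For an input f(t) and impulse response h(t), the output is
  $$g(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau = f(t) * h(t)$$
  This is denoted as the convolution of f(t) and h(t).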
Fourier Transform - Non-periodic Signal
  Let g(t) be a continuous, non-periodic function of t.
  The Fourier Transform of g(t) is
  $$G(\omega) = \int_{-\infty}^{\infty} g(t)\, e^{-j\omega t}\, dt$$
    – where $\omega = 2\pi f$ is the radian frequency in rad/s and f is the frequency in Hz
  The Inverse Fourier Transform is
  $$g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(\omega)\, e^{j\omega t}\, d\omega$$
Fourier Transform Example
Relationship Between the Fourier Transform and Convolution
A Very Important Property
Convolution Sum Example
  $n_g = n_f + n_h - 1$
  $f(k) = h(k) = 0$ for $k > 2$
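The length rule above can be checked with a direct evaluation of the convolution sum. This is a minimal sketch: the function name `convolve` and the sample values in `f` and `h` are illustrative assumptions, chosen only so that both sequences are zero for k > 2.

```python
def convolve(f, h):
    """Full discrete convolution g(n) = sum over k of f(k) * h(n - k)."""
    n_g = len(f) + len(h) - 1            # n_g = n_f + n_h - 1
    g = [0] * n_g
    for i, f_i in enumerate(f):
        for j, h_j in enumerate(h):
            g[i + j] += f_i * h_j        # term f(i) * h(j) lands at n = i + j
    return g


# Illustrative 3-point sequences (nonzero only for k = 0, 1, 2).
f = [1, 2, 3]
h = [1, 1, 1]
g = convolve(f, h)
print(len(g), g)   # 5 [1, 3, 6, 5, 3]
```

Two 3-point inputs give a 5-point output, as the length rule predicts.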
Integer Arithmetic Example
  Multiplication of two integers is a form of discrete convolution.
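To make the claim concrete, the sketch below multiplies two integers by convolving their decimal digit sequences and then resolving the carries. The helper names `digits` and `multiply_by_convolution` are hypothetical, and the approach assumes non-negative integers.

```python
def digits(n):
    """Decimal digits of a non-negative integer, least significant first."""
    return [int(c) for c in reversed(str(n))]


def multiply_by_convolution(a, b):
    """Multiply two non-negative integers via convolution of their digits."""
    da, db = digits(a), digits(b)
    conv = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            conv[i + j] += x * y         # same index rule as the convolution sum
    # Entries may exceed 9; weighting entry k by 10**k resolves the carries.
    return sum(c * 10 ** k for k, c in enumerate(conv))


print(multiply_by_convolution(123, 456))   # 56088
```

For 123 x 456 the digit convolution is [18, 27, 28, 13, 4] (least significant first), which is exactly the column sums of schoolbook long multiplication before carrying.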
Discrete Convolution in Matrix Form
Enter the Discrete Fourier Transform
Discrete Fourier Transform (DFT)
  A discrete-time version of the Fourier Transform that can be implemented in the digital domain.
  Given an N-point time-sampled sequence $\{x_0, x_1, \ldots, x_{N-1}\}$, the DFT is described by a transform pair with complexity $O(N^2)$:
  $$X_n = \sum_{k=0}^{N-1} x_k\, e^{-j 2\pi nk/N}, \qquad x_k = \frac{1}{N} \sum_{n=0}^{N-1} X_n\, e^{j 2\pi nk/N}$$
  Furthermore,
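A direct evaluation of the transform pair shows where the O(N^2) cost comes from: each of the N outputs is a sum over N inputs. This is a minimal sketch assuming the common convention with a negative exponent on the forward transform and the 1/N factor on the inverse; the function names `dft` and `idft` are illustrative.

```python
import cmath


def dft(x):
    """Forward DFT by direct summation: N outputs, each a sum of N terms."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
            for n in range(N)]


def idft(X):
    """Inverse DFT by direct summation, with the 1/N factor on this side."""
    N = len(X)
    return [sum(X[n] * cmath.exp(2j * cmath.pi * n * k / N) for n in range(N)) / N
            for k in range(N)]


x = [1.0, 2.0, 3.0, 4.0]
X = dft(x)
x_back = idft(X)
print([round(abs(v), 6) for v in X])     # magnitude spectrum
print([round(v.real, 6) for v in x_back])  # recovers the input up to rounding
```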
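Fast Fourier Transform (FFT)
  The FFT is a computationally efficient algorithm for the DFT, with complexity $O(N \log_2 N)$.
  Recall the DFT
  $$X_n = \sum_{k=0}^{N-1} x_k\, e^{-j 2\pi nk/N}$$
  Let
  $$W_N = e^{-j 2\pi/N}$$
  It can be shown that
  $$X_n = G_n + W_N^{\,n} H_n$$
  where $G_n$ and $H_n$ are two half-sized DFTs of the even-indexed and odd-indexed terms.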
The FFT Efficient Implementation
  Each half-size DFT can in turn be divided into a pair of quarter-size DFTs.
  The end result is a partition and reordering of the time-domain inputs using what is known as bit-reversed addressing.
    – Each stage of the FFT consists of N complex multiply-accumulates in a straightforward implementation
    – Further simplification from eight to six real operations by the "butterfly"
    – Further simplification when the time-domain sequence is real
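The repeated even/odd split can be sketched as a recursive radix-2 FFT. This is an illustration of the decomposition only, not the in-place, bit-reversed implementation the slide refers to; it assumes the input length is a power of two and uses the forward convention with a negative exponent. The function name `fft` is illustrative.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    N = len(x)
    if N <= 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of two")
    G = fft(x[0::2])                     # half-size DFT of even-indexed terms
    H = fft(x[1::2])                     # half-size DFT of odd-indexed terms
    X = [0j] * N
    for n in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * n / N) * H[n]   # twiddle factor times H_n
        X[n] = G[n] + t                  # butterfly, upper output
        X[n + N // 2] = G[n] - t         # butterfly, lower output
    return X


x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 2.0]
print([round(abs(v), 6) for v in fft(x)])
```

Each level of recursion does O(N) work and there are log2(N) levels, which is where the O(N log2 N) figure comes from.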
The FFT Structure
The Discrete Cosine Transform (DCT)
  The DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers.
  It is equivalent to a DFT of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even).
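The equivalence with a double-length DFT can be checked numerically. The sketch below assumes the unnormalized DCT-II, X_k = sum of x_n cos(pi (n + 1/2) k / N), which may differ from the lecture's formula by a scale factor. It mirrors the N-point input into a 2N-point even-symmetric sequence, takes the DFT, and removes a half-sample phase shift. The function names are illustrative.

```python
import cmath
import math


def dct2(x):
    """Unnormalized DCT-II by direct summation (assumed definition)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]


def dct2_via_dft(x):
    """Same DCT-II, obtained from a 2N-point DFT of the mirrored input."""
    N = len(x)
    y = list(x) + list(reversed(x))      # even-symmetric extension, length 2N
    out = []
    for k in range(N):
        Y_k = sum(y[n] * cmath.exp(-2j * cmath.pi * n * k / (2 * N))
                  for n in range(2 * N))
        # Y_k = 2 * exp(j*pi*k/(2N)) * X_k, so undo the phase and the factor 2.
        out.append((cmath.exp(-1j * cmath.pi * k / (2 * N)) * Y_k).real / 2)
    return out


x = [1.0, 2.0, 3.0, 4.0]
print([round(v, 6) for v in dct2(x)])
print([round(v, 6) for v in dct2_via_dft(x)])   # matches up to rounding
```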
The Modified Discrete Cosine Transform (MDCT)
  The MDCT operates on blocks that overlap by 50%, which makes it very useful when the coefficients are quantized: the overlap effectively removes the otherwise easily detectable blocking artifacts at block boundaries.
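The 50% overlap can be sketched as follows. Each block of 2N samples produces only N coefficients, and a single block cannot be inverted on its own; the aliasing cancels when the inverse transforms of adjacent blocks are overlap-added. The sketch assumes the standard MDCT definition, a sine window (which satisfies w_n^2 + w_{n+N}^2 = 1), and a 2/N factor on the inverse. These choices and all function names are assumptions for illustration, not necessarily those used in the lecture or in MP3.

```python
import math


def mdct(block):
    """MDCT: 2N input samples -> N coefficients."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]


def sine_window(N):
    """Sine window of length 2N; symmetric, with w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct_roundtrip(signal, N):
    """Analyze with 50%-overlapped windowed blocks, then overlap-add the inverses."""
    if N % 2 or len(signal) % N:
        raise ValueError("N must be even and len(signal) a multiple of N")
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N   # every sample sits in two blocks
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):      # hop size N = 50% overlap
        block = [w[n] * padded[start + n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]           # synthesis window + overlap-add
    return out[N:N + len(signal)]


sig = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
rec = mdct_roundtrip(sig, 4)
print(max(abs(a - b) for a, b in zip(sig, rec)))    # reconstruction error, near zero
```

Without quantization the round trip is exact up to rounding. With quantization, the smooth window overlap spreads the error across block boundaries rather than leaving a discontinuity there.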
In Matrix Notation (2 Length-8 Blocks)
Fourier Transform Summary
  Physical Interpretation
    – Describes the frequency content of a real-world signal
    – For real-world signals, frequency content tails off as frequencies get higher
  Mathematical Interpretation
    – Convolution in the time domain becomes multiplication in the frequency domain
    – A matrix that diagonalizes a circulant convolution matrix
    – The DCT is a special case of the DFT
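The first two mathematical points can be checked together for the discrete case: multiplying by a circulant matrix is a circular convolution, and in the DFT domain that operation reduces to an element-by-element product. This is a minimal sketch with illustrative function names and sample values.

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT; the inverse uses a positive exponent and a 1/N factor."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for k in range(N))
           for n in range(N)]
    return [v / N for v in out] if inverse else out


def circular_convolve(f, h):
    """Circular convolution; equivalent to multiplying f by the circulant matrix of h."""
    N = len(f)
    return [sum(f[k] * h[(n - k) % N] for k in range(N)) for n in range(N)]


def circular_convolve_dft(f, h):
    """Same result via the frequency domain: pointwise product of the two DFTs."""
    product = [a * b for a, b in zip(dft(f), dft(h))]
    return [v.real for v in dft(product, inverse=True)]


f = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.0, 0.0, 1.0]
print(circular_convolve(f, h))
print([round(v, 6) for v in circular_convolve_dft(f, h)])   # agrees up to rounding
```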
Adaptive Transform Coding (ATC)
  Another frequency-domain technique, used for bit rates in the range of 9.6 to 20 kbps; it involves block transformation of windowed input segments of the speech waveform.
  Each segment is represented by a set of transform coefficients, which are quantized and transmitted in lieu of the signal itself.
  At the receiver, the quantized coefficients are inverse-transformed to get back to the original waveform.
  The most attractive and frequently used transform is the Discrete Cosine Transform (DCT), with the corresponding Inverse Discrete Cosine Transform (IDCT).
ATC Practicality
  The bit allocation among the different coefficients is varied adaptively from frame to frame while keeping the total number of bits constant.
  Time-varying statistics control the bit allocation procedure and have to be transmitted as side information (an overhead of about 2 kbps).
  The side information is also used to determine the step sizes of the various coefficient quantizers.
  In practice, the DCT and IDCT are not evaluated directly from their defining formulas but by computationally efficient algorithms such as the FFT.
Source Coding - Vocoders
  A class of speech coding systems that analyze the voice signal at the transmitter, derive a set of parameters, and transmit them to the receiver, where the voice is synthesized from these parameters.
  All vocoders attempt to model the speech generation process as a dynamic system and quantify the physical parameters of that system.
  They are in general much more complex than waveform coders and achieve very high economy in transmission bit rate.
  They tend to be less robust, and their performance is very much speaker-dependent.
Channel Vocoder
  The first of many analysis-synthesis systems to be demonstrated.
  A frequency-domain vocoder that determines the envelope of the speech signal in a number of frequency bands, then samples and encodes each envelope and multiplexes the samples with the encoded outputs of the other filters.
  The sampling is done synchronously every 10 ms to 30 ms.
  Along with the energy information for each band, the voiced/unvoiced decision and the pitch frequency for voiced speech are also transmitted.
Cepstrum Vocoder
  The cepstrum vocoder separates the excitation and the vocal tract spectrum by taking the inverse Fourier transform of the log magnitude spectrum of the signal.
    – The low-frequency coefficients in the cepstrum correspond to the vocal tract spectral envelope
    – The high-frequency excitation coefficients form a periodic pulse train at multiples of the pitch period
  At the receiver, the vocal tract cepstral coefficients are Fourier transformed to produce the vocal tract impulse response.
  By convolving the impulse response with a synthetic excitation signal, the original speech is reconstructed.
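The separation works because the logarithm turns the product of two spectra into a sum. The sketch below computes a real cepstrum with a direct DFT and checks that the cepstrum of a circularly convolved pair equals the sum of the individual cepstra. The short sequences stand in for an excitation and a vocal tract response; they are illustrative assumptions, not speech data, and the function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct DFT; the inverse uses a positive exponent and a 1/N factor."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for k in range(N))
           for n in range(N)]
    return [v / N for v in out] if inverse else out


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum (requires no zero-valued bins)."""
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]


# Illustrative stand-ins: a slowly varying "filter" and a sparse "excitation".
filt = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
excite = [1.0, 0.0, 0.0, -0.3, 0.0, 0.0, 0.0, 0.0]
N = len(filt)
speech = [sum(filt[k] * excite[(n - k) % N] for k in range(N)) for n in range(N)]

c_sum = [a + b for a, b in zip(real_cepstrum(filt), real_cepstrum(excite))]
c_speech = real_cepstrum(speech)
print(max(abs(a - b) for a, b in zip(c_sum, c_speech)))   # near zero
```

Because the two contributions add in the cepstral domain, keeping only the low-order coefficients retains the envelope part and discards the excitation part.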
Linear Predictive Coders (LPC)
  The time-domain LPC extracts the significant features of speech from its waveform.
  It is computationally intensive but by far the most popular among the class of low bit rate vocoders.
  It is possible to transmit good-quality voice at 4.8 kbps and poorer-quality voice at lower rates.
  LPC models the vocal tract as an all-pole digital filter and uses a weighted sum of the past p samples to estimate the present sample ($10 \le p \le 15$), with $e_n$ being the prediction error:
  $$s_n = \sum_{k=1}^{p} a_k\, s_{n-k} + e_n$$
LPC Coefficients
  The LPC coefficients $a_k$ are found by solving the system of equations
  $$\sum_{k=1}^{p} a_k\, C_{mk} = C_{m0}, \qquad m = 1, \ldots, p$$
  where $C_{mk}$ are the correlation coefficients computed from the m-th and k-th lags of $s_n$ (written here for the predictor $s_n \approx \sum_{k} a_k s_{n-k}$).
  A matrix inversion is needed, hence the high computational load.
  The reflection coefficients, a related set of coefficients, are transmitted in practice.
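One common way to avoid a general matrix inversion is the Levinson-Durbin recursion. It applies when the correlations depend only on the lag difference (the autocorrelation method), and it produces the reflection coefficients as a by-product. This is a swapped-in technique for illustration, not necessarily the method the lecture has in mind. The sketch assumes the predictor convention s_n ≈ sum of a_k s_{n-k}; the sign convention for reflection coefficients varies between texts, and the function names are hypothetical.

```python
def autocorrelation(s, max_lag):
    """r[lag] = sum over n of s[n] * s[n + lag], for lag = 0 .. max_lag."""
    return [sum(s[n] * s[n + lag] for n in range(len(s) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Solve sum_k a_k r[|m - k|] = r[m], m = 1..p, by order recursion.

    Returns (predictor coefficients a_1..a_p, reflection coefficients,
    final prediction error energy).
    """
    a = [0.0] * (p + 1)                  # a[0] is unused
    reflection = []
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                    # reflection coefficient for order i
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)             # error energy shrinks at each order
    return a[1:], reflection, err


# Illustrative autocorrelation values of a first-order process (r[m] = 0.5**m).
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(coeffs, refl, err)
```

The recursion costs on the order of p^2 operations, compared with roughly p^3 for a general linear solve.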
LPC Transmitted Parameters
  Each reflection coefficient can be adequately represented by 6 bits.
  A p = 10 predictor needs 72 bits per frame, including
    – 60 bits for the coefficients
    – 5 bits for a gain parameter and 6 bits for a pitch period
  If the parameters are estimated every 15 to 30 ms
    – the resulting bit rate is in the range of 2400 to 4800 bps
  Additional savings can be achieved via a non-linear transformation of the coefficients prior to coding, which reduces their sensitivity to quantization error.
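The bit-rate figures follow from dividing the bits per frame by the frame update interval. A short check, using a hypothetical helper name, shows that the 4800 bps and 2400 bps endpoints correspond to 72-bit frames sent every 15 ms and every 30 ms respectively, while a 20 ms interval gives 3600 bps.

```python
def lpc_bit_rate(bits_per_frame, frame_period_ms):
    """Bit rate in bits per second for one parameter frame every frame_period_ms."""
    return bits_per_frame * 1000.0 / frame_period_ms


BITS_PER_FRAME = 72                      # figure quoted on the slide for a 10th-order predictor

print(lpc_bit_rate(BITS_PER_FRAME, 15))  # 4800.0
print(lpc_bit_rate(BITS_PER_FRAME, 20))  # 3600.0
print(lpc_bit_rate(BITS_PER_FRAME, 30))  # 2400.0
```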
LPC Receiver Processing
  At the receiver, the coefficients are used to build a synthesis filter.
  The various LPC methods differ in how the synthesis filter is excited.
    – Multi-pulse Excited LPC: typically 8 pulses at suitably chosen positions are used as the excitation
    – Code-Excited LPC (CELP): the transmitter searches its code book for the stochastic excitation to the LPC filter that gives the best perceptual match to the sound; the index into the code book is then transmitted
  CELP coders are extremely complex and can require more than 500 MIPS.
  However, high quality is achieved when the excitation is coded at 0.25 bits/sample, with transmission bit rates as low as 4.8 kbps.
Various LPC Vocoders
ITU-T Speech Coding Standards
Speech Coder Performance
  Objective measures: how well does the reconstructed speech signal quantitatively approximate the original?
    – Mean Square Error (MSE) distortion
    – Frequency-weighted MSE
    – Segmental Signal-to-Noise Ratio (SNR)
  Subjective measures: conducted by playing the sample to a number of listeners, who judge the quality of the speech
    – Overall quality, listening effort, intelligibility, naturalness
    – Diagnostic Rhyme Test (DRT): the most popular test for intelligibility
    – Diagnostic Acceptability Measure (DAM): evaluates the acceptability of a speech coding system
    – Mean Opinion Score (MOS): the most popular ranking system
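Two of the objective measures can be sketched in a few lines. MSE averages the squared error over the whole signal, while segmental SNR computes an SNR per short frame and averages the results in dB, so quiet passages count as much as loud ones. The frame length, the handling of the last partial frame (dropped), and the choice to skip frames with zero signal or zero error are assumptions of this sketch; the function names are hypothetical.

```python
import math


def mse(reference, decoded):
    """Mean square error between the original and reconstructed signals."""
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


def segmental_snr_db(reference, decoded, frame_len):
    """Average of per-frame SNR values in dB (frames with zero energy or error skipped)."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal > 0 and noise > 0:
            snrs.append(10.0 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no frame with nonzero signal and nonzero error")
    return sum(snrs) / len(snrs)


# Illustrative data: a quiet frame and a loud frame with the same absolute error.
original = [1.0] * 4 + [2.0] * 4
reconstructed = [v + 0.1 for v in original]
print(mse(original, reconstructed))
print(segmental_snr_db(original, reconstructed, frame_len=4))
```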
Mean Opinion Score (MOS)
  Most popular ranking system
MOS for Speech Coders