Published by Cecil Reed. Modified over 9 years ago.
1
Concepts of Multimedia Processing and Transmission
IT 481, Lecture #4
Dennis McCaughey, Ph.D.
25 September, 2006
2
08/28/2006 IT 481, Fall 2006
Introduction to Linear Systems
The Modified Discrete Cosine Transform (MDCT) was introduced in the lecture on MP3 encoding.
How does it relate to the Discrete Cosine Transform (DCT), and why are we concerned?
The DCT and MDCT are important enablers in data compression of both audio and video.
The DCT is a special case of the Discrete Fourier Transform (DFT), a key component in digital signal processing.
The Fast Fourier Transform (FFT) is a computationally efficient algorithm for computing the DFT.
3
Linear System Definition
4
Linear System Response to a Series of Sampled-Data Inputs
5
Linear System Input/Output
The input/output relation is denoted as the convolution of the input f(t) and the impulse response h(t).
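As a sketch of the relation named on this slide, and assuming the output is called g(t) (the name used in the later convolution-sum example), the convolution integral in its usual form reads:

```latex
g(t) = (f * h)(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau
```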
6
Fourier Transform - Non-periodic Signal
Let g(t) be a continuous, non-periodic function of t.
The Fourier Transform of g(t) is defined in terms of ω = 2πf
– where ω is the radial frequency in radians/sec, and f is the frequency in Hz
The Inverse Fourier Transform recovers g(t) from its transform.
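For reference, one common engineering convention for the transform pair is written out below. The symbol G(ω) and the placement of the 1/(2π) factor on the inverse are assumptions; other texts split the normalization differently.

```latex
G(\omega) = \int_{-\infty}^{\infty} g(t)\, e^{-j\omega t}\, dt,
\qquad
g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(\omega)\, e^{j\omega t}\, d\omega,
\qquad \omega = 2\pi f
```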
7
Fourier Transform Example
8
Relationship Between the Fourier Transform and Convolution
9
A Very Important Property
10
Convolution Sum Example
n_g = n_f + n_h - 1
f(k) = h(k) = 0 for k > 2
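A minimal sketch of the convolution sum, assuming n_f, n_h and n_g denote the lengths of f, h and the output g, and that f and h are zero outside k = 0, 1, 2. The sample values and the function name `convolve` are made up for illustration.

```python
def convolve(f, h):
    """Direct convolution sum: g[n] = sum over k of f[k] * h[n - k]."""
    g = [0] * (len(f) + len(h) - 1)  # n_g = n_f + n_h - 1
    for i, fi in enumerate(f):
        for j, hj in enumerate(h):
            g[i + j] += fi * hj
    return g


f = [1, 2, 3]  # nonzero only for k = 0, 1, 2
h = [1, 1, 1]
print(convolve(f, h))  # [1, 3, 6, 5, 3]
```

Two length-3 inputs give a length-5 output, matching n_g = 3 + 3 - 1.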
11
Integer Arithmetic Example
Multiplication of two integers is a form of discrete convolution: the digit sequences are convolved, and the carries are then propagated.
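The following sketch illustrates the idea for decimal digits. The helper names are hypothetical, and the carry propagation is left to Python's integer arithmetic when the weighted terms are summed.

```python
def digits(n):
    """Decimal digits of a non-negative integer, least significant first."""
    return [int(d) for d in reversed(str(n))]


def digit_convolution(a, b):
    """Convolve the digit sequences of a and b (no carries applied yet)."""
    da, db = digits(a), digits(b)
    raw = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            raw[i + j] += x * y
    return raw


def multiply(a, b):
    """Weight each convolution term by its power of ten; summing resolves the carries."""
    return sum(c * 10 ** k for k, c in enumerate(digit_convolution(a, b)))


print(digit_convolution(123, 456))  # [18, 27, 28, 13, 4]
print(multiply(123, 456))           # 56088
```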
12
Discrete Convolution in Matrix Form
13
Enter the Discrete Fourier Transform
14
Discrete Fourier Transform (DFT)
A discrete-time version of the Fourier Transform that can be implemented in the digital domain.
Given an N-point time-sampled sequence {x_0, x_1, …, x_{N-1}}, the DFT is described by a transform pair with complexity O(N^2).
Furthermore,
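A direct evaluation of the transform pair makes the O(N^2) cost visible: each of the N outputs is a sum over N inputs. This is a sketch under one common convention (negative exponent on the forward transform, 1/N on the inverse), which is an assumption here.

```python
import cmath


def dft(x):
    """Forward DFT by direct summation: N outputs, N terms each."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT by direct summation, with the 1/N factor applied here."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


x = [1.0, 2.0, 3.0, 4.0]
roundtrip = idft(dft(x))
print([round(v.real, 6) for v in roundtrip])
```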
15
Fast Fourier Transform (FFT)
The FFT is a computationally efficient algorithm for the DFT, with complexity O(N log_2 N).
Recall the DFT transform pair. It can be shown that an N-point DFT can be written in terms of G_n and H_n, two half-sized DFTs of the even-indexed and odd-indexed terms.
16
The FFT Efficient Implementation
Each half-size DFT can in turn be divided into a pair of quarter-size DFTs.
The end result is a partition and reordering of the time-domain inputs using what is known as bit-reversed addressing.
– Each stage of the DFT consists of N complex multiply-accumulates in a straightforward implementation
– Further simplification from eight to six real operations by the “butterfly”
– Further simplification when the time-domain sequence is real
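The even/odd split can be sketched as a recursive radix-2 FFT. Writing W = e^{-j2π/N} (a symbol introduced here for illustration), the N-point result is F_k = G_k + W^k H_k and F_{k+N/2} = G_k - W^k H_k. This recursive form is chosen for readability; it is not the in-place, bit-reversed implementation the slide describes.

```python
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    g = fft(x[0::2])  # half-size DFT of the even-indexed terms
    h = fft(x[1::2])  # half-size DFT of the odd-indexed terms
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * h[k]  # twiddle factor times H_k
        out[k] = g[k] + t            # butterfly, upper output
        out[k + n // 2] = g[k] - t   # butterfly, lower output
    return out


print(fft([1, 2, 3, 4]))
```

Each recursion level does O(N) work and there are log_2 N levels, which is where the O(N log_2 N) figure comes from.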
17
The FFT Structure
18
The Discrete Cosine Transform (DCT)
The DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers.
It is equivalent to a DFT of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even).
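The "DFT of twice the length" statement can be checked numerically. The sketch below assumes the unnormalized DCT-II, X_k = Σ x_n cos(π(n + 1/2)k/N), since the slide's own formula is not available. It mirrors the input to build an even-symmetric sequence of length 2N, takes its DFT, and removes a half-sample phase shift.

```python
import cmath
import math


def dct2_direct(x):
    """Unnormalized DCT-II evaluated straight from its cosine sum."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
            for k in range(n)]


def dct2_via_dft(x):
    """Same DCT-II, obtained from a length-2N DFT of the mirrored input."""
    n = len(x)
    m = 2 * n
    y = list(x) + list(reversed(x))  # even-symmetric extension
    out = []
    for k in range(n):
        yk = sum(y[t] * cmath.exp(-2j * cmath.pi * k * t / m) for t in range(m))
        # Undo the half-sample phase shift; the result is real up to rounding.
        out.append((cmath.exp(-1j * cmath.pi * k / m) * yk).real / 2)
    return out


x = [1.0, 2.0, 3.0, 4.0]
print([round(v, 6) for v in dct2_direct(x)])
print([round(v, 6) for v in dct2_via_dft(x)])
```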
19
The Modified Discrete Cosine Transform (MDCT)
The MDCT operates on blocks that overlap by 50%, which makes it very useful when coefficients are quantized, as it effectively removes the otherwise easily detectable blocking artifacts at block boundaries.
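A small sketch of the 50% overlap at work. It assumes the common MDCT definition X_k = Σ x_n cos[(π/N)(n + 1/2 + N/2)(k + 1/2)] over 2N inputs, a 1/N factor on the inverse, no window, and no quantization; none of these choices is taken from the slide. Each inverse block alone contains aliasing, but adding the overlapping halves of neighbouring blocks cancels it.

```python
import math


def mdct(block):
    """2N input samples -> N coefficients."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples (1/N normalization assumed)."""
    n = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for k in range(n)) / n
            for t in range(2 * n)]


def overlap_add_roundtrip(x, n):
    """Transform blocks of 2N samples with hop N, invert, and overlap-add."""
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * n + 1, n):
        y = imdct(mdct(x[start:start + 2 * n]))
        for t in range(2 * n):
            out[start + t] += y[t]
    return out


x = [float(v) for v in (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3)]
rec = overlap_add_roundtrip(x, 4)
# Samples 4..11 are covered by two blocks, so the aliasing cancels there.
print([round(v, 6) for v in rec[4:12]])
```

The first and last N samples are covered by only one block and are therefore not reconstructed in this sketch.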
20
In Matrix Notation (2 Length-8 Blocks)
21
Fourier Transform Summary
Physical Interpretation
– Describes the frequency content of a real-world signal
– For real-world signals, frequency content tails off as frequencies get higher
Mathematical Interpretation
– Convolution in the time domain becomes multiplication in the frequency domain
– The DFT is a matrix that diagonalizes a circulant convolution matrix
– The DCT is a special case of the DFT
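The first two mathematical points can be checked together: a circulant matrix applied to a vector is a circular convolution, and in the DFT domain that operation reduces to an element-by-element product. A minimal sketch, with made-up sample values and a direct O(N^2) DFT:

```python
import cmath


def dft(x):
    """Forward DFT by direct summation."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_convolve(f, h):
    """Circular convolution; equivalent to multiplying f by the circulant matrix built from h."""
    n = len(f)
    return [sum(f[k] * h[(i - k) % n] for k in range(n)) for i in range(n)]


f = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.0, -1.0, 0.0]
lhs = dft(circular_convolve(f, h))              # transform of the convolution
rhs = [a * b for a, b in zip(dft(f), dft(h))]   # product of the transforms
print(all(abs(a - b) < 1e-9 for a, b in zip(lhs, rhs)))  # True
```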
22
Adaptive Transform Coding (ATC)
Another frequency-domain technique, used for bit rates in the range of 9.6 – 20 Kbps; it involves block transformation of windowed input segments of the speech waveform.
Each segment is represented by a set of transform coefficients, which are quantized and transmitted in lieu of the signal itself.
At the receiver, the quantized coefficients are inverse-transformed to get back to the original waveform.
The most attractive and frequently used transform is the Discrete Cosine Transform (DCT), with its corresponding Inverse Discrete Cosine Transform (IDCT).
23
ATC Practicality
Bit allocation among the different coefficients is varied adaptively from frame to frame while keeping the total number of bits constant.
Time-varying statistics control the bit allocation procedure and have to be transmitted as side information (an overhead of about 2 Kbps).
Side information is also used to determine the step sizes of the various coefficient quantizers.
In practice, the DCT and IDCT are not evaluated directly from their defining formulas but rather by computationally efficient algorithms such as the FFT.
24
Source Coding - Vocoders
A class of speech coding systems that analyze the voice signal at the transmitter, derive parameters from it, and transmit them to the receiver, where the voice is synthesized using these parameters.
All vocoders attempt to model the speech generation process by a dynamic system and to quantify the physical parameters of that system.
In general they are much more complex than waveform coders and achieve very high economy in transmission bit rate.
They tend to be less robust, and their performance is very much speaker-dependent.
25
Channel Vocoder
The first analysis-synthesis system to be demonstrated.
A frequency-domain vocoder that determines the envelope of the speech signal in a number of frequency bands, then samples, encodes and multiplexes these samples with the encoded outputs of the other filters.
The sampling is done synchronously every 10 ms to 30 ms.
Along with the energy information about each band, the voiced/unvoiced decision and the pitch frequency for voiced speech are also transmitted.
26
Cepstrum Vocoder
The cepstrum vocoder separates the excitation and vocal tract spectrum by taking the inverse Fourier transform of the log magnitude spectrum of the signal.
– The low frequency coefficients in the cepstrum correspond to the vocal tract spectral envelope
– High frequency excitation coefficients form a periodic pulse train at multiples of the sampling period
At the receiver, the vocal tract cepstral coefficients are Fourier transformed to produce the vocal tract impulse response.
By convolving the impulse response with a synthetic excitation signal, the original speech is reconstructed.
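The analysis step described above (inverse transform of the log magnitude spectrum) can be sketched as a real cepstrum. This is an illustration only, using a direct DFT and a small floor to avoid log(0); the function names and the floor value are assumptions, and a real vocoder would add windowing and liftering.

```python
import cmath
import math


def dft(x, sign=-1):
    """Direct DFT; sign=+1 gives the unnormalized inverse kernel."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x, floor=1e-12):
    """Inverse DFT of the log magnitude spectrum of x."""
    log_mag = [math.log(max(abs(v), floor)) for v in dft(x)]
    n = len(x)
    return [v.real / n for v in dft(log_mag, sign=+1)]


frame = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
c = real_cepstrum(frame)
louder = real_cepstrum([2.0 * v for v in frame])
# A pure gain change only moves the zeroth cepstral coefficient (by log 2 here).
print(round(louder[0] - c[0], 6), round(math.log(2.0), 6))
```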
27
Linear Predictive Coders (LPC)
The time-domain LPC extracts the significant features of speech from its waveform.
It is computationally intensive but by far the most popular among the class of low bit rate vocoders.
It is possible to transmit good quality voice at 4.8 Kbps and poorer quality voice at lower rates.
LPC models the vocal tract as an all-pole digital filter and uses a weighted sum of the past p samples to estimate the present sample (10 ≤ p ≤ 15), with e_n being the prediction error.
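Written out, the prediction described above takes the following form. The symbols s_n (speech sample) and a_k (predictor weights) follow the names used on the LPC Coefficients slide; the sign convention is an assumption, as some texts define the weights with the opposite sign.

```latex
s_n = \sum_{k=1}^{p} a_k\, s_{n-k} + e_n
```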
28
LPC Coefficients
The LPC coefficients a_n are found by solving a system of linear equations, where C_mk are the correlation coefficients computed from the m-th and k-th lags of s_n.
A matrix inversion is needed, hence the high computational load.
The reflection coefficients, a related set of coefficients, are transmitted in practice.
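One standard way to solve such a system is the Levinson-Durbin recursion, sketched below. It is a swapped-in technique rather than the lecture's stated method. It assumes the autocorrelation formulation, where C_mk depends only on |m - k| so the matrix is Toeplitz, and it produces the reflection coefficients as a by-product without an explicit matrix inversion. Sign conventions for reflection coefficients vary between texts.

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for s[n] ~ sum_j a[j-1] * s[n-j].

    r is the autocorrelation sequence r[0..order].
    Returns (predictor coefficients, reflection coefficients, residual energy).
    """
    a = []
    reflection = []
    err = r[0]
    for i in range(1, order + 1):
        # Correlation left unexplained by the current order-(i-1) predictor.
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err
        # Order update: adjust existing coefficients, then append the new one.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        reflection.append(k)
        err *= 1.0 - k * k
    return a, reflection, err


# Autocorrelation of a first-order process with coefficient 0.5 (illustrative values).
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(coeffs, refl, err)  # [0.5, 0.0] [0.5, 0.0] 0.75
```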
29
LPC Transmitted Parameters
Reflection coefficients can be adequately represented by 6 bits each.
A tenth-order predictor (q = 10) needs 72 bits per frame:
– 60 bits for coefficients
– 5 bits for a gain parameter and 6 bits for a pitch period
If parameters are estimated every 15 – 30 msec:
– The resulting bit rate is in the range of 2400 – 4800 bps
Additional savings can be achieved via a non-linear transformation of the coefficients prior to coding, to reduce sensitivity to quantization error.
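The bit-rate range follows directly from the frame size and the update interval. A quick check, assuming 72 bits are sent once per frame (the function name is hypothetical):

```python
def bit_rate_bps(bits_per_frame, frame_ms):
    """Bits per second when one frame of parameters is sent every frame_ms milliseconds."""
    return bits_per_frame * 1000 / frame_ms


for frame_ms in (15, 20, 30):
    print(frame_ms, "ms ->", bit_rate_bps(72, frame_ms), "bps")
# 15 ms -> 4800.0 bps, 20 ms -> 3600.0 bps, 30 ms -> 2400.0 bps
```

With 72 bits per frame, 4800 bps corresponds to a 15 ms update interval and 2400 bps to a 30 ms interval.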
30
LPC Receiver Processing
At the receiver, the coefficients are used for a synthesis filter.
Various LPC methods differ in how the synthesis filter is excited:
– Multi-pulse Excited LPC: typically 8 pulses with proper positions are used as excitation
– Code-Excited LPC (CELP): the transmitter searches its code book for a stochastic excitation to the LPC filter that gives the best perceptual match to the sound. The index into the code book is then transmitted
CELP coders are extremely complex and can require more than 500 MIPS.
However, high quality is achieved when the excitation is coded at 0.25 bits/sample, at transmission bit rates as low as 4.8 Kbps.
31
Various LPC Vocoders
32
ITU-T Speech Coding Standards
33
Speech Coder Performance
Objective measure: how well does the reconstructed speech signal quantitatively approximate the original version?
– Mean Square Error (MSE) distortion
– Frequency-weighted MSE
– Segmental Signal-to-Noise Ratio (SNR)
Subjective measure: conducted by playing the sample to a number of listeners, who judge the quality of the speech
– Overall quality, listening effort, intelligibility, naturalness
– Diagnostic Rhyme Test (DRT): most popular for intelligibility
– Diagnostic Acceptability Measure (DAM): evaluates the acceptability of a speech coding system
– Mean Opinion Score (MOS): the most popular ranking system
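Two of the objective measures can be sketched in a few lines. The definitions below are common textbook forms and are assumptions here: MSE averaged over all samples, and segmental SNR as the mean of per-frame SNR values in dB over non-overlapping frames. Frames with zero signal or zero noise energy are skipped, which is one of several conventions in use.

```python
import math


def mse(reference, decoded):
    """Mean square error between the original and reconstructed samples."""
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


def segmental_snr_db(reference, decoded, frame_len):
    """Average of per-frame SNR values (in dB) over non-overlapping frames."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(v * v for v in ref)
        noise = sum((a - b) ** 2 for a, b in zip(ref, dec))
        if signal > 0 and noise > 0:  # skip silent or error-free frames
            snrs.append(10 * math.log10(signal / noise))
    # With no usable frames, report infinity rather than dividing by zero.
    return sum(snrs) / len(snrs) if snrs else float("inf")


original = [1.0] * 8
decoded = [1.1] * 8
print(mse(original, decoded))                   # about 0.01
print(segmental_snr_db(original, decoded, 4))   # about 20 dB
```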
34
Mean Opinion Score (MOS)
Most popular ranking system
35
MOS for Speech Coders