Concepts of Multimedia Processing and Transmission IT 481, Lecture #4 Dennis McCaughey, Ph.D. 25 September, 2006.



08/28/2006 IT 481, Fall
Introduction to Linear Systems
- The Modified Discrete Cosine Transform (MDCT) was introduced in the lecture on MP3 encoding.
- How does it relate to the Discrete Cosine Transform (DCT), and why are we concerned?
- The DCT and MDCT are important enablers in data compression of both audio and video.
- The DCT is a special case of the Discrete Fourier Transform (DFT), a key component in digital signal processing.
- The Fast Fourier Transform (FFT) is a computationally efficient algorithm for computing the DFT.

Linear System Definition

Linear System Response to a Series of Sampled Data Inputs

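Linear System Input/Output
- For an input f(t) and impulse response h(t), the output is $g(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau$.
- This is denoted as the convolution of f(t) and h(t).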

Fourier Transform - Non-periodic Signal
- Let g(t) be a continuous, non-periodic function of t.
- The Fourier Transform of g(t) is $G(\omega) = \int_{-\infty}^{\infty} g(t)\, e^{-j\omega t}\, dt$
  - where $\omega = 2\pi f$ is the angular frequency in radians/sec and f is the frequency in Hz.
- The Inverse Fourier Transform is $g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(\omega)\, e^{j\omega t}\, d\omega$.

Fourier Transform Example

Relationship Between the Fourier Transform and Convolution

A Very Important Property

Convolution Sum Example
- n_g = n_f + n_h - 1
- f(k) = h(k) = 0 for k > 2
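The length rule n_g = n_f + n_h - 1 can be checked with a direct convolution sum. The following is a minimal sketch; the function name `convolve` and the sample values are illustrative assumptions, not taken from the slide.

```python
def convolve(f, h):
    """Direct convolution sum: g[n] = sum over k of f[k] * h[n - k]."""
    n_g = len(f) + len(h) - 1          # output length n_g = n_f + n_h - 1
    g = [0] * n_g
    for i, f_i in enumerate(f):
        for j, h_j in enumerate(h):
            g[i + j] += f_i * h_j      # term f[i] * h[j] lands at index i + j
    return g


# Illustrative 3-point sequences (zero for k > 2), chosen for this sketch.
f = [1, 2, 3]
h = [1, 1, 1]
print(convolve(f, h))  # [1, 3, 6, 5, 3]
```

Two 3-point sequences give a 5-point output, as the length rule predicts.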

Integer Arithmetic Example
- Multiplication of two integers is a form of discrete convolution.
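One way to see this is to convolve the decimal digit sequences of the two integers and then weight each result by its power of ten, which is what carry propagation does in long multiplication. The sketch below assumes non-negative integers; the helper names are hypothetical.

```python
def digits_lsb_first(n):
    """Decimal digits of a non-negative integer, least significant first."""
    return [int(d) for d in reversed(str(n))]


def multiply_by_convolution(a, b):
    """Multiply two non-negative integers by convolving their digit sequences."""
    da, db = digits_lsb_first(a), digits_lsb_first(b)
    raw = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            raw[i + j] += x * y        # discrete convolution of the digits
    # Weighting each column by 10**k absorbs the carries between columns.
    product = sum(c * 10 ** k for k, c in enumerate(raw))
    return raw, product


raw, product = multiply_by_convolution(123, 456)
print(raw)      # [18, 27, 28, 13, 4]
print(product)  # 56088
```

The intermediate list holds the column sums of long multiplication before any carries are applied.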

Discrete Convolution in Matrix Form
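The matrix on this slide did not survive in the transcript. As a hedged illustration of the general idea, a linear convolution can be written as g = H f, where H is a Toeplitz matrix whose columns are shifted copies of h. The names `convolution_matrix` and `matvec` and the sample values are assumptions made for this sketch.

```python
def convolution_matrix(h, n_f):
    """Build the (n_f + n_h - 1) x n_f Toeplitz matrix H with H[i][j] = h[i - j]."""
    n_g = len(h) + n_f - 1
    return [
        [h[i - j] if 0 <= i - j < len(h) else 0 for j in range(n_f)]
        for i in range(n_g)
    ]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


h = [1, 1, 1]
f = [1, 2, 3]
H = convolution_matrix(h, len(f))
for row in H:
    print(row)
print(matvec(H, f))  # [1, 3, 6, 5, 3]
```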

Enter the Discrete Fourier Transform

Discrete Fourier Transform (DFT)
- A discrete-time version of the Fourier Transform that can be implemented in the digital domain.
- Given an N-point time-sampled sequence {x_0, x_1, ..., x_{N-1}}, the DFT is described by the transform pair $X_n = \sum_{k=0}^{N-1} x_k\, e^{-j 2\pi nk/N}$ and $x_k = \frac{1}{N} \sum_{n=0}^{N-1} X_n\, e^{j 2\pi nk/N}$.
- Evaluated directly, the transform has complexity O(N^2).
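A direct evaluation makes the O(N^2) cost visible: each of the N outputs is a sum over N inputs. This is a minimal sketch using the common convention that places the 1/N factor on the inverse transform; that placement is an assumption, since the slide's formula is not in the transcript.

```python
import cmath


def dft(x):
    """Direct DFT: N outputs, each a sum of N terms, hence O(N^2)."""
    N = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / N) for k in range(N))
        for n in range(N)
    ]


def idft(X):
    """Inverse DFT with the 1/N factor (assumed convention)."""
    N = len(X)
    return [
        sum(X[n] * cmath.exp(2j * cmath.pi * n * k / N) for n in range(N)) / N
        for k in range(N)
    ]


x = [1, 2, 3, 4]
X = dft(x)
print([round(v.real, 6) + round(v.imag, 6) * 1j for v in X])
print([round(v.real, 6) for v in idft(X)])  # recovers the input samples
```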

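Fast Fourier Transform (FFT)
- The FFT is a computationally efficient algorithm for the DFT, with complexity O(N log_2 N).
- Recall the DFT: $X_n = \sum_{k=0}^{N-1} x_k\, W_N^{nk}$, with $W_N = e^{-j 2\pi/N}$.
- It can be shown that $X_n = G_n + W_N^{n} H_n$, where G_n and H_n are two half-sized DFTs of the even-indexed and odd-indexed terms, respectively.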
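The even/odd split can be sketched as a recursive radix-2 decimation-in-time FFT. This is an illustrative implementation under the assumption that N is a power of two; it is not necessarily the structure drawn on the slides.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("length must be a power of two")
    G = fft(x[0::2])   # half-size DFT of the even-indexed samples
    H = fft(x[1::2])   # half-size DFT of the odd-indexed samples
    out = [0j] * N
    for n in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * n / N) * H[n]   # twiddle factor times H_n
        out[n] = G[n] + t              # X_n         = G_n + W^n H_n
        out[n + N // 2] = G[n] - t     # X_{n + N/2} = G_n - W^n H_n
    return out


print(fft([1, 2, 3, 4]))
```

Because G_n and H_n are periodic with period N/2 and the twiddle factor changes sign after N/2 steps, one product serves two outputs.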

The FFT Efficient Implementation
- Each half-size DFT can in turn be divided into a pair of quarter-size DFTs.
- The end result is a partition and reordering of the time-domain inputs using what is known as bit-reversed addressing.
  - Each stage consists of N complex multiply-accumulates in a straightforward implementation
  - Further simplification from eight to six real operations by the "butterfly"
  - Further simplification when the time-domain sequence is real
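Bit-reversed addressing can be illustrated by reversing the binary representation of each input index. A small sketch follows; the function name is hypothetical and N is assumed to be a power of two.

```python
def bit_reverse_indices(N):
    """Input ordering produced by repeated even/odd splitting of N samples."""
    bits = N.bit_length() - 1
    if N < 1 or (1 << bits) != N:
        raise ValueError("N must be a power of two")
    order = []
    for i in range(N):
        r = 0
        for b in range(bits):
            if i & (1 << b):
                r |= 1 << (bits - 1 - b)   # mirror bit b to the opposite end
        order.append(r)
    return order


print(bit_reverse_indices(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

For N = 8, index 1 (binary 001) maps to 4 (binary 100), index 3 (011) maps to 6 (110), and so on.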

The FFT Structure

The Discrete Cosine Transform (DCT)
- The DCT is a Fourier-related transform similar to the Discrete Fourier Transform (DFT), but using only real numbers.
- It is equivalent to a DFT of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even).
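The "DFT of twice the length" statement can be checked numerically. The sketch below assumes the unnormalized DCT-II, $C_k = \sum_{n=0}^{N-1} x_n \cos\left(\frac{\pi (2n+1) k}{2N}\right)$, which the slide does not specify. Mirroring the input gives a length-2N even-symmetric sequence whose DFT, after a half-sample phase correction, equals twice the DCT.

```python
import cmath
import math


def dct2_direct(x):
    """Unnormalized DCT-II evaluated from its definition (assumed variant)."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        for k in range(N)
    ]


def dft(x):
    """Direct DFT, kept simple for clarity rather than speed."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]


def dct2_via_dft(x):
    """DCT-II from a length-2N DFT of the mirrored (even-symmetric) input."""
    N = len(x)
    y = list(x) + list(reversed(x))        # x_0 .. x_{N-1}, x_{N-1} .. x_0
    Y = dft(y)
    # Y_k = 2 * exp(j*pi*k/(2N)) * C_k, so undo the phase and halve.
    return [0.5 * (cmath.exp(-1j * cmath.pi * k / (2 * N)) * Y[k]).real
            for k in range(N)]


x = [1.0, 2.0, 3.0, 4.0]
print(dct2_direct(x))
print(dct2_via_dft(x))
```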

The Modified Discrete Cosine Transform (MDCT)
- The MDCT operates on blocks that overlap by 50%, which makes it well suited to quantization: the overlap effectively removes the otherwise easily detectable artifacts at block boundaries.
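The 50% overlap can be sketched as follows: each block of 2N samples yields N coefficients, and overlap-adding the inverse transforms of consecutive blocks cancels the time-domain aliasing. The sine window and the 2/N scaling on the inverse are assumptions chosen so that this particular sketch reconstructs exactly; the slides do not state which window or normalization they use.

```python
import math


def mdct(block):
    """MDCT of a 2N-sample block, giving N coefficients."""
    N = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients to 2N time-aliased samples (2/N scaling assumed)."""
    N = len(coeffs)
    return [
        (2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                        for k in range(N))
        for n in range(2 * N)
    ]


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def analyze_synthesize(x, N):
    """Window, transform, invert, window again, and overlap-add with hop N."""
    w = sine_window(N)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * N + 1, N):
        block = [x[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]
    return out


N = 4
x = [float((i * i) % 7) for i in range(16)]
y = analyze_synthesize(x, N)
# Samples covered by two overlapping blocks are recovered; the first and
# last N samples are covered by only one block and are not.
print([round(v, 6) for v in y[N:-N]])
print(x[N:-N])
```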

In Matrix Notation (2 Length-8 Blocks)

Fourier Transform Summary
- Physical interpretation
  - Describes the frequency content of a real-world signal
  - For real-world signals, frequency content tails off as frequencies get higher
- Mathematical interpretation
  - Convolution in the time domain becomes multiplication in the frequency domain
  - The DFT matrix diagonalizes a circulant convolution matrix
  - The DCT is a special case of the DFT
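The two mathematical statements about convolution can be checked together: multiplying by a circulant matrix (circular convolution) gives the same result as multiplying the DFTs element by element and transforming back. A minimal sketch, with illustrative names and values:

```python
import cmath


def dft(x, inverse=False):
    """Direct DFT; the inverse includes the 1/N factor."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for k in range(N))
        for n in range(N)
    ]
    return [v / N for v in out] if inverse else out


def circulant(h):
    """Circulant matrix C with C[i][j] = h[(i - j) mod N]."""
    N = len(h)
    return [[h[(i - j) % N] for j in range(N)] for i in range(N)]


def circular_convolve_matrix(h, x):
    """Time domain: multiply x by the circulant matrix built from h."""
    return [sum(c * v for c, v in zip(row, x)) for row in circulant(h)]


def circular_convolve_dft(h, x):
    """Frequency domain: the circulant matrix acts as a diagonal multiply."""
    product = [a * b for a, b in zip(dft(h), dft(x))]
    return [v.real for v in dft(product, inverse=True)]


h = [1, 2, 0, 0]
x = [1, 0, 3, 0]
print(circular_convolve_matrix(h, x))                    # [1, 2, 3, 6]
print([round(v, 6) for v in circular_convolve_dft(h, x)])
```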

Adaptive Transform Coding (ATC)
- Another frequency-domain technique, for bit rates in the range of 9.6 to 20 kbps; it involves block transformation of windowed input segments of the speech waveform.
- Each segment is represented by a set of transform coefficients, which are quantized and transmitted in lieu of the signal itself.
- At the receiver, the quantized coefficients are inverse-transformed to recover an approximation of the original waveform.
- The most attractive and frequently used transform is the Discrete Cosine Transform (DCT), with the corresponding Inverse Discrete Cosine Transform (IDCT).

ATC Practicality
- The bit allocation among the coefficients is varied adaptively from frame to frame while the total number of bits is kept constant.
- Time-varying statistics control the bit allocation procedure and have to be transmitted as side information (an overhead of about 2 kbps).
- Side information is also used to determine the step sizes of the coefficient quantizers.
- In practice, the DCT and IDCT are not evaluated directly from their defining formulas but by computationally efficient algorithms such as the FFT.
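The transform, quantize, and inverse-transform chain for one frame can be sketched as below. This is only an illustration: the orthonormal DCT-II, the uniform quantizer, and the fixed step sizes are assumptions, and the adaptive bit allocation and side information described above are not modeled.

```python
import math


def dct(x):
    """Orthonormal DCT-II, evaluated directly for clarity."""
    N = len(x)
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)))
    return out


def idct(X):
    """Inverse of the orthonormal DCT-II above."""
    N = len(X)
    out = []
    for n in range(N):
        total = 0.0
        for k in range(N):
            scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
            total += scale * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(total)
    return out


def code_frame(frame, steps):
    """Quantize the DCT coefficients of one frame and reconstruct it."""
    coeffs = dct(frame)
    indices = [round(c / q) for c, q in zip(coeffs, steps)]   # what would be sent
    decoded = [i * q for i, q in zip(indices, steps)]         # receiver side
    return indices, idct(decoded)


# Hypothetical frame and step sizes, chosen only for illustration.
frame = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0]
steps = [0.5] * len(frame)
indices, rebuilt = code_frame(frame, steps)
print(indices)
print([round(v, 3) for v in rebuilt])
```

In an adaptive coder the entries of `steps` would change from frame to frame according to the transmitted side information.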

Source Coding - Vocoders
- A class of speech coding systems that analyze the voice signal at the transmitter, derive parameters from it, and transmit them to the receiver, where the voice is synthesized from these parameters.
- All vocoders attempt to model the speech generation process as a dynamic system and to quantify the physical parameters of that system.
- In general they are much more complex than waveform coders and achieve very high economy in transmission bit rate.
- They tend to be less robust, and their performance is very much speaker-dependent.

Channel Vocoder
- The first analysis-synthesis system to be demonstrated.
- A frequency-domain vocoder that determines the envelope of the speech signal in a number of frequency bands, then samples and encodes each envelope and multiplexes it with the encoded outputs of the other filters.
- The sampling is done synchronously every 10 ms to 30 ms.
- Along with the energy information for each band, the voiced/unvoiced decision and the pitch frequency for voiced speech are also transmitted.

Cepstrum Vocoder
- The cepstrum vocoder separates the excitation from the vocal tract spectrum by taking the inverse Fourier transform of the log magnitude spectrum of the signal.
  - The low-frequency coefficients in the cepstrum correspond to the vocal tract spectral envelope.
  - The high-frequency excitation coefficients form a periodic pulse train at multiples of the pitch period.
- At the receiver, the vocal tract cepstral coefficients are Fourier transformed to produce the vocal tract impulse response.
- Convolving this impulse response with a synthetic excitation signal reconstructs the speech.
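The cepstrum computation (inverse transform of the log magnitude spectrum) can be sketched on a toy signal. Here a unit pulse followed by a half-amplitude copy delayed by D samples stands in for a repeating excitation; this signal and the names below are assumptions for illustration, not speech data. The repetition shows up as a cepstral peak at index D.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct DFT; the inverse includes the 1/N factor."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * n * k / N) for k in range(N))
        for n in range(N)
    ]
    return [v / N for v in out] if inverse else out


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]


N, D = 64, 8
x = [0.0] * N
x[0] = 1.0     # first pulse
x[D] = 0.5     # delayed, attenuated repetition
c = real_cepstrum(x)
peak = max(range(1, N // 2 + 1), key=lambda n: c[n])
print(peak)    # index of the largest cepstral value, equal to D here
```

The log turns the product of excitation and vocal tract spectra into a sum, which is why the two contributions occupy separate regions of the cepstrum.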

Linear Predictive Coders (LPC)
- The time-domain LPC extracts the significant features of speech from its waveform.
- It is computationally intensive but by far the most popular among the class of low bit rate vocoders.
- It is possible to transmit good quality voice at 4.8 kbps and poorer quality voice at lower rates.
- LPC models the vocal tract as an all-pole digital filter and uses a weighted sum of the past p samples to estimate the present sample, $s_n = \sum_{k=1}^{p} a_k\, s_{n-k} + e_n$ with $10 \le p \le 15$, where e_n is the prediction error.

LPC Coefficients
- The LPC coefficients a_k are found by solving a system of linear equations
  - where C_mk are the correlation coefficients computed from the m-th and k-th lags of s_n.
- A matrix inversion is needed, hence the high computational load.
- The reflection coefficients, a related set of coefficients, are what is transmitted in practice.
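The slide's system of equations is not reproduced in the transcript. As a hedged illustration, the sketch below assumes the autocorrelation variant, in which C_mk depends only on |m - k| and the equations become $\sum_{k=1}^{p} a_k\, r_{|m-k|} = r_m$ for m = 1..p. In that case the Levinson-Durbin recursion solves them without a general matrix inversion and yields the reflection coefficients along the way. Sign conventions for reflection coefficients vary between texts.

```python
def levinson_durbin(r, p):
    """Solve sum_k a[k] * r[|m - k|] = r[m], m = 1..p, for Toeplitz autocorrelation r.

    Returns (predictor coefficients a_1..a_p, reflection coefficients, error energy).
    """
    a = [0.0] * (p + 1)          # a[0] is unused
    error = float(r[0])
    reflection = []
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error          # i-th reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)   # prediction error energy shrinks each step
        reflection.append(k)
    return a[1:], reflection, error


# Hypothetical autocorrelation values r[0..2], for illustration only.
coeffs, refl, err = levinson_durbin([1.0, 0.5, 0.2], 2)
print(coeffs)
print(refl)
print(err)
```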

LPC Transmitted Parameters
- Reflection coefficients can be adequately represented by 6 bits each.
- An order-10 predictor needs 72 bits per frame:
  - 60 bits for the coefficients
  - 5 bits for a gain parameter and 6 bits for a pitch period
  - the remaining 1 bit is not itemized here; it presumably carries the voiced/unvoiced decision
- If the parameters are estimated every 15 to 30 msec, the resulting bit rate ranges from 2400 to 4800 bps (72 bits per 30 msec is 2400 bps; 72 bits per 15 msec is 4800 bps).
- Additional savings can be achieved via a non-linear transformation of the coefficients prior to coding, which reduces sensitivity to quantization error.
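The bit-rate figures follow from dividing the bits per frame by the frame interval. A short arithmetic sketch (the function name is hypothetical) shows that 72 bits per frame gives 4800 bps at a 15 msec update interval, 3600 bps at 20 msec, and 2400 bps at 30 msec.

```python
def lpc_bit_rate(bits_per_frame, frame_ms):
    """Bit rate in bits per second for one parameter frame every frame_ms milliseconds."""
    return bits_per_frame * 1000.0 / frame_ms


coefficient_bits = 10 * 6      # ten reflection coefficients at 6 bits each
gain_bits = 5
pitch_bits = 6
print(coefficient_bits + gain_bits + pitch_bits)   # 71 itemized bits

for frame_ms in (15, 20, 30):
    print(frame_ms, lpc_bit_rate(72, frame_ms))
```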

LPC Receiver Processing
- At the receiver, the coefficients are used to configure a synthesis filter.
- The various LPC methods differ in how the synthesis filter is excited:
  - Multi-pulse Excited LPC: typically 8 pulses with suitably chosen positions are used as the excitation.
  - Code-Excited LPC (CELP): the transmitter searches its code book for the stochastic excitation to the LPC filter that gives the best perceptual match to the sound; the index into the code book is then transmitted.
- CELP coders are extremely complex and can require more than 500 MIPS.
- However, high quality is achieved when the excitation is coded at 0.25 bits/sample, at transmission bit rates as low as 4.8 kbps.

Various LPC Vocoders

ITU-T Speech Coding Standards

Speech Coder Performance
- Objective measures: how well does the reconstructed speech signal quantitatively approximate the original?
  - Mean Square Error (MSE) distortion
  - Frequency-weighted MSE
  - Segmental Signal-to-Noise Ratio (SNR)
- Subjective measures: obtained by playing samples to a number of listeners, who judge the quality of the speech.
  - Overall quality, listening effort, intelligibility, naturalness
  - Diagnostic Rhyme Test (DRT): most popular for intelligibility
  - Diagnostic Acceptability Measure (DAM): evaluates the acceptability of a speech coding system
  - Mean Opinion Score (MOS): the most popular ranking system
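Two of the objective measures can be sketched directly. The definitions below are common textbook forms and are assumptions here: MSE is the mean of the squared sample differences, and segmental SNR is the average of per-frame SNR values in dB. Skipping frames with zero signal or zero noise energy is a choice made for this sketch.

```python
import math


def mse(reference, decoded):
    """Mean square error between the original and reconstructed samples."""
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


def segmental_snr_db(reference, decoded, frame_len):
    """Average of per-frame SNRs in dB over non-overlapping frames."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal > 0 and noise > 0:       # skip silent or error-free frames
            snrs.append(10.0 * math.log10(signal / noise))
    return sum(snrs) / len(snrs) if snrs else float("inf")


# Hypothetical sample values for illustration.
reference = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
decoded = [0.9, 0.9, 0.9, 0.9, 1.8, 1.8, 1.8, 1.8]
print(mse(reference, decoded))
print(segmental_snr_db(reference, decoded, frame_len=4))
```

Averaging in dB per frame keeps loud segments from dominating the score, which is the usual motivation for the segmental form.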

Mean Opinion Score (MOS)
- Most popular ranking system

MOS for Speech Coders