SPEECH CODING Maryam Zebarjad Alessandro Chiumento
SPEECH PROPERTIES
Two categories: voiced and unvoiced
  Voiced: quasi-periodic in the time domain and harmonically structured in the frequency domain
  Unvoiced: random-like and broadband (like white noise)
Why speech coding?
  Efficient transmission
  Efficient storage
Problem: achieving high quality at the lowest possible bit rate
Performance measures
Two ways of measuring:
  Objective
    SNR (long term)
    SEGSNR (short term)
  Subjective
    DRT: Diagnostic Rhyme Test
    DAM: Diagnostic Acceptability Measure
    MOS: Mean Opinion Score
Four standards for speech quality: Broadcast, Network, Communications, Synthetic
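The two objective measures can be sketched as follows. This is a minimal illustration, not a standardized implementation: the function names, the 160-sample frame length (20 ms at an assumed 8 kHz sampling rate), and the choice to skip frames whose SNR is not finite are all assumptions made here.

```python
import math

def _energy(values):
    """Sum of squared samples."""
    return sum(v * v for v in values)

def snr_db(reference, decoded):
    """Long-term SNR in dB, computed once over the whole signal."""
    noise = _energy([r - d for r, d in zip(reference, decoded)])
    signal = _energy(reference)
    if noise == 0.0:
        return math.inf    # decoded signal is identical to the reference
    if signal == 0.0:
        return -math.inf   # silent reference, non-zero error
    return 10.0 * math.log10(signal / noise)

def segsnr_db(reference, decoded, frame_len=160):
    """Segmental SNR: average of per-frame SNRs in dB.

    Frames with a non-finite SNR (silent or error-free) are skipped,
    which is an assumption of this sketch. Returns None if no frame is left.
    """
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        frame_snr = snr_db(reference[start:start + frame_len],
                           decoded[start:start + frame_len])
        if math.isfinite(frame_snr):
            values.append(frame_snr)
    if not values:
        return None
    return sum(values) / len(values)
```

Because SEGSNR averages in the dB domain, quiet frames count as much as loud ones, so a constant error penalizes low-level segments more visibly than it does in the long-term SNR.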
Coding Techniques
WAVEFORM CODERS
  Digitize speech on a sample-by-sample basis. The goal is for the output waveform to closely match the input waveform.
    Scalar and vector quantization
    Sub-band coders
    Transform coders
SINUSOIDAL ANALYSIS-SYNTHESIS
  These coders rely on the sinusoidal representation of the speech waveform.
    Short-Time Fourier Transform models
    Sinusoidal Transform Coding
    Multiband Excitation Coder
VOCODERS
  Speech-specific coders
    Formant Vocoders
    Channel Vocoders
    LPC Vocoders
Scalar and Vector Quantization
SQ: every sample is mapped individually to a specific code
Examples: PCM, DPCM, DM, ADPCM, ...
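As an illustration of scalar quantization, here is a minimal DPCM sketch. It assumes the simplest possible setup: a first-order predictor that uses the previous reconstructed sample, and a uniform quantizer. The step size, number of levels, and function names are hypothetical choices for this example.

```python
def dpcm_encode(samples, step=0.05, levels=16):
    """Quantize the difference between each sample and its prediction.

    The prediction is the previous reconstructed sample, so the encoder
    tracks exactly what the decoder will compute.
    """
    half = levels // 2
    prediction = 0.0
    codes = []
    for x in samples:
        error = x - prediction
        # Uniform scalar quantizer, clipped to the available code range.
        code = max(-half, min(half - 1, round(error / step)))
        codes.append(code)
        prediction += code * step
    return codes

def dpcm_decode(codes, step=0.05):
    """Rebuild the signal by accumulating the dequantized differences."""
    prediction = 0.0
    output = []
    for code in codes:
        prediction += code * step
        output.append(prediction)
    return output
```

As long as the signal changes slowly enough that the quantizer never clips, the reconstruction error stays within half a step per sample.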
Scalar and Vector Quantization
VQ: the data (speech) is compressed by encoding it in blocks. The incoming vectors are formed from consecutive data samples or from model parameters.
Examples: VPCM, GS-VQ, A-VQ, ...
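The block-wise encoding can be sketched as a nearest-neighbour search in a codebook. This is an illustration only: the codebook is assumed to be given (codebook training is not shown), vectors are formed from consecutive samples, and the function names are hypothetical.

```python
def vq_encode(samples, codebook):
    """Map each block of consecutive samples to the index of the nearest codeword.

    Trailing samples that do not fill a complete block are ignored.
    """
    dim = len(codebook[0])
    indices = []
    for start in range(0, len(samples) - dim + 1, dim):
        block = samples[start:start + dim]

        def distortion(index):
            # Squared Euclidean distance between the block and a codeword.
            return sum((b - c) ** 2 for b, c in zip(block, codebook[index]))

        indices.append(min(range(len(codebook)), key=distortion))
    return indices

def vq_decode(indices, codebook):
    """Replace each index by its codeword."""
    output = []
    for index in indices:
        output.extend(codebook[index])
    return output
```

Only the indices are transmitted, so a codebook with 4 entries of dimension 2 costs 2 bits per block, that is, 1 bit per sample.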
Sub-band Coders
Unlike SQ and VQ, these coders rely more on frequency-domain properties of speech.
The signal band is divided into frequency sub-bands using a bank of bandpass filters.
The output of each filter is then sampled (or down-sampled) and encoded.
Examples: AT&T, CCITT (G.722), ...
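The split-and-downsample idea can be sketched with the simplest two-band filter bank, a Haar pair. This is a stand-in chosen for brevity: practical sub-band coders use longer filters and more bands, and the function names here are hypothetical. Encoding of the band signals is not shown.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)

def haar_analysis(signal):
    """Split an even-length signal into low and high bands, each downsampled by 2."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    low = [(signal[i] + signal[i + 1]) * _SCALE for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * _SCALE for i in range(0, len(signal), 2)]
    return low, high

def haar_synthesis(low, high):
    """Recombine the two bands into the full-rate signal."""
    output = []
    for l, h in zip(low, high):
        output.append((l + h) * _SCALE)
        output.append((l - h) * _SCALE)
    return output
```

In a coder, each band would be quantized separately between analysis and synthesis, which makes it possible to spend more bits on the bands that matter most.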
Transform Coders
Work on spectral properties of speech (like sub-band coders)
They use unitary transforms: the transform coefficients are quantized at the transmitter, then decoded and inverse-transformed at the receiver
The potential for bit-rate reduction in transform coding lies in the fact that unitary transforms tend to generate nearly uncorrelated transform components, which can be coded independently
Although many transforms can be used (DCT, DFT, WHT, KLT, ...), all share the property of unitarity: the inverse of the transform matrix equals its conjugate transpose
Example: Adaptive Transform Coder (ATC)
It employs the DCT and has high performance
Speech Coding Using Sinusoidal Analysis-Synthesis Models
These speech coders rely on the sinusoidal representation of the speech waveform
Speech Analysis-Synthesis Using the Short-Time Fourier Transform
Speech is slowly time-varying (quasi-stationary) and can be modeled by its short-time spectrum
[Equations: analysis expression; synthesis expression]
h(n) is the sliding analysis window; its duration is often constrained to about 5-20 ms
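For reference, a common textbook form of the short-time Fourier transform analysis and synthesis pair is given below. The notation is an assumption and may differ from the original slide: x(n) is the speech signal, h(n) the analysis window (with h(0) nonzero), and ω the normalized frequency in radians per sample.

```latex
\begin{align}
X(n,\omega) &= \sum_{m=-\infty}^{\infty} x(m)\, h(n-m)\, e^{-j\omega m}
  && \text{(analysis)} \\
x(n) &= \frac{1}{2\pi\, h(0)} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega
  && \text{(synthesis)}
\end{align}
```

Substituting the first expression into the second, the integral over ω keeps only the term m = n and yields 2π h(0) x(n), which is why the synthesis is normalized by h(0).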
Speech Coding Using Sinusoidal Analysis-Synthesis Models
Speech Analysis-Synthesis Using Sinusoidal Transform Coding
The speech is represented by a linear combination of sinusoids with time-varying amplitudes, phases and frequencies (McAulay-Quatieri):
[Equation: sinusoidal model]
The number of sinusoids L is time-varying. The potential for bit-rate reduction comes from the fact that voiced speech is highly periodic, so L can be adjusted accordingly.
Furthermore, the statistical properties of the short-time spectrum of unvoiced speech are preserved.
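The sinusoidal representation can be sketched as a sum of L cosines, s(n) = Σ A_k cos(ω_k n + φ_k). The sketch below is a simplification: it holds the amplitudes, frequencies (in radians per sample), and phases constant over one frame, whereas the model lets them vary with time. The function name is hypothetical.

```python
import math

def synthesize_sinusoids(amplitudes, frequencies, phases, num_samples):
    """Sum of sinusoids with fixed parameters over one frame.

    The number of sinusoids L is simply the length of the parameter lists,
    so it can change from frame to frame.
    """
    return [
        sum(a * math.cos(w * n + p)
            for a, w, p in zip(amplitudes, frequencies, phases))
        for n in range(num_samples)
    ]

# A voiced-like frame: three harmonics of an assumed fundamental frequency.
fundamental = 2.0 * math.pi * 100.0 / 8000.0   # 100 Hz at an assumed 8 kHz rate
frame = synthesize_sinusoids(
    amplitudes=[1.0, 0.5, 0.25],
    frequencies=[fundamental, 2 * fundamental, 3 * fundamental],
    phases=[0.0, 0.0, 0.0],
    num_samples=160,
)
```

For a periodic voiced frame the frequencies are multiples of one fundamental, so only the fundamental plus the amplitudes and phases need to be coded.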
Vocoders
Speech-specific
Low bit rate, but performance degrades for non-speech signals
Four types: Channel, Formant, Homomorphic, LPC
LPC vocoders are divided into three categories based on their excitation models:
  Two-state excitation
  Mixed excitation
  Residual excitation
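The two-state excitation model can be sketched as follows: each frame is driven either by a periodic impulse train (voiced) or by white noise (unvoiced), and the excitation is passed through an all-pole LPC synthesis filter. The function names, the default pitch period, and the sign convention s(n) = G e(n) + Σ a_i s(n-i) are assumptions of this sketch.

```python
import random

def two_state_excitation(voiced, num_samples, pitch_period=80, seed=0):
    """Impulse train for voiced frames, Gaussian white noise for unvoiced frames."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

def lpc_synthesize(excitation, coeffs, gain=1.0):
    """All-pole synthesis filter: s(n) = gain * e(n) + sum_i coeffs[i-1] * s(n-i)."""
    output = []
    for n, e in enumerate(excitation):
        sample = gain * e
        for i, a in enumerate(coeffs, start=1):
            if n - i >= 0:
                sample += a * output[n - i]
        output.append(sample)
    return output

# One voiced frame followed by one unvoiced frame, same (hypothetical) filter.
coeffs = [0.5]
voiced_frame = lpc_synthesize(two_state_excitation(True, 160), coeffs)
unvoiced_frame = lpc_synthesize(two_state_excitation(False, 160), coeffs)
```

Only the voiced/unvoiced flag, the pitch period, the gain, and the filter coefficients are needed per frame, which is where the low bit rate comes from.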
LPC Vocoder
For a p-th order forward linear prediction, the present sample is predicted from a linear combination of p past samples
[Equation: forward predictor]
The prediction parameters are obtained by minimizing the mean square forward prediction error
[Equations: mean square error and its definition; forward estimation]
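In one common notation, the quantities on this slide can be written as below. The symbols and the sign convention are assumptions (some texts define the predictor with a minus sign): a_i are the predictor coefficients, e(n) the forward prediction error, and r(k) the autocorrelation of x(n).

```latex
\begin{align}
\hat{x}(n) &= \sum_{i=1}^{p} a_i\, x(n-i) \\
e(n) &= x(n) - \hat{x}(n) \\
\varepsilon &= E\!\left[ e^{2}(n) \right] \\
\frac{\partial \varepsilon}{\partial a_k} = 0
  \;\Longrightarrow\;
  \sum_{i=1}^{p} a_i\, r(|i-k|) &= r(k), \qquad k = 1, \dots, p,
  \qquad r(k) = E\!\left[ x(n)\, x(n-k) \right]
\end{align}
```

The last line is a set of p linear equations whose coefficient matrix is symmetric and Toeplitz, which is the structure that fast recursive solvers exploit.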
The system can be solved using the Levinson-Durbin recursion
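A minimal sketch of the Levinson-Durbin recursion is given below. It assumes the normal equations are written as Σ a_i r(|i-k|) = r(k) for k = 1..p, with the predictor x̂(n) = Σ a_i x(n-i); the function names and the biased, unnormalized autocorrelation estimate are choices made for this example.

```python
def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation estimate r(0..max_lag) of one frame."""
    return [sum(signal[n] * signal[n - k] for n in range(k, len(signal)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations recursively.

    Returns (coefficients a_1..a_order, final prediction error energy).
    """
    a = [0.0] * (order + 1)   # a[0] is unused; a[i] holds a_i
    error = r[0]
    for m in range(1, order + 1):
        if error <= 0.0:
            break             # degenerate input: stop at the current order
        # Reflection coefficient for order m.
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / error
        updated = a[:]
        updated[m] = k
        for i in range(1, m):
            updated[i] = a[i] - k * a[m - i]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error
```

Each step reuses the order m-1 solution, so the whole system is solved in on the order of p² operations instead of the p³ needed by general elimination.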
Workplan
Implementation of:
  LPC Vocoder
  DCT Transform Coder
  DPCM Coder
Comparison of the three methods on specific speech signals