1
Chapter 23 Mobile Communication Speech Coders
DSP C5000 Chapter 23 Mobile Communication Speech Coders Copyright © 2003 Texas Instruments. All rights reserved.
2
Speech Coding, CELP Coders Implementation using C54x
Outline Speech Coding, CELP Coders Implementation using C54x
3
Outline – Speech Coding
Generalities on speech and coding
Linear Prediction based coders: short term and long term prediction
Vector Quantization
CELP coders: structure and calculations
Standards
4
Applications of Speech Coding
Digital transmissions
On wired telephone: multiplexing, integration of services
On wireless channels: spectral efficiency, better protection against errors
Voice mail/messaging
Storage: telephone answering machine
Secure phone
5
Characteristics of Coders
Bit rate D: 50 bps < D < 96 kbps
Coding delay ~ frame delay
Quality
Objective measurements: SNR, PSQM
Subjective measurements: MOS (excellent, good, fair, poor, unacceptable)
Intelligibility: objective measure STI or subjective DRT
Acceptability: E model of ETSI standard, communicability
Immunity to noise
Complexity
SNR = Signal to Noise Ratio
PSQM = Perceptual Speech Quality Measure
MOS = Mean Opinion Score = grade between 1 and 5
STI = Speech Transmission Index
DRT = Diagnostic Rhyme Test
6
Objective Evaluation of the Quality
The PSQM method:
Objective evaluation based on a model of auditory perception
Takes into account the masking effects
Good correlation with the MOS grade in « basic » conditions: low bit rate speech coding, tandem, transmission errors, ...
But sometimes not very reliable: loss of frames, effect of the automatic control
Still under development (PSQM+)
PSQM = Perceptual Speech Quality Measure
7
Subjective Evaluation of Quality using the ACR Method yielding MOS score
A large number of listeners grade a large number of speech sequences.
Database with phonetically balanced sentences
Presentation in random order
Naive listeners
Statistical processing of the results gives the MOS.
MOS = Mean Opinion Score
ACR = Absolute Category Rating
8
Speech Production
9
Speech Signal
10
Speech Spectrum for a Voiced Sound
11
Speech Spectrogram Non stationary Voiced / unvoiced
Representation in time and in frequency.
12
Calculation of Spectrograms
Preac = pre-emphasis (pre-accentuation), enhances high frequencies
Window = limits the edge effects
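The chain described on this slide (pre-emphasis, windowing, then a short-time spectrum) can be sketched as follows. This is a minimal illustration rather than the method used for the slide's figures: the first-order pre-emphasis coefficient `mu`, the Hamming window, the frame length and the hop size are all assumed values, and a direct DFT is used for simplicity.

```python
import cmath
import math

def pre_emphasis(x, mu=0.95):
    """y(n) = x(n) - mu*x(n-1); mu = 0.95 is an assumed, typical value."""
    return [x[0]] + [x[n] - mu * x[n - 1] for n in range(1, len(x))]

def hamming(n_points):
    """Hamming window, chosen here as one common way to limit edge effects."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def spectrogram(x, frame_len=64, hop=32, mu=0.95):
    """Return one power spectrum (bins 0..frame_len/2) per analysis frame."""
    y = pre_emphasis(x, mu)
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(y) - frame_len + 1, hop):
        seg = [y[start + n] * w[n] for n in range(frame_len)]
        power = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            power.append(abs(acc) ** 2)
        frames.append(power)
    return frames
```

Each inner list is one column of the spectrogram; plotting the columns side by side against time gives the time-frequency picture.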
13
Example: Time Signal and Spectrogram
Spectrogram (horizontal axis: time, vertical axis: frequency) of the French sentence « le facteur va porter le courrier ».
14
Equivalent Electrical Model
15
Simplified Speech Production Model
y(t)=h(t)*e(t) Y(z)= H(z)E(z)
16
All Pole Model of the Spectrum Shaping Filter
The filter H(z) represents the spectral envelope since the excitation has a white spectrum.
17
Short Term Linear Prediction
The coefficients of H(z)=1/A(z) can be obtained by linear prediction. Short term analysis is carried out on the speech signal x(n), in frames of 10 to 30 ms. Least square error criterion: x(n) is approximated by a linear combination of the past samples x(n-i).
18
Determination of the Spectral Envelope by Linear Prediction
The prediction error e(n), or residual, is nearly white, so the spectral envelope of x(n) can be approximated by Sx(f). White noise refers to a random signal with a flat spectrum, for example the one obtained when a radio tuner is off station.
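Since x(n) is modelled as the output of H(z)=1/A(z) driven by a white residual, the envelope can be written in the usual form below. The symbols σ_e² (variance of the residual) and T_s (sampling period) are notation introduced here for illustration, not taken from the slide.

```latex
S_x(f) \approx \frac{\sigma_e^2}{\left| A\!\left(e^{\,j 2\pi f T_s}\right) \right|^2}
```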
19
Calculation of the Prediction Coefficients
The prediction coefficients ai are the solution of the «normal equations». The Levinson-Durbin algorithm is often used to solve these equations.
References for the Levinson-Durbin algorithm:
N. Levinson, « The Wiener RMS (Root Mean Square) error criterion in filter design and prediction », J. Math. Phys., 25:261-278, 1945.
J. Durbin, « The fitting of time series models », Rev. Int. Inst. Statist., 28, 1960.
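A minimal sketch of the Levinson-Durbin recursion, assuming the convention that x(n) is predicted by the sum of a[i]·x(n-i), so that the normal equations read sum_i a[i]·r(|i-k|) = r(k) for k = 1..p. The function names and the plain (unwindowed) autocorrelation estimate are illustrative choices, not taken from the slides.

```python
def autocorrelation(x, order):
    """r(k) = sum_n x(n) x(n-k), for k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations recursively.

    Returns (a, k, err): predictor coefficients a[1..order] as a list,
    the reflection coefficients, and the final prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    reflection = []
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        reflection.append(k)
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], reflection, err
```

The recursion needs on the order of p² operations instead of the p³ of a general linear solver, and it yields the reflection coefficients as a by-product.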
20
Example of Linear Prediction
Amplitude of the speech signal Amplitude of residual signal
21
Example of Linear Prediction: Spectral Envelope Estimation
Formants: the frequencies of the maxima of the power spectral density are called formants. They correspond to resonances of the vocal tract.
22
Estimation of the Pitch Period
The pitch period T0 is estimated by correlation of the speech signal or of the residual. Other methods exist (e.g. cepstrum). F0 = fundamental frequency = 1/T0. The estimation is called fractional pitch estimation if its precision is better than the sampling period.
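The correlation approach can be sketched as a search for the lag that maximizes the normalized correlation. The search range (60 to 400 Hz), the normalization and the function name are assumptions made for this illustration; practical coders add voicing decisions and checks against pitch multiples.

```python
import math

def estimate_pitch(x, fs=8000, f0_min=60.0, f0_max=400.0):
    """Return (T0 in samples, F0 in Hz) from the normalized correlation peak."""
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        idx = range(lag, len(x))
        num = sum(x[n] * x[n - lag] for n in idx)
        den = math.sqrt(sum(x[n] ** 2 for n in idx) *
                        sum(x[n - lag] ** 2 for n in idx))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, fs / best_lag
```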
23
Long Term Prediction (LTP)
The idea is to predict one period of the signal from the preceding one. There are 2 unknowns: the gain b and the delay M. M is the pitch period (when voiced). The least square error criterion is used.
24
Long Term Prediction (LTP)
For a given value of M, optimal b is: The best M value maximizes: All possible values of M must be tested.
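A sketch of the exhaustive LTP search under the least square criterion, assuming a one-tap predictor e(n) ≈ b·e(n-M). Minimizing the squared error for a fixed M gives b = Σe(n)e(n-M) / Σe(n-M)², and the best M is then the one maximizing (Σe(n)e(n-M))² / Σe(n-M)². The function name and argument layout are hypothetical.

```python
def ltp_search(residual, start, length, m_min, m_max):
    """Test every lag M in [m_min, m_max] on residual[start:start+length].

    Requires start - m_max >= 0 so that past samples exist.
    Returns (best_M, best_b).
    """
    best_m, best_b, best_score = m_min, 0.0, -1.0
    for m in range(m_min, m_max + 1):
        idx = range(start, start + length)
        corr = sum(residual[n] * residual[n - m] for n in idx)
        energy = sum(residual[n - m] ** 2 for n in idx)
        if energy <= 0.0:
            continue
        score = corr * corr / energy
        if score > best_score:
            best_m, best_b, best_score = m, corr / energy, score
    return best_m, best_b
```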
25
Example of Long Term Prediction
There are 2 curves on the figure. The blue one represents the prediction error when using a short term predictor only. We can observe large, nearly periodic values of the error, because short term prediction cannot predict the next pitch period. The green curve represents the prediction error when using a short term predictor plus a long term predictor. The error is smaller: the pitch period can be well predicted with the long term predictor.
26
LPC 10 Vocoder: one of the oldest speech coders is the LPC10 vocoder.
The analysis (coder) calculates each frame: Pitch period, prediction coefficients, energy, voicing. The synthesis (decoder) uses these parameters to synthesize speech from the electrical equivalent model.
27
LPC 10 Vocoder (Order 10) Frame = 22.5 ms
28
Prediction Spectral Parameters
The ai coefficients are sensitive to coding and interpolation. They are replaced by other coefficients:
Reflection coefficients ki, log area ratios LARi.
Line spectrum frequencies LSFi.
In the LPC10 vocoder:
The pitch and voicing are coded on 7 bits
The log of energy on 5 bits
The 10 prediction coefficients ai (transformed in ki and LARi) are coded on 41 bits.
A total of 53 bits, plus 1 synchronization bit, gives 54 bits per frame of 22.5 ms = 2400 bps
29
Vector Quantization (2-dimensional example )
Bit rate can be decreased by applying VQ to the coefficients.
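A minimal sketch of vector quantization in 2 dimensions: the encoder sends only the index of the nearest codebook entry, so a 4-entry codebook costs 2 bits for a pair of coefficients. The codebook values and function names are made up for illustration; real codebooks are trained on speech data.

```python
def vq_encode(vector, codebook):
    """Return the index of the codebook entry closest to vector (squared error)."""
    best_index, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, code))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

def vq_decode(index, codebook):
    """The decoder holds the same codebook and simply looks the entry up."""
    return codebook[index]

# Hypothetical 2-bit codebook for pairs of coefficients.
EXAMPLE_CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```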
30
Line Spectrum Frequencies LSF, LSP
The Line Spectrum Frequencies fi and Line Spectrum Pairs cos(fi) have good properties for quantization and interpolation. The LSF and LSP are derived from the inverse filter A(z). Build the symmetrical and antisymmetrical polynomials F1(z) and F2(z) by (for order 10):
31
LSF and LSP: the roots of F1 and F2 lie on the unit circle and are interleaved. Each polynomial has 5 pairs of conjugate roots exp(±jωi), with fi = ωi/(2π). The roots of F1 and F2 are searched by evaluating F1(z) and F2(z) on typically 60 points on the unit circle and monitoring the sign variation. In an interval with a sign variation the search is refined.
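The sign-change search can be sketched as follows, assuming the common construction F1(z) = A(z) + z^-(p+1)·A(1/z) (symmetrical) and F2(z) = A(z) - z^-(p+1)·A(1/z) (antisymmetrical), with the trivial roots at z = -1 and z = +1 left in place and simply avoided by the grid. The grid of 60 points follows the slide; the bisection refinement and the function name are illustrative choices.

```python
import cmath
import math

def lsf_from_lpc(c, grid=60, refine=30):
    """c = [1, c1, ..., cp], coefficients of A(z) in powers of z^-1.

    Returns the line spectrum frequencies in radians, sorted.
    Roots closer than pi/(2*grid) to 0 or pi would be missed by this sketch.
    """
    p = len(c) - 1
    ext = list(c) + [0.0]
    f1 = [ext[i] + ext[p + 1 - i] for i in range(p + 2)]
    f2 = [ext[i] - ext[p + 1 - i] for i in range(p + 2)]

    def value(poly, w, symmetric):
        # Removing the linear phase leaves a purely real (symmetric case)
        # or purely imaginary (antisymmetric case) function of w.
        z = sum(poly[i] * cmath.exp(-1j * w * i) for i in range(p + 2))
        z *= cmath.exp(1j * w * (p + 1) / 2)
        return z.real if symmetric else z.imag

    roots = []
    ws = [math.pi * (k + 0.5) / grid for k in range(grid)]
    for poly, sym in ((f1, True), (f2, False)):
        prev_w, prev_v = ws[0], value(poly, ws[0], sym)
        for w in ws[1:]:
            v = value(poly, w, sym)
            if prev_v * v < 0:
                lo, hi, v_lo = prev_w, w, prev_v
                for _ in range(refine):       # bisection refinement
                    mid = 0.5 * (lo + hi)
                    v_mid = value(poly, mid, sym)
                    if v_lo * v_mid <= 0:
                        hi = mid
                    else:
                        lo, v_lo = mid, v_mid
                roots.append(0.5 * (lo + hi))
            prev_w, prev_v = w, v
    return sorted(roots)
```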
32
Coders using Short Term and Long Term Prediction: RELP, MPE-LP, CELP
RELP = Residual Excited Linear Prediction coder: the residual is coded in a scalar way and sent with the spectral parameters given by LP.
MPE-LP = Multi Pulse Excited Linear Prediction coder: the residual is represented by a few pulses with well chosen positions and amplitudes.
CELP = Code Excited Linear Prediction coder: the residual is coded by vector quantization.
33
RPE-LTP GSM Full Rate Coders
The GSM Full Rate coder is called RPE-LTP = Regular Pulse Excited, Long Term Prediction coder. The signal u is the best down-sampled version ( 4) of the residual signal r. Here down-sampled means sampling rate reduction by decimation.
In CELP coders, vector quantization is applied to the residual signal. CELP = Code Excited Linear Prediction coder. Each frame of residual signal is compared to sequences of signal stored in a codebook. The codebook sequences are white and the codebook is called the stochastic codebook.
34
CELP Coder Basic Scheme
Analysis by synthesis (closed loop) is used to find the best excitation sequence. The sequences in the codebook are normalized in energy.
35
Structure of CELP Coder: Perceptual Filter
Perceptual filter: the reconstruction error is spectrally weighted, exploiting the noise masking properties of formants.
W(z) = A(z/γ1)/A(z/γ2), 0 ≤ γ1, γ2 ≤ 1
A*(z) = A(z/γ) (poles moved towards zero)
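Replacing z by z/γ multiplies the i-th coefficient of A(z) by γ^i, so the weighting filter can be built directly from the LP coefficients. The sketch below assumes A(z) is given as [1, c1, ..., cp] in powers of z^-1; the values γ1 = 0.9 and γ2 = 0.6 are illustrative only, and the function names are hypothetical.

```python
def bandwidth_expand(c, gamma):
    """Coefficients of A(z/gamma): c[i] * gamma**i."""
    return [ci * gamma ** i for i, ci in enumerate(c)]

def iir_filter(b, a, x):
    """Direct-form filter: a[0]*y(n) = sum_k b[k]*x(n-k) - sum_{k>=1} a[k]*y(n-k)."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y

def perceptual_weighting(c, x, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to the signal x (zero initial state)."""
    return iir_filter(bandwidth_expand(c, gamma1),
                      bandwidth_expand(c, gamma2), x)
```

With γ1 = γ2 the filter reduces to the identity, which is a convenient sanity check.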
36
CELP Coder with Perceptual Filter
The coder must choose the best sequence in the waveform codebook. The best sequence minimizes the perceptual distance between the original speech frame and the synthetic one. For each sequence in the codebook the coder builds a synthetic speech frame, by filtering the white codebook sequence in order to give it the same spectrum and the same pitch as the original speech.
37
Basic CELP Structure: Perceptual Filter Inserted in the 2 Branches
The perceptual filter is inserted in the 2 branches of the difference. H(z)=W(z)/A(z)
38
CELP Structure: Memory of H(z)
Memory of H(z) = output for a zero input
hi = impulse response of H(z)
39
CELP Coder: Memory of H(z)
40
CELP: Adaptive Codebook
LTP can be realized by an adaptive codebook. P1 corresponds to the filtering of past residuals by H. P2 corresponds to the filtering of the vectors of the stochastic codebook by H. The past residuals can be stored in a codebook that is called the « adaptive codebook » because its content changes with time.
41
CELP with Stochastic Codebook
The adaptive codebook stores the past residual frames. It is called adaptive because its content changes with time.
42
CELP Decoder
43
CELP Equations Example: Searching through Codebooks
The main load is the filtering of all the codebook vectors.
44
Filtering Matrix H
h(n) is the impulse response corresponding to H(z).
N = length of the codebook vectors.
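The filtering of a codebook vector with zero initial state can be written as a matrix product with a lower triangular Toeplitz matrix built from the impulse response. The sketch below assumes that structure (row i, column j holds h(i-j)); the function names are illustrative.

```python
def filtering_matrix(h, n):
    """N x N lower triangular Toeplitz matrix with H[i][j] = h(i-j) for i >= j."""
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)]
            for i in range(n)]

def filter_vector(H, c):
    """Filtered codebook vector f = H c (zero-state response, truncated to N samples)."""
    return [sum(H[i][j] * c[j] for j in range(len(c))) for i in range(len(H))]
```

For N = 40 this costs N(N+1)/2 = 820 multiply-accumulates per vector, which is why the filtering dominates the load of the codebook search.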
45
Finding the Best Excitation in the Coder: Equation of the Solution
J: least square criterion. For a set of 2 vectors cj,i(j), F is the 2-column matrix of the filtered vectors fj,i(j).
46
CELP Optimal Solution
The optimal algorithm finds the best combination of code vectors maximizing the norm and finds the optimal gains gj. But the number of combinations of codebook vectors is very high, and so is the complexity. Example: M=1024 for the stochastic codebook and M=256 for the adaptive codebook leads to 1024 × 256 = 262144 combinations to test and 1024 + 256 = 1280 vectors to filter.
47
Iterative Suboptimal Algorithm for 2 Codebooks
The optimal solution is too complex. There are many suboptimal algorithms designed to decrease the complexity. The iterative approach is one of the most common.
First step: target vector = p. Find the best vector in the adaptive codebook and its gain. Calculate the new target vector p1.
Second step: target vector = p1. Find the best vector in the stochastic codebook and its gain.
There can be more than 2 codebooks.
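The two steps can be sketched as below, assuming the codebook vectors have already been filtered by H. For a target t and a filtered vector f, the least square gain is g = (t·f)/(f·f) and the best vector maximizes (t·f)²/(f·f); the new target is taken as p1 = p - g·f. The function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def best_vector(target, filtered_codebook):
    """Return (index, gain) of the filtered vector that best matches the target."""
    best_i, best_gain, best_score = 0, 0.0, -1.0
    for i, f in enumerate(filtered_codebook):
        energy = dot(f, f)
        if energy <= 0.0:
            continue
        corr = dot(target, f)
        score = corr * corr / energy
        if score > best_score:
            best_i, best_gain, best_score = i, corr / energy, score
    return best_i, best_gain

def iterative_search(p, adaptive_filtered, stochastic_filtered):
    """Two-step suboptimal search: adaptive codebook first, then stochastic."""
    i1, g1 = best_vector(p, adaptive_filtered)
    p1 = [pn - g1 * fn for pn, fn in zip(p, adaptive_filtered[i1])]
    i2, g2 = best_vector(p1, stochastic_filtered)
    return (i1, g1), (i2, g2)
```

The number of candidates tested is the sum of the codebook sizes rather than their product, at the cost of gains that are no longer jointly optimal.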
48
Iterative Algorithm
49
Operations of the Iterative Algorithm
At step j, the optimal codebook vector has index i:
50
Iterative Algorithm Numerical Example : FS=8000Hz,
M=256: size of the stochastic codebook
Ma=128: size of the adaptive codebook
Frame size NT=160, 20 ms
Frames split in 4 subframes of N=40 samples
p=10: linear prediction order
10 Mips to filter the stochastic codebook.
51
Iterative Algorithm
The main processing load is the filtering of the codebook vectors. Many algorithms have been proposed to decrease the computation load:
Special structures of the codebook: vector sum (VSELP), algebraic codebook (ACELP), linear codebook (the adaptive codebook is linear).
Structure of H avoiding the filtering: diagonalization of HᵀH.
In the algebraic codebook approach, the structure is based on an interleaved single-pulse permutation design. The codebook vectors contain only a few non-zero pulses, equal to +1 or –1. The N (typically N=40) positions in a codebook vector are divided into a small number of tracks, for example 5 in the GSM enhanced full rate coder (EFR), with 10 non-zero pulses out of 40:
Track 1: 2 pulses i0, i5, positions: 0, 5, 10, 15, 20, 25, 30, 35
Track 2: 2 pulses i1, i6, positions: 1, 6, 11, 16, 21, 26, 31, 36
Track 3: 2 pulses i2, i7, positions: 2, 7, 12, 17, 22, 27, 32, 37
Track 4: 2 pulses i3, i8, positions: 3, 8, 13, 18, 23, 28, 33, 38
Track 5: 2 pulses i4, i9, positions: 4, 9, 14, 19, 24, 29, 34, 39
In the VSELP coder the codebook is generated from a base of vectors by linear combination with coefficients +1 or –1.
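The track structure listed above can be illustrated by building one algebraic codebook vector from its pulse positions and signs. This is only a sketch of the vector layout (position modulo 5 gives the track), not of the EFR index coding or search; the function name and the validity check are assumptions.

```python
def algebraic_codevector(positions, signs, n=40, tracks=5, pulses_per_track=2):
    """Build a length-n vector with +1/-1 pulses at the given positions.

    Track t (0-based) contains the positions t, t+tracks, t+2*tracks, ...
    Raises ValueError if a track does not receive exactly pulses_per_track pulses.
    """
    vector = [0] * n
    counts = [0] * tracks
    for pos, sign in zip(positions, signs):
        if sign not in (1, -1) or not 0 <= pos < n:
            raise ValueError("invalid pulse")
        counts[pos % tracks] += 1
        vector[pos] += sign
    if any(count != pulses_per_track for count in counts):
        raise ValueError("each track must hold exactly %d pulses" % pulses_per_track)
    return vector
```

Because the vector holds only 10 non-zero samples, its filtered version is a signed sum of 10 shifted impulse responses, which is what makes the search cheap.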
52
CELP Coding Standards from 4.8 kbps to 16 kbps
Federal standard (DOD) (4.8 kbps)
frame = 240 samples (30 ms)
LPC 8 --> (LSP coding 34 bits)
adaptive codebook (256 vectors, fractional pitch)
stochastic codebook (512 vectors, values -1, 0, 1)
53
VSELP (Vector Sum Excited Linear Prediction)
Codebook vectors v are combinations of basis vectors (b1, b2, ..., bk): v = ± b1 ± b2 ± ... ± bk
Only the basis vectors are filtered.
Motorola (8 kbps)
GSM (half rate) (5.6 kbps)
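Because filtering is linear, the filtered version of any codebook vector is the same ± combination of the filtered basis vectors, so only k filterings are needed for 2^k codebook vectors. The sketch below illustrates this; the mapping from index bits to signs (bit set means +1) is an assumption made for the example, not the mapping of any standard.

```python
def vselp_codevector(basis, index):
    """Codebook vector number `index`: bit j of index selects the sign of basis[j]."""
    k, n = len(basis), len(basis[0])
    signs = [1 if (index >> j) & 1 else -1 for j in range(k)]
    return [sum(signs[j] * basis[j][m] for j in range(k)) for m in range(n)]

def fir_filter(h, x):
    """Zero-state filtering of x by the impulse response h, truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(n, len(h) - 1) + 1))
            for n in range(len(x))]

def filtered_codevector(filtered_basis, index):
    """Filtered codebook vector obtained without any further filtering."""
    return vselp_codevector(filtered_basis, index)
```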
54
Fractional Pitch
The precision of the pitch period is a fraction of the sampling period Ts. An interpolation filter is used.
B(z) = 1 − b·z^(−Mf) with Mf = M + ε, where ε is the fractional part of the delay.
x(n−M−ε) can be written as:
FT⁻¹(X(f)·e^(−j2πf(M+ε)Ts)) = x(n−M) * FT⁻¹(e^(−j2πfεTs)) = x(n−M) * h(n)
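For a band-limited signal the ideal interpolation filter is a shifted sinc, h(k) = sinc(k - eps), where eps denotes the fractional part of the delay (a symbol chosen here). The sketch below truncates it with a Hann window; the window, the half length of 8 taps and the function names are assumptions, and standards use their own specific interpolation filters.

```python
import math

def fractional_delay_filter(eps, half_len=8):
    """Taps h(k), k = -half_len..half_len, of a Hann-windowed sinc delayed by eps."""
    taps = {}
    for k in range(-half_len, half_len + 1):
        t = k - eps
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 * (1.0 + math.cos(math.pi * t / (half_len + 1)))
        taps[k] = sinc * window
    return taps

def delayed_sample(x, n, m, eps, half_len=8):
    """Approximate x(n - m - eps) as sum_k h(k) * x(n - m - k)."""
    taps = fractional_delay_filter(eps, half_len)
    return sum(h * x[n - m - k] for k, h in taps.items()
               if 0 <= n - m - k < len(x))
```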
55
Standards
56
Standards
Wired telephony (ITU-T):
G 711 (1972): PCM 64 kbps
G 721 (1984): ADPCM 32 kbps
G 728 (1991): LD-CELP 16 kbps
G 729: CS-ACELP 8 kbps
Mobile communications (ETSI - CTIA):
GSM (FR): RPE-LTP kbps
GSM (HR): VSELP 5.6 kbps
GSM (EFR): ACELP kbps
UMTS (AMR): ACELP to 4.75 kbps
Military applications (NATO):
FS 1015 (1976): LPC kbps
FS 1016 (1991): CELP 4.8 kbps
57
AMR Adaptive MultiRate Coder for 3G Application
8 narrow band (NB AMR) source coders kbps
9 wide band (WB AMR) coders
Based on ACELP
Frame of 20 ms, fs=8000 Hz
58
ITU Civil Standards
59
ETSI and Inmarsat Standards
60
TIA and RCR Standards
61
Implementation of CELP Coders on C54x
Example of the G729 Annex A. Specific instruction for codebook search Some functions of DSPLIB
62
Profiling Example for G729 Annex A using C Compiler
G729 is a CS-ACELP coder (ITU 1995) at 8 kbps with the quality of ADPCM at 32 kbps (G726).
G729 Annex A: voice over internet, DSVD (Digital Simultaneous Voice & Data).
CS-ACELP = Conjugate Structure Algebraic CELP coder.
ADPCM = Adaptive Differential Pulse Code Modulation.
63
G729 Annex A Main Blocks of the Coder Algorithm
Frame = 10 ms = 80 samples.
Short term LPC analysis on 40ms frame.
LSP derived from the ai coefficients and quantized using split VQ.
Long term (LTP) analysis on 2 subframes of 40 samples: LTP lag and gain. Fractional lag (resolution 1/3) coded on 8 bits for the 1st subframe and 5 bits for the 2nd.
Fixed codebook search on 2 subframes of 40 samples: index and gains. Code length = 40 with 4 non-zero pulses ±1.
64
Structures of Frames
65
G729 Annex A, Bit Allocation
66
G729 Annex A Main Blocks of the Decoder Algorithm
The serially received bits are converted into parameters: LSP vector, 2 fractional pitch lags and gains, 2 fixed codebook indices and gains.
The LSP are converted to LP filter coefficients ai and interpolated at each subframe.
At each subframe: the excitation is constructed and scaled, and the speech is synthesized by filtering the excitation through the LP synthesis filter.
Postprocessing by an adaptive postfilter.
67
Using the C Compiler
Use the C program of the standard and the C compiler with maximum optimization.
Autocorrelation = cycles
Levinson = cycles
Conversion ai → LSF = cycles
LSF quantization = cycles
Synthesis filtering = cycles
Pitch open loop = cycles
Fractional pitch = 2 x cycles
Search algebraic code = 2 x cycles
Gains quantization = 2 x cycles
68
Assembly Language Instructions for Codebook Search
Better results can be obtained with assembly language than C. Specific instructions for codebook search: Conditional stores.
69
Assembly Language Codebook Search
70
Assembly Language Codebook Search
A = C(i)²
B = C(i)²·Gopt, T = Gopt
B = C(i)²·Gopt − G(i)·Copt²
If (B ≥ 0) then: BRC → Iopt, T → Gopt, A → Copt²
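The test on B is a way of comparing C(i)²/G(i) with Copt²/Gopt without any division, since both energies are positive. A minimal Python sketch of that search loop follows; it mirrors the arithmetic of the slide, not the C54x instruction sequence, and it uses a strict comparison so that the first of several equal candidates is kept, which is a choice made for this example.

```python
def codebook_search(corr, energy):
    """Return the index i maximizing corr[i]**2 / energy[i] (energy[i] > 0).

    Uses the cross-multiplied test C(i)^2 * Gopt - G(i) * Copt^2 > 0,
    so no division is needed inside the loop.
    """
    i_opt = 0
    c2_opt = corr[0] * corr[0]
    g_opt = energy[0]
    for i in range(1, len(corr)):
        a = corr[i] * corr[i]                 # A = C(i)^2
        b = a * g_opt - energy[i] * c2_opt    # B = C(i)^2 Gopt - G(i) Copt^2
        if b > 0:
            i_opt, g_opt, c2_opt = i, energy[i], a
    return i_opt
```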