Vocoders
The Channel Vocoder (analyzer):
The channel vocoder employs a bank of bandpass filters, each with a bandwidth between 100 Hz and 300 Hz. Typically, linear-phase FIR filters are used. The output of each filter is rectified and lowpass filtered; the bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract. In addition to the measurement of the spectral magnitudes, a voicing detector and a pitch estimator are included in the speech analysis.
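As a toy sketch of one analyzer channel (an illustration, not the deck's implementation): the rectify-and-smooth step below extracts the envelope of a band signal with a moving-average lowpass filter. The bandpass stage is omitted, so a test tone stands in for one filter's output.

```python
import math

def channel_envelope(signal, win):
    """Full-wave rectify, then smooth with a moving-average lowpass filter."""
    rect = [abs(x) for x in signal]                # rectifier
    out = []
    for n in range(len(rect) - win + 1):
        out.append(sum(rect[n:n + win]) / win)     # crude lowpass (moving average)
    return out

# Test tone inside one band: 100-Hz sine sampled at 8 kHz (period = 80 samples).
fs, f = 8000, 100
tone = [math.sin(2 * math.pi * f * n / fs) for n in range(400)]

env = channel_envelope(tone, win=80)   # 10-ms window spans exactly one period
print(round(env[0], 3))                # close to 2/pi ~ 0.6366, the mean of |sin|
```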
The Channel Vocoder (analyzer block diagram):
Block diagram: the speech s(n) passes through an A/D converter into the bank of bandpass filters; each band output is rectified and lowpass filtered, and the spectral magnitudes, together with the voicing detector and pitch estimator outputs, are encoded and sent to the channel.
The Channel Vocoder (synthesizer):
16-20 linear-phase FIR filters covering 0-4 kHz. 20-ms frames are used, i.e., the spectral magnitudes change at a 50-Hz rate; the lowpass-filter bandwidth is matched to this rate, and the outputs of the filters are sampled at 50 Hz.
The Channel Vocoder (synthesizer):
Bit rate: 1 bit for the voicing decision and 6 bits for the pitch period; with 16 channels, each coded with 3-4 bits and updated 50 times per second, the total bit rate is then on the order of 2750-3550 bps. Further reduction to 1200 bps can be achieved by exploiting frequency correlations of the spectral magnitudes.
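The bit-rate bookkeeping above can be checked in a few lines (taking the 3-bit-per-channel case):

```python
# Back-of-the-envelope channel-vocoder bit rate.
channels      = 16
bits_per_chan = 3          # 3-4 bits per spectral magnitude; use the low end
voicing_bits  = 1
pitch_bits    = 6
frames_per_s  = 50         # parameters updated 50 times per second

bits_per_frame = channels * bits_per_chan + voicing_bits + pitch_bits
bit_rate = bits_per_frame * frames_per_s
print(bits_per_frame, bit_rate)   # 55 2750
```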
The Channel Vocoder (synthesizer):
At the receiver, the signal samples are passed through D/A converters. The outputs of the D/As are multiplied by the voiced or unvoiced signal sources, and the resulting signals are passed through bandpass filters. The outputs of the bandpass filters are summed to form the synthesized speech signal.
The Channel Vocoder (synthesizer block diagram):
Block diagram: the decoded channel magnitudes pass through D/A converters; a switch driven by the voicing information selects between a pulse generator (set to the pitch period) and a random-noise generator; each D/A output, multiplied by the selected source, drives a bandpass filter, and the filter outputs are summed to form the output speech.
The Phase Vocoder: The phase vocoder is similar to the channel vocoder; however, instead of estimating the pitch, it estimates the phase derivative at the output of each filter. By coding and transmitting the phase derivative rather than the phase itself, this vocoder retains (approximate) phase information without requiring explicit pitch estimation.
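A minimal sketch of the phase-derivative measurement, using a complex (analytic) test tone so that no lowpass filter is needed after demodulation; for real speech, a lowpass filter would first have to remove the sum-frequency term:

```python
import cmath

def phase_derivative(x, omega_k):
    """Demodulate channel k and return the per-sample phase increments of the
    baseband signal; for a narrowband input this approximates the frequency
    offset from the channel centre omega_k."""
    base = [x[n] * cmath.exp(-1j * omega_k * n) for n in range(len(x))]
    # For a real input, a lowpass filter would go here to remove the
    # sum-frequency term; the analytic test tone below needs none.
    return [cmath.phase(base[n + 1] / base[n]) for n in range(len(base) - 1)]

omega_k, delta = 0.5, 0.01                 # channel centre and true offset (rad/sample)
tone = [cmath.exp(1j * (omega_k + delta) * n) for n in range(200)]
dphi = phase_derivative(tone, omega_k)
print(round(dphi[0], 4))                   # 0.01, the recovered offset
```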
The Phase Vocoder (analyzer block diagram, kth channel):
Block diagram: s(n) enters the kth channel, where the short-term magnitude and short-term phase derivative are computed; each path is lowpass filtered and decimated (the phase path includes a differentiator), and both outputs are encoded and sent to the channel.
The Phase Vocoder (synthesizer block diagram, kth channel):
Block diagram: the decoded, decimated short-term amplitude and phase derivative are interpolated; the phase derivative is integrated, the resulting phase drives cos/sin modulators scaled by the amplitude, and the channel outputs are summed.
The Phase Vocoder: LPF bandwidth: 50 Hz
Demodulation separation: 100 Hz. Number of filters: 25-30. The spectral magnitude and phase derivative are sampled at a rate matched to the 50-Hz LPF bandwidth; the spectral magnitude is coded using PCM or DPCM, and the phase derivative is coded linearly using 2-3 bits. The resulting bit rate is 7200 bps.
The Formant Vocoder : The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech. It is this information plus the pitch period that is encoded and transmitted to the receiver.
The Formant Vocoder: Example of formants:
(a) The spectrogram of the utterance “day one”, showing the pitch and the harmonic structure of speech. (b) A zoomed spectrogram of the fundamental and the second harmonic.
The Formant Vocoder (analyzer block diagram):
Block diagram: the input speech is analyzed into the formant frequencies and bandwidths, the pitch F0, and the V/U (voiced/unvoiced) decision, which are then encoded. Fk: the frequency of the kth formant; Bk: the bandwidth of the kth formant.
The Formant Vocoder (synthesizer block diagram):
Block diagram: the excitation signal, built from F0 and the V/U decision, drives a bank of formant resonators (F1, B1; F2, B2; and so on) whose outputs are summed to form the synthesized speech.
Linear Predictive Coding:
The objective of LP analysis is to estimate the parameters of an all-pole model for the vocal tract. Several methods have been devised for generating the excitation sequence for speech synthesis. The various LPC-type speech analysis and synthesis methods differ primarily in the type of excitation signal generated for speech synthesis.
LPC-10: This method is called LPC-10 because 10 predictor coefficients are typically employed. LPC-10 partitions the speech into 180-sample frames. The pitch and voicing decision are determined using the AMDF and zero-crossing measures.
A General Discrete-Time Model For Speech Production
Block diagram: for voiced speech, a pitch-driven DT impulse generator followed by a glottal filter G(z) produces the voiced volume velocity u(n); for unvoiced speech, an uncorrelated noise generator is used. A V/U switch selects the source, a gain scales it, and the excitation passes through the vocal-tract filter H(z) and the radiation filter R(z) to produce the speech signal s(n).
Linear Prediction: Determining the Prediction Order (voiced / unvoiced)
1) The pitch is decoded first, since it contains the mode information, and decoding differs depending on whether the mode is voiced or unvoiced. If the pitch code is all zeros or has only one bit set, the mode is unvoiced and the decoder performs error correction and uses defaults for some of the parameters. If two bits are set in the pitch code, a frame erasure is indicated. Any other pitch code means the mode is voiced, and the parameters are decoded. If a frame erasure has occurred, a frame-repeat mechanism is invoked.
2) After decoding the parameters, the decoder handles noise attenuation: the noise estimator is updated and any gain attenuation is applied.
3) All of the synthesis parameters are then interpolated pitch-synchronously. This includes the LSFs, the log speech gain, pitch, jitter, Fourier magnitudes, pulse and noise coefficients for the mixed excitation, and the spectral-tilt coefficient for the adaptive spectral enhancement filter. Normally, all of these parameters are linearly interpolated between the past and current frame values.
4) The excitation is generated as the sum of the filtered pulse and noise excitations. The pulse excitation is calculated using an inverse DFT one pitch period in length. The noise excitation is generated by a uniform random-number generator and then normalized. These excitations are then filtered and added together.
Linear Prediction: Determining the Prediction Order
Linear Prediction: Determining the Prediction Order (voiced / unvoiced)
Linear Prediction Example: M = 4 vs. M = 10
Linear Prediction Example: M = 2, M = 10, M = 54
The Idea of Long-Term Linear Prediction
5) An adaptive spectral enhancement filter is then applied to the mixed excitation. This filter is a 10th-order pole-zero filter with an additional 1st-order spectral-tilt compensation. Its coefficients are calculated by bandwidth expansion of the interpolated LPC filter coefficients and adapt based on the signal-to-noise ratio.
6) The next step is the LPC synthesis, which uses a direct-form LPC filter with coefficients corresponding to the interpolated LSFs. The gain is then applied to the synthesized speech; the gain scaling factor is computed for each pitch period and linearly interpolated to prevent discontinuities in the synthesized speech.
7) After applying the gain, the pulse dispersion filter is applied. This filter is a 65th-order FIR filter derived from a spectrally flattened triangle pulse. Finally, some buffering is performed, since the synthesizer produces a full pitch period of synthesized speech.
Linear Prediction: Long-Term Linear Prediction
The LPC-10 Vocoder: General Characteristics
It is known as LPC-10 because 10 linear-prediction coefficients are transmitted. The transmission rate is 2400 bits per second. Each frame contains 180 samples, and 54 bits are transmitted per frame. The analog input signal is sampled at a rate of 8000 Hz and quantized with 16 bits.
The LPC-10 Vocoder (encoder block diagram):
Block diagram: the input PCM signal is framed and passed through a pre-emphasis filter; a voicing detector, the prediction-coefficient computation, the prediction-error filter, and a pitch-frequency estimator operate on each frame. The LPC coefficients are coded (and locally decoded), the gain is computed and coded, and the bit encoder multiplexes the LPC-coefficient index, the gain index, and the pitch period into the transmitted bit stream.
Pitch-Frequency Detection
Methods: the autocorrelation method, the average magnitude difference function (AMDF) method, and the YMC method.
The LPC-10 Vocoder (encoder): Voicing detector
1- computation of the low-band energy; 2- computation of the zero-crossing rate; 3- computation of the prediction gain.
Pitch-frequency estimation: the AMDF is computed, and one of the values T = 20, 21, …, 39, 40, 42, …, 80, 84, …, 154 is transmitted.
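The AMDF pitch search can be sketched as follows (a toy signal and a plain lag range, not LPC-10's bit-exact procedure or its nonuniform lag table):

```python
import random

def amdf_pitch(s, lags):
    """Return the lag minimizing the average magnitude difference function."""
    best = min(
        (sum(abs(s[n + t] - s[n]) for n in range(len(s) - t)) / (len(s) - t), t)
        for t in lags
    )
    return best[1]

random.seed(1)
period = 50
base = [random.randrange(-8, 9) for _ in range(period)]  # one arbitrary pitch cycle
signal = base * 8                                        # a 400-sample "voiced" segment
print(amdf_pitch(signal, range(20, 155)))                # 50
```

The AMDF is exactly zero at the true period, so the minimum lands on lag 50.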
The LPC-10 Vocoder (encoder): Quantization of the LPC coefficients
The normal equations are solved by the Levinson-Durbin recursion, and the reflection coefficients (RCs) are computed.
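A minimal Levinson-Durbin recursion, returning both the predictor and reflection coefficients; the test autocorrelation corresponds to an AR(1) process, so the recursion should recover its single coefficient:

```python
def levinson_durbin(r, order):
    """Solve the LP normal equations for predictor coefficients a[1..order]
    and reflection coefficients, given autocorrelations r[0..order]."""
    a = [0.0] * (order + 1)
    refl = []
    err = r[0]                                   # prediction-error energy
    for m in range(1, order + 1):
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        refl.append(k)
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]       # update lower-order coefficients
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], refl, err

# Autocorrelation of an AR(1) process s[n] = 0.5 s[n-1] + u[n]: r[k] proportional to 0.5**k
r = [1.0, 0.5, 0.25, 0.125]
a, refl, err = levinson_durbin(r, order=3)
print([round(x, 6) for x in a])   # [0.5, 0.0, 0.0], recovering the AR(1) coefficient
```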
The LPC-10 Vocoder: Speech synthesis
The encoder section determines whether each frame is voiced or unvoiced, the pitch period (for voiced frames only), and the signal gain G. In the decoder's source model, a V/U switch selects between an impulse train with period equal to the pitch period and random noise; the selected excitation, scaled by the gain, drives the synthesis filter to produce the synthesized speech, approximating the original signal.
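The voiced/unvoiced source model can be sketched as below (the 2nd-order filter coefficients are hypothetical, chosen only to give a stable resonance):

```python
import random

def synthesize(a, gain, voiced, pitch, n_samples):
    """Drive an all-pole filter 1/A(z) with a pitch-period impulse train
    (voiced) or white noise (unvoiced), as in the LPC-10 source model."""
    random.seed(0)
    out = []
    for n in range(n_samples):
        if voiced:
            u = gain if n % pitch == 0 else 0.0          # impulse train
        else:
            u = gain * random.uniform(-1.0, 1.0)         # noise excitation
        s = u + sum(a[i] * out[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        out.append(s)
    return out

# Hypothetical 2nd-order vocal-tract filter, one 90-sample frame
a = [1.3, -0.6]                       # s[n] = u[n] + 1.3 s[n-1] - 0.6 s[n-2]
frame = synthesize(a, gain=1.0, voiced=True, pitch=40, n_samples=90)
print(len(frame), frame[0])           # 90 1.0
```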
The LPC-10 Vocoder: Limitations
1- The hard division of frames into voiced and unvoiced, and the use of random noise and a periodic impulse train for excitation (an impulse train alone cannot produce all voiced sounds). 2- The phase of the original signal is not preserved. 3- Using an impulse train is a violation of the AR model.
Residual Excited LP Vocoder:
Speech quality can be improved, at the expense of a higher bit rate, by computing and transmitting a residual error, as is done in DPCM. In one method, the LPC model and excitation parameters are estimated from a frame of speech.
Residual Excited LP Vocoder:
The speech is synthesized at the transmitter and subtracted from the original speech signal to form the residual error. The residual error is quantized, coded, and transmitted to the receiver. At the receiver, the signal is synthesized by adding the residual error to the signal generated from the model.
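The residual idea can be illustrated with an inverse (prediction-error) filter; when the signal really comes from the assumed all-pole model, the residual equals the excitation:

```python
def inverse_filter(s, a):
    """Residual e[n] = s[n] - sum_i a[i] * s[n-1-i] (the LP prediction error)."""
    return [s[n] - sum(a[i] * s[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
            for n in range(len(s))]

# Build a signal from a known all-pole model and a known excitation...
a = [0.9]                              # s[n] = u[n] + 0.9 s[n-1]
u = [1.0, 0.0, 0.5, -0.25, 0.0]
s = []
for n in range(len(u)):
    s.append(u[n] + (a[0] * s[n - 1] if n >= 1 else 0.0))

# ...then the inverse filter recovers the excitation (up to float rounding):
res = inverse_filter(s, a)
print([round(x, 6) for x in res])      # [1.0, 0.0, 0.5, -0.25, 0.0]
```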
Residual Excited LP Vocoder:
To reduce the bit rate, the residual signal is lowpass filtered at 1000 Hz in the analyzer. In the synthesizer it is rectified and spectrally flattened (using a highpass filter); the lowpass and highpass signals are summed, and the resulting residual error signal is used to excite the LPC model. The RELP vocoder provides communication-quality speech at about 9600 bps.
RELP Analyzer (type 1):
Block diagram: s(n) is buffered and windowed, and a short-term LP analysis yields the LP parameters, which configure an LP synthesis model. The synthesized frame f(n; m) is subtracted from the input to form the residual error e(n; m), and the LP parameters, excitation parameters, and residual are encoded and sent to the channel.
RELP Analyzer (type 2):
Block diagram: s(n) is buffered and windowed, and a short-term LP analysis yields the LP parameters, which configure an inverse filter. The prediction residual f(n; m) at the inverse-filter output is lowpass filtered, decimated, transformed by a DFT, and encoded together with the LP parameters for transmission to the channel.
Synthesizer for a RELP vocoder
Block diagram: the decoded residual is interpolated, rectified, and highpass filtered; the highpass branch is summed with the residual to form the excitation of the LP synthesizer, whose LP model parameters are updated by the decoder's buffer and controller.
Multipulse LPC Vocoder
RELP needs to regenerate the high-frequency components at the decoder, which yields only a crude approximation of the high frequencies. Multipulse LPC is a time-domain analysis-by-synthesis method that results in a better excitation signal for the LPC vocal-tract filter.
Multipulse LPC Vocoder
The information concerning the excitation sequence includes: the locations of the pulses, an overall scale factor corresponding to the largest pulse amplitude, and the pulse amplitudes relative to the overall scale factor. The scale factor is logarithmically quantized into 6 bits, the amplitudes are linearly quantized into 4 bits, and the pulse locations are encoded using a differential coding scheme. The excitation parameters are updated every 5 msec, while the LPC vocal-tract parameters and the pitch period are updated every 20 msec. The bit rate is 9600 bps.
Analysis-by-synthesis coder
A stored sequence from a Gaussian excitation codebook is scaled and used to excite the cascade of a pitch synthesis filter and the LPC synthesis filter. The synthetic speech is compared with the original speech, and the residual error signal is perceptually weighted by a filter.
Obtaining the multipulse excitation (analysis-by-synthesis method):
Block diagram: the input speech is buffered and LP-analyzed; the multipulse excitation generator drives the cascade of the pitch synthesis filter and the LP synthesis filter, the synthetic output is subtracted from the input speech, the difference passes through the perceptual weighting filter W(z), and an error-minimization block adjusts the excitation.
Code Excited LP: CELP is an analysis-by-synthesis method in which the excitation sequence is selected from a codebook of zero-mean Gaussian sequences. The bit rate of CELP is 4800 bps.
CELP (analysis-by-synthesis coder):
Block diagram: the speech samples are buffered and LP-analyzed; each Gaussian excitation codebook sequence drives the pitch synthesis filter and the spectral-envelope (LP) synthesis filter, the synthetic output is subtracted from the speech, the difference is weighted by the perceptual weighting filter W(z), and its energy is computed (square and sum). The side information, gain parameters, and index of the selected sequence are transmitted.
Analysis-by-synthesis coder
This weighted error is squared and summed over a subframe block to give the error energy. By performing an exhaustive search through the codebook, we find the excitation sequence that minimizes the error energy.
Analysis-by-synthesis coder
The gain factor for scaling the excitation sequence is determined for each codeword in the codebook by minimizing the error energy over the block of samples.
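A sketch of the exhaustive codebook search with the per-codeword optimal gain g = (x·y) / (y·y), using a toy random codebook rather than a real CELP codebook:

```python
import random

def codebook_search(target, codebook):
    """For each codeword y, choose the optimal gain g = <x,y>/<y,y>, then
    pick the index minimizing the error energy ||x - g*y||^2."""
    best = None
    for idx, y in enumerate(codebook):
        yy = sum(v * v for v in y)
        g = sum(p * q for p, q in zip(target, y)) / yy
        err = sum((p - g * q) ** 2 for p, q in zip(target, y))
        if best is None or err < best[0]:
            best = (err, idx, g)
    return best[1], best[2]

random.seed(3)
codebook = [[random.gauss(0, 1) for _ in range(40)] for _ in range(64)]
target = [2.5 * v for v in codebook[17]]     # target is a scaled codeword
idx, gain = codebook_search(target, codebook)
print(idx, round(gain, 6))                   # 17 2.5
```

Because the target is an exact scaled copy of codeword 17, the search recovers that index and its gain.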
CELP (synthesizer):
Block diagram: from the channel, the decoder and a buffer/controller drive the Gaussian excitation codebook, the pitch synthesis filter, and the LP synthesis filter; the LP parameters, gain, and pitch estimates are updated each frame.
CELP synthesizer
A cascade of two all-pole filters with coefficients that are updated periodically. The first filter is a long-delay pitch filter used to generate the pitch periodicity in voiced speech. In its simplest (single-tap) form, this filter is H_p(z) = 1 / (1 - b z^(-P)), where P is the pitch lag and b the pitch gain.
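The single-tap long-delay pitch filter can be sketched directly from its difference equation (the values of b and P below are illustrative):

```python
def pitch_synthesis(x, b, P):
    """One-tap long-delay pitch filter: y[n] = x[n] + b * y[n-P]."""
    y = []
    for n in range(len(x)):
        y.append(x[n] + (b * y[n - P] if n >= P else 0.0))
    return y

# An impulse comes out as a decaying pulse train with the pitch period P:
y = pitch_synthesis([1.0] + [0.0] * 99, b=0.8, P=25)
print(y[0], y[25], round(y[50], 6))   # 1.0 0.8 0.64, i.e. b**k at multiples of P
```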
CELP
The parameters of the pitch filter can be determined, after pitch estimation, by minimizing the prediction-error energy over a frame duration of 5 msec. The second filter is a short-delay all-pole (vocal-tract) filter whose coefficients are determined every 10-20 msec.
Example: the sampling frequency is 8 kHz
Pitch estimation and excitation-sequence selection are performed on subframe blocks of 5 msec. At 8 kHz there are 40 samples per 5-msec subframe, so the excitation sequence consists of 40 samples.
Example: a codebook of 1024 sequences gives good speech quality
For such a codebook size, we require only 10 bits to send the codebook index; hence the bit rate is reduced by a factor of 4. The transmission of the pitch-predictor and spectral-predictor parameters brings the bit rate to about 4800 bps.
Low-delay CELP coder
CELP has been used to achieve toll-quality speech at 16 kbps (the G.728 standard) with low delay. Although other types of vocoders also produce high-quality speech at such rates, those vocoders buffer 10-20 msec of speech samples.
Low-delay CELP coder
In conventional coders the one-way delay is on the order of 20-40 msec. With modifications to CELP, it is possible to reduce the one-way delay to about 2 msec. Low delay is achieved by using a backward-adaptive predictor with a gain parameter and an excitation vector size as small as 5 samples.
Low-delay CELP coder
Block diagram: the input speech is buffered and windowed; each vector from the excitation VQ codebook is scaled by a gain (with backward gain adaptation), filtered by the high-order LP synthesis filter (with a backward-adaptive predictor), subtracted from the input, weighted by the perceptual weighting filter W(z), and passed to the error-minimization block.
Low-delay CELP coder
The pitch predictor used in the conventional forward-adaptive coder is eliminated. To compensate for the loss of pitch information, the LPC predictor order is increased significantly, to an order of 50.
Low-delay CELP coder
The LPC coefficients are updated more frequently, every 2.5 msec. A 5-sample excitation vector corresponds to an excitation block duration of 5/8000 s = 0.625 msec at the 8-kHz sampling rate.
Low-delay CELP coder
The logarithm of the excitation gain is adapted every subframe excitation block by employing a 10th-order adaptive linear predictor in the logarithmic domain. The coefficients of the logarithmic-gain predictor are updated every four blocks by performing an LPC analysis of previously quantized excitation-signal blocks.
Low-delay CELP coder
The perceptual weighting filter is also 10th order and is updated once every four blocks by performing an LPC analysis on frames of the input speech signal of duration 2.5 msec. The excitation codebook in low-delay CELP is also modified compared with conventional CELP: a 10-bit excitation codebook is employed.
Vector Sum Excited LP: The VSELP coder differs from basic CELP primarily in the method by which the excitation sequence is formed. In the VSELP decoder block diagram, there are three excitation sources: one is obtained from the pitch-period (long-term filter) state, and the other two are obtained from two codebooks.
VSELP Decoder:
Block diagram: the long-term filter state and codebooks 1 and 2 provide three excitation components that are summed, passed through the pitch synthesis filter and the spectral-envelope (LP) synthesis filter, and then through a spectral postfilter to produce the synthetic speech.
VSELP Decoder
The LPC synthesis filter is implemented as a 10-pole filter, and its coefficients are coded and transmitted every 20 msec. The coefficients are updated in each 5-msec subframe by interpolation, and the excitation parameters are also updated every 5 msec.
VSELP Decoder
There are 128 codewords in each of the two codebooks. The codewords are constructed from two sets of seven basis codewords by forming linear combinations of the seven basis codewords in each set. The long-term filter state is also a codebook with 128 codeword sequences.
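The vector-sum construction can be sketched as follows: with M = 7 basis codewords and signs of -1 or +1 on each, one set yields 2^7 = 128 codewords (the toy basis vectors here are far shorter than a real codec's 40-sample vectors):

```python
from itertools import product

def vector_sum_codebook(basis):
    """Build all 2**M codewords c = sum_i theta_i * v_i, theta_i in {-1, +1}."""
    return [[sum(t * v[j] for t, v in zip(thetas, basis)) for j in range(len(basis[0]))]
            for thetas in product([-1, 1], repeat=len(basis))]

# Seven toy basis vectors of length 5
basis = [[(i + 1) * ((j % 3) - 1) for j in range(5)] for i in range(7)]
cb = vector_sum_codebook(basis)
print(len(cb))   # 128 codewords from only 7 stored basis vectors
```

The storage and search advantage is that only the 7 basis vectors need to be stored and filtered; the 128 codeword responses follow by sign combinations.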
VSELP Decoder
In each 5-msec frame, the codewords from this codebook are filtered through the speech-system filter and correlated with the input speech sequence. The best filtered codeword is used to update the history, and the corresponding lag is transmitted to the decoder.
VSELP Decoder
Thus the update occurs by appending the best filtered codeword to the history codebook, and the oldest samples in the history array are discarded. The result is that the long-term filter state becomes an adaptive codebook.
VSELP Decoder
The three excitation sequences are selected sequentially from the three codebooks. Each codebook search attempts to find the codeword that minimizes the total energy of the perceptually weighted error. Once the codewords have been selected, the three gain parameters are optimized.
VSELP Decoder
Joint gain optimization is accomplished sequentially by orthogonalizing each weighted codeword vector prior to the codebook search. These parameters are vector quantized to one of 256 eight-bit vectors and transmitted in every 5-msec frame.
Vector Sum Excited LP: The bit rate of VSELP is about 8000 bps.
The bit allocation for 8000-bps VSELP (per 5-msec subframe and per 20-msec frame) covers: the 10 LPC coefficients, the average speech energy, the excitation codewords from the two VSELP codebooks, the gain parameters, and the lag of the pitch filter.
VSELP Decoder
Finally, an adaptive spectral postfilter is employed in VSELP following the LPC synthesis filter. This postfilter is a pole-zero filter; a common form is H(z) = A(z/b) / A(z/a) with 0 < b < a < 1, where A(z) is the LPC inverse filter.
DEMO: Speech Codecs (male speaker, female speaker, music)
Original speech/music (16-bit, sampled at 8 kHz); FS-1015 (LPC-10e, 2.4 kb/s); FS-1016 (CELP, 4.8 kb/s); IS-54 (VSELP, 7.95 kb/s); G.721 (32 kb/s ADPCM).
Standard Voice Algorithms
G.711: The most widely used digital representation of voice signals is that of G.711, or PCM (Pulse Code Modulation). This codec represents a 4-kHz band-limited voice signal sampled at 8 kHz using 8 bits per sample with A-law or μ-law coding.
G.726: The G.726 codec encodes a 64-kbps A-law or μ-law PCM signal at four different bit-rate options, ranging from 2 bits per sample to 5 bits per sample. The algorithm is based on Adaptive Differential Pulse Code Modulation (ADPCM) with a one-sample backward-prediction scheme.
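The μ-law companding law behind G.711 can be sketched as a pair of functions (the continuous-valued formula only, without the 8-bit segment/step quantization of the actual standard):

```python
import math

def mu_law_compress(x, mu=255.0):
    """Continuous mu-law compressor: F(x) = sgn(x) * ln(1 + mu|x|) / ln(1 + mu)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0):
    """Inverse of the compressor."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

x = 0.1
y = mu_law_compress(x)                            # small signals get boosted
print(round(y, 4), round(mu_law_expand(y), 4))    # round-trips back to 0.1
```

Boosting small amplitudes before uniform quantization is what gives PCM speech its roughly constant signal-to-quantization-noise ratio across levels.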
G.728: The G.728 algorithm compresses PCM voice signals to a bit rate of 16 kbps. It is based on a strong backward-prediction scheme and is considered one of the most complex voice algorithms produced by the ITU standards organization.
G.729: For compression of voice signals at 8 kbps, the G.729 algorithm offers toll quality with a built-in algorithmic delay of less than 15 msec. Additional features described in the G.729 annexes provide VAD (voice activity detection) and comfort-noise generation to enhance quality and reduce the overall bit rate.
G.723.1: The most widely used algorithm for band-limited channels, such as VoIP and video conferencing, is G.723.1. The algorithm has two operating bit rates, 6.3 kbps and 5.3 kbps. Although its delay is not as low as that of the other ITU standards, its quality is near toll quality at these low bit rates, making it very efficient in bit usage.
GSM-AMR: The latest GSM standard is the multi-rate Adaptive Code Excited Linear Prediction codec, which provides compression in the range of 4.75 to 12.2 kbps. The codec's bit rates cover half-rate to full-rate channel capacity.
GSM-FR: The first digital codec used in a mobile environment was the GSM Full Rate vocoder. The codec compresses 13-bit PCM samples to a rate of 13 kbps. The algorithm is based on a very simple Regular Pulse Excited Linear Predictive Coding (RPE-LPC) technique.
GSM-HR: To increase capacity, the GSM committee decided on a lower bit rate of 5.6 kbps for the voice channel. The algorithm is based on Vector Sum Excited Linear Prediction (VSELP) and is computationally as complex as other low-bit-rate algorithms.