1
Overview of Adaptive Multi-Rate Narrow Band (AMR-NB) Speech Codec
Presented by Peter
2
AMR Narrow Band
Adaptive Multi-Rate codec for narrow-band speech (AMR-NB), specified by 3GPP for GSM/3G systems.
Input: 8 kHz sampling rate, 13-bit PCM.
Frames: 20 ms, no overlap.
Modes: 8 speech modes plus comfort noise.
Output bit rate: 4.75 to 12.2 kbit/s.
Speech coding is based on Algebraic Code Excited Linear Prediction (ACELP).
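The figures above fix the frame size in samples and, for each mode, the number of bits implied by the nominal bit rate. A minimal sketch of that arithmetic follows; the function names are illustrative, and the counts are derived only from the nominal rates (they say nothing about packet headers or mode signalling).

```python
# Frame arithmetic derived from the slide's figures (8 kHz, 20 ms frames).
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
# The eight AMR-NB mode rates in kbit/s.
MODE_RATES_KBITS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of PCM samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def bits_per_frame(rate_kbits, frame_ms=FRAME_MS):
    """Bits per frame implied by a nominal rate (kbit/s times ms gives bits)."""
    return round(rate_kbits * frame_ms)


if __name__ == "__main__":
    print("samples per frame:", samples_per_frame())
    for rate in MODE_RATES_KBITS:
        print(f"{rate:5.2f} kbit/s -> {bits_per_frame(rate)} bits per 20 ms frame")
```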
3
Frequency Response
4
Speech Encoder
Pre-processing
Linear prediction analysis and quantization
Open-loop pitch analysis
Impulse response computation
Target signal computation
Adaptive codebook
Algebraic codebook
Quantization of the adaptive and fixed codebook gains
Memory update
5
Principles of the adaptive multi-rate speech encoder
Eight source codecs with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s.
A 10th-order linear prediction (LP), or short-term, synthesis filter is used.
The long-term, or pitch, synthesis filter is implemented using the adaptive codebook approach.
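The equations for the two synthesis filters did not survive in the slide text. In the usual CELP notation they are commonly written as follows. The symbols are assumptions, not taken from the slides: \hat{a}_i are the quantized LP coefficients, m = 10 is the predictor order, T is the pitch delay and g_p is the pitch gain.

```latex
H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}}, \qquad m = 10

\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}
```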
6
ACELP
7
Pre-Processing
Two pre-processing functions are applied to the input: high-pass filtering and signal down-scaling.
Down-scaling prevents overflow.
The high-pass filter has a cut-off frequency of 80 Hz.
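The two pre-processing steps can be sketched as a down-scaling by 2 followed by a second-order high-pass filter. This is an illustration only: the biquad below is a generic Butterworth-style design from the widely used audio-EQ cookbook formulas, not the codec's own filter coefficients, and the function names are hypothetical.

```python
import math


def highpass_biquad_coeffs(cutoff_hz, sample_rate_hz, q=1 / math.sqrt(2)):
    """Generic second-order high-pass (cookbook design); an assumption, not the codec's filter."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate_hz
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a


def preprocess(samples, cutoff_hz=80.0, sample_rate_hz=8000, scale=0.5):
    """Down-scale the input by 2, then high-pass filter it (direct form I)."""
    b, a = highpass_biquad_coeffs(cutoff_hz, sample_rate_hz)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in samples:
        x0 = s * scale
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

A constant (DC) input decays to zero at the output, while a 1 kHz tone passes at about half its input amplitude because of the down-scaling.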
8
Linear Prediction Analysis
The frame is split into four sub-frames.
12.2 kbit/s mode: LP analysis is performed twice per frame, using a 30 ms asymmetric window and no lookahead.
10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s modes: LP analysis is performed once per frame, with a 5 ms lookahead.
9
Windowing and Auto-correlation Computation
12.2 kbit/s mode: two different asymmetric windows are used.
The 1st window concentrates on the 2nd sub-frame.
The 2nd window concentrates on the 4th sub-frame.
10
Windowing and Auto-correlation Computation
10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s modes: one asymmetric window is used.
It concentrates on the 4th sub-frame and uses a 5 ms (40 samples) lookahead.
11
Auto-correlation Computation
Autocorrelations for lags 0 to 10 are computed from the windowed speech.
A 60 Hz bandwidth expansion is applied by lag windowing.
The zero-lag autocorrelation is multiplied by the white noise correction factor, which is equivalent to adding a noise floor at -40 dB.
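A minimal sketch of these three steps follows. Two assumptions are made: a plain Hamming window stands in for the codec's asymmetric window, and the lag window is taken to be Gaussian, a common choice for this kind of bandwidth expansion. The white noise correction factor is derived directly from the -40 dB noise floor as 1 + 10^(-40/10) = 1.0001.

```python
import math


def lp_autocorrelation(frame, order=10, sample_rate_hz=8000,
                       expansion_hz=60.0, noise_floor_db=-40.0):
    """Windowed autocorrelation (lags 0..order) with lag window and noise correction."""
    n = len(frame)
    # Stand-in window: symmetric Hamming (assumption; the codec uses asymmetric windows).
    windowed = [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i in range(n)]
    r = [sum(windowed[i] * windowed[i - k] for i in range(k, n))
         for k in range(order + 1)]
    # White noise correction: scale lag 0 so that a noise floor is added.
    r[0] *= 1.0 + 10.0 ** (noise_floor_db / 10.0)
    # Lag windowing: Gaussian taper over the lags (assumed form).
    for k in range(1, order + 1):
        r[k] *= math.exp(-0.5 * (2 * math.pi * expansion_hz * k / sample_rate_hz) ** 2)
    return r
```

For a 30 ms window at 8 kHz the input `frame` would hold 240 samples.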
12
Levinson‑Durbin algorithm
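The LP filter coefficients are obtained by solving a set of linear equations built from the modified autocorrelations.
The Levinson-Durbin algorithm solves them with a recursion over the prediction order; the final solution is the set of 10 LP coefficients.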
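The recursion can be sketched as below. This is the textbook Levinson-Durbin algorithm, written with the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, which is an assumption about sign convention since the slide's equations are not available. It assumes r[0] > 0.

```python
def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelations r[0..order].

    Returns (a, err): a[0] = 1 followed by the predictor coefficients,
    and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

For the autocorrelation of a first-order process, r[k] = 0.5^k, the recursion returns a_1 = -0.5 and zero for the higher coefficients.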
13
LP to LSP conversion
The LP filter coefficients are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes.
The LSPs are defined as the roots of the sum and difference polynomials.
All roots of these polynomials lie on the unit circle and alternate with each other.
The roots at z = -1 and z = 1 are eliminated.
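A small sketch of how the sum and difference polynomials are formed from A(z), assuming the standard definitions F1'(z) = A(z) + z^-(m+1) A(1/z) and F2'(z) = A(z) - z^-(m+1) A(1/z). It only builds the polynomials and shows why the trivial roots exist; root finding itself is not shown, and the function names are illustrative.

```python
def lsp_polynomials(a):
    """Return coefficient lists (in powers of z^-1) of the sum and difference polynomials.

    a = [1, a_1, ..., a_m] are the LP coefficients of A(z).
    """
    padded = list(a) + [0.0]      # A(z), padded to degree m + 1
    mirrored = padded[::-1]       # coefficients of z^-(m+1) * A(1/z)
    f_sum = [p + q for p, q in zip(padded, mirrored)]
    f_diff = [p - q for p, q in zip(padded, mirrored)]
    return f_sum, f_diff


def evaluate(coeffs, z):
    """Evaluate a polynomial in z^-1 at the point z."""
    return sum(c * z ** (-i) for i, c in enumerate(coeffs))
```

The sum polynomial is symmetric and the difference polynomial is antisymmetric. For an even order such as m = 10, the first therefore vanishes at z = -1 and the second at z = 1, which are the two roots that get eliminated.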
14
LP to LSP conversion
15
Quantization of the LSP coefficients
12.2 kbit/s mode: the two sets of LSPs are quantized using their representation in the frequency domain (line spectral frequencies, LSFs). 1st-order MA prediction is applied, and the two residual LSF vectors are jointly quantized using split matrix quantization (SMQ). A weighted LSP distortion measure is used in the quantization process.
10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s modes: the residual LSF vector is quantized using split vector quantization, with a weighted LSP distortion measure.
16
Interpolation of the LSPs
12.2 kbit/s mode: interpolated LSP vectors are computed for the 1st and 3rd subframes.
10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s modes: interpolated LSP vectors are computed for the 1st, 2nd and 3rd subframes.
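The interpolation can be sketched as a linear blend of neighbouring quantized LSP vectors. The weights below (halves in the 12.2 kbit/s mode; 0.75/0.25, 0.5/0.5 and 0.25/0.75 in the other modes) are an assumption, since the slide's formulas are not available and should be checked against the specification. The function name and the dictionary layout are illustrative.

```python
def interpolate_lsp(prev_q4, cur_q4, cur_q2=None):
    """Return a dict mapping subframe number (1..4) to its LSP vector.

    prev_q4: quantized LSP vector of the 4th subframe of the previous frame.
    cur_q4:  quantized LSP vector of the 4th subframe of the current frame.
    cur_q2:  quantized LSP vector of the 2nd subframe (12.2 kbit/s mode only).
    """
    def mix(w_old, old, w_new, new):
        return [w_old * o + w_new * n for o, n in zip(old, new)]

    if cur_q2 is not None:
        # 12.2 kbit/s mode: subframes 2 and 4 are transmitted, 1 and 3 interpolated.
        return {
            1: mix(0.5, prev_q4, 0.5, cur_q2),
            2: list(cur_q2),
            3: mix(0.5, cur_q2, 0.5, cur_q4),
            4: list(cur_q4),
        }
    # Other modes: only subframe 4 is transmitted.
    return {
        1: mix(0.75, prev_q4, 0.25, cur_q4),
        2: mix(0.50, prev_q4, 0.50, cur_q4),
        3: mix(0.25, prev_q4, 0.75, cur_q4),
        4: list(cur_q4),
    }
```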
17
Open‑loop pitch analysis
Performed twice per frame (every 10 ms) for the 12.2, 10.2, 7.95, 7.40, 6.70 and 5.90 kbit/s modes.
Performed once per frame for the 5.15 and 4.75 kbit/s modes.
The pre-processed signal is first filtered with a perceptual weighting filter.
(Figure labels: original, weighted, unit circle; cases shown: flat and tilted.)
18
Impulse response computation
The impulse response, h(n), is computed each subframe.
It is needed for the search of the adaptive and fixed codebooks.
It is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$.
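The computation can be sketched as follows, assuming the weighted synthesis filter has the common CELP form A(z/g1) / (A_hat(z) A(z/g2)), where A(z) is the unquantized LP filter, A_hat(z) the quantized one, and g1, g2 are weighting factors. The factor values are left as parameters because the slides do not state them, and all names are illustrative.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the i-th coefficient is scaled by gamma**i."""
    return [c * gamma ** i for i, c in enumerate(a)]


def all_pole_filter(x, a):
    """Filter x through 1/A(z) with zero initial state; a[0] is assumed to be 1."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def weighted_impulse_response(a, a_hat, gamma1, gamma2, length=40):
    """Impulse response h(n) over one subframe (40 samples = 5 ms at 8 kHz)."""
    numerator = bandwidth_expand(a, gamma1)
    # The numerator coefficients, extended by zeros, act as the filter input.
    x = (numerator + [0.0] * length)[:length]
    return all_pole_filter(all_pole_filter(x, a_hat), bandwidth_expand(a, gamma2))
```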
19
Adaptive codebook
The adaptive codebook search is performed on a subframe basis.
The parameters are the delay and gain of the pitch filter.
The codebook contains entries taken from the previously synthesized excitation signal.
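How a codebook entry is taken from the past excitation can be sketched for an integer delay. This is a simplification: fractional delays and the closed-loop search over candidate delays are not shown, and the function name is hypothetical.

```python
def adaptive_codebook_vector(past_excitation, delay, subframe_len=40):
    """Build v(n) = u(n - delay) from the past excitation u.

    When the delay is shorter than the subframe, the newly generated
    samples are reused, which repeats the last `delay` samples periodically.
    """
    if delay < 1 or delay > len(past_excitation):
        raise ValueError("delay must lie within the stored excitation history")
    buf = list(past_excitation)
    start = len(buf)
    for n in range(subframe_len):
        buf.append(buf[start + n - delay])
    return buf[start:]
```

With a history of [1, 2, 3] and a delay of 3, a 7-sample vector comes out as 1, 2, 3, 1, 2, 3, 1.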
20
Algebraic codebook
Encodes the random portion of the excitation signal.
The periodic portion of the weighted residual is removed first, so only the random portion remains to be coded by the fixed codebook.
The codebook is searched by minimizing the error between the perceptually weighted input speech and the reconstructed speech.
The design is based on interleaved single-pulse permutation (ISPP).
Code vectors consist of a few sparse impulse sequences that are phase-shifted versions of each other.
All pulses have the same magnitude; amplitudes are +1 or -1.
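The structure of such a code vector can be sketched as signed unit pulses placed on interleaved tracks. The layout used here (a 40-sample subframe divided into 5 tracks of 8 positions each) is an illustrative assumption; the number of pulses and tracks differs between modes, and the search procedure is not shown.

```python
def algebraic_codevector(pulses, subframe_len=40, num_tracks=5):
    """Build a sparse code vector from (track, slot, sign) triples.

    Track t holds the positions t, t + num_tracks, t + 2 * num_tracks, ...
    so the tracks are interleaved, shifted copies of each other.
    """
    slots_per_track = subframe_len // num_tracks
    code = [0] * subframe_len
    for track, slot, sign in pulses:
        if not 0 <= track < num_tracks:
            raise ValueError("track out of range")
        if not 0 <= slot < slots_per_track:
            raise ValueError("slot out of range")
        if sign not in (1, -1):
            raise ValueError("pulse amplitude must be +1 or -1")
        code[track + num_tracks * slot] += sign
    return code
```

Only the pulse positions and signs need to be transmitted, which is what makes the codebook cheap to store and search.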
21
Speech decoder
Codebook parameters are decoded by table look-up.
LSP coefficients are interpolated and converted to LP coefficients.
Excitation = sum of the adaptive and fixed codebook vectors, each multiplied by its respective gain, in each subframe.
Speech = excitation passed through the vocal tract (LP synthesis) filter.
Perceived quality is enhanced by adaptive post-filtering.
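The excitation and synthesis steps for one subframe can be sketched as below. The convention A(z) = 1 + a_1 z^-1 + ... is assumed, post-filtering is omitted, and the function and parameter names are hypothetical.

```python
def synthesize_subframe(adaptive_vec, fixed_vec, pitch_gain, code_gain,
                        lp_coeffs, filter_memory):
    """Return (excitation, speech) for one subframe.

    lp_coeffs:     [1, a_1, ..., a_m] of the LP synthesis filter 1/A(z).
    filter_memory: the last m output samples of the previous subframe, oldest first.
    """
    order = len(lp_coeffs) - 1
    if len(filter_memory) != order:
        raise ValueError("filter_memory must hold exactly `order` samples")
    # Excitation: gain-scaled sum of the two codebook contributions.
    excitation = [pitch_gain * v + code_gain * c
                  for v, c in zip(adaptive_vec, fixed_vec)]
    history = list(filter_memory)
    speech = []
    for u in excitation:
        acc = u
        for k in range(1, order + 1):
            acc -= lp_coeffs[k] * history[-k]
        speech.append(acc)
        history.append(acc)
    return excitation, speech
```

In a full decoder the excitation would also be appended to the adaptive codebook history, and the last samples of `speech` would become the filter memory for the next subframe.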
22
Speech decoder
23
Synthesis model
24
Synthesis model
To reconstruct speech, the model combines:
A noise-like signal
A pitch filter that models the glottal vibrations
A linear prediction filter that models the vocal tract
25
Post-processing
Two functions: adaptive post-filtering and high-pass filtering.
Adaptive post-filtering: a cascade of two filters, a formant postfilter and a tilt compensation filter, updated every 5 ms subframe.
High-pass filter: removes undesired low-frequency components; a cut-off frequency of 60 Hz is used.
Up-scaling by a factor of 2 compensates for the down-scaling by 2 that is applied to the input signal.