Overview of Adaptive Multi-Rate Narrow Band (AMR-NB) Speech Codec

Presented by Peter

AMR Narrow Band
Adaptive Multi-Rate codec for narrow-band speech (AMR-NB)
Specified by 3GPP for GSM/3G systems
Input: 8 kHz sampling rate, 13-bit PCM
20 ms frames, no overlap
8 modes + comfort noise
Output bit rate from 4.75 to 12.2 kbit/s
Algebraic Code Excited Linear Prediction (ACELP) is used as the speech codec
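The frame figures above imply some simple arithmetic, sketched here in Python. The helper names are illustrative only; the mode list is the one given later in these slides.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
MODES_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]

def samples_per_frame(rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    # 8000 samples/s * 0.020 s
    return rate_hz * frame_ms // 1000

def bits_per_frame(kbps, frame_ms=FRAME_MS):
    # kbit/s multiplied by milliseconds gives bits
    return round(kbps * frame_ms)

if __name__ == "__main__":
    print("samples per frame:", samples_per_frame())
    for mode in MODES_KBPS:
        print(f"{mode:5.2f} kbit/s -> {bits_per_frame(mode)} bits per frame")
```

Each frame therefore carries between 95 and 244 coded speech bits, depending on the mode.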

Frequency Response

Speech Encoder
Pre-processing
Linear prediction analysis and quantization
Open-loop pitch analysis
Impulse response computation
Target signal computation
Adaptive codebook
Algebraic codebook
Quantization of the adaptive and fixed codebook gains
Memory update

Principles of the adaptive multi-rate speech encoder
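Eight source codecs with bit rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s
A 10th-order linear prediction (LP), or short-term, synthesis filter is used, which is given by
  $H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}}$, with $m = 10$
The long-term, or pitch, synthesis filter is given by
  $\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}$, where $T$ is the pitch delay and $g_p$ is the pitch gain
The pitch synthesis filter is implemented using the adaptive codebook approach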

Pre-Processing
Two pre-processing functions:
  high-pass filtering
  signal down-scaling, to prevent overflow
A filter with a cut-off frequency of 80 Hz is used
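A minimal sketch of the two pre-processing steps, assuming a generic second-order Butterworth-style high-pass section (audio-EQ cookbook form) rather than the codec's actual filter coefficients, which the slides do not give. The names highpass_biquad and preprocess are hypothetical; the down-scaling factor of 2 is the one mentioned in the post-processing slide.

```python
import math

def highpass_biquad(fc_hz=80.0, fs_hz=8000.0, q=math.sqrt(0.5)):
    # Assumed design: generic biquad high-pass, not the standardized filter.
    w0 = 2.0 * math.pi * fc_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    half = (1.0 + cos_w0) / 2.0
    b = [half / a0, -(1.0 + cos_w0) / a0, half / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a

def preprocess(samples, scale=0.5):
    # Down-scale by 2 first, then high-pass filter sample by sample.
    b, a = highpass_biquad()
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in samples:
        x0 = s * scale
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out
```

With this sketch a constant (DC) input decays to zero, while components well above 80 Hz pass at half their original amplitude.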

Linear Prediction Analysis
Each frame is split into four sub-frames
12.2 kbit/s mode
  Performed twice per frame
  30 ms asymmetric window
  No lookahead
10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  Performed once per frame
  5 ms lookahead

Windowing and Auto-correlation Computation
12.2 kbit/s mode
  Two different asymmetric windows
  1st window concentrates on the 2nd sub-frame
  2nd window concentrates on the 4th sub-frame

Windowing and Auto-correlation Computation
10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  One asymmetric window
  Concentrates on the 4th sub-frame
  5 ms (40 samples) lookahead
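To illustrate what an asymmetric analysis window can look like, here is a sketch built from a Hamming half followed by a quarter-cosine taper. The shape and the split l1 = 200, l2 = 40 are assumptions chosen so that the total is 240 samples (30 ms at 8 kHz) with a 40-sample taper matching the lookahead; the slides do not give the window formula.

```python
import math

def lp_window(l1=200, l2=40):
    # Rising Hamming half over l1 samples (assumed shape).
    head = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (2 * l1 - 1))
            for n in range(l1)]
    # Quarter-cosine decay over the last l2 samples (assumed shape).
    tail = [math.cos(2.0 * math.pi * n / (4 * l2 - 1)) for n in range(l2)]
    return head + tail

def apply_window(samples, window):
    # Element-wise product of the speech segment and the window.
    return [s * w for s, w in zip(samples, window)]
```

The peak of this window sits late in the segment, so the most recent speech is weighted most heavily.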

Auto-correlation Computation
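Lags 0 to 10 are computed:
  $r(k) = \sum_{n=k}^{239} s'(n)\, s'(n-k)$, $k = 0, \ldots, 10$, where $s'(n)$ is the windowed speech
A 60 Hz bandwidth expansion is applied by lag windowing:
  $w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0 i}{f_s}\right)^2\right]$, $i = 1, \ldots, 10$, with $f_0 = 60$ Hz and $f_s = 8000$ Hz
$r(0)$ is multiplied by the white noise correction factor 1.0001, which is equivalent to adding a noise floor at -40 dB
The modified autocorrelations are $r'(0) = 1.0001\, r(0)$ and $r'(k) = r(k)\, w_{lag}(k)$, $k = 1, \ldots, 10$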
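The three steps on this slide (autocorrelation, lag windowing, white noise correction) can be sketched as follows. The Gaussian lag window and the factor 1.0001 (10·log10(0.0001) = -40 dB) are stated here as assumptions consistent with the 60 Hz and -40 dB figures; function names are illustrative.

```python
import math

def autocorr(windowed, max_lag=10):
    # r(k) = sum over n of s'(n) * s'(n - k)
    n = len(windowed)
    return [sum(windowed[i] * windowed[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]

def lag_window(max_lag=10, f0=60.0, fs=8000.0):
    # Assumed Gaussian lag window giving a bandwidth expansion of f0 Hz.
    return [math.exp(-0.5 * (2.0 * math.pi * f0 * i / fs) ** 2)
            for i in range(max_lag + 1)]

def condition(r, noise_factor=1.0001):
    # Lag-window lags 1..max_lag and raise r(0) by the noise-floor factor.
    w = lag_window(len(r) - 1)
    out = [ri * wi for ri, wi in zip(r, w)]
    out[0] = r[0] * noise_factor
    return out
```

Both operations make the autocorrelation matrix better conditioned before the LP coefficients are solved for.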

Levinson‑Durbin algorithm
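The modified (lag-windowed) autocorrelations $r'(k)$ are used to obtain the LP filter coefficients $a_k$, $k = 1, \ldots, 10$, by solving the set of equations
  $\sum_{k=1}^{10} a_k\, r'(|i-k|) = -r'(i)$, $i = 1, \ldots, 10$
The algorithm uses the following recursion:
  $E(0) = r'(0)$
  for $i = 1$ to $10$:
    $a_0^{(i-1)} = 1$
    $k_i = -\left[\sum_{j=0}^{i-1} a_j^{(i-1)}\, r'(i-j)\right] / E(i-1)$
    $a_i^{(i)} = k_i$
    for $j = 1$ to $i-1$: $a_j^{(i)} = a_j^{(i-1)} + k_i\, a_{i-j}^{(i-1)}$
    $E(i) = (1 - k_i^2)\, E(i-1)$
The final solution is given as $a_j = a_j^{(10)}$, $j = 1, \ldots, 10$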
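A compact sketch of the Levinson-Durbin recursion, assuming the sign convention A(z) = 1 + a_1 z^-1 + ... + a_m z^-m and an input list r of (conditioned) autocorrelations for lags 0 to m. The function name is illustrative.

```python
def levinson_durbin(r, order=10):
    a = [1.0] + [0.0] * order      # a[0] is fixed to 1
    err = r[0]                     # prediction error energy E(0)
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err             # reflection coefficient k_i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)       # E(i) = (1 - k_i^2) E(i-1)
    return a, err
```

For the autocorrelation of a first-order process, r(k) = 0.5^k, the recursion returns a_1 = -0.5 with all higher coefficients zero, which is a quick sanity check of the sign convention.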

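LP to LSP conversion
The LP filter coefficients $a_k$, $k = 1, \ldots, 10$, are converted to the line spectral pair (LSP) representation for quantization and interpolation purposes
The LSPs are defined as the roots of the sum and difference polynomials
  $F_1'(z) = A(z) + z^{-11} A(z^{-1})$
  $F_2'(z) = A(z) - z^{-11} A(z^{-1})$
All roots of these polynomials lie on the unit circle, and the roots of the two polynomials alternate with each other
The roots at $z = -1$ (of $F_1'$) and $z = 1$ (of $F_2'$) are eliminated:
  $F_1(z) = F_1'(z) / (1 + z^{-1})$, $F_2(z) = F_2'(z) / (1 - z^{-1})$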
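One way to sketch the conversion is to form the sum and difference polynomials, divide out the fixed roots at z = -1 and z = 1, and search the upper half of the unit circle for sign changes. This is a plain grid-plus-bisection search, not necessarily the method a production encoder uses; the function name and the grid size are assumptions.

```python
import math

def lp_to_lsf(a, grid=2000):
    """a = [1, a1, ..., am] with even m; returns m root angles in radians."""
    m = len(a) - 1
    ext = list(a) + [0.0]
    f1p = [ext[i] + ext[m + 1 - i] for i in range(m + 2)]   # sum polynomial
    f2p = [ext[i] - ext[m + 1 - i] for i in range(m + 2)]   # difference polynomial
    f1, f2 = [f1p[0]], [f2p[0]]
    for i in range(1, m + 1):
        f1.append(f1p[i] - f1[i - 1])   # divide by (1 + z^-1)
        f2.append(f2p[i] + f2[i - 1])   # divide by (1 - z^-1)
    half = m // 2

    def on_circle(f, w):
        # Symmetric polynomial on z = exp(jw), linear-phase term removed.
        return f[half] + 2.0 * sum(f[i] * math.cos((half - i) * w)
                                   for i in range(half))

    def roots(f):
        found = []
        lo, f_lo = 0.0, on_circle(f, 0.0)
        for step in range(1, grid + 1):
            hi = math.pi * step / grid
            f_hi = on_circle(f, hi)
            if f_lo * f_hi < 0.0:
                x0, x1, y0 = lo, hi, f_lo
                for _ in range(50):          # bisection refinement
                    mid = 0.5 * (x0 + x1)
                    y_mid = on_circle(f, mid)
                    if y0 * y_mid <= 0.0:
                        x1 = mid
                    else:
                        x0, y0 = mid, y_mid
                found.append(0.5 * (x0 + x1))
            lo, f_lo = hi, f_hi
        return found

    return sorted(roots(f1) + roots(f2))
```

For a stable 10th-order filter this yields ten strictly increasing angles between 0 and pi, alternating between the two polynomials.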

Quantization of the LSP coefficients
12.2 kbit/s mode
  Two sets of LSPs are quantized using their representation in the frequency domain (line spectral frequencies, LSFs)
  1st-order MA prediction is applied
  The two residual LSF vectors are jointly quantized using split matrix quantization (SMQ)
  A weighted LSP distortion measure is used in the quantization process
10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  The residual LSF vector is quantized using split vector quantization
  A weighted LSP distortion measure is used

Interpolation of the LSPs
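12.2 kbit/s mode
  The interpolated LSP vectors at the 1st and 3rd subframes are given by
    $\hat{q}_1^{(n)} = 0.5\, \hat{q}_4^{(n-1)} + 0.5\, \hat{q}_2^{(n)}$
    $\hat{q}_3^{(n)} = 0.5\, \hat{q}_2^{(n)} + 0.5\, \hat{q}_4^{(n)}$
10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75 kbit/s modes
  The interpolated LSP vectors at the 1st, 2nd, and 3rd subframes are given by
    $\hat{q}_1^{(n)} = 0.75\, \hat{q}_4^{(n-1)} + 0.25\, \hat{q}_4^{(n)}$
    $\hat{q}_2^{(n)} = 0.5\, \hat{q}_4^{(n-1)} + 0.5\, \hat{q}_4^{(n)}$
    $\hat{q}_3^{(n)} = 0.25\, \hat{q}_4^{(n-1)} + 0.75\, \hat{q}_4^{(n)}$
Here $\hat{q}_i^{(n)}$ is the quantized LSP vector at subframe $i$ of frame $n$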
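The interpolation is a per-element linear blend. The sketch below assumes the weights 0.5/0.5 for the 12.2 kbit/s mode and 0.75/0.25, 0.5/0.5, 0.25/0.75 for the other modes; the function name and argument layout are illustrative.

```python
def interpolate_lsp(prev_q4, cur_q4, cur_q2=None):
    """Return the four per-subframe LSP vectors of the current frame."""
    def mix(wa, a, wb, b):
        return [wa * x + wb * y for x, y in zip(a, b)]

    if cur_q2 is not None:
        # 12.2 kbit/s mode: subframes 2 and 4 are transmitted.
        q1 = mix(0.5, prev_q4, 0.5, cur_q2)
        q3 = mix(0.5, cur_q2, 0.5, cur_q4)
        return [q1, list(cur_q2), q3, list(cur_q4)]
    # Other modes: only subframe 4 is transmitted.
    q1 = mix(0.75, prev_q4, 0.25, cur_q4)
    q2 = mix(0.5, prev_q4, 0.5, cur_q4)
    q3 = mix(0.25, prev_q4, 0.75, cur_q4)
    return [q1, q2, q3, list(cur_q4)]
```

Interpolating in the LSP domain keeps the intermediate filters stable as long as the end points are ordered.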

Open‑loop pitch analysis
Performed twice per frame (every 10 ms) for the 12.2, 10.2, 7.95, 7.40, 6.70 and 5.90 kbit/s modes
Performed once per frame for the 5.15 and 4.75 kbit/s modes
Based on filtering the pre-processed signal with a perceptual weighting filter $W(z) = A(z/\gamma_1) / A(z/\gamma_2)$
(Figure labels: original, weighted, unit circle; Flat, Tilted)

Impulse response computation
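The impulse response $h(n)$ of the weighted synthesis filter $H(z)W(z) = A(z/\gamma_1) / [\hat{A}(z)\, A(z/\gamma_2)]$ is computed each subframe
It is needed for the search of the adaptive and fixed codebooks
It is computed by filtering the vector of coefficients of the filter $A(z/\gamma_1)$, extended by zeros, through the two filters $1/\hat{A}(z)$ and $1/A(z/\gamma_2)$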
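A sketch of this computation, assuming the two all-pole filters are 1/Â(z) (quantized LP filter) and 1/A(z/γ2), and that the zero-extended input vector holds the coefficients of A(z/γ1). The default values γ1 = 0.94 and γ2 = 0.6 and the 40-sample subframe length are assumptions for illustration; function names are hypothetical.

```python
def bandwidth_expand(a, gamma):
    # Coefficients of A(z/gamma): a_i * gamma^i
    return [c * gamma ** i for i, c in enumerate(a)]

def all_pole(x, a):
    # y(n) = x(n) - sum_{i>=1} a_i * y(n - i), zero initial state
    y = []
    for n, xn in enumerate(x):
        acc = xn - sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(acc)
    return y

def impulse_response(a, a_hat, gamma1=0.94, gamma2=0.6, length=40):
    num = bandwidth_expand(a, gamma1)
    x = num + [0.0] * (length - len(num))   # coefficients extended by zeros
    return all_pole(all_pole(x, a_hat), bandwidth_expand(a, gamma2))
```

Because the numerator coefficients are fed in as the input signal, the output is directly the first samples of the impulse response of the cascaded filter.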

Adaptive codebook
Adaptive codebook search is performed on a subframe basis
The parameters are the delay and gain of the pitch filter
The codebook contains entries taken from the previously synthesized excitation signal
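The idea that the codebook entries are slices of past excitation can be sketched as below. This is a simplification that handles integer delays only and ignores fractional-delay interpolation; the gain bound g_max = 1.2 and the function names are assumptions.

```python
def adaptive_vector(past_excitation, delay, length=40):
    # v(n) = u(n - delay); for delay < length the vector repeats itself.
    buf = list(past_excitation)
    for _ in range(length):
        buf.append(buf[-delay])
    return buf[-length:]

def pitch_gain(target, filtered_vector, g_max=1.2):
    # Least-squares gain <x, y> / <y, y>, clipped to [0, g_max].
    num = sum(x * y for x, y in zip(target, filtered_vector))
    den = sum(y * y for y in filtered_vector)
    if den == 0.0:
        return 0.0
    return min(max(num / den, 0.0), g_max)
```

In a full search, each candidate delay would produce a vector like this, which is filtered through the weighted synthesis filter and compared against the target signal.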

Algebraic codebook
Encodes the random portion of the excitation signal
The periodic portion of the weighted residual is removed first; only the random portion remains to be coded by the fixed codebook
The codebook is searched by minimizing the error between the perceptually weighted input speech and the reconstructed speech
Based on an interleaved single-pulse permutation (ISPP) design
A few sparse impulse sequences that are phase-shifted versions of each other
All the pulses have the same magnitude
Amplitudes are +1 or -1
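A small sketch of how an interleaved pulse layout can be represented. The choice of 5 tracks over a 40-sample subframe is an assumption for illustration; the number of tracks and pulses differs between modes, and the slides do not list them.

```python
def track_positions(track, n_tracks=5, length=40):
    # Allowed positions on one track: track, track + n_tracks, ...
    return list(range(track, length, n_tracks))

def build_codevector(pulses, length=40):
    # pulses: iterable of (position, sign) pairs with sign +1 or -1.
    code = [0] * length
    for position, sign in pulses:
        code[position] += sign
    return code
```

Each track is a shifted copy of the first one, which is what makes the sequences phase-shifted versions of each other; a codevector is then fully described by a handful of positions and signs.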

Speech decoder
Codebook parameters are decoded by table look-up
LSP coefficients are interpolated and converted to LP coefficients
In each subframe, excitation = sum of the adaptive and fixed codebook vectors multiplied by their respective gains
Speech = excitation passed through the vocal tract (LP synthesis) filter
Perceived quality is enhanced by adaptive post-filtering
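The two decoder equations on this slide can be sketched as follows, assuming the convention Â(z) = 1 + â_1 z^-1 + ... for the synthesis filter 1/Â(z). Function names and the memory handling are illustrative.

```python
def build_excitation(v, c, g_p, g_c):
    # u(n) = g_p * v(n) + g_c * c(n)
    return [g_p * vi + g_c * ci for vi, ci in zip(v, c)]

def lp_synthesis(excitation, a_hat, memory=None):
    # s(n) = u(n) - sum_{i=1..m} a_hat[i] * s(n - i)
    m = len(a_hat) - 1
    hist = list(memory) if memory is not None else [0.0] * m
    out = []
    for e in excitation:
        s = e - sum(a_hat[i] * hist[-i] for i in range(1, m + 1))
        hist.append(s)
        out.append(s)
    # Return the last m outputs so the next subframe can continue from them.
    return out, hist[len(hist) - m:]
```

Passing the returned memory into the next call keeps the filter state continuous across subframes.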

Synthesis model
To reconstruct speech:
  A noise-like excitation signal
  A pitch filter that models the glottal vibrations
  A linear prediction filter that models the vocal tract

Post-processing
Adaptive post-filtering
  Cascade of two filters: a formant postfilter and a tilt compensation filter
  Updated every subframe of 5 ms
High-pass filter
  Against undesired low-frequency components
  A cut-off frequency of 60 Hz is used
Up-scaling by a factor of 2 to compensate for the down-scaling by 2 that is applied to the input signal
