Enhanced AMR-WB Bandwidth Extension in 3GPP EVS Codec. Magdalena Kaniewska, Stéphane Ragot, Orange. IEEE GlobalSIP, Orlando, FL, USA, December 14-16, 2015.

2 Authors: Magdalena Kaniewska, Stéphane Ragot (Orange Labs, formerly France Telecom R&D); Zexin Liu, Lei Miao, Xingtao Zhang, Jon Gibbs (Huawei Technologies Co. Ltd, China); Václav Eksler (VoiceAge Corp., QC, Canada)

3 Agenda
- EVS codec overview, focus on AMR-WB IO modes
- Review of BWE in AMR-WB: BWE signal model in AMR-WB, issues for BWE
- BWE in EVS AMR-WB IO: signal model, excitation generation, HB gain
- Performance: subjective quality, complexity, delay
- Conclusion

4 EVS codec
1. Superior quality
- Super HD quality, starting at the same bit rate as HD voice (around 12-13 kbit/s).
- Better NB/WB quality at the same bit rate.
- Improved music quality compared to existing codecs in conversational services.
2. Interoperability
- Intrinsic interoperability with HD voice (improved AMR-WB inside EVS).
3. Efficiency (capacity, coverage, ...)
- EVS bit rates optimized for LTE transport block sizes (TBS); a wide range of bit rates also covers fixed-line applications.
- Better robustness against packet losses.
- Jitter buffer management (JBM) included (recommended feature).
- Current NB/WB quality at a lower bit rate.
[Figure: mobile phone with EVS, showing EVS AMR-WB IO (enhanced AMR-WB), AMR-WB and AMR. Focus of this presentation: EVS AMR-WB IO.]

6 AMR-WB coding model
AMR-WB is based on a split-band model:
- 9 bit rates: 6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s.
- The low-band (LB) signal (0-6.4 kHz) is coded with ACELP after decimating the input signal from 16 to 12.8 kHz.
- The high-band (HB) signal (6.4-7 kHz) is modeled by bandwidth extension (BWE): a 0-bit BWE for bit rates from 6.6 to 23.05 kbit/s, and 0.8 kbit/s of side information only at 23.85 kbit/s.
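The frame arithmetic behind these numbers can be checked with a short Python sketch. It assumes the usual AMR-WB framing of 20 ms frames, each split into four 5 ms subframes. The 5 ms subframe appears on the next slide and the 20 ms frame on the excitation-generation slides. The sketch shows that the 0.8 kbit/s of side information at 23.85 kbit/s amounts to 16 bits per frame, or 4 bits per subframe. It also shows that one frame holds 256 samples at 12.8 kHz and 320 samples at 16 kHz, which matches the 256-point DCT and the 320-point inverse DCT used later.

```python
# Frame arithmetic for AMR-WB, assuming 20 ms frames made of four 5 ms subframes.
FRAME_MS = 20
SUBFRAMES = 4
RATES_BPS = [6600, 8850, 12650, 14250, 15850, 18250, 19850, 23050, 23850]


def bits_per_frame(rate_bps: int) -> int:
    # bit/s * 20 ms = bits per frame; every AMR-WB rate gives an integer count
    return rate_bps * FRAME_MS // 1000


def samples_per_frame(fs_hz: int) -> int:
    return fs_hz * FRAME_MS // 1000


if __name__ == "__main__":
    for r in RATES_BPS:
        print(f"{r / 1000:6.2f} kbit/s -> {bits_per_frame(r)} bits/frame")
    # High-band side information at 23.85 kbit/s: 0.8 kbit/s
    side_bits = 800 * FRAME_MS // 1000
    print("HB side info:", side_bits, "bits/frame =", side_bits // SUBFRAMES, "bits/subframe")
    print("LB frame at 12.8 kHz:", samples_per_frame(12800), "samples")
    print("WB frame at 16 kHz:", samples_per_frame(16000), "samples")
```

The difference between the two highest modes is 477 - 461 = 16 bits per frame, which is the HB side information.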

7 BWE in AMR-WB
[Block diagram, processed per 5 ms subframe:]
- White noise excitation.
- Time envelope (subframe gain): level equalization based on the low-band excitation, followed by a gain correction. The correction is coded with 4 bits/subframe at 23.85 kbit/s; otherwise it is estimated from the spectral tilt.
- Frequency envelope: LPC synthesis filter.
- Band-pass filter (6-7 kHz), plus a 7 kHz low-pass filter at 23.85 kbit/s.
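The legacy signal model can be sketched in a few lines of Python. The sketch shapes white noise in time with a per-subframe gain, then in frequency with an all-pole LPC synthesis filter 1/A(z). It is an illustration under stated assumptions, not the AMR-WB reference code. The function names are hypothetical. The LB excitation is assumed to be given at the same 16 kHz rate as the HB noise, whereas in the codec it runs at 12.8 kHz. The band-pass and 7 kHz low-pass filters are omitted.

```python
import numpy as np

SUBFRAME_LEN = 80  # 5 ms at 16 kHz


def shape_noise_in_time(lb_exc, gains, seed=0):
    """Hypothetical time envelope: per 5 ms subframe, white noise is scaled so that
    its energy equals gain**2 times the LB excitation energy (level equalization)."""
    rng = np.random.default_rng(seed)
    lb_exc = np.asarray(lb_exc, dtype=float)
    n_sub = len(lb_exc) // SUBFRAME_LEN
    out = np.zeros(n_sub * SUBFRAME_LEN)
    for i in range(n_sub):
        seg = slice(i * SUBFRAME_LEN, (i + 1) * SUBFRAME_LEN)
        noise = rng.standard_normal(SUBFRAME_LEN)
        e_lb = np.sum(lb_exc[seg] ** 2)
        e_noise = np.sum(noise ** 2)
        out[seg] = gains[i] * noise * np.sqrt(e_lb / max(e_noise, 1e-12))
    return out


def lpc_synthesis(x, a):
    """Frequency envelope: all-pole filter 1/A(z) with a = [1, a1, ..., ap]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


# Usage: one 20 ms frame with four subframe gains and a 1st-order envelope.
lb = np.random.default_rng(1).standard_normal(320)
gains = [0.5, 0.4, 0.3, 0.2]
hb_exc = shape_noise_in_time(lb, gains)
hb = lpc_synthesis(hb_exc, [1.0, -0.9])
```

Because only the noise is shaped, the HB fine structure never follows the low band. That limitation is what the next slide points out.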

8 Issues of BWE in AMR-WB
- The high-band signal model shapes a white noise signal in both the time and frequency domains. This is too limited to represent general signals above 6.4 kHz (e.g. music).
- The band is extended from 6.4 kHz only up to 7 kHz, while the 16 kHz sampling frequency allows extension up to 8 kHz.
- The additional low-pass FIR filter at 23.85 kbit/s causes a misalignment between LB and HB (0.9375 ms extra delay).
- AMR-WB quality at 23.85 kbit/s is lower than at 23.05 kbit/s, and quite similar to 15.85 kbit/s for clean speech (see the official characterization).
- The level of the high-band artificial excitation must be carefully controlled.
- The side information (0.8 kbit/s) available at 23.85 kbit/s to code the high band could be better exploited.
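At 16 kHz, the 0.9375 ms extra delay equals 15 samples. A symmetric (linear-phase) FIR filter delays its input by (N - 1)/2 samples, so this is the delay of a 31-tap linear-phase filter. The tap count is an assumption inferred from the stated delay; it is not given on the slide.

```python
def linear_phase_fir_delay_ms(n_taps: int, fs_hz: int) -> float:
    """Group delay of a symmetric (linear-phase) FIR filter: (N - 1) / 2 samples."""
    return (n_taps - 1) * 1000 / (2 * fs_hz)


# Assumption: a 31-tap linear-phase low-pass filter running at 16 kHz.
print(linear_phase_fir_delay_ms(31, 16000))  # 0.9375
print((31 - 1) // 2, "samples")              # 15 samples
```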

10 BWE in EVS AMR-WB IO vs. legacy AMR-WB
[Figure: block diagram of the BWE in EVS AMR-WB IO compared with the legacy AMR-WB BWE.]

11 Excitation generation
- Processing in the DCT domain on 20 ms frames, with no overlap (overlap is not necessary in the excitation domain).
- Missing frequency bins (above 6.4 kHz) are copied from the LB spectrum, starting from an adaptively determined frequency bin.
- Components from 5 to 6.4 kHz are left unchanged to provide a smooth transition between LB and HB. All bins below 5 kHz are set to 0.
- Tonal and ambiance components are extracted, modified and recombined to control the level of tonality in the generated signal.
- Filtering in the DCT domain.

12 High-band excitation
The low-band (0-6.4 kHz) spectrum is obtained by a 256-point DCT.
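A DCT-II of length N at sampling rate fs places bin k at k·fs/(2N) Hz. A 256-point DCT at 12.8 kHz and a 320-point DCT at 16 kHz therefore have the same 25 Hz bin spacing. The sketch below converts the frequencies quoted on these slides into bin indices. It shows that the start-bin range [40, 160] corresponds to 1-4 kHz of the low band.

```python
BIN_HZ = 12800 / 2 / 256            # 25.0 Hz per bin: 256-point DCT at 12.8 kHz
assert BIN_HZ == 16000 / 2 / 320    # same spacing: 320-point DCT at 16 kHz


def hz_to_bin(f_hz: float) -> int:
    return int(round(f_hz / BIN_HZ))


def bin_to_hz(k: int) -> float:
    return k * BIN_HZ


for label, f in [("5 kHz", 5000), ("6 kHz", 6000), ("6.4 kHz", 6400),
                 ("7 kHz", 7000), ("7.8 kHz", 7800), ("8 kHz", 8000)]:
    print(label, "-> bin", hz_to_bin(f))   # 200, 240, 256, 280, 312, 320
print("start bins [40, 160] ->", bin_to_hz(40), "to", bin_to_hz(160), "Hz")
```

The matching bin spacing is what lets the LB spectrum be reused directly in a 320-bin frame.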

14 High-band excitation
- 5-6 kHz: the original spectrum is maintained.
- An energy peak of the low-band spectral envelope is searched.
- 6-8 kHz: the spectrum is adaptively copied from the start frequency bin, which is limited to the range [40, 160].
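The spectral copy can be sketched in Python using the split on this slide: bins below 5 kHz are zeroed, 5-6 kHz is kept, and 6-8 kHz is copied from the start bin. Everything here is an assumption for illustration. The slide only says that an energy peak of the LB envelope is searched, so `choose_start_bin`, with its 9-bin smoothing and argmax over [40, 160], is a hypothetical heuristic and not the EVS search. The 80 copied bins starting at bin 160 end at bin 240, so the copy always stays inside the 256 LB bins.

```python
import numpy as np

HB_BINS = 320                  # 20 ms at 16 kHz, 25 Hz per bin
KEEP_LO, COPY_LO = 200, 240    # 5 kHz and 6 kHz
START_MIN, START_MAX = 40, 160


def choose_start_bin(lb_spec, smooth=9):
    """Hypothetical heuristic: bin of maximum smoothed magnitude within [40, 160]."""
    env = np.convolve(np.abs(lb_spec), np.ones(smooth) / smooth, mode="same")
    return START_MIN + int(np.argmax(env[START_MIN:START_MAX + 1]))


def build_hb_spectrum(lb_spec, start):
    """Build a 320-bin spectrum from a 256-bin LB spectrum (both 25 Hz per bin)."""
    start = int(np.clip(start, START_MIN, START_MAX))
    hb = np.zeros(HB_BINS)                           # bins below 5 kHz stay at 0
    hb[KEEP_LO:COPY_LO] = lb_spec[KEEP_LO:COPY_LO]   # 5-6 kHz kept as decoded
    n_copy = HB_BINS - COPY_LO                       # 80 bins cover 6-8 kHz
    hb[COPY_LO:] = lb_spec[start:start + n_copy]     # start + 80 <= 240 < 256
    return hb


# Usage: an LB spectrum with a triangular energy peak centered on bin 120 (3 kHz).
lb = np.zeros(256)
lb[110:131] = 10 - np.abs(np.arange(-10, 11))
start = choose_start_bin(lb)
hb_spec = build_hb_spectrum(lb, start)
print(start)  # 120
```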

18 Tonal and ambiance components 1/2
The excitation for bins 240-319 (6-8 kHz) is split into ambiance and tonal components:
- The ambiance (in absolute value) is the local average of the magnitude spectrum over a sliding window of 15 bins.
- The tonal components are defined as the residual y(k), i.e. the magnitude spectrum minus the ambiance, at the bins where y(k) > 0.
[Figure: ambiance component and tonal component.]
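The following Python sketch illustrates the split. It operates on the 80 bins of the 6-8 kHz region, so indices 0-79 here stand for bins 240-319. The window is assumed to be centered (7 bins on each side). Near the edges, the average is taken only over the bins that exist. Neither detail is stated on the slide.

```python
import numpy as np


def split_tonal_ambiance(u, win=15):
    """Split a spectrum into ambiance (local mean magnitude) and tonal residual."""
    mag = np.abs(u)
    kernel = np.ones(win)
    # Centered moving average; dividing by the count of available bins handles edges.
    ambiance = (np.convolve(mag, kernel, mode="same")
                / np.convolve(np.ones_like(mag), kernel, mode="same"))
    y = mag - ambiance                      # residual above the local average
    tonal = np.where(y > 0, y, 0.0)         # keep only bins where y(k) > 0
    return tonal, ambiance


# Usage: a flat spectrum with one strong (negative-signed) line at index 40.
u = np.ones(80)
u[40] = -16.0
tonal, ambiance = split_tonal_ambiance(u)
print(np.flatnonzero(tonal), tonal[40])  # [40] 14.0
```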

19 Tonal and ambiance components 2/2
- The extracted tonal and ambiance components are then adaptively remixed, and the signs of U_HB1(k) are applied to the combined signal.
- Scaling with an adaptive attenuation factor is applied to restore the overall energy ener_HB and to obtain the combined high-band excitation signal.
[Figure: spectrum before and after tonal/ambiance recombination.]
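A minimal Python sketch of the recombination, assuming a simple weighted sum of the two magnitude components. The weights `g_tonal` and `g_amb` are hypothetical stand-ins for the adaptive remixing, which the slide does not specify. After the signs of U_HB1(k) are restored, one global factor brings the frame energy to ener_HB. In the codec this factor is described as an attenuation; the sketch simply normalizes.

```python
import numpy as np


def recombine(u, tonal, ambiance, g_tonal, g_amb, ener_hb):
    """Hypothetical remix of tonal and ambiance magnitudes with the signs of u."""
    u, tonal, ambiance = (np.asarray(v, dtype=float) for v in (u, tonal, ambiance))
    mixed = np.sign(u) * (g_tonal * tonal + g_amb * ambiance)
    energy = np.sum(mixed ** 2)
    scale = np.sqrt(ener_hb / energy) if energy > 0 else 0.0
    return scale * mixed      # frame energy now equals ener_hb


# Usage with toy values (four bins).
u = [1.0, -2.0, 3.0, -4.0]
out = recombine(u, [0, 0, 1, 2], [1, 2, 2, 2], g_tonal=1.5, g_amb=0.5, ener_hb=10.0)
```

Raising `g_tonal` relative to `g_amb` makes the generated band more tonal. This is the kind of tonality control the slide describes.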

21 Filtering in DCT domain and inverse DCT 1/2
- The excitation is de-emphasized using the frequency response of the de-emphasis filter over the 6-8 kHz range. This reverts the pre-emphasis and keeps the HB excitation consistent with the low-band signal (0-6.4 kHz), which is useful for the subsequent energy estimation and adjustment.
- The excitation is also band-pass filtered in the DCT domain, with a lower cut-off at 6 kHz and an upper cut-off between 7 and 7.8 kHz.
- The variable upper cut-off reflects the bit rate. At the lowest bit rates (6.6 and 8.85 kbit/s), the low-band quality is limited. Adding too much bandwidth above 7 kHz is then undesirable and typically degrades quality compared to limiting the BWE to 7 kHz. At higher bit rates, an upper limit of 7.8 kHz has proven empirically to be the best trade-off between more presence and fewer artifacts.
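In the DCT domain, both operations reduce to a per-bin multiplication, as this Python sketch shows. Several assumptions are made here. The pre-emphasis is taken to be the AMR-WB form P(z) = 1 - 0.68 z^-1. Its inverse is evaluated at the 16 kHz output rate with 25 Hz bins. The band-pass filter is modeled as a rectangular mask, whereas the codec may use smoother transitions. The 7 kHz versus 7.8 kHz rule follows the slide.

```python
import numpy as np

BIN_HZ = 25.0     # 320-point DCT at 16 kHz
FS_HZ = 16000     # assumption: response evaluated at the 16 kHz output rate
MU = 0.68         # assumption: pre-emphasis P(z) = 1 - MU z^-1


def deemphasis_weights(k, mu=MU, fs=FS_HZ):
    """Magnitude of 1 / (1 - mu e^{-jw}) at the centre frequency of each bin k."""
    w = 2 * np.pi * np.asarray(k) * BIN_HZ / fs
    return 1.0 / np.abs(1.0 - mu * np.exp(-1j * w))


def bandpass_mask(rate_bps, n_bins=320):
    """Rectangular 6 kHz to 7 or 7.8 kHz mask (7 kHz at 6.6 and 8.85 kbit/s)."""
    upper_hz = 7000 if rate_bps in (6600, 8850) else 7800
    f = np.arange(n_bins) * BIN_HZ
    return ((f >= 6000) & (f < upper_hz)).astype(float)


def filter_hb_dct(spec, rate_bps):
    k = np.arange(len(spec))
    return spec * deemphasis_weights(k) * bandpass_mask(rate_bps, len(spec))


# Usage: a flat 320-bin spectrum at 12.65 kbit/s keeps bins 240-311 only.
filtered = filter_hb_dct(np.ones(320), 12650)
```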

22 Filtering in DCT domain and inverse DCT 2/2
Finally, a 320-point inverse DCT is performed to obtain the time-domain HB signal.
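Applying a 320-point inverse DCT to bins derived from a 256-point analysis is what produces the implicit 12.8 to 16 kHz resampling. This Python sketch demonstrates it with orthonormal DCT-II matrices, which are an assumed normalization. A cosine at bin 100 (2.5 kHz) sampled at 12.8 kHz is transformed, zero-padded to 320 bins and inverted, and the result is the same 2.5 kHz cosine sampled at 16 kHz. The sqrt(320/256) factor compensates for the change of transform length under this normalization.

```python
import numpy as np

N_LB, N_HB = 256, 320   # 20 ms at 12.8 kHz and at 16 kHz


def dct2_matrix(n):
    """Orthonormal DCT-II matrix C: X = C @ x and x = C.T @ X."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


C_LB, C_HB = dct2_matrix(N_LB), dct2_matrix(N_HB)


def inverse_dct_320(spec_320):
    """320-point inverse DCT, scaled so that components from a 256-point
    LB spectrum keep their time-domain amplitude."""
    return np.sqrt(N_HB / N_LB) * (C_HB.T @ spec_320)


# A cosine at bin k (k * 25 Hz) sampled at 12.8 kHz.
k = 100
x_12k8 = np.cos(np.pi * k * (2 * np.arange(N_LB) + 1) / (2 * N_LB))
spec = np.zeros(N_HB)
spec[:N_LB] = C_LB @ x_12k8          # LB bins 0-255; bins 256-319 stay at zero
y_16k = inverse_dct_320(spec)
expected = np.cos(np.pi * k * (2 * np.arange(N_HB) + 1) / (2 * N_HB))
print(np.allclose(y_16k, expected))  # True
```

In the BWE, bins 240-319 of the 320-bin spectrum hold the generated excitation instead of zeros. The same inverse DCT then yields the HB signal directly at 16 kHz, with no separate resampling filter.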

23 Advantages of the proposed improvements
- Precise control of the HB frequency content and tonality level.
- Choice of the starting frequency of the LB spectrum portion to be copied.
- Combination of tonal and ambiance components.
- Reversal of the pre-emphasis.
- Adaptive low-pass filtering to control artifacts.
- Implicit resampling from 12.8 to 16 kHz.
- Low complexity, with no overlap-add and no filtering delay.

24 HB scaling
- Subframe gain correction is applied to restore the same subframe-to-frame energy ratio as in the decoded LB signal, because the processing in the DCT domain may have changed it.
- At 23.85 kbit/s, the decoded HB gain is refined (in particular based on the de-emphasis characteristic and tilt information) so that the extra 4 bits per subframe improve quality over 23.05 kbit/s.
- A correction gain for the mismatch between LPC spectral envelopes in the cross-over region is estimated and applied before LPC synthesis. Its main purpose is to avoid artifacts caused by an overestimation of the HB energy.
  - In each 5 ms subframe, the frequency responses of the low-band LPC filter and of the high-band LPC filter are computed at 6 kHz.
  - The ratio of these two responses at 6 kHz gives an estimated gain correction that aligns the levels of the LPC spectral envelopes in the two bands.
  - This principle was further adjusted using 2nd-order LPC filters to optimize the correction factor estimation, in particular to avoid overestimation.
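Both scaling steps are sketched below in Python under stated assumptions. The first function rescales each HB subframe so that its share of the frame energy matches the LB share. Its gains are sqrt(r_LB / r_HB), so the total HB frame energy is unchanged. The second function computes the LPC envelope magnitude |1/A(e^{jw})| at 6 kHz for each band and returns their ratio. It assumes that the LB filter runs at 12.8 kHz and the HB filter at 16 kHz. The 2nd-order refinement and the safeguards against overestimation mentioned on the slide are not reproduced.

```python
import numpy as np


def subframe_energy_ratios(x, n_sub=4):
    """Energy of each subframe divided by the frame energy."""
    e = np.array([np.sum(s ** 2) for s in np.array_split(np.asarray(x, float), n_sub)])
    return e / np.sum(e)


def restore_subframe_ratios(hb, lb, n_sub=4):
    """Rescale HB subframes so their energy ratios match those of the LB signal."""
    g = np.sqrt(subframe_energy_ratios(lb, n_sub) / subframe_energy_ratios(hb, n_sub))
    parts = np.array_split(np.asarray(hb, float), n_sub)
    return np.concatenate([gi * p for gi, p in zip(g, parts)])


def lpc_magnitude(a, f_hz, fs_hz):
    """|1 / A(e^{jw})| for A(z) = sum_k a[k] z^-k at frequency f_hz."""
    w = 2 * np.pi * f_hz / fs_hz
    a = np.asarray(a, float)
    return 1.0 / np.abs(np.sum(a * np.exp(-1j * w * np.arange(len(a)))))


def lpc_gain_correction(a_lb, a_hb, f_hz=6000.0, fs_lb=12800, fs_hb=16000):
    """Ratio of LB to HB envelope levels at 6 kHz (sampling rates are assumptions)."""
    return lpc_magnitude(a_lb, f_hz, fs_lb) / lpc_magnitude(a_hb, f_hz, fs_hb)


# Usage on random frames: 256 LB samples (12.8 kHz) and 320 HB samples (16 kHz).
rng = np.random.default_rng(0)
lb_frame, hb_frame = rng.standard_normal(256), rng.standard_normal(320)
hb_scaled = restore_subframe_ratios(hb_frame, lb_frame)
g_lpc = lpc_gain_correction([1.0, -0.9, 0.2], [1.0, -0.3])
```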

26 Comparison with legacy AMR-WB BWE
[Figure: original signal vs. legacy AMR-WB vs. AMR-WB IO.]

27 WB clean speech quality (see TR 26.952)
Test setup: ACR method (ITU-T P.800), 32 subjects, 6x4 sentence pairs, nominal level (-26 dBov), diotic listening.
- Quality at 23.85 kbit/s is improved with respect to AMR-WB.
- EVS AMR-WB IO provides a consistent improvement compared to AMR-WB operating at the next higher bit rate.
- These results capture the overall quality of the EVS AMR-WB IO modes, reflecting:
  - the enhanced BWE;
  - enhancements to low-band decoding, e.g. formant sharpening, dynamic normalization and improved post-processing.
- EVS AMR-WB IO provides slightly more audio bandwidth (up to 7.8 kHz) than AMR-WB (up to 7 kHz).

28 AB test: new BWE vs. original BWE
Ref/A/B test with the P.800 CCR grading scale, 8 expert listeners in the Huawei lab.
- A: EVS AMR-WB IO
- B: EVS AMR-WB IO with the legacy AMR-WB BWE
- 24 samples: 12 speech items in Mandarin Chinese (6 clean, 6 noisy) and 12 mixed content/music items (6 mixed content, 6 music).
Results:
- At both 12.65 kbit/s and 23.85 kbit/s, quality is improved by the enhanced EVS AMR-WB IO BWE.
- Significant improvement for mixed content and music items.
- Highest improvement at 23.85 kbit/s.
[Figures: 23.85 kbit/s, new BWE vs. original BWE; 12.65 kbit/s, new BWE vs. original BWE.]

29 Complexity, algorithmic delay
Compared to the original AMR-WB BWE, the AMR-WB IO BWE:
- has a computational complexity around 1.2 WMOPS higher;
- needs about 0.2 kWords of ROM and 1.5 kWords of RAM extra;
- has in principle no extra delay compared to the low-band decoding, since all FIR filtering steps are replaced by DCT processing. In EVS, the BWE output is delayed to be time-synchronized with the resampled low-band output.

30 Conclusion
The EVS codec includes an enhanced AMR-WB BWE. Quality improvements (legacy AMR-WB → AMR-WB IO) are due to:
- High-band excitation modeled entirely by white noise → excitation generation in the DCT domain allows control of the HB spectral content.
- Band extended from 6.4 kHz only up to 7 kHz → a new excitation generation method, combined with refined gain correction, extends the band up to 7.8 kHz, increasing the perceived effect while limiting artifacts.
- Misalignment between LB and HB caused by the additional LP filter at 23.85 kbit/s → no extra delay, HB perfectly aligned with LB.
- Quality at the highest bit rate (23.85 kbit/s) proven to be lower than at 23.05 kbit/s → in AMR-WB IO mode, quality at 23.85 kbit/s is higher than at 23.05 kbit/s.

31 Q&A

