Enhanced AMR-WB Bandwidth Extension in 3GPP EVS Codec. Magdalena Kaniewska, Stéphane Ragot (Orange) et al. IEEE GlobalSIP, Orlando, FL, USA, December 14-16, 2015.

IEEE GlobalSIP, Orlando, FL, USA, December 14-16, 2015. Enhanced AMR-WB Bandwidth Extension in 3GPP EVS Codec. Magdalena Kaniewska, Stéphane Ragot (Orange Labs, formerly France Telecom R&D); Zexin Liu, Lei Miao, Xingtao Zhang, Jon Gibbs (Huawei Technologies Co. Ltd, China); Václav Eksler (VoiceAge Corp., QC, Canada).

p 2 Agenda: EVS codec overview, focus on AMR-WB IO modes; Review of BWE in AMR-WB: BWE signal model in AMR-WB, issues for BWE; BWE in EVS AMR-WB IO: signal model, excitation generation, HB gain; Performance: subjective quality, complexity, delay; Conclusion.

p 3 EVS codec. 1. Superior quality: Super HD quality (starting at the same bit rate as HD voice, around kbit/s); better NB/WB quality at the same rate; improved music quality compared to existing codecs in conversational services. 2. Interoperability: intrinsic interoperability with HD voice (improved AMR-WB inside EVS). 3. Efficiency (capacity, coverage, etc.): EVS bit rates optimized for LTE transport block sizes (TBS); a wide range of bit rates that also covers fixed-line applications; better robustness against packet losses; jitter buffer management (JBM) included as a recommended feature; current NB/WB quality at a lower bit rate. [Figure: mobile phone with EVS; labels EVS, AMR-WB IO (Enh.), AMR-WB, AMR.] Focus of this presentation: EVS AMR-WB IO.

p 5 AMR-WB coding model: The AMR-WB codec is based on a split-band model with 9 bit rates: 6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s. The low-band (LB) signal (0-6.4 kHz band) is coded with ACELP after decimating the input signal from 16 to 12.8 kHz. The high-band (HB) signal (6.4-7 kHz) is modeled by BWE: a 0-bit BWE for bit rates from 6.6 to 23.05 kbit/s, and 0.8 kbit/s of side information only at 23.85 kbit/s.
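The split-band figures above fix the frame sizes and the HB side-information rate. The short check below only works through that arithmetic. It assumes the 20 ms frame and 5 ms subframes used throughout AMR-WB, and the 4 bits per subframe of HB gain correction sent at 23.85 kbit/s. It is illustrative arithmetic, not codec code.

```python
FRAME_MS = 20          # AMR-WB frame length
SUBFRAME_MS = 5        # four subframes per frame
FS_IN_HZ = 16_000      # wideband input/output sampling rate
FS_LB_HZ = 12_800      # ACELP core sampling rate after decimation (ratio 4/5)

def samples_per_frame(fs_hz, frame_ms=FRAME_MS):
    """Number of samples in one frame at sampling rate fs_hz."""
    return fs_hz * frame_ms // 1000

def hb_side_info_kbps(bits_per_subframe=4):
    """HB gain side-information rate in kbit/s (only sent at 23.85 kbit/s)."""
    subframes_per_second = 1000 // SUBFRAME_MS
    return bits_per_subframe * subframes_per_second / 1000

print(samples_per_frame(FS_IN_HZ))   # 320
print(samples_per_frame(FS_LB_HZ))   # 256
print(FS_LB_HZ / 2)                  # 6400.0 (Hz), top of the ACELP low band
print(hb_side_info_kbps())           # 0.8
```

The 0.8 kbit/s therefore corresponds exactly to 4 bits in each of the 200 subframes per second.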

p 6 BWE in AMR-WB: white-noise excitation (5 ms subframes). Time envelope: subframe gain obtained by level equalization based on the low-band excitation; a gain correction is coded (4 bits/subframe) at 23.85 kbit/s and is otherwise estimated from the spectral tilt. Frequency envelope: LPC synthesis filter, band-pass filter (6-7 kHz), and an additional low-pass filter (7 kHz) at 23.85 kbit/s.
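This is a minimal sketch of the legacy time-envelope step, for illustration only. One 5 ms subframe of white noise at 16 kHz (80 samples) is scaled so that its energy matches the decoded LB excitation of the same subframe (64 samples at 12.8 kHz). It is then multiplied by a gain estimated from the spectral tilt of the LB signal. The tilt rule g = 1 - r(1)/r(0), clipped to [0.1, 1.0], is an assumed simplification: it has no voice-activity weighting and no 23.85 kbit/s gain decoding. The LPC synthesis and band-pass filtering that shape the frequency envelope are omitted. All function names are hypothetical.

```python
import math
import random

def energy(x):
    """Sum of squares of a sequence."""
    return sum(v * v for v in x)

def tilt_gain(lb_synth):
    """Assumed tilt-based HB gain: g = 1 - r(1)/r(0), clipped to [0.1, 1.0].

    A strongly low-pass (voiced-like) LB signal has r(1)/r(0) close to 1,
    so little HB energy is injected; a flat or high-pass one gets more.
    """
    r0 = energy(lb_synth) + 1e-9
    r1 = sum(lb_synth[i] * lb_synth[i - 1] for i in range(1, len(lb_synth)))
    return min(max(1.0 - r1 / r0, 0.1), 1.0)

def legacy_hb_excitation(lb_exc, lb_synth, rng, n_out=80):
    """One subframe of noise excitation: energy matched to lb_exc, then tilt gain."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    scale = math.sqrt(energy(lb_exc) / (energy(noise) + 1e-9))
    g = tilt_gain(lb_synth)
    return [g * scale * n for n in noise]

rng = random.Random(0)
lb_exc = [math.sin(0.3 * n) for n in range(64)]   # toy LB excitation
hb = legacy_hb_excitation(lb_exc, lb_exc, rng)
print(len(hb), tilt_gain(lb_exc))  # 80 0.1
```

The toy sinusoid is strongly low-pass, so the tilt gain hits its 0.1 floor. This shows why such a model adds little HB energy to voiced segments whatever their actual HB content.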

p 7 Issues of BWE in AMR-WB: The high-band signal model is based on shaping a white noise signal in both the time and frequency domains, which is too limited to represent general signals above 6.4 kHz (e.g. music). The band is extended from 6.4 kHz only to 7 kHz, although the sampling frequency allows extension up to 8 kHz. There is a misalignment due to the additional low-pass FIR filter at 23.85 kbit/s ( ms extra delay). AMR-WB quality at 23.85 kbit/s is lower than at kbit/s and quite similar to kbit/s for clean speech signals (see the official characterization). The level of the high-band artificial excitation must be carefully controlled. The side information (0.8 kbit/s) available at 23.85 kbit/s to code the high band could be better exploited.

p 9 BWE in EVS AMR-WB IO vs. legacy AMR-WB [Figure: block diagrams, EVS AMR-WB IO (left) and AMR-WB (right).]

p 10 Excitation generation 1/3: Processing is done in the DCT domain on 20 ms frames, with no overlap (not necessary in the excitation domain). Missing frequency bins (above 6.4 kHz) are copied from the LB spectrum, starting from an adaptively determined frequency bin. Components from 5 to 6.4 kHz are left unchanged to provide a smooth transition between LB and HB, and all bins below 5 kHz are set to 0. Tonal and ambiance components are extracted, modified and recombined to control the level of tonality in the generated signal. Finally, the excitation is filtered in the DCT domain.
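The bin layout described above can be sketched with an orthonormal DCT-II. A 20 ms frame has 256 samples at 12.8 kHz and 320 at 16 kHz, so both grids share a 25 Hz bin spacing. On that grid, 5 kHz is bin 200 and 6.4 kHz is bin 256, which leaves bins 256 to 319 (6.4-8 kHz) missing. The sketch assumes a plain orthonormal DCT-II and takes the start bin as given. The function names are hypothetical, and this is not the EVS reference code.

```python
import math

N_LB = 256             # 20 ms at 12.8 kHz
N_HB = 320             # 20 ms at 16 kHz
BIN_HZ = 6400 / N_LB   # 25 Hz per bin, identical in both grids

def dct2(x):
    """Orthonormal DCT-II, O(N^2): adequate for a 256-point illustration."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def hz_to_bin(f_hz):
    return int(round(f_hz / BIN_HZ))

def build_hb_spectrum(lb_coefs, start_bin):
    """320-bin spectrum: 0 below 5 kHz, LB bins kept in 5-6.4 kHz,
    and bins above 6.4 kHz copied from lb_coefs[start_bin:]."""
    lo, mid = hz_to_bin(5000), hz_to_bin(6400)      # 200, 256
    n_missing = N_HB - mid                          # 64 bins cover 6.4-8 kHz
    if not 0 <= start_bin <= len(lb_coefs) - n_missing:
        raise ValueError("start_bin leaves too few source bins")
    spec = [0.0] * N_HB
    spec[lo:mid] = lb_coefs[lo:mid]
    spec[mid:] = lb_coefs[start_bin:start_bin + n_missing]
    return spec

exc = [math.cos(0.05 * n) + 0.1 * math.sin(2.9 * n) for n in range(N_LB)]
spec = build_hb_spectrum(dct2(exc), start_bin=100)
print(len(spec), spec[199], hz_to_bin(5000), hz_to_bin(6400))  # 320 0.0 200 256
```

No overlap-add is needed: each 20 ms frame is transformed, edited bin by bin, and inverted on its own.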

p 11 High-band excitation: the low-band (0-6.4 kHz) spectrum is obtained by a 256-point DCT.

p 13 High-band excitation: In the 5-6 kHz range, the original spectrum is maintained. An energy peak of the low-band spectral envelope is searched for. In the 6-8 kHz range, the spectrum is copied adaptively, starting from a start frequency bin that is limited to the range [40, 160].
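The slide does not say how the energy peak is turned into a start bin. Purely as an assumption, the sketch below first smooths |X(k)| with a 15-bin moving average, borrowing the window length of the ambiance estimate used later in the excitation generation. It then takes the position of that envelope's maximum and clamps it to the stated [40, 160] range, which is 1 to 4 kHz at 25 Hz per bin. The names and the peak-to-start-bin mapping are hypothetical.

```python
def smoothed_envelope(coefs, half_width=7):
    """Moving average of |X(k)| over up to 2*half_width + 1 bins."""
    mags = [abs(c) for c in coefs]
    n = len(mags)
    env = []
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
        env.append(sum(mags[lo:hi]) / (hi - lo))
    return env

def select_start_bin(lb_coefs, lo=40, hi=160):
    """Hypothetical rule: envelope peak position, clamped to [lo, hi]."""
    env = smoothed_envelope(lb_coefs)
    peak = max(range(len(env)), key=env.__getitem__)
    return min(max(peak, lo), hi)

def triangle(center, n=256, half_base=10):
    """Toy 'spectrum' with a single smooth peak at `center`."""
    return [max(0.0, half_base - abs(k - center)) for k in range(n)]

print(select_start_bin(triangle(120)))  # 120
print(select_start_bin(triangle(20)))   # 40 (clamped)
print(select_start_bin(triangle(230)))  # 160 (clamped)
```

Clamping keeps every start bin inside the LB spectrum with enough source bins left for the copy, whatever the peak position.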

p 16 Excitation generation 2/3: tonal and ambiance components.

Tonal and ambiance components 1/2: The excitation for bins 240:319 (6-8 kHz) is split into ambiance and tonal components. The ambiance (in absolute value) corresponds to the local average of the magnitude spectrum over a sliding window of 15 bins. The tonal components are defined as the residual signal y(k) satisfying y(k) > 0. [Figure panels: ambiance component; tonal component.]
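This is a minimal sketch of the split. It assumes that y(k) is the magnitude |U(k)| minus the ambiance A(k), and that the 15-bin window is centred on k and truncated at the band edges. Both details are assumptions, not taken from the specification. Only bins 240 to 319 are processed, as on the slide, and the function name is hypothetical.

```python
def split_tonal_ambiance(coefs, first=240, last=320, win=15):
    """Return (ambiance, tonal) lists for bins first..last-1.

    ambiance: mean of |X| over a centred window of `win` bins (truncated at band edges)
    tonal:    y(k) = |X(k)| - ambiance(k) where y(k) > 0, else 0
    """
    mags = [abs(c) for c in coefs]
    half = win // 2
    ambiance, tonal = [], []
    for k in range(first, last):
        lo, hi = max(first, k - half), min(last, k + half + 1)
        a = sum(mags[lo:hi]) / (hi - lo)
        y = mags[k] - a
        ambiance.append(a)
        tonal.append(y if y > 0 else 0.0)
    return ambiance, tonal

spec = [0.5] * 320
spec[280] = -15.5            # one strong (tonal) bin at 7 kHz
amb, ton = split_tonal_ambiance(spec)
print(len(amb), ton[40])     # 80 14.0
```

The sign is discarded here because the split works on magnitudes. The signs are restored at recombination.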

Tonal and ambiance components 2/2: The extracted tonal and ambiance components are then adaptively remixed, and the signs of U_HB1(k) are applied to the combined signal. Scaling with an adaptive attenuation factor restores the overall energy ener_HB and yields the combined high-band excitation signal. [Figure panels: before tonal/ambiance recombination; after tonal/ambiance recombination.]
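The recombination can be sketched as follows. The mixing weights w_tonal and w_amb are fixed, hypothetical placeholders, whereas EVS adapts the mix. The signs are taken from the original coefficients U_HB1(k). The final gain plays the role of the adaptive attenuation factor by restoring the band energy ener_HB.

```python
import math

def recombine(u_hb1, ambiance, tonal, w_tonal=0.5, w_amb=1.0):
    """Remix magnitudes, restore the signs of u_hb1, rescale to the original band energy."""
    ener_hb = sum(u * u for u in u_hb1)
    mixed = [math.copysign(w_tonal * t + w_amb * a, u)
             for u, a, t in zip(u_hb1, ambiance, tonal)]
    ener_mix = sum(m * m for m in mixed)
    gain = math.sqrt(ener_hb / ener_mix) if ener_mix > 0.0 else 0.0
    return [gain * m for m in mixed]

u = [1.0, -2.0, 3.0, -4.0]
out = recombine(u, ambiance=[2.5] * 4, tonal=[0.0, 0.0, 0.5, 1.5])
print([round(v, 3) for v in out])
```

With w_tonal = w_amb = 1, tonal bins recover their original magnitude and other bins take the ambiance level. A w_tonal below 1, as in this example, lowers the tonality of the generated band.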

p 19 Excitation generation 3/3: filtering in the DCT domain.

p 20 Filtering in DCT domain and inverse DCT 1/2: The excitation is de-emphasized using the frequency response of the de-emphasis filter over the 6-8 kHz frequency range. This de-emphasis reverts the pre-emphasis and keeps the excitation consistent with the low-band signal (in the 0-6.4 kHz band), which is useful for the subsequent energy estimation and adjustment. The excitation is also band-pass filtered in the DCT domain, with cut-off frequencies at 6 kHz and at 7 or 7.8 kHz. The variable upper cut-off is motivated by the fact that adding too much bandwidth above 7 kHz may not be desirable at the lowest bit rates (6.6 and 8.85 kbit/s): the low-band quality is limited, and the extra bandwidth typically degrades quality compared with limiting BWE to 7 kHz. For higher bit rates, the 7.8 kHz upper limit has empirically proven to be the best trade-off between more presence and fewer artifacts.
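A DCT-domain filter is just a per-bin weight, so both operations reduce to multiplying each bin by a gain. The sketch assumes the de-emphasis filter is 1/(1 - 0.68 z^-1), 0.68 being the AMR-WB pre-emphasis factor, evaluated at the 16 kHz bin frequencies. It also uses a hard band-pass mask. The actual codec may use a different evaluation grid and smoother band edges, and the function names are hypothetical.

```python
import math

FS_HZ = 16000.0
N_BINS = 320
BIN_HZ = FS_HZ / 2 / N_BINS        # 25 Hz; DCT-II bin k sits at k * BIN_HZ

def deemph_mag(f_hz, mu=0.68, fs_hz=FS_HZ):
    """Magnitude of 1 / (1 - mu z^-1) at f_hz (assumed de-emphasis filter)."""
    w = 2.0 * math.pi * f_hz / fs_hz
    return 1.0 / math.sqrt(1.0 - 2.0 * mu * math.cos(w) + mu * mu)

def upper_cutoff_hz(bitrate_kbps):
    """7 kHz at the two lowest AMR-WB rates, 7.8 kHz otherwise."""
    return 7000.0 if bitrate_kbps in (6.6, 8.85) else 7800.0

def filter_hb(spec, bitrate_kbps, mu=0.68):
    """De-emphasize and band-pass a DCT spectrum with per-bin weights."""
    lo = int(6000.0 / BIN_HZ)                          # bin 240
    hi = int(upper_cutoff_hz(bitrate_kbps) / BIN_HZ)   # bin 280 or 312
    out = [0.0] * len(spec)
    for k in range(lo, min(hi, len(spec))):
        out[k] = spec[k] * deemph_mag(k * BIN_HZ, mu)
    return out

flat = [1.0] * N_BINS
low_rate = filter_hb(flat, 6.6)
high_rate = filter_hb(flat, 23.85)
print(sum(v > 0 for v in low_rate), sum(v > 0 for v in high_rate))  # 40 72
```

Because the weights are applied to the whole frame at once, this filtering adds no delay, unlike the time-domain FIR filters of the legacy BWE.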

Filtering in DCT domain and inverse DCT 2/2: Finally, a 320-point inverse DCT is performed to obtain the time-domain HB signal.
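Both grids use 25 Hz bins, so a 320-point inverse DCT of the 320-bin spectrum directly yields a 16 kHz signal. This is the implicit 12.8 to 16 kHz resampling mentioned among the advantages. The sketch checks it on a low-band cosine. The cosine's 256-point DCT is zero-padded to 320 bins and scaled by sqrt(320/256) to compensate for the orthonormal normalization, a factor that is specific to this sketch's normalization. The padded spectrum is then inverted. Function names are hypothetical.

```python
import math

def dct2(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def idct2(X):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(X)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * X[k]
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]

def upsample_via_dct(x_lb, n_out=320):
    """Zero-pad the DCT of a 12.8 kHz frame to n_out bins and invert at 16 kHz."""
    X = dct2(x_lb)
    scale = math.sqrt(n_out / len(x_lb))
    return idct2([scale * c for c in X] + [0.0] * (n_out - len(x_lb)))

k0 = 10   # bin 10 = 250 Hz on both grids
x_lb = [math.cos(math.pi * (n + 0.5) * k0 / 256) for n in range(256)]
y = upsample_via_dct(x_lb)
err = max(abs(y[m] - math.cos(math.pi * (m + 0.5) * k0 / 320)) for m in range(320))
print(len(y), err < 1e-9)  # 320 True
```

In the BWE, the bins above 6.4 kHz are not zero as in this check but carry the generated HB excitation. The same inverse transform therefore delivers the HB signal at 16 kHz without a separate resampler.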

p 22 Advantages of the proposed improvements: precise control of the HB frequency content and tonality level; choice of the starting frequency of the LB spectrum portion to be copied; combination of tonal and ambiance components; reversal of the pre-emphasis operation; adaptive low-pass filtering to control artifacts; implicit resampling from 12.8 to 16 kHz; low complexity, with no overlap-add or filtering delay.

p 23 HB scaling: A subframe gain correction is applied to restore the same subframe-to-frame energy ratio as in the decoded LB signal, since the processing in the DCT domain may have changed this ratio. The decoded HB gain at 23.85 kbit/s is refined (in particular based on the de-emphasis characteristic and tilt information) to improve the quality over kbit/s using the extra 4 bits per subframe. A correction gain for the mismatch between LPC spectral envelopes in the cross-over region is estimated and applied before LPC synthesis, mostly to avoid artifacts caused by overestimating the HB energy. In each 5 ms subframe, the frequency responses of the low-band LPC filter and of the high-band LPC filter are computed at 6 kHz. The ratio of these frequency responses at 6 kHz provides an estimated gain correction that aligns the levels of the LPC spectral envelopes in the two bands. This principle was further adjusted using 2nd-order LPC filters to optimize the correction factor estimation, in particular to avoid overestimation.
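Two of these steps lend themselves to a short sketch:

- The first restores each subframe's share of the frame energy from the LB signal while keeping the HB frame energy unchanged.
- The second evaluates |1/A(e^jw)| of the LB LPC filter (12.8 kHz) and of the HB LPC filter (16 kHz) at 6 kHz and takes their ratio.

The coefficient convention A(z) = 1 + a1 z^-1 + ... is an assumption. The direction of the ratio (LB over HB, which brings the HB envelope to the LB level) is our reading of the slide. Neither the 23.85 kbit/s gain refinement nor the 2nd-order LPC adjustment is modeled, and all names are hypothetical.

```python
import cmath
import math

def subframe_gains(lb_energies, hb_energies):
    """Per-subframe HB gains so that HB subframe/frame energy ratios match the LB ones.

    The HB frame energy is preserved: sum(g_k^2 * hb_k) == sum(hb_k).
    """
    lb_tot, hb_tot = sum(lb_energies), sum(hb_energies)
    return [math.sqrt((lb / lb_tot) * hb_tot / hb) if hb > 0 else 0.0
            for lb, hb in zip(lb_energies, hb_energies)]

def lpc_synth_mag(a, f_hz, fs_hz):
    """|1/A(e^jw)| with A(z) = sum_k a[k] z^-k and a[0] = 1."""
    w = 2.0 * math.pi * f_hz / fs_hz
    return 1.0 / abs(sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a)))

def crossover_gain(a_lb, a_hb, f_hz=6000.0):
    """Ratio of LB (12.8 kHz) to HB (16 kHz) LPC envelope levels at the cross-over."""
    return lpc_synth_mag(a_lb, f_hz, 12800.0) / lpc_synth_mag(a_hb, f_hz, 16000.0)

g = subframe_gains([1.0, 1.0, 1.0, 1.0], [4.0, 1.0, 1.0, 2.0])
print([round(gk * gk * e, 6) for gk, e in zip(g, [4.0, 1.0, 1.0, 2.0])])  # [2.0, 2.0, 2.0, 2.0]
print(round(crossover_gain([1.0], [1.0]), 6))                              # 1.0
```

The same 6 kHz is a different normalized frequency in each band (0.9375π at 12.8 kHz, 0.75π at 16 kHz). For that reason, each filter is evaluated with its own sampling rate.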

p 25 Comparison with legacy AMR-WB BWE [Figure panels: original, AMR-WB legacy, AMR-WB IO.]

p 26 WB clean speech quality (see TR ): ACR method (ITU-T P.800), 32 subjects, 6x4 sentence pairs, nominal level (-26 dBov), diotic listening. Quality at kbit/s is improved with respect to AMR-WB. EVS AMR-WB IO provides a consistent improvement compared to AMR-WB operating at the next higher bit rate. These results capture the overall quality of the EVS AMR-WB IO modes, reflecting the enhanced BWE and enhancements to low-band decoding, e.g. formant sharpening, dynamic normalization, and improved post-processing. EVS AMR-WB IO also provides slightly more audio bandwidth (up to 7.8 kHz) than AMR-WB (up to 7 kHz).

p 27 AB test: new BWE vs. original BWE. Ref/A/B test with the P.800 CCR grading scale, 8 expert listeners in the Huawei lab. A: EVS AMR-WB IO; B: EVS AMR-WB IO with the legacy AMR-WB BWE. 24 samples: 12 speech items in Mandarin Chinese (6 clean and 6 noisy) and 12 mixed content/music items (6 mixed content and 6 music). At both kbit/s and kbit/s, the quality is improved by the enhanced EVS AMR-WB IO BWE, with significant improvement for mixed/music items and the highest improvement at kbit/s. [Figure panels: kbit/s: new BWE vs. original BWE; kbit/s: new BWE vs. original BWE.]

p 28 Complexity, algorithmic delay: Compared to the original AMR-WB BWE, the AMR-WB IO BWE has a computational complexity around 1.2 WMOPS higher and needs about 0.2 kWords of extra ROM and 1.5 kWords of extra RAM. The AMR-WB IO BWE has in principle no extra delay compared to the low-band decoding, since all FIR filtering steps are replaced by DCT processing. In EVS, the BWE output is delayed to be time-synchronized with the resampled low-band output.

p 29 Conclusion: The EVS codec includes an enhanced AMR-WB BWE. The quality improvements (legacy AMR-WB → AMR-WB IO) are due to the following. The high-band excitation was modeled entirely by white noise → excitation generation in the DCT domain allows the HB spectral content to be controlled. The band was extended from 6.4 kHz only to 7 kHz → the new excitation generation method, combined with refined gain correction, extends the band up to 7.8 kHz, increasing the perceived effect while limiting artifacts. LB and HB were misaligned by the additional LP filter at 23.85 kbit/s → there is no extra delay, and the HB is perfectly aligned with the LB. The quality at the highest bit rate (23.85 kbit/s) had proven to be lower than at kbit/s → in AMR-WB IO mode, the quality at 23.85 kbit/s is higher than at kbit/s.

p 30 Q&A