AUDIO BANDWIDTH DETECTION IN THE EVS CODEC
University of Sherbrooke, Canada; VoiceAge Corporation, Montréal, Canada; Fraunhofer IIS, Erlangen, Germany

Presentation transcript:

AUDIO BANDWIDTH DETECTION IN THE EVS CODEC
Václav Eksler, Milan Jelínek, Wolfgang Jaegers
University of Sherbrooke, Canada; VoiceAge Corporation, Montréal, Canada; Fraunhofer IIS, Erlangen, Germany
IEEE GlobalSIP, Orlando, FL, USA, December 14-16, 2015

Agenda
• Introduction, specification of the problem
• EVS codec
• Goal of the work
• Prior work
• Bandwidth detection (BWD) algorithm
  – algorithm details
  – block diagrams
• Performance
  – Coding efficiency
  – Complexity
• Conclusion

Introduction
• Speech and audio codecs are usually designed such that they encode all the frequency bands of the input signal spectrum.
• Problem: these codecs often do not work optimally when the higher bands do not contain any perceptually meaningful content, because a part of the available bit budget is always assigned to encode these bands.
• Solution: bandwidth detection algorithm.
• The algorithm is used in the new 3GPP speech and audio codec for Enhanced Voice Services (EVS).

EVS codec
• State-of-the-art speech and audio codec standardized by 3GPP.
• Flexible: codes various audio material over a large range of bitrates and bandwidths.
• Capable of efficiently compressing voice, music, and mixed-content signals.
• To keep high subjective quality for all audio material, it consists of a number of different coding modes.
• These modes are selected depending on bitrate, input signal characteristics (e.g. speech/music, voiced/unvoiced), signal activity, and audio bandwidth.
• The pre-processing contains several stages of classification.

EVS encoder block diagram
• Input: input audio, HP filter (20 Hz)
• Pre-processing: pre-emphasis and spectral analysis; signal activity detection; noise update/estimation; speech/music classifier; open-loop classifier; filter bank and resampling; bandwidth detector; TD transient detector; LP analysis and pitch tracker; channel-aware configuration; signal classifier; MDCT selector
• Core and DTX switching
• EVS primary modes: LP-based encoder; MDCT core encoder; BWE encoder; DTX, CNG encoder
• AMR-WB IO encoder
• Signaling info (BW, core, frame type, CA, formant sharp)
• Channel (VoIP, VoLTE network)

Audio bandwidths in EVS codec
• Sampling rates: 8, 16, 32, 48 kHz.
• Audio bandwidths (BW) supported in the EVS codec:

bandwidth              frequency range [kHz]   bitrate range [kbps]
narrowband (NB)        0 –                     – 24.4
wideband (WB)          0 –                     – 128
super wideband (SWB)   0 –                     – 128
full band (FB)         0 –                     – 128

Goal of bandwidth detection algorithm
• Determine the effective audio bandwidth of the input signal.
• Detect changes in the effective audio bandwidth of the input signal.
• The information is used to set the codec to its optimal configuration (no waste of the available bit budget).
• Consequently, the coding efficiency is increased for band-limited signals by allocating bits to encode only the useful bandwidth.
• The EVS codec can be flexibly reconfigured to encode only the perceptually meaningful frequency content and to distribute the available bit budget optimally.

Prior work
• Traditionally, speech and audio codecs expect to receive an input signal whose effective audio bandwidth is close to the Nyquist frequency → little attention has been paid to bandwidth detection.
• VMR-WB: a simple detection algorithm was used to detect NB input signals sampled at 16 kHz.
  – Computes smoothed energy in the upper bands in the FFT domain.
  – Not flexible enough to react to frequent changes in effective bandwidth.
• A more robust algorithm, based on computing an FFT and detecting significant energy in certain bands, was presented in [PCT/US2012/067532].
  – The FFT is computed for every 5 ms of the input signal → a computationally intensive solution.

Bandwidth detection (BWD) algorithm
• The BWD algorithm is based on:
  – computing energies in spectral regions,
  – comparing them to certain thresholds,
  – updating bandwidth-related long-term parameters and counters,
  – selecting the effective bandwidth.
• The algorithm reuses, as far as possible, signal buffers and parameters available from the earlier stages of the EVS pre-processing module.
• EVS primary mode: Complex Modulated Low Delay Filter Bank (CLDFB) algorithm (TF matrix of 16 time slots and several frequency sub-bands of 400 Hz each; 4 frequency sub-bands form a frequency band of 1,600 Hz).
• EVS AMR-WB IO mode: Discrete Cosine Transform (DCT) algorithm (frequency band of 1,500 Hz, Hanning window with a constant length of 320 samples).
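The CLDFB band grouping described above can be sketched as follows. This is a minimal illustration, not the EVS implementation: the `tf_energy[slot][subband]` layout, the function name, the dB scale and the small floor added before the logarithm are all assumptions; only the 16 time slots and the 4 × 400 Hz grouping come from the slide.

```python
import math

SLOTS_PER_FRAME = 16      # time slots in the TF matrix (from the slide)
SUBBANDS_PER_BAND = 4     # 4 x 400 Hz sub-bands = one 1,600 Hz band (from the slide)


def band_log_energies(tf_energy, first_subband, n_bands):
    """Return one log energy (dB, assumed scale) per 1,600 Hz band.

    tf_energy[slot][subband] is a hypothetical layout holding non-negative
    sub-band energies for one frame.
    """
    energies = []
    for band in range(n_bands):
        start = first_subband + band * SUBBANDS_PER_BAND
        total = 0.0
        for slot in tf_energy:
            # Sum the four 400 Hz sub-bands of this band over all time slots.
            total += sum(slot[start:start + SUBBANDS_PER_BAND])
        # Small floor (assumed) avoids log10(0) on silent input.
        energies.append(10.0 * math.log10(total + 1e-12))
    return energies
```

For example, `band_log_energies(tf, 0, 2)` on a 16-slot matrix with 8 sub-bands returns two values, one for each group of four sub-bands.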

Energy bands and energy regions
• Log energies are computed in energy bands.
• One to four frequency bands are assigned to each of the spectral regions:

band #   spectral region   CLDFB spectrum [kHz]   DCT spectrum [kHz]   input sampling rate [kHz]
0        nb                1.2 –                  –                    /32/48
1        wb                4.4 –                  – 7.5                16/32/
         swb               9.2 –                  –                    /
         fb                16.8 –                 –

Mean and maximum energy values
The log energies per frequency band are then used to calculate:
• the mean energy value per spectral region
• the maximum energy value per spectral region
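A minimal sketch of this step, assuming the per-band log energies are already available as a list. The band indices assigned to `swb` and `fb` below are hypothetical placeholders (the deck only says that one to four bands belong to each region), and `region_stats` is an illustrative name.

```python
# Region -> band indices. Indices for "swb" and "fb" are assumed for illustration.
REGIONS = {
    "nb": [0],
    "wb": [1],
    "swb": [2, 3, 4],
    "fb": [5, 6, 7, 8],
}


def region_stats(band_log_e, regions=REGIONS):
    """Return {region: (mean, maximum)} of the band log energies.

    Regions whose bands are not all present (e.g. fb at a low input
    sampling rate) are skipped.
    """
    stats = {}
    for name, indices in regions.items():
        if max(indices) >= len(band_log_e):
            continue  # this region does not exist at the current sampling rate
        values = [band_log_e[i] for i in indices]
        stats[name] = (sum(values) / len(values), max(values))
    return stats
```

With five bands available, only `nb`, `wb` and `swb` are reported; `fb` is omitted because its bands are missing.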

Long-term mean energy values
• Computed for the energy regions only if local_VAD = 1 or if the long-term (LT) background noise level is above 30 dB.
• The long-term mean energy values are compared to certain thresholds, also taking into account the current maximum values per bandwidth.
• As a result, the counter for each bandwidth is increased or decreased.
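The gated update and the counter logic might look roughly like the sketch below. The gating condition (local_VAD = 1 or noise level above 30 dB) is from the slide, and the counter limits of 0 and 100 are the ones quoted for the BW counters in this deck. The smoothing factor `alpha`, the comparison rule and `margin_db` are assumptions made only to keep the example runnable; the actual thresholds are not given in the slides.

```python
CNT_MIN, CNT_MAX = 0, 100   # counter limits quoted in the deck
NOISE_GATE_DB = 30.0        # LT background noise gate (from the slide)


def update_long_term(lt_mean, mean_e, local_vad, lt_noise_db, alpha=0.75):
    """First-order smoothing of a region's mean energy (alpha is assumed)."""
    if local_vad == 1 or lt_noise_db > NOISE_GATE_DB:
        return alpha * lt_mean + (1.0 - alpha) * mean_e
    return lt_mean  # frame ignored: no speech and a quiet background


def update_counter(counter, lt_mean, max_e, ref_lt_mean, margin_db=10.0):
    """Move a bandwidth counter up or down by one and clamp it.

    Assumed rule: the region counts as "present" when its long-term mean is
    within margin_db of a reference region, or when its current maximum
    exceeds that reference.
    """
    present = lt_mean > ref_lt_mean - margin_db or max_e > ref_lt_mean
    counter += 1 if present else -1
    return max(CNT_MIN, min(CNT_MAX, counter))
```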

Bandwidth decision 1/4
• The values of the counters are compared against thresholds to detect a BW change.
• These thresholds are selected such that a BW change happens with a certain hysteresis, in order to avoid frequent changes in the detected, and consequently the coded, bandwidth.

Bandwidth decision 2/4
• Switching from a lower BW to a higher BW is relatively fast to avoid any potential quality degradation due to a loss of high-frequency content → short hysteresis (Ω = 10 frames).
• Switching from a higher BW to a lower BW is relatively slow. While the coding efficiency somewhat decreases in this case, there is no significant quality degradation → a longer hysteresis (90 frames) is used as a safeguard against misclassification and to eliminate frequent switching.
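The asymmetric hysteresis can be illustrated for a single boundary between a lower and a higher bandwidth. This is a sketch under stated assumptions: one 0..100 counter per boundary, a plus/minus one update per frame, and a jump of the counter to its extreme value after each switch. Only the 10-frame and 90-frame figures are taken from the slide; `step` is an illustrative name.

```python
UP_FRAMES = 10      # short hysteresis: lower -> higher BW (from the slide)
DOWN_FRAMES = 90    # long hysteresis: higher -> lower BW (from the slide)
CNT_MAX = 100       # assumed counter ceiling


def step(high_active, counter, evidence_high):
    """Advance the decision by one frame.

    evidence_high is True when the current frame suggests that the higher
    band carries meaningful content. Returns (high_active, counter).
    """
    if evidence_high:
        counter = min(CNT_MAX, counter + 1)
    else:
        counter = max(0, counter - 1)

    if not high_active and counter >= UP_FRAMES:
        # Fast switch up, then pin the counter at its maximum.
        return True, CNT_MAX
    if high_active and counter <= CNT_MAX - DOWN_FRAMES:
        # Slow switch down, then pin the counter at its minimum.
        return False, 0
    return high_active, counter
```

Starting from `(False, 0)`, ten consecutive frames with high-band evidence trigger the switch up; from there, ninety consecutive frames without evidence are needed before the decision falls back.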

Bandwidth decision 3/4
• The tests are performed in sequential order. The detected BW can thus change several times before the final decision is reached.
• If a higher BW is detected, the counters for all BWs lower than or equal to the detected bandwidth are set to their maximum value of 100.
• If a lower BW is detected, the counters for all BWs higher than or equal to the detected bandwidth are set to their minimum value of 0.
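The counter reset rule can be written down literally as stated on the slide. The dictionary of four counters and the function name are assumptions for illustration; the slide does not say how the counters are stored, and the sequential threshold tests that produce `detected` are not reproduced here.

```python
ORDER = ["nb", "wb", "swb", "fb"]   # bandwidths from lowest to highest
CNT_MIN, CNT_MAX = 0, 100           # limits quoted on the slide


def reset_counters(counters, previous, detected):
    """Apply the slide's reset rule in place and return the counters.

    counters maps each bandwidth name to its counter (hypothetical layout).
    """
    new_rank = ORDER.index(detected)
    old_rank = ORDER.index(previous)
    if new_rank > old_rank:
        # Higher BW detected: saturate counters of BWs <= detected.
        for bw in ORDER[:new_rank + 1]:
            counters[bw] = CNT_MAX
    elif new_rank < old_rank:
        # Lower BW detected: clear counters of BWs >= detected.
        for bw in ORDER[new_rank:]:
            counters[bw] = CNT_MIN
    return counters
```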

Bandwidth decision 4/4
• Finally, the detected bandwidth information is used to select the appropriate coding mode, subject to two constraints:
  1) In DTX, bandwidth switching should not happen in a CNG segment, so it is postponed until the first active frame.
  2) The coded bandwidth can be constrained if the specific bitrate does not support the detected bandwidth.
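The two constraints can be sketched as a small selection function. Everything here is illustrative: the `supported` list stands in for whatever bandwidths the current bitrate allows (the slides do not give the mapping in recoverable form), and the fallback choice of "nearest supported bandwidth below, otherwise the lowest supported one" is an assumption.

```python
ORDER = ["nb", "wb", "swb", "fb"]   # bandwidths from lowest to highest


def coded_bandwidth(detected, previous, in_cng_segment, supported):
    """Return the bandwidth to code for the current frame.

    supported: bandwidths allowed at the current bitrate (hypothetical input).
    """
    if in_cng_segment:
        # Constraint 1: no switching inside a CNG segment; keep the previous
        # bandwidth until the first active frame.
        return previous
    if detected in supported:
        return detected
    # Constraint 2 (assumed fallback): highest supported BW below the
    # detected one, otherwise the lowest supported BW.
    rank = ORDER.index(detected)
    lower = [bw for bw in supported if ORDER.index(bw) < rank]
    if lower:
        return max(lower, key=ORDER.index)
    return min(supported, key=ORDER.index)
```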

Performance 1/2
• Demonstrated by encoding a band-limited input audio signal (WB signal at 48 kHz sampling rate).
• 1) LP coding (segmental SNR over the frequency range of 0 – 6.4 kHz for the 13.2 kbps bitrate and over the frequency range of 0 – 8 kHz for the 32 and 64 kbps bitrates):

bitrate [kbps]   segSNR [dB]            encoder + decoder complexity [WMOPS]
                 w/o BWD    w/ BWD      w/o BWD    w/ BWD
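For reference, segmental SNR is commonly computed as the average of per-frame SNR values in dB. The sketch below uses that common time-domain definition; it does not restrict the measurement to a frequency range as the slide describes, and the 320-sample frame length and the skipping of silent or error-free frames are assumptions of this example.

```python
import math


def seg_snr(reference, decoded, frame_len=320):
    """Average per-frame SNR in dB between two equally long sample lists."""
    values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal > 0.0 and noise > 0.0:
            # Frames with no signal or no error are skipped (assumed policy).
            values.append(10.0 * math.log10(signal / noise))
    return sum(values) / len(values) if values else float("nan")
```

A constant 10 % amplitude error, for instance, gives about 20 dB in every frame and therefore about 20 dB overall.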

Performance 2/2
• 2) MDCT coding (segmental SNR was measured over only the 0 – 8 kHz frequency range in the transform domain after spectrum quantization):

bitrate [kbps]   segSNR [dB]            encoder + decoder complexity [WMOPS]
                 w/o BWD    w/ BWD      w/o BWD    w/ BWD

• 3) Complexity:
  – CLDFB BWD in EVS primary mode WMOPS
  – DCT BWD in AMR-WB IO mode WMOPS

Conclusion
Bandwidth detection algorithm:
• Efficient and flexible algorithm, robust to misclassifications.
• Part of the recently standardized EVS codec.
• Gives the codec the flexibility to encode band-limited signals efficiently by detecting the current input audio bandwidth.
• This information is used to set the codec to its optimal configuration, such that the available bit budget is distributed optimally.
• The results show that the coding efficiency significantly increases while the computational complexity significantly decreases.

Thank you!
More info:
• 3GPP TS : "EVS Codec Detailed Algorithmic Description".