AUDIO BANDWIDTH DETECTION IN THE EVS CODEC University of Sherbrooke, Canada; VoiceAge Corporation, Montréal, Canada; Fraunhofer IIS, Erlangen, Germany. Václav Eksler, Milan Jelínek, Wolfgang Jaegers. IEEE GlobalSIP, Orlando, FL, USA, December 14-16, 2015
Agenda Introduction, specification of the problem; EVS codec; Goal of the work; Prior work; Bandwidth detection (BWD) algorithm (algorithm details, block diagrams); Performance (coding efficiency, complexity); Conclusion
Introduction Speech and audio codecs are usually designed such that they encode all the frequency bands of the input signal spectrum. Problem: these codecs often do not work optimally when the higher bands do not contain any perceptually meaningful content, because a part of the available bit budget is always assigned to encode these bands. Solution: bandwidth detection algorithm. The algorithm is used in the new 3GPP speech and audio codec for Enhanced Voice Services (EVS).
EVS codec State-of-the-art speech and audio codec standardized by 3GPP. Flexible: codes a variety of audio material over a large range of bitrates and bandwidths. Capable of efficiently compressing voice, music, and mixed-content signals. To keep the subjective quality high for all audio material, it consists of a number of different coding modes. These modes are selected depending on bitrate, input signal characteristics (e.g. speech/music, voiced/unvoiced), signal activity, and audio bandwidth. Classification is performed in several stages of the pre-processing.
EVS encoder block diagram [Figure: block diagram. Blocks shown: Input audio; HP filter (20 Hz); PRE-PROCESSING (Pre-emphasis, Spect. anal.; Signal activity detection; Noise update/Estimation; Speech/Music classifier; Open-loop classifier; Filter-bank & resampling; Bandwidth detector; TD transient detector; LP analysis, pitch tracker; Channel aware config.; Signal classifier; MDCT selector); Core and DTX Switching; EVS PRIMARY MODES (MDCT core encoder; BWE encoder; DTX, CNG encoder; LP-based encoder); AMR-WB IO encoder; Signaling Info (BW, core, frame type, CA, formant sharp); Channel (VoIP, VoLTE network).]
Audio bandwidths in EVS codec Sampling rates: 8, 16, 32, 48 kHz. Audio bandwidths (BW) supported in the EVS codec:
bandwidth | frequency range [kHz] | bitrate range [kbps]
narrowband (NB) | 0 – | – 24.4
wideband (WB) | 0 – | – 128
super wideband (SWB) | 0 – | – 128
full band (FB) | 0 – | – 128
Goal of bandwidth detection algorithm Determine the effective audio bandwidth of the input signal. Detect changes in the effective audio bandwidth of the input signal. The information is used to set the codec to its optimal configuration (no waste of the available bit budget). Consequently, the coding efficiency increases for band-limited signals because bits are allocated only to the useful bandwidth. The EVS codec can be flexibly reconfigured to encode only the perceptually meaningful frequency content and to distribute the available bit budget optimally.
Prior work Traditionally, speech and audio codecs expect an input signal whose effective audio bandwidth is close to the Nyquist frequency → bandwidth detection received little attention. VMR-WB: a simple detection algorithm was used to detect an NB input signal sampled at 16 kHz. It computes the smoothed energy in the upper bands in the FFT domain. It is not flexible enough to react to frequent changes in the effective bandwidth. A more robust algorithm, based on computing an FFT and detecting significant energy in certain bands, was presented in [PCT/US2012/067532]. The FFT is computed for every 5 ms of the input signal → a computationally intensive solution.
Bandwidth detection (BWD) algorithm The BWD algorithm is based on: computing energies in spectral regions, comparing them to certain thresholds, updating bandwidth-related long-term parameters and counters, selecting the effective bandwidth. The algorithm reuses, as much as possible, the signal buffers and parameters available from the earlier stages of the EVS pre-processing module. EVS primary mode: Complex Modulated Low Delay Filter Bank (CLDFB) algorithm. The TF matrix has 16 time slots and several frequency sub-bands of 400 Hz each; 4 frequency sub-bands form a frequency band of 1,600 Hz. EVS AMR-WB IO mode: Discrete Cosine Transform (DCT) algorithm (frequency band of 1,500 Hz, Hanning window with a constant length of 320 samples).
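As a rough illustration of the CLDFB case, the sketch below sums the energies over all 16 time slots and over 4 adjacent 400 Hz sub-bands to obtain one log energy per 1,600 Hz band. The function name, the input layout (a list of time slots, each a list of sub-band energies), the dB scaling and the small floor are assumptions made for illustration; they are not taken from the EVS specification.

```python
import math

SUBBAND_HZ = 400        # width of one CLDFB sub-band (from the slide)
SUBBANDS_PER_BAND = 4   # 4 sub-bands form one 1,600 Hz band (from the slide)


def band_log_energy(tf_energy, first_subband):
    """Log energy (dB) of one 1,600 Hz band.

    tf_energy[slot][subband] holds non-negative energies (assumed layout).
    first_subband is the index of the lowest 400 Hz sub-band of the band.
    """
    total = 0.0
    for slot in tf_energy:  # accumulate over all time slots of the frame
        total += sum(slot[first_subband:first_subband + SUBBANDS_PER_BAND])
    # The small floor avoids log10(0) for silent input (assumption).
    return 10.0 * math.log10(total + 1e-12)


# A band that starts at 1.2 kHz begins at sub-band 1200 / 400 = 3.
frame = [[1.0] * 20 for _ in range(16)]  # 16 slots, 20 sub-bands, unit energy
energy_db = band_log_energy(frame, 1200 // SUBBAND_HZ)
```

With unit energy in every cell, the band collects 16 × 4 = 64 energy units, which is about 18.06 dB.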
Energy bands and energy regions Log energies are computed in energy bands. One to four frequency bands are assigned to each of the spectral regions:
band # | spectral region | CLDFB spectrum [kHz] | DCT spectrum [kHz] | input sampling rate [kHz]
0 | nb | 1.2 – | – | /32/48
1 | wb | 4.4 – | – 7.5 | 16/32/
 | swb | 9.2 – | – | /
 | fb | 16.8 – | – |
Mean and maximum energy values The log energies per frequency band are then used to calculate: the mean energy value per spectral region, the maximum energy value per spectral region.
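The two statistics can be sketched as follows. The mapping from spectral regions to band indices used here is purely illustrative (the slides state only that one to four bands belong to each region), and the function name is hypothetical.

```python
def region_stats(band_log_e, region_bands):
    """Return {region: (mean, maximum)} of the band log energies.

    band_log_e   : list of log energies, one per frequency band
    region_bands : dict mapping a region name to its band indices
                   (illustrative mapping, not the EVS assignment)
    """
    stats = {}
    for region, bands in region_bands.items():
        values = [band_log_e[b] for b in bands]
        stats[region] = (sum(values) / len(values), max(values))
    return stats


# Hypothetical example: one band for "nb", two bands for "wb".
example = region_stats([10.0, 20.0, 30.0], {"nb": [0], "wb": [1, 2]})
```

Here `example["nb"]` is `(10.0, 10.0)` and `example["wb"]` is `(25.0, 30.0)`.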
Long-term mean energy values The long-term (LT) mean energy values are computed for the spectral regions only if local_VAD = 1, or if the LT background noise level > 30 dB. The long-term mean energy values are compared to certain thresholds, also taking into account the current maximum values per bandwidth. As a result, the counter of each bandwidth is increased or decreased.
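A minimal sketch of the gated update, assuming first-order recursive smoothing. The smoothing factor of 0.25 and the function and parameter names are assumptions for illustration. Only the gating condition (local_VAD = 1 or LT background noise level above 30 dB) comes from the slide.

```python
def update_long_term(lt_mean, frame_mean, local_vad, lt_noise_db,
                     alpha=0.25, noise_gate_db=30.0):
    """Update a long-term mean energy only when the frame is informative.

    alpha is an assumed smoothing factor; the gate follows the slide:
    update if local_VAD == 1 or the LT background noise exceeds 30 dB.
    """
    if local_vad == 1 or lt_noise_db > noise_gate_db:
        return (1.0 - alpha) * lt_mean + alpha * frame_mean
    return lt_mean  # inactive, quiet frame: keep the previous estimate


active = update_long_term(40.0, 60.0, local_vad=1, lt_noise_db=0.0)
silent = update_long_term(40.0, 60.0, local_vad=0, lt_noise_db=10.0)
noisy = update_long_term(40.0, 60.0, local_vad=0, lt_noise_db=35.0)
```

The gate keeps silent frames with little background noise from pulling the long-term estimate toward an empty spectrum.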
Bandwidth decision 1/4 The values of the counters are compared against thresholds to detect a BW change. These thresholds are selected such that a BW change happens with a certain hysteresis, in order to avoid frequent changes in the detected, and subsequently the coded, bandwidth.
Bandwidth decision 2/4 Switching from a lower BW to a higher BW is relatively fast to avoid any potential quality degradation due to a loss of high frequency content → short hysteresis (Ω = 10 frames). Switching from a higher BW to a lower BW is relatively slow. While in this case the coding efficiency somewhat decreases, there is no significant quality degradation. → Longer hysteresis (90 frames) is used as a safeguard against misclassification and to eliminate frequent switching.
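The asymmetric hysteresis can be illustrated with a single saturating counter. This is a sketch under stated assumptions: the counter moves by one per frame, a single switching level of 10 is used in both directions, and the counter is saturated to 0 or 100 after a switch. With these choices the upward switch takes 10 frames and the downward switch 90 frames, matching the figures on the slide; the actual EVS thresholds and update steps may differ.

```python
CNT_MIN, CNT_MAX = 0, 100   # counter range (assumed)
SWITCH_LEVEL = 10           # assumed single switching level


def step(counter, is_high, high_band_active):
    """Advance the counter by one frame and apply the hysteresis."""
    if high_band_active:
        counter = min(CNT_MAX, counter + 1)
    else:
        counter = max(CNT_MIN, counter - 1)
    if not is_high and counter >= SWITCH_LEVEL:
        is_high, counter = True, CNT_MAX     # fast switch to the higher BW
    elif is_high and counter <= SWITCH_LEVEL:
        is_high, counter = False, CNT_MIN    # slow switch to the lower BW
    return counter, is_high


def frames_to_switch(counter, is_high, high_band_active, max_frames=1000):
    """Count frames until the decision flips (bounded for safety)."""
    start = is_high
    for n in range(1, max_frames + 1):
        counter, is_high = step(counter, is_high, high_band_active)
        if is_high != start:
            return n
    return None


up = frames_to_switch(CNT_MIN, False, True)     # high band appears
down = frames_to_switch(CNT_MAX, True, False)   # high band disappears
```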
Bandwidth decision 3/4 The tests are performed in sequential order. The detected BW can therefore change several times within a frame before the final decision is reached. If a higher BW is detected, the counters for BWs lower than or equal to the detected bandwidth are set to their maximum value of 100. If a lower BW is detected, the counters for BWs higher than or equal to the detected bandwidth are set to their minimum value of 0.
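One possible reading of the sequential tests is sketched below. The test order (WB, then SWB, then FB), the switching level of 10 and the absence of an NB counter are assumptions. In particular, when a lower bandwidth is detected this sketch resets the counters of the bandwidths that were dropped, which is one interpretation of the rule on the slide.

```python
ORDER = ["NB", "WB", "SWB", "FB"]
CNT_MIN, CNT_MAX, LEVEL = 0, 100, 10   # LEVEL is an assumed threshold


def decide(current_bw, counters):
    """Run the tests in ascending order; counters is modified in place.

    counters maps "WB", "SWB" and "FB" to values in [0, 100].
    """
    bw = current_bw
    for idx in range(1, len(ORDER)):
        name = ORDER[idx]
        if counters[name] >= LEVEL and ORDER.index(bw) < idx:
            bw = name                        # higher BW detected
            for lower in ORDER[1:idx + 1]:
                counters[lower] = CNT_MAX    # saturate counters up to it
        elif counters[name] <= LEVEL and ORDER.index(bw) >= idx:
            bw = ORDER[idx - 1]              # lower BW detected
            for higher in ORDER[idx:]:
                counters[higher] = CNT_MIN   # reset the dropped bandwidths
    return bw


up_counters = {"WB": 50, "SWB": 50, "FB": 0}
up_result = decide("NB", up_counters)        # NB -> WB -> SWB in one frame
down_counters = {"WB": 100, "SWB": 5, "FB": 5}
down_result = decide("FB", down_counters)
```

In the first call the decision changes twice within the same frame (NB to WB, then WB to SWB), which illustrates why only the final value is used.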
Bandwidth decision 4/4 Finally, the detected bandwidth information is used to select the appropriate coding mode, subject to two constraints: 1) In DTX, bandwidth switching should not happen in the CNG segment, so it is postponed until the first active frame. 2) The coded bandwidth can be constrained if the specific bitrate does not support the detected bandwidth.
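The two constraints can be sketched as a small post-processing step. The function and parameter names are hypothetical, and the lowest and highest bandwidth supported at a given bitrate are passed in by the caller rather than taken from the EVS tables.

```python
ORDER = ["NB", "WB", "SWB", "FB"]


def coded_bandwidth(detected, last_coded, in_cng_segment, min_bw, max_bw):
    """Apply the DTX and bitrate constraints to the detected bandwidth.

    min_bw / max_bw : lowest and highest bandwidth supported at the
                      current bitrate (supplied by the caller; assumed).
    """
    if in_cng_segment:
        # Constraint 1: no switching inside a CNG segment; the change
        # takes effect at the first active frame.
        return last_coded
    # Constraint 2: clamp to what the bitrate supports.
    idx = ORDER.index(detected)
    idx = max(ORDER.index(min_bw), min(ORDER.index(max_bw), idx))
    return ORDER[idx]


held = coded_bandwidth("FB", "SWB", True, "NB", "SWB")     # postponed
capped = coded_bandwidth("FB", "WB", False, "NB", "SWB")   # limited to SWB
raised = coded_bandwidth("NB", "WB", False, "WB", "FB")    # NB unsupported
```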
Performance 1/2 Demonstrated by encoding a band-limited input audio signal (WB signal at 48 kHz sampling rate). 1) LP coding (segmental SNR over the frequency range of 0 – 6.4 kHz for the 13.2 kbps bitrate and over the frequency range of 0 – 8 kHz for the 32 and 64 kbps bitrates):
bitrate [kbps] | segSNR [dB] w/o BWD | segSNR [dB] w. BWD | encoder + decoder complexity [WMOPS] w/o BWD | encoder + decoder complexity [WMOPS] w. BWD
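For reference, a generic segmental SNR can be sketched as the mean of per-frame SNR values in dB. This is a textbook time-domain form, not the measurement procedure used for the results reported here: restricting the signals to the stated frequency ranges is left out, and the handling of silent or error-free frames (they are skipped) is an assumption.

```python
import math


def segmental_snr(reference, decoded, frame_len):
    """Mean of per-frame SNR values in dB (generic definition)."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        if signal == 0.0 or noise == 0.0:
            continue  # skip frames where the ratio is undefined (assumption)
        snrs.append(10.0 * math.log10(signal / noise))
    return sum(snrs) / len(snrs) if snrs else float("nan")


# Constant 10 % amplitude error gives a power ratio of 100, about 20 dB.
seg_snr_db = segmental_snr([1.0] * 8, [0.9] * 8, frame_len=4)
```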
Performance 2/2 2) MDCT coding (segmental SNR was measured only over the 0 – 8 kHz frequency range, in the transform domain after spectrum quantization):
bitrate [kbps] | segSNR [dB] w/o BWD | segSNR [dB] w. BWD | encoder + decoder complexity [WMOPS] w/o BWD | encoder + decoder complexity [WMOPS] w. BWD
3) Complexity: CLDFB BWD in EVS primary mode: WMOPS; DCT BWD in AMR-WB IO mode: WMOPS
Conclusion Bandwidth detection algorithm: Efficient and flexible algorithm, robust to misclassifications. Part of the recently standardized EVS codec. Gives the codec the flexibility to encode band-limited signals efficiently by detecting the current input audio bandwidth. This information is used to set the codec to its optimal configuration, so that the available bit budget is distributed in a more optimal way. The results show that the coding efficiency significantly increases while the computational complexity significantly decreases.
Thank you! More info: 3GPP TS : "EVS Codec Detailed Algorithmic Description".