ON THE ARCHITECTURE OF THE CDMA2000® VARIABLE-RATE MULTIMODE WIDEBAND (VMR-WB) SPEECH CODING STANDARD


Milan Jelinek†, Redwan Salami‡, Sassan Ahmadi*, Bruno Bessette†, Philippe Gournay†‡ and Claude Laflamme†
†University of Sherbrooke, Canada - ‡VoiceAge Corp., Canada - *Nokia Inc., USA

VMR-WB: Variable-Rate Multi-Mode Wideband Speech Codec
New 3GPP2 WB Speech Coding Standard for 3G applications

Main Features:
- Near face-to-face communication speech quality
- Source- and channel-controlled operation (4 modes)
- Directly interoperable with 3GPP/ITU AMR-WB in Mode 3
- Compliant with CDMA2000 Rate Set 2: 13.3 (FR), 6.2 (HR), 2.7 (QR) or 1.0 (ER) kbit/s frames
- WB (50-7000 Hz) and NB (200-3400 Hz) input/output
- 20 ms frames
- Noise reduction with adjustable maximum reduction

Average Bit Rates (ABR), in kbit/s:

Mode      Active speech   40% speech activity
Mode 3    13.3            6.1
Mode 0    12.8            5.7
Mode 1    10.5            4.8
Mode 2    8.1             3.8

Encoder Flow Chart
(figure; blocks: Spectral Analysis, LP Analysis, Pitch Tracking, Noise Reduction, Noise Estimation, Voice Activity?; signal labels: Parameters, Input, De-noised)

VMR-WB Coding Techniques

Coding Type          Bit rate (kbit/s)   Description
Inactive Speech Coding
  CNG ER             1.0                 Noise-excited LP filter; smoothed over time
  CNG QR             2.7                 As previous, but interoperable with AMR-WB CNG
Unvoiced Coding
  Unvoiced HR        6.2                 13-bit Gaussian codebook (4x/frame)
  Unvoiced QR        2.7                 As previous, but randomly chosen vectors
Voiced Coding
  Voiced HR          6.2                 Frame-level signal modification; 12-bit ACELP codebook (4x/frame)
Generic Coding
  Interoperable FR   13.3                Similar to AMR-WB @ 12.65 kbit/s
  Generic FR         13.3                As previous + FER protection
  Interoperable HR   6.2                 As Interoperable FR, but with random algebraic codebook indices
  Signaling HR       6.2
  Generic HR         6.2                 Pitch coded 2x/frame

Source-Controlled Operation
Hierarchical signal classification operating on frame level:

Begin
1. Voice Activity?   No  -> CNG Encoding or DTX
2. Unvoiced Frame?   Yes -> Unvoiced Speech Optimized Encoding
3. Voiced Frame?     Yes -> Voiced Speech Optimized Encoding
4. Low Energy?       Yes -> Generic HR Encoding
                     No  -> Generic FR Encoding

1. Voice Activity Detection (VAD)

Voice Activity Decision:
(figure; annotations: lower for noisy speech, higher for clean speech)

Noise Estimation Update Decision:
Based on parameters with low sensitivity to noise level:
- Pitch period varying
- AND normalized correlation at pitch period low
- AND low estimated order of AR model
- AND signal energy stationary
INDEPENDENT of VAD decision!
- Robust to noise level variations
- Conservative approach: the noise estimate is updated only when it is quite certain that the frame is inactive

2. Unvoiced Frame Decision

Based on the following parameters:
- Normalized Correlation
  T: open-loop pitch period estimate
  x_i: perceptually weighted input signal
- Spectral Tilt
  Eh: average energy of the last 2 critical bands
  El: average energy of pitch-synchronous bins in the first 10 critical bands
- Frame Energy Variation
  E32(j): energy maximum in a block of 32 samples
- Relative Frame Energy (Erel)
  Et: sum of critical band energies for the current frame, in dB
  Ef: long-term mean of Et for active speech
Decision:

3. Voiced Frame Decision / Signal Modification

Voiced decision is an inherent part of the original signal modification algorithm:
- A frame is coded as voiced if all constraints of the modification are satisfied
- Signal modification is done pitch-synchronously
- Pitch period evolution is piecewise linear (constant at frame end) to avoid pitch period oscillations
- Modified input is synchronous with the original input at frame end
- Modification is transparent at least up to 30% of active speech frames (in the example figure, no coding is used and 30% of active clean speech frames are modified)

4. Low Energy Decision

Purpose: to avoid encoding unclassified frames with low perceptual importance at Full Rate
Condition:
Example: typical example of a low-energy frame encoded with Generic HR in mode 2 (figure)

Channel-Controlled Operation
- 4 operational modes controlled by channel conditions
- Transparent memory-less mode switching
- Per-frame bit rate control capability

Coding Types Relative Usage in Active Speech:

Coding Type        Mode 0    Mode 1    Mode 2    Mode 3
Generic FR         93.4 %    60.4 %    34.1 %    -
Interoperable FR   -         -         -         100.0 %
Generic HR         -         7.1 %     13.1 %    -
Voiced HR          -         13.0 %    33.2 %    -
Unvoiced HR        6.6 %     19.5 %    5.6 %     -
Unvoiced QR        -         -         14.0 %    -

Mode Switching Performance:
Comparing MOS scores of modes 0, 1, 2 with random mode switching at 0.5, 1 and 5 second intervals (from characterization test) (figure)

Enhancements at Decoder

Low Frequency Post-processing:
Enhancement of the periodicity in the low-frequency region (figure; axis label: 2000 Hz)

Frame Error Concealment:
Lost frame concealment:
- Excitation energy and spectral envelope converge to the estimated noise.
- Excitation periodicity converges to 0.
- Convergence rate depends on the signal class of the last good frame.
Recovery after erasure:
- Careful energy control of synthesized speech.
- Artificial onset reconstruction in case of a lost voiced onset.

Performance (MOS scores from selection test)
CDMA Specific Modes (Modes 0, 1, 2), WB Input
Ref 0: AMR-WB @ 14.25 kbit/s
Ref 1: AMR-WB @ 12.65 kbit/s
Ref 2: AMR-WB @ 8.85 kbit/s
Test 0: VMR-WB Mode 0
Test 1: VMR-WB Mode 1
Test 2: VMR-WB Mode 2
(figures: Clean Speech Conditions, Channel Error Conditions, Background Noise Conditions)

Performance (MOS scores from characterization test)
- NB Input Test: Modes 0, 1, 2, 3, clean speech, nominal level (figure)
- Test on Interworking with AMR-WB @ 12.65 kbit/s: WB input, clean speech conditions (figure)
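The active-speech average bit rates quoted for each mode can be cross-checked against the "Coding Types Relative Usage in Active Speech" figures. The Python sketch below is only an illustrative check, not part of the codec: it weights the Rate Set 2 frame rates (13.3, 6.2 and 2.7 kbit/s) by the usage percentages. The assignment of each percentage to a mode column is an assumption reconstructed from the flattened table (chosen so that every mode sums to 100 %), and the names `RATE_KBPS`, `USAGE_PERCENT` and `active_speech_abr` are hypothetical.

```python
# Frame bit rates in kbit/s, as listed for CDMA2000 Rate Set 2 in the poster.
RATE_KBPS = {"FR": 13.3, "HR": 6.2, "QR": 2.7}

# Relative usage of each coding type in active speech, in percent.
# Assumption: column assignment reconstructed so that each mode sums to 100 %.
USAGE_PERCENT = {
    "Mode 0": {"Generic FR": 93.4, "Unvoiced HR": 6.6},
    "Mode 1": {"Generic FR": 60.4, "Generic HR": 7.1,
               "Voiced HR": 13.0, "Unvoiced HR": 19.5},
    "Mode 2": {"Generic FR": 34.1, "Generic HR": 13.1, "Voiced HR": 33.2,
               "Unvoiced HR": 5.6, "Unvoiced QR": 14.0},
    "Mode 3": {"Interoperable FR": 100.0},
}


def active_speech_abr(usage):
    """Return the usage-weighted mean bit rate (kbit/s) for one mode."""
    total = sum(usage.values())
    # The last word of each coding type name (FR, HR, QR) selects the frame rate.
    weighted = sum(pct * RATE_KBPS[name.split()[-1]] for name, pct in usage.items())
    return weighted / total


if __name__ == "__main__":
    for mode, usage in USAGE_PERCENT.items():
        print(f"{mode}: {active_speech_abr(usage):.1f} kbit/s")
```

Under these assumptions the weighted means round to 12.8, 10.5, 8.1 and 13.3 kbit/s for Modes 0, 1, 2 and 3, matching the active-speech column of the ABR figures. The 40 % speech activity column is not reproduced here, because it also depends on how inactive frames are coded (CNG or DTX), which the transcript does not quantify.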