VMR-WB – Operation of the 3GPP2 Wideband Speech Coding Standard M. Jelinek†, R. Salami‡ and S. Ahmadi * †University of Sherbrooke, Canada ‡VoiceAge Corporation,


VMR-WB – Operation of the 3GPP2 Wideband Speech Coding Standard M. Jelinek†, R. Salami‡ and S. Ahmadi * †University of Sherbrooke, Canada ‡VoiceAge Corporation, Canada * Nokia Inc., USA

Outline
- VMR-WB key features
- Background
- VMR-WB rate selection
- AMR-WB ↔ VMR-WB interoperation
- Performance

VMR-WB Key Features
- Variable-Rate Multi-Mode Wideband Speech Codec
- New 3GPP2 WB speech coding standard for 3G applications
- Near face-to-face communication speech quality
- Source and network controlled operation (4 modes)
- 3GPP/ITU AMR-WB interoperable in mode 3
- Compliant with CDMA2000 rate set 2
- WB ( Hz) and NB ( Hz) input/output
- 20 ms frames
- Noise reduction with adjustable maximum reduction

Background (1)
Wideband vs. “telephony” speech signal
[Figure panels: Unvoiced spectrum, male speaker; Voiced spectrum, male speaker]

Background (2)
Wideband speech coding standardizations:
1. AMR-WB (Adaptive Multirate Wideband)
   - Standardization: ETSI/3GPP (Europe, Asia, northern Africa)
   - Selected: December 2000
   - Applications: GSM, 3G WCDMA
2. Recommendation G
   - Standardization: ITU-T (worldwide)
   - Selected: July 2001
   - Applications: wideband telephony, teleconferencing, voice over IP, internet applications, …
3. VMR-WB
   - Standardization: TIA/3GPP2 (North America, Asia)
   - Selected: April 2003
   - Applications: 3G CDMA2000

Background (3)
AMR-WB rate adaptation to prevailing radio channel conditions
AMR-WB bitrates: nine modes, each with its own bitrate in kb/s
[Figure: Example of AMR-WB mode adaptation in GSM Full Rate channel]

VMR-WB rate selection (1)
Variable bitrate codec. The average bitrate (ABR) is controlled by:
1. System: defining the operating mode, i.e. the target ABR
2. Source: the actual bitrate is chosen based on the information content in every speech frame
Building blocks (CDMA2000 allowed bitrates):
- FR: 13.3 kb/s
- HR: 6.2 kb/s
- QR: 2.7 kb/s
- ER: 1.0 kb/s
VMR-WB ABRs: [Table: ABR per mode, in kbit/s, for active speech and for 40% speech activity]

VMR-WB rate selection (2)
Hierarchical signal classification, operating on frame level:
1. Voice activity? No: CNG encoding or DTX (ER)
2. Unvoiced frame? Yes: unvoiced speech optimized HR or QR encoding
3. Voiced frame? Yes: voiced speech optimized HR encoding
4. Low energy? Yes: generic HR encoding. No: generic FR encoding
CNG – Comfort noise generation
DTX – Discontinuous transmission
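
The decision cascade on this slide can be sketched in a few lines of Python. This is only an illustration of the ordering of the four questions, not the codec's implementation: the function name `select_coding`, the `Coding` enum and the four boolean inputs are hypothetical, and the Yes/No branch assignment is an assumed reading of the flowchart (consistent with the later statement that the low-energy test exists to keep unclassified frames away from Full Rate).

```python
from enum import Enum


class Coding(Enum):
    """Coding types named on the slide (enum itself is illustrative)."""
    CNG_ER = "CNG encoding or DTX (ER)"
    UNVOICED_HR_OR_QR = "Unvoiced speech optimized HR or QR encoding"
    VOICED_HR = "Voiced speech optimized HR encoding"
    GENERIC_HR = "Generic HR encoding"
    GENERIC_FR = "Generic FR encoding"


def select_coding(voice_active, unvoiced, voiced, low_energy):
    """Ask the four questions in slide order; the first match decides.

    The four flags stand in for the real per-frame classifiers,
    which are not reproduced here.
    """
    if not voice_active:          # 1. Voice activity?
        return Coding.CNG_ER
    if unvoiced:                  # 2. Unvoiced frame?
        return Coding.UNVOICED_HR_OR_QR
    if voiced:                    # 3. Voiced frame?
        return Coding.VOICED_HR
    if low_energy:                # 4. Low energy?
        return Coding.GENERIC_HR
    return Coding.GENERIC_FR      # everything else goes to Full Rate
```

Because the tests are hierarchical, a frame that fails the voice activity test never reaches the later questions, and Generic FR is used only for active frames that none of the cheaper classes accept.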

VMR-WB rate selection (3)
1. Voice Activity Detection (VAD)
[Block diagram. Blocks: Spectral Analysis; Noise Estimation (update down); Voice Activity? = f(SNR); Noise Reduction; LP Analysis; Pitch Tracking, Voicing; Noise Estimation (update up, or no update). Signals: Speech, De-noised Speech, Parameters, VAD decision]

VMR-WB rate selection (4)
2. Unvoiced Frame Decision
Based on the following parameters:
- Normalized correlation
  T – open-loop pitch period estimate
  x_i – perceptually weighted input signal
- Spectral tilt
  E_h – average energy of the last 2 critical bands
  E_l – average energy of pitch-synchronous bins in the first 10 critical bands
- Energy variation within a frame
- Relative frame energy with respect to long-term average
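
The formulas for the first two parameters are not part of this transcript. As a rough sketch, assuming the textbook definition of a normalized correlation at lag T and a simple low-band to high-band energy ratio for the tilt, they could be computed as follows. The function names, the exact summation range and the ratio form of the tilt are assumptions, not taken from the slides.

```python
import math


def normalized_correlation(x, T):
    """Correlation between x[i] and x[i - T], normalized to [-1, 1].

    x is the perceptually weighted input signal and T the open-loop
    pitch period estimate (both supplied by the caller).
    """
    idx = range(T, len(x))
    num = sum(x[i] * x[i - T] for i in idx)
    e_cur = sum(x[i] ** 2 for i in idx)
    e_lag = sum(x[i - T] ** 2 for i in idx)
    denom = math.sqrt(e_cur * e_lag)
    return num / denom if denom > 0.0 else 0.0


def spectral_tilt(e_low, e_high):
    """Assumed form of the tilt: E_l / E_h (low-band over high-band energy)."""
    return e_low / e_high if e_high > 0.0 else float("inf")
```

A strongly periodic (voiced) frame gives a correlation close to 1 at its pitch lag, while noise-like (unvoiced) frames give values near 0. Under the assumed ratio, unvoiced frames, whose energy sits in the upper bands, yield a small tilt.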

VMR-WB rate selection (5)
3. Voiced Frame Decision / Signal Modification
Signal modification features:
- Pitch-period synchronous
- Pitch period evolution is piecewise linear (constant at frame end) to avoid pitch period oscillations
- Modified input is synchronous with original input at frame end
Voiced decision is an inherent part of the original Signal Modification Algorithm, i.e. a frame is coded as voiced if all constraints of the modification are satisfied

VMR-WB rate selection (6)
4. Low Energy Decision
Purpose: avoid encoding unclassified frames with low perceptual importance at Full Rate
Condition:
E_t – sum of critical band energies for current frame, in dB
E_f – long-term mean of E_t for active speech
Example: [Figure: typical example of a low-energy frame encoded with Generic HR in mode 2]
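
The condition itself did not survive in the transcript, so the following is only a guess at its general shape: compare the current frame energy E_t with the long-term mean E_f and flag the frame when it falls some margin below. The margin `margin_db`, the smoothing factor `alpha` and all function names are hypothetical placeholders, not values from the standard.

```python
import math


def frame_energy_db(band_energies):
    """E_t: sum of the critical band energies of one frame, in dB."""
    return 10.0 * math.log10(max(sum(band_energies), 1e-12))


def update_long_term_mean(e_f, e_t, alpha=0.99):
    """E_f: running mean of E_t; call this for active speech frames only.

    alpha is a hypothetical smoothing factor.
    """
    return alpha * e_f + (1.0 - alpha) * e_t


def is_low_energy(e_t, e_f, margin_db=10.0):
    """Assumed rule: low energy if E_t lies margin_db below E_f."""
    return e_t < e_f - margin_db
```

With a rule of this kind, a quiet frame between louder speech segments would be routed to Generic HR instead of Generic FR, which is the behaviour the purpose statement on the slide describes.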

VMR-WB rate selection (7)
System-Controlled Operation
- 4 operational modes
  - Mode 3: interoperable with modes 0, 1, 2 of AMR-WB
  - Modes 0, 1, 2 chosen depending on network capacity and the desired quality of service
- Transparent memoryless mode switching
Usage of different coding techniques during active speech:
Coding Type        Mode 0    Mode 1    Mode 2    Mode 3
Generic FR         93.4 %    60.4 %    34.1 %    -
Interoperable FR   -         -         -         … %
Generic HR         -         7.1 %     13.1 %    -
Voiced HR          -         13.0 %    33.2 %    -
Unvoiced HR        6.6 %     19.5 %    5.6 %     -
Unvoiced QR        -         -         … %       -
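
As a worked example, the usage shares can be combined with the building-block bitrates (FR 13.3, HR 6.2, QR 2.7, ER 1.0 kb/s) to estimate the average bitrate during active speech. The sketch below does this for modes 0 and 1 only, since those are the columns whose entries are all legible here. The names `RATES_KBPS`, `USAGE` and `active_speech_abr` are illustrative, and treating the ABR as a plain usage-weighted mean is an assumption.

```python
# Building-block bitrates from the rate selection slide, in kb/s.
RATES_KBPS = {"FR": 13.3, "HR": 6.2, "QR": 2.7, "ER": 1.0}

# Share of active speech frames coded at each rate, from the usage table.
# HR in mode 1 groups Generic HR, Voiced HR and Unvoiced HR.
USAGE = {
    0: {"FR": 0.934, "HR": 0.066},
    1: {"FR": 0.604, "HR": 0.071 + 0.130 + 0.195},
}


def active_speech_abr(usage):
    """Usage-weighted mean bitrate in kb/s; shares must sum to 1."""
    total = sum(usage.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"usage shares sum to {total}, expected 1.0")
    return sum(RATES_KBPS[rate] * share for rate, share in usage.items())


if __name__ == "__main__":
    for mode, usage in USAGE.items():
        print(f"mode {mode}: {active_speech_abr(usage):.2f} kb/s")
```

Under these assumptions the estimate comes out at about 12.83 kb/s for mode 0 and about 10.49 kb/s for mode 1, which shows how shifting frames from FR to HR lowers the average rate.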

AMR-WB ↔ VMR-WB interoperation (1)
Problems:
- DTX transmission of AMR-WB vs. continuous transmission of VMR-WB
- Different bitstream sizes
- AMR-WB DTX hangover too long for 3GPP2 systems
- In-band signalling of 3GPP2 systems

AMR-WB ↔ VMR-WB interoperation (2)
AMR-WB → VMR-WB link
[Diagram: AMR-WB encoder → System interface → VMR-WB decoder. AMR-WB side: kb/s frame, No-data frame, CNG-update frame. VMR-WB side: Interoperable FR, Interoperable HR, CNG QR frame, Void ER frame. Control labels: VAD, Maximum HR request]
In case of a maximum HR request, ACELP innovation indices are discarded at the gateway and regenerated randomly at the decoder.

AMR-WB ↔ VMR-WB interoperation (3)
VMR-WB → AMR-WB link
[Diagram: VMR-WB encoder → System interface → AMR-WB decoder. VMR-WB side: Interoperable FR, Interoperable HR, CNG QR frame, ER frame. AMR-WB side: kb/s frame, No-data frame, CNG-update frame. Gateway block: Generate innovation]
In case of an Interoperable HR frame, ACELP innovation indices are generated at the gateway so that the bitstream is transparent for the AMR-WB decoder.

AMR-WB ↔ VMR-WB interoperation (4) Performance of the interoperable links

Performance
Performance on WB speech:
Selection test:
- Modes 0, 1 & 2 evaluated in 3 experiments
- VMR-WB outperformed all other candidates in all experiments, for all 3 modes
Performance on NB speech: