A STUDY OF DESIGN COMPROMISES FOR SPEECH CODERS IN PACKET NETWORKS

Roch Lefebvre, Philippe Gournay
University of Sherbrooke, Sherbrooke, Quebec, Canada
Redwan Salami
VoiceAge Corp., Montreal, Quebec, Canada

1. INTRODUCTION

In voice over packet networks, the coding gain achieved by prediction-based speech coders is offset by packet losses. Concealment must be applied to the missing packets, which reduces quality for two main reasons:
- Not all missing packets can be concealed, especially when concealment uses only the past signal → onsets, transients.
- The concealment error can propagate over several frames, even frames received correctly → culprit: desynchronisation of the excitation content (LTP).

We propose to compare two approaches for alleviating this problem:
- Adding redundancy to increase the robustness of a baseline predictive encoder (G.729).
- Using a speech coding model which does not have interframe dependencies (iLBC).

To be compared, solutions should have comparable bit rates.

2. ADDED REDUNDANCY versus FRAME INDEPENDENCE

Approach 1: Use a lower bit rate, predictive (CELP) coder, and add channel redundancy to improve robustness to missing frames.
Approach 2: Use a higher bit rate, non-predictive or « frame-independent » codec, to improve robustness to missing frames in the core codec itself.

Codec_P: G.729 (CELP-based)
Codec_FI: iLBC (frame-independent)

[Figure: Codec_P, 10 ms frames with long-term prediction from the past excitation; Codec_FI, 20 ms frame encoded in « absolute ».]
[Figure: Anticipated gains in quality. Axes: quality (robustness to frame loss), % FER, total payload bit rate. Labels: Codec_P, Codec_FI or Codec_P + R, Codec_P + R + Delay, R (redundancy).]

3. PROPOSED APPROACHES FOR ADDING REDUNDANCY

Consider only G.729 at 8 kbps (baseline predictive coder) and add redundancy to obtain bit rates similar to iLBC at 15.2 kbps.

[Figure: Bit rate and algorithmic delay. R (kbps) versus D (ms) for the G.729-based solutions and iLBC; point size proportional to quality at 10 % FER.]

Content of each 20-ms packet (P(k) is a packet, F(k) a G.729 frame):

Two frames per packet:
  P(k-1): F(2k-2), F(2k-1)
  P(k):   F(2k), F(2k+1)
  P(k+1): F(2k+2), F(2k+3)

Two frames plus the following frame:
  P(k-1): F(2k-2), F(2k-1), F(2k)
  P(k):   F(2k), F(2k+1), F(2k+2)
  P(k+1): F(2k+2), F(2k+3), F(2k+4)

Two frames plus reduced copies F' of the two previous frames (G / G):
  P(k-1): F(2k-2), F(2k-1), F'(2k-4), F'(2k-3)
  P(k):   F(2k), F(2k+1), F'(2k-2), F'(2k-1)
  P(k+1): F(2k+2), F(2k+3), F'(2k), F'(2k+1)

Two frames plus full copies F of the two previous frames:
  P(k-1): F(2k-2), F(2k-1), F(2k-4), F(2k-3)
  P(k):   F(2k), F(2k+1), F(2k-2), F(2k-1)
  P(k+1): F(2k+2), F(2k+3), F(2k), F(2k+1)

In G and G.729-3, F'(k) denotes F(k) but without the 18 LSF bits and the pitch parity bit (hence, frame F'(k) has 19 bits fewer than frame F(k)). The missing LSFs have to be extrapolated at the decoder when a frame is missing.

G and G differ at the decoder:
- G : Decode packet P(k) when it arrives (do not wait for packet P(k+1)). If packet P(k) is missing, then apply concealment followed by resynchronisation of filter memories using F'(2k) and F'(2k+1), which are received when packet P(k+1) arrives. Then, start decoding packet P(k+1).
- G : Decode packet P(k) only after packet P(k+1) has arrived (additional delay of 20 ms). If packet P(k) was missing, then just use F'(2k) and F'(2k+1), which are added as redundancy in packet P(k+1). No concealment is applied in this case.

G : At the decoder, wait for packet P(k+1) before decoding packet P(k).

4. EFFECT ON ERROR PROPAGATION

[Figure: 20 ms packets, 3rd packet lost. Panels: G.729 synthesis; error at decoder for each G.729-based solution; iLBC error at decoder (compared to iLBC synthesis without frame loss).]

- G : Every missing 20-ms packet implies that two consecutive 10-ms frames of G.729 are lost. Concealment and propagation introduce large artefacts.
- G : Every missing 20-ms packet reduces to a single 10-ms frame loss in G.729. Concealment is more effective, and propagation is reduced.
- G : Concealment followed by approximate resynchronisation of filter memories.
- G : Limited concealment (there would be no concealment if F' were equal to F).
- G : No effective loss in all single packet losses.
- iLBC : Concealment, but limited error propagation (only due to post-filtering at the decoder to smooth frame transitions).

5. SUBJECTIVE EXPERIMENT

A formal listening test was conducted to compare the different solutions for increasing the robustness in case of missing packets. The main features of this test are:
- clean speech, narrowband, IRS filtered
- 4 male, 4 female speakers
- 32 naive listeners, listening using binaural headphones
- following guidelines of ITU-T Rec. P
- conditions in total, including MNRU and other reference conditions
- 0 to 20 % random packet losses, synchronized between iLBC and G.729

6. LISTENING TEST RESULTS

[Figure: listening test results.]

7. CONCLUSIONS

From the test results, we can draw the following conclusions:
- In clean channel conditions, iLBC at 15.2 kbps has equivalent quality to G.729 at 8 kbps (i.e. a much higher bit rate is necessary in a « frame-independent » coder to increase the quality in both clean channel and frame loss conditions) → extreme example = G.711 at 64 kbps.
- The best quality in frame loss conditions was achieved by using a low-rate CELP coder with added redundancy and delay (G.729-4), with a total bit rate close to iLBC (16 kbps compared to 15.2 kbps).
- The approaches studied to increase robustness represent only a subset of all possible combinations. Only solutions based on a standard CELP coder (G.729) were considered, some of them not optimal (e.g. G.729-2). Improved results could be expected by designing a solution without the constraint of using standard core codecs.
- The G.729 RTP payload can already support solutions G and G
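
The packet layouts described above lend themselves to a quick numerical check. The following is a minimal sketch, not the authors' implementation: it assumes 80-bit G.729 frames (8 kbps over 10 ms), 61-bit reduced frames F' (19 bits fewer), and a decoder that waits for the next packet before decoding. The function names `payload_kbps`, `build_packet` and `unrecoverable_frames` are hypothetical, and only the layout where each packet carries copies of the previous packet's two frames is modelled.

```python
FRAME_MS = 10                    # G.729 frame length
PACKET_MS = 20                   # packet duration used in the poster
FULL_BITS = 8 * FRAME_MS         # 8 kbps x 10 ms = 80 bits per frame F
REDUCED_BITS = FULL_BITS - 19    # F' drops 18 LSF bits + 1 pitch parity bit


def payload_kbps(n_full, n_reduced=0):
    """Payload bit rate of one 20-ms packet carrying n_full frames F
    and n_reduced reduced frames F'."""
    bits = n_full * FULL_BITS + n_reduced * REDUCED_BITS
    return bits / PACKET_MS      # bits per millisecond equals kbps


def build_packet(k, redundancy=False):
    """Frame indices carried by packet P(k): primary frames 2k and 2k+1,
    plus (optionally) copies of the two frames of packet P(k-1)."""
    primary = [2 * k, 2 * k + 1]
    redundant = [2 * k - 2, 2 * k - 1] if redundancy else []
    return {"primary": primary, "redundant": redundant}


def unrecoverable_frames(n_packets, lost, redundancy):
    """Frames for which neither a primary nor a redundant copy arrives.
    Assumption: the decoder waits for P(k+1) before decoding P(k)."""
    received = set()
    for k in range(n_packets):
        if k in lost:
            continue
        packet = build_packet(k, redundancy)
        received.update(packet["primary"])
        received.update(f for f in packet["redundant"] if f >= 0)
    return sorted(set(range(2 * n_packets)) - received)


if __name__ == "__main__":
    print("two frames per packet:      ", payload_kbps(2), "kbps")
    print("two frames + two F' copies: ", payload_kbps(2, 2), "kbps")
    print("two frames + two full copies:", payload_kbps(4), "kbps")
    print("single loss, no redundancy: ", unrecoverable_frames(5, {2}, False))
    print("single loss, redundancy:    ", unrecoverable_frames(5, {2}, True))
    print("double loss, redundancy:    ", unrecoverable_frames(5, {2, 3}, True))
```

Under these assumptions, the full-copy layout comes out at the 16 kbps quoted in the conclusions, and an isolated packet loss leaves no frame unrecoverable, while two consecutive losses still lose the two frames of the first missing packet. The bit rate obtained for the F' layout is derived from the stated 19-bit reduction and is not a figure given on the poster.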