Philippe Gournay, Bruno Bessette, Roch Lefebvre

Presentation transcript:

Universal Speech and Audio Codec: Linear Prediction Domain processing. Philippe Gournay, Bruno Bessette, Roch Lefebvre. Université de Sherbrooke, Département de Génie Electrique et Informatique, Sherbrooke, Québec, Canada.

Outline. The 3GPP AMR-WB+ Standard: source of inspiration for LPD processing in USAC. Changes brought to LPD processing: Forward Aliasing Cancellation, Frequency-Domain Noise Shaping, other changes. Conclusion: more efficient LPD processing; better unification of LPD and non-LPD FD coders.

Context: the 3GPP AMR-WB+ Standard. Hybrid codec: time-domain (ACELP) and frequency-domain (TCX) coding. Very efficient on speech and speech-over-music content.

The AMR-WB+ Encoder. [Block diagram. Blocks: Mode Selection, ACELP (1 frame), TCX (1, 2 or 4 frames), Packetization. Signals: Audio (input), Mode Index, ISF, Bitstream (output).]

AMR-WB+ Frame Structure. One super-frame = 1024 samples, coded with ACELP, short TCX, medium TCX and long TCX frames. [Figure: panels (a), (b) and (c) show three out of the 26 possible ACELP/TCX coding configurations.]
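
The count of 26 configurations can be reproduced with a short enumeration. The sketch below assumes (this is not stated on the slide) that a super-frame is split into four 256-sample frames, that a medium TCX covers one aligned pair of frames, and that a long TCX covers all four. The mode names and nominal lengths are illustrative only.

```python
from itertools import product

# Nominal lengths in samples. Assumption: a 1024-sample super-frame holds
# four 256-sample frames; the mode names are illustrative only.
LENGTH = {"ACELP": 256, "TCX_SHORT": 256, "TCX_MEDIUM": 512, "TCX_LONG": 1024}


def half_configs():
    """All ways to code one half super-frame (two consecutive frames)."""
    one_frame_modes = ("ACELP", "TCX_SHORT")
    configs = [list(pair) for pair in product(one_frame_modes, repeat=2)]
    configs.append(["TCX_MEDIUM"])  # one medium TCX spans both frames
    return configs


def superframe_configs():
    """All ways to code a full super-frame under the assumptions above."""
    halves = half_configs()
    configs = [first + second for first, second in product(halves, halves)]
    configs.append(["TCX_LONG"])  # one long TCX spans all four frames
    return configs


if __name__ == "__main__":
    all_configs = superframe_configs()
    print(len(all_configs))  # 5 * 5 + 1 configurations
    for config in all_configs[:3]:
        print(config)
```

Each half super-frame has 4 combinations of one-frame modes plus 1 medium TCX, so the two halves give 5 x 5 = 25 configurations, and the long TCX adds one more.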

Transitions from ACELP to TCX. The zero-input response (ZIR) of the LPC weighting filter provides pseudo-windowing. [Figure: an ACELP frame followed by the decoded TCX window, with 1/8 overlap.]

Transitions from TCX to ACELP. Redundant windowed TCX samples are discarded. [Figure: the decoded TCX window followed by an ACELP frame, with 1/8 overlap.]

Limitations of the AMR-WB+ model. Non-critically sampled transforms (FFT vs. MDCT). Inefficiencies at transitions between modes: sub-optimal windowing (from ACELP to TCX) and discarded samples (from TCX to ACELP). Transform windows not aligned with the ACELP grid: the LPC analysis window is also shifted to the right. Even worse when switching with AAC: Time-Domain Aliasing Cancellation (TDAC), transitions between LPD and non-LPD processing.
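
To illustrate what critical sampling and TDAC mean for an MDCT, here is a minimal pure-Python sketch. It is not the codec's transform: the sine window, the frame size and the function names are assumptions chosen for illustration. Each frame of 2n windowed samples yields only n coefficients, and the aliasing in each decoded frame cancels when overlapping frames are added.

```python
import math


def sine_window(n):
    """Sine window of length 2n; satisfies w[i]^2 + w[i+n]^2 = 1."""
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]


def mdct(frame, window):
    """Windowed MDCT: 2n input samples -> n coefficients."""
    n = len(frame) // 2
    return [
        sum(window[i] * frame[i]
            * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: n coefficients -> 2n time-aliased samples."""
    n = len(coeffs)
    return [
        window[i] * (2.0 / n)
        * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
              for k in range(n))
        for i in range(2 * n)
    ]


def analyze_synthesize(signal, n):
    """Code the signal with hop size n and overlap-add the decoded frames."""
    window = sine_window(n)
    output = [0.0] * len(signal)
    num_coeffs = 0
    for start in range(0, len(signal) - 2 * n + 1, n):
        coeffs = mdct(signal[start:start + 2 * n], window)
        num_coeffs += len(coeffs)
        for i, value in enumerate(imdct(coeffs, window)):
            output[start + i] += value
    return output, num_coeffs


if __name__ == "__main__":
    n = 8
    signal = [math.sin(0.3 * i) + 0.05 * i for i in range(4 * n)]
    decoded, count = analyze_synthesize(signal, n)
    # Samples covered by two overlapping frames are reconstructed.
    error = max(abs(signal[i] - decoded[i]) for i in range(n, 3 * n))
    print(count, error)
```

The first and last n samples are covered by a single frame only, so they keep their windowing and aliasing. This is the situation that arises at a switch to or from a coder that does not provide the overlapping MDCT frame.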

Changes brought to the LPD processing: replaced FFTs by MDCTs; introduced Frequency Domain Noise Shaping; introduced Forward Aliasing Cancellation; other changes.

Frequency Domain Noise Shaping. To unify the processing of AAC and TCX frames, the MDCT in TCX is applied in the original signal domain. Noise shaping for TCX frames is performed in the MDCT domain, based on LPC filters mapped to the MDCT domain. FDNS allows a smooth (sample-by-sample) time-domain noise envelope by applying first-order filtering to the MDCT coefficients (similar in principle to TNS).
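
One way to picture "LPC filters mapped to the MDCT domain" is to sample the magnitude response of a weighted LPC filter at the centre of each band. The sketch below is illustrative only: the weighting W(z) = A(z/gamma), the value gamma = 0.92, the band layout and the function names are all assumptions, not details taken from the slides.

```python
import cmath
import math


def weighted_lpc(a, gamma):
    """Coefficients of A(z/gamma) given A(z) = a[0] + a[1] z^-1 + ..."""
    return [c * gamma ** i for i, c in enumerate(a)]


def band_gains(a, num_bands, gamma=0.92):
    """Gains 1/|W(e^jw)| at band centres, with W(z) = A(z/gamma).

    Assumption: band m is centred at w = pi * (m + 0.5) / num_bands.
    """
    aw = weighted_lpc(a, gamma)
    gains = []
    for m in range(num_bands):
        omega = math.pi * (m + 0.5) / num_bands
        response = sum(c * cmath.exp(-1j * omega * i) for i, c in enumerate(aw))
        gains.append(1.0 / abs(response))
    return gains


if __name__ == "__main__":
    # Hypothetical first-order LPC filter with a low-pass spectral envelope.
    gains = band_gains([1.0, -0.9], num_bands=64)
    print(gains[0], gains[-1])
```

With gains of this kind, scaling a spectrum quantized with flat noise makes the noise follow the weighted spectral envelope: larger where the signal is strong, smaller where it is weak.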

[Figure: Effect of FDNS on the spectral shape and the time-domain envelope of the noise. Axes: time (n), with positions A, B and C, and frequency (k or m). Noise gains g1[m] are calculated at time position A and noise gains g2[m] at time position B; the interpolated gains are seen in the time domain, for each of the M bands.]
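
The interpolation between the two gain sets can be sketched as a first-order recursive filter that runs along the frequency axis. The slides do not give the filter coefficients; the formulas for `a` and `b` below are an assumption chosen for illustration, picked so that equal gains reduce to plain per-band scaling. The function name and the uniform band layout are also hypothetical.

```python
def fdns_filter(coeffs, g1, g2):
    """Filter MDCT coefficients along frequency with per-band gains.

    Assumed (illustrative) coefficients for band m:
        a = 2 * g1[m] * g2[m] / (g1[m] + g2[m])
        b = (g2[m] - g1[m]) / (g1[m] + g2[m])
    and the recursion y[k] = a * x[k] + b * y[k - 1].
    Gains must be positive; bands are assumed to be of equal width.
    """
    num_bands = len(g1)
    band_size = max(1, len(coeffs) // num_bands)
    output = []
    previous = 0.0
    for k, x in enumerate(coeffs):
        m = min(k // band_size, num_bands - 1)
        a = 2.0 * g1[m] * g2[m] / (g1[m] + g2[m])
        b = (g2[m] - g1[m]) / (g1[m] + g2[m])
        previous = a * x + b * previous
        output.append(previous)
    return output


if __name__ == "__main__":
    spectrum = [1.0, 2.0, 3.0, 4.0]
    # Equal gains at both time positions: plain per-band scaling.
    print(fdns_filter(spectrum, [2.0, 0.5], [2.0, 0.5]))
    # Different gains: the recursion couples neighbouring coefficients.
    print(fdns_filter([1.0, 0.0, 0.0], [1.0], [3.0]))
```

Filtering along frequency corresponds to shaping along time, which is the same duality that TNS relies on; for positive gains the magnitude of `b` stays below 1, so the recursion is stable.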

Forward Aliasing Cancellation. Introduced to compensate for the windowing and time-domain aliasing in MDCT-coded frames when switching to and from ACELP frames. [Figure: the TCX frame output between the ACELP synthesis and the next ACELP frame, showing the windowing effect and time-domain aliasing at the transitions.]

Forward Aliasing Cancellation. FAC is applied in the original signal domain. FAC is quantized in the LPC weighted domain, so that the quantization noises of FAC and of the decoded MDCT are of the same nature. For transitions from ACELP to TCX, the ACELP synthesis can be taken into account; this reduces the bitrate needed to encode FAC.
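
The signal that FAC has to supply can be made concrete with an isolated MDCT frame, that is, a frame whose left neighbour is not MDCT-coded and so provides no overlapping frame to cancel the aliasing. The sketch below is a simplification under stated assumptions: a sine window, no quantization, no ACELP contribution, and hypothetical function names. It checks that the correction equals a windowing term plus a folded (time-reversed) aliasing term.

```python
import math


def sine_window(n):
    """Symmetric sine window of length 2n."""
    return [math.sin(math.pi * (i + 0.5) / (2 * n)) for i in range(2 * n)]


def mdct_roundtrip(frame, window):
    """Windowed MDCT then windowed inverse MDCT of one isolated frame."""
    n = len(frame) // 2

    def basis(i, k):
        return math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))

    coeffs = [sum(window[i] * frame[i] * basis(i, k) for i in range(2 * n))
              for k in range(n)]
    return [window[i] * (2.0 / n) * sum(coeffs[k] * basis(i, k) for k in range(n))
            for i in range(2 * n)]


def fac_target(frame, window):
    """Correction that restores the first half of the isolated frame."""
    n = len(frame) // 2
    decoded = mdct_roundtrip(frame, window)
    return [frame[i] - decoded[i] for i in range(n)]


def predicted_fac_target(frame, window):
    """Closed form: windowing effect plus folded time-domain aliasing."""
    n = len(frame) // 2
    return [
        (1.0 - window[i] ** 2) * frame[i]                  # windowing effect
        + window[i] * window[n - 1 - i] * frame[n - 1 - i]  # folded aliasing
        for i in range(n)
    ]


if __name__ == "__main__":
    n = 8
    window = sine_window(n)
    frame = [math.cos(0.4 * i) + 0.1 * i for i in range(2 * n)]
    measured = fac_target(frame, window)
    predicted = predicted_fac_target(frame, window)
    print(max(abs(m - p) for m, p in zip(measured, predicted)))
```

In this simplified setting the correction depends only on the samples of the transition region itself, which is why it can be computed at the encoder and sent forward to the decoder.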

Computation of FAC targets for transitions from and to ACELP (encoder). [Figure, four lines (Line 1 to Line 4), signal in the original domain. Labels: ACELP synthesis, TCX frame output, next ACELP frame; windowed and folded ACELP synthesis, windowed ACELP ZIR (ACELP contribution); TCX frame error (including ACELP contribution), ACELP error; FAC target; LPC1 and LPC2; signs + and -.]

Quantization of FAC targets. [Block diagrams. Transition from ACELP to TCX (LPC1): FAC target -> W1(z) -> DCT-IV -> Q (transmit to decoder) -> inverse DCT-IV -> 1/W1(z) -> FAC synthesis, with a 1/W1(z) ZIR branch; labels: filter memory (ACELP error), zero memory. Transition from TCX to ACELP (LPC2): FAC target -> W2(z) -> DCT-IV -> Q (transmit to decoder) -> inverse DCT-IV -> 1/W2(z) -> FAC synthesis; labels: filter memory (TCX frame error), zero memory.]
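
The transform and quantization stage of this chain can be sketched in a few lines. This is a simplified illustration, not the codec's quantizer: the weighting filters W(z) and 1/W(z) and their memories are left out, and the uniform scalar quantizer, its step size and the function names are assumptions (the slide only shows a block labelled Q).

```python
import math


def dct4(x):
    """Orthonormal DCT-IV; with this scaling it is its own inverse."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5))
                    for i in range(n))
        for k in range(n)
    ]


def quantize(coeffs, step):
    """Uniform scalar quantizer (assumed): integer indices to transmit."""
    return [round(c / step) for c in coeffs]


def dequantize(indices, step):
    """Map received indices back to coefficient values."""
    return [q * step for q in indices]


def code_fac_target(target, step):
    """DCT-IV, quantize, then inverse DCT-IV (the same transform)."""
    indices = quantize(dct4(target), step)
    synthesis = dct4(dequantize(indices, step))
    return indices, synthesis


if __name__ == "__main__":
    target = [0.8, -0.3, 0.5, 0.1, -0.6, 0.2, 0.0, 0.4]
    indices, synthesis = code_fac_target(target, step=0.1)
    error = sum((t - s) ** 2 for t, s in zip(target, synthesis))
    print(indices)
    print(error)
```

Because the transform is orthonormal, the squared error of the synthesis equals the squared quantization error of the coefficients, so it is bounded by the number of samples times (step / 2) squared.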

Other changes brought to the LPD processing. Critical sampling: MDCT vs. FFT, FAC+FDNS. Scalar quantizer + adaptive arithmetic coder for TCX (AMR-WB+ uses AVQ). Variable bit rate LPC quantizer. Bit reservoir adaptation.

Conclusion. USAC makes use of LPD and non-LPD processing: the LPD mode is inspired by AMR-WB+, and the non-LPD mode is derived from AAC. Substantial changes were brought to the LPD processing, and new tools were introduced to make it more efficient: Frequency Domain Noise Shaping (FDNS) and Forward Aliasing Cancellation (FAC). USAC is a real unification of two coding models.

Thank you for your attention!