Speech-Coding Techniques, Chapter 3 (Internet Telephony)


Presentation transcript:

Speech-Coding Techniques Chapter 3

Internet Telephony 3-2 Introduction. Efficient speech-coding techniques offer advantages for VoIP. Speech is carried as digital streams of ones and zeros; in general, the lower the bandwidth, the lower the quality. RTP payload types. Processing power: better quality for a given bandwidth requires a more complex algorithm, so codec choice is a balance between quality and cost.

Internet Telephony 3-3 Voice Quality. Bandwidth is easily quantified; voice quality is subjective. MOS (Mean Opinion Score), ITU-T Recommendation P.800: Excellent = 5, Good = 4, Fair = 3, Poor = 2, Bad = 1. A minimum of 30 people listen to voice samples or take part in conversations.

Internet Telephony 3-4 P.800 recommendations cover the selection of participants, the test environment, explanations to listeners, and analysis of results. Toll quality: a MOS of 4.0 or higher.
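The scoring described on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the P.800 procedure itself: the function names `mean_opinion_score` and `is_toll_quality` are hypothetical, and the only rules encoded are the ones stated on the slides (ratings from 1 to 5, at least 30 listeners, toll quality at 4.0 or higher).

```python
from statistics import mean

# Rating labels as listed on the slide.
SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mean_opinion_score(ratings, min_listeners=30):
    """Average the listener ratings; reject panels that are too small."""
    if len(ratings) < min_listeners:
        raise ValueError("need at least %d listeners" % min_listeners)
    if any(r not in SCALE for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


def is_toll_quality(mos):
    """Toll quality is a MOS of 4.0 or higher."""
    return mos >= 4.0


if __name__ == "__main__":
    panel = [5] * 10 + [4] * 15 + [3] * 5   # 30 hypothetical listeners
    mos = mean_opinion_score(panel)
    print(round(mos, 2), is_toll_quality(mos))
```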

Internet Telephony 3-5 Subjective and objective quality-testing techniques. PSQM (Perceptual Speech Quality Measurement), ITU-T P.861: aims to faithfully represent human judgement and perception through an algorithmic comparison between the output signal and a known input. Factors considered: type of speaker, loudness, delay, active/silence frames, clipping, environmental noise.

Internet Telephony 3-6 A Little About Speech. Speech is produced by air pushed from the lungs past the vocal cords and along the vocal tract. The basic vibrations originate at the vocal cords; the sound is then altered by the disposition of the vocal tract (tongue and mouth). Model the vocal tract as a filter whose shape changes relatively slowly, and the vibrations at the vocal cords as the excitation signal.

Internet Telephony 3-7 Speech sounds: voiced sound. The vocal cords vibrate open and closed, interrupting the air flow and producing quasi-periodic pulses of air. The rate of the opening and closing is the pitch. Voiced speech shows a high degree of periodicity at the pitch period, 2-20 ms.

Internet Telephony 3-8 Voiced speech Power spectrum density

Internet Telephony 3-9 Unvoiced sounds. Produced by forcing air at high velocities through a constriction while the glottis is held open. The result is noise-like turbulence that shows little long-term periodicity, although short-term correlations are still present.

Internet Telephony 3-10 unvoiced speech Power spectrum density

Internet Telephony 3-11 Plosive sounds. A complete closure in the vocal tract; air pressure is built up and released suddenly. Speech comprises a vast array of sounds, yet the speech signal is relatively predictable over time, so the reduction of transmission bandwidth can be significant.

Internet Telephony 3-12 Voice Sampling. Analog-to-digital (A-to-D) conversion takes discrete samples of the waveform and represents each sample by some number of bits. A signal can be reconstructed if it is sampled at a minimum of twice its maximum frequency. Human speech is sampled at 8000 samples per second, which by that rule captures frequencies up to 4000 Hz. Each sample is encoded into an 8-bit PCM code word, giving 8000 x 8 bit/s = 64 kbit/s.
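The sampling arithmetic on this slide can be checked with a short Python sketch. The helper names `nyquist_rate` and `pcm_bit_rate` are hypothetical; the numbers are the ones given above.

```python
def nyquist_rate(max_freq_hz):
    """Minimum sampling rate: twice the highest frequency in the signal."""
    return 2 * max_freq_hz


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of a PCM stream in bits per second."""
    return sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    print(nyquist_rate(4000))        # sampling rate for a 4000 Hz band
    print(pcm_bit_rate(8000, 8))     # 8000 samples/s at 8 bits each
```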

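Internet Telephony 3-13 Quantization. How many bits are used to represent each sample? Quantization noise is the difference between the actual level of the input analog signal and the quantized level that represents it. More bits reduce the noise, with diminishing returns. With uniform quantization levels, louder talkers sound better: 11.2 quantized to 11 and 2.2 quantized to 2 have the same absolute error, but the relative error is much larger for the quieter signal.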
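The 11.2/11 versus 2.2/2 comparison can be reproduced with a small sketch. It assumes a uniform quantizer with a step size of 1, which matches the slide's numbers; the function names are illustrative only.

```python
def quantize_uniform(x, step=1.0):
    """Round x to the nearest multiple of step (uniform quantization)."""
    return step * round(x / step)


def relative_error(x, step=1.0):
    """Quantization error as a fraction of the signal level."""
    return abs(x - quantize_uniform(x, step)) / abs(x)


if __name__ == "__main__":
    for level in (11.2, 2.2):
        print(level, quantize_uniform(level), round(relative_error(level), 4))
```

Both inputs are off by 0.2 after rounding, but that error is a far larger share of the quiet signal, which is why uniform steps favour loud talkers.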

Internet Telephony 3-14 Non-uniform quantization. Smaller quantization steps at smaller signal levels spread the signal-to-noise ratio more evenly across loud and quiet signals.
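One common way to get smaller steps at small signal levels is logarithmic companding. The sketch below uses the continuous mu-law curve with mu = 255 on samples normalized to the range -1 to 1. It is an illustration of the idea only: it is not the segmented encoding that G.711 actually specifies, and the function names and the 8-bit layout (sign plus 127 magnitude levels) are assumptions made for the example.

```python
import math

MU = 255  # companding constant of the mu-law curve


def mu_compress(x):
    """Map x in [-1, 1] onto [-1, 1] with finer resolution near zero."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def quantize_mu_law(x, bits=8):
    """Compress, quantize uniformly in the compressed domain, then expand."""
    levels = 2 ** (bits - 1) - 1
    code = round(mu_compress(x) * levels)
    return mu_expand(code / levels)


def quantize_uniform(x, bits=8):
    """Plain uniform quantizer with the same number of levels."""
    levels = 2 ** (bits - 1) - 1
    return round(x * levels) / levels


if __name__ == "__main__":
    quiet = 0.01
    print(abs(quantize_uniform(quiet) - quiet))  # error with uniform steps
    print(abs(quantize_mu_law(quiet) - quiet))   # error with companding
```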

Internet Telephony 3-15 DTX and Comfort Noise. DTX is Discontinuous Transmission. A voice activity detector (VAD) detects whether there is active speech. When there is no active speech, different DTX procedures can be used: no transmission at all; Comfort Noise (CN) using RFC 3389; or comfort noise built into the codec, like the AMR SID (Silence Descriptor). The frequency of comfort-noise packets varies but is usually some fraction of the normal packet rate.
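A toy version of the VAD plus DTX decision might look like the following. Everything here is an assumption made for illustration: the slide does not say how the VAD works, so a simple frame-energy threshold stands in for it, and the policy shown (one SID at the start of each silent period, then nothing) is only one of the procedures listed above.

```python
def dtx_schedule(frames, threshold=1e-3):
    """Decide what to send for each frame of samples.

    Returns a list with "SPEECH" for active frames, "SID" for the first
    frame of a silent period, and "NONE" for the rest of that period.
    """
    decisions = []
    in_silence = False
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame)  # mean power
        if energy >= threshold:
            decisions.append("SPEECH")
            in_silence = False
        elif not in_silence:
            decisions.append("SID")      # describe the background noise once
            in_silence = True
        else:
            decisions.append("NONE")     # stay silent
    return decisions


if __name__ == "__main__":
    frames = [[0.5, -0.5], [0.0, 0.0], [0.0, 0.0], [0.3, 0.3]]
    print(dtx_schedule(frames))
```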

Internet Telephony 3-16 Types of Speech Coders. Waveform codecs: sample and code the waveform; high quality and not complex, but they use a large amount of bandwidth. Source codecs (vocoders): match the incoming signal to a mathematical model, namely a linear-predictive filter model of the vocal tract with a voiced/unvoiced flag for the excitation. The model information is sent rather than the signal. Low bit rates, but the speech sounds synthetic, and higher bit rates do not improve quality much.

Internet Telephony 3-17 Hybrid codecs. Attempt to provide the best of both: perform a degree of waveform matching and utilize the sound production model. Quite good quality at low bit rates.

Internet Telephony 3-18 G.711. The most commonplace codec, used in the circuit-switched telephone network. PCM (Pulse-Code Modulation). With uniform quantization: 12 bits x 8 k samples/s = 96 kbps. With non-uniform quantization: 64 kbps, the DS0 rate. mu-law is used in North America; A-law is used in other countries and is a little friendlier to lower signal levels. An MOS of about 4.3.

Internet Telephony 3-19 DPCM (Differential PCM). Only the difference between the predicted value and the actual value is transmitted. Voice changes relatively slowly, so it is possible to predict the value of a sample based on the values of previous samples; the receiver performs the same prediction. The simplest form uses no prediction. No algorithmic delay.
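The encoder and decoder symmetry described above can be sketched as follows. This is a minimal, lossless illustration that assumes the predictor is simply the previous sample and that the difference is sent without quantization; a real codec would quantize the difference, and the function names are hypothetical.

```python
def dpcm_encode(samples):
    """Send each sample as its difference from the predicted value."""
    prediction = 0
    diffs = []
    for sample in samples:
        diffs.append(sample - prediction)
        prediction = sample          # assumed predictor: the previous sample
    return diffs


def dpcm_decode(diffs):
    """Run the same prediction at the receiver and add the differences."""
    prediction = 0
    samples = []
    for diff in diffs:
        sample = prediction + diff
        samples.append(sample)
        prediction = sample
    return samples


if __name__ == "__main__":
    original = [100, 102, 105, 104, 101]
    encoded = dpcm_encode(original)
    print(encoded)
    print(dpcm_decode(encoded) == original)
```

After the first sample the differences are small numbers, which is what lets them be coded with fewer bits than the samples themselves.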

Internet Telephony 3-20 ADPCM (Adaptive DPCM). Predicts sample values based on past samples, factoring in some knowledge of how speech varies over time. The error is quantized and transmitted, so fewer bits are required. G kbps. G.726: A-law/mu-law PCM -> 16, 24, 32, 40 kbps. An MOS of about 4.0 at 32 kbps.
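A toy adaptive scheme shows how the quantizer can adjust itself while the decoder stays in step without any side information. This is a sketch under loud assumptions: the previous-reconstruction predictor, the initial step of 16, and the step-adaptation rule (double on large codes, halve on small ones) are invented for the example and are not the G.726 algorithm.

```python
def _next_step(step, code, max_code):
    """Hypothetical adaptation rule shared by encoder and decoder."""
    if abs(code) >= max_code - 1:
        step *= 2.0       # signal is changing fast: coarser steps
    elif abs(code) <= 1:
        step *= 0.5       # signal is changing slowly: finer steps
    return min(max(step, 1.0), 2048.0)


def adpcm_encode(samples, bits=4):
    """Return (codes, reconstruction) for a list of sample values."""
    max_code = 2 ** (bits - 1) - 1
    step, prediction = 16.0, 0.0
    codes, reconstruction = [], []
    for sample in samples:
        code = int(round((sample - prediction) / step))
        code = max(-max_code, min(max_code, code))   # fit the code in `bits`
        prediction += code * step                    # what the decoder will see
        codes.append(code)
        reconstruction.append(prediction)
        step = _next_step(step, code, max_code)
    return codes, reconstruction


def adpcm_decode(codes, bits=4):
    """Rebuild the signal from the codes alone."""
    max_code = 2 ** (bits - 1) - 1
    step, prediction = 16.0, 0.0
    output = []
    for code in codes:
        prediction += code * step
        output.append(prediction)
        step = _next_step(step, code, max_code)
    return output


if __name__ == "__main__":
    codes, local = adpcm_encode([0, 40, 90, 120, 100, 60, 10, -30])
    print(codes)
    print(adpcm_decode(codes) == local)
```

At 8000 samples per second, 2, 3, 4, or 5 bits per code correspond to the 16, 24, 32, and 40 kbps rates listed for G.726.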

Internet Telephony 3-21 Analysis-by-Synthesis (AbS) Codecs. Hybrid codecs that fill the gap between waveform and source codecs; the most successful and commonly used are time-domain AbS codecs. Not a simple two-state voiced/unvoiced model: different excitation signals are attempted, and the one that produces output closest to the original waveform is selected. Examples: MPE (Multi-Pulse Excited), RPE (Regular-Pulse Excited), CELP (Code-Excited Linear Predictive).

Internet Telephony 3-22 G.728 LD-CELP. CELP codecs: a filter whose characteristics change over time, plus a codebook of acoustic vectors; a vector is a set of elements representing various characteristics of the excitation. Transmit the filter coefficients, the gain, and a pointer to the vector chosen. Low-Delay CELP: a backward-adaptive coder that uses previous samples to determine the filter coefficients; operates on five samples at a time; delay < 1 ms; only the pointer is transmitted.
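The analysis-by-synthesis idea (try each candidate excitation, synthesize, keep the closest match) can be sketched as an exhaustive codebook search. This is an illustration only: the all-pole filter, the tiny three-entry codebook, and the plain squared-error measure are assumptions for the example, whereas real CELP coders use a perceptually weighted error and much larger codebooks.

```python
def synthesize(excitation, coeffs):
    """All-pole synthesis filter: y[n] = x[n] + sum_k coeffs[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        feedback = sum(a * out[n - 1 - k]
                       for k, a in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(x + feedback)
    return out


def search_codebook(target, codebook, coeffs):
    """Return (index, gain, error) of the entry that best matches target."""
    best = None
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, coeffs)
        energy = sum(v * v for v in synth)
        if energy == 0:
            continue
        # Optimal gain for this vector in the least-squares sense.
        gain = sum(t * v for t, v in zip(target, synth)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


if __name__ == "__main__":
    codebook = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, -1, 1, -1, 1]]
    coeffs = [0.5]                       # hypothetical one-tap filter
    target = [2 * y for y in synthesize(codebook[2], coeffs)]
    print(search_codebook(target, codebook, coeffs))
```

Only the winning index and gain need to be sent; the decoder holds the same codebook and runs the same synthesis filter.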

Internet Telephony LD-CELP encoder. Vectors in the codebook are selected by a 10-bit pointer (index), which can address up to 2^10 = 1024 entries. Bit rate: 16 kbps. The encoder minimizes a frequency-weighted mean-square error.
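The LD-CELP figures fit together arithmetically, as the short sketch below shows. It assumes an 8000 Hz sampling rate (not stated on this slide) together with the five-sample vectors and 10-bit index given in the text; the function name is hypothetical.

```python
def ld_celp_numbers(samples_per_vector=5, index_bits=10, sample_rate_hz=8000):
    """Derive codebook size, vector duration, and bit rate."""
    vectors_per_second = sample_rate_hz / samples_per_vector
    return {
        "codebook_size": 2 ** index_bits,
        "vector_duration_ms": 1000 * samples_per_vector / sample_rate_hz,
        "bit_rate_bps": vectors_per_second * index_bits,
    }


if __name__ == "__main__":
    print(ld_celp_numbers())
```

Five samples at 8000 Hz last well under 1 ms, which is where the low delay comes from, and one 10-bit index per vector gives the 16 kbps rate.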

Internet Telephony 3-24 LD-CELP decoder. An MOS of about 3.9 at one-quarter of the G.711 bandwidth.

Internet Telephony 3-25 G ACELP. Bit rates of 6.3 or 5.3 kbps; both are mandatory, and the coder can change from one to the other during a conversation. The coder takes a band-limited input speech signal sampled at 8 kHz with 16-bit uniform PCM quantization. It operates on blocks of 240 samples at a time (30 ms), with a look-ahead of 7.5 ms, for a total algorithmic delay of 37.5 ms plus other delays. A high-pass filter removes any DC component.

Internet Telephony 3-26 Various operations determine the appropriate filter coefficients. At 5.3 kbps: Algebraic Code-Excited Linear Prediction. At 6.3 kbps: Multi-pulse Maximum Likelihood Quantization. Transmitted: linear prediction coefficients, gain parameters, and the excitation codebook index. 24-octet frames at 6.3 kbps, 20-octet frames at 5.3 kbps.

Internet Telephony 3-27 G Annex A. Silence Insertion Description (SID) frames of size four octets. The two least significant bits (lsbs) of the first octet indicate the frame type: 00 = 6.3 kbps, 24 octets/frame; 01 = 5.3 kbps, 20 octets/frame; 10 = SID frame, 4 octets/frame. An MOS of about 3.8. At least 37.5 ms delay.
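A receiver could use the two-bit field above to work out how long each frame is. The sketch below encodes only the three values listed on the slide; the function name `frame_info` and the choice to reject any other value are assumptions made for the example.

```python
# Frame type table from the slide: two lsbs -> (description, octets per frame).
FRAME_TYPES = {
    0b00: ("6.3 kbps", 24),
    0b01: ("5.3 kbps", 20),
    0b10: ("SID", 4),
}


def frame_info(first_octet):
    """Classify a frame from the two least significant bits of its first octet."""
    code = first_octet & 0b11
    if code not in FRAME_TYPES:
        raise ValueError("frame type %d is not listed" % code)
    return FRAME_TYPES[code]


if __name__ == "__main__":
    print(frame_info(0b10101101))   # low bits 01
    print(frame_info(0x02))         # low bits 10
```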

Internet Telephony 3-28 G kbps. Input frames of 10 ms (80 samples at an 8 kHz sampling rate) with a 5 ms look-ahead give an algorithmic delay of 15 ms. An 80-bit frame for each 10 ms of speech. A complex codec. G.729.A (Annex A) makes a number of simplifications while keeping the same frame structure, so G.729 and G.729.A encoders and decoders can be mixed, at slightly lower quality.
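The frame figures quoted for these frame-based coders follow from three small formulas, sketched here with hypothetical helper names. The inputs in the example are the values given on the slides.

```python
def samples_per_frame(frame_ms, sample_rate_hz=8000):
    """Number of samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def algorithmic_delay_ms(frame_ms, lookahead_ms):
    """The coder must buffer a whole frame plus its look-ahead."""
    return frame_ms + lookahead_ms


def bit_rate_bps(bits_per_frame, frame_ms):
    """Bits per frame divided by the frame duration."""
    return bits_per_frame * 1000 / frame_ms


if __name__ == "__main__":
    print(samples_per_frame(10))            # 10 ms frame at 8 kHz
    print(algorithmic_delay_ms(10, 5))      # 10 ms frame, 5 ms look-ahead
    print(algorithmic_delay_ms(30, 7.5))    # 30 ms frame, 7.5 ms look-ahead
    print(bit_rate_bps(80, 10))             # 80-bit frame every 10 ms
```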

Internet Telephony 3-29 G.729.B. VAD (Voice Activity Detection): based on analysis of several parameters of the input, using the current frame plus the two preceding frames. DTX (Discontinuous Transmission): send nothing or send an SID frame; the SID frame contains information to generate comfort noise. CNG (Comfort Noise Generation). G.729: an MOS of about 4.0. G.729A: an MOS of about 3.7.

Internet Telephony 3-30 G.729 Annex D: a lower-rate extension. 6.4 kbps; 10 ms speech samples, 64 bits/frame. MOS ≈ 6.3 kbps G. G.729 Annex E: a higher-bit-rate enhancement. The linear prediction filter of G.729 has 10 coefficients; that of Annex E has 30. The codebook of G.729 has 35 bits; that of Annex E has 44 bits. 118 bits/frame; 11.8 kbps.

Internet Telephony 3-31 Other Codecs. CDMA QCELP, defined in IS-733: a variable-rate coder. The two most common rates are the high rate, 13.3 kbps, and a lower rate, 6.2 kbps. Silence suppression. For use with RTP: RFC 2658.

Internet Telephony 3-32 GSM Enhanced Full-Rate (EFR), GSM. An enhanced version of GSM Full-Rate; an ACELP-based codec with the same bit rate and the same overall packing structure, 12.2 kbps. Supports discontinuous transmission. For use with RTP: RFC 1890.

Internet Telephony 3-33 GSM Adaptive Multi-Rate (AMR) codec. 20 ms coding delay. Eight different modes, 4.75 kbps to 12.2 kbps: the 12.2 kbps mode is GSM EFR, and the 7.4 kbps mode is IS-641 (TDMA cellular systems). The mode can change at any time. Offers discontinuous transmission: the SID (Silence Descriptor) is sent in every 8th frame and is 5 bytes in size. The coding choice of many 3G wireless networks.
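The saving from discontinuous transmission can be estimated from the SID figures above. This sketch assumes 20 ms frames, reading the slide's 20 ms figure as the frame length, and counts payload bits only (no packet headers); the function name is hypothetical.

```python
def sid_bit_rate_bps(sid_bytes=5, every_nth_frame=8, frame_ms=20):
    """Average payload bit rate during silence when only SIDs are sent."""
    interval_ms = every_nth_frame * frame_ms     # time between SID frames
    return sid_bytes * 8 * 1000 / interval_ms


if __name__ == "__main__":
    silence = sid_bit_rate_bps()
    print(silence)                 # bits per second during silence
    print(silence / 4750)          # fraction of the lowest speech mode
```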

Internet Telephony 3-34 The MOS values are for laboratory conditions. G.711 does not deal with lost packets. G.729 can accommodate a lost frame by interpolating from previous frames, but this causes errors in subsequent speech frames. Processing power: G.728 or G.729, 40 MIPS; G MIPS.

Internet Telephony 3-35 iLBC: a free codec for robust VoIP. kbit/s with an encoding frame length of 30 ms and kbps with a frame length of 20 ms. Computational complexity in the range of G.729A.

Internet Telephony 3-36 Speex. Open-source, patent-free speech codec; a CELP (code-excited linear prediction) codec. Operating modes: narrowband (8 kHz sampling rate), 2.15 to 24.6 kb/s, delay of 30 ms; wideband (16 kHz sampling rate), kb/s, delay of 34 ms; ultra-wideband (32 kHz sampling rate). Intensity stereo encoding; variable bit rate (VBR) possible; voice activity detection (VAD).

Internet Telephony 3-37 Cascaded Codecs. E.g., a G.711 stream passed through a G.729 encoder/decoder might not even come close to G.729 quality, because each coder only generates an approximation of the incoming signal. Audio samples.

Internet Telephony 3-38 Effects of packetization

Internet Telephony 3-39 Tones, Signals, and DTMF Digits. The hybrid codecs are optimized for human speech, but other data may need to be transmitted: tones (fax tones, dialing tone, busy tone) and DTMF digits for two-stage dialing or voicemail. G.711 is OK; G and G.729 can be unintelligible. The ingress gateway needs to intercept the tones and DTMF digits and use an external signaling system.

Internet Telephony 3-40 Intercepting is easy at the start of a call but difficult in the middle of a call. Alternative: encode the tones differently from the speech and send them along the same media path. An RTP packet provides the name of the tone and the duration; or, with a dynamic RTP profile, an RTP packet contains the frequency, volume, and duration. RFC 2198: an RTP payload format for redundant audio data, sending both types of RTP payload.

Internet Telephony 3-41 RTP Payload Format for DTMF Digits. An Internet Draft covering both methods described before. A large number of tones and events: DTMF digits, a busy tone, a congestion tone, a ringing tone, etc. The named events: E marks the end of the tone; R is reserved.

Internet Telephony 3-42 Payload format
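The named-event payload can be sketched with Python's `struct` module. The slide's diagram is not reproduced in the transcript, so the layout here is an assumption: it follows the format later published as RFC 2833 (an 8-bit event code, the E bit, the R bit, a 6-bit volume, and a 16-bit duration), with DTMF digits 0-9 mapped to events 0-9, `*` to 10, `#` to 11, and A-D to 12-15. The function names are hypothetical.

```python
import struct

# Event codes for DTMF digits (assumed RFC 2833 numbering).
DTMF_EVENTS = {str(d): d for d in range(10)}
DTMF_EVENTS.update({"*": 10, "#": 11, "A": 12, "B": 13, "C": 14, "D": 15})


def pack_event(digit, end, volume, duration):
    """Build the 4-octet payload: event | E R volume | duration."""
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags = (0x80 if end else 0) | volume     # R bit (0x40) left as zero
    return struct.pack("!BBH", DTMF_EVENTS[digit], flags, duration)


def unpack_event(payload):
    """Split a 4-octet payload back into its fields."""
    event, flags, duration = struct.unpack("!BBH", payload)
    return {
        "event": event,
        "end": bool(flags & 0x80),
        "reserved": bool(flags & 0x40),
        "volume": flags & 0x3F,
        "duration": duration,
    }


if __name__ == "__main__":
    payload = pack_event("5", end=True, volume=10, duration=800)
    print(payload.hex())
    print(unpack_event(payload))
```

The duration is counted in RTP timestamp units, so 800 units at an 8000 Hz clock would correspond to 100 ms of tone.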