Voice Sampling

Sampling Rate
Nyquist’s theorem states that a signal can be reconstructed if it is sampled at twice the maximum frequency of the signal. The speech frequency range is 300 – 3400 Hz, so for conversational speech the maximum is taken to be 4000 Hz. The sampling rate is therefore 8000 samples per second.
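
As a quick illustration, here is a minimal Python sketch of these numbers (the 3400 Hz tone and the 10 ms duration are arbitrary choices for the example, not part of any standard):

```python
import numpy as np

FS = 8000                             # 8 kHz: twice the 4 kHz band edge assumed for speech
DURATION = 0.010                      # 10 ms of signal, an arbitrary example length

t = np.arange(0, DURATION, 1.0 / FS)  # sample instants
tone = np.sin(2 * np.pi * 3400 * t)   # 3400 Hz tone: top of the speech band, below FS/2
print(len(t))                         # 80 samples per 10 ms at 8 kHz
```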

Quantization

Quantization Noise
When we use bits to represent each level, the number of bits used determines the number of levels, and the number of levels determines the accuracy of our representation of the original signal. The difference between the actual signal and the digital reproduction is known as quantization noise.
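
The relationship is exponential: B bits give 2^B levels. A small sketch, using the standard rule of thumb that each extra bit adds about 6 dB of signal-to-quantization-noise ratio for a full-scale sinusoid:

```python
def n_levels(bits):
    return 2 ** bits                 # B bits give 2^B quantization levels

def sqnr_db(bits):
    """Approximate SQNR of a uniform quantizer driven by a full-scale sine wave."""
    return 6.02 * bits + 1.76

for b in (8, 12, 16):
    print(b, n_levels(b), round(sqnr_db(b), 1))   # 8 -> 256 levels, ~49.9 dB, etc.
```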

Linear Quantization
Applicable when the signal lies in a finite range (f_min, f_max). The entire data range is divided into L equal intervals of length Q (known as the quantization interval or quantization step size), where Q = (f_max − f_min)/L. Interval i is mapped to the middle value of that interval, and we store/send only the index of the quantized value.
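
A minimal sketch of such a uniform quantizer (the function names are mine, not from any library):

```python
import numpy as np

def quantize(x, f_min, f_max, L):
    """Map each sample to the index of the uniform interval containing it."""
    Q = (f_max - f_min) / L                      # quantization step size
    idx = np.floor((x - f_min) / Q).astype(int)
    return np.clip(idx, 0, L - 1)                # keep x == f_max inside the top interval

def dequantize(idx, f_min, f_max, L):
    """Reconstruct each index as the midpoint of its interval."""
    Q = (f_max - f_min) / L
    return f_min + (idx + 0.5) * Q
```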

Signal Range is Symmetric

Errors
Errors occur on every sample except where the sample value coincides exactly with the midpoint of a decision level. If smaller steps are taken, the quantization error will be less; however, increasing the number of steps complicates the coding operation and increases bandwidth requirements. Quantization noise depends on the step size, not on the signal amplitude.
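
This bound is easy to check empirically: with midpoint reconstruction the error never exceeds Q/2, whatever the amplitude of the sample. A self-contained sketch (L = 16 and the signal range are arbitrary demo values):

```python
import numpy as np

L = 16                                    # number of levels (arbitrary for the demo)
f_min, f_max = -1.0, 1.0
Q = (f_max - f_min) / L                   # step size = 0.125

x = np.random.uniform(f_min, f_max, 10_000)                    # samples of all amplitudes
idx = np.clip(np.floor((x - f_min) / Q).astype(int), 0, L - 1)
x_hat = f_min + (idx + 0.5) * Q                                # midpoint reconstruction

print(np.abs(x - x_hat).max())            # always <= Q/2 = 0.0625, independent of amplitude
```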

Non-Linear Quantization
The quantizing intervals are not of equal size. Small quantizing intervals are allocated to small signal values (samples) and large quantization intervals to large samples, so that the signal-to-quantization-distortion ratio is nearly independent of the signal level. S/N ratios for weak signals are much better, but slightly worse for the stronger signals. “Companding” is used to quantize signals this way.

Function representation

Companding
Formed from the words “compressing” and “expanding”. A PCM compression technique where analogue signal values are rounded on a non-linear scale. The data is compressed before being sent and then expanded at the receiving end using the same non-linear scale. Companding reduces the noise and crosstalk levels at the receiver.

u-law and A-law Definitions
A-law and u-law are companding schemes used in telephone networks to get more dynamic range from the 8-bit samples available with linear coding. Typically, linear-scale samples taken at an 8 kHz sampling rate are companded to 8 bits (logarithmic scale) for transmission over a 64 kbit/s data channel. At the receiving end the data is converted back to the linear scale and played back.
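
A minimal sketch of the continuous u-law curve with mu = 255 (the real G.711 u-law uses a piecewise-linear segment approximation of this curve, which this example glosses over):

```python
import numpy as np

MU = 255.0   # mu-law parameter used in North American and Japanese telephony

def mu_law_compress(x):
    """Map linear samples in [-1, 1] onto the logarithmic mu-law scale."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_expand(y):
    """Invert the compression to recover (approximately) linear samples."""
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

x = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])
y = np.round(mu_law_compress(x) * 127) / 127   # crude 8-bit quantization of the companded value
print(mu_law_expand(y))                        # small samples come back far more accurately
```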

Speech Codecs
Waveform codecs, source codecs (vocoders), and hybrid codecs.

Waveform Codecs
Waveform codecs attempt, without using any knowledge of how the signal to be coded was generated, to produce a reconstructed signal whose waveform is as close as possible to the original. In theory they should therefore be signal-independent and work well with non-speech signals. They are generally low-complexity codecs which produce high-quality speech at rates above about 16 kbit/s; when the data rate is lowered below this level, the reconstructed speech quality degrades rapidly.

Source Codecs
Source coders operate using a model of how the source was generated and attempt to extract, from the signal being coded, the parameters of that model; it is these model parameters which are transmitted to the decoder. Source coders for speech are called vocoders and work as follows: the vocal tract is represented as a time-varying filter, excited either by a white-noise source for unvoiced speech segments or by a train of pulses separated by the pitch period for voiced speech. The information which must be sent to the decoder is therefore the filter specification, a voiced/unvoiced flag, the variance of the excitation signal, and the pitch period for voiced speech.
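
A toy sketch of this source model (illustrative only: the frame length, gains, and single-pole filter below are my own simplifications, and real vocoders use higher-order LPC filters with per-frame gains):

```python
import numpy as np
from scipy.signal import lfilter

FS = 8000   # samples per second

def excitation(voiced, pitch_hz, n):
    """Pulse train for voiced segments, white noise for unvoiced ones."""
    if voiced:
        e = np.zeros(n)
        e[:: int(FS / pitch_hz)] = 1.0       # one pulse per pitch period
        return e
    return 0.1 * np.random.randn(n)

def synthesize(voiced, pitch_hz, a, n=160):
    """Drive an all-pole vocal-tract filter 1/A(z) with the excitation (one 20 ms frame)."""
    e = excitation(voiced, pitch_hz, n)
    return lfilter([1.0], a, e)              # a = [1, a1, a2, ...] are the A(z) coefficients

# e.g. a voiced 100 Hz frame through the toy single-pole filter A(z) = 1 - 0.9 z^-1
frame = synthesize(True, 100, [1.0, -0.9])
```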

Hybrid Codecs
Hybrid codecs attempt to fill the gap between waveform and source codecs. Waveform coders can provide good-quality speech at bit rates down to about 16 kbit/s but are of limited use below that. Source coders, on the other hand, can provide intelligible speech at 2.4 kbit/s and below but cannot provide natural-sounding speech at any bit rate. Although other forms of hybrid codec exist, the most successful and commonly used are time-domain analysis-by-synthesis (AbS) codecs.

G.711
Pulse Code Modulation (PCM) codecs are the simplest form of waveform codec. Narrowband speech is typically sampled 8000 times per second, and each speech sample must then be quantized. If linear quantization is used, about 12 bits per sample are needed, giving a bit rate of about 96 kbit/s; this can easily be reduced by using non-linear quantization. For coding speech it was found that with non-linear quantization 8 bits per sample are sufficient for speech quality that is almost indistinguishable from the original. This gives a bit rate of 64 kbit/s, and two such non-linear PCM codecs were standardised in the 1960s.
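
The bit-rate arithmetic behind these figures is just samples per second times bits per sample:

```python
SAMPLE_RATE = 8000                 # narrowband speech samples per second

def pcm_bit_rate(bits_per_sample):
    return SAMPLE_RATE * bits_per_sample       # bits per second

print(pcm_bit_rate(12))   # 96000 bit/s with linear quantization
print(pcm_bit_rate(8))    # 64000 bit/s with non-linear (A-law / u-law) quantization
```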

Adaptive Differential PCM (ADPCM)
ADPCM codecs are waveform codecs which, instead of quantizing the speech signal directly, quantize the difference between the speech signal and a prediction that has been made of it. If the prediction is accurate, the difference between the real and predicted speech samples has a lower variance than the real samples, and it can be quantized accurately with fewer bits than would be needed for the original samples.
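
A toy differential coder with a fixed first-order predictor illustrates the idea; real ADPCM additionally adapts both the predictor and the quantizer step size (alpha and step here are arbitrary illustrative values):

```python
import numpy as np

def dpcm_encode(x, step=0.05, alpha=0.9):
    """Quantize the residual against a first-order prediction of each sample."""
    pred, codes = 0.0, []
    for sample in x:
        q = int(round((sample - pred) / step))   # uniform quantizer on the residual
        codes.append(q)
        recon = pred + q * step                  # what the decoder will reconstruct
        pred = alpha * recon                     # next prediction, matched to the decoder
    return codes

def dpcm_decode(codes, step=0.05, alpha=0.9):
    pred, out = 0.0, []
    for q in codes:
        recon = pred + q * step
        out.append(recon)
        pred = alpha * recon
    return np.array(out)
```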

G.721, G.726 & G.727
In the mid-1980s the CCITT standardised a 32 kbit/s ADPCM codec, known as G.721, which gave reconstructed speech almost as good as the 64 kbit/s PCM codecs. Later, in recommendations G.726 and G.727, codecs operating at 40, 32, 24 and 16 kbit/s were standardised (at 8000 samples per second, these correspond to 5, 4, 3 and 2 bits per sample).

Code-Excited Linear Prediction (CELP)
At bit rates of around 16 kbit/s and lower, the quality of waveform codecs falls rapidly. At these rates, hybrid codecs, especially CELP codecs and their derivatives, tend to be used. However, because most of these codecs determine the short-term filter coefficients by forward adaptation, they tend to have high delays.

G.728 (Low-Delay CELP)
G.728 is a CELP codec which was developed at AT&T Bell Labs and standardised in 1992. It uses backward adaptation to calculate the short-term filter coefficients: rather than buffering 20 ms or so of input speech to calculate the coefficients, they are found from the past reconstructed speech. This means the codec can use a much shorter frame length than traditional CELP codecs; G.728 uses a frame length of only 5 samples (0.625 ms at 8 kHz), giving it a total delay of less than 2 ms.

G.723.1 (Algebraic Code-Excited Linear Prediction, ACELP)
Normal conversation involves significant periods of silence. G.723.1 specifies a mechanism for silence suppression in which Silence Insertion Descriptor (SID) frames can be used. These are only 32 bits long, which means that silence occupies only about 1 kbit/s, compared with 64 kbit/s for G.711. G.723.1 has an MOS score of 3.8 but a delay of 37.5 ms at the encoder.

G.729
G.729 is an umbrella of vocoder standards. The G.729 codecs perform voice compression at bit rates between 6.4 and 11.8 kbit/s. A typical configuration connects the G.729 vocoder to a digital communication channel: the input speech is fed into the G.729 encoder as a stream of 16-bit PCM samples, sampled at a rate of 8000 samples per second, and the encoder compresses the data into the encoded bit stream.
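
A back-of-the-envelope sketch of the frame arithmetic, assuming the base 8 kbit/s rate and G.729’s 10 ms frames:

```python
SAMPLE_RATE = 8000                                  # 16-bit PCM input at 8 kHz
FRAME_MS = 10                                       # G.729 operates on 10 ms frames
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 80 samples per frame
BITS_IN = SAMPLES_PER_FRAME * 16                    # 1280 PCM bits enter the encoder per frame
BITS_OUT = 80                                       # 80 bits leave per frame at 8 kbit/s

print(f"{BITS_IN // BITS_OUT}:1 compression")       # 16:1
```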

G.729 (continued)
G.729 also uses samples of the actual human speech to set the vocoder parameters properly. It compares the actual voice with the synthetic voice to come up with a “code”. The code, along with the vocoder settings, is what is sent to the remote end, which takes the code and vocoder settings and plays back the sound.