Speech Processing Project: Linear Predictive Coding Using a Voice-Excited Vocoder
ECE 5525, Fall 2005, Osama Saraireh, Dr. Veton Kepuska

Models Implemented In MATLAB

LPC Background
The speech signal is low-pass filtered to no more than one half of the system sampling frequency and then A/D conversion is performed. The speech is processed on a frame-by-frame basis, where the analysis frame length can be variable. For each frame, a pitch period estimate is made along with a voicing decision. A linear predictive coefficient analysis is performed to obtain an inverse model of the speech spectrum, A(z). In addition, a gain parameter G, representing some function of the speech energy, is computed. An encoding procedure is then applied to transform the analyzed parameters into an efficient set of transmission parameters, with the goal of minimizing the degradation in the synthesized speech for a specified number of bits. Knowing the transmission frame rate and the number of bits used for each transmission parameter, one can compute a noise-free channel transmission bit rate.
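
As a rough illustration of this analysis stage, the sketch below runs a frame-by-frame LPC analysis in MATLAB. The frame length, LPC order, and variable names (s, fsHz, frameLen, order) are assumptions for the example, not values taken from the project.

% Minimal sketch of frame-by-frame LPC analysis (assumed parameters).
fsHz     = 8000;                  % sampling rate in Hz (assumed)
frameLen = round(0.02*fsHz);      % 20 ms frames -> 160 samples
order    = 10;                    % LPC order (assumed)
s = s(:);                         % s: speech samples after low-pass filtering and A/D
numFrames = floor(length(s)/frameLen);
A = zeros(numFrames, order+1);    % prediction-error filter coefficients per frame
G = zeros(numFrames, 1);          % gain per frame
for m = 1:numFrames
    frame = s((m-1)*frameLen + (1:frameLen)) .* hamming(frameLen);
    [a, e] = lpc(frame, order);   % a = [1, -a1, ..., -ap]; e = residual variance
    A(m,:) = a;
    G(m)   = sqrt(e);             % gain derived from the residual energy
end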

LPC Theory
At the receiver, the transmitted parameters are decoded into quantized versions of the LPC coefficients and the pitch estimation parameters. An excitation signal for synthesis is then constructed from the transmitted pitch and voicing parameters. The excitation signal drives a synthesis filter 1/A(z) corresponding to the analysis model A(z). Either before or after synthesis, the gain is used to match the synthetic speech energy to the actual speech energy. The digital samples ŝ(n) are then passed through a D/A converter and low-pass filtered by a filter similar to the one at the input of the system, generating the synthetic speech s(t).
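
A minimal sketch of this synthesis step, continuing the assumed variable names from the analysis sketch above (A, G, order, frameLen, numFrames, fsHz); getExcitation is a hypothetical helper standing in for the excitation constructed from the decoded pitch and voicing decision.

% Minimal synthesis sketch: drive 1/A(z) with the decoded excitation.
sHat = zeros(numFrames*frameLen, 1);
zi = zeros(order, 1);                 % filter state carried across frames
for m = 1:numFrames
    excFrame = getExcitation(m);      % hypothetical helper: impulse train or noise, column vector
    [synFrame, zi] = filter(G(m), A(m,:), excFrame, zi);
    sHat((m-1)*frameLen + (1:frameLen)) = synFrame;
end
soundsc(sHat, fsHz);                  % listen to the synthetic speech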

Speech Production
The vocal tract is modeled as an all-pole filter,

H(z) = G / (1 - sum_{k=1}^{p} a[k] z^(-k))

where p is the number of poles, G is the filter gain, and a[k] are the parameters that determine the poles.
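
For intuition, the short sketch below (an illustrative addition, reusing the frame variables assumed earlier) plots the all-pole LPC spectral envelope of one analysis frame against that frame's FFT magnitude.

% Compare the LPC envelope G/|A(e^jw)| with the frame spectrum (illustrative).
m = 1;                                   % pick an analysis frame
frame = s((m-1)*frameLen + (1:frameLen)) .* hamming(frameLen);
[H, f] = freqz(G(m), A(m,:), 512, fsHz); % all-pole model frequency response
S = fft(frame, 1024);
fFrame = (0:511)'/1024*fsHz;
plot(fFrame, 20*log10(abs(S(1:512))+eps), f, 20*log10(abs(H)+eps));
legend('frame spectrum', 'LPC envelope'); xlabel('Hz'); ylabel('dB');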

Voiced and Unvoiced Sounds
Two mutually exclusive excitation functions are used to model voiced and unvoiced speech sounds. On a short-time basis, voiced speech is considered periodic with a fundamental frequency Fo and a pitch period 1/Fo, which depend on the speaker. Hence, voiced speech is generated by exciting the all-pole filter model with a periodic impulse train. Unvoiced sounds, on the other hand, are generated by exciting the all-pole filter with the output of a random noise generator.
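
The sketch below generates the two kinds of excitation for one frame; the unit-energy normalization matches the assumption made later for the gain computation. Variable names (pitchPeriod, frameLen) are illustrative.

% Excitation for one frame: impulse train if voiced, white noise if unvoiced.
% pitchPeriod is in samples (0 for unvoiced), frameLen as assumed above.
if pitchPeriod > 0
    exc = zeros(frameLen, 1);
    exc(1:pitchPeriod:end) = 1;      % periodic impulse train
else
    exc = randn(frameLen, 1);        % random noise generator
end
exc = exc / sqrt(sum(exc.^2));       % normalize the excitation to unit energy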

Voiced/Unvoiced
The fundamental difference between these two types of speech sounds is the way they are produced. Voiced sounds are produced by the vibration of the vocal cords, and the rate at which the vocal cords vibrate dictates the pitch of the sound. Unvoiced sounds, on the other hand, do not rely on the vibration of the vocal cords: the vocal cords remain open, and constrictions of the vocal tract force air out to produce the unvoiced sounds. Given a short segment of a speech signal, say about 20 ms or 160 samples at a sampling rate of 8 kHz, the speech encoder at the transmitter must determine the proper excitation function, the pitch period for voiced speech, the gain, and the LPC coefficients.

Mathematical Representation of the Model
The parameters of the all-pole filter model are determined from the speech samples by means of linear prediction. Specifically, the output of the linear prediction filter is

ŝ(n) = sum_{k=1}^{p} a[k] s(n-k)

and the corresponding error between the observed sample s(n) and the predicted value is

e(n) = s(n) - ŝ(n) = s(n) - sum_{k=1}^{p} a[k] s(n-k)
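
In MATLAB, this prediction error for one frame can be computed directly with the prediction-error filter returned by lpc; a minimal sketch reusing the frame and order variables assumed earlier:

% e(n) = s(n) - sum_k a[k] s(n-k) is obtained by filtering the frame
% with the prediction-error filter A(z) = 1 - sum_k a[k] z^(-k).
[aLpc, ~] = lpc(frame, order);   % aLpc = [1, -a(1), ..., -a(p)]
e = filter(aLpc, 1, frame);      % prediction error (residual) sequence
residualEnergy = sum(e.^2);      % used later for the gain G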

Cont'd
By minimizing the sum of the squared errors we can determine the pole parameters of the model. Differentiating the sum above with respect to each of the parameters and equating the result to zero yields a set of p linear equations (the normal equations):

sum_{k=1}^{p} a[k] R(i-k) = R(i),   i = 1, 2, ..., p

where R(i) represents the autocorrelation of the sequence s(n), defined over the analysis frame as

R(i) = sum_{n} s(n) s(n-i)
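
These p equations can be solved directly in MATLAB by building the Toeplitz autocorrelation matrix; the following is a sketch of the brute-force solution (the Levinson-Durbin recursion used with the next slide's code is the efficient equivalent):

% Solve the normal equations R*a = r for the predictor coefficients a[k].
p = order;
r = xcorr(frame, p, 'biased');    % autocorrelation at lags -p..p
r = r(p+1:end);                   % keep lags 0..p, so r(1) = R(0)
Rmat = toeplitz(r(1:p));          % p-by-p matrix with entries R(|i-k|)
aPred = Rmat \ r(2:p+1);          % a[1..p] in the slide's (positive) sign convention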

Auto-correlation
In matrix form, the normal equations become R a = r, where R is a p×p autocorrelation matrix, r is a p×1 autocorrelation vector, and a is a p×1 vector of model parameters. The following MATLAB fragment from the analysis routine pre-emphasizes the speech and computes the autocorrelation of each frame; here sr is the sampling rate, fr the frame rate in ms, fs the frame size in ms, preemp the pre-emphasis coefficient, and L the LPC order (the loop body continues with the LPC computation, sketched below):

[row col] = size(data);
if col==1
    data = data';
end
nframe = 0;
msfr = round(sr/1000*fr);                 % Convert ms to samples
msfs = round(sr/1000*fs);                 % Convert ms to samples
duration = length(data);
speech = filter([1 -preemp], 1, data)';   % Preemphasize speech
msoverlap = msfs - msfr;
ramp = [0:1/(msoverlap-1):1]';            % Compute part of window
for frameIndex = 1:msfr:duration-msfs+1   % frame rate = 20 ms
    frameData = speech(frameIndex:(frameIndex+msfs-1));  % frame size = 30 ms
    nframe = nframe + 1;
    autoCor = xcorr(frameData);           % Compute the autocorrelation of the frame
    autoCorVec = autoCor(msfs+[0:L]);     % keep lags 0 through L
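
A plausible continuation of the loop body (a sketch, not the project's exact code) converts that autocorrelation vector into predictor coefficients and a gain with MATLAB's Levinson-Durbin routine:

    % ... continuing inside the frame loop:
    [aLpc, errVar] = levinson(autoCorVec, L);  % Levinson-Durbin recursion
    % aLpc = [1, -a(1), ..., -a(L)] is the prediction-error filter A(z);
    % errVar is the residual (prediction error) energy for this frame.
    gain = sqrt(errVar);                       % gain G for the synthesis filter
end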

Gain Representation in the System
The gain parameter of the filter can be obtained from the input-output relationship

s(n) = G x(n) + sum_{k=1}^{p} a[k] s(n-k)

where x(n) represents the input (excitation) sequence. We can further manipulate this equation: in terms of the error sequence, G x(n) = e(n), so

G^2 sum_{n} x^2(n) = sum_{n} e^2(n)

If the input excitation is normalized to unit energy by design, then

G^2 = sum_{n} e^2(n) = R(0) - sum_{k=1}^{p} a[k] R(k)

that is, G^2 is set equal to the residual energy resulting from the least-squares optimization. Once the LPC coefficients are computed, we can determine whether the input speech frame is voiced and, if it is indeed a voiced sound, what the pitch is. The pitch can be determined by computing, in MATLAB, the sequence

r_e(n) = sum_{k=1}^{p} r_a(k) R(n-k)

where r_a(k) = sum_{i=1}^{p-k} a[i] a[i+k] is defined as the autocorrelation sequence of the prediction coefficients.
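
As a quick numerical check of the gain relation, the sketch below (reusing autoCorVec, aLpc, and errVar from the frame loop above) evaluates R(0) - sum_k a[k] R(k) and compares it with the residual energy returned by the Levinson-Durbin solve:

% Sketch: the squared gain equals the residual energy from the LPC solve.
R  = autoCorVec(:);                  % R(0), R(1), ..., R(L)
G2 = R(1) + aLpc(2:end) * R(2:end);  % = R(0) - sum_k a[k] R(k), since aLpc(k+1) = -a[k]
% G2 should match errVar up to numerical precision.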

Pitch Detection
The pitch is detected by finding the peak of the normalized sequence r_e(n)/r_e(0) in the time interval that corresponds to 3 to 15 ms within the 20 ms sampling frame. If the value of this peak is at least 0.25, the frame of speech is considered voiced, with a pitch period equal to the value of n at which r_e(n)/r_e(0) attains its maximum. If the peak value is less than 0.25, the frame of speech is considered unvoiced and the pitch is set equal to zero.
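
As a sketch of this decision rule, the code below computes the autocorrelation of the prediction error directly for one frame (equivalent in spirit to the r_e(n) formula above) and applies the 3 to 15 ms peak search with the 0.25 threshold; it continues the assumed variables from the frame loop (frameData, msfs, sr, aLpc).

% Pitch detection sketch: peak of the normalized error autocorrelation.
errSig = filter(aLpc, 1, frameData);       % prediction error e(n) for this frame
reFull = xcorr(errSig);                    % autocorrelation of e(n)
re = reFull(msfs:end);                     % keep non-negative lags: re(1) = r_e(0)
reNorm = re / re(1);                       % normalized sequence r_e(n)/r_e(0)
lagMin = round(0.003*sr);                  % 3 ms expressed in samples
lagMax = round(0.015*sr);                  % 15 ms expressed in samples
[peakVal, relIdx] = max(reNorm(lagMin+1:lagMax+1));
if peakVal >= 0.25
    pitchPeriod = lagMin + relIdx - 1;     % voiced: pitch period in samples
else
    pitchPeriod = 0;                       % unvoiced frame
end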

The values of the LPC coefficients, the pitch period, and the type of excitation are then transmitted to the receiver. The decoder synthesizes the speech signal by passing the proper excitation through the all-pole filter model of the vocal tract. Typically, the pitch period requires 6 bits, the gain parameter is represented with 5 bits after its dynamic range is compressed, and the prediction coefficients normally require 8 to 10 bits each for accuracy reasons. This accuracy matters in LPC because small changes in the prediction coefficients result in large changes in the pole positions of the filter model, which can cause instability. This is overcome by using the PARCOR (partial correlation, or reflection coefficient) method.
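
A small sketch of that idea, using MATLAB's standard conversions between a prediction polynomial and reflection (PARCOR) coefficients; the 5-bit uniform quantizer here is purely illustrative, not the project's quantizer.

% Quantize reflection (PARCOR) coefficients instead of the LPC coefficients.
k      = poly2rc(aLpc);                        % reflection coefficients, |k| < 1 for a stable model
nBits  = 5;                                    % illustrative bit allocation per coefficient
kq     = round(k*2^(nBits-1)) / 2^(nBits-1);   % crude uniform quantizer on [-1, 1)
kq     = min(max(kq, -0.999), 0.999);          % keep magnitudes below 1 to preserve stability
aQuant = rc2poly(kq);                          % reconstructed prediction-error filter at the decoder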

Model LPC Vocoder (pitch detector)

Voice excited LPC Vocoder