EE513 Audio Signals and Systems


EE513 Audio Signals and Systems LPC Analysis and Speech Kevin D. Donohue Electrical and Computer Engineering University of Kentucky

Speech Generation Speech can be divided into fundamental building blocks of sound referred to as phonemes. All sounds result from turbulence through obstructed air flow. The vocal cords create quasi-periodic obstructions of air flow as a sound source at the base of the vocal tract; phonemes associated with the vocal cords are referred to as voiced speech. Single-shot turbulence from obstructed air flow through the vocal tract is generated primarily by the teeth, tongue, and lips; phonemes associated with non-periodic obstructed air flow are referred to as unvoiced speech. Taken from http://www.kt.tu-cottbus.de/speech-analysis/

Speech Production Models The general speech model consists of a source (quasi-periodic pulsed air for voiced speech; an air burst or continuous turbulent flow for unvoiced speech), a vocal tract filter, and a vocal radiator. Sources can be modeled as quasi-periodic impulse trains or random sequences of impulses. The vocal tract filter can be modeled as an all-pole filter whose poles are related to the tract resonances. The radiator can be modeled as a simple gain with spatial direction (possibly some filtering).
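The source-filter model above can be sketched numerically. The lecture's own code is MATLAB; the following is a hypothetical Python illustration in which a single second-order resonance stands in for the full vocal tract filter, and the pitch, resonance frequency, and pole radius are made-up values:

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                      # sampling rate (assumed)
rng = np.random.default_rng(0)

# Voiced source: quasi-periodic impulse train at a 100 Hz pitch
voiced_src = np.zeros(fs)
voiced_src[::fs // 100] = 1.0
# Unvoiced source: random noise standing in for turbulent air flow
unvoiced_src = rng.standard_normal(fs)

# Vocal tract stand-in: a single all-pole resonance near 500 Hz
r, f0 = 0.97, 500.0
theta = 2 * np.pi * f0 / fs
den = [1.0, -2 * r * np.cos(theta), r * r]    # 1/A(z) denominator
voiced = lfilter([1.0], den, voiced_src)      # "voiced speech"
unvoiced = lfilter([1.0], den, unvoiced_src)  # "unvoiced speech"
```

Passing either source through the same all-pole filter is exactly the voiced/unvoiced switch in the model; only the excitation changes, not the tract.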

Vocal Tract Resonances Vocal tract length corresponds to the signal wavelength (λ), which can be obtained from resonant frequencies (f) estimated from recorded speech sounds and the speed of sound (c) using λ = c/f. For a tube with one closed end, the tract length is a quarter wavelength at the first resonance, so L = c/(4f); the first three resonances of such a tube occur at 1/4, 3/4, and 5/4 wavelengths. Image adapted from: hyperphysics.phy-astr.gsu.edu
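The quarter-wavelength relation can be worked through directly. A short Python sketch (the 500 Hz first formant is an assumed, typical neutral-vowel value):

```python
c = 345.0  # speed of sound in m/s, as used later in the example slide

def tract_length(f1):
    """Length of a tube closed at one end whose first resonance is f1.
    Quarter-wavelength model: L = lambda/4 = c / (4 * f1)."""
    return c / (4.0 * f1)

# A neutral vowel's first formant is typically near 500 Hz:
L = tract_length(500.0)   # -> 0.1725 m, i.e. about 17 cm

# The tube's first three resonances fall at odd multiples of f1:
resonances = [(2 * k + 1) * 500.0 for k in range(3)]  # 500, 1500, 2500 Hz
```

This is the same computation the later example and homework slides ask for, run in reverse: measure the formants, then solve for L.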

Vocal Tract Resonances The resonances of the vocal tract are called formants and can be estimated from peaks of the spectrum where the effects of pitch have been smoothed out (i.e. spectral envelope).

Low Order AR Modeling If voiced speech is characterized by an all-pole model of low order (about 10 for a sampling rate of 8 kHz), then the pole frequencies correspond to the resonances of the vocal tract: A(z) = 1 − a(1)z^-1 − a(2)z^-2 − … − a(M)z^-M. This transfer function represents a filter that computes the error between the current sample and the sample predicted from previous samples; therefore, it is called a prediction error filter.
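The prediction-error filter can be written out in a few lines. A Python sketch (the 2nd-order coefficients are made up for illustration; a real vocal tract model would use order ~10):

```python
import numpy as np

def prediction_error(x, a):
    """Apply the prediction-error (all-zero) filter
    A(z) = 1 - sum_k a[k] z^-k, i.e. e[n] = x[n] - sum_k a[k] x[n-k]."""
    e = x.astype(float).copy()
    for k in range(1, len(a) + 1):
        e[k:] -= a[k - 1] * x[:-k]
    return e

# If x exactly follows the all-pole model, the error collapses back to
# the excitation (here, a single impulse). a_true is a made-up model.
a_true = np.array([1.2, -0.6])
x = np.zeros(40)
x[0] = 1.0
for n in range(1, 40):
    x[n] += a_true[0] * x[n - 1] + (a_true[1] * x[n - 2] if n >= 2 else 0.0)
e = prediction_error(x, a_true)   # approximately [1, 0, 0, ...]
```

The whitening behavior is the point: when the model matches, A(z) strips out everything the previous samples can predict.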

Example Create an “auh” sound (as the “a” in about or the “u” in hum) and use the lpc (linear prediction coefficient) command to model this sound as being generated from a quasi-periodic sequence of impulses exciting an all-pole filter. The lpc command finds a vector of filter coefficients such that the prediction error is minimized. Predict x(n) from previous samples: x̂(n) = a(1)x(n−1) + a(2)x(n−2) + … + a(M)x(n−M). Compute the prediction error sequence with: e(n) = x(n) − x̂(n). Use z-transforms to find the transfer function of the filter that recovers x(n) from the LPCs and the error sequence e(n).
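The analysis/synthesis round trip described here can be checked in a few lines. A Python sketch of the MATLAB pattern filter(a,1,x) / filter(1,a,e), with made-up coefficients standing in for an actual lpc result:

```python
import numpy as np
from scipy.signal import lfilter

# MATLAB's lpc returns a = [1, -a(1), ..., -a(M)]; with that convention,
# filtering with (a, 1) produces the prediction error and (1, a) inverts it.
# These 2nd-order values are made up for illustration.
a = np.array([1.0, -0.9, 0.4])

rng = np.random.default_rng(1)
x = rng.standard_normal(200)   # stands in for a speech frame

e = lfilter(a, [1.0], x)       # all-zero analysis filter: error sequence
recon = lfilter([1.0], a, e)   # all-pole synthesis filter: recovers x(n)
```

Because 1/A(z) is the exact inverse of A(z), the reconstruction is perfect up to floating-point precision, which is what the later script demonstrates by overlaying the two waveforms.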

LPC Derivation Derive an algorithm to compute LPC coefficients from a stream of data that minimizes the mean squared prediction error. Let x(n) be the sequence of data points, a(1), …, a(M) be the Mth-order LPC coefficients, and x̂(n) = a(1)x(n−1) + … + a(M)x(n−M) be the prediction estimate. The mean squared error for the prediction is given by: E = E[(x(n) − x̂(n))²].

LPC Computation Put the prediction equations in matrix form: x ≈ Xa, where row n of the data matrix X contains the M previous samples [x(n−1), x(n−2), …, x(n−M)] and a = [a(1), …, a(M)]ᵀ. Each row of X generates a prediction of the corresponding sample in x.

LPC Computation The mean squared error can be expressed as: E = (x − Xa)ᵀ(x − Xa). If the derivative is taken with respect to a and set equal to 0, the result is the normal equations: XᵀXa = Xᵀx.
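Solving the normal equations is a plain least-squares problem. A Python sketch (the function name and the 2nd-order test signal are illustrative, not from the slides):

```python
import numpy as np

def lpc_lstsq(x, M):
    """Least-squares LPC: each row of the data matrix X holds the M
    previous samples, and the normal equations (X^T X) a = X^T x are
    solved for the predictor coefficients a."""
    X = np.column_stack([x[M - k - 1:len(x) - k - 1] for k in range(M)])
    a, *_ = np.linalg.lstsq(X, x[M:], rcond=None)
    return a

# Sanity check on a synthetic, impulse-driven 2nd-order AR signal:
x = np.zeros(30)
x[0] = 1.0
for n in range(1, 30):
    x[n] += 1.2 * x[n - 1] - (0.6 * x[n - 2] if n >= 2 else 0.0)
a = lpc_lstsq(x, 2)   # recovers approximately [1.2, -0.6]
```

Because every row of X here exactly satisfies the model, the least-squares residual is zero and the true coefficients are recovered.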

LPC Computation The transpose of the data matrix times itself results in the autocorrelation matrix: XᵀX ≈ R. The data matrix transpose times the vector of samples being predicted becomes a sequence of autocorrelation values starting with the first lag: Xᵀx ≈ [r(1), r(2), …, r(M)]ᵀ.

Autocorrelation and LPC Define the autocorrelation of a length-N sequence x(n) as: r(k) = Σ x(n)x(n+k), summed over n = 0 to N−1−k. The LPC coefficients are then computed from the autocorrelation coefficients by solving Ra = r, where R is the Toeplitz autocorrelation matrix with elements R(i,j) = r(|i−j|) and r = [r(1), r(2), …, r(M)]ᵀ.
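The Toeplitz structure of R is what the Levinson-Durbin recursion exploits to solve Ra = r in O(M²) operations, yielding the reflection coefficients of each stage as a by-product. A Python sketch (function name illustrative; MATLAB's lpc performs this same computation internally):

```python
import numpy as np

def lpc_autocorr(x, M):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns predictor coefficients a (xhat[n] = sum_k a[k-1] x[n-k])
    and the reflection coefficient of each lattice stage."""
    N = len(x)
    r = np.array([np.dot(x[:N - k], x[k:]) for k in range(M + 1)])
    a = np.zeros(M)
    refl = np.zeros(M)
    err = r[0]                       # zeroth-order prediction error power
    for m in range(M):
        # Reflection coefficient for stage m+1
        k = (r[m + 1] - np.dot(a[:m], r[m:0:-1])) / err
        refl[m] = k
        prev = a[:m].copy()
        a[m] = k
        a[:m] = prev - k * prev[::-1]   # order-update of the coefficients
        err *= 1.0 - k * k              # error power shrinks each stage
    return a, refl

# First-order check: a(1) = r(1)/r(0)
a1, _ = lpc_autocorr(np.array([1.0, 2.0, 3.0, 4.0]), 1)   # a1[0] = 20/30
```

A useful side effect: the recursion guarantees |k| < 1 at every stage for valid autocorrelations, which is exactly the stability condition used by the lattice filter later in the lecture.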

Script for Analysis
winlens = 50;                          % PSD window length in milliseconds
[y,fs] = wavread('../data/aaa3.wav');  % Read in wave file
winlen = winlens*fs/1000;
[cb,ca] = butter(5,2*100/fs,'high');   % Filter to remove LF recording noise
yf = filtfilt(cb,ca,y);
[a,er] = lpc(yf,10);                   % Compute LPC coefficients with model order 10
predy = filter(a,1,yf);                % Compute prediction error with all-zero filter
kd = 1;                                % Starting figure number
figure(kd); plot(predy); hold on; plot(yf,'g'); hold off
title('Prediction error'); xlabel('Samples'); ylabel('Amplitude')
recon = filter(1,a,predy);   % Reconstruct signal from error and all-pole filter
figure(kd+1)                 % Plot reconstructed signal
plot(recon,'b')
hold on
% Plot the original delayed by one sample so it does not entirely
% overlap the perfectly reconstructed signal
plot(yf(2:end),'r')
hold off
xlabel('Samples'); ylabel('Amplitude')
title('Reconstructed Signal (blue) and Original (red)')
% By examining the error sequence, generate a simple impulse train to
% simulate its period (a 56-sample period is used here)
g = [];
for k=1:150
   g = [g, 1, zeros(1,55)];
end

Script for Analysis
% Run simulated error sequence through all-pole filter
sim = filter(1,a,g);
soundsc([(sim')/std(sim); zeros(fix(fs)*1,1); yf/std(yf)],fs)
% Plot pole-zero diagram
figure(kd+2)
r = roots(a);
w = [0:.001:2*pi];
plot(real(r),imag(r),'xr',real(exp(j*w)),imag(exp(j*w)),'b')
title('Pole diagram of vocal tract filter')
xlabel('Real'); ylabel('Imaginary')
% Find resonant frequencies corresponding to poles
froots = (fs/2)*angle(r)/pi;
nf = find(froots > 0 & froots < fs/2);  % Those from complex conjugate poles
figure(kd+3)   % Examine average spectrum with formant frequencies
[pd,f] = pwelch(yf,hamming(winlen),fix(winlen/2),2*winlen,fs);
dbspec = 20*log10(pd);
mxp = max(dbspec);   % Find max and min points for graphing vertical lines
mnp = min(dbspec);
plot(f,dbspec,'b')   % Plot PSD
hold on

Script for Analysis
% Overlay lines on plot where formant frequencies were estimated from LPCs
for k=1:length(nf)
   plot([froots(nf(k)), froots(nf(k))], [mnp(1), mxp(1)], 'k--')
end
hold off
title('PSD plot with formant frequencies (black broken lines)')
xlabel('Hertz')
ylabel('dB')
% Get spectrum from the AR (LPC) parameters
[hz,fz] = freqz(1, a, 1024, fs);
figure(kd+4)
plot(fz,abs(hz))
title('Spectrum Generated by LPCs')
xlabel('Hertz'); ylabel('Amplitude')

LPC Analysis Result (figure) The broad spectral peaks occur at the pole frequencies of the LPC model, determined by the vocal tract shape; the fine frequency periodicities come from the harmonics of the pitch frequency.

Vocal Tract Filter Implementations Direct form I for the all-pole model (diagram: a tapped chain of unit delays z^-1 feeding weighted sums back to the input).

Vocal Tract Filter Implementations Direct form I factored into a cascade of second-order sections (diagram: each section contains two unit delays z^-1 with its own feedback summations).

Vocal Tract Filter Implementations Lattice implementations are popular because of their good numerical error and stability properties. The filter is implemented in modular stages with coefficients directly related to the stability criterion and the tube resonances of the vocal tract (diagram: a 2nd-order example with one unit delay z^-1 per stage).
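The per-stage structure can be made concrete in code. A Python sketch of an all-pole lattice synthesis filter driven by reflection coefficients (function name and test values are illustrative); the demo uses the standard order-2 Levinson relations a(1) = k1(1 − k2), a(2) = k2 to build the equivalent direct-form filter for comparison:

```python
import numpy as np
from scipy.signal import lfilter

def lattice_allpole(e, k):
    """All-pole lattice synthesis filter driven by excitation e, using
    reflection coefficients k[0..M-1]; stable iff all |k[m]| < 1."""
    M = len(k)
    b = np.zeros(M)            # b[m] = backward residual b_m at time n-1
    out = np.empty(len(e))
    for n in range(len(e)):
        f = e[n]               # forward residual enters at stage M
        b_new = np.zeros(M)
        for m in range(M, 0, -1):            # stages M down to 1
            f = f + k[m - 1] * b[m - 1]      # f_{m-1} = f_m + k_m b_{m-1}
            if m <= M - 1:
                b_new[m] = b[m - 1] - k[m - 1] * f
        b_new[0] = f           # b_0[n] = f_0[n] = output sample
        out[n] = f
        b = b_new
    return out

# Equivalent direct-form denominator for k = [k1, k2]:
# a(1) = k1*(1 - k2), a(2) = k2, so A(z) coefficients are [1, -a(1), -a(2)].
k = np.array([0.5, -0.3])
den = [1.0, -0.5 * (1 - (-0.3)), 0.3]   # [1, -0.65, 0.3]
```

The stability check is the practical payoff: each stage is stable exactly when |k| < 1, which can be verified coefficient by coefficient, unlike the direct-form polynomial.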

Example Record a neutral vowel sound, estimate the formant frequencies, and estimate the size of the vocal tract assuming a 345 m/s speed of sound and an open-at-one-end tube model. Use LPCs estimated from the neutral vowel sound to filter another sample of speech from the same speaker, first as an all-zero filter and then as an all-pole filter; listen to the sound and describe what is happening. Convert the LPC coefficients for the all-pole filter into second-order sections and implement the filter; describe the advantages of this approach. Modify the filter by maintaining the angles of the poles/zeros but moving their magnitudes closer to the unit circle; listen to the sound and explain what is happening.

Homework (1) Record a free vowel sound and estimate the size of your vocal tract based on the formant frequencies. Compute the LPCs from the free vowel sound and use them to filter another segment of speech with −10 dB of white noise added; use the LPCs as an all-zero filter and as an all-pole filter, describe the sound of the filtered outputs, and explain what is happening between the 2 filters. Then move the poles and zeros farther away from the unit circle and repeat the filtering; describe the effect on the filtered sound when the poles and zeros are moved away from the unit circle. Submit this description and the m-files used to process the data.