Pitch Determination by Wavelet Transformation
Santhosh Bellikoth
ECE 5525 - Speech Processing
Instructor: Dr. Kepuska



Pitch Determination
- Equivalent to fundamental frequency estimation
- An essential component of all speech processing systems

Applications of Pitch Detector
- Speaker identification and verification
- Pitch-synchronous speech analysis and synthesis
- Linguistic and phonetic knowledge acquisition
- Voice disease diagnosis

Continuous Wavelet Transform
- The continuous wavelet transform is defined as the convolution of a signal x(t) with a wavelet function Ψ(t) that is shifted in time by a translation parameter ‘b’ and dilated by a scale parameter ‘a’.
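The defining equation for this slide is not part of the transcript. A common textbook form, written with the slide's symbols x(t), Ψ(t), a and b, is given below as an assumption. The 1/√a normalisation is one convention among several and may differ from the one used in the presentation.

```latex
\[
\mathrm{CWT}_x(b, a) \;=\; \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \Psi^{*}\!\left(\frac{t - b}{a}\right) dt , \qquad a > 0
\]
```

Written this way, the transform is a correlation of x(t) with the dilated and shifted wavelet, which is the same as a convolution with the time-reversed, conjugated wavelet.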

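Dyadic Wavelet Transform
- The dyadic wavelet transform is defined as the wavelet transform computed at dyadic scales a = 2^j. [The equation on this slide was an image and is not preserved in the transcript.]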
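One way to write the dyadic transform, assuming it is the continuous wavelet transform of x(t) with wavelet Ψ(t) restricted to scales a = 2^j and normalised by 1/√a, is sketched below. The normalisation and the name DyWT are assumptions; some references use a 1/2^j factor instead.

```latex
\[
\mathrm{DyWT}_x(b, 2^{j}) \;=\; \frac{1}{\sqrt{2^{j}}} \int_{-\infty}^{\infty} x(t)\, \Psi^{*}\!\left(\frac{t - b}{2^{j}}\right) dt , \qquad j \in \mathbb{Z}
\]
```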

Dyadic Wavelet Transform Properties
- Linearity
- Time shift variance
- Detection of both sharp and slow variations in the signal, which makes it a useful tool for the analysis of speech signals

Plot of Haar Wavelet and Scaling Function

Pitch Detection Steps
1) Segmentation of the speech signal
2) Scale selection
3) Computation of the wavelet transform of each frame at various scales
4) Locating the positions of local maxima for each frame
5) Locating the positions of the GCIs
6) Calculation of the pitch periods

Segmentation of Speech Signal
1) Segmentation without overlapping
- The speech signal is segmented using a Hamming window of 40 ms duration.
2) Segmentation with 50 % overlapping
- A rectangular window is used with overlapping of less than 10 %.
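The framing step can be sketched in Python as follows. This is a minimal illustration, not the presentation's own framing routine: the function name `segment`, the 10 kHz sampling rate and the stand-in signal are assumptions, and no window weighting (Hamming or otherwise) is applied to the frames.

```python
def segment(signal, frame_len, overlap=0.0):
    """Split `signal` into frames of `frame_len` samples.

    `overlap` is the fraction of a frame shared with the next frame
    (0.0 = no overlap, 0.5 = 50 % overlap). Trailing samples that do
    not fill a whole frame are dropped.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, int(round(frame_len * (1.0 - overlap))))
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames


fs = 10_000                              # assumed sampling rate in Hz
frame_len = round(0.040 * fs)            # 40 ms -> 400 samples
samples = list(range(2000))              # stand-in for a speech signal

no_overlap = segment(samples, frame_len)            # hop of 400 samples
half_overlap = segment(samples, frame_len, 0.5)     # hop of 200 samples
print(len(no_overlap), len(half_overlap))
```

With 50 % overlap each sample (away from the ends) falls into two frames, so roughly twice as many frames are produced for the same signal.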

Scale Selection
- The dyadic wavelet transform is computed at scales a = 2^j for integer j.
- The number of scales used in the computation can be reduced based on the nature of the speech signal.

Number of Scales Selection
- For a wavelet with input center frequency fci and input bandwidth Δfi, the scale parameter ‘a’ corresponding to the required output center frequency fco is obtained from the relation
  a = fci/fco

Input and Output Bandwidth
- Input bandwidth of the wavelet: Δfi = 2*fci
- Output bandwidth of the wavelet: Δfo = 2*fco

Approximation of ‘a’
- If fci/fco is not a power of 2, it is rounded off to the nearest power of 2.
- For high-pitched speakers, the lower bound is decreased and the upper bound is increased for better results.
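The relation a = fci/fco and the rounding to a power of 2 can be sketched in Python. The function name `dyadic_scale` and the example frequencies are hypothetical, and "nearest" is interpreted on a log2 axis, which is an assumption since the slides do not say how the rounding is done.

```python
import math


def dyadic_scale(fci, fco):
    """Return a = fci / fco rounded to the nearest power of two.

    "Nearest" is measured on a log2 axis (assumption).
    """
    if fci <= 0 or fco <= 0:
        raise ValueError("center frequencies must be positive")
    j = round(math.log2(fci / fco))      # nearest integer exponent
    return 2 ** j


# Hypothetical center frequencies in Hz, for illustration only
exact = dyadic_scale(1000.0, 125.0)      # ratio 8 is already a power of 2
rounded_down = dyadic_scale(1000.0, 100.0)   # ratio 10 is closer to 8 than to 16
rounded_up = dyadic_scale(1000.0, 80.0)      # ratio 12.5 is closer to 16 on a log2 axis
print(exact, rounded_down, rounded_up)
```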

Computation of Dyadic Wavelet Transform
- The dyadic wavelet transform is computed for each frame. [The equation on this slide was an image and is not preserved in the transcript.]
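A direct, unoptimised way to evaluate a Haar wavelet transform of one frame at a single dyadic scale is sketched below in Python. It is an illustration under stated assumptions, not the routine used in the presentation: the function name `haar_dywt`, the 1/√a normalisation, the implicit zero padding at the end of the frame and the test frame are all choices made for this example.

```python
import math


def haar_dywt(frame, j):
    """Haar wavelet transform of `frame` at the dyadic scale a = 2**j.

    For each shift b the frame is correlated with a Haar wavelet that is
    +1 on the first half of its support and -1 on the second half.
    Samples beyond the end of the frame are treated as zero.
    """
    if j < 1:
        raise ValueError("j must be at least 1 so the wavelet has two halves")
    a = 2 ** j
    half = a // 2
    norm = math.sqrt(a)                  # assumed 1/sqrt(a) normalisation
    out = []
    for b in range(len(frame)):
        first = sum(frame[b:b + half])           # +1 half of the wavelet
        second = sum(frame[b + half:b + a])      # -1 half of the wavelet
        out.append((first - second) / norm)
    return out


# A rectangular pulse: one rising edge and one falling edge
frame = [0.0] * 8 + [1.0] * 8 + [0.0] * 8
coeffs = haar_dywt(frame, 2)             # scale a = 4
strongest = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
print(strongest, coeffs[strongest])
```

The coefficients are largest in magnitude where the wavelet straddles an abrupt change in the signal, which is the property the pitch detector relies on. With this sign convention a rising edge gives a negative coefficient and a falling edge a positive one.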

Speech Signal to be Segmented

First Three Frames of Original Speech Signal with 50% overlapping

Speech Segment and Dyadic Wavelet Transform

Locating Positions of Local Maxima
- To locate the positions of the local maxima, all the peaks of the waveform are located first.
- The positions of the local maxima are then found by applying a threshold equal to 80% of the global maximum.
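The two steps above (find every peak, then keep the peaks that reach 80% of the global maximum) can be sketched in Python. The function name `local_maxima` and the sample waveform are made up for illustration; a peak is taken here to be a sample larger than its left neighbour and at least as large as its right neighbour, which is an assumption about how peaks are defined.

```python
def local_maxima(x, ratio=0.8):
    """Indices of the peaks of `x` that reach `ratio` times the global maximum."""
    if len(x) < 3:
        return []
    # Step 1: every upward peak of the waveform
    peaks = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    # Step 2: keep only peaks at or above the threshold
    threshold = ratio * max(x)
    return [i for i in peaks if x[i] >= threshold]


wave = [0, 1, 0, 5, 0, 4.5, 0, 2, 0, 4.2, 0]
print(local_maxima(wave))   # -> [3, 5, 9]
```

In the example the peaks at indices 1 and 7 are discarded because their heights (1 and 2) fall below the threshold of 4.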

Locating all the upside peaks of a waveform and local maxima

Locating the Positions of GCIs (Glottal Closure Instants)
- If the positions of the local maxima at a scale match the positions of the local maxima of the frame whose wavelet transform has been calculated, those locations are taken as the GCI positions.
- If they do not match, the comparison is repeated with the wavelet transform at the next higher scale.
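The matching rule can be sketched in Python as a loop over scales. The slides do not state what counts as a match, so this example assumes that two position lists match when they have the same length and agree pairwise within a tolerance of a few samples. The function names, the tolerance of 10 samples and the example positions are all hypothetical.

```python
def positions_match(pos_a, pos_b, tol):
    """True when both lists are non-empty, equally long and agree within `tol` samples."""
    return (
        len(pos_a) > 0
        and len(pos_a) == len(pos_b)
        and all(abs(a - b) <= tol for a, b in zip(pos_a, pos_b))
    )


def locate_gcis(frame_maxima, maxima_by_scale, tol=10):
    """Try the scales in increasing order and return the first one that matches.

    Returns (scale, positions), or (None, []) when no scale matches.
    """
    for scale in sorted(maxima_by_scale):
        candidate = maxima_by_scale[scale]
        if positions_match(frame_maxima, candidate, tol):
            return scale, candidate
    return None, []


# Hypothetical maxima positions (in samples) for one frame
frame_maxima = [52, 131, 210]
maxima_by_scale = {
    4: [50, 90, 130, 170, 212],   # extra maxima at the lowest scale: no match
    8: [50, 130, 212],            # agrees with the frame within the tolerance
    16: [48, 128, 214],
}
scale, gcis = locate_gcis(frame_maxima, maxima_by_scale)
print(scale, gcis)
```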

Pitch Calculation
- The pitch period can be computed as d/fs, so the pitch frequency is fs/d, where d is the difference between two GCI positions in samples and fs is the sampling frequency of the speech signal.
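As a small worked example in Python, the pitch periods follow from the differences between consecutive GCI positions divided by the sampling frequency, and the fundamental frequency from the reciprocal of the mean period. The function name `pitch_from_gcis` and the GCI positions are made up for illustration.

```python
def pitch_from_gcis(gci, fs):
    """Return (pitch periods in seconds, fundamental frequency in Hz)."""
    # d = difference between consecutive GCI positions, in samples
    periods = [(b - a) / fs for a, b in zip(gci, gci[1:])]
    if not periods:
        raise ValueError("at least two GCI positions are needed")
    mean_period = sum(periods) / len(periods)
    return periods, 1.0 / mean_period


# Hypothetical GCIs spaced 100 samples apart at fs = 10 kHz
periods, f0 = pitch_from_gcis([120, 220, 320, 420], 10_000)
print(periods, f0)   # periods of about 0.01 s, f0 of about 100 Hz
```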

Acoustic Measures: Jita
- Jita is the absolute jitter. It gives an evaluation, in msec, of the period-to-period variability of the pitch period within the analyzed voice sample.

Jitter
- Jitter percent gives an evaluation, in percent, of the variability of the pitch period within the analyzed voice sample.
- P is the pitch period and N is the number of pitch periods estimated.
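The jitter formulas are not part of the transcript. A commonly used pair of definitions is sketched below in Python as an assumption: absolute jitter is the mean absolute difference between consecutive pitch periods, and jitter percent is that value divided by the mean pitch period, times 100. The function names and the example periods are hypothetical.

```python
def jita(periods):
    """Absolute jitter: mean absolute difference between consecutive periods.

    The result has the same unit as the input periods.
    """
    if len(periods) < 2:
        raise ValueError("at least two pitch periods are needed")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return sum(diffs) / len(diffs)       # sum over N-1 differences


def jitter_percent(periods):
    """Absolute jitter relative to the mean pitch period, in percent."""
    mean_period = sum(periods) / len(periods)
    return 100.0 * jita(periods) / mean_period


# Hypothetical pitch periods in msec
P = [10.0, 10.2, 9.8, 10.0]
print(jita(P), jitter_percent(P))
```

For these periods the consecutive differences are 0.2, 0.4 and 0.2 msec, so the absolute jitter is about 0.267 msec and the jitter is about 2.67 % of the 10 msec mean period.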

Shimmer (dB)
- Shimmer in dB gives an evaluation of the period-to-period variability of the peak-to-peak amplitude within the analyzed voice sample.

Shimmer (%)
- Shimmer percent gives an evaluation, in percent, of the variability of the peak-to-peak amplitude within the analyzed voice sample.
- Shimmer in percent is given by [equation image not preserved in the transcript].
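The shimmer formulas are not part of the transcript. Commonly used definitions are sketched below in Python as an assumption: shimmer in dB is the mean absolute value of 20·log10 of the ratio of consecutive peak-to-peak amplitudes, and shimmer percent is the mean absolute difference between consecutive amplitudes divided by the mean amplitude, times 100. The function names and example amplitudes are hypothetical.

```python
import math


def shimmer_db(amps):
    """Mean absolute level difference between consecutive amplitudes, in dB."""
    if len(amps) < 2:
        raise ValueError("at least two amplitudes are needed")
    steps = [abs(20.0 * math.log10(b / a)) for a, b in zip(amps, amps[1:])]
    return sum(steps) / len(steps)


def shimmer_percent(amps):
    """Mean absolute amplitude difference relative to the mean amplitude, in percent."""
    if len(amps) < 2:
        raise ValueError("at least two amplitudes are needed")
    diffs = [abs(a - b) for a, b in zip(amps, amps[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_amp = sum(amps) / len(amps)
    return 100.0 * mean_diff / mean_amp


# Hypothetical peak-to-peak amplitudes, one per pitch period
A = [1.0, 1.1, 0.9, 1.0]
print(shimmer_db(A), shimmer_percent(A))
```

Both measures are zero for a perfectly steady amplitude and grow as the cycle-to-cycle amplitude variation increases.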

Conclusion
- Acoustic parameters computed using the wavelet transform can be used for the objective analysis of pathological voice.
- These acoustic parameters can be used to differentiate between normal and pathological voice.

Final Program

    clc; clear all; close all;

    % Load the speech file. wavread is the legacy reader; later MATLAB
    % releases replace it with audioread.
    [s,fs] = wavread('U:\speech2_10k.wav');
    %s = s1(1:10000);

    m  = 400;            % passed to the framing helpers f_ovp / f_ovp3t
    wL = 400;            % window length in samples
    L  = length(s);
    nf = floor(L/wL);    % number of whole windows in the signal
    j  = 1;              % row index into the frame matrix f
    t  = 10;             % passed to comp_t when comparing maxima positions

    cmp1 = []; cmp2 = []; cmp3 = [];
    gci = []; q = []; d = []; a = [];
    %b = [];

    disp('Enter x=1 for male voice');
    disp('Enter x=2 for female voice');
    x = input('Enter the value of x =');

    switch x
        case 1   % male voice: Haar wavelet at scales 4, 8, 16, 32
            for i = 1:nf-1
                f(j,:) = f_ovp(s,m,wL,i);
                g = gne(f(j,:));
                c1 = cwt(f(j,:),4,'haar');
                c2 = cwt(f(j,:),8,'haar');
                c3 = cwt(f(j,:),16,'haar');
                c4 = cwt(f(j,:),32,'haar');
                [p1,q1,d1] = f_shim_max(c1);
                [p2,q2,d2] = f_shim_max(c2);
                [p3,q3,d3] = f_shim_max(c3);
                [p4,q4,d4] = f_shim_max(c4);
                L1 = length(p1); L2 = length(p2);
                L3 = length(p3); L4 = length(p4);
                % Compare maxima positions between adjacent scales
                if L1==L2
                    cmp1 = comp_t(p1,p2,t);
                elseif L2==L3
                    cmp2 = comp_t(p2,p3,t);
                elseif L3==L4
                    cmp3 = comp_t(p3,p4,t);
                end
                % Append the results for the first non-empty comparison
                if ~isempty(cmp1)
                    gci = [gci,p1']; q = [q,q1']; d = [d,d1'];
                elseif ~isempty(cmp2)
                    gci = [gci,p2']; q = [q,q2']; d = [d,d2'];
                elseif ~isempty(cmp3)
                    gci = [gci,p3']; q = [q,q3']; d = [d,d3'];
                elseif isempty(cmp1) && isempty(cmp2)
                    d = [d,zeros(1,1)];
                end
                a = [a g];
                % b = [b g2];
                j = j+1;
            end
            %d1 = diff(gci);
        case 2   % female voice: Haar wavelet at scales 8, 16, 32, 64
            for i = 1:nf-1
                f(j,:) = f_ovp3t(s,m,wL,i);
                c1 = cwt(f(j,:),8,'haar');
                c2 = cwt(f(j,:),16,'haar');
                c3 = cwt(f(j,:),32,'haar');
                c4 = cwt(f(j,:),64,'haar');
                g = gne(f(j,:));
                [p1,q1,d1] = f_shim_max(c1);
                [p2,q2,d2] = f_shim_max(c2);
                [p3,q3,d3] = f_shim_max(c3);
                [p4,q4,d4] = f_shim_max(c4);
                L1 = length(p1); L2 = length(p2);
                L3 = length(p3); L4 = length(p4);
                if L1==L2
                    cmp1 = comp_t(p1,p2,t);
                elseif L2==L3
                    cmp2 = comp_t(p2,p3,t);
                elseif L3==L4
                    cmp3 = comp_t(p3,p4,t);
                end
                if ~isempty(cmp1)
                    gci = [gci,p1']; q = [q,q1']; d = [d,d1'];
                elseif ~isempty(cmp2)
                    gci = [gci,p2']; q = [q,q2']; d = [d,d2'];
                elseif ~isempty(cmp3)
                    gci = [gci,p3']; q = [q,q3']; d = [d,d3'];
                elseif isempty(cmp1) && isempty(cmp2)
                    d = [d,zeros(1,1)];
                end
                a = [a g];
                % b = [b g2];
                j = j+1;
            end
    end

    % Pitch periods and acoustic measures
    d = smooth_d(d);
    p = d./fs;                        % GCI spacing in samples -> seconds
    L5 = length(gci); L6 = length(p); L7 = abs(L5-L6);
    m  = mean(p);  fo = 1/m;          % mean period and fundamental frequency
    m1 = max(p);   m2 = min(f_wz(p));
    fh = 1/m2;     fl = 1/m1;         % highest and lowest frequency
    jit  = jita(p);
    jitt = jitter(p);
    shdB = shimdB(q,L6);
    sh   = shimmer(q,L6);
    GNE  = max(a);
    %GNE2 = max(b);

    disp('Fundamental frequency =');  disp(fo);
    disp('Highest frequency=');       disp(fh);
    disp('Lowest frequency=');        disp(fl);
    disp('Jita =');                   disp(jit);
    disp('Jitter in percentage');     disp(jitt);
    disp('Shimmer in dB =');          disp(shdB);
    disp('shimmer in percentage=');   disp(sh);

    disp('Press any key for plot'); pause;
    % Zero-pad the shorter vector so that gci and p have equal length
    if L5<L6
        gci = [gci,zeros(1,L7)];
    elseif L5>L6
        p = [p,zeros(1,L7)];
    end
    stairs(gci,p);
    xlabel('Number of Samples');
    ylabel('Pitch period in sec');
    title('Pitch contour');

Results and Observations
Program output for a male voice (x = 1). The numeric values are missing from the transcript.

    Enter x=1 for male voice
    Enter x=2 for female voice
    Enter the value of x =1
    Fundamental frequency = [value lost]
    Highest frequency= [value lost] e+003
    Lowest frequency= [value lost]
    Jita = [value lost]
    Jitter in percentage [value lost]
    Shimmer in dB = [value lost]
    shimmer in percentage= [value lost]
    Press any key for plot
    >> Variables created in current workspace.

Questions?