Pitch Estimation by Enhanced Super Resolution Determinator  By Sunya Santananchai, Chia-Ho Ling

Objective  Estimate value of the fundamental frequency of speech by using Enhance Super Resolution determinator (eSRFD)

Introduction  The fundamental frequency of speech is defined as the rate of glottal pluses generated by the vibration of the vocal folds.  The pitch of speech is the perceptual correlate of fundamental frequency.  The fundamental frequency of speech is important in the prosodic features of stress and intonation.

Fundamental frequency determination algorithms (FDAs)  Determine the fundamental frequency of a speech waveform, that is, analyse the pitch automatically.  It is desirable to examine methods of fundamental frequency extraction that use radically different techniques.

Algorithms for determining the fundamental frequency  Cepstrum-based determinator (CFD) (Noll, 1969).  Harmonic product spectrum (HPS) (Schroeder, 1968; Noll, 1970).  Feature-based tracker (FBFT) (Phillips, 1985).  Parallel processing method (PP) (Gold & Rabiner, 1969).  Integrated tracking algorithm (IFTA) (Secrest & Doddington, 1983).  Super resolution determinator (SRFD) (Medan et al., 1991).

Enhanced Super Resolution determinator (eSRFD)  Based on the SRFD method, which uses a waveform similarity metric: the normalized cross-correlation coefficient.  Enhances the performance of the SRFD algorithm to reduce the occurrence of errors.

The eSRFD algorithm  Pass the speech waveform to low-pass filter.  The speech waveform is initially low-pass filtered.

Each frame of filtered sample data is processed by a silence detector.  The signal is analysed frame by frame, at non-overlapping intervals of 6.4 ms.  Each frame contains a set of samples, which is divided into 3 consecutive segments.
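The frame-by-frame analysis can be sketched as follows. This is an illustration only: it partitions the signal into non-overlapping 6.4 ms intervals and does not model the silence detector or the three analysis segments. The function name `split_into_frames` and the sample rate used in the example are assumptions.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=6.4):
    """Split `samples` into non-overlapping frames of `frame_ms` milliseconds.

    Trailing samples that do not fill a whole frame are discarded
    (an assumption made for simplicity).
    """
    frame_len = round(sample_rate_hz * frame_ms / 1000.0)
    if frame_len < 1:
        raise ValueError("sample rate too low for the requested frame length")
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


if __name__ == "__main__":
    # Hypothetical 10 kHz signal: each 6.4 ms frame then holds 64 samples.
    frames = split_into_frames(list(range(200)), sample_rate_hz=10000)
    print(len(frames), len(frames[0]))
```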

Analysis segments for the enhanced super resolution determinator

Normalized cross-correlation for a ‘voiced’ frame:  If a frame of data is not classified as silent or unvoiced, candidate values for the fundamental period are obtained using the first normalized cross-correlation coefficient.
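The exact formula is not reproduced in this transcript, so the following is a generic sketch of a normalized cross-correlation coefficient between two equal-length segments. It assumes that the three analysis segments are consecutive windows of n samples each; the helper names `normalized_cross_correlation` and `adjacent_segments` are hypothetical.

```python
import math


def normalized_cross_correlation(x, y):
    """Return sum(x*y) / sqrt(sum(x^2) * sum(y^2)), or 0.0 for empty energy."""
    if len(x) != len(y):
        raise ValueError("segments must have the same length")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def adjacent_segments(samples, start, n):
    """Return three consecutive n-sample segments beginning at `start`.

    Assumption: the segment layout is simplified for illustration.
    """
    x = samples[start:start + n]
    y = samples[start + n:start + 2 * n]
    z = samples[start + 2 * n:start + 3 * n]
    if len(z) < n:
        raise ValueError("not enough samples for three segments")
    return x, y, z


if __name__ == "__main__":
    # Sine wave with a period of 8 samples: the coefficient is close to 1
    # when n matches the period and close to -1 at half the period.
    signal = [math.sin(2 * math.pi * i / 8) for i in range(48)]
    for n in (4, 8):
        x, y, _ = adjacent_segments(signal, 0, n)
        print(n, round(normalized_cross_correlation(x, y), 3))
```

The coefficient is near 1 when the segment length matches the fundamental period, which is why its peaks serve as period candidates.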

Threshold definition for candidate values  Candidate values of the fundamental period are obtained by locating peaks in the normalized cross-correlation coefficient whose value exceeds a specified threshold.
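The peak-picking step can be sketched as below. The function name `find_candidates`, the example coefficient values, and the threshold of 0.8 are assumptions for illustration; the slides do not state the threshold value. The returned indices stand for candidate lags, and how an index maps to a period in samples depends on the lag range searched.

```python
def find_candidates(coefficients, threshold):
    """Return the indices of local maxima whose value exceeds `threshold`."""
    candidates = []
    for i in range(1, len(coefficients) - 1):
        c = coefficients[i]
        is_peak = c > coefficients[i - 1] and c >= coefficients[i + 1]
        if is_peak and c > threshold:
            candidates.append(i)
    return candidates


if __name__ == "__main__":
    # Hypothetical coefficient values, one per candidate lag.
    coeffs = [0.1, 0.5, 0.9, 0.4, 0.2, 0.95, 0.3]
    print(find_candidates(coeffs, threshold=0.8))
```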

Second normalized cross-correlation coefficient  A frame in which the first coefficient exceeds the threshold is classified as ‘voiced’.  The second normalized cross-correlation coefficient is then determined.

Candidate scores  Candidates whose second coefficient exceeds the threshold are given a score of 2; the others are given a score of 1.  If a frame has one or more candidates with a score of 2, all candidates with a score of 1 are removed from the list of candidates.  If there is only one candidate (with a score of 1 or 2), it is assumed to be the best estimate of the fundamental period of that frame.

Otherwise, an optimal fundamental period is sought from the set of remaining candidates by calculating a coefficient for each candidate.  The first coefficient is initially assumed to be the optimal value.  If a subsequent coefficient multiplied by 0.77 exceeds the current optimal value, that subsequent coefficient becomes the optimal value.
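The scoring and selection rules above can be sketched as follows. The names `select_optimal`, `second_coeffs`, and `selection_coeffs` are hypothetical, and the per-candidate coefficients are passed in as plain dictionaries because the transcript does not give their formulas.

```python
def select_optimal(candidates, second_coeffs, selection_coeffs,
                   threshold, margin=0.77):
    """Pick the optimal candidate period, or None if there are no candidates.

    candidates       -- candidate periods, in the order they are considered
    second_coeffs    -- second cross-correlation coefficient per candidate
    selection_coeffs -- coefficient used to compare remaining candidates
                        (assumption: supplied by the caller)
    """
    # Score 2 if the second coefficient exceeds the threshold, else 1.
    scored = [(n, 2 if second_coeffs[n] > threshold else 1) for n in candidates]
    # Any score-2 candidate removes all score-1 candidates from the list.
    if any(score == 2 for _, score in scored):
        scored = [(n, score) for n, score in scored if score == 2]
    remaining = [n for n, _ in scored]
    if not remaining:
        return None
    # The first candidate is the initial optimum; a later one replaces it
    # only if its coefficient times the margin beats the current optimum.
    best = remaining[0]
    for n in remaining[1:]:
        if selection_coeffs[n] * margin > selection_coeffs[best]:
            best = n
    return best


if __name__ == "__main__":
    # Hypothetical values: both candidates score 2, the second wins because
    # 0.9 * 0.77 = 0.693 is greater than 0.6.
    print(select_optimal([40, 80], {40: 0.9, 80: 0.9},
                         {40: 0.6, 80: 0.9}, threshold=0.8))
```

The 0.77 factor means a later candidate must be clearly better, not just marginally better, before it displaces the current optimum.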

If there is only one candidate with a score of 1 and no candidate with a score of 2, the status of the frame is reconsidered depending on the state of the previous frame.  If the previous frame is ‘silent’, the current value is held and the decision depends on the next frame.  If the next frame is also ‘silent’, the current frame is considered ‘silent’.  Otherwise, the current frame is considered ‘voiced’ and the held value is taken as a good estimate for the current frame.
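The hold-and-decide rule for a frame with a single score-1 candidate can be sketched as a small decision function. The function name `resolve_weak_frame` and the state strings are hypothetical. The slides do not say what happens when the previous frame is not ‘silent’; the sketch assumes the frame is then accepted as ‘voiced’.

```python
def resolve_weak_frame(previous_state, next_state):
    """Classify a frame that has one score-1 candidate and no score-2 candidate.

    Returns "voiced" (the held value is used as the estimate) or "silent".
    """
    if previous_state != "silent":
        # Assumption: not covered by the slides; accept the candidate.
        return "voiced"
    # Previous frame was silent: the value is held until the next frame
    # is known, and the decision follows the state of that next frame.
    if next_state == "silent":
        return "silent"
    return "voiced"


if __name__ == "__main__":
    print(resolve_weak_frame("silent", "silent"))
    print(resolve_weak_frame("silent", "voiced"))
```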

Modification: applying biasing to the normalized cross-correlation coefficients  Biasing is applied if the following conditions are met:  The two previous frames were classified as ‘voiced’.  The value of the previous frame is not being temporarily held.  The value of the previous frame is less than 7/4 times, and greater than 5/8 times, the value of its preceding voiced frame.  The biasing tends to increase the percentage of unvoiced regions of speech being incorrectly classified as ‘voiced’.
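The three conditions for applying the biasing can be sketched as a single predicate. This covers only the decision of whether to bias, not how the biasing itself is carried out, which the transcript does not describe. The function and parameter names are hypothetical.

```python
def biasing_applies(prev_two_states, prev_value_held,
                    prev_value, preceding_voiced_value):
    """Return True if all three biasing conditions hold.

    prev_two_states        -- states of the two previous frames
    prev_value_held        -- True if the previous frame's value is on hold
    prev_value             -- estimate for the previous frame
    preceding_voiced_value -- estimate for the voiced frame before that
    """
    both_voiced = (len(prev_two_states) == 2
                   and all(s == "voiced" for s in prev_two_states))
    lower = 5.0 / 8.0 * preceding_voiced_value
    upper = 7.0 / 4.0 * preceding_voiced_value
    in_range = lower < prev_value < upper
    return both_voiced and not prev_value_held and in_range


if __name__ == "__main__":
    # Hypothetical values: 100 lies between 5/8 * 110 and 7/4 * 110.
    print(biasing_applies(("voiced", "voiced"), False, 100.0, 110.0))
```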

Calculating the fundamental period:  The fundamental period for the frame is estimated by calculating

Implementation  In this report will be cover the eSRFD algorithm, implementation by MATLAB ver 7.2b to program following by eSRFD algoithm

The Result

Conclusion  The acoustic correlate of pitch is the fundamental frequency of speech.  Enhance SRFD (eSRFD) is the performances of the SRFD which can reduce the occurrence of error involved in the extraction of fundamental frequency[1].  It have occurrence error in the result which depend on kind of speech waveform.  In addition, the result in this project has more occurrence error than Paul Baghaw’s result[2] because of the problem from design to implement programming follow by eSRFD algorithm.

References  [1] Pual Christopher Bagshaw (1994). Automatic prosodic analysis for computer aided pronunciation teaching. The university of Edinburgh.  [2] Bagshaw, Paul C, Hiller, S M, Jack, Mervyn A (1993). Enhanced pitch tracking and the processing of f0 contours for computer aided intonation teaching. International Speech Communication Association. In Proc. Eurospeech '93, Berlin, volume 2, pages 1003- 1006, 1993.
