Published by Brittney Ami Sherman. Modified over 9 years ago.
1
Pitch Estimation by Enhanced Super Resolution Determinator, by Sunya Santananchai and Chia-Ho Ling
2
Objective: Estimate the fundamental frequency of speech using the Enhanced Super Resolution determinator (eSRFD).
3
Introduction: The fundamental frequency of speech is defined as the rate of glottal pulses generated by the vibration of the vocal folds. The pitch of speech is the perceptual correlate of fundamental frequency. The fundamental frequency of speech is important in the prosodic features of stress and intonation.
4
Fundamental frequency determination algorithms (FDAs): These determine the fundamental frequency of a speech waveform, that is, they analyse the pitch automatically. It is desirable to examine methods of fundamental frequency extraction that use radically different techniques.
5
Algorithms for determining the fundamental frequency: Cepstrum-based determinator (CFD) (Noll, 1969); Harmonic product spectrum (HPS) (Schroeder, 1968; Noll, 1970); Feature-based tracker (FBFT) (Phillips, 1985); Parallel processing method (PP) (Gold & Rabiner, 1969); Integrated tracking algorithm (IFTA) (Secrest & Doddington, 1983); Super resolution determinator (SRFD) (Medan et al., 1991).
6
Enhanced Super Resolution determinator (eSRFD): Based on the SRFD method, which uses a waveform similarity metric, the normalized cross-correlation coefficient. It improves the performance of the SRFD algorithm by reducing the occurrence of errors.
7
The eSRFD algorithm: The speech waveform is first passed through a low-pass filter.
8
Each frame of filtered sample data is processed by the silence detector. The signal is analysed frame by frame, in non-overlapping intervals of 6.4 ms. Each frame contains a set of samples that is divided into three consecutive segments.
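The framing step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the rounding of the frame length, and the layout of the three segments (one n-sample segment before a reference index and two after it) are hypothetical choices, not details taken from the slides.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=6.4):
    """Split samples into non-overlapping frames of frame_ms milliseconds.

    Trailing samples that do not fill a whole frame are dropped (assumption).
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len < 1:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]


def three_segments(samples, centre, n):
    """Return three consecutive n-sample segments placed around index `centre`.

    Assumed layout: x ends at `centre`, y starts at `centre`, z follows y.
    """
    if n < 1 or centre - n < 0 or centre + 2 * n > len(samples):
        raise ValueError("segments do not fit inside the sample buffer")
    x = samples[centre - n:centre]
    y = samples[centre:centre + n]
    z = samples[centre + n:centre + 2 * n]
    return x, y, z
```

At a sampling rate of 20 kHz, for example, a 6.4 ms frame holds 128 samples.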
9
Analysis segments for the enhanced super resolution determinator
10
Normalized cross-correlation for a ‘voiced’ frame: If a frame of data is not classified as silent or unvoiced, candidate values for the fundamental period are searched for using the first normalized cross-correlation coefficient.
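A normalized cross-correlation coefficient between two equal-length segments can be sketched as below. This uses the standard textbook definition (inner product divided by the geometric mean of the segment energies); the exact formula on the original slide was not recoverable, so the function names, the choice of the two segments on either side of a reference index, and the handling of zero-energy segments are assumptions.

```python
import math


def normalized_cross_correlation(a, b):
    """Standard normalized cross-correlation of two equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    numerator = sum(u * v for u, v in zip(a, b))
    denominator = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    # Assumption: a zero-energy segment yields a coefficient of 0.
    return numerator / denominator if denominator > 0.0 else 0.0


def first_coefficients(samples, centre, n_min, n_max):
    """Map each trial period n to the coefficient of the two n-sample
    segments lying immediately before and after index `centre`."""
    coeffs = {}
    for n in range(n_min, n_max + 1):
        if centre - n < 0 or centre + n > len(samples):
            break  # segments no longer fit in the buffer
        x = samples[centre - n:centre]
        y = samples[centre:centre + n]
        coeffs[n] = normalized_cross_correlation(x, y)
    return coeffs
```

For a perfectly periodic waveform the coefficient approaches 1 when n equals the period, which is why peaks in this function are used as period candidates.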
11
Threshold definition for candidate values: Candidate values of the fundamental period are obtained by locating peaks in the normalized cross-correlation coefficient whose value exceeds a specified threshold.
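Peak picking against a threshold might look like the following sketch. The dictionary input (trial period mapped to coefficient), the treatment of the end points, and the tie-breaking rule for flat peaks are assumptions; the slides do not give the threshold value, so it is left as a parameter.

```python
def candidate_periods(coeffs, threshold):
    """Return trial periods at local peaks of `coeffs` that exceed `threshold`.

    coeffs: dict mapping trial period n -> normalized cross-correlation value.
    End points count as peaks if they exceed their single neighbour (assumption).
    """
    periods = sorted(coeffs)
    candidates = []
    for i, n in enumerate(periods):
        value = coeffs[n]
        left = coeffs[periods[i - 1]] if i > 0 else float("-inf")
        right = coeffs[periods[i + 1]] if i + 1 < len(periods) else float("-inf")
        # ">=" on the left keeps only the last sample of a flat-topped peak.
        if value > threshold and value >= left and value > right:
            candidates.append(n)
    return candidates
```

If the returned list is empty, no period candidate was found for the frame.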
12
A second normalized cross-correlation coefficient: For a frame classified as ‘voiced’, that is, one in which the first coefficient exceeds the threshold, a second normalized cross-correlation coefficient is determined.
13
Candidate score: Candidates for which the second coefficient exceeds the threshold are given a score of 2; the others are given a score of 1. If there are one or more candidates with a score of 2 in a frame, all candidates with a score of 1 are removed from the list of candidates. If there is only one candidate (with a score of 1 or 2), that candidate is assumed to be the best estimate of the fundamental period of the frame.
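The scoring and pruning rule can be sketched as a small function. The input format (candidate period mapped to the coefficient that is compared with the threshold) and the function name are hypothetical.

```python
def score_candidates(coeffs, threshold):
    """Score each candidate 2 if its coefficient exceeds `threshold`, else 1,
    then drop all score-1 candidates if any score-2 candidate exists.

    coeffs: dict mapping candidate period -> coefficient.
    Returns a list of (period, score) pairs ordered by period.
    """
    scored = [(n, 2 if c > threshold else 1) for n, c in sorted(coeffs.items())]
    if any(score == 2 for _, score in scored):
        scored = [(n, score) for n, score in scored if score == 2]
    return scored
```

When the returned list holds exactly one pair, that period is taken as the estimate for the frame without further comparison.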
14
Otherwise, an optimal fundamental period is sought from the set of remaining candidates by calculating a coefficient for each candidate. The first coefficient is initially assumed to be the optimal value. If a subsequent coefficient multiplied by 0.77 is greater than the current optimal value, the subsequent coefficient becomes the optimal value.
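The selection rule with the 0.77 factor can be sketched as follows. The list-of-pairs input and the function name are assumptions; how the per-candidate coefficient itself is computed is not shown on the slide and is left to the caller.

```python
def select_optimal(candidates, factor=0.77):
    """Pick the optimal period from (period, coefficient) pairs.

    The first pair starts as the optimum; a later pair replaces it only if
    its coefficient multiplied by `factor` exceeds the current optimum.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    best_period, best_coeff = candidates[0]
    for period, coeff in candidates[1:]:
        if coeff * factor > best_coeff:
            best_period, best_coeff = period, coeff
    return best_period
```

Because the factor is below 1, a later candidate has to be clearly stronger, not just marginally stronger, to displace an earlier one.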
16
When there is only one candidate with a score of 1 and no candidate with a score of 2, the status of the frame is reconsidered depending on the state of the previous frame. If the previous frame is ‘silent’, the current value is held and the decision depends on the next frame. If the next frame is also ‘silent’, the current frame is considered ‘silent’. Otherwise, the current frame is considered ‘voiced’ and the held value is taken as a good estimate for the current frame.
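The hold-and-decide logic for a lone score-1 candidate can be sketched as below. The state labels as strings and the function name are hypothetical, and the branch for a previous frame that is not ‘silent’ (accept the candidate directly) is an assumption, since the slide only spells out the ‘silent’ case.

```python
def resolve_single_weak_candidate(previous_state, next_state, held_period):
    """Classify a frame whose only candidate has a score of 1.

    Returns (state, period), where period is None for a 'silent' frame.
    """
    if previous_state == "silent":
        # The value is held until the next frame has been classified.
        if next_state == "silent":
            return ("silent", None)
        return ("voiced", held_period)
    # Assumption: with a non-silent previous frame the candidate is accepted.
    return ("voiced", held_period)
```

In effect, an isolated weak candidate surrounded by silence on both sides is discarded rather than reported as voiced.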
17
Modification: apply biasing to the normalized cross-correlation coefficients. Biasing is applied if the following conditions are met: the two previous frames were classified as ‘voiced’; the value of the previous frame is not being temporarily held; and the value of the previous frame is less than 7/4 times that of its preceding voiced frame, and greater than 5/8 times it. The biasing tends to increase the percentage of unvoiced regions of speech being incorrectly classified as ‘voiced’.
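The three conditions can be combined into a single check, sketched here. The argument names and the representation of frame states as strings are hypothetical; the sketch only decides whether biasing applies and does not show how the bias itself is computed, which the slide does not specify.

```python
def biasing_applies(previous_states, previous_is_held, previous_value, reference_value):
    """Return True if all three biasing conditions hold.

    previous_states: frame states in time order, most recent last.
    previous_value: estimate for the previous frame.
    reference_value: estimate for the voiced frame preceding it.
    """
    two_voiced = (len(previous_states) >= 2
                  and all(state == "voiced" for state in previous_states[-2:]))
    in_range = (5.0 / 8.0) * reference_value < previous_value < (7.0 / 4.0) * reference_value
    return two_voiced and not previous_is_held and in_range
```

For example, a previous value of 100 against a reference of 90 lies inside the allowed range, whereas 200 against 100 does not.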
18
Calculate the fundamental period: The fundamental period for the frame is estimated by calculate
19
Implementation: This report covers the eSRFD algorithm and its implementation in MATLAB ver 7.2b, programmed by following the eSRFD algorithm.
21
The Result
23
Conclusion: The acoustic correlate of pitch is the fundamental frequency of speech. The enhanced SRFD (eSRFD) improves the performance of the SRFD by reducing the occurrence of errors in the extraction of fundamental frequency [1]. Errors still occur in the results, depending on the kind of speech waveform. In addition, the results in this project contain more errors than Paul Bagshaw’s results [2], because of problems in going from the design to a program that follows the eSRFD algorithm.
24
References [1] Bagshaw, Paul Christopher (1994). Automatic prosodic analysis for computer aided pronunciation teaching. The University of Edinburgh. [2] Bagshaw, Paul C., Hiller, S. M., Jack, Mervyn A. (1993). Enhanced pitch tracking and the processing of f0 contours for computer aided intonation teaching. In Proc. Eurospeech '93, Berlin, volume 2, pages 1003-1006. International Speech Communication Association.