Published by Kaylah Yoke. Modified over 10 years ago.
1 A Spectral-Temporal Method for Pitch Tracking
Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu*
Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23529, USA
* Currently at Binghamton University
09/17/2006
2 Outline
Introduction
Algorithm
  Algorithm overview
  The use of nonlinear processing
  Pitch tracking from the spectrum
Experimental evaluation
Conclusion
3 Introduction
Pitch (the fundamental frequency) applications
  Automatic speech recognition (ASR), speech synthesis, speech articulation training aids, etc.
Pitch detection algorithms
  “Robust and accurate fundamental frequency estimation based on dominant harmonic components,” Nakatani et al. => High accuracy for noisy speech reported using the harmonic dominance spectrum
  “Yet another algorithm for pitch tracking (YAAPT),” Zahorian et al. => Hybrid spectral-temporal processing for pitch tracking
4 Algorithm Overview
5 The Use of Nonlinear Processing
Restoration of the missing fundamental in telephone speech
A periodic sound is characterized by the spectrum of its harmonics: [equation not recovered]
A signal whose fundamental is missing can be approximated as: [equation not recovered; its two terms are labeled "1st harmonic" and "2nd harmonic"]
After squaring and applying trigonometric identities: [equation not recovered; one term is labeled "Fundamental"]
The fundamental reappears
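The slide's equations did not survive extraction. The argument can be worked through under a stated assumption: the signal is reduced to two adjacent harmonics, here the components at 2ω and 3ω. The amplitudes a_2, a_3 and phases φ_2, φ_3 are illustrative notation, not necessarily the symbols used on the slide.

```latex
\begin{align*}
x(t)   &= a_2\cos(2\omega t+\phi_2) + a_3\cos(3\omega t+\phi_3) \\
x^2(t) &= \tfrac{1}{2}\left(a_2^2+a_3^2\right)
          + \tfrac{1}{2}a_2^2\cos(4\omega t+2\phi_2)
          + \tfrac{1}{2}a_3^2\cos(6\omega t+2\phi_3) \\
       &\quad + a_2a_3\cos(\omega t+\phi_3-\phi_2)
          + a_2a_3\cos(5\omega t+\phi_2+\phi_3)
\end{align*}
```

The first line uses \cos^2 A = (1+\cos 2A)/2 for each squared term and 2\cos A\cos B = \cos(A-B)+\cos(A+B) for the cross term. The cross term contributes a component at ω, which is how the fundamental reappears after squaring.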
6 Illustration of Nonlinear Processing
The telephone speech signal (top panel) and the squared telephone signal (bottom panel) for one frame
7 Illustration of Nonlinear Processing
The magnitude spectrum for the telephone signal (top panel) and the nonlinearly processed signal (bottom panel)
8 Spectral Effects from Nonlinear Processing
The missing fundamental in the telephone speech (top panel) is restored in the squared signal (bottom panel)
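The effect described on this slide can be reproduced numerically. The sketch below is only an illustration on a synthetic frame, not the authors' data or code. The sampling rate, the 100 Hz fundamental, the choice of harmonics 3 to 6 and the helper `magnitude_at` are all assumptions made for the example.

```python
import numpy as np

fs = 8000            # sampling rate in Hz (assumed, typical for telephone speech)
f0 = 100.0           # hypothetical fundamental frequency in Hz
n = 800              # 0.1 s frame, so FFT bins fall every 10 Hz
t = np.arange(n) / fs

# Synthetic "telephone" frame: harmonics 3*f0 .. 6*f0 only, fundamental absent.
x = sum(np.cos(2 * np.pi * h * f0 * t) for h in range(3, 7))

def magnitude_at(signal, freq_hz):
    """Magnitude of the FFT bin closest to freq_hz (signal mean removed)."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    bin_index = int(round(freq_hz * len(signal) / fs))
    return spectrum[bin_index]

before = magnitude_at(x, f0)        # fundamental missing in the original frame
after = magnitude_at(x ** 2, f0)    # fundamental present after squaring
print(f"magnitude at f0 before squaring: {before:.3f}, after: {after:.3f}")
```

Squaring creates difference-frequency terms between adjacent harmonics, and each adjacent pair contributes a component at f0.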
9 Pitch Tracking From the Spectrum
The pitch track from the spectrum refines the pitch candidates estimated from the temporal method
To achieve a noise-robust pitch track from the spectrum, an autocorrelation-type function is proposed
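10 Autocorrelation Type of Function
The function takes into account multiple harmonics
[Equation and figure not recovered; the figure marks windows of length WL at frequency indices k, 2k, 3k, 4k]
Quantities defined on the slide (their symbols were lost in extraction): the spectrum; the window length (20 Hz); the number of harmonics (3); the frequency index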
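Since the slide's equation is lost, the sketch below shows one plausible form of such a function, given as an assumption rather than the authors' definition. For each candidate frequency index k, it sums, over the offsets in a window, the product of the spectrum at k, 2k, 3k and 4k. The function name `harmonic_correlation`, its parameters and the synthetic test frame are hypothetical.

```python
import numpy as np

def harmonic_correlation(spectrum, window_bins, num_harmonics, k_min, k_max):
    """Autocorrelation-type function over a magnitude spectrum.

    For each candidate index k, sums, over offsets within the window,
    the product of the spectrum at k, 2k, ..., (num_harmonics + 1) * k.
    This form is an assumption; the slide's equation was not recovered.
    """
    half = window_bins // 2
    offsets = np.arange(-half, half + 1)
    last = len(spectrum) - 1
    scores = np.zeros(k_max + 1)
    for k in range(k_min, k_max + 1):
        product = np.ones(len(offsets))
        for r in range(1, num_harmonics + 2):
            idx = r * k + offsets
            valid = (idx >= 0) & (idx <= last)
            # Bins outside the spectrum contribute zero to the product.
            product *= np.where(valid, spectrum[np.clip(idx, 0, last)], 0.0)
        scores[k] = product.sum()
    return scores

# Synthetic frame: 150 Hz fundamental plus harmonics (hypothetical test signal).
fs, n_fft, f0 = 8000, 4000, 150.0       # 2 Hz per FFT bin
t = np.arange(n_fft) / fs
frame = sum(np.cos(2 * np.pi * h * f0 * t) / h for h in range(1, 8))
spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))

bin_hz = fs / n_fft
scores = harmonic_correlation(
    spectrum,
    window_bins=int(20 / bin_hz),       # 20 Hz window, as stated on the slide
    num_harmonics=3,                    # as stated on the slide
    k_min=int(60 / bin_hz),             # hypothetical search range: 60-400 Hz
    k_max=int(400 / bin_hz),
)
estimate_hz = np.argmax(scores) * bin_hz
print(f"estimated pitch: {estimate_hz:.1f} Hz")
```

Multiplying across harmonics means a candidate scores highly only when energy is present at all of its multiples, which is why a single prominent peak is expected at the true pitch.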
11 Peaks in Autocorrelation Type of Function
A very prominent peak is observed in the proposed function
12 Candidate Insertion to Reduce Pitch Doubling/Halving
If all candidates are larger than a threshold (typically 150 Hz), an additional candidate is inserted at half the frequency of the highest-ranking candidate: P2 (Hz) = P1 (Hz) / 2
Similar logic is used to reduce pitch halving
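The insertion rule can be sketched as follows. Only the 150 Hz threshold and the P1/2 rule come from the slide. The function name `insert_candidates`, the choice to append the new candidate at the lowest rank, and the mirrored halving rule with its `halving_threshold_hz` parameter are assumptions, since the slide gives no details for the halving case.

```python
def insert_candidates(candidates_hz, doubling_threshold_hz=150.0,
                      halving_threshold_hz=None):
    """Return the candidate list with an extra candidate appended if needed.

    candidates_hz: pitch candidates in Hz, highest-ranking first.
    halving_threshold_hz is a hypothetical parameter; the slide gives no value.
    """
    result = list(candidates_hz)
    if not result:
        return result
    best = result[0]
    if all(c > doubling_threshold_hz for c in result):
        # Every candidate is high: add P1 / 2 to guard against pitch doubling.
        result.append(best / 2.0)
    elif halving_threshold_hz is not None and all(
            c < halving_threshold_hz for c in result):
        # Assumed mirror rule: every candidate is low, so add 2 * P1.
        result.append(best * 2.0)
    return result

print(insert_candidates([220.0, 440.0, 330.0]))  # [220.0, 440.0, 330.0, 110.0]
```

The inserted candidate only widens the set of options. A later tracking stage still has to choose among the candidates.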
13 Experimental Evaluation
Database
  Keele pitch extraction database
  5 male and 5 female speakers, about 35 seconds per speaker
  High quality speech and telephone speech
  Additive Gaussian noise
Controls (reference pitch)
  Control C1: supplied in Keele database
  Control C2: computed from the laryngograph signal with the proposed algorithm
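The additive Gaussian noise condition can be illustrated with a short sketch that scales white noise to a target signal-to-noise ratio. How the authors measured SNR (for example over whole recordings or only over speech segments) is not stated, so the whole-signal power ratio used here is an assumption. The function name and the test tone are hypothetical.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled so the result has the requested SNR."""
    noise = rng.standard_normal(len(signal))
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale so that signal_power / scaled_noise_power equals 10^(snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return signal + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 120.0 * np.arange(8000) / 8000)   # hypothetical 120 Hz tone
noisy = add_gaussian_noise(clean, snr_db=5.0, rng=rng)
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(f"achieved SNR: {achieved:.2f} dB")
```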
14 Definition of Error Measures
Gross error
  The percentage of frames in which the tracker's pitch estimate deviates significantly (typically by more than 20%) from the reference pitch (control)
  Only evaluated in the voiced sections of the reference
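The gross error measure translates directly into a few lines of code. The sketch below assumes that unvoiced reference frames are encoded as 0 Hz, a common convention but not one stated on the slide. The function name and the sample values are hypothetical.

```python
import numpy as np

def gross_error_percent(estimate_hz, reference_hz, threshold=0.20):
    """Percentage of reference-voiced frames whose estimate deviates from
    the reference by more than `threshold` (relative deviation).
    Reference frames equal to 0 are treated as unvoiced (assumed encoding)."""
    estimate_hz = np.asarray(estimate_hz, dtype=float)
    reference_hz = np.asarray(reference_hz, dtype=float)
    voiced = reference_hz > 0
    if not voiced.any():
        return 0.0
    deviation = np.abs(estimate_hz[voiced] - reference_hz[voiced]) / reference_hz[voiced]
    return 100.0 * np.mean(deviation > threshold)

reference = [0.0, 100.0, 100.0, 200.0, 200.0]   # first frame unvoiced
estimate = [90.0, 105.0, 200.0, 190.0, 100.0]
print(gross_error_percent(estimate, reference))  # 50.0
```

The first frame is ignored because the reference marks it unvoiced. Of the four voiced frames, two deviate by more than 20% (a doubling and a halving), giving 50%.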
15 Experiment 1 Results
Individual performance of the proposed algorithm

Method            Control   Studio, Clean (%)   Studio, 5 dB Noise (%)   Telephone, Clean (%)   Telephone, 5 dB Noise (%)
YAAPT             C1        4.26                7.62                     8.14                   17.85
YAAPT*            C1        1.59                1.99                     2.69                   4.48
Spectral method   C1        4.23                4.45                     6.52                   6.95
NCCF              C1        3.58                4.52                     8.00                   16.61

YAAPT*: Using control C1 for the spectral pitch track
NCCF: Normalized cross correlation function, used as the temporal method in YAAPT
16 Experiment 2 Results
The results of the new method with various error thresholds

Error Threshold   Control   Studio, Clean (%)   Studio, 5 dB Noise (%)   Telephone, Clean (%)   Telephone, 5 dB Noise (%)
10%               C1        5.46                7.31                     9.39                   16.14
10%               C2        4.18                6.06                     7.77                   14.78
20%               C1        2.90                3.65                     4.86                   7.45
20%               C2        1.56                2.16                     3.27                   5.85
40%               C1        2.25                2.44                     2.75                   3.63
40%               C2        0.91                1.06                     0.99                   2.05
17 Comparisons

Method            Control   Studio, Clean (%)   Studio, 5 dB Noise (%)   Telephone, Clean (%)   Telephone, 5 dB Noise (%)
Proposed Method   C1        2.90                3.65                     4.86 (4.52*)           7.45 (5.90*)
DASH              C1        2.81                2.32                     3.73*                  4.15*
REPS              C1        2.68                2.98                     6.91*                  8.49*
YIN               C1        2.57                7.22                     7.55*                  14.6*

DASH, REPS, YIN: the results are reported in “Robust and accurate fundamental frequency estimation...,” Nakatani et al.
*: SRAEN filter simulated telephone speech
18 Conclusion
A new pitch-tracking algorithm has been developed that combines multiple information sources to enable accurate, robust F0 tracking
An analysis of errors indicates better performance, for both high quality and telephone speech, than previously reported for pitch tracking
Acknowledgements
This work was partially supported by JWFC 900