

Slide 1: A Spectral-Temporal Method for Pitch Tracking
Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu*
Department of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA 23529, USA
* Currently at Binghamton University
09/17/2006

Slide 2: Outline
- Introduction
- Algorithm
  - Algorithm overview
  - The use of nonlinear processing
  - Pitch tracking from the spectrum
- Experimental evaluation
- Conclusion

Slide 3: Introduction
- Applications of pitch (the fundamental frequency, F0): automatic speech recognition (ASR), speech synthesis, speech articulation training aids, etc.
- Pitch detection algorithms
  - “Robust and accurate fundamental frequency estimation based on dominant harmonic components,” Nakatani et al.: high accuracy reported for noisy speech using the harmonic dominance spectrum
  - “Yet another algorithm for pitch tracking (YAAPT),” Zahorian et al.: hybrid spectral-temporal processing for pitch tracking

Slide 4: Algorithm Overview

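Slide 5: The Use of Nonlinear Processing
- Restoration of the missing fundamental in telephone speech
- A periodic sound is characterized by the spectrum of its harmonics
- A signal whose fundamental is missing can be approximated by two adjacent harmonics (k ≥ 2):
  $$x(t) = A\cos(k\omega_0 t) + B\cos\big((k+1)\omega_0 t\big)$$
- After squaring and applying trigonometric identities:
  $$x^2(t) = \frac{A^2+B^2}{2} + AB\cos(\omega_0 t) + AB\cos\big((2k+1)\omega_0 t\big) + \frac{A^2}{2}\cos(2k\omega_0 t) + \frac{B^2}{2}\cos\big(2(k+1)\omega_0 t\big)$$
- The fundamental reappears: the difference-frequency term $AB\cos(\omega_0 t)$ lies at the fundamental frequency $\omega_0$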
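The squaring argument on this slide can be checked numerically. The sketch below is a minimal illustration, not the authors' code. It assumes a synthetic "telephone-like" signal that contains only the components at 3·f0 and 4·f0 of a 100 Hz fundamental. The amplitudes 1.0 and 0.8 are arbitrary. The sketch squares the signal and compares the spectral magnitude at 100 Hz before and after. The helper name `mag_at` is hypothetical.

```python
import numpy as np

fs = 8000                            # sampling rate (Hz), telephone-like
f0 = 100.0                           # fundamental (Hz), deliberately absent from x
t = np.arange(int(0.1 * fs)) / fs    # 100 ms -> 800 samples -> 10 Hz FFT bins

# Two adjacent harmonics (3*f0 and 4*f0) with no energy at f0 itself
A, B = 1.0, 0.8
x = A * np.cos(2 * np.pi * 3 * f0 * t) + B * np.cos(2 * np.pi * 4 * f0 * t)
y = x ** 2                           # nonlinear processing: squaring

def mag_at(sig, freq):
    """One-sided magnitude (amplitude/2 for a cosine) at the FFT bin nearest freq."""
    spec = np.abs(np.fft.rfft(sig)) / len(sig)
    freqs = np.fft.rfftfreq(len(sig), d=1 / fs)
    return spec[np.argmin(np.abs(freqs - freq))]

before = mag_at(x, f0)               # essentially zero: fundamental is missing
after = mag_at(y, f0)                # AB/2 = 0.4: fundamental restored by squaring
print(f"|X(f0)| before squaring: {before:.2e}, after squaring: {after:.3f}")
```

With a 100 ms frame at 8 kHz, every component falls exactly on an FFT bin. The original signal therefore shows essentially zero magnitude at 100 Hz. The squared signal shows AB/2 = 0.4 there, which is the one-sided magnitude of the difference-frequency cosine at f0.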

Slide 6: Illustration of Nonlinear Processing
- [Figure: the telephone speech signal (top panel) and the squared telephone signal (bottom panel) for one frame]

Slide 7: Illustration of Nonlinear Processing
- [Figure: magnitude spectra of the telephone signal (top panel) and the nonlinearly processed signal (bottom panel)]

Slide 8: Spectral Effects from Nonlinear Processing
- [Figure: the missing fundamental in the telephone speech (top panel) is restored in the squared signal (bottom panel)]

Slide 9: Pitch Tracking from the Spectrum
- The pitch track from the spectrum refines the pitch candidates estimated by the temporal method
- To obtain a noise-robust pitch track from the spectrum, an autocorrelation-type function is proposed

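Slide 10: Autocorrelation-Type Function
- The function takes multiple harmonics into account:
  $$R(k) = \sum_{i=-WL/2}^{WL/2} \prod_{r=1}^{NH+1} X(rk + i)$$
  where X is the spectrum, WL is the window length (20 Hz), NH is the number of harmonics (3), and k is the frequency index
- [Figure: windows of length WL around k, 2k, 3k, and 4k]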
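A minimal sketch of the autocorrelation-type function follows. It assumes the product-over-harmonics form R(k) = Σ_{i=-WL/2}^{WL/2} Π_{r=1}^{NH+1} X(rk + i), with the slide's WL = 20 Hz and NH = 3, so the spectrum is sampled around k, 2k, 3k and 4k. Several details are illustrative assumptions, not details taken from the slides:
- the function name `spectral_autocorr`;
- the synthetic 150 Hz test frame with 1/h harmonic amplitudes;
- the Hann window and the 8192-point FFT;
- the 60-400 Hz search range.

```python
import numpy as np

def spectral_autocorr(mag, df, wl_hz=20.0, nh=3):
    """R(k) = sum_{i=-WL/2}^{WL/2} prod_{r=1}^{NH+1} X(r*k + i) for every usable bin k.

    mag   -- magnitude spectrum X of one frame (1-D array)
    df    -- bin spacing in Hz
    wl_hz -- window length WL in Hz (20 Hz on the slide)
    nh    -- number of harmonics NH (3 on the slide); NH+1 spectral points are multiplied
    """
    mag = np.asarray(mag, dtype=float)
    half = int(round(wl_hz / df / 2))           # WL/2 expressed in bins
    r = np.arange(1, nh + 2)                    # r = 1, ..., NH+1
    out = np.zeros(len(mag))
    k_max = (len(mag) - 1 - half) // (nh + 1)   # keep every r*k + i inside the spectrum
    for k in range(half, k_max + 1):
        out[k] = sum(np.prod(mag[r * k + i]) for i in range(-half, half + 1))
    return out

# Synthetic voiced frame (assumed): f0 = 150 Hz, 10 harmonics with 1/h amplitudes
fs, f0 = 8000, 150.0
t = np.arange(int(0.04 * fs)) / fs              # 40 ms frame
x = sum(np.cos(2 * np.pi * h * f0 * t) / h for h in range(1, 11))
n_fft = 8192
mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
df = fs / n_fft

R = spectral_autocorr(mag, df)
lo, hi = int(60 / df), int(400 / df)            # assumed pitch search range, 60-400 Hz
f0_est = (lo + np.argmax(R[lo:hi + 1])) * df
print(f"estimated F0: {f0_est:.1f} Hz")
```

Multiplying the spectrum at several harmonic positions is what makes the peak prominent. The product is large only when all of k, 2k, 3k and 4k fall on harmonics. A frequency such as f0/2, where half of the sampled points land between harmonics, scores far lower.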

Slide 11: Peaks in the Autocorrelation-Type Function
- A very prominent peak is observed in the proposed function

Slide 12: Candidate Insertion to Reduce Pitch Doubling/Halving
- If all candidates are larger than a threshold (typically 150 Hz), an additional candidate P2 is inserted at half the frequency of the highest-ranking candidate P1: P2 (Hz) = P1 (Hz) / 2
- Similar logic is used to reduce pitch halving
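The candidate-insertion rule can be sketched as below. Candidates are assumed to be a list of frequencies in Hz, ordered by rank. The 150 Hz default follows the slide's "typical" threshold. The slide only says that "similar logic" is used against pitch halving. The mirror rule here (if all candidates fall below a low threshold, insert twice the top candidate) and its `low_thresh_hz` parameter are therefore hypothetical.

```python
from typing import List, Optional

def insert_candidates(cands: List[float], high_thresh_hz: float = 150.0,
                      low_thresh_hz: Optional[float] = None) -> List[float]:
    """cands: pitch candidates in Hz, ordered from highest- to lowest-ranking."""
    if not cands:
        return []
    out = list(cands)
    p1 = cands[0]                                  # highest-ranking candidate P1
    # Rule from the slide: all candidates above threshold -> insert P2 = P1 / 2
    if all(c > high_thresh_hz for c in cands):
        out.append(p1 / 2)
    # Hypothetical mirror rule for pitch halving (threshold not given on the slide)
    if low_thresh_hz is not None and all(c < low_thresh_hz for c in cands):
        out.append(p1 * 2)
    return out

print(insert_candidates([220.0, 180.0]))           # a 110 Hz candidate is appended
```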

Slide 13: Experimental Evaluation
- Database
  - Keele pitch extraction database: 5 male and 5 female speakers, about 35 seconds per speaker
  - High-quality speech and telephone speech
  - Additive Gaussian noise
- Controls (reference pitch)
  - Control C1: supplied with the Keele database
  - Control C2: computed from the laryngograph signal with the proposed algorithm
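The slides state only that additive Gaussian noise was used; the tables use 5 dB. The sketch below shows one common way to reach a target SNR. It assumes white noise and an SNR defined over the mean power of the whole signal. How the authors actually measured SNR (for example, over voiced segments only) is not stated. The function name `add_white_noise` and the test tone are hypothetical.

```python
import numpy as np

def add_white_noise(x, snr_db, rng=None):
    """Return (x + noise, noise), with white Gaussian noise scaled so that
    10*log10(P_signal / P_noise) == snr_db, where P is mean power over the whole signal."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    noise = rng.standard_normal(x.shape)
    p_sig = np.mean(x ** 2)
    p_noise = np.mean(noise ** 2)
    # Rescale the realized noise so its power is exactly p_sig / 10^(snr_db / 10)
    noise *= np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return x + noise, noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 150.0 * np.arange(8000) / 8000)   # 1 s test tone
noisy, noise = add_white_noise(clean, 5.0, rng)
snr_db = 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))
print(f"realized SNR: {snr_db:.2f} dB")
```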

Slide 14: Definition of Error Measures
- Gross error: the percentage of frames in which the tracker's pitch estimate deviates from the reference pitch (control) by more than a threshold (typically 20%)
- Evaluated only in the voiced sections of the reference
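A direct reading of the gross-error definition is sketched below. It assumes frame-aligned pitch arrays in Hz, with 0 marking unvoiced frames. A frame that the tracker calls unvoiced (0 Hz) inside a reference-voiced region deviates by 100%, so it counts as a gross error. The name `gross_error_pct` and the toy data are illustrative.

```python
import numpy as np

def gross_error_pct(est_hz, ref_hz, thresh=0.20):
    """Percentage of reference-voiced frames whose estimate deviates from the
    reference pitch by more than `thresh` (relative). ref_hz == 0 marks unvoiced frames."""
    est = np.asarray(est_hz, dtype=float)
    ref = np.asarray(ref_hz, dtype=float)
    voiced = ref > 0                               # evaluate only where the reference is voiced
    dev = np.abs(est[voiced] - ref[voiced]) / ref[voiced]
    return 100.0 * np.mean(dev > thresh)

ref = [0, 100, 100, 200, 200]       # first frame unvoiced in the reference: ignored
est = [150, 105, 130, 0, 210]       # relative deviations: 5%, 30%, 100%, 5%
print(gross_error_pct(est, ref))    # 2 of 4 voiced frames exceed 20%
```

Changing `thresh` to 0.10 or 0.40 gives the other error thresholds used in Experiment 2.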

Slide 15: Experiment 1 Results
- Individual performance of the proposed algorithm (all entries are error rates in %)

Method           Control  Studio, clean  Studio, 5 dB noise  Telephone, clean  Telephone, 5 dB noise
YAAPT            C1       4.26           7.62                8.14              17.85
YAAPT*           C1       1.59           1.99                2.69              4.48
Spectral method  C1       4.23           4.45                6.52              6.95
NCCF             C1       3.58           4.52                8.00              16.61

- YAAPT*: YAAPT using control C1 as the spectral pitch track
- NCCF: normalized cross-correlation function, used as the temporal method in YAAPT
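For reference, the normalized cross-correlation function named in the table notes can be sketched in its textbook form, NCCF(k) = Σ x(n)x(n+k) / sqrt(Σ x(n)² · Σ x(n+k)²). This covers the general technique only. The exact NCCF variant used in YAAPT (windowing, normalization details, candidate selection) is not given on these slides. The 25 ms window, the 60-400 Hz lag range and the synthetic 100 Hz test signal are illustrative choices.

```python
import numpy as np

def nccf(x, n_win, max_lag):
    """Textbook NCCF over lags 0..max_lag, using a window of n_win samples."""
    x = np.asarray(x, dtype=float)
    if len(x) < n_win + max_lag:
        raise ValueError("signal too short for the requested window and lag range")
    ref = x[:n_win]
    e0 = np.dot(ref, ref)                          # energy of the reference window
    out = np.zeros(max_lag + 1)
    for k in range(max_lag + 1):
        seg = x[k:k + n_win]                       # window shifted by lag k
        denom = np.sqrt(e0 * np.dot(seg, seg))
        out[k] = np.dot(ref, seg) / denom if denom > 0 else 0.0
    return out

fs, f0 = 8000, 100.0
t = np.arange(int(0.06 * fs)) / fs
x = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
n_win = 200                                        # 25 ms window
min_lag, max_lag = fs // 400, fs // 60             # lags for an assumed 60-400 Hz range
c = nccf(x, n_win, max_lag)
best_lag = min_lag + int(np.argmax(c[min_lag:]))   # skip lags shorter than 1/400 s
f0_est = fs / best_lag
print(best_lag, f0_est)
```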

Slide 16: Experiment 2 Results
- Results of the new method with various error thresholds (all entries are error rates in %)

Error threshold  Control  Studio, clean  Studio, 5 dB noise  Telephone, clean  Telephone, 5 dB noise
10%              C1       5.46           7.31                9.39              16.14
10%              C2       4.18           6.06                7.77              14.78
20%              C1       2.90           3.65                4.86              7.45
20%              C2       1.56           2.16                3.27              5.85
40%              C1       2.25           2.44                2.75              3.63
40%              C2       0.91           1.06                0.99              2.05

Slide 17: Comparisons
- DASH, REPS, YIN: results as reported in “Robust and accurate fundamental frequency estimation...,” Nakatani et al.
- *: telephone speech simulated with the SRAEN filter

Method           Control  Studio, clean  Studio, 5 dB noise  Telephone, clean  Telephone, 5 dB noise
Proposed method  C1       2.90           3.65                4.86 (4.52*)      7.45 (5.90*)
DASH             C1       2.81           2.32                3.73*             4.15*
REPS             C1       2.68           2.98                6.91*             8.49*
YIN              C1       2.57           7.22                7.55*             14.6*

Slide 18: Conclusion
- A new pitch-tracking algorithm has been developed that combines multiple information sources to enable accurate and robust F0 tracking
- An analysis of errors indicates better performance, for both high-quality and telephone speech, than previously reported for pitch tracking
- Acknowledgements: this work was partially supported by JWFC 900

