Overview of Real-Time Pitch Tracking Approaches Music information retrieval seminar McGill University Francois Thibault
Presentation Goals
- Describe the requirements of an RT pitch tracking algorithm for musical applications
- Briefly introduce key developments in RT pitch tracking algorithms
- Provide insight into which techniques might be more suitable for a given application
Pitch tracking requirements in musical context
- Must often function in real-time
- Minimal output latency
- Accuracy in the presence of noise
- Frequency resolution
- Flexibility and adaptability to various musical requirements:
  - Pitch range
  - Dynamic range
  - …
Overview of techniques
- Time-domain methods
  - Autocorrelation Function (Rabiner 77)
  - Average Magnitude Difference Function (AMDF)
  - Fundamental Period Measurement (Kuhn 90)
- Frequency-domain methods
  - Cepstrum (Noll 66)
  - Harmonic Product Spectrum (Schroeder 68)
  - Constant-Q transform (Brown 92)
  - Least-Squares fitting (Choi 97)
  - Maximum Likelihood (McAulay 86, Puckette 98)
- Other approaches…
Autocorrelation method
- Based on the fact that a periodic signal will correlate strongly with itself when offset by the fundamental period
- Measures the extent to which a signal correlates with a time-shifted version of itself
- The time shifts at which the ACF displays peaks correspond to likely period estimates
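The peak-picking idea on this slide can be sketched in a few lines of plain Python. This is only a minimal illustration, not Rabiner's formulation: the function names, the search range `f_min`/`f_max`, and the synthetic test tone are all assumptions made for the example, and no preprocessing (such as spectral flattening) is applied.

```python
import math


def autocorrelation(x, max_lag):
    """Return r[lag] = sum_i x[i] * x[i + lag] for lag = 0..max_lag."""
    n = len(x)
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def acf_pitch(x, sample_rate, f_min, f_max):
    """Estimate f0 in Hz from the highest ACF peak within [f_min, f_max].

    The search range is an assumption of this sketch; it keeps the
    trivial maximum at lag 0 out of the search.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    r = autocorrelation(x, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag


# Synthetic test tone (hypothetical): 200 Hz fundamental plus two harmonics.
SAMPLE_RATE = 8000
tone = [
    math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 600 * n / SAMPLE_RATE)
    for n in range(800)
]
estimate = acf_pitch(tone, SAMPLE_RATE, f_min=50, f_max=500)
```

Because the lag is an integer number of samples, the estimate can only take the values `sample_rate / lag`, which is one way to see why resolution degrades at high frequencies.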
Autocorrelation Pros/Cons
- Simple implementation (good for hardware)
- Can handle poor quality signals (phase insensitive)
- Often requires preprocessing (spectral flattening)
- Poor resolution for high frequencies
- Analysis parameters hard to tune
- Uncertainty between peaks generated by formants and by the periodicity of the sound can lead to wrong estimates
AMDF
- Again based on the idea that a periodic signal will be similar to itself when shifted by the fundamental period
- Similar in concept to ACF, but looks at the difference between the signal and a time-shifted version of itself
- The time shifts which display valleys correspond to likely period estimates
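A minimal sketch of valley picking on the AMDF follows. The valley-selection rule (take the first lag that falls below a relative threshold, then walk down to the bottom of that valley) is an assumption of this example, added so that deeper or equal valleys at multiples of the period are not picked; the names `amdf_pitch` and `depth` are hypothetical.

```python
import math


def amdf(x, max_lag):
    """Average magnitude difference for lag = 0..max_lag.

    A fixed number of terms is averaged for every lag so that the
    values are comparable across lags.
    """
    n = len(x) - max_lag
    return [
        sum(abs(x[i] - x[i + lag]) for i in range(n)) / n
        for lag in range(max_lag + 1)
    ]


def amdf_pitch(x, sample_rate, f_min, f_max, depth=0.1):
    """Estimate f0 in Hz from the first sufficiently deep AMDF valley."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    d = amdf(x, lag_max)
    lo, hi = min(d[lag_min:]), max(d[lag_min:])
    threshold = lo + depth * (hi - lo)  # assumed relative threshold
    # First lag that dips below the threshold...
    lag = next(k for k in range(lag_min, lag_max + 1) if d[k] <= threshold)
    # ...then walk down to the bottom of that valley.
    while lag < lag_max and d[lag + 1] < d[lag]:
        lag += 1
    return sample_rate / lag


# Synthetic test tone (hypothetical): 200 Hz fundamental plus two harmonics.
SAMPLE_RATE = 8000
tone = [
    math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 600 * n / SAMPLE_RATE)
    for n in range(800)
]
estimate = amdf_pitch(tone, SAMPLE_RATE, f_min=50, f_max=500)
```

Only subtractions and absolute values are needed per lag, with no multiplications in the inner loop.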
AMDF Pros/Cons
- Poor frequency resolution
- Even simpler implementation than ACF (good for hardware)
- Less computationally expensive than ACF
- Combination of AMDF and ACF yields a result more robust to noise (Kobayashi 95)
Fundamental Period Measurement approach
- Signal is first run through a bank of half-octave bandpass filters
- If the filters are sharp enough, the output of one filter should display the input waveform freed of its upper partials (nearly sinusoidal)
- It is up to a decision algorithm to decide which filter output corresponds to the fundamental frequency
- Time between zero crossings of that filter output determines the period
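The last step, measuring the period from zero crossings, can be sketched as follows. The filter bank and the decision algorithm are deliberately left out: the example assumes the selected filter output is already available as a nearly sinusoidal signal `y`. The linear interpolation of crossing times and all names are assumptions of this sketch, not details taken from Kuhn's paper.

```python
import math


def rising_zero_crossings(y):
    """Return fractional sample positions of negative-to-positive crossings."""
    times = []
    for i in range(1, len(y)):
        if y[i - 1] < 0.0 <= y[i]:
            # Linear interpolation between the two samples around the crossing.
            times.append((i - 1) + y[i - 1] / (y[i - 1] - y[i]))
    return times


def period_from_zero_crossings(y, sample_rate):
    """Return the average period in seconds, or None if it cannot be measured."""
    t = rising_zero_crossings(y)
    if len(t) < 2:
        return None
    return (t[-1] - t[0]) / (len(t) - 1) / sample_rate


# Stand-in for the chosen filter output (hypothetical): a 220 Hz sinusoid.
SAMPLE_RATE = 8000
filtered = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE + 0.3) for n in range(800)]
period = period_from_zero_crossings(filtered, SAMPLE_RATE)
frequency = 1.0 / period
```

Averaging over all crossings in the frame trades latency for stability; a lower-latency variant would use only the most recent pair of crossings.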
FPM Pros/Cons
- Easy implementation (hardware and software)
- Efficiency of computation
- Decision algorithm highly dependent on thresholds
- But, automatic threshold setting provided for most situations
Cepstrum approach
- Tool often used in speech processing
- Cepstrum is defined as the power spectrum of the logarithm of the power spectrum
- Clearly separates the contributions of the vocal tract and of the excitation
- A strong peak is displayed in the excitation part (high cepstral region) at the quefrency corresponding to the fundamental period
- Use a peak picker on the cepstrum and translate quefrency into fundamental frequency
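The definition above (power spectrum of the log power spectrum, followed by peak picking) can be sketched with a small radix-2 FFT in plain Python. Several details are assumptions of this example rather than part of the method as presented: the Hann window, the relative floor added before the logarithm to avoid log(0), the quefrency search range, and the pulse-train test signal.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def cepstrum_pitch(x, sample_rate, f_min, f_max, floor=1e-3):
    """Estimate f0 in Hz from the largest cepstral peak in the search range."""
    n = len(x)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    power = [abs(v) ** 2 for v in fft([a * w for a, w in zip(x, window)])]
    # Assumed regularization: a floor relative to the spectral maximum.
    eps = floor * max(power)
    log_power = [math.log(p + eps) for p in power]
    # Second transform: power spectrum of the log power spectrum.
    cep = [abs(v) ** 2 for v in fft(log_power)]
    q_min = int(sample_rate / f_max)
    q_max = int(sample_rate / f_min)
    q = max(range(q_min, q_max + 1), key=lambda i: cep[i])
    return sample_rate / q  # quefrency (in samples) translated to frequency


# Harmonically rich test signal (hypothetical): one pulse every 40 samples.
SAMPLE_RATE = 8000
pulses = [1.0 if n % 40 == 0 else 0.0 for n in range(1024)]
estimate = cepstrum_pitch(pulses, SAMPLE_RATE, f_min=50, f_max=500)
```

The two calls to `fft` are the two transforms counted in the computational cost of the method.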
Cepstrum Pros/Cons
- Less confusion between candidates than in ACF
- Proven method, especially suitable for signals easily characterized by source-filter models (e.g. voice)
- Relatively computationally intensive (2 FFTs)
Harmonic Product Spectrum approach
- Measures the maximum coincidence of harmonics for each spectral frame
- Resulting periodic correlation array is searched for a maximum, which should correspond to the fundamental frequency
- An algorithm is then run for octave correction
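A common way to compute the coincidence of harmonics is to multiply the magnitude spectrum by copies of itself compressed by factors 2, 3, and so on. The sketch below follows that formulation. The Hann window, the number of compressed copies `n_harmonics`, and the test tone are assumptions of this example, and the octave-correction step is not included.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def hps_pitch(x, sample_rate, n_harmonics=4):
    """Estimate f0 in Hz as the bin maximizing the harmonic product."""
    n = len(x)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    mag = [abs(v) for v in fft([a * w for a, w in zip(x, window)])]
    # Highest bin whose n_harmonics-th multiple stays below Nyquist.
    k_max = (n // 2 - 1) // n_harmonics

    def product(k):
        p = 1.0
        for r in range(1, n_harmonics + 1):
            p *= mag[r * k]  # spectrum compressed by factor r, read at bin k
        return p

    k_best = max(range(1, k_max + 1), key=product)
    return k_best * sample_rate / n


# Synthetic test tone (hypothetical): 250 Hz with harmonics decaying as 1/h.
SAMPLE_RATE = 8000
tone = [
    sum(math.sin(2 * math.pi * 250 * h * n / SAMPLE_RATE) / h for h in range(1, 6))
    for n in range(1024)
]
estimate = hps_pitch(tone, SAMPLE_RATE)
```

The estimate is quantized to `sample_rate / n` (about 7.8 Hz here), which is coarse for low pitches; zero padding the frame before the FFT is the usual remedy, at extra computational cost.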
HPS Pros/Cons
- Simple to implement
- Does well under a wide variety of conditions
- Poor low frequency resolution
- Computational complexity increased by the zero padding required for interpolation of low frequencies
- Requires post-processing for error correction
Constant-Q transform approach
- First computes the Constant-Q transform to obtain a constant pattern in the log frequency domain (Q = fc/bw)
- Compute the cross-correlation with a fixed comb pattern (ideal partial positions for a given fundamental frequency)
- Peak-pick the result to obtain the fundamental frequency
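The three steps above can be sketched with a direct (slow) evaluation of the constant-Q transform, without the efficient kernel scheme of Brown and Puckette. The parameter values (`f_min`, 12 bins per octave, 60 bins, 5 comb teeth), the Hamming window, and the unweighted comb are assumptions of this example.

```python
import cmath
import math


def constant_q_magnitudes(x, sample_rate, f_min, bins_per_octave, n_bins):
    """Direct constant-Q transform: one windowed DFT term per log-spaced bin."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # Q = fc / bw
    mags = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = int(round(q * sample_rate / f_k))  # window shrinks as f_k grows
        acc = 0j
        for n in range(n_k):
            w = 0.54 - 0.46 * math.cos(2 * math.pi * n / n_k)  # Hamming
            acc += w * x[n] * cmath.exp(-2j * math.pi * q * n / n_k)
        mags.append(abs(acc) / n_k)
    return mags


def constant_q_pitch(x, sample_rate, f_min=100.0, bins_per_octave=12,
                     n_bins=60, n_harmonics=5):
    """Estimate f0 in Hz by correlating the spectrum with a harmonic comb."""
    mags = constant_q_magnitudes(x, sample_rate, f_min, bins_per_octave, n_bins)
    # On a log-frequency axis the harmonic pattern is the same for every f0.
    offsets = [int(round(bins_per_octave * math.log2(h)))
               for h in range(1, n_harmonics + 1)]
    n_candidates = n_bins - offsets[-1]
    scores = [sum(mags[k + off] for off in offsets) for k in range(n_candidates)]
    k_best = max(range(n_candidates), key=lambda k: scores[k])
    return f_min * 2.0 ** (k_best / bins_per_octave)


# Synthetic test tone (hypothetical): 200 Hz with five equal harmonics.
SAMPLE_RATE = 8000
tone = [
    sum(math.sin(2 * math.pi * 200 * h * n / SAMPLE_RATE) for h in range(1, 6))
    for n in range(1400)
]
estimate = constant_q_pitch(tone, SAMPLE_RATE)
```

The input must be at least as long as the longest analysis window, which belongs to the lowest bin (about 1345 samples for these settings). Candidates an octave away still match some comb teeth, so their scores are not negligible.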
Constant-Q Pros/Cons
- Complexity of constant-Q reduced but still… (Brown and Puckette 91)
- Sensitive to octave errors
- Other peaks could be candidates
Least-Squares fitting approach
- Perform least-squares spectral analysis: minimize the error by fitting sinusoids to the signal segment
- Strong sinusoidal components are identified as sharp valleys in the least-squares error signal
- Relatively few evaluations of the error signal are required to identify a valley
- Fundamental frequency is obtained as the average of the partial frequencies divided by their partial numbers
- Uses rectangular windowing to provide faster response
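The error signal described above can be illustrated by fitting a single sinusoid at a trial frequency and measuring the residual energy. This is a simplified sketch, not the evaluation scheme of Choi 97: it scans a full grid of candidate frequencies instead of using only a few evaluations, it fits one sinusoid at a time, and the helper names are hypothetical. The last function reads the slide's averaging rule as the mean of each partial frequency divided by its partial number.

```python
import math


def ls_error(x, sample_rate, freq):
    """Residual energy after the least-squares fit of a*cos + b*sin at freq."""
    w = 2 * math.pi * freq / sample_rate
    c = [math.cos(w * n) for n in range(len(x))]
    s = [math.sin(w * n) for n in range(len(x))]
    scc = sum(v * v for v in c)
    sss = sum(v * v for v in s)
    scs = sum(u * v for u, v in zip(c, s))
    sxc = sum(u * v for u, v in zip(x, c))
    sxs = sum(u * v for u, v in zip(x, s))
    # Solve the 2x2 normal equations for the amplitudes a and b.
    det = scc * sss - scs * scs
    a = (sxc * sss - sxs * scs) / det
    b = (sxs * scc - sxc * scs) / det
    return sum(v * v for v in x) - a * sxc - b * sxs


def strongest_sinusoid(x, sample_rate, candidates):
    """Return the candidate frequency at the deepest valley of the error."""
    return min(candidates, key=lambda f: ls_error(x, sample_rate, f))


def f0_from_partials(partials):
    """partials: list of (partial_number, frequency_hz) pairs."""
    return sum(f / h for h, f in partials) / len(partials)


# Short frame (hypothetical): 64 samples (8 ms) of a 440 Hz sinusoid,
# used as-is, which amounts to a rectangular window.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE + 0.7) for n in range(64)]
found = strongest_sinusoid(frame, SAMPLE_RATE, range(100, 1001))
f0 = f0_from_partials([(1, 220.0), (2, 440.0), (3, 660.0)])
```

The frame covers only about 3.5 periods of the tone, yet the valley sits on the 1 Hz grid point of the true frequency, which illustrates why the approach suits short frames.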
LS fitting Pros/Cons
- Operates on shorter frame segments
- Best option for real-time applications with minimum latency requirements
- Efficient evaluation scheme allows reasonable computational complexity
Maximum Likelihood
- Maximum likelihood algorithm searches through a set of possible ideal spectra and chooses the closest match (Noll 69)
- Was adapted to sinusoidal modeling theory, by finding the best fit of harmonic partial sets to the measured model (McAulay 86)
- Discrimination is enhanced by suppressing partials of small amplitude
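The idea of searching a set of ideal spectra for the closest match can be illustrated with a simple harmonic template match on a list of measured partials. This is not the likelihood function of McAulay or Puckette: the cosine-style score, the Gaussian tolerance `sigma`, the amplitude threshold `min_amp`, and all function names are assumptions made for this sketch.

```python
import math


def template_match(peaks, f0, f_max, sigma=0.01):
    """Similarity between measured (freq, amp) peaks and an ideal comb at f0.

    The ideal spectrum has unit-amplitude harmonics of f0 up to f_max.
    """
    n_harm = int(f_max / f0)
    if n_harm == 0 or not peaks:
        return 0.0
    dot = 0.0
    for h in range(1, n_harm + 1):
        target = h * f0
        # Nearest measured peak to this ideal harmonic position.
        freq, amp = min(peaks, key=lambda p: abs(p[0] - target))
        rel_dev = (freq - target) / target
        dot += amp * math.exp(-(rel_dev / sigma) ** 2)
    norm_measured = math.sqrt(sum(a * a for _, a in peaks))
    # Dividing by the template norm penalizes combs with unmatched harmonics.
    return dot / (math.sqrt(n_harm) * norm_measured)


def best_f0(peaks, candidates, f_max, min_amp=0.0):
    """Return the candidate f0 whose ideal spectrum matches the peaks best."""
    kept = [(f, a) for f, a in peaks if a >= min_amp]  # drop small partials
    return max(candidates, key=lambda f0: template_match(kept, f0, f_max))


# Measured partials (hypothetical values), with and without the fundamental.
full = [(220.0, 1.0), (440.0, 0.8), (660.0, 0.6), (880.0, 0.4)]
missing_fundamental = full[1:]
f0_full = best_f0(full, range(100, 501), f_max=1000.0)
f0_missing = best_f0(missing_fundamental, range(100, 501), f_max=1000.0)
```

In this toy case the 220 Hz candidate still wins when the fundamental itself is absent, although the margin over the 440 Hz candidate is small.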
ML Pros/Cons
- Inherits high computational requirements from sinusoidal modeling
- Very robust estimation
- Allows a guess of the fundamental frequency even with several partials missing
Other approaches
- Neural Nets (Barnar 91)
- Hidden Markov Models (Doval 91)
- Parallel processing approaches (Rabiner 69)
- Fourier of Fourier transforms (Marchand 2001)
- Two-way mismatch model (Cano 98)
- Subharmonic to harmonic ratio (Sun 2000)
Conclusions
- Lot of research still… Motivated by speech telecommunication
- Abundant literature since 1950
- Complete and objective performance overviews seem to be missing
- Combination of techniques in parallel processing seems foreseeable with today's fast computers