Overview of Real-Time Pitch Tracking Approaches Music information retrieval seminar McGill University Francois Thibault.

Presentation Goals
- Describe the requirements of a real-time (RT) pitch tracking algorithm for musical applications
- Briefly introduce key developments in RT pitch tracking algorithms
- Provide insight into which techniques might be more suitable for a given application

Pitch tracking requirements in musical context
- Must often function in real time
- Minimal output latency
- Accuracy in the presence of noise
- Frequency resolution
- Flexibility and adaptability to various musical requirements:
  - Pitch range
  - Dynamic range
  - …

Overview of techniques
- Time-domain methods
  - Autocorrelation Function (Rabiner 77)
  - Average Magnitude Difference Function (AMDF)
  - Fundamental Period Measurement (Kuhn 90)
- Frequency-domain methods
  - Cepstrum (Noll 66)
  - Harmonic Product Spectrum (Schroeder 68)
  - Constant-Q transform (Brown 92)
  - Least-Squares fitting (Choi 97)
  - Maximum Likelihood (McAulay 86, Puckette 98)
- Other approaches…

Autocorrelation method
- Based on the fact that a periodic signal correlates strongly with itself when offset by the fundamental period
- Measures the extent to which a signal correlates with a time-shifted version of itself
- The time shifts at which the ACF displays peaks correspond to likely period estimates
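The autocorrelation idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not the formulation of Rabiner 77: the function names, the 50 to 1000 Hz search range, and the synthetic test tone are all assumptions made for the example.

```python
import math

def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n + lag], for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def acf_pitch(x, sample_rate, f_min=50.0, f_max=1000.0):
    """Return sample_rate / lag for the lag with the largest ACF value."""
    lag_min = int(sample_rate / f_max)   # shortest period searched
    lag_max = int(sample_rate / f_min)   # longest period searched
    r = autocorrelation(x, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag

# Synthetic test tone (assumption): 200 Hz fundamental plus a weaker octave.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
        for n in range(1024)]
acf_estimate = acf_pitch(tone, SAMPLE_RATE)
print(round(acf_estimate, 1))
```

Because the lag is an integer number of samples, neighbouring lags map to widely spaced frequencies at the top of the range (at 8 kHz, lags 8 and 9 correspond to 1000 Hz and about 889 Hz), which is one way to see why resolution degrades for high pitches.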

Autocorrelation Pros/Cons
- Simple implementation (good for hardware)
- Can handle poor quality signals (phase insensitive)
- Often requires preprocessing (spectral flattening)
- Poor resolution for high frequencies
- Analysis parameters hard to tune
- Uncertainty between peaks generated by formants and peaks generated by the periodicity of the sound can lead to wrong estimates

AMDF
- Again based on the idea that a periodic signal will be similar to itself when shifted by the fundamental period
- Similar in concept to the ACF, but looks at the difference between the signal and a time-shifted version of itself
- The time shifts which display valleys correspond to likely period estimates
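A minimal sketch of the AMDF valley search follows, under stated assumptions. The function names, search range, and test tone are illustrative. The `tolerance` rule (take the shortest lag whose value is close to the deepest valley) is an assumption added here so that multiples of the true period, which produce equally deep valleys, are not picked by accident.

```python
import math

def amdf(x, max_lag):
    """d[lag] = mean of |x[n] - x[n + lag]| over the overlapping samples."""
    n = len(x)
    return [sum(abs(x[i] - x[i + lag]) for i in range(n - lag)) / (n - lag)
            for lag in range(max_lag + 1)]

def amdf_pitch(x, sample_rate, f_min=50.0, f_max=1000.0, tolerance=0.05):
    """Return sample_rate / lag for the shortest lag sitting in a deep valley."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    d = amdf(x, lag_max)
    candidates = range(lag_min, lag_max + 1)
    floor = min(d[lag] for lag in candidates)
    mean_d = sum(d[lag] for lag in candidates) / len(candidates)
    # Assumed tie-break: accept anything within a small margin of the minimum.
    threshold = floor + tolerance * mean_d
    best_lag = next(lag for lag in candidates if d[lag] <= threshold)
    return sample_rate / best_lag

# Synthetic test tone (assumption): 200 Hz fundamental plus a weaker octave.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
        for n in range(1024)]
amdf_estimate = amdf_pitch(tone, SAMPLE_RATE)
print(round(amdf_estimate, 1))
```

The inner loop uses only subtraction and absolute value, with no multiplications, which is the usual reason the AMDF is described as cheaper than the ACF.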

AMDF Pros/Cons
- Poor frequency resolution
- Even simpler implementation than ACF (good for hardware)
- Less computationally expensive than ACF
- Combining AMDF and ACF yields a result that is more robust to noise (Kobayashi 95)

Fundamental Period Measurement approach
- The signal is first run through a bank of half-octave bandpass filters
- If the filters are sharp enough, the output of one filter should display the input waveform freed of its upper partials (nearly sinusoidal)
- It is up to a decision algorithm to decide which filter output corresponds to the fundamental frequency
- The time between zero crossings of that filter output determines the period
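The last step, measuring the period from zero crossings, can be sketched as below. The filter bank and the decision algorithm are deliberately left out: the example assumes the chosen filter output is already available and nearly sinusoidal, and stands in for it with a synthetic sine. The function name and the linear interpolation of the crossing instant are assumptions, not details taken from Kuhn 90.

```python
import math

def period_from_zero_crossings(y, sample_rate):
    """Average time (in seconds) between successive upward zero crossings."""
    crossings = []
    for n in range(1, len(y)):
        if y[n - 1] < 0.0 <= y[n]:
            # Linear interpolation between the two samples around the crossing.
            frac = -y[n - 1] / (y[n] - y[n - 1])
            crossings.append((n - 1 + frac) / sample_rate)
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    return (crossings[-1] - crossings[0]) / (len(crossings) - 1)

# Stand-in for one filter output (assumption): a clean 220 Hz sinusoid.
SAMPLE_RATE = 8000
filter_output = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE + 0.3)
                 for n in range(800)]
period = period_from_zero_crossings(filter_output, SAMPLE_RATE)
fpm_estimate = 1.0 / period
print(round(fpm_estimate, 2))
```

Counting only upward crossings measures whole periods directly; using every crossing would measure half periods and would be more sensitive to any DC offset left in the filter output.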

FPM Pros/Cons
- Easy implementation (hardware and software)
- Computationally efficient
- Decision algorithm highly dependent on thresholds
- However, automatic threshold setting is provided for most situations

Cepstrum approach
- Tool often used in speech processing
- The cepstrum is defined as the power spectrum of the logarithm of the power spectrum
- Clearly separates the contributions of the vocal tract and the excitation
- A strong peak is displayed in the excitation part (high cepstral region) at the quefrency corresponding to the fundamental period
- Use a peak picker on the cepstrum and translate the quefrency into a fundamental frequency
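A small sketch of the cepstral peak pick, following the definition on the slide (power spectrum of the log power spectrum). To stay dependency-free it uses a direct O(N²) DFT rather than an FFT, and it omits the windowing a real analysis frame would normally get. The function names, the search range, and the decaying pulse-train test frame are assumptions made for the example.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT; adequate for a short demonstration frame."""
    n = len(x)
    twiddle = [cmath.exp(-2j * cmath.pi * m / n) for m in range(n)]
    return [sum(x[t] * twiddle[(k * t) % n] for t in range(n))
            for k in range(n)]

def power_cepstrum(x, floor=1e-12):
    """Power spectrum of the log power spectrum; index = quefrency in samples."""
    log_power = [math.log(abs(v) ** 2 + floor) for v in dft(x)]
    return [abs(v) ** 2 for v in dft(log_power)]

def cepstrum_pitch(x, sample_rate, f_min=50.0, f_max=1000.0):
    """Pick the largest cepstral peak in the allowed quefrency range."""
    q_min = int(sample_rate / f_max)
    q_max = int(sample_rate / f_min)
    cep = power_cepstrum(x)
    best_q = max(range(q_min, q_max + 1), key=lambda q: cep[q])
    return sample_rate / best_q   # quefrency (samples) -> frequency (Hz)

# Test frame (assumption): decaying pulse train with a 40-sample period.
SAMPLE_RATE = 8000
PERIOD = 40
frame = [0.8 ** (n // PERIOD) if n % PERIOD == 0 else 0.0 for n in range(512)]
cepstrum_estimate = cepstrum_pitch(frame, SAMPLE_RATE)
print(cepstrum_estimate)
```

The two calls to `dft` are the two transforms the method needs per frame; with an FFT in their place the cost per frame drops from O(N²) to O(N log N).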

Cepstrum Pros/Cons
- Less confusion between candidates than in ACF
- Proven method, especially suitable for signals easily characterized by source-filter models (e.g. voice)
- Relatively computationally intensive (2 FFTs)

Harmonic Product Spectrum approach
- Measures the maximum coincidence of harmonics for each spectral frame
- The resulting periodic correlation array is searched for its maximum, which should correspond to the fundamental frequency
- An algorithm is then run for octave correction
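One common way to compute a harmonic product spectrum is to multiply the magnitude spectrum by copies of itself compressed by integer factors; the sketch below does that with a direct DFT and no zero padding or octave correction. The function names, the choice of three harmonics, and the test signal (six harmonics of 200 Hz with a deliberately weak fundamental) are assumptions for illustration.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2, computed directly in O(N^2)."""
    n = len(x)
    twiddle = [cmath.exp(-2j * cmath.pi * m / n) for m in range(n)]
    return [abs(sum(x[t] * twiddle[(k * t) % n] for t in range(n)))
            for k in range(n // 2 + 1)]

def hps_pitch(x, sample_rate, num_harmonics=3):
    """Multiply the spectrum with its compressed copies and pick the peak."""
    mag = magnitude_spectrum(x)
    limit = (len(mag) - 1) // num_harmonics   # highest bin with all harmonics in range
    product = [1.0] * (limit + 1)
    for h in range(1, num_harmonics + 1):
        for k in range(limit + 1):
            product[k] *= mag[h * k]
    best_bin = max(range(1, limit + 1), key=lambda k: product[k])
    return best_bin * sample_rate / len(x)

# Test signal (assumption): harmonics of 200 Hz, second harmonic strongest.
SAMPLE_RATE = 8000
AMPLITUDES = [0.5, 1.0, 0.8, 0.6, 0.4, 0.2]
signal = [sum(a * math.sin(2 * math.pi * 200 * (h + 1) * n / SAMPLE_RATE)
              for h, a in enumerate(AMPLITUDES))
          for n in range(400)]
hps_estimate = hps_pitch(signal, SAMPLE_RATE)
print(hps_estimate)
```

In this test signal the largest single spectral peak is the second harmonic at 400 Hz, so picking the raw spectral maximum would be an octave too high; the product favours the bin whose multiples all carry energy. The 400-sample frame gives 20 Hz bins, which illustrates the coarse low-frequency resolution noted for this method.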

HPS Pros/Cons
- Simple to implement
- Does well under a wide variety of conditions
- Poor low-frequency resolution
- Computational cost is increased by the zero padding required to interpolate low frequencies
- Requires post-processing for error correction

Constant-Q transform approach
- First computes the Constant-Q transform to obtain a constant pattern in the log-frequency domain (Q = fc/bw)
- Computes the cross-correlation with a fixed comb pattern (ideal partial positions for a given fundamental frequency)
- Peak-picks the result to obtain the fundamental frequency
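The reason a fixed comb works is that, on a log-frequency axis, harmonic h always sits log2(h) octaves above the fundamental regardless of pitch. The sketch below illustrates this and the comb cross-correlation on a toy vector. It does not compute a constant-Q transform: the magnitudes are hand-placed, and the 24 bins per octave, 55 Hz lowest bin, function names, and partial amplitudes are all assumptions.

```python
import math

BINS_PER_OCTAVE = 24   # assumption: quarter-tone spacing
F_MIN = 55.0           # assumption: centre frequency of bin 0, in Hz

def constant_q_factor(bins_per_octave):
    """Q = fc / bw when each bandwidth equals the gap to the next centre."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

def harmonic_offsets(num_harmonics, bins_per_octave):
    """Bin offset of each harmonic above the fundamental (same for any pitch)."""
    return [round(bins_per_octave * math.log2(h))
            for h in range(1, num_harmonics + 1)]

def comb_correlation_peak(cq_magnitudes, offsets):
    """Slide the comb along the log-frequency axis and return the best bin."""
    def score(k):
        return sum(cq_magnitudes[k + o] for o in offsets
                   if k + o < len(cq_magnitudes))
    return max(range(len(cq_magnitudes)), key=score)

offsets = harmonic_offsets(5, BINS_PER_OCTAVE)

# Toy constant-Q magnitudes (assumption): five partials of a note at bin 30.
cq_magnitudes = [0.0] * 120
for offset, amplitude in zip(offsets, [1.0, 0.8, 0.6, 0.4, 0.3]):
    cq_magnitudes[30 + offset] = amplitude

peak_bin = comb_correlation_peak(cq_magnitudes, offsets)
cq_estimate = F_MIN * 2.0 ** (peak_bin / BINS_PER_OCTAVE)
print(peak_bin, round(cq_estimate, 1), round(constant_q_factor(BINS_PER_OCTAVE), 1))
```

Shifting the comb by one octave (24 bins here) still lines it up with some of the partials, which is a simple way to see where octave errors in this approach come from.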

Constant-Q Pros/Cons
- Complexity of constant-Q reduced but still… (Brown and Puckette 91)
- Sensitive to octave errors
- Other peaks could be candidates

Least-Squares fitting approach
- Performs least-squares spectral analysis: minimizes the error by fitting sinusoids to the signal segment
- Strong sinusoidal components are identified as sharp valleys in the least-squares error signal
- Relatively few evaluations of the error signal are required to identify a valley
- The fundamental frequency is obtained as the average of the partial frequencies, each divided by its partial number
- Uses rectangular windowing to provide a faster response
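The two ingredients above, an error valley and the averaging of partials, can be sketched as follows. This is a simplified illustration and not the efficient evaluation scheme of Choi 97: it fits a single sinusoid per candidate frequency, scans a brute-force 1 Hz grid, and uses hypothetical partial frequencies in the averaging example. All names and test values are assumptions.

```python
import math

def ls_error(x, freq, sample_rate):
    """Squared error left after the best fit of a*cos + b*sin at freq."""
    w = 2.0 * math.pi * freq / sample_rate
    c = [math.cos(w * n) for n in range(len(x))]
    s = [math.sin(w * n) for n in range(len(x))]
    cc = sum(v * v for v in c)
    ss = sum(v * v for v in s)
    cs = sum(u * v for u, v in zip(c, s))
    xc = sum(u * v for u, v in zip(x, c))
    xs = sum(u * v for u, v in zip(x, s))
    # Solve the 2x2 normal equations for the amplitudes a and b.
    det = cc * ss - cs * cs
    a = (xc * ss - xs * cs) / det
    b = (xs * cc - xc * cs) / det
    return sum((xv - a * cv - b * sv) ** 2 for xv, cv, sv in zip(x, c, s))

def deepest_valley(x, sample_rate, candidates):
    """Candidate frequency with the smallest least-squares error."""
    return min(candidates, key=lambda f: ls_error(x, f, sample_rate))

def fundamental_from_partials(partials):
    """Average of frequency / partial_number over (number, frequency) pairs."""
    return sum(f / k for k, f in partials) / len(partials)

# Short rectangular-windowed segment (assumption): 64 samples of a 440 Hz sine.
SAMPLE_RATE = 8000
segment = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE + 0.4) for n in range(64)]
grid = [float(f) for f in range(300, 601)]
valley = deepest_valley(segment, SAMPLE_RATE, grid)

# Hypothetical measured partials: (partial number, frequency in Hz).
ls_estimate = fundamental_from_partials([(1, 440.0), (2, 881.0), (3, 1318.0)])
print(valley, round(ls_estimate, 2))
```

The segment is only 8 ms long, shorter than the frames used in the spectral examples, which reflects the latency argument made for this method.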

LS fitting Pros/Cons
- Operates on shorter frame segments
- Best option for real-time applications with minimum latency requirements
- Efficient evaluation scheme allows reasonable computational complexity

Maximum Likelihood
- The maximum likelihood algorithm searches through a set of possible ideal spectra and chooses the closest match (Noll 69)
- Was adapted to sinusoidal modeling theory by finding the best fit of harmonic partial sets to the measured model (McAulay 86)
- Discrimination is enhanced by suppressing partials of small amplitude
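A toy version of the "closest ideal spectrum" search is sketched below. It is a heavy simplification and not the likelihood function of McAulay 86 or Puckette 98: the ideal spectra are assumed to be unit-height combs on a bin grid, and "closest" is taken to mean smallest squared error, which coincides with the maximum-likelihood choice only under an additive white Gaussian noise assumption. All names and the measured spectrum are hypothetical.

```python
def ideal_spectrum(f0_bin, num_bins):
    """Hypothetical ideal spectrum: unit peaks at every multiple of f0_bin."""
    return [1.0 if k > 0 and k % f0_bin == 0 else 0.0 for k in range(num_bins)]

def squared_error(measured, f0_bin):
    """Squared distance between the measured spectrum and one ideal spectrum."""
    template = ideal_spectrum(f0_bin, len(measured))
    return sum((m - t) ** 2 for m, t in zip(measured, template))

def closest_ideal_spectrum(measured, candidates):
    """Candidate fundamental (in bins) whose ideal spectrum matches best."""
    return min(candidates, key=lambda f0_bin: squared_error(measured, f0_bin))

# Toy measured spectrum (assumption): partials 1, 2, 4 and 5 of a
# fundamental at bin 10; the third partial (bin 30) is missing.
measured = [0.0] * 56
for peak_bin in (10, 20, 40, 50):
    measured[peak_bin] = 1.0

ml_bin = closest_ideal_spectrum(measured, range(5, 28))
print(ml_bin)
```

Each candidate is penalized both for predicted harmonics that are absent and for measured peaks it cannot explain, so the true fundamental still wins with one partial missing, while the octave-up candidate at bin 20 loses because it leaves two measured peaks unexplained.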

ML Pros/Cons
- Inherits high computational requirements from sinusoidal modeling
- Very robust estimation
- Allows a guess of the fundamental frequency even with several partials missing

Other approaches
- Neural Nets (Barnard 91)
- Hidden Markov Models (Doval 91)
- Parallel processing approaches (Rabiner 69)
- Fourier of Fourier transforms (Marchand 2001)
- Two-way mismatch model (Cano 98)
- Subharmonic to harmonic ratio (Sun 2000)

Conclusions
- A lot of research still… motivated by speech telecommunication
- Abundant literature since 1950
- Complete and objective performance overviews seem to be missing
- Combining techniques through parallel processing seems feasible with today’s fast computers