GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval
Pitch Detection and Tracking
Juhan Nam


Introduction
Music is described with what?
– The majority of musical symbols are notes, which mainly contain pitch information
– We (our brains) usually memorize music as a melody, that is, a sequence of pitches

Outline
Introduction
– Definition of Pitch
– Information in Pitch
– Pitch and Harmonicity
Pitch Detection Algorithms
– Time-Domain Approaches
– Frequency-Domain Approaches
– Psychoacoustic Model Approaches
– Learning-based Approaches
Pitch Tracking
Applications

Definition of Pitch
Pitch
– Defined as the auditory attribute of sound according to which sounds can be ordered on a scale from low to high (ANSI, 1994)
– One way of measuring pitch is finding the frequency of a sine wave that is matched to the target sound in a psychophysical experiment
– Thus, pitch is subject to the individual listener: e.g., the tone-deaf
Fundamental Frequency
– A physical attribute of sound, measured from periodicity
– Often called F0
Thus, pitch should be distinguished from F0
– However, the two are very close for the sounds of interest here (i.e., musical sounds), so "pitch" is often used interchangeably with "F0"

Information in Pitch
Music
– Melody or notes
– Harmony (when there are multiple notes with different pitches)
– Size (or register) of musical instruments: bass, cello, violin
Speech
– Person: gender, age, identity
– Context: question, mood, attitude
– Meaning: tonal languages such as Chinese (Mandarin)
Others
– Vocalization of animals (e.g., birdsong, whale calls): size and species, communication

Pitch and Harmonicity
Not all sounds have pitch
Harmonic sounds
– Regularly spaced harmonic partials
– Speech or singing voice: vowels
– Musical instruments: piano*, guitar, strings, woodwinds, brass, organ
Non-harmonic sounds
– No harmonic pattern, or irregular partials
– Speech or singing voice: consonants
– Musical instruments: drums, mallet percussion such as the vibraphone (has pitch but is not harmonic)
*Inharmonicity in the piano
[From Klapuri's slides]

Pitch Detection Algorithms
Taxonomy of algorithms
– Time-Domain Approaches
– Frequency-Domain Approaches
– Psychoacoustic Model Approaches
– Learning-based Approaches

Time-Domain Approach
Basic ideas
– Periodicity: x(t) = x(t+T)
– Measure the similarity (or distance) between two segments
– Find the period T that gives the smallest distance
Two main approaches
– Auto-correlation function (ACF): distance by inner product
– Average magnitude difference function (AMDF): distance by difference (e.g., L1 or L2 norm)

Auto-Correlation Function (ACF)
Measuring self-similarity by

    r(τ) = Σ_n x(n) x(n+τ)

(Figure: ACF of a singing voice; Sondhi, 1967)

Auto-Correlation Function (ACF)
Biased auto-correlation:

    r(τ) = (1/N) Σ_{n=0..N−1−τ} x(n) x(n+τ)

Unbiased auto-correlation:

    r(τ) = (1/(N−τ)) Σ_{n=0..N−1−τ} x(n) x(n+τ)
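As a concrete illustration, here is a minimal NumPy sketch of ACF-based pitch detection (the function name, parameters, and test tone are mine, not from the slides):

```python
import numpy as np

def acf_pitch(x, sr, fmin=50.0, fmax=1000.0, biased=True):
    """Estimate F0 by picking the largest autocorrelation peak in a lag range."""
    n = len(x)
    r = np.correlate(x, x, mode="full")[n - 1:]      # r[tau] for tau = 0..n-1
    if biased:
        r = r / n                                    # biased: divide by N
    else:
        r = r / (n - np.arange(n))                   # unbiased: divide by N - tau
    lo = int(sr / fmax)                              # shortest lag to consider
    hi = min(int(sr / fmin), n - 1)                  # longest lag to consider
    tau = lo + int(np.argmax(r[lo:hi]))
    return sr / tau

# A 220 Hz sine at 16 kHz: the strongest peak should sit near lag sr/220 ~ 73
sr = 16000
t = np.arange(2048) / sr
f0 = acf_pitch(np.sin(2 * np.pi * 220 * t), sr)
```

Restricting the lag search to [sr/fmax, sr/fmin] sidesteps the zero-lag peak discussed on the next slides.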

Comparison of Spectrogram and ACF
(Figure: spectrogram vs. ACF of the same signal, each tracking the maximum values)

Interpretation of ACF in the Frequency Domain
By the convolution theorem, the auto-correlation can be computed in the frequency domain, efficiently via the FFT. With X(k) = FFT(x(n)), the ACF is

    r(τ) = IFFT( |X(k)|² )
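This identity is easy to verify numerically; a short sketch (variable names are mine) that zero-pads to 2N so the FFT's circular correlation matches the linear ACF:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
n = len(x)
X = np.fft.rfft(x, 2 * n)                        # zero-pad to avoid circular wrap-around
r_fft = np.fft.irfft(X * np.conj(X))[:n]         # ACF = IFFT of the power spectrum
r_direct = np.correlate(x, x, mode="full")[n - 1:]
ok = np.allclose(r_fft, r_direct)
```

The FFT route costs O(N log N) instead of the O(N²) of the direct lag-by-lag computation.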

Interpretation of ACF in the Frequency Domain
This is equivalent to

    r(τ) = Σ_k |X(k)|² cos(2πkτ/N)

so the ACF is a simple template-based approach in the frequency domain
– Positive weights for (harmonic) peaks and negative weights for valleys of the cosine template

Problems with the ACF
– Bias toward the large peak around zero lag
– Not robust to octave errors, particularly lower octaves: the ACF is sensitive to amplitude changes
– Equal weights for all harmonic partials: in general, the low-numbered harmonic partials are more important in determining pitch

Average Magnitude Difference Function (AMDF)
Measuring self-similarity by

    d(τ) = Σ_n |x(n) − x(n+τ)|^p

In YIN, p is set to 2, and the AMDF is normalized as

    d′(0) = 1,   d′(τ) = d(τ) / [ (1/τ) Σ_{j=1..τ} d(j) ]

With p = 2, minimizing d(τ) amounts to minimizing the negative ACF plus a lag-dependent energy term (de Cheveigné & Kawahara, 2002)
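The two formulas above can be sketched directly in NumPy. This is a simplified YIN-style estimator (my own simplification: no parabolic interpolation, and a plain walk down to the dip's minimum after the threshold crossing):

```python
import numpy as np

def yin_f0(x, sr, fmin=50.0, fmax=1000.0, threshold=0.1):
    """YIN-style F0: squared-difference function with cumulative-mean
    normalization, taking the first lag whose dip falls below a fixed threshold."""
    n = len(x) // 2
    tau_max = min(int(sr / fmin), n)
    d = np.zeros(tau_max)
    for tau in range(1, tau_max):
        diff = x[:n] - x[tau:tau + n]
        d[tau] = np.dot(diff, diff)                  # AMDF with p = 2
    dn = np.ones(tau_max)                            # d'(0) = 1 by definition
    cum = np.cumsum(d[1:])
    dn[1:] = d[1:] * np.arange(1, tau_max) / np.maximum(cum, 1e-12)
    lo = int(sr / fmax)
    below = np.where(dn[lo:] < threshold)[0]
    if below.size:
        tau = lo + int(below[0])
        while tau + 1 < tau_max and dn[tau + 1] < dn[tau]:
            tau += 1                                 # walk down to the dip's minimum
    else:
        tau = lo + int(np.argmin(dn[lo:]))
    return sr / tau

sr = 16000
t = np.arange(2048) / sr
f0 = yin_f0(np.sin(2 * np.pi * 220 * t), sr)
```

The normalization makes dn start at 1 for small lags, which is what removes the zero-lag bias and makes a fixed threshold usable.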

Average Magnitude Difference Function (AMDF)
(Figure: AMDF and normalized AMDF)

Why YIN (AMDF) Works Better
– Robust to changes in amplitude: the difference takes care of amplitude changes, which reduces octave errors
– The zero-lag bias is avoided by the normalized AMDF
– The normalized AMDF allows using a fixed threshold, so multiple candidates can be chosen and peaks refined

Example of AMDF (YIN)

Frequency-Domain Approach
Basic ideas
– Periodic in the time domain → harmonic in the frequency domain
– Measure how harmonic the spectrum is
– Find the F0 that best explains the harmonic pattern (harmonic partials)
Methods
– Template matching: harmonic sieve or spectral template
– Cepstrum
– Harmonic Product Spectrum (HPS)

Harmonic Sieves (or Comb Filtering)
Using sharp harmonic sieves to take only the peak regions
– The ACF is similar in spirit, but its implicit template is not sharp enough
sigmund~ (Pd) and fiddle~ (Max/MSP) are based on weighted harmonic sieves (Puckette et al., 1998)
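A crude sieve can be sketched as scoring each candidate F0 by summing spectral magnitude at its harmonic bins; the 1/m weighting and 1 Hz candidate grid are my own choices, and real sieves such as fiddle~'s also window and threshold each harmonic:

```python
import numpy as np

def sieve_f0(x, sr, fmin=80.0, fmax=800.0, n_harm=8):
    """Score candidate F0s by the (1/m-weighted) spectral magnitude at their
    harmonic bins; the weighting suppresses sub-octave candidates."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    m = np.arange(1, n_harm + 1)
    best_f0, best_score = 0.0, -1.0
    for f0 in np.arange(fmin, fmax, 1.0):
        bins = np.round(m * f0 * len(x) / sr).astype(int)
        valid = bins < len(mag)
        score = float(np.sum(mag[bins[valid]] / m[valid]))
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0

# Harmonic tone at 220 Hz (harmonics 1-4 with 1/m amplitudes)
sr = 16000
t = np.arange(4096) / sr
x = sum(np.sin(2 * np.pi * 220 * m_ * t) / m_ for m_ in range(1, 5))
f0 = sieve_f0(x, sr)
```

Without the decreasing weights, the candidate an octave below the true F0 would collect the same peaks and tie with it.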

Spectral Template
Cross-correlation with an ideal harmonic template on a log-frequency spectrogram
[From Ellis' E4896 course slides]

Cepstrum
The real cepstrum is defined as

    c(n) = IFFT( log |X(k)| )

Basic ideas
– Harmonic partials are periodic in the frequency domain
– The (inverse) FFT finds this periodicity; liftering selects the quefrency range of interest
(Noll, 1967)
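The definition translates almost line for line into NumPy. A minimal sketch (the search band and small log floor are my own choices; the band is kept narrow to avoid picking a sub-octave rahmonic):

```python
import numpy as np

def cepstrum_f0(x, sr, fmin=100.0, fmax=400.0):
    """Real cepstrum c(n) = IFFT(log|X(k)|); a harmonic spectrum produces a
    peak at the quefrency sr/F0 samples."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    ceps = np.fft.irfft(np.log(spec + 1e-10))    # small floor avoids log(0)
    lo, hi = int(sr / fmax), int(sr / fmin)      # lifter: keep this quefrency band
    q = lo + int(np.argmax(ceps[lo:hi]))         # quefrency in samples
    return sr / q

# Harmonic tone at 200 Hz (six harmonics) -> quefrency peak near 16000/200 = 80
sr = 16000
t = np.arange(4096) / sr
x = sum(np.sin(2 * np.pi * 200 * m * t) / m for m in range(1, 7))
f0 = cepstrum_f0(x, sr)
```

Restricting the argmax to a quefrency band is exactly the liftering step named on the slide.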

Harmonic Product Spectrum (HPS)
The Harmonic Product Spectrum is obtained by multiplying the original magnitude spectrum with copies of itself decimated by integer factors:

    P(k) = Π_{m=1..M} |X(mk)|

(Noll, 1969)
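Decimating by m maps the m-th harmonic back onto the F0 bin, so the product piles up there. A short sketch (function name and defaults are mine):

```python
import numpy as np

def hps_f0(x, sr, n_down=4, fmin=50.0):
    """Harmonic Product Spectrum: multiply the magnitude spectrum with
    versions of itself decimated by 2..n_down; harmonics align at the F0 bin."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    hps = mag.copy()
    for m in range(2, n_down + 1):
        dec = mag[::m]                           # keep every m-th bin
        hps[:len(dec)] *= dec
    lo = int(fmin * len(x) / sr)                 # skip DC / very low bins
    hi = len(mag) // n_down                      # only bins where all factors exist
    k = lo + int(np.argmax(hps[lo:hi]))
    return k * sr / len(x)

# Harmonic tone at 220 Hz (harmonics 1-5)
sr = 16000
t = np.arange(4096) / sr
x = sum(np.sin(2 * np.pi * 220 * m * t) / m for m in range(1, 6))
f0 = hps_f0(x, sr)
```

Because the product at a sub-octave candidate includes near-zero bins between harmonics, HPS is naturally resistant to lower-octave errors.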

Auditory Model
Correlogram
– Formed by concatenating the ACFs of the individual hair-cell (HC) outputs: input → cochlear filterbank → HC → per-channel ACF
Summary ACF
– Computed by summing the ACF across all channels
– The peaks in the summary ACF represent periodicity features
– Known to be robust to band-limited noise
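A toy version of this pipeline can be sketched with SciPy; here a geometric bank of Butterworth band-pass filters stands in for the cochlea and half-wave rectification for the hair cells (all parameters are illustrative choices of mine, not a real gammatone model):

```python
import numpy as np
from scipy.signal import butter, lfilter

def summary_acf_f0(x, sr, fmin=80.0, fmax=400.0, n_ch=8):
    """Toy correlogram: band-pass channels -> half-wave rectification ->
    per-channel ACF -> summary ACF; the main summary peak gives the period."""
    edges = np.geomspace(100.0, 4000.0, n_ch + 1)
    summary = np.zeros(len(x))
    for lo_f, hi_f in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo_f / (sr / 2), hi_f / (sr / 2)], btype="band")
        ch = np.maximum(lfilter(b, a, x), 0.0)   # half-wave rectification (HC)
        ch = ch - ch.mean()                      # remove DC before the ACF
        summary += np.correlate(ch, ch, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    tau = lo + int(np.argmax(summary[lo:hi]))
    return sr / tau

# Harmonic tone at 200 Hz spread across the channels
sr = 16000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 200 * m * t) / m for m in range(1, 13))
f0 = summary_acf_f0(x, sr)
```

Even channels holding only unresolved high harmonics contribute, because their rectified envelopes beat at the fundamental period.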

Example of Auditory Model
(Figure: correlogram and summary ACF)

Pitch Tracking
Pitch is usually continuous over time
– Once a pitch with strong harmonicity is detected in a frame, the following frames form a smooth pitch contour
Pitch tracking methods
– Post-processing: first detect pitch in a frame-by-frame manner, then find a continuous path by smoothing
   - Median filtering
   - Dynamic programming (Talkin, 1995)
– Probabilistic approach: detect multiple pitch candidates in every frame and find the best path
   - Viterbi decoding: probabilistic YIN (Mauch & Dixon, 2014)
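The median-filtering idea is simple enough to show directly; the frame values below are made up for illustration (two isolated octave errors at frames 2 and 5):

```python
import numpy as np

# A frame-wise F0 track with two isolated octave errors (frames 2 and 5)
track = np.array([220.0, 221.0, 440.0, 219.0, 220.0, 110.0, 222.0])

# 3-point median filter over interior frames; edges are kept as-is
smooth = track.copy()
for i in range(1, len(track) - 1):
    smooth[i] = np.median(track[i - 1:i + 2])
```

A single-frame jump can never be the median of its 3-frame window, so isolated octave errors are replaced by a neighboring value while genuine sustained pitch changes pass through.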

Issues and Challenges
Voice activity detection (VAD) / singing voice detection
– Discriminate voiced/unvoiced/silent frames
Latency: real-time implementation
– The use of long windows causes some delay
– Post-processing and probabilistic approaches need even larger delay
Noisy environments
– Learning-based approaches: NMF or classifiers
– An active research topic
Melody transcription
– Predominant pitch detection
– Singing voice separation + pitch tracking
– An active research topic
Polyphonic pitch

Musical Applications
Sound modification
– Time-stretching using PSOLA
– Auto-Tune: pitch correction, or the "T-Pain effect"
Music performance
– Tuning musical instruments
– Pitch-based sound control: e.g., fiddle~
– Score following and auto-accompaniment
Query by humming
– Relative pitch change may be more important than absolute pitch
Singing evaluation (e.g., karaoke) and visualization
(Audio examples: original vs. time-stretched; N. Bryan, 2012)

References
A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," 2002
A. Noll, "Cepstrum pitch determination," 1967
A. Noll, "Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum and a maximum likelihood estimate," 1969
M. Puckette, T. Apel and D. Zicarelli, "Real-time audio analysis tools for Pd and MSP," 1998
M. Sondhi, "New methods of pitch extraction," 1967
D. Talkin, "A robust algorithm for pitch tracking (RAPT)," 1995
M. Mauch and S. Dixon, "pYIN: A fundamental frequency estimator using probabilistic threshold distributions," 2014