On Recognizing Music Using HMMs: Following the Path Carved by Speech Recognition Pioneers
Outline
Aim of this project
HMM Speech Recognition Paradigm
Structure of musical tones
Designing an HMM-based Music Recognizer using HTK
Aim of this project
Recognize different types of steady-state musical instruments: Piano, Guitar, Flute, Trumpet (string and wind)
Not Drums, Cymbals, Gongs (percussion)
Design this recognizer based on methods used in Speech Recognition
HMM Speech Recognition Paradigm
Different types of systems: isolated-word-based or phoneme-based; discrete or continuous
Feature Analysis options: Linear Prediction Analysis, Filterbank Analysis
HMM topology definition
Initialization and training of the HMM
Recognition and Evaluation
Types of systems
Phoneme-based recognizer: a set of sounds sufficient to compose speech in a language, each modeled by an HMM. Not relevant to music.
Isolated-word-based recognizer: each word in the vocabulary is modeled by its own HMM. We treat each instrument as a word in a music vocabulary and aim to recognize it.
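In an isolated-word-based recognizer, recognition amounts to scoring the unknown sound against every model and picking the most likely one. A minimal sketch of that decision rule, assuming discrete HMMs (a finite symbol alphabet) and the standard scaled forward algorithm; the instrument names and all probabilities used with it are hypothetical toy values, and this is independent of how HTK performs recognition.

```python
import math

def forward_log_likelihood(obs, init, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the forward algorithm.

    init[i]: probability of starting in state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][k]: probability that state i emits symbol k
    """
    n = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    log_scale = 0.0
    for t in range(1, len(obs)):
        # Rescale alpha each frame to avoid numerical underflow on long sounds;
        # the log of each scale factor is accumulated so the result stays exact.
        s = sum(alpha)
        log_scale += math.log(s)
        alpha = [a / s for a in alpha]
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                 for j in range(n)]
    return log_scale + math.log(sum(alpha))

def recognize(obs, models):
    """Pick the model (instrument) whose HMM gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

For example, with two hypothetical 2-state models `{"flute": (init, trans, emit), "piano": (...)}`, `recognize(symbols, models)` returns the name of the better-scoring model. Adding an instrument only means training one more HMM and adding it to `models`.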
Discrete or Continuous System
Concerns the visible observations emitted by an HMM: discrete symbols or continuous signals?
Continuous model: each emitting state has a probability density function over the feature vectors, so the details of the signal are captured
Discrete model: the emitted observations are limited to a finite set of distinct symbols (typically by vector-quantizing each feature vector against a codebook)
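The two options differ only in how a state's emission probability is computed for a feature vector. A sketch of both, assuming a diagonal-covariance Gaussian mixture for the continuous case (a common choice, not specified on the slide) and nearest-codeword vector quantization for the discrete case; any codebook or probabilities passed in are hypothetical.

```python
import math

def gaussian_mixture_density(x, weights, means, variances):
    """Continuous model: emission density of vector x as a diagonal Gaussian mixture."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        # Log of a diagonal Gaussian: sum of independent 1-D Gaussian log densities.
        log_g = 0.0
        for xi, mi, vi in zip(x, mu, var):
            log_g += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        total += w * math.exp(log_g)
    return total

def quantize(x, codebook):
    """Discrete model: map a feature vector to the index of its nearest codeword."""
    return min(range(len(codebook)),
               key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[k])))

def discrete_emission(x, codebook, symbol_probs):
    """Probability that a state emits the codebook symbol nearest to x."""
    return symbol_probs[quantize(x, codebook)]
```

The discrete model throws away everything about x except its codeword index, which is the loss of detail the continuous model avoids.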
Feature Analysis
Linear Prediction Analysis: a transfer function that models the shape of the vocal tract, i.e. it models how the voice is produced
Filterbank Analysis: uses the Fourier transform to decompose the waveform into sine-wave components, similar to the mechanism of the cochlea in the ear, i.e. it models how the voice is heard
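Filterbank analysis for a single frame can be sketched as: window the samples, take the power spectrum, and sum it under overlapping triangular filters. The Hamming window, the mel-scale spacing `2595 * log10(1 + f / 700)`, the naive DFT, and the `1e-10` floor are assumptions chosen for illustration; a real front end would use an FFT and process a sequence of overlapping frames.

```python
import cmath
import math

def power_spectrum(frame):
    """Hamming-window a frame and return |DFT|^2 for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n))) ** 2
            for k in range(n // 2 + 1)]

def mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def log_filterbank(frame, sample_rate, num_filters):
    """Log energies of triangular filters spaced evenly on the mel scale."""
    spec = power_spectrum(frame)
    n_fft = len(frame)
    # num_filters + 2 edge frequencies, evenly spaced in mel from 0 Hz to Nyquist.
    top = mel(sample_rate / 2)
    edges = [inv_mel(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    energies = []
    for m in range(1, num_filters + 1):
        lo, centre, hi = edges[m - 1], edges[m], edges[m + 1]
        e = 0.0
        for k, p in enumerate(spec):
            f = k * sample_rate / n_fft  # centre frequency of DFT bin k
            if lo < f <= centre:
                e += p * (f - lo) / (centre - lo)  # rising edge of the triangle
            elif centre < f < hi:
                e += p * (hi - f) / (hi - centre)  # falling edge of the triangle
        energies.append(math.log(e + 1e-10))  # floor avoids log(0)
    return energies
```

For a 1000 Hz tone sampled at 8000 Hz, the largest log energy lands in the filter whose passband covers 1000 Hz. Applying a discrete cosine transform to these log energies gives cepstral coefficients, the usual route to the roughly 13-dimensional vectors used in speech.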
Initializing and Training the HMM
Viterbi Algorithm: use it to find the MPE (Most Probable Explanation), i.e. the most likely state sequence, of a training sound, which tells us which feature vector belongs to which state. Update each state's observation probability distribution from the vectors assigned to it, and update the state transition matrix by counting how often the aligned vectors stay in a state or move to the next. Repeat until convergence.
Baum-Welch Algorithm: use it to find the probability that each vector belongs to each state. It does not give a definite assignment; instead it smooths the transitions between states. Repeat until convergence.
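The Viterbi training loop described above can be sketched as: align each training sequence to its most probable state path, then re-estimate the parameters by counting. This toy version assumes discrete emissions, so "update the observation distribution" reduces to normalized symbol counts; with continuous or tied-mixture models the same alignment would instead drive mean and variance updates. The `floor` parameter is a hypothetical smoothing constant, and Baum-Welch (soft counts from the forward-backward algorithm) is not shown.

```python
import math

def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(obs, init, trans, emit):
    """Most probable state sequence for a discrete HMM, computed in the log domain."""
    n = len(init)
    delta = [safe_log(init[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    back = []  # back[t][j]: best predecessor of state j at step t + 1
    for o in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] + safe_log(trans[i][j]))
            ptr.append(best)
            new.append(delta[best] + safe_log(trans[best][j]) + safe_log(emit[j][o]))
        back.append(ptr)
        delta = new
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def viterbi_reestimate(sequences, init, trans, emit, floor=1e-3):
    """One Viterbi-training pass: align every sequence, then re-estimate by counting."""
    n, m = len(init), len(emit[0])
    t_counts = [[0.0] * n for _ in range(n)]
    e_counts = [[floor] * m for _ in range(n)]  # floor keeps unseen symbols possible
    for obs in sequences:
        path = viterbi(obs, init, trans, emit)
        for s, o in zip(path, obs):
            e_counts[s][o] += 1  # which vector (symbol) was assigned to which state
        for a, b in zip(path, path[1:]):
            t_counts[a][b] += 1  # how often the path stays or moves on
    # States with no outgoing transitions in the alignment keep their old row.
    new_trans = [[c / sum(row) if sum(row) else trans[i][j] for j, c in enumerate(row)]
                 for i, row in enumerate(t_counts)]
    new_emit = [[c / sum(row) for c in row] for row in e_counts]
    return new_trans, new_emit
```

Calling `viterbi_reestimate` repeatedly, feeding its output back in, until the parameters stop changing gives the "repeat until convergence" loop.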
Design and Justification of the HMM Music Recognizer
Structure of musical tones: simpler structure than speech; which information to model
Design:
Modeled on the isolated-word-based system
Semi-continuous system: a tied-mixture system
Filterbank Analysis, increasing the number of dimensions in feature analysis (typically 13 in speech)
Left-right 2-state HMM
Training is the same as in speech
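A semi-continuous (tied-mixture) system keeps one shared pool of Gaussians for all states; each state owns only a vector of mixture weights over that pool. A minimal sketch combining this with the left-right 2-state topology from the slide, assuming diagonal covariances; `TiedMixtureHMM` and `stay_prob` (a hypothetical initial self-loop probability) are illustrative names, and unlike HTK model definitions, which add non-emitting entry and exit states, this sketch uses emitting states only.

```python
import math

def gaussian(x, mean, var):
    """Diagonal-covariance Gaussian density of vector x."""
    log_g = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                for xi, m, v in zip(x, mean, var))
    return math.exp(log_g)

class TiedMixtureHMM:
    """Left-right HMM whose states share one pool of Gaussians (semi-continuous)."""

    def __init__(self, num_states, codebook_means, codebook_vars, stay_prob=0.6):
        self.means, self.vars = codebook_means, codebook_vars
        k = len(codebook_means)
        # Every state starts with uniform weights over the shared Gaussians.
        self.weights = [[1.0 / k] * k for _ in range(num_states)]
        # Left-right topology: each state may only stay or advance one step.
        self.trans = [[0.0] * num_states for _ in range(num_states)]
        for i in range(num_states):
            if i + 1 < num_states:
                self.trans[i][i], self.trans[i][i + 1] = stay_prob, 1.0 - stay_prob
            else:
                self.trans[i][i] = 1.0

    def emissions(self, x):
        """Emission density of x in every state, computing the shared Gaussians once."""
        shared = [gaussian(x, m, v) for m, v in zip(self.means, self.vars)]
        return [sum(w * g for w, g in zip(row, shared)) for row in self.weights]
```

Because the Gaussians are shared, their densities are computed once per frame and reused by every state, and only the per-state weights need to be trained, which keeps the number of parameters small.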
Results
Implementation: planned to build the above model using HTK, but could not find enough training samples (they must be purchased)
Pending questions:
What should the dimension of the feature vectors be?
The 2-state model is very coarse; what is a good HMM structure?
Automatic structure learning
Summary
Outlined the HMM Speech Recognition Paradigm
Outlined a feasible method for recognizing music based on this technique
Outlined further questions
THANK YOU! Q & A