Polyphonic Music Transcription Using A Dynamic Graphical Model
Barry Rafkind
E6820 Speech and Audio Signal Processing
Wednesday, March 9th, 2005

Presentation Outline
–Motivating Music Transcription
–My Project Proposal
–Project Timeline

Motivating Music Transcription
Given a musical recording, we wish to obtain a MIDI score for:
–Performance (convert MIDI to a music score)
–Analysis (evaluate intonation or the number of missed or incorrect notes; useful for music education)
–Comparison with other music (copyright infringement / search)
–Replay on MIDI synthesizers (use different instruments, change settings, overlay tracks, etc.)

Recent Previous Work
–Multi-Instrument Musical Transcription Using A Dynamic Graphical Model, Michael Jordan, 2004
–Automatic Transcription of Piano Music, Christopher Raphael, 2002, Univ. of Massachusetts, Amherst
–Polyphonic Pitch Extraction, Graham Poliner, E6820 Speech & Audio Signal Processing, Spring 2004
–Many more. Try searching Google for PDF documents with the keywords: music transcription

My Project Proposal
Jordan presents a multi-instrument transcription system that listens to a recording in which two or more instruments are playing and identifies both the notes that were played and the instruments that played them. The system models two musical instruments, each capable of playing at most one note at a time.
My goal: implement and improve upon Jordan’s Dynamic Graphical Model (DGM) approach. Whereas he made assumptions about how to model each instrument, I want to let the system learn what to look for by starting with a general model. Jordan uses a reduced set of states and parameters for efficiency; I will try to use a larger model if possible.

Dynamic Graphical Model (DGM) - what is it?
Hidden state variables correspond to a discrete set of allowable intensity and pitch values.

Key Points in Jordan’s Approach
–Use of a note-event timbre model that includes both a spectral model (in frequency) and a dynamic intensity-versus-time model (a “time envelope model”).
–Inference on the DGM (using the Viterbi algorithm) computes the path of maximum posterior probability, which gives explicit note-on events (note locations).
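The Viterbi step can be illustrated with a minimal sketch. It assumes a toy two-state chain (0 = silence, 1 = note sounding) rather than the full intensity and pitch state space, and the function names `viterbi` and `note_onsets` and all probabilities are hypothetical values chosen for illustration, not taken from Jordan's system.

```python
import math

def viterbi(log_init, log_trans, log_obs):
    """Return the maximum-posterior state path of a discrete hidden chain.

    log_init[s]     : log prior of state s at the first frame
    log_trans[p][s] : log probability of moving from state p to state s
    log_obs[t][s]   : log likelihood of frame t's observation under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(log_obs)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor for state s at frame t.
            best = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][s] + log_obs[t][s])
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def note_onsets(path, silent_state=0):
    """Frames where the decoded path leaves silence (explicit note-on events)."""
    return [t for t, s in enumerate(path)
            if s != silent_state and (t == 0 or path[t - 1] == silent_state)]

# Toy example (assumed numbers): two quiet frames followed by two loud frames.
log = math.log
init = [log(0.9), log(0.1)]
trans = [[log(0.8), log(0.2)], [log(0.2), log(0.8)]]
obs = [[log(0.9), log(0.1)], [log(0.9), log(0.1)],
       [log(0.1), log(0.9)], [log(0.1), log(0.9)]]
path = viterbi(init, trans, obs)
onsets = note_onsets(path)
```

Working in log probabilities avoids numerical underflow on long recordings, which is why the sketch adds scores instead of multiplying probabilities.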

Intensity Transition Model for Violin

Intensity Transition Model for Piano

General Intensity Transition Model

Pitch Transition Model
Build a conditional probability distribution for the pitch state as a function of both the previous pitch state and the previous intensity state. Transition probabilities are also based on Shepard's pitch helix, which defines a psychoacoustic distance between pitches.
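One way such a distribution could be built is sketched below. Everything specific here is an assumption for illustration: the helix geometry (`radius`, `height_per_octave`), the exponential decay with distance (`scale`), and the rule that pitch is held while the previous intensity is nonzero and may change only from silence. The names `helix_coords`, `helix_distance`, and `pitch_transition_row` are hypothetical.

```python
import math

def helix_coords(midi_pitch, radius=1.0, height_per_octave=1.0):
    """Place a MIDI pitch on a helix: chroma sets the angle, pitch height sets z."""
    angle = 2.0 * math.pi * (midi_pitch % 12) / 12.0
    return (radius * math.cos(angle),
            radius * math.sin(angle),
            height_per_octave * midi_pitch / 12.0)

def helix_distance(p, q):
    """Euclidean distance between two pitches on the helix."""
    return math.dist(helix_coords(p), helix_coords(q))

def pitch_transition_row(prev_pitch, prev_intensity, pitches, scale=1.0):
    """P(next pitch | previous pitch, previous intensity) over the list `pitches`.

    Assumed rule: a sounding note (intensity > 0) keeps its pitch; from
    silence, nearby pitches on the helix are more probable than distant ones.
    """
    if prev_intensity > 0:
        return [1.0 if p == prev_pitch else 0.0 for p in pitches]
    weights = [math.exp(-helix_distance(prev_pitch, p) / scale) for p in pitches]
    total = sum(weights)
    return [w / total for w in weights]

pitches = list(range(48, 85))                   # C3..C6 as MIDI note numbers
row_free = pitch_transition_row(60, 0, pitches)  # previous frame was silent
row_held = pitch_transition_row(60, 3, pitches)  # previous frame was sounding
```

With these assumed helix dimensions, a pitch one octave away is closer than a pitch a tritone away, because the two share the same chroma angle and differ only in height.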

Observation Model - explains the sound
Model the spectrum of a harmonic musical signal as a series of narrow, harmonically spaced bump functions. That is, conditional on the fundamental frequency Pitch(t) of the musical signal, the spectrum consists of bump functions located at integer multiples of Pitch(t). The n-th bump function is given a scale parameter alpha(n) that can depend on Pitch(t), because the relative spectral content of an instrument can depend on which pitch is being played. The intensity envelope at time t scales all of the harmonics.
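A minimal sketch of this spectral model follows. The Gaussian bump shape, its width `width_hz`, and the example alpha values are assumptions made for illustration; the slide does not specify the bump function, and `bump` and `model_spectrum` are hypothetical names.

```python
import math

def bump(offset_hz, width_hz):
    """Narrow bump centered at zero (a Gaussian shape is assumed here)."""
    return math.exp(-0.5 * (offset_hz / width_hz) ** 2)

def model_spectrum(freqs_hz, pitch_hz, intensity, alphas, width_hz=5.0):
    """Expected magnitude spectrum for one note.

    alphas[n-1] is the scale alpha(n) of the n-th harmonic for this pitch;
    `intensity` is the envelope value at time t and scales every harmonic.
    """
    return [
        intensity * sum(a * bump(f - n * pitch_hz, width_hz)
                        for n, a in enumerate(alphas, start=1))
        for f in freqs_hz
    ]

freqs = list(range(0, 2000, 10))   # frequency grid in Hz
spectrum = model_spectrum(freqs, pitch_hz=440.0, intensity=2.0,
                          alphas=[1.0, 0.5, 0.25])
```

Comparing such a predicted spectrum with the observed one at each frame would supply the per-state observation likelihoods that the inference step needs.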

Evaluation Metrics
–Note Error Rate (based on “minimum edit distance” in speech) = 100 x (Insertions + Substitutions + Deletions) / Total Number of Notes in Score. We want to minimize this.
–Dixon Success Score = 100 x Correct Notes / (Correct + False Positives + Deletions). We want to maximize this.
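Both metrics can be computed with a short sketch. It assumes notes are compared as plain pitch sequences (MIDI note numbers) with timing ignored, and the function names are hypothetical.

```python
def edit_distance(reference, transcribed):
    """Minimum number of insertions, substitutions, and deletions."""
    prev = list(range(len(transcribed) + 1))
    for i, ref_note in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_note in enumerate(transcribed, start=1):
            cur.append(min(prev[j] + 1,                             # deletion
                           cur[j - 1] + 1,                          # insertion
                           prev[j - 1] + (ref_note != hyp_note)))   # substitution or match
        prev = cur
    return prev[-1]

def note_error_rate(reference, transcribed):
    """100 x (insertions + substitutions + deletions) / notes in the score."""
    if not reference:
        raise ValueError("reference score must contain at least one note")
    return 100.0 * edit_distance(reference, transcribed) / len(reference)

def dixon_score(correct, false_positives, deletions):
    """100 x correct / (correct + false positives + deletions)."""
    return 100.0 * correct / (correct + false_positives + deletions)

score_notes = [60, 62, 64, 65]   # reference score
output_notes = [60, 64, 65, 67]  # transcription: missed 62, added 67
ner = note_error_rate(score_notes, output_notes)
dixon = dixon_score(correct=3, false_positives=1, deletions=1)
```

Because insertions count against the transcription, the note error rate can exceed 100 when the output contains many spurious notes, while the Dixon score always stays between 0 and 100.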

Seven Weeks Left
–3/14 - Collect MIDI data and convert to WAV audio; understand DGM
–3/21 - Start building / understanding graphical models
–3/28 - Continue building / understanding graphical models
–4/04 - Finish building / understanding graphical models
–4/11 - Evaluate results / fix bugs
–4/18 - Try new data / fix bugs; begin preparing final presentation
–4/25 - Finish preparing final presentation
–4/27 - Final presentation in class
