Note-level Music Transcription by Maximum Likelihood Sampling
Zhiyao Duan & David Temperley

Presentation transcript:

Note-level Music Transcription by Maximum Likelihood Sampling
Zhiyao Duan ¹ & David Temperley ²
1. Department of Electrical and Computer Engineering
2. Eastman School of Music
University of Rochester
Presentation at ISMIR 2014, Taipei, Taiwan, October 28, 2014

Different Levels of Music Transcription
Frame-level (multi-pitch estimation)
– Estimate pitches and polyphony in each frame
– Many methods
Note-level (note tracking)
– Estimate pitch, onset, and offset of notes
– Fewer methods
Song-level (multi-pitch streaming)
– Stream pitches by sources
– Very few methods

Existing Note Tracking Methods
Connect proximate frame-level pitch estimates
– Misses in pitch estimates will cause fragmented notes
– False alarms will generate spurious notes that are unreasonably short
Fill gaps and prune short notes
– Deals with notes individually and does not consider interactions between different notes
[Figure: frame-level pitch estimates]
Ryynanen’05, Bello’06, Kameoka’07, Poliner’07, Lagrange’07, Chang’08, Raczynski’09, Dessein’10, Grindlay’11, Benetos’11, Grosche’12, etc.
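To make the "connect-fill-prune" idea concrete, here is a minimal sketch, not taken from any of the cited systems. The function name `connect_fill_prune`, the frame-based representation, and the thresholds `max_gap` and `min_len` are all assumptions chosen for illustration.

```python
from collections import defaultdict

def connect_fill_prune(frame_pitches, max_gap=2, min_len=3):
    """frame_pitches: list of sets of MIDI pitches, one set per frame.

    Returns notes as (pitch, onset_frame, offset_frame), offset exclusive.
    max_gap and min_len are hypothetical thresholds, in frames.
    """
    # Collect, for every pitch, the (increasing) frames in which it was estimated.
    active = defaultdict(list)
    for t, pitches in enumerate(frame_pitches):
        for p in pitches:
            active[p].append(t)

    notes = []
    for p, frames in sorted(active.items()):
        start = prev = frames[0]
        for t in frames[1:]:
            # Connect/fill: a run continues across up to max_gap missing frames.
            if t - prev - 1 > max_gap:
                notes.append((p, start, prev + 1))
                start = t
            prev = t
        notes.append((p, start, prev + 1))

    # Prune: drop notes shorter than min_len frames.
    return [n for n in notes if n[2] - n[1] >= min_len]

frames = [{60}, {60}, set(), {60}, {60, 72}, set(), set(), set(), {60}]
notes = connect_fill_prune(frames)
```

Each pitch is handled on its own here, which is exactly the limitation the slide points out: no decision about one note looks at any other note.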

Problems
Results contain many spurious notes caused by consistent multi-pitch estimation (MPE) errors (usually octave/harmonic errors)
Results often violate instantaneous polyphony constraints
[Figure panels: ground truth; results from the existing “connect-fill-prune” approach]
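One way to check a transcription against an instantaneous polyphony constraint is to sweep over note onsets and offsets and track how many notes sound at once. The sketch below assumes notes are `(pitch, onset, offset)` tuples with exclusive offsets; the function name `max_polyphony` is hypothetical.

```python
def max_polyphony(notes):
    """Return the peak number of simultaneously sounding notes."""
    events = []
    for _, onset, offset in notes:
        events.append((onset, 1))    # a note starts
        events.append((offset, -1))  # a note ends
    # Tuples sort so that (t, -1) comes before (t, 1): a note ending at
    # frame t is not counted as overlapping a note starting at frame t.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

notes = [(60, 0, 5), (64, 0, 5), (72, 2, 4), (67, 5, 8)]
peak = max_polyphony(notes)  # three notes overlap during frames 2 and 3
```

A duet transcription whose peak exceeds 2 would violate the constraint, assuming each instrument plays one note at a time.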

Our Idea
Consider interactions between notes
A generation-evaluation strategy
– Generate a number of transcription candidates
– Evaluate each candidate on how well its notes explain the audio as a whole
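The generation-evaluation strategy can be sketched as a simple loop that keeps the highest-scoring candidate. This is only an outline: `generate` and `evaluate` are placeholder callables, and the toy scoring function in the usage lines is an assumption standing in for an audio-based score.

```python
def best_candidate(generate, evaluate, n_candidates):
    """Draw n_candidates transcriptions and return the best one with its score."""
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        candidate = generate()
        score = evaluate(candidate)  # higher means a better explanation
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy usage: candidates are pitch sets; the score penalizes disagreement
# with a known answer (a stand-in for a likelihood computed from audio).
truth = {"C3", "E4"}
pool = iter([{"C3", "C4"}, {"C3", "E4"}, {"C3"}])
best, score = best_candidate(lambda: next(pool), lambda c: -len(c ^ truth), 3)
```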

Proposed System
Generate subsets as transcription candidates
Evaluate candidates and select the best
[Duan, Pardo, & Zhang, 2010]

Note Sampling Strategies
What we want
– Sampling space not too big
– Only sample “good” notes
– Diversity in transcription candidates
– Candidates obey polyphony constraints
How to sample efficiently and effectively?

Note Sampling Algorithm
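The body of this slide is not preserved in the transcript. As an illustration only, the sketch below shows one possible sampler that matches the stated goals: notes are drawn in random order with probability proportional to a weight, and a note is accepted only if the candidate still obeys the polyphony limit. The weight (note length times note likelihood), the function names, and the greedy acceptance rule are assumptions, not the authors' algorithm.

```python
import random

def max_polyphony(notes):
    """Peak number of simultaneously sounding notes (offsets are exclusive)."""
    events = sorted([(on, 1) for _, on, _ in notes] +
                    [(off, -1) for _, _, off in notes])
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

def sample_candidate(notes, likelihoods, max_poly, rng):
    """Draw one transcription candidate from a pool of (pitch, onset, offset) notes."""
    # Assumed weight: longer and more likely notes are drawn earlier on average.
    weights = [(off - on) * lik for (_, on, off), lik in zip(notes, likelihoods)]
    remaining = list(range(len(notes)))
    chosen = []
    while remaining:
        # Weighted draw, without replacement, from the notes not yet tried.
        i = rng.choices(remaining, weights=[weights[j] for j in remaining])[0]
        remaining.remove(i)
        # Accept the note only if the polyphony constraint still holds.
        if max_polyphony(chosen + [notes[i]]) <= max_poly:
            chosen.append(notes[i])
    return chosen

pool = [(48, 0, 10), (60, 0, 10), (64, 12, 20)]
candidate = sample_candidate(pool, [0.9, 0.6, 0.8], max_poly=1, rng=random.Random(0))
```

Different random seeds give different candidates, which supplies the diversity the previous slide asks for, while every candidate respects the polyphony limit by construction.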

Note Likelihood
Indicates how “good” the note is by itself
– Also called “salience”, “activation”, “strength”
Note likelihood = geometric mean of the single-pitch likelihoods of the pitches in the note
– Multi-pitch estimation algorithms almost always estimate a likelihood (salience) for each pitch estimate
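The geometric mean is conveniently computed in the log domain, which avoids underflow when a note spans many frames. A minimal sketch, assuming each frame-level pitch in the note comes with a strictly positive likelihood (the function name is hypothetical):

```python
import math

def note_likelihood(pitch_likelihoods):
    """Geometric mean of the per-frame single-pitch likelihoods of one note."""
    # Averaging logs and exponentiating equals the n-th root of the product.
    log_mean = sum(math.log(p) for p in pitch_likelihoods) / len(pitch_likelihoods)
    return math.exp(log_mean)

value = note_likelihood([0.25, 1.0])  # square root of 0.25 * 1.0
```

Unlike a plain product, the geometric mean does not shrink just because a note is long, so short and long notes remain comparable.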

Candidate Evaluation

Single-pitch vs. Multi-pitch Likelihood
Single-pitch likelihood (salience) → Note likelihood
– E.g., total spectral energy at its harmonic positions
– Describes how well a pitch fits in the audio individually
  – A correct pitch usually has a high likelihood
  – Octave/harmonic errors may also have high likelihood
Multi-pitch likelihood → Transcription likelihood
– Defined as the match between spectral peaks and harmonics of all pitches
– Describes how well a set of pitches explains the audio as a whole
  – Octave/harmonic relations would not improve likelihood much
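The "total spectral energy at harmonic positions" example of a single-pitch salience can be sketched as a harmonic sum. The toy spectrum, the bin spacing, and the function name below are assumptions; the sketch also shows why an octave error can score as well as the correct pitch.

```python
def harmonic_salience(power_spectrum, bin_hz, f0_hz, n_harmonics=10):
    """Sum the spectral energy found at the first n_harmonics multiples of f0_hz."""
    total = 0.0
    for h in range(1, n_harmonics + 1):
        k = round(h * f0_hz / bin_hz)  # nearest spectrum bin to the h-th harmonic
        if k < len(power_spectrum):
            total += power_spectrum[k]
    return total

# Toy spectrum with 10 Hz bins: one tone at 200 Hz with five harmonics.
spectrum = [0.0] * 200
for k in (20, 40, 60, 80, 100):
    spectrum[k] = 1.0

correct = harmonic_salience(spectrum, 10.0, 200.0)      # the pitch that is present
octave_below = harmonic_salience(spectrum, 10.0, 100.0) # an octave error
```

In this toy case both values are equal, because the even harmonics of 100 Hz coincide with the harmonics of 200 Hz. A score that evaluates all pitches jointly would gain little from adding the 100 Hz pitch, since its harmonics explain no additional peaks.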

An Example
Audio: Trombone: C3, Violin: E4
Pitch candidate: C3, C4, E4
Log single-pitch likelihood: [values not preserved]
Pitch set candidate: {C3}, {C3, C4}, {C3, E4}
Log multi-pitch likelihood: [values not preserved]
Higher value is better

Experiments
Bach10 dataset: 110 polyphonic combinations derived from 10 pieces of 4-part J.S. Bach chorales, played by violin, clarinet, saxophone, and bassoon
– 60 duets, 40 trios, 10 quartets
Comparison methods
– Benetos13: shift-invariant PLCA (frame-level) + median filtering of pitch activity matrix (note-level)
– Klapuri06: iterative spectral subtraction (frame-level) + our preliminary note tracking (note-level)

Performance Measures

Comparison with state of the art

Works with state of the art

Example

Conclusions
A new method for note-level transcription, considering note interactions
– Generate transcription candidates by sampling notes according to note length and note likelihood, derived from single-pitch likelihood
– Evaluate candidates according to transcription likelihood, derived from multi-pitch likelihood
Good performance against state of the art
Can work with any MPE or note tracking algorithm, as long as single-pitch likelihood (salience) is calculated

Limitations and Future Work
Only removes spurious notes; it cannot add back missed notes
Different runs of sampling are independent
A better sampling technique
– E.g., using Markov Chain Monte Carlo to add back missed notes and to consider dependencies between different runs of sampling
A better evaluation technique
– E.g., considering musical knowledge to evaluate the “musical plausibility” of transcription candidates