Note-level Music Transcription by Maximum Likelihood Sampling Zhiyao Duan ¹ & David Temperley ² 1.Department of Electrical and Computer Engineering 2.Eastman.


2 Note-level Music Transcription by Maximum Likelihood Sampling
Zhiyao Duan ¹ & David Temperley ²
1. Department of Electrical and Computer Engineering
2. Eastman School of Music
University of Rochester
Presentation at ISMIR 2014, Taipei, Taiwan, October 28, 2014

3 Different Levels of Music Transcription
Frame-level (multi-pitch estimation)
–Estimate pitches and polyphony in each frame
–Many methods
Note-level (note tracking)
–Estimate pitch, onset, and offset of notes
–Fewer methods
Song-level (multi-pitch streaming)
–Stream pitches by sources
–Very few methods

4 Existing Note Tracking Methods
Connect proximate frame-level pitch estimates
–Misses in pitch estimates cause fragmented notes
–False alarms generate spurious notes that are unreasonably short
Fill gaps and prune short notes
–Deals with notes individually and does not consider interactions between different notes
[Figure: frame-level pitch estimates]
Ryynanen’05, Bello’06, Kameoka’07, Poliner’07, Lagrange’07, Chang’08, Raczynski’09, Dessein’10, Grindlay’11, Benetos’11, Grosche’12, etc.
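The "connect, fill, prune" baseline described on this slide can be sketched roughly as follows. This is a minimal illustration for a single pitch, not the implementation of any of the cited methods; the function name `connect_fill_prune` and the thresholds `max_gap` and `min_len` are assumptions chosen for the example.

```python
def connect_fill_prune(active, max_gap=2, min_len=3):
    """Turn per-frame activity of ONE pitch into notes.

    active  : sequence of truthy/falsy values, one per frame (hypothetical input)
    max_gap : gaps of at most this many frames are filled (assumed threshold)
    min_len : notes shorter than this many frames are pruned (assumed threshold)
    Returns a list of (onset_frame, offset_frame) pairs, offset exclusive.
    """
    # Connect: group consecutive active frames into segments.
    segments = []
    start = None
    for i, is_on in enumerate(active):
        if is_on and start is None:
            start = i
        elif not is_on and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))

    # Fill: merge segments separated by a short gap.
    merged = []
    for onset, offset in segments:
        if merged and onset - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], offset)
        else:
            merged.append((onset, offset))

    # Prune: drop notes that are unreasonably short.
    return [(on, off) for on, off in merged if off - on >= min_len]


frames = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0]
notes = connect_fill_prune(frames)
```

Each pitch is processed on its own here, which is exactly the limitation the slide points out: nothing in this procedure looks at other notes sounding at the same time.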

5 Problems
Output contains many spurious notes caused by consistent multi-pitch estimation (MPE) errors (usually octave/harmonic errors)
Output often violates instantaneous polyphony constraints
[Figure panels: ground truth; results from the existing “connect-fill-prune” approach]

6 Our Idea
Consider interactions between notes
A generation-evaluation strategy
–Generate a number of transcription candidates
–Evaluate each candidate on how well its notes explain the audio as a whole

7 Proposed System
[Diagram: generate subsets as transcription candidates; evaluate candidates and select the best]
[Duan, Pardo, & Zhang, 2010]
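The generate-then-evaluate loop can be sketched in a few lines. This is only an illustration under simplifying assumptions: candidates here are uniform random subsets of a note list (the talk samples by note length and note likelihood), and `generate_and_evaluate`, `keep_prob`, and the toy `score` function are hypothetical names, with `score` standing in for the transcription likelihood.

```python
import random


def generate_and_evaluate(notes, score, n_candidates=200, keep_prob=0.5, seed=0):
    """Sample subsets of `notes` and return the best-scoring one.

    notes        : list of candidate notes (any hashable objects)
    score        : callable mapping a list of notes to a number, higher is better
                   (stand-in for the transcription likelihood)
    n_candidates : number of subsets to sample (assumed value)
    keep_prob    : probability of keeping each note (assumed uniform sampling)
    """
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        # Generate: one transcription candidate is a subset of the notes.
        candidate = [n for n in notes if rng.random() < keep_prob]
        # Evaluate: score the candidate as a whole, not note by note.
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


# Toy score: reward the two notes that are really present, penalize the rest.
TRUE_NOTES = {"C3", "E4"}


def toy_score(candidate):
    return sum(1 if n in TRUE_NOTES else -1 for n in candidate)


best, best_score = generate_and_evaluate(["C3", "C4", "E4"], toy_score)
```

Because a candidate is always a subset of the input notes, a loop like this can drop spurious notes but cannot recover notes that were never proposed.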

8 Note Sampling Strategies
What we want
–Sampling space not too big
–Only sample “good” notes
–Diversity in transcription candidates
–Candidates obey polyphony constraints
How to sample efficiently and effectively?

9 Note Sampling Algorithm

10 Note Likelihood
Indicates how “good” the note is by itself
–Also called “salience”, “activation”, or “strength”
Note likelihood = geometric mean of the single-pitch likelihoods of the pitches in the note
–Multi-pitch estimation algorithms almost always estimate a likelihood (salience) for each pitch estimate
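Since a geometric mean of likelihoods equals the exponential of the arithmetic mean of their logarithms, the note likelihood is convenient to compute in the log domain. The sketch below assumes each frame of a note contributes one single-pitch likelihood; the function names are illustrative, not taken from the talk.

```python
import math


def note_log_likelihood(pitch_log_likelihoods):
    """Log of the geometric mean = arithmetic mean of the log-likelihoods."""
    if not pitch_log_likelihoods:
        raise ValueError("a note must contain at least one pitch estimate")
    return sum(pitch_log_likelihoods) / len(pitch_log_likelihoods)


def note_likelihood(pitch_likelihoods):
    """Geometric mean of strictly positive single-pitch likelihoods."""
    logs = [math.log(p) for p in pitch_likelihoods]
    return math.exp(note_log_likelihood(logs))


# A note spanning two frames with likelihoods 0.25 and 1.0:
value = note_likelihood([0.25, 1.0])  # sqrt(0.25 * 1.0) = 0.5
```

Using the mean rather than the product keeps long notes from being penalized merely for containing more frames.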

11 Candidate Evaluation

12 Single-pitch vs. Multi-pitch Likelihood
Single-pitch likelihood (salience) → Note likelihood
–E.g., total spectral energy at its harmonic positions
–Describes how well a pitch fits the audio individually
–A correct pitch usually has a high likelihood
–Octave/harmonic errors may also have a high likelihood
Multi-pitch likelihood → Transcription likelihood
–Defined as the match between spectral peaks and harmonics of all pitches
–Describes how well a set of pitches explains the audio as a whole
–Octave/harmonic relations would not improve the likelihood much
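To see why a single-pitch salience can favor octave errors, consider a toy salience that sums the energy of spectral peaks found near a candidate's harmonic positions. This is an assumed, simplified measure in the spirit of the slide's example, not the likelihood model used in the talk; `harmonic_salience`, `n_harmonics`, and the 3% tolerance `tol` are all hypothetical.

```python
import math


def harmonic_salience(f0, peaks, n_harmonics=10, tol=0.03):
    """Sum the peak energy found near each harmonic of f0.

    f0    : candidate fundamental frequency in Hz
    peaks : list of (frequency_hz, energy) spectral peaks
    tol   : relative frequency tolerance for a peak to match a harmonic (assumed)
    """
    total = 0.0
    for h in range(1, n_harmonics + 1):
        target = h * f0
        matched = [e for f, e in peaks if abs(f - target) <= tol * target]
        # Count at most one peak per harmonic: the strongest match.
        total += max(matched, default=0.0)
    return total


# Synthetic peaks of a single C4 tone (261.63 Hz) with five decaying harmonics.
C4 = 261.63
peaks = [(k * C4, 1.0 / k) for k in range(1, 6)]

sal_c4 = harmonic_salience(C4, peaks)      # the correct pitch
sal_c3 = harmonic_salience(C4 / 2, peaks)  # octave-below error
sal_e4 = harmonic_salience(329.63, peaks)  # unrelated pitch
```

Every peak of the C4 tone also sits on an even harmonic of C3, so the octave-below candidate collects the same energy as the correct pitch, while E4 only picks up the one peak that happens to lie near its fourth harmonic. A measure that scores the whole pitch set against the peaks is needed to tell these cases apart.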

13 An Example
Trombone: C3; Violin: E4. Higher value is better.

Pitch candidate                C3        C4        E4
Log single-pitch likelihood    -338.8    -466.9    -475

Pitch set candidate            {C3}      {C3, C4}  {C3, E4}
Log multi-pitch likelihood     -338.8    -346.2    -318.9
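The numbers in this example can be checked directly. The snippet below only re-reads the table values; the variable names are illustrative.

```python
# Log single-pitch likelihoods from the slide (higher is better).
single = {"C3": -338.8, "C4": -466.9, "E4": -475.0}

# Log multi-pitch likelihoods of the three pitch-set candidates.
multi = {
    ("C3",): -338.8,
    ("C3", "C4"): -346.2,
    ("C3", "E4"): -318.9,
}

# Ranking pitches individually puts the octave error C4 ahead of the true E4.
ranked = sorted(single, key=single.get, reverse=True)

# Evaluating pitch sets as a whole picks the correct pair.
best_set = max(multi, key=multi.get)

# Effect of adding a second pitch to {C3}.
gain_c4 = multi[("C3", "C4")] - multi[("C3",)]  # negative: C4 makes it worse
gain_e4 = multi[("C3", "E4")] - multi[("C3",)]  # positive: E4 explains more audio
```

Adding C4 lowers the log-likelihood by about 7.4, while adding E4 raises it by about 19.9, even though C4 scores higher than E4 on its own.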

14 Experiments
Bach10 dataset: 110 polyphonic combinations derived from 10 pieces of 4-part J.S. Bach chorales, played by violin, clarinet, saxophone, and bassoon
–60 duets, 40 trios, 10 quartets
Comparison methods
–Benetos13: shift-invariant PLCA (frame-level) + median filtering of pitch activity matrix (note-level)
–Klapuri06: iterative spectral subtraction (frame-level) + our preliminary note tracking (note-level)
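The counts of 60 duets, 40 trios, and 10 quartets are consistent with taking every combination of two, three, or four of the four parts in each of the 10 pieces. The check below assumes that is how the combinations were formed, which the slide does not state explicitly.

```python
from itertools import combinations

parts = ["violin", "clarinet", "saxophone", "bassoon"]
n_pieces = 10

# Number of part combinations of each size, summed over all pieces.
counts = {
    size: n_pieces * len(list(combinations(parts, size)))
    for size in (2, 3, 4)
}
total = sum(counts.values())
```

This gives 6 duets, 4 trios, and 1 quartet per piece, or 110 combinations in total.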

15 Performance Measures

16 Comparison with state of the art

17 Works with state of the art

18 Example

19 Conclusions
A new method for note-level transcription that considers note interactions
–Generate transcription candidates by sampling notes according to note length and note likelihood, derived from single-pitch likelihood
–Evaluate candidates according to transcription likelihood, derived from multi-pitch likelihood
Good performance against state of the art
Can work with any MPE or note tracking algorithm, as long as single-pitch likelihood (salience) is calculated

20 Limitations and Future Work
Only removes spurious notes; cannot add back missed notes
Different runs of sampling are independent
A better sampling technique
–E.g., using Markov chain Monte Carlo to add back missed notes and to consider dependencies between different runs of sampling
A better evaluation technique
–E.g., considering musical knowledge to evaluate the “musical plausibility” of transcription candidates
