Piano Music Transcription
Wes “Crusher” Hatch
MUMT-614, Thurs., Feb. 13


Introduction
– Polyphonic pitch extraction
– Want to realize “computational scene analysis” (Klapuri)
– The problem is comparable to speech recognition

Current State of Affairs
– Many different approaches
– Nothing is 100% reliable, or even 90%… or 80%…
– Drawback: with no common approach, no one is building on, or learning from, previous work and experience

Parameters to Extract
– Pitch
– Amplitude
– Onset and duration
Do NOT require:
– Spatial location
– Timbre

Benefits of Knowing Timbre
Can assume a piano sound for input, which:
– Simplifies things down the road
– Removes the need to calculate a “sound source model of an instrument” (Marolt)
– Allows assumptions about the strengths of various partials (Martin)
– Makes other techniques possible (e.g., differential spectrum analysis; Hawley)

Recent Developments
A few techniques are gaining prominence:
– Blackboard systems (Bello; Monti & Sandler; Martin)
– Neural networks
– Pitch-perception models based on human audition (gammatone filterbank front end)
– To a lesser extent: hidden Markov models

Benefits of Blackboards
– Can incorporate all previous approaches and methodologies
– Top-down or bottom-up
– Easily expandable: can be updated to accommodate new technology

A Very General Heuristic
– Front end
– Analysis, representation, pitch hypotheses
– Top-down processes (which in turn affect front-end analysis and pitch guesses)
– Transcribed notes out (Guido, MIDI, etc.)
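
To make the data flow concrete, here is a minimal Python sketch of that loop. Every name in it is a hypothetical placeholder (no cited system exposes this interface); the point is the feedback arrow from the top-down stage back into the front end.

```python
# Hypothetical skeleton of the general heuristic above; all names are
# placeholders, not interfaces from any of the cited systems.

def front_end(audio, hints):
    """Time-frequency analysis (STFT, filterbank, ...), steerable by hints."""
    return {"audio": audio, "hints": hints}          # stand-in representation

def hypothesize_pitches(representation):
    """Turn spectral evidence into candidate (pitch, onset, duration) notes."""
    return []                                        # stand-in hypotheses

def top_down(hypotheses):
    """Prune hypotheses with predictive models and musical knowledge,
    emitting hints for the next front-end pass."""
    return hypotheses, {"expected": hypotheses}

def transcribe(audio, passes=3):
    hints, notes = {}, []
    for _ in range(passes):          # top-down results re-drive the front end
        representation = front_end(audio, hints)
        notes, hints = top_down(hypothesize_pitches(representation))
    return notes                     # would be written out as MIDI, Guido, etc.
```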

Commonalities Between Systems
Transform data into a frequency representation:
– STFT and tracking phase vocoder (Dixon)
– Sinusoid tracks (Martin)
– Gammatone filterbank (Marolt, Martin)
Top-down organization
The system has the ability to learn:
– Neural nets (Marolt, Bello)
– HMMs (Raphael)
– “Timbre adaptation” (Dixon, forthcoming)
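
As a concrete example of the first commonality, a bare-bones STFT front end in NumPy (the 2048-sample window and 441-sample hop are illustrative choices, not parameters from any cited system):

```python
import numpy as np

def stft_magnitude(x, sr, n_fft=2048, hop=441):
    """Magnitude spectrogram of a mono signal x sampled at sr Hz."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft, hop):
        frame = x[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))   # keep magnitudes only
    return np.array(frames)              # shape: (n_frames, n_fft // 2 + 1)

# Example: a 1-second A4 (440 Hz) sine at 44.1 kHz
sr = 44100
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 440 * t), sr)
peak_bin = spec.mean(axis=0).argmax()
print(peak_bin * sr / 2048)              # ~431 Hz: bins are ~21.5 Hz apart,
                                         # so 440 Hz cannot be hit exactly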

Top-Down Is Super
Bottom-up: analysis --> note hypotheses
– Unidirectional
– Doesn’t know about past analyses; its only concern is the hierarchical flow of data
– Inflexible
Top-down: high level --> low level
– Different levels of the system are driven by predictive models and previous knowledge
– Implemented by neural nets or a blackboard system

Happy Schematic
Low level --> mid level --> high level

Front-End Techniques
Sinusoidal:
– STFT: constant frequency spacing means better resolution at high frequencies, poorer resolution in the low-frequency range (relative to the logarithmic spacing of musical pitches)
– Tracking phase vocoder
– Sinusoid tracks: track continuous regions of local energy maxima in the time-frequency domain (e.g., Dixon); see the sketch below
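
A rough sketch of the sinusoid-track idea: pick local spectral peaks in each frame, then chain peaks across adjacent frames whenever they stay within a small frequency tolerance. The threshold and tolerance values here are arbitrary illustrative numbers, not Dixon’s.

```python
import numpy as np

def frame_peaks(mag_frame, sr, n_fft, threshold=0.1):
    """Bins that are local maxima above a fraction of the frame's max."""
    m = mag_frame
    peaks = [k for k in range(1, len(m) - 1)
             if m[k] > m[k - 1] and m[k] > m[k + 1] and m[k] > threshold * m.max()]
    return [k * sr / n_fft for k in peaks]          # bin index -> Hz

def link_tracks(peaklists, tol_hz=25.0):
    """Greedily chain per-frame peaks into continuous sinusoid tracks."""
    tracks = []                                     # each track: list of (frame, freq)
    for i, freqs in enumerate(peaklists):
        for f in freqs:
            for tr in tracks:
                last_frame, last_f = tr[-1]
                if last_frame == i - 1 and abs(last_f - f) < tol_hz:
                    tr.append((i, f))               # extend an existing track
                    break
            else:
                tracks.append([(i, f)])             # start a new track
    return [tr for tr in tracks if len(tr) > 3]     # keep tracks that persist
```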

Front-End Techniques, cont.
Correlation:
– Tries to model human audition
– Constant Q: mimics the logarithmic frequency resolution of the human ear
– Gammatone filterbank: the output of each filter is then processed by a model of “inner hair cell” dynamics
– Further analysis by short-time autocorrelation
– Variable filter widths; filters generally implemented across ~ Hz
– Same problems as found in scene analysis
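
For illustration, one gammatone channel built directly from its textbook impulse response, followed by a crude inner-hair-cell stage (half-wave rectification plus compression). The 4th order and the 1.019 bandwidth factor are the commonly quoted values; everything else is a simplification, not the filterbank of any cited system.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, sr, dur=0.064, order=4):
    """Impulse response: t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(dur * sr)) / sr
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.abs(g).sum()                      # crude normalization

def cochleagram(x, sr, center_freqs):
    """Filter x through the bank; a toy hair-cell model follows each channel."""
    out = []
    for fc in center_freqs:
        y = np.convolve(x, gammatone_ir(fc, sr), mode="same")
        out.append(np.maximum(y, 0.0) ** 0.3)       # half-wave rectify + compress
    return np.array(out)                            # one row per channel
```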

Onset Detection
– Neural nets: differences between 6 ms and 18 ms amplitude envelopes (Marolt)
– Change in high-frequency content (Bello)
– Zero-lag correlation for each filterbank channel: running estimate of energy (Martin)
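
One common realization of a “change in high-frequency content” detector is the HFC measure, which weights each bin’s energy by its bin index so that broadband attacks stand out. A sketch; the 1.5x-mean peak threshold is an arbitrary illustrative choice:

```python
import numpy as np

def hfc_onsets(spec, threshold=1.5):
    """spec: magnitude spectrogram, shape (n_frames, n_bins).
    Returns frame indices where the high-frequency content jumps."""
    bins = np.arange(spec.shape[1])
    hfc = (spec ** 2 * bins).sum(axis=1)            # per-frame HFC
    rise = np.maximum(np.diff(hfc, prepend=hfc[0]), 0.0)   # keep only increases
    return np.where(rise > threshold * (rise.mean() + 1e-12))[0]
```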

Analysis and Pitch Hypotheses
Blackboards:
– Contain a variety of knowledge sources (KSs): neural nets, “fuzzy logic”
– May contain front-end processing, or may be fed its results
– Can be used for the entire process (front end, data representation, pitch hypotheses) or just to tabulate pitch guesses at the end

Analysis and Pitch Hypotheses, cont.
– Peak picking together with the phase spectrum (helps resolve low-frequency uncertainties): “atoms of energy localized in time and frequency” (Dixon)
– HMMs
– Neural nets (note and chord recognizers): each trained to look for one given note (e.g., C4); can also be a KS in a blackboard system
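
One standard way to combine peak picking with the phase spectrum is the phase-vocoder frequency estimate: the phase advance of a bin between two frames taken hop samples apart pins the peak’s frequency far more precisely than the bin spacing alone. A sketch (X_prev and X_cur are complex STFT frames; the parameter names are mine):

```python
import numpy as np

def refined_freq(X_prev, X_cur, k, sr, n_fft, hop):
    """Instantaneous frequency of bin k from the inter-frame phase advance.
    Valid while the true frequency stays close enough to bin k that the
    phase deviation does not wrap (small hop relative to n_fft)."""
    expected = 2 * np.pi * k * hop / n_fft          # advance if exactly on-bin
    measured = np.angle(X_cur[k]) - np.angle(X_prev[k])
    dev = np.mod(measured - expected + np.pi, 2 * np.pi) - np.pi  # wrap to (-pi, pi]
    return (k + dev * n_fft / (2 * np.pi * hop)) * sr / n_fft
```

With n_fft = 2048 and hop = 441 at 44.1 kHz, this resolves a steady tone to within a fraction of a hertz, where the raw bins are ~21.5 Hz apart; hence its value for low-frequency notes.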

Pitfalls
Octave errors: the most common error source
Some solutions:
– “Feedback to provide inhibition from the output of the note recognition stage to its input” (Marolt)
– Instrument models (knowledge about the strengths of various partials, i.e., “spectral shape”)
– Apply general musical knowledge (voice-leading rules, harmony and counterpoint, etc.) (Kashino)
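
A toy version of the instrument-model idea: if a pitch candidate f0 is really an octave too high, the spectrum also carries energy at the odd harmonics of f0/2, which a true note at f0 cannot explain. The 0.3 ratio below is an arbitrary illustrative threshold, not a value from any cited system.

```python
import numpy as np

def halve_if_suboctave(spectrum, f0, sr, n_fft, ratio=0.3):
    """Return f0/2 if the odd harmonics of f0/2 carry real energy, else f0."""
    def energy_at(f):
        k = int(round(f * n_fft / sr))
        return spectrum[max(k - 1, 0):k + 2].max()   # tolerate +/- 1 bin
    odd = sum(energy_at((2 * h - 1) * f0 / 2) for h in range(1, 5))
    even = sum(energy_at(h * f0) for h in range(1, 5))   # harmonics of f0 itself
    return f0 / 2 if odd > ratio * even else f0
```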

Different Systems’ Results
– Dixon: 70-80% correct
– SONIC (Marolt): 80-95% correct (13-25% extra notes)
– Monti & Sandler: 74% correct
– Raphael: 39% wrong/missed
– Bello, Martin: no data available

Conclusions
– Exponentially more difficult than monophonic transcription
– Slowly approaching very good, robust systems
  – Compare to Moore, 1975
  – Very few restrictions on the input data
– Top-level organization is key: blackboards, neural networks