Using Blackboard Systems for Polyphonic Transcription: A Literature Review by Cory McKay

Outline
Intro to polyphonic transcription
Intro to blackboard systems
Keith Martin's work
Kunio Kashino's work
Recent contributions
Conclusion

Polyphonic Transcription
Represent an audio signal as a score
Must segregate notes belonging to different voices
Problems: variations of timbre within a voice, voice crossing, identification of the correct octave
No successful general-purpose system to date

Polyphonic Transcription
Can use simplified models:
– Music for a single instrument (e.g. piano)
– Extract only a given instrument from the mix
– Use music that obeys restrictive rules
Simplified systems have had success rates of between 80% and 90%
These rates may be exaggerated, since generally only very limited test suites have been used

Polyphonic Transcription
Systems to date generally identify only rhythm, pitch and voice
Would like systems that also identify other notated aspects such as dynamics and vibrato
The ideal is a system that can identify and understand parameters of music that humans hear but do not notate

Blackboard Systems
Used in AI for decades, but only applied to music transcription in the early 1990s
The term “blackboard” comes from the notion of a group of experts standing around a blackboard, working together to solve a problem
Each expert writes contributions on the blackboard
The experts watch the problem evolve on the blackboard, making changes until a solution is reached

Blackboard Systems
The “blackboard” is a central dataspace
Usually arranged in a hierarchy so that input is at the lowest level and output is at the highest
The “experts” are called “knowledge sources” (KSs)
A KS generally consists of a set of heuristics and a precondition whose satisfaction results in a hypothesis that is written on the blackboard
Each KS forms hypotheses based on information from the front end of the system and on hypotheses presented by other KSs

Blackboard Systems
The problem is solved when all KSs are satisfied with all hypotheses on the blackboard to within a given margin of error
Eliminates the need for a global control module
Each KS can easily be updated, and new KSs can be added with little difficulty
Combines top-down and bottom-up processing
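A minimal sketch of this control structure in Python, assuming a simple sequential scheduler; all names are illustrative rather than taken from any of the reviewed systems:

```python
class KnowledgeSource:
    """A KS fires when its precondition holds, writing a hypothesis."""

    def precondition(self, blackboard):
        raise NotImplementedError

    def contribute(self, blackboard):
        raise NotImplementedError  # writes or revises hypotheses in place


def run_blackboard(blackboard, knowledge_sources, max_iterations=1000):
    """Sequential scheduler: loop until no KS has anything left to add."""
    for _ in range(max_iterations):
        fired = False
        for ks in knowledge_sources:
            if ks.precondition(blackboard):
                ks.contribute(blackboard)
                fired = True
        if not fired:  # every KS is satisfied: a solution has been reached
            break
    return blackboard
```

Note that control is distributed: no module plans the solution; it emerges from KSs reacting to each other's hypotheses.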

Blackboard Systems
Music has a naturally hierarchical structure that lends itself well to blackboard systems
Allow integration of different types of expertise:
– signal processing KSs at the low level
– human perception KSs at the middle level
– musical knowledge KSs at the upper level

Blackboard Systems
Limitation: giving upper-level KSs too much specialized knowledge and influence limits the generality of transcription systems
An ideal system would not use knowledge above the level of human perception and the most rudimentary understanding of music
The current trend is to increase the significance of upper-level musical KSs in order to increase success rates

Keith Martin (1996a)
“A Blackboard System for Automatic Transcription of Simple Polyphonic Music”
Used a blackboard system to transcribe a four-voice Bach chorale with appropriate segregation of voices
Limited the input signal to synthesized piano performances
Gave the system only rudimentary musical knowledge, although the choice of a Bach chorale allowed lower-level KSs to use assumptions that would not generally be acceptable

Keith Martin (1996a)
The front end used a short-time Fourier transform on the input signal
Equivalent to a filter bank that is a gross approximation of the way the human cochlea processes auditory signals
The blackboard system is fed sets of associated onset times, frequencies and amplitudes
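A rough sketch of such a front end (NumPy only; the frame size, hop and peak threshold are arbitrary illustrative values, not Martin's):

```python
import numpy as np

def stft_front_end(signal, sr, frame=2048, hop=512):
    """For each analysis frame, return the frame time and crude
    (frequency, amplitude) spectral peaks; a stand-in for a real front end."""
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    events = []
    for start in range(0, len(signal) - frame, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        peaks = [i for i in range(1, len(mag) - 1)
                 if mag[i] > mag[i - 1] and mag[i] > mag[i + 1]
                 and mag[i] > 0.1 * mag.max()]   # arbitrary threshold
        events.append((start / sr, [(freqs[i], mag[i]) for i in peaks]))
    return events
```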

Keith Martin (1996a)
Knowledge sources made five classes of hierarchically organized hypotheses:
– “Tracks”
– Partials
– Notes
– Intervals
– Chords
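One way such a hierarchy could be represented; the record fields below are hypothetical, not Martin's data structures:

```python
from dataclasses import dataclass, field

LEVELS = ("track", "partial", "note", "interval", "chord")  # lowest to highest

@dataclass
class Hypothesis:
    level: str        # one of LEVELS
    content: dict     # e.g. {"pitch": "C4"} for a note hypothesis
    support: list = field(default_factory=list)  # lower-level hypotheses it rests on
    rating: float = 0.5                          # confidence, revised by KSs
```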

Keith Martin (1996a)
Three types of knowledge sources:
– Garbage collection
– Physics
– Musical practice
Thirteen knowledge sources in all
Each KS is only authorized to make certain classes of hypotheses

Keith Martin (1996a)
KSs with access to upper-level hypotheses can put “pressure” on KSs with lower-level access to make certain hypotheses, and vice versa
Example: if hypotheses have been made that the notes C and G are present in a beat, a KS with information about chords might put forward the hypothesis that there is a C chord, thus putting pressure on other KSs to find an E or Eb (see the sketch below)
Used a sequential scheduler to coordinate the KSs
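A toy version of that example; the rule and its output format are invented for illustration:

```python
def chord_pressure(note_names):
    """If C and G are hypothesized but no third is present, post a chord
    hypothesis and ask lower-level KSs to look for the missing third."""
    notes = set(note_names)
    if {"C", "G"} <= notes and not ({"E", "Eb"} & notes):
        return {"hypothesis": "C chord (major or minor)",
                "look_for": ["E", "Eb"]}  # the "pressure" on other KSs
    return None

print(chord_pressure(["C", "G"]))  # suggests searching for an E or Eb
```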

Keith Martin (1996b)
“Automatic Transcription of Simple Polyphonic Music: Robust Front End Processing”
The previous system often misidentified octaves
Attempted to improve performance by shifting the octave identification task from a top-down process to a bottom-up one

Keith Martin (1996b)
Proposes the use of log-lag correlograms in the front end
Models the inner hair cells of the cochlea with a bank of filters
Determines pitch by measuring the periodic energy in each filter channel as a function of lag
Correlograms are now the basic unit fed to the blackboard system
No definitive results as to which approach is better
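A toy correlogram along these lines, using a linear lag axis rather than Martin's log-lag one and a Butterworth bank as a crude stand-in for the cochlear filters (band edges and search range are illustrative):

```python
import numpy as np
from scipy.signal import butter, lfilter

def correlogram_pitch(frame, sr,
                      bands=((100, 200), (200, 400), (400, 800), (800, 1600))):
    """Band-pass each channel, half-wave rectify, autocorrelate, and sum
    across channels; the lag of the summary peak is the pitch period."""
    summary = np.zeros(len(frame))
    for lo, hi in bands:
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        channel = np.maximum(lfilter(b, a, frame), 0.0)  # half-wave rectify
        ac = np.correlate(channel, channel, mode="full")[len(channel) - 1:]
        summary += ac / (ac[0] + 1e-12)                  # per-channel normalization
    lo_lag, hi_lag = int(sr / 1000), int(sr / 50)        # search 50-1000 Hz
    return sr / (np.argmax(summary[lo_lag:hi_lag]) + lo_lag)
```

Because periodicity is measured directly in each channel, the octave of a pitch is constrained bottom-up, before any musical reasoning occurs.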

Kashino, Nakadai, Kinoshita and Tanaka (1995)
“Application of Bayesian Probability Networks to Music Scene Analysis”
This work slightly preceded that of Martin
Used test patterns involving more than one instrument
Uses principles of stream segregation from auditory scene analysis
Implements more high-level musical knowledge
Uses a Bayesian network instead of Martin's simple scheduler to coordinate the KSs

Kashino, Nakadai, Kinoshita and Tanaka (1995)
Knowledge sources used:
– Chord transition dictionary
– Chord-note relation
– Chord naming rules
– Tone memory
– Timbre models
– Human perception rules
Used very specific instrument timbres and musical rules, so it has limited general applicability

Kashino, Nakadai, Kinoshita and Tanaka (1995)
Tone memory: frequency components of different instruments played with different parameters
Found that integrating tone memory with the other KSs greatly improved success rates

Kashino, Nakadai, Kinoshita and Tanaka (1995)
Bayesian networks are well known for finding good solutions despite noisy input or missing data
Often used to implement learning methods that trade off prior belief in a hypothesis against its agreement with current data
They therefore seem to be a good choice for coordinating KSs
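For a single hypothesis and a single piece of evidence, this trade-off reduces to Bayes' rule; the numbers below are made up:

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Posterior belief in a hypothesis after one piece of evidence:
    prior belief traded off against agreement with the observed data."""
    num = p_evidence_given_h * prior
    return num / (num + p_evidence_given_not_h * (1.0 - prior))

# A note hypothesis with weak prior support whose predicted partials
# match the observed spectrum well (all numbers are made up):
print(bayes_update(0.3, 0.9, 0.2))  # ~0.66
```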

Kashino, Nakadai, Kinoshita and Tanaka (1995)
No experimental comparisons of this approach with Martin's simple scheduler
Only used simple test patterns rather than real music

Kashino and Hagita (1996)
“A Music Scene Analysis System with the MRF-Based Information Integration Scheme”
Suggests replacing the Bayesian network with a Markov Random Field (MRF) hypothesis network
Successful in correcting the two most common problems in the previous system:
– Misidentification of instruments
– Incorrect octave labelling

Kashino and Hagita (1996)
MRF-based networks use simulated annealing to converge to a low-energy state
The MRF approach enables information to be integrated on a multiply connected hypothesis network
Bayesian networks only allow singly connected networks
Could now deal with two kinds of transition information within a single hypothesis network:
– chord transitions
– note transitions
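A generic simulated annealing skeleton of the kind such a network could use; the energy and neighbour functions are left abstract, and nothing here is specific to Kashino and Hagita's formulation:

```python
import math
import random

def anneal(labelling, energy, neighbour, t=1.0, cooling=0.995, steps=5000):
    """Simulated annealing over a network labelling: accept uphill moves
    with probability exp(-dE/T) so the search can escape local minima
    before settling into a low-energy (mutually consistent) state."""
    current, e = labelling, energy(labelling)
    for _ in range(steps):
        candidate = neighbour(current)   # e.g. relabel one note or chord node
        de = energy(candidate) - e
        if de < 0 or random.random() < math.exp(-de / t):
            current, e = candidate, e + de
        t *= cooling
    return current
```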

Kashino and Hagita (1996)
The instrument and octave identification errors were corrected, but some new errors were introduced
Overall, it performed roughly 10% better than the Bayesian-based system at transcribing a 3-part arrangement of Auld Lang Syne
Still only had a recognition rate of 71.7%

Kashino and Murase (1998)
Shifts some work away from the blackboard system by feeding it higher-level information
Simplifies and mathematically formalizes the notion of knowledge sources
Switches back to a Bayesian network
Perhaps not truly a blackboard system anymore
Has a very good recognition rate
The scalability of the system is seriously compromised by the new approach

Kashino and Murase (1998)
Uses adaptive template matching
Implemented using a bank of filters arranged in parallel and a number of templates corresponding to particular notes played by particular instruments
The correlation between the outputs of the filters is calculated, and a match is then made to one of the templates
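A bare-bones version of the matching step, assuming templates are stored as magnitude spectra; the adaptive part, which adjusts templates to fit the observed mixture, is omitted:

```python
import numpy as np

def best_template(observed, templates):
    """Match an observed magnitude spectrum against stored
    (instrument, note) -> spectrum templates by normalized correlation."""
    def unit(v):
        v = np.asarray(v, dtype=float)
        return v / (np.linalg.norm(v) + 1e-12)
    obs = unit(observed)
    scores = {key: float(obs @ unit(t)) for key, t in templates.items()}
    return max(scores, key=scores.get)

# e.g. best_template(frame_spectrum, {("piano", "C4"): c4_piano, ...})
```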

Kashino and Murase (1998)
Achieved a recognition rate of 88.5% on real recordings of piano, violin and flute
Including templates for many more instruments could make adaptive template matching intractable
Particularly a problem for instruments with:
– Similar frequency spectra
– A great deal of spectral variation from note to note

Hainsworth and Macleod (2001)
“Automatic Bass Line Transcription from Polyphonic Music”
Wanted to be able to extract a single given instrument from an arbitrary musical signal
In contrast to previous approaches, which used recordings of only one instrument or of a set of pre-defined instruments

Hainsworth and Macleod (2001)
Chose to work with the bass:
– Can filter out high frequencies
– Notes are usually fairly steady
Used simple mathematical relations to trim hypotheses rather than a true blackboard system
Had a 78.7% success rate on a Miles Davis recording
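A crude single-frame sketch of this idea; the cutoff and pitch range are illustrative, and Hainsworth and Macleod's actual processing is considerably more involved:

```python
import numpy as np
from scipy.signal import butter, lfilter

def bass_pitch(frame, sr, cutoff=250.0):
    """Low-pass filter the mix to isolate the bass register, then estimate
    the fundamental of one frame by picking the autocorrelation peak."""
    b, a = butter(4, cutoff / (sr / 2), btype="low")
    low = lfilter(b, a, frame)
    ac = np.correlate(low, low, mode="full")[len(low) - 1:]
    lo, hi = int(sr / 200), int(sr / 40)   # search the 40-200 Hz range
    return sr / (np.argmax(ac[lo:hi]) + lo)
```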

Bello and Sandler (2000)
“Blackboard Systems and Top-Down Processing for the Transcription of Simple Polyphonic Music”
Returns to a true blackboard system
Based on Martin's implementation, using a conventional scheduler
Refines the knowledge sources and adds high-level musical knowledge
Implements one of the knowledge sources as a neural network

Bello and Sandler (2000)
The chord recognizer KS is a feedforward network
Trained using the spectrograms of different chords played on a piano
The trained network is fed a spectrogram and outputs possible chords
Can therefore output more than one hypothesis at each iteration
Gives other KSs more information and allows parallel exploration of the solution space
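A sketch of the forward pass of such a KS, assuming weights already trained on piano chord spectrograms; the architecture and parameter names are hypothetical:

```python
import numpy as np

def chord_hypotheses(spectrum, w1, b1, w2, b2, chord_names, threshold=0.5):
    """One forward pass of a small feedforward net with sigmoid outputs.
    Every output above threshold is returned, so the KS can post several
    chord hypotheses to the blackboard at once."""
    hidden = np.tanh(spectrum @ w1 + b1)
    out = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))
    return [(name, float(p)) for name, p in zip(chord_names, out)
            if p >= threshold]
```

Returning every above-threshold output, rather than a single winner, is what lets the other KSs explore several chord interpretations in parallel.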

Bello and Sandler (2000)
The network could automatically be retrained to recognize the spectrograms of other instruments, with no manual modifications needed
Preliminary testing showed a tendency to misidentify octaves and note onsets
These problems could potentially be corrected by the signal processing system that feeds the blackboard system

Conclusions
The bass transcription system and the more recent work of Kashino are useful for specific applications, but have limited potential for general transcription purposes
A true blackboard approach scales well and appears to hold the most potential for general-purpose polyphonic transcription

Conclusions
The use of adaptive learning in knowledge sources seems promising
Interchangeable modules could be automatically trained to specialize in different areas
Could allow semi-automatic transcription, where the user chooses the correct modules and the system performs the transcription using them