Realtime Recognition of Orchestral Instruments
Ichiro Fujinaga and Karl MacMillan
Peabody Conservatory of Music, Johns Hopkins University

Overview
- Introduction
- Lazy learning (exemplar-based learning)
- Results
- k-NN classifier
- Genetic algorithm
- Features
- Results
- Demonstration
- Conclusions

Introduction
- Realtime recognition of isolated monophonic orchestral instruments
- Spectrum analysis by Miller Puckette’s fiddle
- Adaptive system based on an exemplar-based classifier and a genetic algorithm

Overall Architecture
- Input: live mic or sound file
- Data acquisition and data analysis (fiddle)
- Recognition: k-NN classifier
- Output: instrument name
- Knowledge base: feature vectors
- Off-line: genetic algorithm and k-NN classifier, yielding the best weight vector

Exemplar-based categorization
- Objects are categorized by their similarity to one or more stored examples
- No abstraction or generalization, unlike rule-based or prototype-based models of concept formation
- Can be implemented using a k-nearest-neighbor classifier
- Slow and large storage requirements?

K-nearest-neighbor classifier
- Determine the class of a given sample by its feature vector:
  - Distances between the feature vector of an unclassified sample and those of previously classified samples are calculated
  - The class represented by the majority of the k nearest neighbors is then assigned to the unclassified sample

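The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `knn_classify`, the toy two-dimensional feature vectors, and the instrument labels are all hypothetical, and plain Euclidean distance is assumed.

```python
import math
from collections import Counter

def knn_classify(sample, exemplars, k=3):
    """Classify `sample` by majority vote among its k nearest exemplars.

    `exemplars` is a list of (feature_vector, label) pairs (hypothetical layout).
    """
    # Step 1: distance from the unclassified sample to every stored exemplar.
    dists = sorted((math.dist(sample, vec), label) for vec, label in exemplars)
    # Step 2: majority vote among the k closest; ties go to the class seen first,
    # i.e. the one with the nearer exemplar.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy knowledge base (made-up values, for illustration only).
exemplars = [
    ((0.0, 0.0), "flute"),
    ((0.1, 0.2), "flute"),
    ((1.0, 1.0), "oboe"),
    ((0.9, 1.1), "oboe"),
    ((1.2, 0.8), "oboe"),
]
print(knn_classify((0.2, 0.1), exemplars, k=3))
```

Because every stored exemplar is compared against the sample, cost grows with the size of the knowledge base, which is the storage and speed concern raised on the previous slide.
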
Example of k-NN classifier
Distance measures
- The distance in an N-dimensional feature space between two vectors X and Y can be defined as:
- A weighted distance can be defined as:

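A common reading of these two definitions, assuming a Euclidean metric (the choice of metric and the symbols x_i, y_i, and w_i are assumptions, not taken from the slide), is:

```latex
d(X, Y) = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}
\qquad
d_w(X, Y) = \sqrt{\sum_{i=1}^{N} w_i \, (x_i - y_i)^2}
```

Here x_i and y_i are the i-th components of X and Y, and w_i is the weight applied to feature i. Setting every w_i to 1 recovers the unweighted distance, and setting a w_i to 0 removes that feature from the comparison.
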
Genetic algorithms
- Optimization based on biological evolution
- Maintenance of a population using selection, crossover, and mutation
- Chromosome = weight vector
- Fitness function = recognition rate
- Leave-one-out cross-validation

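The loop described above can be sketched as follows. The slide does not give the population size, selection scheme, crossover type, or mutation rate, so truncation selection, one-point crossover, elitism, and every numeric default below are assumptions, as are all function names and the toy data.

```python
import math
import random
from collections import Counter

def weighted_dist(x, y, w):
    # Weighted Euclidean distance (assumed metric).
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

def loo_recognition_rate(data, w, k=1):
    """Fitness: leave-one-out recognition rate of a k-NN classifier using weights w."""
    correct = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]          # hold out exemplar i
        nearest = sorted(rest, key=lambda e: weighted_dist(x, e[0], w))[:k]
        vote = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        correct += vote == label
    return correct / len(data)

def evolve_weights(data, n_features, pop_size=20, generations=15,
                   mutation_rate=0.1, k=1, seed=0):
    rng = random.Random(seed)
    # Each chromosome is one weight vector.
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    fitness = lambda w: loo_recognition_rate(data, w, k)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]        # truncation selection (assumed)
        children = [ranked[0]]                  # elitism: carry over the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = a[:cut] + b[cut:]           # one-point crossover
            child = [rng.random() if rng.random() < mutation_rate else g
                     for g in child]            # per-gene mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

# Toy data: feature 0 separates the classes, feature 1 is noise (made-up values).
_rng = random.Random(1)
data = [((c + _rng.gauss(0, 0.1), _rng.uniform(0, 5)), "A" if c == 0 else "B")
        for c in (0, 1) for _ in range(10)]
best_weights, rate = evolve_weights(data, n_features=2)
print(best_weights, rate)
```

Leave-one-out evaluation lets the same exemplar set serve as both knowledge base and test set, at the cost of one full k-NN pass per exemplar for every fitness evaluation, which fits the off-line placement of the genetic algorithm in the architecture.
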
Features
- Static features (per window)
  - pitch
  - mass or the integral of the curve (zeroth-order moment)
  - centroid (first-order moment)
  - variance (second-order central moment)
  - skewness (third-order central moment)
  - amplitudes of the harmonic partials
  - number of strong harmonic partials
  - spectral irregularity
  - tristimulus
- Dynamic features
  - means and velocities of static features over time

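As a rough illustration of the moment-based features, the sketch below computes mass, centroid, variance, and the third central moment from one window's spectral peaks, then a mean and a velocity across windows. The function names are hypothetical, the third moment is left unnormalized, and "velocity" is taken to be the average frame-to-frame change; the slide does not define these details.

```python
def spectral_moments(freqs, amps):
    """Amplitude-weighted moments of one window's spectral peaks."""
    mass = sum(amps)                                   # zeroth-order moment
    if mass == 0:
        return 0.0, 0.0, 0.0, 0.0
    centroid = sum(f * a for f, a in zip(freqs, amps)) / mass
    variance = sum(a * (f - centroid) ** 2 for f, a in zip(freqs, amps)) / mass
    third = sum(a * (f - centroid) ** 3 for f, a in zip(freqs, amps)) / mass
    return mass, centroid, variance, third

def mean_and_velocity(values):
    """Dynamic features of one static feature tracked over successive windows."""
    mean = sum(values) / len(values)
    # Average frame-to-frame change (assumed definition of velocity).
    velocity = (values[-1] - values[0]) / (len(values) - 1) if len(values) > 1 else 0.0
    return mean, velocity

# One window with three peaks (made-up frequencies in Hz and linear amplitudes).
print(spectral_moments([100.0, 200.0, 300.0], [1.0, 2.0, 1.0]))
# Centroid of three successive windows (made-up values).
print(mean_and_velocity([200.0, 220.0, 260.0]))
```

A symmetric spectrum, as in the first call, has a third central moment of zero; energy skewed toward higher partials makes it positive.
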
Data
- Original source: McGill Master Samples
- Over 1300 notes from 39 different timbres (23 orchestral instruments)
- Spectrum analysis by fiddle (2048 points)
- First 46–232 ms of attack (1–9 windows)
- Each analysis window (46 ms) consists of a list of amplitudes and frequencies of the peaks in the spectra

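The window figures above are consistent with a 44.1 kHz sample rate and a hop of half a window between successive analyses. Neither value is stated on the slide, so both are assumptions in this short check; the constant and function names are hypothetical.

```python
SAMPLE_RATE = 44_100   # Hz (assumed, not stated in the source)
WINDOW = 2048          # analysis points per window (from the slide)
HOP = 1024             # samples between window starts (assumed 50% overlap)

def span_ms(n_windows):
    """Duration covered by n overlapping analysis windows, in milliseconds."""
    samples = WINDOW + (n_windows - 1) * HOP
    return samples / SAMPLE_RATE * 1000

print(round(span_ms(1)), round(span_ms(9)))
```

Under these assumptions one window covers about 46 ms and nine windows cover about 232 ms, matching the stated range.
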
Results
- Experiment I
  - SHARC data
  - static features
- Experiment II
  - fiddle
  - dynamic features
- Experiment III
  - more features
  - redefinition of attack point

Demonstration
- Using stored data
- Using recording
- Using audience

Conclusions
- Realtime timbre recognition system
- Analysis by Puckette’s fiddle
- Recognition using dynamic features
- Adaptive recognizer: k-NN classifier enhanced with a genetic algorithm
- A successful implementation of an exemplar-based classifier in a time-critical environment

Future research
- Performer identification
- Speaker identification
- Tone-quality analysis
- Multi-instrument recognition
- Expert recognition of timbre

Recognition rate for different lengths of analysis window
Comparison with Human Performance