Published by Angela Adley. Modified over 9 years ago.

Automatic Music Classification
Cory McKay

2/47 Introduction
  - Many areas of research in music information retrieval (MIR) involve using computers to classify music in various ways
    - Genre or style classification
    - Mood classification
    - Performer or composer identification
    - Music recommendation
    - Playlist generation
    - Hit prediction
    - Audio to symbolic transcription
    - etc.
  - Such areas often share similar central procedures

3/47 Fundamental music classification tasks (1/3)
  - Musical data collection
    - The instances (basic entities) to classify
    - Audio recordings, scores, cultural data, etc.
  - Feature extraction
    - Features represent characteristic information about instances
    - Must provide sufficient information to segment instances among classes (categories)
  - Machine learning
    - Algorithms (“classifiers” or “learners”) learn to associate feature patterns of instances with their classes
  [Diagram: Basic Classification Tasks (Music, Musical Data Collection, Feature Extraction, Machine Learning, Classifications)]

4/47 Fundamental music classification tasks (2/3)
  - Many classification tasks require metadata about instances
    - Title, composer, performer, genre, date, etc.
  - Must be validated and corrected
    - Raw information found in ID3 tags, Gracenote CDDB, etc. is often erroneous and inconsistent
  [Diagram: Basic Classification Tasks, now including Metadata Analysis]

5/47 Fundamental music classification tasks (3/3)
  - Supervised learning requires training
    - Correctly labeled model instances (“ground truth”) are used to teach classifiers to associate certain feature patterns with desired classes
    - Trained classifiers can then classify novel instances
  - Success of classifiers is dependent on the quality of the ground truth
    - It is therefore essential that the metadata labeling of the musical data be accurate
  [Diagram: Basic Classification Tasks, now including Metadata and Classifier Training]

6/47 Consolidating fundamental tasks
  - Properly performing these tasks requires significant effort and knowledge in (at least):
    - Data mining
    - Signal processing
    - Musicology
  - Result:
    - Naïve or improperly performed research
    - Duplication of effort
    - Reluctance to use automatic music classification in musicological or other research where it could be useful
  - Solution: standardized MIR research software
    - Makes automatic music classification technology available to researchers in many disciplines

7/47 Existing MIR software
  - Only a few MIR software systems have been built for use by other researchers
    - e.g. Marsyas and M2K
  - Tend to focus primarily on particular sub-tasks
    - e.g. audio feature extraction
  - Not typically well integrated with other systems
  - Do not sufficiently emphasize extensibility
  - Typically have usability problems
    - Installation and licensing issues, poor documentation
  - Result:
    - Emphasis on existing techniques rather than development of new approaches
    - Difficulties in integrating research between labs
    - Inaccessible to non-technical music researchers

8/47 jMIR
  - jMIR has been developed to meet the need for standardized MIR research software
  - Has a separate software component to address each important aspect of automatic music classification
    - Each component can be used independently
    - Combinations of components can be used as an integrated whole
  - Architectural emphasis on providing an extensible platform for iteratively developing new techniques and algorithms
    - Can also be used directly as is
  - Interfaces designed for both technical and non-technical users
  - Well-documented
  - Free and open source
  - Cross-platform Java implementation

9/47 Musical data collection
  [Diagram: Basic Classification Tasks (Music, Musical Data Collection, Metadata, Metadata Analysis, Feature Extraction, Machine Learning, Classifier Training, Classifications)]

10/47 Types of musical data
  - Audio recordings
    - Sampled sound
    - Wave, MP3, AAC, etc.
  - Symbolic recordings
    - Abstract musical instructions
    - Scores, MIDI, Humdrum, etc.
  - Cultural information
    - Information external to musical content itself
    - e.g. playlists, album reviews, Billboard stats, etc.
    - Based on web searches, surveys, expert opinion, etc.
  [Diagram: Musical Data Collection (Symbolic Recordings, Audio Recordings, Cultural Information)]

11/47 Connections between data types
  - Automatic transcription technologies are increasingly making it possible to automatically generate symbolic recordings from audio
  - Metadata annotations are necessary for linking cultural information with particular recordings
  [Diagram: Musical Data Collection, now including Metadata and Transcription]

12/47 jMIR Codaich
  - A research database of labeled MP3 recordings
    - For use in training and testing algorithms
  - There are plans to eventually include additional format types in Codaich
    - Including symbolic formats
  [Diagram: Musical Data Collection, now including jMIR Codaich]

13/47 Sharing Codaich
  - Codaich is intended to provide a common knowledge base that can be used by researchers in different labs to compare the effectiveness of their varying approaches
  - Overcoming copyright limitations on distributing music: On-demand Feature Extraction Network (OMEN)
    - Implemented by Daniel McEnnis
    - Researchers use distributed computing and the jMIR jAudio feature extractor to request local feature extraction at sites (e.g., libraries) that have legal access to individual recordings
    - jAudio and OMEN allow custom original features and extraction parameters

14/47 Statistics on Codaich
  - 27 305 MP3 recordings
    - Constantly growing
  - 2247 artists
  - 55 genres
    - Popular, classical, jazz and “world”
  - 19 metadata fields

15/47 jMIR Bodhidharma MIDI Database
  - Collection of labeled MIDI recordings
    - 950 recordings
    - 38 genres
  [Diagram: Musical Data Collection, now including jMIR Bodhidharma MIDI Database]

16/47 jMIR jMusicMetaManager
  - Metadata found with recordings is typically problematic
    - Inconsistent
    - Error-prone
  - jMusicMetaManager is software that automatically analyzes metadata across recordings
    - Is currently used to maintain Codaich
    - There are plans to adapt it to MIDI as well
  [Diagram: Musical Data Collection, now including jMIR jMusicMetaManager]

17/47 Tasks performed by jMusicMetaManager
  - Detects differing metadata values that should in fact be the same
    - e.g. in a performer identification task, “Charlie Mingus” should not be misclassified as a different performer than “Mingus, Charles”
  - Detects redundant copies of recordings
    - Could contaminate test sets
  - Generates inventory and statistical profile reports
    - 39 reports in all

18/47 How jMusicMetaManager works
  - Calculates edit distance between pairs of field values
    - Threshold based on field lengths
  - Performs 23 additional pre-processing equivalency operations
  - Considers varied word orderings and word subsets
  - Applies false error filtering
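
The matching idea on this slide can be illustrated with a minimal sketch. It is not jMusicMetaManager's actual code: the `normalize` step stands in for just one possible pre-processing operation (word reordering), and the 0.25 length ratio is an assumed value chosen for illustration.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def normalize(value: str) -> str:
    """Hypothetical pre-processing: lowercase, drop punctuation, sort words."""
    words = re.findall(r"[a-z0-9]+", value.lower())
    return " ".join(sorted(words))


def probably_same(x: str, y: str, ratio: float = 0.25) -> bool:
    """Flag two field values as likely equivalent.

    The threshold scales with the longer normalized value. The ratio is an
    illustrative assumption, not a jMusicMetaManager setting.
    """
    nx, ny = normalize(x), normalize(y)
    limit = ratio * max(len(nx), len(ny))
    return levenshtein(nx, ny) <= limit


print(probably_same("Charlie Mingus", "Mingus, Charles"))
print(probably_same("Charlie Mingus", "Miles Davis"))
```

Sorting the words before comparing is what lets “Mingus, Charles” line up with “Charlie Mingus”; a plain edit distance on the raw strings would treat them as very different.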

19/47 jMusicMetaManager’s I/O
  - Parses metadata from Apple iTunes XML or MP3 ID3 tags
    - And Gracenote CDDB, indirectly
  - Can export to ACE XML or Weka ARFF
  - Generates reports in frames-based HTML
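
For readers unfamiliar with Weka ARFF, the following sketch writes a tiny ARFF file from metadata rows. The function name `to_arff`, the field names and the sample row are all hypothetical, and the sketch ignores details such as missing values; it only shows the general shape of the format.

```python
def to_arff(relation, attributes, rows):
    """Return ARFF text.

    attributes: list of (name, kind) pairs, where kind is "numeric",
    "string", or a list of nominal values.
    rows: list of value lists, given in attribute order.
    """
    def quote(value):
        # Quote values containing spaces, commas or quotes.
        text = str(value)
        if any(ch in text for ch in " ,'"):
            escaped = text.replace("\\", "\\\\").replace("'", "\\'")
            return "'" + escaped + "'"
        return text

    lines = ["@relation " + quote(relation), ""]
    for name, kind in attributes:
        if isinstance(kind, list):
            kind = "{" + ",".join(quote(v) for v in kind) + "}"
        lines.append("@attribute " + quote(name) + " " + kind)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(quote(v) for v in row))
    return "\n".join(lines) + "\n"


# Made-up example data, for illustration only.
attributes = [
    ("title", "string"),
    ("duration_s", "numeric"),
    ("genre", ["Jazz", "Baroque"]),
]
print(to_arff("recordings", attributes, [["Track A", 200, "Jazz"]]))
```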

20/47 Musical data collection summary
  [Diagram: Musical Data Collection (Symbolic Recordings, Audio Recordings, Cultural Information, Metadata, Transcription, jMIR Codaich, jMIR jMusicMetaManager, jMIR Bodhidharma MIDI Database)]

21/47 Feature extraction
  [Diagram: Basic Classification Tasks (Music, Musical Data Collection, Metadata, Metadata Analysis, Feature Extraction, Machine Learning, Classifier Training, Classifications)]

22/47 Types of features
  - Low-level
    - Associated with signal processing and basic auditory perception
    - e.g. spectral flux or RMS
    - Usually not intuitively musical
  - High-level
    - Musical abstractions
    - e.g. meter or pitch class distributions
  - Cultural
    - Sociocultural information outside the scope of auditory or musical content
    - e.g. playlist co-occurrence or purchase correlations
  [Diagram: Feature Extraction (Low-Level Features, High-Level Features, Cultural Features)]
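
The two low-level features named on this slide can be computed on short sample windows as in the sketch below. Spectral flux has several definitions in the literature; the unnormalized sum of squared magnitude differences used here is one common choice, not necessarily the one jAudio uses. The naive DFT is only meant for tiny windows.

```python
import cmath
import math


def rms(window):
    """Root mean square of the samples in one window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def magnitude_spectrum(window):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n // 2 + 1)]


def spectral_flux(previous_window, window):
    """Sum of squared differences between successive magnitude spectra."""
    a = magnitude_spectrum(previous_window)
    b = magnitude_spectrum(window)
    return sum((y - x) ** 2 for x, y in zip(a, b))


n = 64
silence = [0.0] * n
sine = [math.sin(2 * math.pi * t / n) for t in range(n)]
print(rms(sine))                     # close to sqrt(0.5)
print(spectral_flux(silence, sine))  # positive: the spectrum changed
print(spectral_flux(sine, sine))     # zero: no change between windows
```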

23/47 jMIR jAudio
  - Implemented jointly with Daniel McEnnis
  - Extracts features from audio files
    - MP3, WAV, AIFF, AU, SND
  - 28 bundled core features
    - Mainly low-level
    - Some high-level
  [Diagram: Feature Extraction, now including Audio Recordings, jMIR jAudio and Extracted Feature Values]

24/47 Developing features with jAudio
  - Two general ways of using jAudio
    - Directly as an audio feature extractor
    - Platform for developing and sharing new features
      - Can be independent features
      - Can be based on existing features
  - New features are added using a modular plugin interface
  - jAudio (like all jMIR feature extractors) automatically calculates feature dependencies and scheduling at runtime

25/47 Metafeatures and aggregators
  - jAudio automatically calculates “metafeatures” of new or existing features
    - e.g. running means, standard deviations or derivatives across sample windows
  - jAudio automatically calculates “aggregators” for new or existing features
    - Functions that collapse a sequence of feature vectors into a single vector or smaller sequence of vectors
    - Useful for representing in a low-dimensional way how different features change together
    - e.g. the Area of Moments aggregator transforms a set of feature vectors into a two-dimensional image matrix and calculates two-dimensional moments
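
As a rough illustration of these two ideas, the sketch below derives metafeatures from a per-window feature sequence and then collapses the sequence with a simple mean and standard deviation aggregator. The function names are hypothetical and this is not jAudio's implementation; in particular, the Area of Moments aggregator is not reproduced here.

```python
import math


def running_mean(values, width):
    """Mean of each run of `width` consecutive window values."""
    return [sum(values[i:i + width]) / width
            for i in range(len(values) - width + 1)]


def running_std(values, width):
    """Population standard deviation of each run of `width` values."""
    out = []
    for i in range(len(values) - width + 1):
        chunk = values[i:i + width]
        mean = sum(chunk) / width
        out.append(math.sqrt(sum((v - mean) ** 2 for v in chunk) / width))
    return out


def derivative(values):
    """Difference between each window's value and the previous one."""
    return [b - a for a, b in zip(values, values[1:])]


def aggregate_mean_std(values):
    """Collapse a whole sequence into a two-element vector [mean, std]."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [mean, std]


# Hypothetical RMS values for five successive windows.
rms_per_window = [0.10, 0.20, 0.40, 0.40, 0.30]
print(running_mean(rms_per_window, 2))
print(derivative(rms_per_window))
print(aggregate_mean_std(rms_per_window))
```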

26/47 Using jAudio
  - Customizable extraction parameters
    - Window size and overlap
    - Normalization
    - Downsampling
    - Individual feature parameters
  - Records and synthesizes audio
  - Converts MIDI to audio
  - Displays audio in both the time and frequency domains

27/47 jMIR jSymbolic
  - Extracts high-level features from MIDI files
  - 111 bundled features
    - Currently being expanded to 160
    - Many are original
  [Diagram: Feature Extraction, now including Symbolic Recordings and jMIR jSymbolic]

28/47 jSymbolic’s features
  - Features fall into 7 broad categories
    - Instrumentation
    - Musical Texture
    - Rhythm
    - Dynamics
    - Pitch Statistics
    - Melody
    - Chords
  - Histogram aggregators are often used
    - Rhythm, pitch, pitch class, melody, vertical interval and chord histograms
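
A pitch class histogram, one of the histogram types listed above, can be sketched in a few lines. This version simply counts note-ons by MIDI note number modulo 12; whether jSymbolic weights notes by duration or normalizes differently is not stated on the slide, so treat those choices as assumptions.

```python
def pitch_class_histogram(midi_notes):
    """Return 12 normalized bins (C=0, C#=1, ..., B=11) from MIDI note numbers."""
    counts = [0] * 12
    for note in midi_notes:
        counts[note % 12] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 12
    return [count / total for count in counts]


# C major triad with the root doubled an octave higher (60 is middle C).
histogram = pitch_class_histogram([60, 64, 67, 72])
print(histogram)
```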

29/47 jMIR jWebMiner
  - Extracts cultural features from the web using web services
    - Google
    - Yahoo!
  - Calculates the co-occurrence and cross tabulation of metadata fields
    - e.g. how often does Bach co-occur on a web page with Baroque, compared to Stravinsky?
  - Currently in alpha development
  [Diagram: Feature Extraction, now including Cultural Information and jMIR jWebMiner]
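
The Bach and Stravinsky question above can be made concrete with a small sketch that turns search hit counts into comparable scores. No web service is called: the hit counts are invented placeholders, and dividing by each artist's total hits is an assumed normalization, not jWebMiner's documented formula.

```python
def cooccurrence_scores(hits_together, hits_alone):
    """Normalize co-occurrence hit counts by each artist's total hit count.

    hits_together: {(artist, term): pages containing both}
    hits_alone: {artist: pages containing the artist}
    """
    scores = {}
    for (artist, term), together in hits_together.items():
        alone = hits_alone[artist]
        scores[(artist, term)] = together / alone if alone else 0.0
    return scores


# Placeholder counts for illustration only, not real search results.
hits_alone = {"Bach": 1000, "Stravinsky": 1000}
hits_together = {("Bach", "Baroque"): 800, ("Stravinsky", "Baroque"): 50}

for pair, score in cooccurrence_scores(hits_together, hits_alone).items():
    print(pair, score)
```

Normalizing matters because a very frequently mentioned artist would otherwise co-occur strongly with every term.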

30/47 jWebMiner’s functionality
  - Parses search terms from:
    - iTunes, ACE XML, Weka ARFF, text
  - Can assign higher weights to particular sites
    - e.g. All Music, Wikipedia, Pitchfork, etc.
  - Can enforce filter words
    - e.g. a site must include the word “music” to be considered

31/47 Feature extraction summary
  [Diagram: Feature Extraction (Symbolic Recordings, Audio Recordings, Cultural Information, jMIR jAudio, jMIR jSymbolic, jMIR jWebMiner, Low-Level Features, High-Level Features, Cultural Features, Extracted Feature Values)]

32/47 Machine learning
  [Diagram: Basic Classification Tasks (Music, Musical Data Collection, Metadata, Metadata Analysis, Feature Extraction, Machine Learning, Classifier Training, Classifications)]

33/47 Some types of machine learning
  - Supervised
    - Learners trained on model labeled instances
  - Unsupervised
    - Examines instances in terms of internal similarities rather than externally provided labels
  - Ensemble
    - Multiple classifiers work together
    - Hopefully perform better overall than individually
  [Diagram: Machine Learning (Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms)]
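
One simple way for multiple classifiers to work together is majority voting, sketched below. The three rule-based classifiers, their thresholds and the feature names are hypothetical stand-ins for trained models.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_classify(classifiers, features):
    """Ask every classifier for a label, then take the majority."""
    return majority_vote([classify(features) for classify in classifiers])


# Hypothetical single-feature classifiers with made-up thresholds.
def by_tempo(features):
    return "dance" if features["tempo_bpm"] > 110 else "ballad"


def by_energy(features):
    return "dance" if features["rms"] > 0.3 else "ballad"


def by_density(features):
    return "dance" if features["notes_per_second"] > 4 else "ballad"


instance = {"tempo_bpm": 128, "rms": 0.2, "notes_per_second": 6}
print(ensemble_classify([by_tempo, by_energy, by_density], instance))
```

Here the energy-based classifier disagrees with the other two, and the ensemble outvotes it.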

34/47 Input to machine learning systems
  - Extracted feature values serve as the percepts of classifiers
  - Ground truth needed by supervised learners
  - A class ontology (structured set of relationships between classes) is sometimes used
    - Some learners can capitalize on structuring
    - Long-term goal is to allow arbitrary ontologies in jMIR
  [Diagram: Machine Learning, now including Extracted Features, Ground Truth and Class Ontology]

35/47 Training and testing sets
  - Data segmented into training and testing sets if classifiers need to be trained
    - To avoid overtraining (failure to generalize training instance features to those of the general instance population)
  - Feature values are simply passed on if training is not needed
  [Diagram: Machine Learning, now including Training Sets and Testing Sets OR Features to Classify]
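
A minimal sketch of the segmentation step follows. The 20% test fraction, the fixed seed and the function name are illustrative assumptions; the point is only that test instances are never seen during training.

```python
import random


def split_train_test(instances, test_fraction=0.2, seed=0):
    """Shuffle a copy of the instances and hold out a fraction for testing."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical recording identifiers.
recordings = ["rec%02d" % i for i in range(10)]
train, test = split_train_test(recordings)
print(len(train), len(test))
```

Duplicate recordings should be removed before splitting; otherwise the same piece could land in both sets and inflate the measured accuracy.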

36/47 Dimensionality reduction algorithms
  - Too many features degrade classifier performance
    - “Curse of dimensionality”
  - Too few features can fail to encapsulate sufficient information
  - Dimensionality reduction algorithms automatically find a good lower-dimensional subset or projection of the given features
  [Diagram: Machine Learning, now including Dimensionality Reduction Algorithms]
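
To show what finding a good subset can mean in practice, here is a sketch of an exhaustive feature subset search scored by leave-one-out 1-nearest-neighbour accuracy. The scoring method and the toy data are assumptions made for illustration; exhaustive search is only feasible for small feature counts.

```python
from itertools import combinations


def loo_accuracy(rows, labels, columns):
    """Leave-one-out accuracy of 1-nearest-neighbour using only `columns`."""
    correct = 0
    for i, row in enumerate(rows):
        best_distance, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            distance = sum((row[c] - other[c]) ** 2 for c in columns)
            if best_distance is None or distance < best_distance:
                best_distance, best_label = distance, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def best_subset(rows, labels, max_size):
    """Try every feature subset up to max_size; keep the best scoring one."""
    best = (-1.0, ())
    for size in range(1, max_size + 1):
        for columns in combinations(range(len(rows[0])), size):
            score = loo_accuracy(rows, labels, columns)
            if score > best[0]:  # strict, so ties keep the smaller subset
                best = (score, columns)
    return best


# Toy data: feature 0 separates the classes, feature 1 is noise.
rows = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
        [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_subset(rows, labels, max_size=2))
```

In this toy case adding the noisy feature makes the classifier worse, which is the effect the slide describes.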

37/47 Output of machine learning systems
  - Classifications of instances are output if no supervised training is needed
  - Metalearners can be used to choose appropriate classifier(s)
    - Each algorithm has its own strengths and weaknesses
    - Training output consists of evaluations of each algorithm as well as the trained classifiers
  [Diagram: Machine Learning, now including Classification Results, Algorithm Evaluations and Trained Classifiers]

38/47 jMIR ACE
  - ACE is jMIR’s classifier and metalearner
    - Automatically experiments with and selects classifier(s)
    - Trains classifiers
    - Classifies novel instances
  [Diagram: Machine Learning, now including jMIR ACE]

39/47 Algorithms experimented with by ACE
  - Classifiers:
    - Induction trees, naive Bayes, k-nearest neighbour, neural networks, support vector machines
    - Classifier parameters are also varied automatically
  - Dimensionality reduction:
    - Principal component analysis, exhaustive searches, feature selection using genetic algorithms
  - Classifier ensembles:
    - Bagging, boosting
  - Additional algorithms will be added in the future:
    - Including unsupervised learning algorithms
  - Researchers are encouraged to add their own algorithms
    - ACE, like all jMIR components, emphasizes extensibility
  - ACE utilizes the Weka general pattern recognition library

40/47 Details of ACE
  - ACE evaluates algorithms in terms of
    - Classification accuracy
    - Performance consistency
    - Training complexity / time
    - Classification complexity / time
  - There are future plans to utilize distributed computing to spread out the computational burden
  - Will also add the ability to impose limits on the time available for the ACE metalearner to come up with algorithm selections
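
The four evaluation criteria above could be measured along the lines of the following sketch, which scores one training algorithm over several train/test folds. Using the standard deviation of fold accuracies as the consistency measure is an assumption, and `train_majority` is a deliberately trivial hypothetical learner; this is not ACE's code.

```python
import statistics
import time


def evaluate(train_fn, folds):
    """Score a training function over folds of (train_set, test_set).

    Each set is a list of (features, label) pairs. train_fn returns a
    callable that maps features to a predicted label.
    """
    accuracies = []
    train_seconds = 0.0
    classify_seconds = 0.0
    for train_set, test_set in folds:
        start = time.perf_counter()
        classifier = train_fn(train_set)
        train_seconds += time.perf_counter() - start

        start = time.perf_counter()
        predictions = [classifier(features) for features, _ in test_set]
        classify_seconds += time.perf_counter() - start

        hits = sum(p == label for p, (_, label) in zip(predictions, test_set))
        accuracies.append(hits / len(test_set))
    return {
        "mean_accuracy": statistics.mean(accuracies),
        "accuracy_std": statistics.pstdev(accuracies),  # lower is more consistent
        "train_seconds": train_seconds,
        "classify_seconds": classify_seconds,
    }


def train_majority(train_set):
    """Hypothetical baseline learner: always predict the most common label."""
    labels = [label for _, label in train_set]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority
```

A metalearner could call `evaluate` once per candidate algorithm and compare the resulting dictionaries.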

41/47 ACE’s interface
  - Command line
  - Java API
  - GUI
    - In alpha development

42/47 jMIR ACE XML files
  - Allow jMIR components to communicate with each other
  - Allow jMIR output to be used by other software
  - To help ensure interoperability, jMIR components also produce and parse Weka ARFF files
  [Diagram: Machine Learning, now including jMIR ACE XML Files]

43/47 Details of the ACE XML files
  - Information stored in ACE XML files:
    - Feature values and information about features
    - Model classifications and other metadata
    - Class taxonomies
      - Will be expanded to general ontologies in the future
  - Advantages of ACE XML compared to general data mining file formats (e.g. Weka ARFF)
    - Ability to assign multiple classes to individual instances
    - Ability to classify both overall instances and their sub-sections
    - Maintenance of logical groupings of multi-dimensional features
    - Maintenance of internal identifying metadata about instances
    - Ability to represent taxonomical class structures

44/47 Machine learning summary
  [Diagram: Machine Learning (Extracted Features, Ground Truth, Class Ontology, Training Sets and Testing Sets OR Features to Classify, Dimensionality Reduction Algorithms, Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms, jMIR ACE, jMIR ACE XML Files, Classification Results, Algorithm Evaluations, Trained Classifiers)]

45/47 Overview of jMIR
  [Diagram: jMIR and its Components (Audio Music, Symbolic Music, Internet, Codaich, Bodhidharma, jMusicMetaManager, jAudio, jSymbolic, jWebMiner, ACE XML Files, ACE, Classification Output, Algorithm Evaluations, Trained Classifiers)]
  [Diagram: Basic Classification Tasks (Music, Musical Data Collection, Metadata, Metadata Analysis, Feature Extraction, Machine Learning, Classifier Training, Classifications)]

46/47 Goals of jMIR
  - Make sophisticated pattern recognition technologies accessible to music researchers with both technical and non-technical backgrounds
  - Increase cooperation between research groups
    - Enable objective comparisons of algorithms
  - Eliminate duplication of effort
  - Facilitate iterative development and sharing of new MIR technologies
  - Facilitate research combining all 3 feature types
    - Limited intersection of information encapsulated by each type
    - Significant potential to improve classification performance

47/47 Contact information
  - Software available at: http://sourceforge.net/projects/jmir
  - e-mail: cory.mckay@mail.mcgill.ca