
Automatic Music Classification Cory McKay

2/47 Introduction
Many areas of research in music information retrieval (MIR) involve using computers to classify music in various ways
  - Genre or style classification
  - Mood classification
  - Performer or composer identification
  - Music recommendation
  - Playlist generation
  - Hit prediction
  - Audio to symbolic transcription
  - etc.
Such areas often share similar central procedures

3/47 Fundamental music classification tasks (1/3)
Musical data collection
  - The instances (basic entities) to classify
  - Audio recordings, scores, cultural data, etc.
Feature extraction
  - Features represent characteristic information about instances
  - Must provide sufficient information to segment instances among classes (categories)
Machine learning
  - Algorithms (“classifiers” or “learners”) learn to associate feature patterns of instances with their classes
[Diagram: Basic Classification Tasks; labels: Musical Data Collection, Feature Extraction, Machine Learning, Classifications, Music]

4/47 Fundamental music classification tasks (2/3)
Many classification tasks require metadata about instances
  - Title, composer, performer, genre, date, etc.
Must be validated and corrected
  - Raw information found in ID3 tags, Gracenote CDDB, etc. often erroneous and inconsistent
[Diagram: Basic Classification Tasks; labels: Musical Data Collection, Feature Extraction, Machine Learning, Metadata Analysis, Classifications, Music]

5/47 Fundamental music classification tasks (3/3)
Supervised learning requires training
  - Correctly labeled model instances (“ground truth”) are used to teach classifiers to associate certain feature patterns with desired classes
  - Trained classifiers can then classify novel instances
Success of classifiers is dependent on the quality of the ground truth
  - It is therefore essential that the metadata labeling of the musical data be accurate
[Diagram: Basic Classification Tasks; labels: Musical Data Collection, Feature Extraction, Machine Learning, Metadata, Metadata Analysis, Classifications, Music, Classifier Training]

6/47 Consolidating fundamental tasks
Properly performing these tasks requires significant effort and knowledge in (at least):
  - Data mining
  - Signal processing
  - Musicology
Result:
  - Naïve or improperly performed research
  - Duplication of effort
  - Reluctance to use automatic music classification in musicological or other research where it could be useful
Solution: standardized MIR research software
  - Makes automatic music classification technology available to researchers in many disciplines

7/47 Existing MIR software
Only a few MIR software systems have been built for use by other researchers
  - e.g. Marsyas and M2K
  - Tend to focus primarily on particular sub-tasks
      - e.g. audio feature extraction
  - Not typically well integrated with other systems
  - Do not sufficiently emphasize extensibility
  - Typically have usability problems
      - Installation and licensing issues, poor documentation
Result:
  - Emphasis on existing techniques rather than development of new approaches
  - Difficulties in integrating research between labs
  - Inaccessible to non-technical music researchers

8/47 jMIR
jMIR has been developed to meet the need for standardized MIR research software
  - Has a separate software component to address each important aspect of automatic music classification
      - Each component can be used independently
      - Combinations of components can be used as an integrated whole
  - Architectural emphasis on providing an extensible platform for iteratively developing new techniques and algorithms
      - Can also be used directly as is
  - Interfaces designed for both technical and non-technical users
      - Well-documented
  - Free and open source
      - Cross-platform Java implementation

9/47 Musical data collection
[Diagram: Basic Classification Tasks; labels: Musical Data Collection, Feature Extraction, Machine Learning, Metadata, Metadata Analysis, Classifications, Music, Classifier Training]

10/47 Types of musical data
Audio recordings
  - Sampled sound
  - Wave, MP3, AAC, etc.
Symbolic recordings
  - Abstract musical instructions
  - Scores, MIDI, Humdrum, etc.
Cultural information
  - Information external to musical content itself
      - e.g. playlists, album reviews, Billboard stats, etc.
  - Based on web searches, surveys, expert opinion, etc.
[Diagram: Musical Data Collection; labels: Symbolic Recordings (MIDI, scores, Humdrum, etc.), Audio Recordings (MP3, AAC, Wave, etc.), Cultural Information (Web, surveys, experts, etc.)]

11/47 Connections between data types
Automatic transcription technologies are increasingly making it possible to automatically generate symbolic recordings from audio
Metadata annotations are necessary for linking cultural information with particular recordings
[Diagram: Musical Data Collection; labels: Symbolic Recordings (MIDI, scores, Humdrum, etc.), Audio Recordings (MP3, AAC, Wave, etc.), Cultural Information (Web, surveys, experts, etc.), Metadata, Transcription]

12/47 jMIR Codaich
A research database of labeled MP3 recordings
  - For use in training and testing algorithms
There are plans to eventually include additional format types in Codaich
  - Including symbolic formats
[Diagram: Musical Data Collection; labels: Symbolic Recordings, Audio Recordings, Cultural Information, Metadata, Transcription, jMIR Codaich]

13/47 Sharing Codaich
Codaich is intended to provide a common knowledge base that can be used by researchers in different labs to compare the effectiveness of their varying approaches
Overcoming copyright limitations on distributing music:
  - On-demand Metadata Extraction Network (OMEN)
      - Implemented by Daniel McEnnis
  - Researchers use distributed computing and the jMIR jAudio feature extractor to request local feature extraction at sites (e.g., libraries) that have legal access to individual recordings
  - jAudio and OMEN allow custom original features and extraction parameters

14/47 Statistics on Codaich
27 305 MP3 recordings
  - Constantly growing
2247 artists
55 genres
  - Popular, classical, jazz and “world”
19 metadata fields

15/47 jMIR Bodhidharma MIDI Database
Collection of labeled MIDI recordings
950 recordings
38 genres
[Diagram: Musical Data Collection; labels: Symbolic Recordings, Audio Recordings, Cultural Information, Metadata, Transcription, jMIR Codaich, jMIR Bodhidharma MIDI Database]

16/47 jMIR jMusicMetaManager
Metadata found with recordings is typically problematic
  - Inconsistent
  - Error-prone
jMusicMetaManager is software that automatically analyzes metadata across recordings
Is currently used to maintain Codaich
  - There are plans to adapt it to MIDI as well
[Diagram: Musical Data Collection; labels: Symbolic Recordings, Audio Recordings, Cultural Information, Metadata, Transcription, jMIR Codaich, jMIR jMusicMetaManager, jMIR Bodhidharma MIDI Database]

17/47 Tasks performed by jMusicMetaManager
Detects differing metadata values that should in fact be the same
  - e.g. in a performer identification task, “Charlie Mingus” should not be misclassified as a different performer than “Mingus, Charles”
Detects redundant copies of recordings
  - Could contaminate test sets
Generates inventory and statistical profile reports
  - 39 reports in all

18/47 How jMusicMetaManager works
Calculates edit distance between pairs of field values
  - Threshold based on field lengths
Performs 23 additional pre-processing equivalency operations
Considers varied word orderings and word subsets
Applies false error filtering
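The matching idea on this slide can be illustrated with a minimal Python sketch. It assumes Levenshtein edit distance, a threshold proportional to the longer field value, and a simple word-sorting step to ignore word order. The function names, the 0.25 ratio, and the normalization are illustrative assumptions, not jMusicMetaManager's actual operations or settings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(value: str) -> str:
    """Lower-case, drop commas, and sort words so that word order is ignored."""
    words = value.lower().replace(",", " ").split()
    return " ".join(sorted(words))


def probably_same(a: str, b: str, ratio: float = 0.25) -> bool:
    """Flag two field values as likely variants of the same entry.

    The threshold grows with the longer normalized value; the ratio is a
    hypothetical setting chosen only for this illustration.
    """
    norm_a, norm_b = normalize(a), normalize(b)
    threshold = ratio * max(len(norm_a), len(norm_b))
    return edit_distance(norm_a, norm_b) <= threshold


print(probably_same("Charlie Mingus", "Mingus, Charles"))
print(probably_same("Miles Davis", "John Coltrane"))
```

Scaling the threshold with field length means short values must match almost exactly, while longer values tolerate a few differing characters.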

19/47 jMusicMetaManager’s I/O
Parses metadata from Apple iTunes XML or MP3 ID3 tags
  - And Gracenote CDDB, indirectly
  - Can export to ACE XML or Weka ARFF
Generates reports in frames-based HTML

20/47 Musical data collection summary
[Diagram: Musical Data Collection; labels: Symbolic Recordings (MIDI, scores, Humdrum, etc.), Audio Recordings (MP3, AAC, Wave, etc.), Cultural Information (Web, surveys, experts, etc.), Metadata, Transcription, jMIR Codaich, jMIR jMusicMetaManager, jMIR Bodhidharma MIDI Database]

21/47 Feature extraction
[Diagram: Basic Classification Tasks; labels: Musical Data Collection, Feature Extraction, Machine Learning, Metadata, Metadata Analysis, Classifications, Music, Classifier Training]

22/47 Types of features
Low-level
  - Associated with signal processing and basic auditory perception
  - e.g. spectral flux or RMS
  - Usually not intuitively musical
High-level
  - Musical abstractions
  - e.g. meter or pitch class distributions
Cultural
  - Sociocultural information outside the scope of auditory or musical content
  - e.g. playlist co-occurrence or purchase correlations
[Diagram: Feature Extraction; labels: Low-Level Features, High-Level Features, Cultural Features]
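The two low-level features named on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not jAudio's implementation: RMS is taken as the root of the mean squared sample value in a window, and spectral flux as the sum of squared differences between the magnitude spectra of two consecutive windows (definitions of flux vary, for example in normalization). A naive DFT is used so the sketch needs only the standard library; all function names are hypothetical.

```python
import cmath
import math


def rms(window):
    """Root mean square of the samples in one analysis window."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def magnitude_spectrum(window):
    """Magnitudes of the non-negative frequency bins, via a naive O(n^2) DFT."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]


def spectral_flux(previous_window, window):
    """Sum of squared bin-by-bin changes between two consecutive spectra."""
    previous_mag = magnitude_spectrum(previous_window)
    current_mag = magnitude_spectrum(window)
    return sum((c - p) ** 2 for c, p in zip(current_mag, previous_mag))


# One cycle of a unit-amplitude sine in 8 samples, preceded by silence.
sine = [math.sin(2 * math.pi * t / 8) for t in range(8)]
silence = [0.0] * 8
print(rms(sine), spectral_flux(silence, sine))
```

Both values describe the signal itself rather than anything a listener would name musically, which is the sense in which such features are "not intuitively musical".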

23/47 jMIR jAudio
Implemented jointly with Daniel McEnnis
Extracts features from audio files
  - MP3, WAV, AIFF, AU, SND
28 bundled core features
  - Mainly low-level
  - Some high-level
[Diagram: Feature Extraction; labels: Audio Recordings, jMIR jAudio, Low-Level Features, High-Level Features, Cultural Features, Extracted Feature Values]

24/47 Developing features with jAudio
Two general ways of using jAudio
  - Directly as an audio feature extractor
  - Platform for developing and sharing new features
      - Can be independent features
      - Can be based on existing features
New features are added using a modular plugin interface
  - jAudio (like all jMIR feature extractors) automatically calculates feature dependencies and scheduling at runtime

25/47 Metafeatures and aggregators
jAudio automatically calculates “metafeatures” of new or existing features
  - e.g. running means, standard deviations or derivatives across sample windows
jAudio automatically calculates “aggregators” for new or existing features
  - Functions that collapse a sequence of feature vectors into a single vector or smaller sequence of vectors
  - Useful for representing in a low-dimensional way how different features change together
  - e.g. the Area of Moments aggregator transforms a set of feature vectors into a two-dimensional image matrix and calculates two-dimensional moments
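As a rough illustration of these two ideas, the Python sketch below derives metafeatures (derivative, running mean, running standard deviation) from a per-window feature sequence, and applies one very simple aggregator that collapses a sequence of feature vectors into a single vector of per-dimension means and standard deviations. The function names and the choice of aggregator are assumptions made for this example; this is not the Area of Moments aggregator and not jAudio's code.

```python
import statistics


def derivative(values):
    """Change in a feature value from each window to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def running_mean(values, span):
    """Mean of the feature over the most recent `span` windows."""
    return [statistics.fmean(values[i - span + 1:i + 1])
            for i in range(span - 1, len(values))]


def running_std(values, span):
    """Population standard deviation over the most recent `span` windows."""
    return [statistics.pstdev(values[i - span + 1:i + 1])
            for i in range(span - 1, len(values))]


def aggregate_mean_std(vectors):
    """Collapse a sequence of feature vectors into one vector:
    per-dimension means followed by per-dimension standard deviations."""
    columns = list(zip(*vectors))
    means = [statistics.fmean(column) for column in columns]
    deviations = [statistics.pstdev(column) for column in columns]
    return means + deviations


rms_per_window = [1.0, 2.0, 3.0, 4.0]
print(derivative(rms_per_window))
print(running_mean(rms_per_window, span=2))
print(aggregate_mean_std([[1.0, 2.0], [3.0, 2.0]]))
```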

26/47 Using jAudio
Customizable extraction parameters
  - Window size and overlap
  - Normalization
  - Downsampling
  - Individual feature parameters
Records and synthesizes audio
Converts MIDI to audio
Displays audio in both the time and frequency domains
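To make the window size and overlap parameters concrete, here is a small Python sketch. It assumes overlap is expressed as a fraction of the window size, that a trailing partial window is dropped, and that normalization means peak normalization; jAudio's own conventions may differ, and the function names are hypothetical.

```python
def split_into_windows(samples, window_size, overlap=0.0):
    """Cut a sample sequence into fixed-size windows.

    `overlap` is the fraction of each window shared with the next one
    (0.0 means no overlap). A trailing partial window is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0.0, 1.0)")
    hop = max(1, int(window_size * (1.0 - overlap)))
    return [samples[start:start + window_size]
            for start in range(0, len(samples) - window_size + 1, hop)]


def normalize_peak(samples):
    """Scale samples so the largest absolute value becomes 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)
    return [s / peak for s in samples]


signal = list(range(8))
print(split_into_windows(signal, window_size=4, overlap=0.5))
print(normalize_peak([0.5, -0.25]))
```

Larger overlaps yield more windows per recording, and therefore more feature vectors, at the cost of more computation.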

27/47 jMIR jSymbolic
Extracts high-level features from MIDI files
111 bundled features
  - Currently being expanded to 160
  - Many are original
[Diagram: Feature Extraction; labels: Symbolic Recordings, Audio Recordings, jMIR jAudio, jMIR jSymbolic, Low-Level Features, High-Level Features, Cultural Features, Extracted Feature Values]

28/47 jSymbolic’s features
Features fall into 7 broad categories
  - Instrumentation
  - Musical Texture
  - Rhythm
  - Dynamics
  - Pitch Statistics
  - Melody
  - Chords
Histogram aggregators are often used
  - Rhythm, pitch, pitch class, melody, vertical interval and chord histograms
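A pitch class histogram, one of the histogram types listed above, can be sketched as follows. The sketch assumes the input is a plain list of MIDI note numbers (where 60 is middle C and pitch class 0 is C) and normalizes the counts to fractions; parsing a MIDI file and any weighting by note duration are left out, and the function name is hypothetical rather than taken from jSymbolic.

```python
def pitch_class_histogram(midi_notes):
    """Fraction of notes falling in each of the 12 pitch classes (0 = C)."""
    counts = [0] * 12
    for note in midi_notes:
        counts[note % 12] += 1  # octave information is discarded
    total = sum(counts)
    if total == 0:
        return [0.0] * 12
    return [count / total for count in counts]


# C4, C5, E4, G4: a C major triad with the root doubled.
histogram = pitch_class_histogram([60, 72, 64, 67])
print(histogram)
```

Because octaves are folded together, the histogram summarizes which pitch classes dominate a piece regardless of register.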

29/47 jMIR jWebMiner
Extracts cultural features from the web using web services
  - Google
  - Yahoo!
Calculates the co-occurrence and cross tabulation of metadata fields
  - e.g. how often does Bach co-occur on a web page with Baroque, compared to Stravinsky?
Currently in alpha development
[Diagram: Feature Extraction; labels: Symbolic Recordings, Audio Recordings, Cultural Information, jMIR jAudio, jMIR jSymbolic, jMIR jWebMiner, Low-Level Features, High-Level Features, Cultural Features, Extracted Feature Values]

30/47 jWebMiner’s functionality
Parses search terms from:
  - iTunes, ACE XML, Weka ARFF, text
Can assign higher weights to particular sites
  - e.g. All Music, Wikipedia, Pitchfork, etc.
Can enforce filter words
  - e.g. a site must include the word “music” to be considered
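The co-occurrence, site weighting and filter word ideas can be sketched together in Python. This is only an illustration: jWebMiner obtains its counts from web search services, whereas the sketch below scores a small in-memory list of (site, text) pairs, and the scoring rule, the site names and the function name are all assumptions made for the example.

```python
def weighted_cooccurrence(pages, term_a, term_b,
                          site_weights=None, filter_word=None):
    """Score how strongly two terms co-occur across a set of pages.

    pages        -- iterable of (site, text) pairs standing in for search hits
    site_weights -- optional mapping of site name to weight (default 1.0)
    filter_word  -- if given, pages lacking this word are ignored
    """
    site_weights = site_weights or {}
    score = 0.0
    for site, text in pages:
        lowered = text.lower()
        if filter_word and filter_word.lower() not in lowered:
            continue  # page fails the filter word requirement
        if term_a.lower() in lowered and term_b.lower() in lowered:
            score += site_weights.get(site, 1.0)
    return score


# Hypothetical pages and weights, for illustration only.
pages = [
    ("wikipedia.org", "Bach is a Baroque music composer"),
    ("example.com", "Bach Baroque"),
    ("example.com", "Stravinsky and Baroque music"),
]
weights = {"wikipedia.org": 2.0}
print(weighted_cooccurrence(pages, "Bach", "Baroque", weights, "music"))
print(weighted_cooccurrence(pages, "Stravinsky", "Baroque", weights, "music"))
```

Comparing the two scores gives a crude cultural feature: under these made-up pages, Bach is tied to Baroque more strongly than Stravinsky is.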

31/47 Feature extraction summary
[Diagram: Feature Extraction; labels: Symbolic Recordings, Audio Recordings, Cultural Information, jMIR jAudio, jMIR jSymbolic, jMIR jWebMiner, Low-Level Features, High-Level Features, Cultural Features, Extracted Feature Values]

32/47 Machine learning
[Diagram: Basic Classification Tasks; labels: Musical Data Collection, Feature Extraction, Machine Learning, Metadata, Metadata Analysis, Classifications, Music, Classifier Training]

33/47 Some types of machine learning
Supervised
  - Learners trained on model labeled instances
Unsupervised
  - Examines instances in terms of internal similarities rather than externally provided labels
Ensemble
  - Multiple classifiers work together
  - Hopefully perform better overall than individually
[Diagram: Machine Learning; labels: Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms]

34/47 Input to machine learning systems
Extracted feature values serve as the percepts of classifiers
Ground truth needed by supervised learners
A class ontology (structured set of relationships between classes) is sometimes used
  - Some learners can capitalize on structuring
  - Long-term goal is to allow arbitrary ontologies in jMIR
[Diagram: Machine Learning; labels: Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms, Extracted Features, Ground Truth, Class Ontology]

35/47 Training and testing sets
Data segmented into training and testing sets if classifiers need to be trained
  - To avoid overtraining (failure to generalize training instance features to those of the general instance population)
Feature values are simply passed on if training is not needed
[Diagram: Machine Learning; labels: Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms, Extracted Features, Ground Truth, Class Ontology, Training Sets, Testing Sets OR Features to Classify]
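A minimal Python sketch of segmenting labeled instances into training and testing sets is shown below. It assumes a simple random hold-out split with a fixed seed for repeatability; the function name, the 20% default and the instance layout are illustrative choices, not a description of how jMIR partitions data.

```python
import random


def train_test_split(instances, test_fraction=0.2, seed=0):
    """Shuffle the instances and hold out a fraction of them for testing.

    Returns (training_set, testing_set). The two sets are disjoint, so test
    accuracy reflects generalization rather than memorized training data.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    test_size = int(round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical (feature vector, genre label) pairs.
labeled = [([i * 0.1, i * 0.2], "jazz" if i % 2 else "rock") for i in range(10)]
training_set, testing_set = train_test_split(labeled, test_fraction=0.2)
print(len(training_set), len(testing_set))
```

Redundant copies of a recording landing on both sides of the split would inflate test accuracy, which is one reason duplicate detection during data collection matters.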

36/47 Dimensionality reduction algorithms
Too many features degrade classifier performance
  - “Curse of dimensionality”
Too few features can fail to encapsulate sufficient information
Dimensionality reduction algorithms automatically find a good lower-dimensional subset or projection of the given features
[Diagram: Machine Learning; labels: Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms, Dimensionality Reduction Algorithms, Extracted Features, Ground Truth, Class Ontology, Training Sets, Testing Sets OR Features to Classify]

37/47 Output of machine learning systems
Classifications of instances are output if no supervised training is needed
Metalearners can be used to choose appropriate classifier(s)
  - Each algorithm has its own strengths and weaknesses
  - Training output consists of evaluations of each algorithm as well as the trained classifiers
[Diagram: Machine Learning; labels: Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms, Dimensionality Reduction Algorithms, Extracted Features, Ground Truth, Class Ontology, Training Sets, Testing Sets OR Features to Classify, Classification Results OR Algorithm Evaluations, Trained Classifiers]

38/47 jMIR ACE
ACE is jMIR’s classifier and metalearner
  - Automatically experiments with and selects classifier(s)
  - Trains classifiers
  - Classifies novel instances
[Diagram: Machine Learning; labels: jMIR ACE, Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms, Dimensionality Reduction Algorithms, Extracted Features, Ground Truth, Class Ontology, Training Sets, Testing Sets OR Features to Classify, Classification Results OR Algorithm Evaluations, Trained Classifiers]

39/47 Algorithms experimented with by ACE
Classifiers:
  - Induction trees, naive Bayes, k-nearest neighbour, neural networks, support vector machines
  - Classifier parameters are also varied automatically
Dimensionality reduction:
  - Principal component analysis, exhaustive searches, feature selection using genetic algorithms
Classifier ensembles:
  - Bagging, boosting
Additional algorithms will be added in the future:
  - Including unsupervised learning algorithms
Researchers are encouraged to add their own algorithms
  - ACE, like all jMIR components, emphasizes extensibility
  - ACE utilizes the Weka general pattern recognition library
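To give a feel for one of the listed classifiers, here is a minimal standalone Python sketch of k-nearest neighbour classification. ACE itself relies on Weka's implementations; this sketch only illustrates the technique, assuming Euclidean distance and a simple majority vote. The function name, the feature values and the genre labels are made up for the example.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbours.

    training -- list of (feature_vector, label) pairs (the ground truth)
    query    -- feature vector of the novel instance
    """
    neighbours = sorted(training,
                        key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors with genre labels.
ground_truth = [
    ((0.0, 0.0), "jazz"), ((0.0, 1.0), "jazz"), ((1.0, 0.0), "jazz"),
    ((5.0, 5.0), "rock"), ((5.0, 6.0), "rock"), ((6.0, 5.0), "rock"),
]
print(knn_classify(ground_truth, (0.2, 0.1)))
print(knn_classify(ground_truth, (5.5, 5.5)))
```

The value of k is exactly the kind of classifier parameter that a metalearner can vary automatically while comparing algorithms.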

40/47 Details of ACE
ACE evaluates algorithms in terms of
  - Classification accuracy
  - Performance consistency
  - Training complexity / time
  - Classification complexity / time
There are future plans to utilize distributed computing to spread out the computational burden
  - Will also add the ability to impose limits on the time available for the ACE metalearner to come up with algorithm selections
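The evaluation criteria above can be illustrated with a short Python sketch that cross-validates a classifier and reports mean accuracy, the spread of accuracy across folds (as a rough proxy for consistency), and elapsed time. The fold assignment, the report keys and the tiny nearest-neighbour classifier are assumptions made for this example and do not describe ACE's internals.

```python
import math
import statistics
import time


def nearest_neighbour(training, query):
    """Label of the single closest training instance (Euclidean distance)."""
    return min(training, key=lambda item: math.dist(item[0], query))[1]


def cross_validate(instances, classify, folds=3):
    """Evaluate `classify(training, features)` with interleaved folds."""
    if len(instances) < folds:
        raise ValueError("need at least one instance per fold")
    accuracies = []
    start = time.perf_counter()
    for fold in range(folds):
        test = instances[fold::folds]
        train = [inst for i, inst in enumerate(instances) if i % folds != fold]
        correct = sum(1 for features, label in test
                      if classify(train, features) == label)
        accuracies.append(correct / len(test))
    return {
        "mean_accuracy": statistics.fmean(accuracies),
        "accuracy_std": statistics.pstdev(accuracies),  # lower = more consistent
        "seconds": time.perf_counter() - start,
    }


# Hypothetical, well-separated instances of two classes.
data = [((0, 0), "a"), ((5, 5), "b"), ((0, 1), "a"),
        ((5, 6), "b"), ((1, 0), "a"), ((6, 5), "b")]
report = cross_validate(data, nearest_neighbour, folds=3)
print(report["mean_accuracy"], report["accuracy_std"])
```

Running the same routine over several candidate classifiers and comparing the reports is, in miniature, what a metalearner does when selecting an algorithm.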

41/47 ACE’s interface
Command line
Java API
GUI
  - In alpha development

42/47 jMIR ACE XML files
Allow jMIR components to communicate with each other
Allow jMIR output to be used by other software
  - To help ensure interoperability, jMIR components also produce and parse Weka ARFF files
[Diagram: Machine Learning; labels: jMIR ACE, jMIR ACE XML Files, Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms, Dimensionality Reduction Algorithms, Extracted Features, Ground Truth, Class Ontology, Training Sets, Testing Sets OR Features to Classify, Classification Results OR Algorithm Evaluations, Trained Classifiers]

43/47 Details of the ACE XML files
Information stored in ACE XML files:
  - Feature values and information about features
  - Model classifications and other metadata
  - Class taxonomies
      - Will be expanded to general ontologies in the future
Advantages of ACE XML compared to general data mining file formats (e.g. Weka ARFF)
  - Ability to assign multiple classes to individual instances
  - Ability to classify both overall instances and their sub-sections
  - Maintenance of logical groupings of multi-dimensional features
  - Maintenance of internal identifying metadata about instances
  - Ability to represent taxonomical class structures
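For comparison with the advantages listed above, the Python sketch below writes a few feature vectors in the basic Weka ARFF layout (an `@relation` line, one `@attribute` line per feature, a nominal class attribute, and `@data` rows). The relation name, feature names and values are made up, and the helper is a hypothetical illustration rather than jMIR's exporter. It assumes names without spaces or special characters, which would otherwise need quoting.

```python
def to_arff(relation, feature_names, class_labels, instances):
    """Render labeled feature vectors as ARFF text.

    instances -- list of (feature_values, class_label) pairs
    """
    lines = [f"@relation {relation}", ""]
    for name in feature_names:
        lines.append(f"@attribute {name} numeric")
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    for values, label in instances:
        # One flat row per instance, ending in a single class label.
        lines.append(",".join(str(v) for v in values) + "," + label)
    return "\n".join(lines) + "\n"


arff_text = to_arff(
    relation="genres",
    feature_names=["rms", "spectral_flux"],
    class_labels=["jazz", "rock"],
    instances=[([0.1, 0.2], "jazz"), ([0.7, 0.9], "rock")],
)
print(arff_text)
```

Each row is a flat list of values ending in one class label, which makes the contrast visible: there is no natural place for several classes per instance, for sub-sections of a recording, or for grouping the dimensions of a multi-dimensional feature.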

44/47 Machine learning summary
[Diagram: Machine Learning; labels: jMIR ACE, jMIR ACE XML Files, Supervised Algorithms, Unsupervised Algorithms, Ensemble Algorithms, Dimensionality Reduction Algorithms, Extracted Features, Ground Truth, Class Ontology, Training Sets, Testing Sets OR Features to Classify, Classification Results OR Algorithm Evaluations, Trained Classifiers]

45/47 Overview of jMIR
[Diagram: jMIR and its Components; labels: jAudio, jSymbolic, jWebMiner, Codaich, jMusicMetaManager, Bodhidharma, Audio Music, Symbolic Music, Internet, ACE XML Files, ACE, Classification Output OR Algorithm Evaluations, Trained Classifiers]
[Diagram: Basic Classification Tasks; labels: Musical Data Collection, Feature Extraction, Machine Learning, Metadata, Metadata Analysis, Classifications, Music, Classifier Training]

46/47 Goals of jMIR
Make sophisticated pattern recognition technologies accessible to music researchers with both technical and non-technical backgrounds
Increase cooperation between research groups
  - Enable objective comparisons of algorithms
  - Eliminate redundant duplication of effort
  - Facilitate iterative development and sharing of new MIR technologies
Facilitate research combining all 3 feature types
  - Limited intersection of information encapsulated by each type
  - Significant potential to improve classification performance

47/47 Contact information
Software available at:
  - http://sourceforge.net/projects/jmir
e-mail:
  - cory.mckay@mail.mcgill.ca
