1 Harmonic and Instrumental Information Fusion for Musical Genre Classification
Tomás Pérez-García, Carlos Pérez-Sancho, José M. Iñesta. Pattern Recognition and Artificial Intelligence Group, Department of Software and Computing Systems, University of Alicante. I’m going to present a work on symbolic musical genre classification using a multimodal approach, combining harmonic and instrumental information.

2 Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work
First, I will introduce our classification problem and give a brief introduction to multimodal classification. Then, I will explain the methodology used in this work, along with the experimental results, compared to those obtained in previous works for the same problem using single information sources. Finally, I will present the conclusions and some future lines of work.

3 Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work
So let’s begin with it.

4 The Task
Musical genre classification using a multimodal approach and symbolic features. Useful for extracting metadata from new songs for automatic indexing of musical databases, modeling of the user’s musical taste, and other MIR-related tasks.
In this work we face the problem of musical genre classification. This problem has been thoroughly studied, and it has been shown to be a useful way of generating metadata for new or unknown songs in a musical database, for example for automatic indexing of the database, or for modeling the user’s musical taste, among other applications. In most papers in the literature, this has been done using a single source of information, which can be an audio signal or a symbolic digital score. However, in this work we have studied the benefits of using two different sources of information, both of them symbolic, that offer complementary information about the song being classified. In particular, we have used the instruments present in the song and the harmonic structure of the musical score. Since neither of them can be considered directly related to (or derived from) the other, this can be seen as a multimodal classification task.

5 Multimodal classification
Data description is obtained from different sources (e.g. audio and image descriptors from a video file).
[Diagram: multimodal data → feature extraction, one per source → feature vectors]
So, what is multimodality? When we say some data are multimodal, we mean that we have several different kinds of information related to one subject, and none of them is directly derived from another. This can be the case of multimedia data, such as video files, where the audio and the images can be processed separately, and each of them reflects a different perspective of what is happening in the video. From the classification point of view this is desirable, because the more varied the information we have, the more precise the classification can be. But a problem arises: we need to find a suitable method for combining these different kinds of information.

6 Multimodal classification
Need for a strategy to combine the different sources. Early scheme: features are combined and fed into a single classifier. Late scheme: each feature set is classified independently and the individual decisions are combined.
There are two standard approaches. The simplest one is to combine the feature sets obtained from each source, so that the combined feature set can be fed as the input to a single classifier, as we can see in the following diagram.

7 Early scheme
[Diagram: per-source feature vectors and metadata → combined feature vector → single classifier → genre]
As you can see, this is a quite simple scheme, very easy to implement, but it has the drawback that it can only be used when all the features are of the same nature, so that they can all be accepted by the same classifier.
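As a concrete illustration, here is a minimal sketch of the early scheme in Python, assuming scikit-learn and toy binary data; all names, shapes, and the random data are illustrative, not the paper’s actual setup.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy stand-in data: two binary feature matrices describing the same 100
# items from two different sources, plus one class label per item.
rng = np.random.default_rng(0)
X_src1 = rng.integers(0, 2, size=(100, 20))   # features from source 1
X_src2 = rng.integers(0, 2, size=(100, 30))   # features from source 2
y = rng.integers(0, 3, size=100)              # class labels

# Early scheme: concatenate the per-source vectors column-wise and feed
# the combined representation to a single classifier.
X_combined = np.hstack([X_src1, X_src2])
clf = BernoulliNB().fit(X_combined, y)
```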

8 Multimodal classification
Need for a strategy to combine the different sources. Early scheme: features are combined and fed into a single classifier. Late scheme: each feature set is classified independently and the individual decisions are combined.
The second alternative, which is useful when the feature sets are of a different nature and cannot be combined, is to classify each data source separately.

9 Late scheme
[Diagram: each feature vector → its own classifier → individual genre decision; the decisions are combined into the final genre]
This way we obtain an individual decision from a different classifier for each feature set, so we need to combine the decisions of the single classifiers using some classifier combination technique, for example majority vote or some more sophisticated method.
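For contrast, a minimal sketch of the late scheme under the same toy assumptions (scikit-learn, illustrative names and data), combining the per-source decisions by majority vote:

```python
import numpy as np
from collections import Counter
from sklearn.naive_bayes import BernoulliNB

# Toy stand-in data: two binary feature matrices for the same items.
rng = np.random.default_rng(0)
sources = [rng.integers(0, 2, size=(100, 20)),
           rng.integers(0, 2, size=(100, 30))]
y = rng.integers(0, 3, size=100)

# Late scheme: train one independent classifier per source...
clfs = [BernoulliNB().fit(X, y) for X in sources]
decisions = np.vstack([clf.predict(X) for clf, X in zip(clfs, sources)])

# ...and combine the individual decisions, here by simple majority vote
# (ties are resolved in favour of the first most common label).
final = np.array([Counter(col).most_common(1)[0][0] for col in decisions.T])
```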

10 Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work
OK, now that we have a better understanding of the classification scheme, I’m going to show the data and the classification method we used in this work.

11 Musical data
9GDB (Pérez-Sancho, Rizo, and Iñesta, 2008). Hierarchical structure: 3 genres, 9 subgenres; 856 musical pieces available in MIDI and Band-in-a-Box formats.

Academic (235): Baroque 56, Classical 50, Romanticism 129
Jazz (338): Pre-bop 178, Bop 94, Bossanova 66
Popular (283): Blues 84, Pop 100, Celtic 99

The musical data we have used is a dataset we developed and presented in a previous work, which we call the “9 genres database”. It is a database of 856 musical pieces with a hierarchical structure, comprising 3 broad genres (academic, jazz, and popular music) and 9 subgenres, shown in the table above. All the music files are available in two formats: as MIDI files and in the format of the Band-in-a-Box software.

12 Musical data
Two different sources of information extracted from the data: Instrumentation, metadata extracted from the MIDI files using the metamidi* tool (Pérez-García, Iñesta, and Rizo, 2009); and Harmonic structure, chord sequences extracted from the Band-in-a-Box files (Pérez-Sancho, Rizo, and Iñesta, 2008). (* see Resources section)
We have extracted two different kinds of information, each unrelated to the other, so the problem remains multimodal. The first one is the instrumentation, i.e. the metadata about the instruments present in the piece. It has been extracted from the MIDI files using a freely available tool developed by our group. The second one is the harmonic structure, encoded as chord sequences extracted from the Band-in-a-Box files.

13 Classification method
Naïve Bayes classifier using a binomial distribution. Songs are encoded as feature vectors where each position represents the presence/absence of a feature in the song. Feature selection using Average Mutual Information. Early scheme for multimodality.
To perform classification we selected the naïve Bayes classifier, using a binomial distribution. This is a simple method that considers the presence or absence of each individual feature and computes the probability of each class taking into account which features are present in each song in the dataset. This way, the songs are encoded as vectors of 1s and 0s, reflecting which instruments or chords, depending on the feature set, are present in the song. Note that this simple encoding makes it easy to combine both feature sets, so we opted for an early scheme for the multimodal classification. Along with the classifier, we used a feature selection technique called Average Mutual Information to reduce the dimensionality of the feature vectors.
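A minimal sketch of this pipeline, using scikit-learn’s BernoulliNB and mutual-information scoring as stand-ins for the paper’s binomial naïve Bayes and Average Mutual Information selection; the data here is random filler with the dataset’s dimensions:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Random stand-in data: 856 songs, 443 binary presence/absence features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(856, 443))
y = rng.integers(0, 9, size=856)   # 9 subgenre labels

# Rank features by mutual information with the class label (a stand-in
# for the Average Mutual Information criterion) and keep the top k,
# then train the binomial (Bernoulli) naive Bayes classifier.
def ami_score(X, y):
    return mutual_info_classif(X, y, discrete_features=True)

selector = SelectKBest(ami_score, k=50)
X_sel = selector.fit_transform(X, y)
clf = BernoulliNB().fit(X_sel, y)
```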

14 Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work
The methodology used in this work has also been used in two previous papers, with the same classification method but using the two information sources, instruments and chords, independently. So, before proceeding to the multimodal experiments, I will summarize the results obtained in those works in order to assess the benefit obtained when combining both sources.

15 Harmonic classification
Songs encoded as harmonic sequences using standard musical notation (Pérez-Sancho, Rizo, and Iñesta, 2008), e.g. C G7 Am F G7 …, and then as binary vectors over a vocabulary of H = 312 different chords (12 chord roots × 26 extensions).
In the first experiments we used harmonic information alone, encoding the songs as chord sequences using standard musical notation. Each song is thus encoded as a vector of 312 positions, the total chord vocabulary we considered, using 26 different extensions for each of the 12 chord roots.
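A sketch of this encoding, with a deliberately truncated extension list (the paper uses 26 extensions per root; the six shown here are illustrative):

```python
# Vocabulary: cross product of 12 chord roots and the chord extensions.
ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
EXTENSIONS = ["", "m", "7", "m7", "maj7", "dim"]   # ...26 in the paper
VOCAB = [root + ext for root in ROOTS for ext in EXTENSIONS]
INDEX = {chord: i for i, chord in enumerate(VOCAB)}

def encode_harmony(chord_sequence):
    """Binary vector marking which vocabulary chords occur in the song."""
    vec = [0] * len(VOCAB)
    for chord in chord_sequence:
        if chord in INDEX:
            vec[INDEX[chord]] = 1
    return vec

v = encode_harmony(["C", "G7", "Am", "F", "G7"])
```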

16 Harmonic classification
Classification results (10-fold cross-validation):
           % success
3 genres   83 ± 3
9 genres   64 ± 4
Most errors in the 9-genre classification occur between close subgenres (e.g. classical vs. baroque music).
These are the results obtained in this experiment. As expected, the results were better when classifying between the 3 broad genres only than when classifying between all 9 subgenres. This is mainly due to the large number of errors between close subgenres within a broad genre, for example between classical and baroque music.

17 Instrument-based classification
Genre classification using instrumental information (Pérez-García, Iñesta, and Rizo, 2009). Songs encoded as vectors of I = 131 positions (128 instruments + 3 percussion sets, following the General MIDI standard).
In the second work we used the same dataset, but with instrumental information alone. This time each song was encoded as a vector of 131 positions, indicating the presence or absence of the 128 instruments of the General MIDI standard plus 3 percussion sets defined by us: the standard drum kit, a Latin percussion set, and a last set comprising any other percussion instruments.
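A comparable sketch for the instrumentation vector; the percussion-set indices and the mapping of drum events to sets are assumptions for illustration:

```python
# One position per General MIDI program (0-127) plus three percussion
# sets; placing the sets at indices 128-130 is an illustrative assumption.
N_PROGRAMS = 128
STANDARD_KIT, LATIN_SET, OTHER_PERC = 128, 129, 130

def encode_instruments(programs, percussion_sets):
    """programs: GM program numbers used in the file (0-127);
    percussion_sets: subset of the three percussion-set indices."""
    vec = [0] * (N_PROGRAMS + 3)
    for p in programs:
        vec[p] = 1
    for s in percussion_sets:
        vec[s] = 1
    return vec

# e.g. acoustic grand piano (0), acoustic bass (32), standard drum kit
v = encode_instruments({0, 32}, {STANDARD_KIT})
```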

18 Instrument-based classification
Classification results using naïve Bayes (10-fold cross-validation):
           Harmonic   Instrumental
3 genres   83 ± 3     93 ± 2
9 genres   64 ± 4     68 ± 5
Results improved over harmonic features; again, most errors in the 9-genre classification occur between close subgenres.
Here we can see a comparison of the results obtained in both experiments. The precision obtained for the 3-class experiment using instruments is remarkable, but again we found many errors between close subgenres, which led to a similar result when using 9 classes.

19 Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work

20 Harmonic and instrumental classification
Fusion of harmonic and instrumental information. Early scheme: both feature sets are combined into a single feature vector, with positions 1–131 holding the instrumental information and positions 132–443 the harmonic information.
In the experiments performed in this work we combined both feature sets, instrumental and harmonic. Since in both cases the feature vectors are made of 1s and 0s, and the naïve Bayes classifier assumes independence between individual features, this can be done in a straightforward way by simply joining all the features into a single vector; training and classification are then done in the traditional way.
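Under the two encodings sketched earlier, the fusion itself reduces to a concatenation; a minimal example with NumPy:

```python
import numpy as np

# Instrumental positions 1-131 followed by harmonic positions 132-443,
# yielding a single 443-dimensional binary feature vector per song.
instrumental = np.zeros(131, dtype=int)   # e.g. from encode_instruments(...)
harmonic = np.zeros(312, dtype=int)       # e.g. from encode_harmony(...)
combined = np.concatenate([instrumental, harmonic])
assert combined.shape == (443,)
```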

21 Harmonic and instrumental classification
Classification results (10-fold cross-validation):
           Instrumental   Harmonic   Instrumental + harmonic
3 genres   93 ± 2         83 ± 3     95 ± 2
9 genres   68 ± 5         64 ± 4     79 ± 3
Significant improvement in the 9-genre classification task.
Here are the results obtained in this experiment. There was a slight improvement in the results for the 3-class problem, although it cannot be considered statistically significant. However, there was a significant improvement in the 9-class problem, with more than a 10% increase in classification accuracy.

22 Feature selection
Classification using different vocabulary sizes.
It seems clear that the combination of both information sources led to this improvement. Nevertheless, we performed a study of the features chosen by the feature selection algorithm, in order to assess the contribution of each kind of information to these results. We observed the evolution of the classification accuracy according to the number of features selected. This number is a parameter that can be adjusted by hand, so we tested all possible values from 1 to the maximum number of features present in the dataset. There is a point, around 50 features, where the precision becomes more or less stable, and there seems to be only a small benefit in using more features.
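A sketch of such a sweep, again with scikit-learn and random stand-in data (reproducing the actual accuracy curve would require the 9GDB features):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Vary the number of selected features k and record 10-fold
# cross-validated accuracy for each vocabulary size.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(856, 443))
y = rng.integers(0, 9, size=856)

for k in (10, 25, 50, 100, 200, 443):
    pipe = make_pipeline(SelectKBest(mutual_info_classif, k=k), BernoulliNB())
    scores = cross_val_score(pipe, X, y, cv=10)
    print(f"k={k:3d}  accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```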

23 Feature selection
Near-optimal results with 50 features. Harmonic and instrumental features have the same relevance in the 50-feature subset: features from both sets appear in the same proportion.
So we looked inside this subset of 50 features and found that instrumental and harmonic information had the same relevance, in the sense that both were present in the same proportion and neither prevailed over the other. We can therefore say that they provide complementary information.

24 Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work
So, to conclude…

25 Conclusions
A combination of different kinds of information has increased classification accuracy beyond a limit we could not surpass using single information sources. Both kinds of information have contributed equally to this improvement. We can expect better results by adding further sources of information.
I have presented a multimodal classification task for musical information and shown that classification results improved over previous works by combining harmonic and instrumental features. Both kinds of information contributed equally to this improvement, so we can expect that adding further sources of information would lead to even better results.

26 Future work
Obtain additional features from different sources of information. Explore the relationships between feature subsets. Use different combination schemes.
This is precisely one of the things we are considering for future work. We are also working on finding and exploiting the hidden relationships that may exist between the different feature sets, so that we can, for example, design more effective feature selection methods. Finally, we would also like to test different classification schemes for combining the features or the classifier decisions.

27 Harmonic and Instrumental Information Fusion for Musical Genre Classification
Tomás Pérez-García, Carlos Pérez-Sancho, José M. Iñesta Pattern Recognition and Artificial Intelligence Group Department of Software and Computing Systems University of Alicante

