Harmonic and Instrumental Information Fusion for Musical Genre Classification
Tomás Pérez-García, Carlos Pérez-Sancho, José M. Iñesta
Pattern Recognition and Artificial Intelligence Group, Department of Software and Computing Systems, University of Alicante
MML 2010, 25/10/2010

I'm going to present a work on symbolic musical genre classification using a multimodal approach, combining harmonic and instrumental information.

Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work

First, I will introduce our classification problem and give a brief introduction to multimodal classification. Then, I will explain the methodology used in this work, along with the experimental results, compared to those obtained in previous works for the same problem using single information sources. Finally, I will present the conclusions and some future lines of work.

Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work

So let's begin with the introduction.

The Task: Musical genre classification using a multimodal approach and symbolic features. Useful for extracting metadata from new songs for automatic indexing of musical databases, modeling of the user's musical taste, and other MIR-related tasks.

In this work we address the problem of musical genre classification. This problem has been studied thoroughly, and it has been shown to be a useful way of generating metadata for new or unknown songs in a musical database, for example for automatic indexing of the database or for modeling the user's musical taste, among other applications. In most papers in the literature this has been done using a single source of information, either an audio signal or a symbolic digital score. In this work, however, we have studied the benefits of using two different sources of information, both symbolic, that offer complementary information about the song being classified: the instruments present in the song and the harmonic structure of the musical score. Since neither of them can be considered to be directly related to (or derived from) the other, this can be seen as a multimodal classification task.

Multimodal classification: the data description is obtained from different sources (e.g. audio and image descriptors from a video file). [Diagram: multimodal data goes through a separate feature extraction stage per source, producing one feature vector per source.]

So, what is multimodality? When we say some data are multimodal, we mean that we have several different kinds of information about one subject, and none of them is directly derived from the others. This is the case, for example, of multimedia data such as video files, where the audio and the images can be processed separately, and each reflects a different perspective of what is happening in the video. From the classification point of view this is desirable, because the more varied the information we have, the more precise the classification can be. But a problem arises: we need to find a suitable method for combining these different kinds of information.

Multimodal classification: a strategy is needed to combine the different sources.
Early scheme: features are combined and fed into a single classifier.
Late scheme: each feature set is classified independently and the individual decisions are combined.

There are two basic standard approaches. The simplest one is to combine the feature sets obtained from each source, so that the combined feature set can be fed as input to a single classifier, as shown in the following diagram.

[Diagram: early scheme. The feature vectors from the different sources are concatenated into a combined feature vector, which is fed to a single classifier that outputs the genre.]

As you can see, this is quite a simple scheme, very easy to implement, but it has the drawback that it can only be used when all the features are of the same nature, so that they can all be accepted by the same classifier.

Multimodal classification: a strategy is needed to combine the different sources.
Early scheme: features are combined and fed into a single classifier.
Late scheme: each feature set is classified independently and the individual decisions are combined.

The second alternative, which is useful when the feature sets are of a different nature and cannot be combined directly, is to classify each data source separately.

[Diagram: late scheme. Each feature vector is classified by its own classifier, and the individual genre decisions are combined into a final decision.]

This way we obtain an individual decision from a different classifier for each feature set, so we need to combine the decisions of the individual classifiers using some classifier combination technique, for example a majority vote or some more sophisticated technique, as sketched below.
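To make the late scheme concrete, here is a minimal sketch assuming two pre-computed binary feature matrices (one per source) and genre labels. scikit-learn's BernoulliNB stands in for a per-source classifier, and the decisions are combined with a simple soft vote (averaged posteriors); all names are illustrative, and this is not the scheme used in the experiments reported later, which rely on early fusion.

```python
# Hypothetical late-fusion sketch; variable names and the soft-vote combiner
# are illustrative assumptions, not the method used in this work.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

def late_fusion_predict(X_instr_train, X_harm_train, y_train,
                        X_instr_test, X_harm_test):
    """Train one classifier per information source and combine their decisions
    by averaging the per-class posteriors (a simple soft vote)."""
    clf_instr = BernoulliNB().fit(X_instr_train, y_train)
    clf_harm = BernoulliNB().fit(X_harm_train, y_train)
    # Both classifiers are trained on the same labels, so their class orderings match.
    probs = (clf_instr.predict_proba(X_instr_test) +
             clf_harm.predict_proba(X_harm_test)) / 2.0
    return clf_instr.classes_[np.argmax(probs, axis=1)]
```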

Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work

Now that we have a better understanding of the classification scheme, I'm going to present the data and the classification method we used in this work.

Musical data: 9GDB (Pérez-Sancho, Rizo, and Iñesta, 2008). Hierarchical structure with 3 genres and 9 subgenres; 856 musical pieces available in MIDI and Band-in-a-Box formats.

Academic (235): Baroque 56, Classical 50, Romanticism 129
Jazz (338): Pre-bop 178, Bop 94, Bossanova 66
Popular (283): Blues 84, Pop 100, Celtic 99

The musical data we have used is a dataset we developed and presented in a previous work, which we call the "9 genres database" (9GDB). It is a database of 856 musical pieces with a hierarchical structure, comprising 3 broad genres (academic, jazz, and popular music) and 9 subgenres, shown in the table above. All the music files are available in two formats: as MIDI files and in the format of the Band-in-a-Box software.

Musical data: two different sources of information are extracted from the data. Instrumentation: metadata extracted from the MIDI files using the metamidi* tool (Pérez-García, Iñesta, and Rizo, 2009). Harmonic structure: chord sequences extracted from the Band-in-a-Box files (Pérez-Sancho, Rizo, and Iñesta, 2008).

So we have extracted two different kinds of information, neither derived from the other, so the problem remains multimodal. The first one is the instrumentation, i.e. the metadata regarding the instruments present in the piece. This was extracted from the MIDI files using a tool developed by our group, which is freely available. The second one is the harmonic structure, encoded as chord sequences extracted from the Band-in-a-Box files.

* http://grfia.dlsi.ua.es (Resources section)

Classification method: naïve Bayes classifier using a binomial distribution. Songs are encoded as binary feature vectors, where each position represents the presence or absence of a feature in the song. Feature selection using Average Mutual Information. Early scheme for multimodality.

In order to perform classification we have selected the naïve Bayes classifier, using a binomial distribution. This is a simple method that basically considers the presence or absence of each individual feature, and computes the probability of each class taking into account which features are present in each song in the dataset. The songs are thus encoded as vectors of 1's and 0's, reflecting which instruments or chords, depending on the feature set, are present in the song. Note that this simple encoding makes it easy to combine both feature sets, so we opted for an early scheme to perform the multimodal classification. Along with the classifier, we have used a feature selection technique called Average Mutual Information, in order to reduce the dimensionality of the feature vectors. A sketch of this pipeline is given below.
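A minimal sketch of this setup, assuming a binary song-by-feature matrix X and genre labels y. scikit-learn's BernoulliNB and mutual_info_classif are used here as stand-ins for the binomial naïve Bayes and the Average Mutual Information selection described above, so details may differ from the original implementation; the choice of 50 features anticipates the feature selection study later in the talk.

```python
# Stand-in pipeline: mutual-information feature selection followed by a
# Bernoulli (presence/absence) naive Bayes classifier.
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

def build_pipeline(k_features=50):
    """Keep the k features with the highest mutual information with the genre,
    then classify with presence/absence naive Bayes."""
    return make_pipeline(
        SelectKBest(mutual_info_classif, k=k_features),
        BernoulliNB(),
    )

# 10-fold cross-validation, as in the experiments reported later:
# scores = cross_val_score(build_pipeline(), X, y, cv=10)
```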

Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work

The methodology used in this work was also used in two previous papers, with the same classification method but using the two information sources, instruments and chords, independently. So, before proceeding to the multimodal experiments, I will show a summary of the results obtained in those works, in order to assess the benefit obtained when combining both sources.

Harmonic classification: songs encoded as harmonic sequences using standard chord notation (Pérez-Sancho, Rizo, and Iñesta, 2008), e.g. C G7 Am F G7 …, and then as binary vectors over H = 312 different chords (12 chord roots x 26 extensions).

In the first experiments we used harmonic information alone, encoding the songs as chord sequences in standard chord notation. Each song is then encoded as a vector of 312 positions, which is the total chord vocabulary we considered, using 26 different extensions for each of the 12 chord roots. A sketch of this encoding is given below.
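As an illustration, here is a minimal sketch of this presence/absence encoding using a toy chord vocabulary; the actual work uses the full 312-chord vocabulary (12 roots x 26 extensions), and the vocabulary and function names below are assumptions for illustration.

```python
import numpy as np

# Illustrative subset of the 312-chord vocabulary used in the work.
CHORD_VOCABULARY = ["C", "G7", "Am", "F", "Dm7", "E7"]
CHORD_INDEX = {chord: i for i, chord in enumerate(CHORD_VOCABULARY)}

def encode_harmony(chord_sequence):
    """Binary vector marking which vocabulary chords occur in the song."""
    v = np.zeros(len(CHORD_VOCABULARY), dtype=int)
    for chord in chord_sequence:
        if chord in CHORD_INDEX:
            v[CHORD_INDEX[chord]] = 1
    return v

# encode_harmony(["C", "G7", "Am", "F", "G7"]) -> array([1, 1, 1, 1, 0, 0])
```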

Harmonic classification: results (10-fold cross-validation). Most errors in the 9-genre task occur between close subgenres (e.g. classical vs. baroque music).

% success: 3 genres 83 ± 3; 9 genres 64 ± 4

These are the results obtained in this experiment. As expected, the results were better when classifying among the 3 broad genres only than when classifying among all 9 subgenres. This is mainly due to the large number of errors between close subgenres within a broad genre, for example between classical and baroque music.

Instrument-based classification: genre classification using instrumental information (Pérez-García, Iñesta, and Rizo, 2009). Songs encoded as binary vectors with I = 131 positions (128 instruments + 3 percussion sets, following the General MIDI standard).

In the second work we used the same dataset, but with instrumental information alone. This time each song was encoded as a vector of 131 positions, indicating the presence or absence of the 128 instruments of the General MIDI standard and of 3 percussion sets defined by us: the standard drum kit, a latin percussion set, and a third set comprising any other percussion instruments. A sketch of this encoding is given below.
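A minimal sketch of this encoding, assuming the General MIDI program numbers and the percussion groups present in a song have already been extracted (e.g. with metamidi); the function name and the mapping of percussion instruments to the three groups are illustrative assumptions.

```python
import numpy as np

N_GM_PROGRAMS = 128    # General MIDI instrument programs
N_PERCUSSION_SETS = 3  # standard drum kit, latin percussion, other percussion

def encode_instrumentation(program_numbers, percussion_sets_present=()):
    """program_numbers: GM program numbers (0-127) used in the song.
    percussion_sets_present: subset of {0, 1, 2} for the three percussion groups."""
    v = np.zeros(N_GM_PROGRAMS + N_PERCUSSION_SETS, dtype=int)
    for p in program_numbers:
        v[p] = 1
    for s in percussion_sets_present:
        v[N_GM_PROGRAMS + s] = 1
    return v
```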

Instrument-based classification: results using naïve Bayes (10-fold cross-validation). Results improved over harmonic features; again, most errors in the 9-genre task occur between close subgenres.

% success (Harmonic / Instrumental): 3 genres 83 ± 3 / 93 ± 2; 9 genres 64 ± 4 / 68 ± 5

Here we can see a comparison of the results obtained in both experiments. The precision obtained in the 3-class experiment using instruments is remarkable, but again we found many errors between close subgenres, which leads to a similar result when using 9 classes.

Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work

Harmonic and instrumental classification: fusion of harmonic and instrumental information. Early scheme: both feature sets are combined into a single feature vector (positions 1-131: instrumental information; positions 132-443: harmonic information).

In the experiments performed in this work, we combined both feature sets, instrumental and harmonic. Since in both cases the feature vectors are made of 1's and 0's, and the naïve Bayes classifier assumes independence between individual features, this can be done in a straightforward way by simply joining all the features into a single vector; training and classification are then done in the usual way, as sketched below.
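A minimal sketch of this early-fusion step, assuming the per-song instrumental and harmonic binary matrices built as in the previous slides; the function name is illustrative.

```python
import numpy as np

def fuse_features(X_instr, X_harm):
    """Concatenate the per-song instrumental (131-dim) and harmonic (312-dim)
    binary matrices into a single 443-dimensional representation."""
    assert X_instr.shape[0] == X_harm.shape[0], "one row per song in both matrices"
    return np.hstack([X_instr, X_harm])

# X_fused = fuse_features(X_instr, X_harm)  # then classify as in the pipeline sketch above
```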

Harmonic and instrumental classification: results (10-fold cross-validation).

% success (Instrumental / Harmonic / Instrumental + harmonic): 3 genres 93 ± 2 / 83 ± 3 / 95 ± 2; 9 genres 68 ± 5 / 64 ± 4 / 79 ± 3

Here are the results obtained in this experiment. We can see that there was a slight improvement in the results for the 3-class problem, although it cannot be considered statistically significant. In the 9-class problem, however, there was a significant improvement, with an increase of more than 10 points in classification accuracy.

Significant improvement in the 9 genres classification task.

Feature selection: classification using different vocabulary sizes.

So it seems obvious that the combination of both information sources led to this improvement. However, we also performed a study of the features chosen by the feature selection algorithm, in order to assess the contribution of each kind of information to these results. The slide shows the evolution of classification accuracy as a function of the number of features selected. This number is a parameter that can be adjusted by hand, so we tested all possible values, from 1 up to the maximum number of features present in the dataset. There is a point, at around 50 features, where precision becomes more or less stable, and there seems to be little benefit in using more features. A sketch of this sweep is given below.
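A minimal sketch of this vocabulary-size sweep, reusing the stand-in pipeline components from the methodology sketch above; X_fused and y are the assumed fused feature matrix and genre labels.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

def accuracy_vs_vocabulary_size(X_fused, y, max_k=None):
    """Mean 10-fold cross-validation accuracy for each number of selected features."""
    max_k = max_k or X_fused.shape[1]
    results = {}
    for k in range(1, max_k + 1):
        pipe = make_pipeline(SelectKBest(mutual_info_classif, k=k), BernoulliNB())
        results[k] = cross_val_score(pipe, X_fused, y, cv=10).mean()
    return results
```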

Feature selection: near-optimal results with 50 features. Harmonic and instrumental features have the same relevance in the 50-feature subset; features from both sets appear in the same proportion in this subset.

So we looked inside this subset of 50 features and found that both instrumental and harmonic information had the same relevance, in the sense that both were present in the same proportion, and neither prevailed over the other. So we can say that both of them provide complementary information. A sketch of this breakdown is given below.
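A minimal sketch of how such a breakdown could be computed, assuming the fused feature matrix keeps the instrumental block in the first 131 positions; mutual_info_classif again stands in for the Average Mutual Information criterion, so the exact ranking may differ from the one used in the work.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def source_breakdown(X_fused, y, k=50, n_instrumental=131):
    """Count how many of the top-k features (ranked by mutual information with
    the genre) come from the instrumental block versus the harmonic block."""
    scores = mutual_info_classif(X_fused, y, discrete_features=True)
    top_k = np.argsort(scores)[::-1][:k]
    n_instr = int(np.sum(top_k < n_instrumental))
    return {"instrumental": n_instr, "harmonic": k - n_instr}
```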

Outline: Introduction, Methodology, Previous work, Experiments, Conclusions and future work

So, to conclude…

Conclusions: combining different kinds of information has increased classification accuracy beyond a limit we could not surpass using single information sources. Both kinds of information have contributed equally to this improvement. We can expect better results by adding further sources of information.

I have shown a multimodal classification task for musical information, and that classification results improve over previous works by using a combination of harmonic and instrumental features. Both kinds of information have contributed equally to this improvement, so we can expect that adding further sources of information would lead to even better results.

Future work: obtain additional features from different sources of information; explore the relationships between feature subsets; use different combination schemes.

And this is precisely one of the things we are considering for future work. We are also working on finding, and taking advantage of, the hidden relationships that may exist between the different feature sets, so that we can, for example, design more effective feature selection methods. Finally, we would also like to test different classification schemes for combining the features or the classifier decisions.

Harmonic and Instrumental Information Fusion for Musical Genre Classification
Tomás Pérez-García, Carlos Pérez-Sancho, José M. Iñesta
Pattern Recognition and Artificial Intelligence Group, Department of Software and Computing Systems, University of Alicante