Postgraduate Department of Electrical Engineering PPGEE UFPR - Federal University of Paraná Luis Gustavo Weigert Machado


1 Hierarchical Classifiers Combination for Automatic Musical Information Retrieval
Luis Gustavo Weigert Machado (luis.gustavo.weigert@gmail.com)
Supervisor: Prof. PhD Alessandro Lameiras Koerich
Postgraduate Department of Electrical Engineering (PPGEE), UFPR - Federal University of Paraná

2 Abstract
The most serious problem in automatic music classification is the considerably low rate of correct classifications. We present a hierarchical combination of classifiers that aims to make musical style classification more robust by employing different features extracted from the music. The approach builds several classification stages, each using different features extracted from each music sample. In the first stage, a neural network is trained on the music samples; its output probabilities are evaluated to set thresholds based on the overall result, and a list of confusion classes is defined. Next, the confusion classes and the thresholds are passed to the second stage, which builds a binary classifier for each confusion using other features extracted from the same music. Finally, a third stage combines the results of the first and second stages.

3 MSD Dataset
The Million Song Dataset (MSD):
– 1 million contemporary popular music tracks, with 280 GB of data.
– Metadata (track ID, artist, date).
– Features (pitches, timbre and loudness) extracted using The Echo Nest API.

4 TU-WIEN MSD Benchmarks
Audio samples for the MSD tracks, linked through the unique track IDs; mostly 30- or 60-second snippets.
Several feature sets were extracted and published as separate datasets.
Ground-truth assignments provided by allmusic.com:
– Genre Dataset (MAGD): 422,714 labels.
– Top Genre Dataset (Top-MAGD): 406,427 labels.
– Style Dataset (MASD): 273,936 labels.
Data split into training (90%, 80%, 66%, 50%) and test sets.
Stratified and non-stratified partitions, with artist, album and time filters that avoid having the same artist, album or period in both the training and the test set.

5 TU-WIEN MSD Benchmarks

Style Dataset (MASD):

| Genre Name | Number of Songs |
|---|---|
| Big Band | 3,115 |
| Blues Contemporary | 6,874 |
| Country Traditional | 11,164 |
| Dance | 15,114 |
| Electronica | 10,987 |
| Experimental | 12,139 |
| Folk International | 9,849 |
| Gospel | 6,974 |
| Grunge Emo | 6,256 |
| Hip Hop Rap | 16,100 |
| Jazz Classic | 10,024 |
| Metal Alternative | 14,009 |
| Metal Death | 9,851 |
| Metal Heavy | 10,784 |
| Pop Contemporary | 13,624 |
| Pop Indie | 18,138 |
| Pop Latin | 7,699 |
| Punk | 9,610 |
| Reggae | 5,232 |
| RnB Soul | 6,238 |
| Rock Alternative | 12,717 |
| Rock College | 16,575 |
| Rock Contemporary | 16,530 |
| Rock Hard | 13,276 |
| Rock Neo Psychedelia | 11,057 |
| Total | 273,936 |

Features extracted from the MSD samples:

| # | Feature Set | Extractor | Dim | Deriv. |
|---|---|---|---|---|
| 1 | MFCCs | MARSYAS | 52 | |
| 2 | Chroma | MARSYAS | 48 | |
| 3 | Timbral | MARSYAS | 124 | |
| 4 | MFCCs | jAudio | 26 | 156 |
| 5 | Low-level spectral features (Spectral Centroid, Spectral Rolloff Point, Spectral Flux, Compactness, Spectral Variability, Root Mean Square, Zero Crossings, and Fraction of Low Energy Windows) | jAudio | 16 | 96 |
| 6 | Method of Moments | jAudio | 10 | 60 |
| 7 | Area Method of Moments | jAudio | 20 | 120 |
| 8 | Linear Predictive Coding | jAudio | 20 | 120 |
| 9 | Rhythm Patterns | rp extract | 1440 | |
| 10 | Statistical Spectrum Descriptors | rp extract | 168 | |
| 11 | Rhythm Histograms | rp extract | 60 | |
| 12 | Modulation Frequency Variance Descriptor | rp extract | 420 | |
| 13 | Temporal Statistical Spectrum Descriptors | rp extract | 1176 | |
| 14 | Temporal Rhythm Histograms | rp extract | 420 | |

Source: Alexander Schindler, Rudolf Mayer, and Andreas Rauber. Facilitating Comprehensive Benchmarking Experiments on the Million Song Dataset. ISMIR 2012.

6 Datasets Used
Assignments: MSD Allmusic Guide Style (273,936 patterns).
Partitions: stratified, 66% for training and 33% for testing.
Features:
– First Stage: Statistical Spectrum Descriptors (168 features).
– Second Stage: Area Method of Moments (20 features).
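The benchmark ships its own predefined partitions, so the following is only a minimal sketch of what a stratified 66% split does: each class is cut separately, so class proportions are roughly preserved in both parts. The function name, the seed and the toy labels are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.66, seed=0):
    """Return (train_idx, test_idx), splitting every class with the same fraction."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random order inside the class
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return train_idx, test_idx


# Toy labels (hypothetical): 100 "Punk" tracks and 50 "Reggae" tracks.
labels = ["Punk"] * 100 + ["Reggae"] * 50
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Both toy classes end up with about two thirds of their tracks in the training part, which is the property a stratified partition is meant to guarantee.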

7 Proposal
Training
– First Stage: Train an MLP neural network with the style assignments as outputs. Calculate a threshold for each class using the output probabilities. Find the most confused classes using the confusion matrix and build a list of confused classes.
– Second Stage: Train binary SVM classifiers for the list of confused classes, using a different feature set.
– Third Stage: Train binary classifiers again, now as 2-class MLP neural networks, with the same configuration as in the second stage.
Evaluating
– First Stage: Get the MAX 1 and MAX 2 output probabilities. Compare MAX 1 with the threshold to reject, classify or send the sample to the second stage.
– Second Stage: Get MAX 3. Search for a binary classifier, and compare with the threshold and MAX 1 to reject, classify or send the sample to the third stage.
– Third Stage: Get MAX 4 and combine its probability with MAX 3, using the threshold to reject or classify.
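The first-stage evaluation can be sketched as a three-way routing decision. The slides do not spell out the exact rule, so the logic below is an assumption: accept when MAX 1 reaches the threshold of its class, send to the second stage when the two top classes form a known confused pair, and reject otherwise. The function name, the data layout and the numbers are hypothetical.

```python
def route_first_stage(probs, thresholds, confused_pairs):
    """Route one sample using first-stage class probabilities.

    probs: dict mapping class name to output probability.
    thresholds: dict mapping class name to its acceptance threshold.
    confused_pairs: set of frozensets, each holding two confused class names.
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    max1, max2 = ranked[0], ranked[1]  # the two most probable classes

    if probs[max1] >= thresholds[max1]:
        return ("classify", max1)
    if frozenset((max1, max2)) in confused_pairs:
        # Assumed rule: a binary classifier exists for this pair.
        return ("second_stage", (max1, max2))
    return ("reject", None)


thresholds = {"Punk": 0.7, "Grunge Emo": 0.7, "Reggae": 0.6}
confused_pairs = {frozenset(("Punk", "Grunge Emo"))}

confident = route_first_stage(
    {"Punk": 0.8, "Grunge Emo": 0.15, "Reggae": 0.05}, thresholds, confused_pairs)
ambiguous = route_first_stage(
    {"Punk": 0.5, "Grunge Emo": 0.4, "Reggae": 0.1}, thresholds, confused_pairs)
unknown = route_first_stage(
    {"Reggae": 0.45, "Punk": 0.35, "Grunge Emo": 0.2}, thresholds, confused_pairs)
print(confident, ambiguous, unknown)
```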

8 Training the First Stage
Classifier: MLP neural network with 168 inputs, 100 hidden-layer units and 25 outputs.
Features: Statistical Spectrum Descriptors.
Partition: 66% of the dataset.
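Once the first-stage network is trained, the proposal derives a list of confused classes from the confusion matrix. A minimal sketch of that step, assuming confusions are counted symmetrically per unordered class pair and that the list keeps the `top_k` most frequent pairs (both choices are assumptions, as are the toy labels):

```python
from collections import Counter


def most_confused_pairs(true_labels, predicted_labels, top_k=3):
    """Count misclassifications per unordered class pair and return the top ones."""
    confusions = Counter()
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth != predicted:
            confusions[frozenset((truth, predicted))] += 1
    return confusions.most_common(top_k)


# Toy predictions (hypothetical) on six validation samples.
true_labels = ["Punk", "Punk", "Punk", "Reggae", "Dance", "Dance"]
predicted = ["Grunge Emo", "Grunge Emo", "Punk", "Reggae", "Electronica", "Dance"]

pairs = most_confused_pairs(true_labels, predicted, top_k=1)
print(pairs)
```

Each returned pair would then get its own binary classifier in the second stage.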

9 Training the First Stage (figure; not captured in the transcript)

10 Training the Second Stage
Classifier: 2-class SVM, with grid search to estimate the cost and γ parameters.
Features: Area Method of Moments.
Partition: 66% of the dataset.
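A grid search simply scores every combination of candidate parameters and keeps the best one. The sketch below assumes the second parameter is the RBF kernel width (gamma) and uses a power-of-two grid, a common convention for SVMs; neither is confirmed by the slides. The `toy_score` function is a stand-in for the cross-validated accuracy that an SVM library would supply in practice.

```python
import math
from itertools import product


def grid_search(score_fn, costs, gammas):
    """Return (best_score, best_cost, best_gamma) over the full parameter grid."""
    best = None
    for cost, gamma in product(costs, gammas):
        score = score_fn(cost, gamma)
        if best is None or score > best[0]:
            best = (score, cost, gamma)
    return best


# Assumed search grid: exponentially spaced values.
costs = [2.0 ** e for e in range(-5, 16, 2)]
gammas = [2.0 ** e for e in range(-15, 4, 2)]


def toy_score(cost, gamma):
    """Hypothetical score surface that peaks at cost=8 and gamma=0.125."""
    return -abs(math.log2(cost) - 3) - abs(math.log2(gamma) + 3)


best_score, best_cost, best_gamma = grid_search(toy_score, costs, gammas)
print(best_cost, best_gamma)
```

In the proposal this search would be repeated for each binary classifier, since every confused pair may need different parameters.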

11 Training the Second Stage
Train each binary classifier in the list of binary classifiers.

12 Training the Third Stage
Classifiers: 2-class MLP neural network, plus the same 2-class SVM used in the second stage.
Features: Area Method of Moments, the same as in the second stage.
2-class MLP NN: train each binary classifier in the list, following the same training method adopted in the second stage.

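13 Evaluating the First Stage (figure; not captured in the transcript)

14 Evaluating the Second Stage (figure; not captured in the transcript)

15 Evaluating the Third Stage (figure; not captured in the transcript)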
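The proposal states that the third stage combines the probability from the 2-class MLP (MAX 4) with the one from the second-stage SVM (MAX 3) and then applies a threshold. The combination rule is not given in the transcript, so the sketch below assumes a plain average of the two probability estimates; the function name and the numbers are hypothetical.

```python
def combine_and_decide(svm_probs, mlp_probs, threshold):
    """Average two binary classifiers' probabilities, then classify or reject."""
    # Assumed combination rule: arithmetic mean per class.
    combined = {name: (svm_probs[name] + mlp_probs[name]) / 2 for name in svm_probs}
    winner = max(combined, key=combined.get)
    if combined[winner] >= threshold:
        return ("classify", winner)
    return ("reject", None)


svm_probs = {"Punk": 0.6, "Grunge Emo": 0.4}  # second-stage SVM output
mlp_probs = {"Punk": 0.9, "Grunge Emo": 0.1}  # third-stage MLP output

accepted = combine_and_decide(svm_probs, mlp_probs, threshold=0.7)
rejected = combine_and_decide(svm_probs, mlp_probs, threshold=0.8)
print(accepted, rejected)
```

Other rules, such as a product or a maximum, would fit the same interface by changing the single line that builds `combined`.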

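16 Results

All values are percentages of the total number of test patterns. "1st" columns refer to the first stage and "2nd" columns to the second stage.

| Class | 1st Classified TP | 1st Classified FP | 1st Rejected TP | 1st Rejected FP | 1st Sent to 2nd Stage TP | 1st Sent to 2nd Stage FP | 2nd Classified TP | 2nd Classified FP | 2nd Rejected TP | 2nd Rejected FP | 2nd Sent to 3rd Stage TP | 2nd Sent to 3rd Stage FP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Big Band | 0.000 | 0.345 | 0.000 | 0.332 | 0.000 | 0.463 | 0.000 | 0.155 | 0.005 | 0.000 | 0.303 | 0.000 |
| Blues Contemporary | 0.128 | 0.575 | 0.031 | 0.854 | 0.063 | 0.862 | 0.005 | 0.263 | 0.029 | 0.000 | 0.627 | 0.000 |
| Country Traditional | 1.430 | 0.706 | 0.188 | 0.589 | 0.419 | 0.742 | 0.026 | 0.297 | 0.025 | 0.000 | 0.801 | 0.012 |
| Dance | 0.481 | 2.476 | 0.159 | 0.655 | 0.229 | 1.506 | 0.130 | 0.325 | 0.154 | 0.000 | 0.699 | 0.427 |
| Electronica | 0.099 | 1.648 | 0.091 | 0.918 | 0.105 | 1.121 | 0.028 | 0.331 | 0.097 | 0.000 | 0.770 | 0.000 |
| Experimental | 0.023 | 1.408 | 0.013 | 1.332 | 0.019 | 1.623 | 0.009 | 0.613 | 0.034 | 0.000 | 0.987 | 0.000 |
| Folk International | 0.011 | 1.217 | 0.012 | 0.879 | 0.001 | 1.481 | 0.000 | 0.454 | 0.056 | 0.000 | 0.972 | 0.000 |
| Gospel | 0.000 | 1.211 | 0.000 | 0.478 | 0.000 | 0.862 | 0.000 | 0.254 | 0.038 | 0.000 | 0.570 | 0.000 |
| Grunge Emo | 0.000 | 1.250 | 0.000 | 0.401 | 0.000 | 0.630 | 0.000 | 0.336 | 0.013 | 0.000 | 0.281 | 0.000 |
| Hip Hop Rap | 4.465 | 0.289 | 0.243 | 0.123 | 0.514 | 0.259 | 0.051 | 0.110 | 0.000 | 0.066 | 0.535 | 0.011 |
| Jazz Classic | 0.595 | 0.524 | 0.356 | 0.582 | 0.532 | 1.070 | 0.151 | 0.360 | 0.050 | 0.000 | 0.992 | 0.049 |
| Metal Alternative | 2.075 | 1.074 | 0.196 | 0.565 | 0.529 | 0.683 | 0.397 | 0.177 | 0.016 | 0.000 | 0.548 | 0.074 |
| Metal Death | 0.964 | 1.267 | 0.017 | 0.304 | 0.549 | 0.509 | 0.104 | 0.314 | 0.002 | 0.000 | 0.631 | 0.008 |
| Metal Heavy | 0.271 | 1.937 | 0.024 | 0.491 | 0.094 | 1.098 | 0.067 | 0.493 | 0.009 | 0.000 | 0.350 | 0.274 |
| Pop Contemporary | 0.413 | 2.308 | 0.031 | 0.624 | 0.203 | 1.410 | 0.049 | 0.379 | 0.108 | 0.000 | 0.828 | 0.249 |
| Pop Indie | 0.838 | 1.936 | 0.459 | 1.124 | 0.195 | 2.051 | 0.129 | 0.666 | 0.055 | 0.000 | 0.946 | 0.450 |
| Pop Latin | 0.078 | 1.172 | 0.019 | 0.605 | 0.039 | 0.897 | 0.000 | 0.204 | 0.069 | 0.000 | 0.663 | 0.000 |
| Punk | 0.491 | 1.341 | 0.103 | 0.557 | 0.168 | 0.854 | 0.012 | 0.519 | 0.012 | 0.000 | 0.458 | 0.021 |
| Reggae | 0.026 | 0.973 | 0.014 | 0.434 | 0.012 | 0.454 | 0.000 | 0.110 | 0.041 | 0.000 | 0.315 | 0.000 |
| RnB Soul | 0.000 | 0.995 | 0.000 | 0.449 | 0.000 | 0.844 | 0.000 | 0.239 | 0.039 | 0.000 | 0.566 | 0.000 |
| Rock Alternative | 0.000 | 2.209 | 0.000 | 0.964 | 0.000 | 1.468 | 0.000 | 0.547 | 0.028 | 0.000 | 0.893 | 0.000 |
| Rock College | 0.079 | 2.501 | 0.004 | 1.488 | 0.025 | 1.949 | 0.009 | 0.750 | 0.034 | 0.000 | 1.182 | 0.000 |
| Rock Contemporary | 1.152 | 1.821 | 0.143 | 0.792 | 0.394 | 1.730 | 0.278 | 0.262 | 0.074 | 0.000 | 0.457 | 1.053 |
| Rock Hard | 0.161 | 1.798 | 0.012 | 1.194 | 0.111 | 1.581 | 0.075 | 0.642 | 0.042 | 0.000 | 0.933 | 0.000 |
| Rock Neo Psychedelia | 0.000 | 1.990 | 0.000 | 0.796 | 0.000 | 1.261 | 0.000 | 0.563 | 0.031 | 0.000 | 0.666 | 0.000 |
| Total | 13.780 | 34.968 | 2.116 | 17.529 | 4.200 | 27.408 | 1.518 | 9.364 | 1.061 | 0.066 | 16.974 | 2.625 |

Column definitions:
– Classified TP: samples classified correctly.
– Classified FP: samples classified wrongly.
– Rejected TP: samples rejected that would have been classified wrongly.
– Rejected FP: samples rejected that would have been classified correctly.
– Sent to 2nd Stage TP: samples sent to the second stage that would have been classified wrongly.
– Sent to 2nd Stage FP: samples sent to the second stage that would have been classified correctly.
– Sent to 3rd Stage TP: samples sent to the third stage that would have been classified wrongly.
– Sent to 3rd Stage FP: samples sent to the third stage that would have been classified correctly.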
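As a consistency check on the totals row, the six first-stage outcomes should cover all test patterns, and the six second-stage outcomes should add up to the share that the first stage sent onward. The short script below redoes that arithmetic with the totals from the results table; the variable names are chosen here for illustration.

```python
# Totals row of the results table, in percent of all test patterns.
first_stage = {
    "classified_tp": 13.780,
    "classified_fp": 34.968,
    "rejected_tp": 2.116,
    "rejected_fp": 17.529,
    "sent_tp": 4.200,
    "sent_fp": 27.408,
}
second_stage = {
    "classified_tp": 1.518,
    "classified_fp": 9.364,
    "rejected_tp": 1.061,
    "rejected_fp": 0.066,
    "sent_tp": 16.974,
    "sent_fp": 2.625,
}

# Every test pattern is classified, rejected or forwarded by the first stage.
first_total = sum(first_stage.values())

# The second stage only sees what the first stage forwarded.
sent_to_second = first_stage["sent_tp"] + first_stage["sent_fp"]
second_total = sum(second_stage.values())

# Share of first-stage accepted samples that were classified correctly.
accepted = first_stage["classified_tp"] + first_stage["classified_fp"]
accuracy_on_accepted = first_stage["classified_tp"] / accepted

print(round(first_total, 3), round(sent_to_second, 3), round(second_total, 3))
print(round(accuracy_on_accepted, 4))
```

The first-stage total lands within rounding error of 100%, and the second-stage total matches the forwarded share, so the reconstructed columns are mutually consistent.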

