Musical Style Classification

Michael Broxton MAS.622J

The Problem
How do we teach a machine to classify the genre or style of a piece of music?
Useful for:
  Automatic playlist generation
  Metadata repair/retrieval
  Music recommendation engines
Style examples: Dub, College Rock, IDM, Trip-Hop
Related to audio fingerprinting and other clustering techniques.

Some Inherent Difficulties...
Musical styles and genres are subjective, cultural labels (example: Techno vs. Electronica).
Some styles are large and general, while others are small and specific. This may bias a classifier toward the larger styles.
The feature space is reasonably sized, but the number of data points in it may be very large.
The audio spectrum of a single song will cluster, but this is not necessarily true for all songs on an album, let alone for an entire style of music.

Data Set
Class dataset: Music Database (provided by Brian Whitman)
330 albums labeled with 141 unique styles
Mel-Frequency Cepstral Coefficients (MFCC): psycho-acoustically weighted DCT, 13 dimensions, sampled at 5 Hz
“Penny” (FFT of the MFCC): 26 dimensions, sampled at 1 Hz
All told, this is ~350 MB of feature data!
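The slides describe Penny only as the "FFT of the MFCC", with 26 dimensions at 1 Hz. The sketch below shows one hypothetical way those numbers can line up; it is not the dataset's actual recipe. It assumes non-overlapping 1-second windows (5 MFCC frames at 5 Hz), takes an FFT over time for each of the 13 coefficients, and keeps the two non-DC modulation bins, giving 13 x 2 = 26 values per second. The function name `penny_like` and the window choice are illustrative assumptions.

```python
import numpy as np

def penny_like(mfcc: np.ndarray, frames_per_window: int = 5) -> np.ndarray:
    """Hypothetical Penny-style modulation features (ASSUMED recipe, not the original).

    mfcc: array of shape (n_frames, 13), sampled at 5 Hz.
    Returns an array of shape (n_windows, 26): one row per non-overlapping 1 s window.
    """
    n_frames, n_coeffs = mfcc.shape
    n_windows = n_frames // frames_per_window
    # Drop trailing frames that do not fill a complete window.
    windows = mfcc[: n_windows * frames_per_window].reshape(
        n_windows, frames_per_window, n_coeffs
    )
    # FFT along the time axis of each window, separately for every coefficient.
    # For 5 frames, rfft yields 3 bins: DC plus 2 modulation frequencies.
    spectrum = np.abs(np.fft.rfft(windows, axis=1))
    # ASSUMPTION: keep only the non-DC bins, so 2 bins x 13 coefficients = 26 dims.
    modulation = spectrum[:, 1:, :]
    return modulation.reshape(n_windows, -1)
```

For example, 52 MFCC frames (about 10 s) produce a `(10, 26)` array; the two leftover frames are discarded.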

Preprocessing
Remove the first and last minute of each song
Reduce the number of data points to about 200 per album
Split the data into three sets:
  1/3 of the albums are set aside in album_test
  1/3 of the tracks in the remaining albums are set aside in song_test
  All remaining data is placed into train
Normalize each feature
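A minimal sketch of these preprocessing steps, assuming each track is a dict with hypothetical keys `album` and `frames` (an array of shape `(n_frames, n_features)` sampled at `rate_hz`). The slides do not say how the ~200 points per album were chosen or which statistics were used for normalization; evenly spaced subsampling and z-scoring with train-set mean and standard deviation are assumptions here.

```python
import numpy as np

def preprocess(tracks, rate_hz, points_per_album=200, seed=0):
    """Trim, subsample, split and normalize. Mutates each track's "frames" in place."""
    rng = np.random.default_rng(seed)
    # 1. Drop the first and last minute of every track (empty if shorter than 2 min).
    skip = int(60 * rate_hz)
    for t in tracks:
        t["frames"] = t["frames"][skip:len(t["frames"]) - skip]
    # 2. ASSUMPTION: keep roughly points_per_album evenly spaced frames per album.
    by_album = {}
    for t in tracks:
        by_album.setdefault(t["album"], []).append(t)
    for album_tracks in by_album.values():
        per_track = max(1, points_per_album // len(album_tracks))
        for t in album_tracks:
            n = len(t["frames"])
            idx = np.linspace(0, n - 1, num=min(per_track, n)).astype(int)
            t["frames"] = t["frames"][idx]
    # 3. Hold out 1/3 of the albums entirely (album_test).
    albums = list(by_album)
    rng.shuffle(albums)
    n_album_test = len(albums) // 3
    album_test = [t for a in albums[:n_album_test] for t in by_album[a]]
    remaining = [t for a in albums[n_album_test:] for t in by_album[a]]
    # 4. Hold out 1/3 of the remaining tracks (song_test); everything else is train.
    order = rng.permutation(len(remaining))
    n_song_test = len(remaining) // 3
    song_test = [remaining[i] for i in order[:n_song_test]]
    train = [remaining[i] for i in order[n_song_test:]]
    # 5. ASSUMPTION: z-score every feature using train-set statistics only.
    X_train = np.vstack([t["frames"] for t in train])
    mean, std = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    for t in tracks:
        t["frames"] = (t["frames"] - mean) / std
    return train, song_test, album_test
```

Splitting whole albums out first means album_test measures generalization to unseen artists/albums, while song_test only measures generalization to unseen tracks from albums the classifier has already seen.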

Classification Schemes
k-Nearest Neighbors (k-NN)
  Non-parametric, simple to implement
  Leaves most of the processing to the classification stage
3-layer Neural Network
  Non-linear
  Most of the processing occurs during the learning phase
Both techniques easily handle multiple labels per data point.
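To make "multiple labels per data point" concrete, here is a minimal multi-label k-NN sketch, assuming labels are stored as multi-hot vectors (one column per style) and that the prediction is the average label vector of the k nearest training points under Euclidean distance. This is one common formulation, not necessarily the exact one used in the project; the MSE helper shows how a score like the ones reported below can be computed from such predictions.

```python
import numpy as np

def knn_predict(X_train, Y_train, X_query, k=10):
    """Multi-label k-NN (ASSUMED formulation): average the multi-hot label
    vectors of the k nearest training points. Returns weights in [0, 1]."""
    # Squared Euclidean distances, shape (n_query, n_train).
    # Brute force: fine for a sketch, memory grows as n_query * n_train * n_features.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Y_train[nearest] has shape (n_query, k, n_styles); average over neighbors.
    return Y_train[nearest].mean(axis=1)

def mse(Y_pred, Y_true):
    """Mean squared error over every (data point, style) entry."""
    return float(np.mean((Y_pred - Y_true) ** 2))
```

All of the work happens at query time, which matches the slide's point that k-NN leaves most of the processing to the classification stage.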

Results: MSE for k-NN (k = 10)
              MFCC     Penny
song_test     0.0263   0.0273
album_test    0.0274   0.0285

Results: MSE for Neural Net (100 hidden nodes)
              MFCC     Penny
song_test     0.0228   0.0235
album_test    0.0253   0.0295
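The slides give only the architecture: a 3-layer network with 100 hidden nodes. The sketch below is a minimal NumPy version under stated assumptions: tanh hidden units, sigmoid outputs (one per style, so several styles can be active at once), full-batch gradient descent on mean squared error, and hypothetical defaults for `lr` and `epochs`. It illustrates the technique and is not the network that produced the numbers above.

```python
import numpy as np

def train_mlp(X, Y, n_hidden=100, lr=0.5, epochs=2000, seed=0):
    """ASSUMED setup: input -> tanh hidden layer -> sigmoid outputs,
    trained by full-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(scale=1 / np.sqrt(n_in), size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=1 / np.sqrt(n_hidden), size=(n_hidden, n_out))
    b2 = np.zeros(n_out)
    n = len(X)
    for _ in range(epochs):
        # Forward pass.
        H = np.tanh(X @ W1 + b1)
        P = 1 / (1 + np.exp(-(H @ W2 + b2)))
        # Gradient of mean((P - Y)**2) w.r.t. the output pre-activation.
        dZ2 = 2 * (P - Y) * P * (1 - P) / (n * n_out)
        # Backpropagate through W2 (before updating it) and the tanh derivative.
        dH = (dZ2 @ W2.T) * (1 - H ** 2)
        W2 -= lr * H.T @ dZ2
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * X.T @ dH
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def mlp_predict(params, X):
    """Per-style output weights in (0, 1)."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-(H @ W2 + b2)))
```

In contrast to k-NN, the cost is paid in the training loop; prediction is just two matrix products.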

Does PCA Help?
Dimensionality reduction (retaining 99% of the variance):
  MFCC: 12/13 features
  Penny: 12/26 features
MSE of the classifier (using Penny features, song_test data set):
              PCA      No PCA
Neural Net    0.0219   0.0235
k-NN          0.0265   0.0273
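"Retaining 99% of the variance" means keeping the smallest number of principal components whose explained-variance ratios sum to at least 0.99. A minimal sketch via SVD of the centered data is shown below; the function names are hypothetical, and the slides do not say which PCA implementation was used.

```python
import numpy as np

def fit_pca(X, variance_to_keep=0.99):
    """Return the data mean and the fewest principal directions that
    together explain at least `variance_to_keep` of the total variance."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions; S**2 is proportional to their variance.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    # First index where the cumulative ratio reaches the target, plus one.
    n_components = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    n_components = min(n_components, len(S))
    return mean, Vt[:n_components]

def apply_pca(X, mean, components):
    """Project data onto the retained principal directions."""
    return (X - mean) @ components.T
```

The reduced vectors (12 dimensions instead of 26 for Penny) make every k-NN distance and every network forward pass cheaper, which fits the observation that the main gain from PCA was speed.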

Compared to a human, how does it do?
Ben Harper - Diamonds on the Inside
Actual labels: Jam Bands, Adult Alternative Pop-Rock, Singer-Songwriter, Alternative Pop-Rock
k-NN labels: Alternative Pop-Rock, Indie Rock, Electronic, Trip-Hop, Underground Rap, Club-Dance
Neural-net labels: Alternative Pop-Rock, Indie Rock
(Weights are scaled from 0.0 to 1.0)

Pretty bad.
Rammstein - Sehnsucht
Actual labels: Heavy Metal, Industrial Metal, Alternative Metal, Progressive Metal
k-NN labels: Indie Rock, Alternative Pop-Rock, IDM, Electronic, Post-Rock-Experimental, Indie Pop
Neural-net labels: Alternative Pop-Rock, Indie Rock
(Weights are scaled from 0.0 to 1.0)
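The label lists above come from each classifier's per-style weights, rescaled to the range 0.0 to 1.0. A minimal sketch of producing such a ranked list for one song, assuming the rescaling divides by the song's largest weight (the slides do not say whether max-scaling or min-max scaling was used) and a hypothetical cutoff of six labels:

```python
import numpy as np

def top_labels(weights, style_names, n=6):
    """ASSUMPTION: rescale one song's label weights so the largest becomes 1.0,
    then return the n highest-weighted (style, weight) pairs."""
    peak = weights.max()
    scaled = weights / peak if peak > 0 else weights
    # Indices sorted by descending weight, truncated to the top n.
    order = np.argsort(scaled)[::-1][:n]
    return [(style_names[i], float(scaled[i])) for i in order]
```

For example, weights `[0.2, 0.5, 0.1]` for styles `a`, `b`, `c` with `n=2` give `b` at 1.0 followed by `a` at 0.4.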

The Bias Problem
[Figure: percentage of total classifications/labels per style]
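The figure compares how the total label mass is distributed across styles. A minimal sketch of that statistic, assuming predictions are turned into hard assignments by keeping each song's top `n` styles (the cutoff used for the original figure is not stated). Applying `label_share` to the true multi-hot labels as well shows whether the classifiers over-assign the styles that are already large.

```python
import numpy as np

def label_share(label_matrix):
    """Percentage of all label assignments that go to each style.
    label_matrix: (n_items, n_styles) with 1 where a style is assigned."""
    totals = label_matrix.sum(axis=0)
    return 100.0 * totals / totals.sum()

def top_n_assignments(weights, n=2):
    """ASSUMPTION: hard-assign each song's n highest-weighted styles."""
    assigned = np.zeros_like(weights, dtype=int)
    # Column indices of the n largest weights in each row.
    top = np.argsort(weights, axis=1)[:, ::-1][:, :n]
    np.put_along_axis(assigned, top, 1, axis=1)
    return assigned
```

If one style receives a much larger share of predicted assignments than of true labels, the classifier is leaning on that style's size rather than on the audio.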

Conclusions
k-NN and the neural network performed comparably on the classification task.
PCA led to some improvement with Penny features, mostly in algorithm speed.
The classifiers achieved low MSE primarily by favoring the larger styles.

Future Work (i.e. if there had only been more time...)
Filter out/normalize dominant styles
Gaussian Mixture Model
Combine MFCC and Penny into one feature vector
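One hypothetical way to "normalize dominant styles" is to divide each predicted style weight by that style's frequency in the training labels, so that a large style must score well above its base rate to rank first. This is a swapped-in prior-correction idea offered as a sketch, not something the project implemented; the function name and the `eps` guard are illustrative.

```python
import numpy as np

def prior_normalized(weights, Y_train, eps=1e-9):
    """HYPOTHETICAL correction: divide each style's predicted weight by its
    frequency in the training labels, then rescale each row so its max is 1.

    weights: (n_songs, n_styles) predicted weights; Y_train: multi-hot labels."""
    prior = Y_train.mean(axis=0)
    adjusted = weights / (prior + eps)
    # Guard against all-zero rows when rescaling.
    row_max = adjusted.max(axis=1, keepdims=True)
    return adjusted / np.maximum(row_max, eps)
```

With a training set where style 0 covers 90% of the labels and style 1 covers 10%, raw weights `[0.5, 0.2]` become roughly `[0.28, 1.0]`, so the rarer style ranks first. Whether this helps in practice would need to be checked on album_test, since it can also over-promote rare styles.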
