Musical Style Classification

Michael Broxton, MAS.622J

The Problem
How do we teach a machine to classify the genre or style of a piece of music?
Useful for:
- Automatic playlist generation
- Metadata repair/retrieval
- Music recommendation engines
Style examples: Dub, College Rock, IDM, Trip-Hop.
The task is related to audio fingerprinting and to clustering techniques.

Some Inherent Difficulties
- Musical styles and genres are subjective, cultural labels (example: Techno vs. Electronica).
- Some styles are large and general while others are small and specific, which may bias a classifier towards the larger styles.
- The feature space is of moderate dimension, but the number of data points in it can be very large.
- The audio spectrum of a single song tends to cluster, but songs on the same album, let alone across an entire style, do not necessarily cluster together.

Data Set
Class dataset: music database provided by Brian Whitman, 330 albums labeled with 141 unique styles.
Features:
- Mel-Frequency Cepstral Coefficients (MFCC): a psycho-acoustically weighted DCT; 13 dimensions, sampled at 5 Hz.
- "Penny" (FFT of the MFCC): 26 dimensions, sampled at 1 Hz.
All told, this is ~350 MB of feature data.
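The slide only says that Penny is an "FFT of the MFCC" with 26 dimensions at 1 Hz. The sketch below is a hypothetical construction that merely matches those stated sizes and rates: each 1-second block of five 5 Hz MFCC frames is transformed with an FFT along time, and the magnitudes of the two non-DC bins are kept for each of the 13 coefficients (13 x 2 = 26). The function name penny_like and this exact recipe are assumptions, not the real Penny definition.

```python
import numpy as np

MFCC_RATE_HZ = 5    # from the slide: MFCC frames at 5 Hz
PENNY_RATE_HZ = 1   # from the slide: Penny frames at 1 Hz

def penny_like(mfcc: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL 'FFT of the MFCC' feature.

    mfcc: array of shape (n_frames, 13), sampled at 5 Hz.
    Returns shape (n_frames // 5, 26): one row per second of audio.
    ASSUMPTION: per 1-second block, FFT along time and keep the magnitudes
    of the two non-DC bins for every coefficient. The real recipe may differ.
    """
    hop = MFCC_RATE_HZ // PENNY_RATE_HZ               # 5 MFCC frames per Penny frame
    n_blocks = mfcc.shape[0] // hop                   # incomplete trailing block is dropped
    blocks = mfcc[: n_blocks * hop].reshape(n_blocks, hop, mfcc.shape[1])
    spectrum = np.abs(np.fft.rfft(blocks, axis=1))    # shape (n_blocks, 3, 13)
    return spectrum[:, 1:, :].reshape(n_blocks, -1)   # drop DC bin -> (n_blocks, 26)
```

For example, one minute of MFCC frames (300 x 13) yields 60 Penny-like rows of 26 values each.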

Preprocessing
1. Remove the first and last minute of each song.
2. Reduce the number of data points to about 200 per album.
3. Split the data into three sets:
   - 1/3 of the albums are set aside in album_test.
   - 1/3 of the tracks in the remaining albums are set aside in song_test.
   - All remaining data is placed into train.
4. Normalize each feature.
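The preprocessing steps can be sketched as follows. This is a minimal sketch under stated assumptions: albums are given as a dict mapping an album id to a list of per-track (n_frames, d) arrays; the ~200-point budget is split evenly across an album's tracks; frames are subsampled at random; and "normalize" is taken to mean a z-score using training-set statistics only. None of these choices is confirmed by the slides, and all function names are hypothetical.

```python
import numpy as np

FRAME_RATE_HZ = 5        # MFCC frame rate from the Data Set slide
POINTS_PER_ALBUM = 200   # "about 200 per album"

def trim_song(frames, rate_hz=FRAME_RATE_HZ, skip_s=60):
    """Drop the first and last minute of one song's (n_frames, d) feature array."""
    n = int(skip_s * rate_hz)
    return frames[n:max(n, len(frames) - n)]   # empty if the song is under 2 minutes

def reduce_album(tracks, rng, budget=POINTS_PER_ALBUM):
    """Randomly subsample an album's tracks so the album keeps about `budget` frames.
    ASSUMPTION: the budget is split evenly across the album's tracks."""
    per_track = max(1, budget // max(1, len(tracks)))
    reduced = []
    for t in tracks:
        if len(t) > per_track:
            keep = np.sort(rng.choice(len(t), size=per_track, replace=False))
            t = t[keep]
        reduced.append(t)
    return reduced

def split(albums, rng):
    """albums: dict album_id -> list of tracks. Returns (train, song_test, album_test)."""
    ids = sorted(albums)
    ids = [ids[i] for i in rng.permutation(len(ids))]
    n_album_test = len(ids) // 3                      # 1/3 of albums -> album_test
    album_test = [t for a in ids[:n_album_test] for t in albums[a]]
    train, song_test = [], []
    for a in ids[n_album_test:]:
        order = rng.permutation(len(albums[a]))
        n_song_test = len(albums[a]) // 3             # 1/3 of remaining tracks -> song_test
        song_test += [albums[a][i] for i in order[:n_song_test]]
        train += [albums[a][i] for i in order[n_song_test:]]
    return train, song_test, album_test

def preprocess(albums, seed=0):
    rng = np.random.default_rng(seed)
    albums = {a: reduce_album([trim_song(t) for t in tracks], rng)
              for a, tracks in albums.items()}
    train, song_test, album_test = split(albums, rng)
    # Normalize each feature with training-set statistics only (z-score; an assumption).
    stacked = np.vstack(train)
    mu, sigma = stacked.mean(axis=0), stacked.std(axis=0)
    sigma[sigma == 0] = 1.0                           # avoid dividing by zero for constant features
    norm = lambda ts: [(t - mu) / sigma for t in ts]
    return norm(train), norm(song_test), norm(album_test)
```

Splitting whole albums into album_test (rather than only tracks) is what makes that set a harder test: none of its songs share an album with anything in train.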

Classification Schemes
- k-Nearest Neighbors: non-parametric and simple to implement; leaves most of the processing to the classification stage.
- 3-layer neural network: non-linear; most of the processing occurs during the learning phase.
Both techniques easily handle multiple labels per data point.
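One plausible way for k-NN to handle multiple labels per data point is to represent each training frame's styles as a 0/1 vector and average the vectors of the k nearest neighbours, giving a weight in [0, 1] for every style; the MSE against the true label vector then measures the error. The slides do not say how the multi-label k-NN or the MSE were computed, so the function names and this scheme are assumptions; k = 10 comes from the k-NN results slide.

```python
import numpy as np

def knn_label_weights(train_x, train_y, query_x, k=10):
    """Score every style for each query frame by averaging the label vectors
    of its k nearest training frames (Euclidean distance).

    train_x: (n_train, d) normalized features
    train_y: (n_train, n_styles) 0/1 style labels (a frame may carry several)
    query_x: (n_query, d)
    Returns (n_query, n_styles) weights in [0, 1].
    """
    # Squared Euclidean distance between every query and every training frame.
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest frames
    return train_y[nearest].mean(axis=1)      # fraction of neighbours carrying each style

def mse(pred, target):
    """Mean squared error averaged over all frames and all styles."""
    return float(np.mean((pred - target) ** 2))
```

Because each style weight is an average over all 141 styles, a classifier that outputs small weights everywhere already gets a low MSE, which is worth keeping in mind when reading the results.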

Results: MSE for k-NN (k = 10)
         song_test   album_test
MFCC     0.0263      0.0274
Penny    0.0273      0.0285

Results: MSE for Neural Net (100 hidden nodes)
         song_test   album_test
MFCC     0.0228      0.0253
Penny    0.0235      0.0295
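A 3-layer network with 100 hidden nodes (input, hidden, output) can be sketched as below. The slides give only the layer count and hidden size; the tanh hidden units, one sigmoid output per style, full-batch gradient descent on mean squared error, the learning rate, and the class name ThreeLayerNet are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerNet:
    """Input -> hidden (tanh) -> output (sigmoid, one unit per style), trained on MSE.
    ASSUMPTION: activations, initialisation and optimiser are illustrative choices."""

    def __init__(self, n_in, n_out, n_hidden=100, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return h, sigmoid(h @ self.W2 + self.b2)

    def predict(self, x):
        return self.forward(x)[1]

    def fit(self, x, y, epochs=500, lr=0.5):
        for _ in range(epochs):
            h, out = self.forward(x)
            # Gradient of mean((out - y)**2) w.r.t. the output pre-activation.
            d_out = 2.0 * (out - y) / y.size * out * (1.0 - out)
            # Back-propagate through W2 and the tanh hidden layer.
            d_h = (d_out @ self.W2.T) * (1.0 - h ** 2)
            self.W2 -= lr * h.T @ d_out
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * x.T @ d_h
            self.b1 -= lr * d_h.sum(axis=0)
        return self
```

Usage: `ThreeLayerNet(n_in=26, n_out=141).fit(train_x, train_y)` would match the Penny feature size and the 141 styles, with predict returning per-style weights in [0, 1].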

Does PCA Help?
Dimensionality reduction (retaining 99% of the variance):
- MFCC: 12 of 13 features
- Penny: 12 of 26 features
MSE of the classifier (using Penny features, song_test data set):
          Neural Net   k-NN
PCA       0.0219       0.0265
No PCA    0.0235       0.0273
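Choosing the number of principal components that retain 99% of the variance can be done from the singular values of the centred data: keep the smallest number of components whose cumulative share of the squared singular values reaches 0.99. This is a minimal NumPy sketch of that standard procedure; the function names are hypothetical, and the slides do not say which PCA implementation was used.

```python
import numpy as np

def pca_fit(x, keep_variance=0.99):
    """Return (mean, components), keeping the fewest principal components whose
    cumulative explained variance reaches keep_variance."""
    mu = x.mean(axis=0)
    # Rows of vt are the principal directions; s**2 is proportional to their variance.
    _, s, vt = np.linalg.svd(x - mu, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    n_keep = min(len(s), int(np.searchsorted(ratio, keep_variance)) + 1)
    return mu, vt[:n_keep]

def pca_transform(x, mu, components):
    """Project x onto the retained principal components."""
    return (x - mu) @ components.T
```

Fitting on train and reusing the same mu and components for song_test and album_test keeps the test sets out of the projection.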

Compared to a human, how does it do?
Ben Harper - Diamonds on the Inside
Actual labels: Jam Bands, Adult Alternative Pop-Rock, Singer-Songwriter, Alternative Pop-Rock
k-NN labels: Alternative Pop-Rock (0.647289), Indie Rock (0.360346), Electronic (0.323645), Trip-Hop (0.223548), Underground Rap (0.256914), Club-Dance (0.216875)
Neural-net labels: Alternative Pop-Rock (0.410141), Indie Rock (0.421306)
(Weights are scaled from 0.0 to 1.0)

Pretty bad.
Rammstein - Sehnsucht
Actual labels: Heavy Metal, Industrial Metal, Alternative Metal, Progressive Metal
k-NN labels: Indie Rock (0.497604), Alternative Pop-Rock (0.425767), IDM (0.303118), Electronic (0.298738), Post-Rock-Experimental (0.261943), Indie Pop (0.250554)
Neural-net labels: Alternative Pop-Rock (0.407634), Indie Rock (0.412457)
(Weights are scaled from 0.0 to 1.0)

The Bias Problem
[Figure: percentage of total classifications/labels]
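A chart of the "percentage of total classifications/labels" can be produced by counting, per style, what share of all assigned labels it receives, once for the true labels and once for the classifier's output; a style whose predicted share far exceeds its true share is being favored. The sketch assumes hard predictions are obtained by keeping each item's k highest-weighted styles; the slides do not describe how the figure was built, and the function names are hypothetical.

```python
import numpy as np

def label_share(label_matrix):
    """Percentage of all assigned labels that go to each style.
    label_matrix: (n_items, n_styles) 0/1 matrix."""
    counts = label_matrix.sum(axis=0).astype(float)
    return 100.0 * counts / counts.sum()

def top_k_binary(weights, k=2):
    """Turn per-style weights into hard labels by keeping each item's
    k highest-weighted styles (ASSUMPTION: the slides do not give this rule)."""
    out = np.zeros(weights.shape, dtype=int)
    top = np.argsort(-weights, axis=1)[:, :k]
    np.put_along_axis(out, top, 1, axis=1)
    return out
```

Comparing `label_share(true_labels)` with `label_share(top_k_binary(predicted_weights))` style by style makes the bias towards large styles directly visible.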

Conclusions
- k-NN and the neural network performed comparably on the classification task.
- PCA led to some improvement with Penny features, mostly in algorithm speed.
- The classifiers achieved low MSE primarily by favoring the larger styles.

Future Work (i.e. if there had only been more time...)
- Filter out/normalize dominant styles
- Gaussian Mixture Model
- Combine MFCC and Penny into one feature vector