Song Genre and Artist Classification via Supervised Learning from Lyrics
Adam Sadovsky, Xing Chen
CS 224N Final Project

Introduction
- Goal: develop a classifier that identifies a song's genre and/or artist using only its lyrics
- Use the EvilLyrics [1] program to build a corpus of approximately fifteen popular albums per genre (rock, rap/hip-hop, and country)
- Use Maxent and SVM classifiers with cross-validation to classify lyrics
- For part-of-speech (POS) features, use the Stanford Log-linear Part-Of-Speech Tagger [2] to label every word in the corpus with a POS tag
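
The cross-validation step can be sketched in plain Python as follows. This is a minimal illustration rather than the project's actual pipeline: the function names, the fold count, and the `majority_baseline` learner are all assumptions made for the example, and the baseline merely stands in for the Maxent and SVM classifiers.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n_items: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle item indices and deal them into k nearly equal folds."""
    if k < 2 or k > n_items:
        raise ValueError("k must satisfy 2 <= k <= n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(
    samples: Sequence[str],
    labels: Sequence[str],
    train_fn: Callable[[List[str], List[str]], Callable[[str], str]],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Return the mean held-out accuracy over k folds.

    train_fn(train_samples, train_labels) must return a predict(sample) callable.
    """
    accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_x = [s for i, s in enumerate(samples) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        predict = train_fn(train_x, train_y)
        correct = sum(predict(samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def majority_baseline(train_x: List[str], train_y: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in learner: always predicts the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return lambda sample: most_common
```

Any learner that follows the same `train_fn` contract could be passed in place of the baseline, so the fold logic stays independent of the classifier being evaluated.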

Feature Selection

Country: Brooks & Dunn - Again
ain't it funny, the turns life puts you through. don't know what's round the bend, man, you don't know where it's leadin' you. close your eyes, say a prayer, take it on the chin. it's a dawn sun, comes back again. baby, i thought that love was over and gone forever... never gonna come back to me. never gonna hold me again.

Rock: Pearl Jam - Come Back
If I keep holding out Will the light shine through? Under this broken roof It's only rain that I feel I've been wishin' out the days Oh oh oh Come back … Know that I still remain true I've been wishin' out the days Please say that if you hadn't have gone now I wouldn't have lost you another way From wherever you are Oh oh oh oh Come back

- Bag-of-Words: artist diction and content
- Word Endings: artist style
- Line Length: song pattern and rhythm
- Number of Lines: song length
- Repetition: style and rhythm
- Punctuation: writing style
- Part-of-Speech statistics: writing style

Look at differences between… {…PRP VBP VBG …}
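
A rough sketch of how the lexical features listed above might be computed from raw lyrics is given below. The feature names, the three-character suffix length, and the definition of repetition (the fraction of lines that duplicate an earlier line) are assumptions chosen for illustration; the slides do not specify them. POS statistics are omitted because they require an external tagger.

```python
import re
from collections import Counter
from typing import Dict


def extract_features(lyrics: str, suffix_len: int = 3) -> Dict[str, float]:
    """Map one song's lyrics to a sparse feature dictionary."""
    lines = [ln.strip() for ln in lyrics.splitlines() if ln.strip()]
    features: Counter = Counter()
    tokens_per_line = []

    for line in lines:
        # Lowercased word tokens; apostrophes are kept so "wishin'" stays one token.
        tokens = re.findall(r"[a-z']+", line.lower())
        tokens_per_line.append(len(tokens))
        for tok in tokens:
            features["word=" + tok] += 1                   # bag-of-words
            features["ending=" + tok[-suffix_len:]] += 1   # word endings

    n_lines = len(lines)
    features["num_lines"] = n_lines                        # number of lines
    features["avg_line_length"] = (                        # line length, in tokens
        sum(tokens_per_line) / n_lines if n_lines else 0.0
    )

    # Repetition (assumed definition): share of lines that repeat an earlier line.
    distinct = len({ln.lower() for ln in lines})
    features["repeated_line_fraction"] = (
        (n_lines - distinct) / n_lines if n_lines else 0.0
    )

    # Punctuation counts, one feature per mark.
    for mark in re.findall(r"[.,!?;:]", lyrics):
        features["punct=" + mark] += 1

    return dict(features)
```

Prefixing each key with its feature family (`word=`, `ending=`, `punct=`) keeps the families separable, which makes it straightforward to train on one family alone or to ablate one from the full set.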

Genre Classification
Attempt to distinguish between rap, rock, and country

Performance
Classifier   Accuracy (%)
Maxent       76.45
SVM          81.21

SVM Confusion Matrix {country, rap, rock}
a  b  c  <-- classified as
         | a = country
         | b = rap
         | c = rock

Feature Performance
Best Alone (Maxent / SVM)
- Bag-of-words (75% / 72%)
- Word endings (73% / 72%)
- POS tags (61% / 61%)
Most Significant Ablations
- Bag-of-Words (3% / 4%)
- Word endings (3% / 3%)
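
For readers unfamiliar with the confusion-matrix layout above (rows are true classes, columns are predicted classes), the following sketch shows how such a matrix and the overall accuracy could be computed. The function names are hypothetical, and the labels in the usage note are toy values invented for the example, not the project's results.

```python
from typing import List, Sequence


def confusion_matrix(
    gold: Sequence[str], predicted: Sequence[str], classes: Sequence[str]
) -> List[List[int]]:
    """Build a matrix where entry [i][j] counts items of class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for g, p in zip(gold, predicted):
        matrix[index[g]][index[p]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of items on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

With toy labels `gold = ["country", "country", "rap", "rap", "rock", "rock"]` and `predicted = ["country", "rock", "rap", "rap", "rock", "country"]`, the matrix over `["country", "rap", "rock"]` is `[[1, 0, 1], [0, 2, 0], [1, 0, 1]]`, giving an accuracy of 4/6. An ablation figure is then the accuracy with all features minus the accuracy with one feature family removed.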

Classifying Artists
- The classifier might perform better when each group of lyrics comes from a single artist
- Two new datasets: {beatles, u2, blink_182} (all rock) and {snoop_dogg, beatles, garth_brooks} (rap, rock, country)

Results:
Dataset                   Classifier   Accuracy
{beatles, u2, blink}      Maxent
{beatles, u2, blink}      SVM
{snoop, beatles, garth}   Maxent
{snoop, beatles, garth}   SVM
References [1] [2]