Published by Rodger Fletcher. Modified over 9 years ago.
1
Improving Music Genre Classification Using Collaborative Tagging Data. Ling Chen, Phillip Wright*, Wolfgang Nejdl. Leibniz University Hannover; *Georgia Institute of Technology. WSDM 2009.
2
Introduction – Music Information Retrieval. People need to search for music by its content. Music genre is a top-level description of content (e.g., Jazz, Rock, Country) and is critical for music information retrieval. Microsoft required 30 musicologists over one year to manually label a “few hundred thousand songs”.
3
Introduction – Music Genre Classification. Challenge: music is an evolving art. Past work trained classifiers on low-level features extracted from audio signals: timbral texture, rhythmic content, and melodic and harmonic content. Tags of music tracks provide high-level features. Is utilizing tags trivial? Tags may be useful information or noise.
4
Problem Description. A set of music tracks X = {x_1, x_2, …, x_n}. A set of genre classes C = {c_1, c_2, …, c_k}. Classification: assign each track x_i a label C(x_i) ∈ C. Γ(x_i) = audio signal features of x_i. T(x_i) = the set of tags of x_i.
5
Graph of Tracks. Adjacent nodes are tracks that are semantically similar in terms of their tags. Goal: use the tag information indirectly, because of the data sparsity problem. Sim(x_i, x_j): cosine similarity with TF-IDF weighting. x_i and x_j are adjacent if Sim(x_i, x_j) > the threshold ε.
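The graph construction on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`tfidf_vectors`, `cosine`, `build_graph`), the use of raw tag counts as term frequency, and the log(n / df) form of IDF are all choices made here, since the slides do not specify the exact TF-IDF variant.

```python
import math
from collections import Counter


def tfidf_vectors(track_tags):
    """Map each track to a sparse TF-IDF vector over its tags.

    track_tags: dict of track id -> list of tags (repeats count as term frequency).
    Assumption: tf = raw count, idf = log(n / df); the slides do not fix a variant.
    """
    n = len(track_tags)
    df = Counter()
    for tags in track_tags.values():
        df.update(set(tags))  # document frequency: tracks carrying the tag
    vectors = {}
    for track, tags in track_tags.items():
        tf = Counter(tags)
        vectors[track] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_graph(track_tags, eps):
    """Connect two tracks when their tag similarity exceeds the threshold eps."""
    vectors = tfidf_vectors(track_tags)
    ids = sorted(vectors)
    neighbors = {t: set() for t in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) > eps:
                neighbors[a].add(b)
                neighbors[b].add(a)
    return neighbors


# Hypothetical toy data, not taken from the experiments.
toy_tracks = {
    "a": ["jazz", "sax", "smooth"],
    "b": ["jazz", "sax"],
    "c": ["rock", "guitar"],
    "d": ["rock", "guitar", "loud"],
}
toy_graph = build_graph(toy_tracks, eps=0.2)
```

The pairwise loop is quadratic in the number of tracks, which is fine for a collection of a few thousand tracks but would need an inverted index over tags for anything much larger.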
6
Single-layer Classification Assuming the audio content of a track has no direct coupling with its neighbors’ genres:
7
Double-layer Classification. Idea: also learn from unknown tracks, whose genre labels still need to be predicted. The relaxation labeling technique is adopted. Δ_k = all of the known information: the audio content of all tracks and the genre labels of known tracks. Find the class c_i for x_i that maximizes Pr(c_i | Δ_k).
8
Framework of Double-layer Classification. Naïve Bayes classifier using audio content information.
9
Iterative Process. N_u(x_i) = the set of unknown neighbors of x_i. N_k(x_i) = the set of known neighbors of x_i. Base classifier.
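The update formula of the iterative process is not preserved in this transcript, so the following Python sketch shows only the general shape of relaxation labeling, not the authors' rule. Everything specific in it is an assumption: the function name `relaxation_labeling`, the use of the mean neighbor probability as neighborhood support, multiplying that support with the base classifier's audio-based probability, and the fixed iteration count.

```python
def relaxation_labeling(base_probs, neighbors, known_labels, classes, iterations=10):
    """Generic relaxation-labeling sketch (assumed update rule, see lead-in).

    base_probs:   dict unknown track -> {class: P(class | audio)} from a base classifier
    neighbors:    dict track -> set of adjacent tracks in the tag graph
    known_labels: dict known track -> its genre label
    """
    # Known tracks start (and stay) with certainty on their label;
    # unknown tracks start from the base classifier's estimate.
    probs = {}
    for x in neighbors:
        if x in known_labels:
            probs[x] = {c: 1.0 if c == known_labels[x] else 0.0 for c in classes}
        else:
            probs[x] = dict(base_probs[x])

    for _ in range(iterations):
        updated = {}
        for x in neighbors:
            if x in known_labels:
                updated[x] = probs[x]
                continue
            nbrs = neighbors[x]
            scores = {}
            for c in classes:
                # Assumed support term: mean probability the neighbors give to c.
                if nbrs:
                    support = sum(probs[n][c] for n in nbrs) / len(nbrs)
                else:
                    support = 1.0 / len(classes)
                scores[c] = base_probs[x][c] * support
            total = sum(scores.values())
            if total > 0:
                updated[x] = {c: s / total for c, s in scores.items()}
            else:
                updated[x] = dict(base_probs[x])  # fall back to audio only
        probs = updated  # synchronous update: all tracks change together

    return {x: max(classes, key=lambda c: probs[x][c]) for x in neighbors}
```

In this sketch both known and unknown neighbors contribute: known neighbors with fixed labels, unknown neighbors with their current estimates, which is what lets an unlabeled track influence another unlabeled track across iterations.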
10
Experiment Data. MP3 files were crawled from Last.fm. Ground-truth genre data was collected from All Music Guide. 2,262 tracks remain, in 6 genres. Each track has at least 1 and at most 99 tags, with 29.9 tags on average.
11
Baseline Performance
12
Performance of Single-layer Classification. The similarity threshold ε is set to 0.2.
13
Performance of Double-layer Classification
14
Misclassification Analysis. Performance is limited when a smaller set of training data is used. Misclassification usually occurs among Rock, R&B, and Rap. Reason: there are many cross-class edges between tracks of these three genres, caused by the noise problem of tag data.
15
Optimizing Strategies. Tag discrimination, tag augmentation, and content combination.
16
Tag Discrimination. Idea: assign a higher weight to a tag with a lower class entropy: TF-IDF(t_j, x_i) ← TF-IDF(t_j, x_i) / E_C(t_j). The similarity values decrease, so ε is set to 0.05.
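A small Python sketch may make the entropy weighting concrete. It is an illustration under assumptions rather than the authors' code: E_C(t_j) is taken here to be the Shannon entropy (base 2) of a tag's distribution over the genres of the labeled tracks that carry it, and the names `tag_class_entropy` and `discriminate` are hypothetical. The `floor` parameter is also an assumption added to avoid dividing by zero when a tag occurs in only one genre; the slides do not say how that case is handled.

```python
import math
from collections import Counter, defaultdict


def tag_class_entropy(track_tags, labels):
    """Entropy of each tag's genre distribution over labeled tracks.

    track_tags: dict track -> list of tags
    labels:     dict labeled track -> genre
    A tag that appears in a single genre gets entropy 0 (most discriminative).
    """
    counts = defaultdict(Counter)
    for track, genre in labels.items():
        for tag in set(track_tags.get(track, [])):
            counts[tag][genre] += 1
    entropy = {}
    for tag, per_class in counts.items():
        total = sum(per_class.values())
        entropy[tag] = sum(
            -(n / total) * math.log2(n / total) for n in per_class.values()
        )
    return entropy


def discriminate(vector, entropy, floor=0.1):
    """Divide each TF-IDF weight by the tag's class entropy.

    Assumptions: entropy is clamped below at `floor` to avoid division by zero,
    and tags never seen on a labeled track are left unchanged (divided by 1).
    """
    return {t: w / max(entropy.get(t, 1.0), floor) for t, w in vector.items()}


# Hypothetical toy data: "guitar" appears only on Rock tracks, "love" on both genres.
toy_tags = {"a": ["guitar", "love"], "b": ["guitar"], "c": ["sax", "love"]}
toy_labels = {"a": "Rock", "b": "Rock", "c": "Jazz"}
toy_entropy = tag_class_entropy(toy_tags, toy_labels)
reweighted = discriminate({"guitar": 1.0, "love": 1.0}, toy_entropy, floor=0.5)
```

With this toy data the genre-specific tag "guitar" ends up with a larger weight than the cross-genre tag "love", which is the intended effect of the strategy.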
17
Performance of Tag Discrimination
18
Tag Augmentation. Idea: increase the number of in-class edges. For each known track, the original tag vector is augmented by adding the tags of its neighbors. Similarity between two tracks after augmentation:
19
Performance of Tag Augmentation. α = 0.6, ε = 0.2
20
Content Combination. Idea: augment the features with other information sources. S_C(x_i, x_j) = content-based similarity between x_i and x_j. Overall similarity:
21
Performance of Content Combination. β = 0.6, ε = 0.5
22
Conclusions. While most existing approaches to automatic music genre classification focus on finding better low-level features, here we explore the use of social tags for this task. Tag information is used to construct a graph of tracks. Two classification methods are introduced, and the double-layer classifier performs better. Several feature-processing strategies are considered to improve the performance.