Semantic Similarity for Music Retrieval

Luke Barrington, Doug Turnbull, David Torres & Gert Lanckriet
Electrical & Computer Engineering, University of California, San Diego
lbarrington@ucsd.edu

References

Carneiro & Vasconcelos (2005). Formulating semantic image annotation as a supervised learning problem. IEEE CVPR.
Rasiwasia, Vasconcelos & Moreno (2006). Query by Semantic Example. ACM ICIVR.
Barrington, Chan, Turnbull & Lanckriet (2007). Audio Information Retrieval using Semantic Similarity. IEEE ICASSP.
Turnbull, Barrington, Torres & Lanckriet (2007). Towards Musical Query-by-Semantic-Description using the CAL500 Data Set. ACM SIGIR.

http://cosmal.ucsd.edu/cal/

Audio & Text Features

Our models are trained on the CAL500 data set, a heterogeneous data set of song / caption pairs:
- 500 popular western songs, 146-word vocabulary
- Each track has been annotated by at least 3 humans

Audio content is represented as a bag of feature vectors:
- MFCC features plus 1st and 2nd time deltas
- 10,000 feature vectors per minute of audio

Annotations are represented as a bag of words:
- Binary document vector of length 146

Semantic Similarity

We represent every song as a semantic distribution: a point in a semantic space. A natural similarity measure in this space is the Kullback-Leibler (KL) divergence. Given a query song, we retrieve the database songs that minimize the KL divergence with the query.

Sounds → Semantics

Using the learned word-level GMMs $P(a \mid w_i)$, compute the posterior probability of word $w_i$ given the song's bag of feature vectors $X = \{x_1, \dots, x_T\}$:

$$P(w_i \mid X) = \frac{P(X \mid w_i)\, P(w_i)}{P(X)}$$

Assume $x_m$ and $x_n$ are conditionally independent, given $w_i$:

$$P(X \mid w_i) = \prod_{t=1}^{T} P(x_t \mid w_i)$$

Estimate the song prior by summing over all words in the vocabulary $V$:

$$P(X) = \sum_{j=1}^{|V|} P(X \mid w_j)\, P(w_j)$$

Normalizing the posteriors of all words, we represent songs as semantic multinomial distributions over the vocabulary:

$$\pi = (\pi_1, \dots, \pi_{|V|}), \qquad \pi_i = P(w_i \mid X), \qquad \sum_{i=1}^{|V|} \pi_i = 1$$

Semantic understanding of audio signals enables retrieval of songs that, while acoustically different, are semantically similar to a query song.
Given a query with a high-pitched, wailing electric guitar solo, a system based on acoustics might retrieve songs with screechy violins or a screaming female singer. Our system retrieves songs with semantically similar content: acoustic, classical or distorted electric guitars.

It’s All Semantics…

Semantic Models

Each word, w, is represented as a probability distribution, P(a|w), over the same audio feature space. The training data for a word-level GMM is the set of all song-level GMMs from songs labeled with word w. Song-level GMMs are combined to train word-level GMMs using the mixture-hierarchies EM algorithm. The semantic model, a set of word-level GMMs, is used as the basis for song similarity.

[Figure: the “romantic” song-level GMMs p(a|s_1), ..., p(a|s_6) are combined by mixture-hierarchies EM into the “romantic” word-level GMM p(a|“romantic”).]

Each song, s, is represented as a probability distribution, P(a|s), over the audio feature space, approximated as a Gaussian mixture model (GMM) with mixture weights $\pi_r$, means $\mu_r$ and covariances $\Sigma_r$:

$$P(a \mid s) = \sum_{r=1}^{R} \pi_r\, \mathcal{N}(a \mid \mu_r, \Sigma_r)$$

Each song-level GMM is trained with the expectation-maximization (EM) algorithm on a bag of features extracted from the song’s audio content.
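The song-level training step described above can be sketched as a small EM loop. This is only a minimal illustration under stated assumptions: features are one-dimensional scalars rather than MFCC vectors, the mixture has two components (the poster does not state a component count), and the helper name `fit_gmm_1d` and the synthetic "bag" are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(features, n_components=2, n_iter=50, min_var=1e-6):
    """Fit a 1-D GMM to a bag of scalar features with plain EM.

    Hypothetical helper for illustration; returns (weights, means, variances).
    """
    ordered = sorted(features)
    n = len(ordered)
    # Initialise means at evenly spaced quantiles, with a shared variance.
    means = [ordered[(2 * k + 1) * n // (2 * n_components)] for k in range(n_components)]
    overall_mean = sum(features) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in features) / n, min_var)
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each feature.
        resp = []
        for x in features:
            joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([j / total for j in joint])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            r_k = sum(r[k] for r in resp)
            if r_k <= 0.0:
                continue  # leave an empty component unchanged
            weights[k] = r_k / n
            means[k] = sum(r[k] * x for r, x in zip(resp, features)) / r_k
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, features)) / r_k
            variances[k] = max(var, min_var)
    return weights, means, variances


# Synthetic "bag of features" drawn from two clusters (made-up data, not audio).
random.seed(0)
bag = [random.gauss(-2.0, 0.5) for _ in range(300)] + [random.gauss(3.0, 0.5) for _ in range(300)]
weights, means, variances = fit_gmm_1d(bag)
```

On real audio the same E-step and M-step would run over multivariate MFCC-plus-delta vectors with full or diagonal covariances; the scalar case is used here only to keep the sketch short.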
[Figure: Acoustic Models. Bag-of-features → EM.]
[Figure: Semantic Multinomials. Panels: Query, Similar, Not Similar.]
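The retrieval pipeline described in the Semantic Similarity and Sounds → Semantics panels can be sketched as follows: per-word likelihoods are turned into a semantic multinomial with Bayes' rule, and database songs are ranked by KL divergence from the query. This is a rough sketch, not the authors' implementation. It assumes a uniform word prior, takes the divergence in the direction KL(query || song) since the poster does not specify one, and uses made-up log-likelihoods over a toy 4-word vocabulary. The names `semantic_multinomial`, `kl_divergence` and `retrieve` are hypothetical.

```python
import math


def semantic_multinomial(log_likelihoods, priors=None):
    """Turn per-word log-likelihoods log P(X|w_i) into posteriors P(w_i|X)."""
    n = len(log_likelihoods)
    if priors is None:
        priors = [1.0 / n] * n  # assumption: uniform word prior
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_joint)
    unnorm = [math.exp(v - peak) for v in log_joint]
    total = sum(unnorm)  # proportional to the song prior P(X)
    return [u / total for u in unnorm]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two multinomials, with a small floor to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def retrieve(query, database, top_k=2):
    """Rank database songs by increasing KL divergence from the query."""
    ranked = sorted(database, key=lambda name: kl_divergence(query, database[name]))
    return ranked[:top_k]


# Toy 4-word vocabulary with made-up log-likelihoods (not CAL500 data).
query = semantic_multinomial([-10.0, -12.0, -30.0, -31.0])
database = {
    "song_a": semantic_multinomial([-11.0, -12.5, -29.0, -33.0]),
    "song_b": semantic_multinomial([-30.0, -29.0, -10.0, -11.0]),
    "song_c": semantic_multinomial([-20.0, -20.0, -20.0, -20.0]),
}
ranking = retrieve(query, database, top_k=3)
```

In this toy setup `song_a`, whose probability mass sits on the same words as the query, ranks first, and `song_b`, whose mass sits on the opposite words, ranks last. Working in the log domain matters in practice because a product over thousands of feature vectors underflows quickly.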