Semantic Similarity for Music Retrieval. Luke Barrington, Doug Turnbull, David Torres & Gert Lanckriet, Electrical & Computer Engineering, University of California, San Diego

Presentation transcript:

Semantic Similarity for Music Retrieval
Luke Barrington, Doug Turnbull, David Torres & Gert Lanckriet
Electrical & Computer Engineering, University of California, San Diego

References
- Carneiro & Vasconcelos (2005). Formulating semantic image annotation as a supervised learning problem. IEEE CVPR.
- Rasiwasia, Vasconcelos & Moreno (2006). Query by Semantic Example. ACM ICIVR.
- Barrington, Chan, Turnbull & Lanckriet (2007). Audio Information Retrieval using Semantic Similarity. IEEE ICASSP.
- Turnbull, Barrington, Torres & Lanckriet (2007). Towards Musical Query-by-Semantic-Description using the CAL500 Data Set. ACM SIGIR.

Audio & Text Features
Our models are trained on the CAL500 dataset, a heterogeneous data set of song/caption pairs:
- 500 popular western songs, 146-word vocabulary
- Each track has been annotated by at least 3 humans
Audio content is represented as a bag of feature vectors:
- MFCC features plus 1st and 2nd time deltas
- 10,000 feature vectors per minute of audio
Annotations are represented as a bag of words:
- Binary document vector of length 146

Semantic Similarity
We represent every song as a semantic distribution: a point in a semantic space. A natural similarity measure in this space is the Kullback-Leibler (KL) divergence: given a query song, we retrieve the database songs that minimize the KL divergence with the query (see the retrieval sketch below).

Sounds → Semantics
Using the learned word-level GMMs P(a|w_i), we compute the posterior probability of word w_i given a song. We assume the feature vectors x_m and x_n are conditionally independent given w_i, and estimate the song prior by summing over all words. Normalizing the posteriors of all words, we represent each song as a semantic multinomial distribution over the vocabulary (the computation is written out below).

It's All Semantics…
Semantic understanding of audio signals enables retrieval of songs that, while acoustically different, are semantically similar to a query song. Given a query with a high-pitched, wailing electric guitar solo, a system based on acoustics alone might retrieve songs with screechy violins or a screaming female singer. Our system retrieves songs with semantically similar content: acoustic, classical, or distorted electric guitars.

Semantic Models
Each word, w, is represented as a probability distribution, P(a|w), over the same audio feature space. The training data for a word-level GMM is the set of all song-level GMMs from songs labeled with word w. Song-level GMMs are combined to train word-level GMMs using the mixture-hierarchies EM algorithm. The semantic model, a set of word-level GMMs, is used as the basis for song similarity.
[Figure: the "romantic" song-level GMMs p(a|s_1), …, p(a|s_6) are combined via mixture-hierarchies EM into the "romantic" word-level GMM p(a|"romantic").]

Acoustic Models
Each song, s, is represented as a probability distribution, P(a|s), over the audio feature space, approximated as a Gaussian mixture model (GMM). A bag of features extracted from the song's audio content and the expectation-maximization (EM) algorithm are used to train the song-level GMMs.
[Figure: pipeline overview — bag-of-features extraction, EM, semantic multinomials; a query song is compared against the database and songs are ranked as similar or not similar.]
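
To make the Acoustic Models step concrete, here is a minimal sketch rather than the authors' implementation: it assumes librosa for MFCC and delta extraction and uses scikit-learn's GaussianMixture (plain EM) as a stand-in for the poster's song-level GMM training. The 13 MFCCs, diagonal covariances, and 8 mixture components are illustrative choices, not parameters taken from the poster.

```python
# Sketch: represent a song as a bag of MFCC + delta feature vectors
# and fit a song-level GMM p(a | s) to it with EM.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def song_bag_of_features(audio_path, sr=22050, n_mfcc=13):
    """Return a (num_frames, 3 * n_mfcc) matrix of MFCCs plus 1st/2nd time deltas."""
    y, sr = librosa.load(audio_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    d1 = librosa.feature.delta(mfcc)             # 1st time derivative
    d2 = librosa.feature.delta(mfcc, order=2)    # 2nd time derivative
    return np.vstack([mfcc, d1, d2]).T           # one feature vector per frame

def fit_song_gmm(features, n_components=8, seed=0):
    """Fit a song-level GMM p(a | s) to the song's bag of features via EM."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag",
                          random_state=seed)
    return gmm.fit(features)
```

Diagonal covariances keep the fit cheap for the tens of thousands of frames a full song produces (the poster cites roughly 10,000 feature vectors per minute of audio); the poster itself does not specify the covariance structure.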
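
The Sounds → Semantics steps can be written out as follows. This is a reconstruction from the panel's descriptions rather than a copy of the poster's own equations: the song's bag of features is written as X = {x_1, …, x_M} and V is the vocabulary size; the exact notation and word priors used on the poster may differ.

```latex
% Reconstruction (not verbatim from the poster) of the Sounds -> Semantics steps.
% \mathcal{X} = \{x_1, \dots, x_M\}: the song's bag of feature vectors; V: vocabulary size.

% Posterior probability of word w_i given the song (Bayes' rule):
\[ P(w_i \mid \mathcal{X}) = \frac{P(\mathcal{X} \mid w_i)\, P(w_i)}{P(\mathcal{X})} \]

% Feature vectors are assumed conditionally independent given w_i:
\[ P(\mathcal{X} \mid w_i) = \prod_{m=1}^{M} p(x_m \mid w_i) \]

% The song prior (evidence) is estimated by summing over all words:
\[ P(\mathcal{X}) = \sum_{j=1}^{V} P(\mathcal{X} \mid w_j)\, P(w_j) \]

% Normalizing the word posteriors yields the semantic multinomial \pi:
\[ \pi_i = \frac{P(w_i \mid \mathcal{X})}{\sum_{j=1}^{V} P(w_j \mid \mathcal{X})},
   \qquad \sum_{i=1}^{V} \pi_i = 1 \]
```

With a uniform word prior P(w_i) = 1/V, the posterior reduces to a normalized likelihood, which is what the Python sketch below computes.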
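
A minimal Python sketch of the same mapping, assuming the word-level models are available as fitted scikit-learn GaussianMixture objects (the poster trains them with mixture-hierarchies EM, which scikit-learn does not provide, so ordinary per-word GMMs stand in here). Averaging the log-likelihood per frame and using a uniform word prior are practical choices made for this sketch, not details from the poster.

```python
# Sketch: map a song's bag of features to a semantic multinomial over the
# vocabulary, given one fitted GMM p(a | w_i) per vocabulary word.
import numpy as np

def semantic_multinomial(features, word_gmms):
    """features: (num_frames, dim) array for one song.
    word_gmms: list of fitted sklearn GaussianMixture models, one per word.
    Returns pi, a probability vector of length len(word_gmms)."""
    # score() returns the average log-likelihood per frame, i.e. a length-
    # normalized version of sum_m log p(x_m | w_i) under the conditional
    # independence assumption; averaging keeps the soft-max stable for long songs.
    avg_log_lik = np.array([gmm.score(features) for gmm in word_gmms])
    # Uniform word prior; subtract the max before exponentiating for stability.
    log_post = avg_log_lik - avg_log_lik.max()
    pi = np.exp(log_post)
    return pi / pi.sum()
```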
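
Finally, a sketch of the Semantic Similarity step: ranking database songs by the KL divergence between semantic multinomials, smallest first. The small smoothing constant is added here only to avoid log(0) on sparse multinomials and is not a detail from the poster.

```python
# Sketch: retrieve the database songs whose semantic multinomials are
# closest (in KL divergence) to the query song's multinomial.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def retrieve(query_pi, database_pis, top_k=10):
    """Return indices of the top_k database songs minimizing KL(query || song)."""
    divergences = [kl_divergence(query_pi, pi) for pi in database_pis]
    return np.argsort(divergences)[:top_k]
```

Tying the sketches together (with word_gmms and database_pis assumed to have been built beforehand as above), `retrieve(semantic_multinomial(song_bag_of_features("query.mp3"), word_gmms), database_pis)` returns the indices of the database songs that are semantically most similar to the query.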