

Collective Annotation of Music from Multiple Semantic Categories

Zhiyao Duan 1,2, Lie Lu 1, and Changshui Zhang 2
1. Microsoft Research Asia (MSRA), Beijing, China.
2. Department of Automation, Tsinghua University, Beijing, China.

Summary

- Two collective semantic annotation methods for music are presented. They model not only individual labels but also label correlations. 50 musically relevant labels are manually selected for music annotation, covering 10 aspects of music perception. Normalized mutual information is employed to measure the correlation between two semantic labels.
- Label pairs with strong correlation are selected and modeled.
  Generative: Gaussian Mixture Model (GMM)-based method
  Discriminative: Conditional Random Field (CRF)-based method
- Experimental results show slight but consistent improvements compared with individual annotation methods.

Motivation

- Semantic annotation of music is an important research direction.
  Semantic labels (text, words) are a more compact and efficient representation than raw audio or low-level features.
  Semantic annotation potentially facilitates applications, e.g. music retrieval and recommendation.
- Disadvantages of previous methods:
  The vocabulary has no structured labels, so the annotation does not cover sufficient musical aspects.
  Only audio-label relations are modeled, not label-label relations, e.g. “hard rock” & “electronic guitar”, “happy” & “minor key”.
- Therefore, we divide the semantic vocabulary into categories and attempt to model label correlations.

Semantic Vocabulary

1. Consists of 50 labels, manually selected from web-parsed musically relevant words.
2. 10 semantic categories (aspects).
3. A label number limitation in each category for annotation.
4. Normalized Mutual Information (NormMI) is used to measure the correlation of each label pair, Eq. (1), where H(X) is the entropy of X and I(X; Y) is the mutual information between X and Y.
   Properties: 0 <= NormMI(X; Y) <= 1; NormMI(X; Y) = 0 when X and Y are statistically independent; NormMI(X; X) = 1.
5. Only the label pairs whose NormMI values are larger than a threshold are selected to be modeled.

Table 1: Vocabulary
Table 2: Selected pairs

Audio Feature Extraction

A bag of beat-level feature vectors is used to represent a song:
1. Each song is divided into beat segments.
2. Each segment contains a number of frames of 20 ms length with 10 ms overlap.
3. Timbre features (94-d) and rhythm features (8-d) are extracted to compose a 102-d feature vector in each segment.
4. PCA reduces the dimensionality to 65, preserving 95% of the energy.

- Timbre features: means and standard deviations of 8th-order MFCCs, spectral shape features and spectral contrast features.
- Rhythm features: average tempo, average onset frequency, rhythm regularity, rhythm contrast, rhythm strength, average drum frequency, amplitude and confidence [1].

Semantic Annotation

Problem: find some semantic words to describe a song. It can be viewed as a multi-label binary classification problem.
Input: a vocabulary consisting of labels (or words); a bag of feature vectors of a song.
Output: an annotation vector in which each element is a binary variable for one label (1: presence, -1: absence).
Solution: maximum a posteriori (MAP) estimation.

Previous methods: labels are treated independently.
- Individual GMM-based method: Eqs. (2) and (3), built on the single label posterior. The likelihood can be estimated using a GMM from training data. The prior probability can be set to a uniform distribution.

Proposed methods: consider the relations between labels.
1) Collective GMM-based method: approximates the posterior by Eq. (4), which combines the single label posterior with the label pair posterior. The label pair posterior is taken over the set of selected label pairs, and a trade-off parameter balances the label posterior against the label pair posterior. The likelihoods are estimated using an 8-kernel GMM from training data.
2) Collective CRF-based method:
   Conditional Random Field (CRF): an undirected graphical model; nodes: label variables; edges: relations between labels.
   Multi-label classification CRF model [2]: Eq. (5), which combines an overall potential of nodes with an overall potential of edges. It is defined over a sample (a song) represented by an input feature vector, an output label vector, and a normalizing factor. The features of the CRF are predefined real-valued functions, and the corresponding parameters are estimated from training data.
   Note: unlike in the GMM-based method, a “bag of features” cannot be used here; instead, each song is represented by a 115-d feature vector.
   115-d = 65-d (mean of beat-level features) + 50-d (word likelihoods)

Experiments

Data set:
- ~5,000 Western popular songs;
- Manually annotated with semantic labels from the vocabulary in Table 1, according to the label number limitations;
- 25% for training, 75% for testing;
- 49 label pairs with NormMI > 0.1 are selected to be modeled.

Compared Methods:
1. Collective GMM-based method
2. Individual GMM-based method
3. Collective CRF-based method
4. Individual CRF-based method: uses the CRF framework in Eq. (5) without considering the “overall potential of edges”.

Results (Table 3, Table 4):
- Per category performance: the performance for each category
  1. CRF-based methods outperform GMM-based methods;
  2. Collective annotation methods slightly but consistently improve on the performance of their individual counterparts, for both the GMM-based and the CRF-based methods.
- Per song performance: the average performance for a song
  1. While the recalls are similar, the precision improves significantly from the generative models to the discriminative models;
  2. The collective methods slightly outperform their individual counterparts.

Open question:
- The performance improvement from individual modeling to collective modeling is small.
  Possible reason: in individual modeling methods, labels that are “correlated” share many songs in their training sets (since each song has multiple labels). As a result, the trained models of “correlated” labels are also “correlated”; in other words, the correlation is already modeled implicitly.

Future Work

1. Further explore better methods to model label correlations.
2. Explore better features, especially the song-level feature vector for CRF-based methods.
3. Apply the obtained annotations in various applications, such as music similarity measurement, music search and recommendation.

References

[1] Lu, L., Liu, D. and Zhang, H.J. “Automatic mood detection and tracking of music audio signals,” IEEE Trans. on Audio, Speech and Lang. Process., vol. 14, no. 1, pp. 5-18.
[2] Ghamrawi, N. and McCallum, A. “Collective multilabel classification,” in Proc. the 14th ACM International Conference on Information and Knowledge Management (CIKM), 2005.
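The label pair selection described under Semantic Vocabulary can be sketched as follows. The transcript does not preserve the body of Eq. (1), so the normalization used here, I(X; Y) / min(H(X), H(Y)), is an assumption; it was chosen only because it satisfies the three stated NormMI properties. The function names and the toy annotations are hypothetical illustrations, not the authors' code or data.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two aligned sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def norm_mi(xs, ys):
    """ASSUMED normalization: I(X; Y) / min(H(X), H(Y)).

    The poster's Eq. (1) is not recoverable from the transcript; this form
    is one option that lies in [0, 1], is 0 under independence, and is 1
    for NormMI(X; X).
    """
    h = min(entropy(xs), entropy(ys))
    if h == 0.0:  # a constant label carries no information
        return 0.0
    return mutual_information(xs, ys) / h


def select_pairs(label_columns, threshold=0.1):
    """Return the label pairs whose NormMI exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(label_columns), 2)
        if norm_mi(label_columns[a], label_columns[b]) > threshold
    ]


# Hypothetical toy annotations for 8 songs (1: presence, -1: absence).
toy = {
    "hard rock":         [1, 1, 1, 1, -1, -1, -1, -1],
    "electronic guitar": [1, 1, 1, -1, -1, -1, -1, -1],
    "happy":             [1, 1, -1, -1, 1, 1, -1, -1],
}
selected = select_pairs(toy, threshold=0.1)
```

With these toy columns only the strongly co-occurring pair survives the 0.1 threshold, which mirrors how the poster keeps 49 pairs out of all possible pairs of 50 labels.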
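The per song performance reported under Results relies on precision and recall averaged over songs. The sketch below uses the standard set-based definitions, which is an assumption because the transcript does not spell out the formulas; the function names and example label sets are hypothetical.

```python
def per_song_precision_recall(predicted, truth):
    """Precision and recall for one song, given sets of label strings."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def average_per_song(songs):
    """Average precision and recall over (predicted, truth) pairs."""
    scores = [per_song_precision_recall(p, t) for p, t in songs]
    n = len(scores)
    mean_precision = sum(p for p, _ in scores) / n
    mean_recall = sum(r for _, r in scores) / n
    return mean_precision, mean_recall


# Hypothetical example: two songs with predicted and ground-truth labels.
songs = [
    ({"hard rock", "happy"}, {"hard rock", "electronic guitar"}),
    ({"happy"}, {"happy", "minor key"}),
]
mean_precision, mean_recall = average_per_song(songs)
```

A method that predicts fewer but more reliable labels raises precision while leaving recall roughly unchanged, which is the pattern the poster reports when moving from the generative to the discriminative models.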