A Model for Learning the Semantics of Pictures V. Lavrenko, R. Manmatha, J. Jeon Center for Intelligent Information Retrieval Computer Science Department, University of Massachusetts Amherst

Presentation transcript:

A Model for Learning the Semantics of Pictures V. Lavrenko, R. Manmatha, J. Jeon Center for Intelligent Information Retrieval Computer Science Department, University of Massachusetts Amherst Neural Information Processing Systems Conference poster

Abstract We automatically annotate images with keywords and retrieve images based on text queries. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words. Experiments show that our model significantly outperforms the best previously reported results on the tasks of automatic image annotation and retrieval.

Introduction We propose a model which looks at the probability of associating words with image regions. The surrounding context often simplifies the interpretation of regions as specific objects. –The association of a region with the word tiger is increased by the fact that there is a grass region and a water region in the same image, and should be decreased if instead there is a region corresponding to the interior of an aircraft.

Introduction Thus the association of different regions provides context, while the association of words with image regions provides meaning. Continuous-space Relevance Model (CRM) –a statistical generative model related to relevance models in information retrieval –directly associates continuous features with words and does not require an intermediate clustering stage.

Related Work Mori et al. (1999) –Co-occurrence Model –regular grid Duygulu et al. (2002) –Translation Model –Segmentation, blobs Jeon et al. (2003) –Cross-media relevance model (CMRM) –analogous to the cross-lingual retrieval problem

Related Work Blei and Jordan (2003) –Correspondence Latent Dirichlet Allocation (Corr-LDA) Model –This model assumes that a Dirichlet distribution can be used to generate a mixture of latent factors. This mixture of latent factors is then used to generate words and regions.

Related Work Differences between CRM and CMRM –CMRM is a discrete model and cannot take advantage of continuous features. –CMRM relies on clustering of the feature vectors into blobs. Annotation quality of the CMRM is very sensitive to clustering errors, and depends on being able to select the right cluster granularity a priori.

A Model of Annotated Images We learn a joint probability distribution P(w,r) over the regions r of some image and the words w in its annotation. Image Annotation –compute a conditional likelihood P(w|r) which can then be used to guess the most likely annotation w for the image Image Retrieval –compute the query likelihood P(w_qry|r_J) for every image J in the dataset. We can then rank images in the collection according to their likelihood of having the query as annotation.

Representation of Images and Annotations Let C denote the finite set of all possible pixel colors –c_0: a transparent color, which will be handy when we have to layer image regions. Assume that all images are of a fixed size W×H. Represent any image as an element of the finite set R = C^(W×H) –each image contains several distinct regions {r_1…r_n} –each region is itself an element of R and contains the pixels of some prominent object in the image –all pixels around the object are set to be transparent

Representation of Images and Annotations

A function g maps image regions r ∈ R to real-valued vectors. The value g(r) represents a set of features, or characteristics, of an image region. An annotation for a given image is a set of words {w_1…w_m} drawn from some finite vocabulary V. We model a joint probability for observing a set of image regions {r_1…r_n} together with the set of annotation words {w_1…w_m}.

A Model for Generating Annotated Images A generated image J is based on three distinct probability distributions: –Words in w_J are an i.i.d. random sample from some underlying multinomial distribution P_V(·|J) –Regions r_J are produced from a corresponding set of generator vectors g_1…g_n according to a process P_R(r_i|g_i) which is independent of J –The generator vectors g_1…g_n are themselves an i.i.d. random sample from some underlying multi-variate density function P_g(·|J)

A Model for Generating Annotated Images Let r_A = {r_1…r_{n_A}} denote the regions of some image A, which is not in the training set T. Let w_B = {w_1…w_{n_B}} be some arbitrary sequence of words. We want P(r_A, w_B), the joint probability of observing an image denoted by r_A together with annotation words w_B.

A Model for Generating Annotated Images The overall process for jointly generating w_B and r_A is as follows: –Pick a training image J ∈ T with some probability P_T(J) –For b = 1…n_B Pick the annotation word w_b from the multinomial distribution P_V(·|J) –For a = 1…n_A Sample a generator vector g_a from the probability density P_g(·|J), then pick the image region r_a according to the probability P_R(r_a|g_a)
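The three-step sampling process above can be sketched as follows. This is a minimal illustration, not the authors' code: the `train` structure (per-image word distribution `p_words` and region-feature matrix `g`) is hypothetical, and sampling from the kernel density P_g(·|J) is done by picking a support point and adding Gaussian noise with covariance βI, which is exactly how one samples from such a kernel estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_annotated_image(train, n_B, n_A, beta=0.1):
    """Sample (words, generator vectors) following the generative process.

    `train` is a hypothetical list of training images, each a dict with
    a word distribution `p_words` over a shared vocabulary and a matrix
    `g` of region feature vectors (one row per region).
    """
    # Step 1: pick a training image J uniformly, i.e. P_T(J) = 1/N_T.
    J = train[rng.integers(len(train))]
    # Step 2: draw n_B annotation words i.i.d. from P_V(.|J).
    words = rng.choice(len(J["p_words"]), size=n_B, p=J["p_words"])
    # Step 3: draw n_A generator vectors from the kernel density P_g(.|J):
    # pick a support point g(r_i) at random, add N(0, beta*I) noise.
    idx = rng.integers(len(J["g"]), size=n_A)
    gens = J["g"][idx] + rng.normal(scale=np.sqrt(beta),
                                    size=(n_A, J["g"].shape[1]))
    return words, gens
```

The final step (turning each generator vector into an actual region via P_R) is omitted, since it depends on the discrete region space R.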

A Model for Generating Annotated Images The probability of a joint observation {r_A, w_B} is obtained by marginalizing the process above over the training images:

P(r_A, w_B) = Σ_{J∈T} P_T(J) · Π_{b=1…n_B} P_V(w_b|J) · Π_{a=1…n_A} ∫ P_R(r_a|g) P_g(g|J) dg (1)
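Numerically, the joint likelihood implied by the generative process is an average over training images of per-word and per-region probabilities. The sketch below makes the simplifying assumption that the integral over generator vectors reduces to evaluating the kernel density P_g at the region's feature vector g(r_a); the `train` structure is the same hypothetical one as before, and log-space arithmetic is used to avoid underflow.

```python
import numpy as np

def log_gauss_kernel(g, support, beta):
    """log N(g; support_i, beta*I) for every support point (rows of `support`)."""
    k = support.shape[1]
    d2 = ((support - g) ** 2).sum(axis=1)
    return -0.5 * (d2 / beta + k * np.log(2 * np.pi * beta))

def joint_log_prob(words, regions_g, train, beta=0.1):
    """log P(r_A, w_B): sum over training images J of
    P_T(J) * prod_b P_V(w_b|J) * prod_a P_g(g_a|J)."""
    per_image = []
    for J in train:
        lw = np.log(J["p_words"][words]).sum()   # prod_b P_V(w_b|J)
        lr = 0.0
        for g in regions_g:                       # prod_a P_g(g_a|J)
            kern = log_gauss_kernel(g, J["g"], beta)
            lr += np.logaddexp.reduce(kern) - np.log(len(J["g"]))
        per_image.append(lw + lr)
    # average over J, i.e. P_T(J) = 1/N_T
    return np.logaddexp.reduce(np.array(per_image)) - np.log(len(train))
```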

Estimating parameters of the model P_T(J) = 1/N_T, where N_T is the size of the training set. P_R(r|g) = 1/N_g if g(r) = g, and zero otherwise –where N_g is the number of all regions r' in R such that g(r') = g

Estimating parameters of the model P_g(·|J) is a kernel density estimate: –let r_J = {r_1…r_n} be the set of regions of image J –place a Gaussian kernel over the feature vector g(r_i) of every region of image J –each kernel is parameterized by the feature covariance matrix Σ. We assumed Σ = βI, where I is the identity matrix. β plays the role of kernel bandwidth: it determines the smoothness of P_g around the support point g(r_i).

Estimating parameters of the model Let ℙ be the simplex of all multinomial distributions over V. Assume a Dirichlet prior over ℙ that has parameters {μp_v : v ∈ V} –μ is a constant, p_v is the relative frequency of observing the word v in the training set. Introducing the observation w_J results in a Dirichlet posterior over ℙ with parameters {μp_v + N_{v,J} : v ∈ V} –N_{v,J} is the number of times v occurs in the observation w_J
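Taking the mean of that Dirichlet posterior gives a smoothed word distribution for image J; a sketch, assuming the standard posterior-mean estimate (μp_v + N_{v,J}) / (μ + N_J), where N_J is the annotation length:

```python
import numpy as np

def p_v(word_counts, p_prior, mu):
    """Posterior-mean estimate of P_V(v|J) under a Dirichlet prior
    with parameters mu * p_v:  (mu*p_v + N_{v,J}) / (mu + N_J).

    `word_counts` holds N_{v,J} for each vocabulary word v;
    `p_prior` holds the training-set relative frequencies p_v."""
    N_J = word_counts.sum()
    return (mu * p_prior + word_counts) / (mu + N_J)
```

Words present in the annotation get boosted above their corpus frequency, while unseen words retain a small prior mass, so no word has zero probability.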

Experimental Results Dataset: provided by Duygulu et al. – research/data/eccv 2002 –5,000 images from 50 Corel Stock Photo CDs –Each image contains an annotation of 1-5 keywords. Overall there are 371 words. –Every image in the dataset is pre-segmented into regions using general-purpose algorithms, such as normalized cuts. –A pre-computed feature vector is provided for every segmented region. –The feature set consists of 36 features: 18 color features, 12 texture features and 6 shape features.

Experimental Results –We divided the dataset into 3 parts: 4,000 training set images, 500 evaluation set images, and 500 images in the test set. The evaluation set is used to find system parameters. After fixing the parameters, we merged the 4,000 training set and 500 evaluation set images to make a new training set. This corresponds to the training set of 4,500 images and the test set of 500 images used by Duygulu et al.

Results: Automatic Image Annotation Given a set of image regions r_J, we use equation (1) to arrive at the conditional distribution P(w|r_J). We take the top 5 words from that distribution and call them the automatic annotation of the image in question, then compute annotation recall and precision for every word in the testing set.

Results: Automatic Image Annotation –Recall is the number of images correctly annotated with a given word, divided by the number of images that have that word in the human annotation. –Precision is the number of correctly annotated images divided by the total number of images annotated with that particular word. –Recall and precision values are averaged over the set of testing words.
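The per-word recall and precision definitions above can be sketched directly. The `auto` and `truth` mappings (image id to word set) are illustrative names, standing for the top-5 automatic annotations and the human annotations:

```python
def annotation_metrics(auto, truth, vocab):
    """Per-word recall and precision for automatic image annotation.

    auto:  image id -> set of automatically assigned words (top 5)
    truth: image id -> set of human annotation words
    A word gets a recall entry only if some image truly has it, and a
    precision entry only if the system assigned it to some image."""
    recall, precision = {}, {}
    for w in vocab:
        retrieved = {i for i, ws in auto.items() if w in ws}
        relevant = {i for i, ws in truth.items() if w in ws}
        correct = len(retrieved & relevant)
        if relevant:
            recall[w] = correct / len(relevant)
        if retrieved:
            precision[w] = correct / len(retrieved)
    return recall, precision
```

Averaging the two dictionaries' values then yields the mean recall and precision over the testing words.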

Results: Automatic Image Annotation

Results: Ranked Retrieval of Images Given a text query w_qry and a testing collection of un-annotated images, for each testing image we use equation (1) to get the conditional probability P(w_qry|r_J). We use four sets of queries, constructed from all 1-, 2-, 3- and 4-word combinations of words that occur at least twice in the testing set. An image is considered relevant to a given query if its manual annotation contains all of the query words.

Results: Ranked Retrieval of Images Evaluation metrics –precision at 5 retrieved images –non-interpolated average precision
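Both metrics listed above have standard definitions, sketched here for a single query: precision at 5 is the fraction of the top 5 ranked images that are relevant, and non-interpolated average precision averages the precision at each rank where a relevant image appears.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k retrieved images that are relevant."""
    return sum(1 for img in ranked[:k] if img in relevant) / k

def average_precision(ranked, relevant):
    """Non-interpolated average precision for one query: the mean,
    over all relevant images, of the precision at the rank where each
    relevant image is retrieved (missed images contribute 0)."""
    hits, total = 0, 0.0
    for rank, img in enumerate(ranked, start=1):
        if img in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

Both values are then averaged over all queries in a query set to score the system.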

Results: Ranked Retrieval of Images

Conclusions and Future Work We proposed a new statistical generative model for learning the semantics of images. –This model works significantly better than a number of other models for image annotation and retrieval. –Our model works directly on continuous features. Future work will include the extension of this work to larger datasets.