Matching Words with Pictures Chun Li Teng Chao Ji
Introduction Learning the joint distribution of image regions and words has many applications. While text and images are separately ambiguous, jointly they tend not to be.
Introduction Key points about how users request images: 1. Users request images both by object kinds and by object identities. 2. Users request images both by what they depict and by what they are about. 3. Queries based on image histograms, texture, or overall appearance are rarely what users want. 4. Text associated with images is extremely useful.
Introduction Several practical applications for methods that link text and images: 1. Automated image annotation: archivists receive pictures and annotate them with useful keywords. 2. Browsing support: organizing a collection so that people can browse it effectively. 3. Auto-illustration: a tool that automatically suggests images to illustrate blocks of text could expose image collections to casual users.
Introduction Two Main Tasks: 1. Annotation: predict annotations for an entire image using all the information present. 2. Correspondence: associate particular words with particular image substructures. This can be viewed as a form of object recognition.
Input Representation and Preprocessing The features represent the major visual properties: 1. Size is represented by the portion of the image covered by the region. 2. Position is represented by the coordinates of the region's center of mass, normalized by the image dimensions. 3. Color is represented by the average and standard deviation of (R, G, B), (L, a, b), and (r = R/(R+G+B), g = G/(R+G+B)). 4. Texture is represented by the average and variance of 16 filter responses. 5. Shape is represented by the ratio of area to perimeter squared, the moment of inertia, and the ratio of the region's area to that of its convex hull.
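A few of the listed features can be sketched in code. The following is a minimal, illustrative implementation of the size, position, color, and chromaticity features for one region (it is not the authors' preprocessing pipeline, and `region_features` is a hypothetical helper name):

```python
import numpy as np

def region_features(image, mask):
    """Compute a subset of the listed region features (illustrative sketch).

    image: H x W x 3 float array with RGB values in [0, 1]
    mask:  H x W boolean array marking the region's pixels
    """
    h, w, _ = image.shape
    ys, xs = np.nonzero(mask)
    pixels = image[mask]                       # N x 3 array of region pixels

    size = mask.sum() / (h * w)                # portion of the image covered
    # center of mass, normalized by the image dimensions
    pos = (ys.mean() / h, xs.mean() / w)
    # color: average and standard deviation of (R, G, B)
    color_mean = pixels.mean(axis=0)
    color_std = pixels.std(axis=0)
    # chromaticity: r = R/(R+G+B), g = G/(R+G+B)
    total = pixels.sum(axis=1, keepdims=True) + 1e-8
    rg = (pixels[:, :2] / total).mean(axis=0)
    return np.concatenate([[size], pos, color_mean, color_std, rg])
```

The texture and shape features would follow the same pattern, with a filter bank and region-geometry computations respectively.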
Annotation Models Multi-Modal Hierarchical Aspect Models Mixture of Multi-Modal Latent Dirichlet Allocation
Multi-Modal Hierarchical Aspect Models Images and co-occurring text are generated by nodes arranged in a tree structure.
Multi-Modal Hierarchical Aspect Models The nodes generate both image regions (using a Gaussian distribution) and words (using a multinomial distribution). Each cluster is associated with a path from a leaf to the root. Nodes close to the root are shared by many clusters, while nodes closer to the leaves are shared by few clusters.
Multi-Modal Hierarchical Aspect Models Notation: c indexes clusters; w indexes the words in document d; b indexes the image regions (blobs) in document d; l indexes levels. D is the set of observations for the document, W is the set of words for the document, and B is the set of blobs for the document. Exponents are introduced to normalize for differing numbers of words and blobs in each image: N_wd denotes the number of words in document d, while N_w denotes the maximum number of words in any document.
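With this notation, the document likelihood the slide alludes to can be sketched as follows (a reconstruction in the spirit of Barnard et al.'s hierarchical model; the exponent N_w/N_wd normalizes the word contribution, and N_b/N_bd does the same for blobs):

```latex
p(D) = \sum_{c} p(c)\,
  \prod_{w \in W} \Big( \sum_{l} p(w \mid l, c)\, p(l \mid c, d) \Big)^{N_w / N_{wd}}
  \prod_{b \in B} \Big( \sum_{l} p(b \mid l, c)\, p(l \mid c, d) \Big)^{N_b / N_{bd}}
```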
Mixture of Multi-Modal Latent Dirichlet Allocation A graphical probabilistic model
Mixture of Multi-Modal Latent Dirichlet Allocation Notation: c is the parameter of the Dirichlet prior on the per-document topic distributions; θ is the topic distribution for a document; z is the topic for a word in the document; s is the topic for a blob in the document; b is a specific blob; w is a specific word; M is the number of blobs; N is the number of words; I is the entire document (the image regions plus the words).
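The generative process behind this notation can be sketched as a sampler. This is an illustrative simplification with made-up parameters (K, V, the blob Gaussians, and the function name are all assumptions, not the authors' implementation): draw a per-document θ, then draw a topic for each blob and each word from θ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (made-up) parameters: K topics, V word types, blobs as 2-D Gaussians.
K, V = 3, 6
alpha = np.ones(K)                                # Dirichlet prior on topic proportions
word_probs = rng.dirichlet(np.ones(V), size=K)    # p(w | z): one multinomial row per topic
blob_means = rng.normal(size=(K, 2))              # p(b | s): one Gaussian mean per topic

def generate_document(n_blobs, n_words):
    """Sample one (blobs, words) document from the sketched generative process."""
    theta = rng.dirichlet(alpha)                  # per-document topic distribution θ
    s = rng.choice(K, size=n_blobs, p=theta)      # topic s for each blob
    blobs = blob_means[s] + rng.normal(scale=0.1, size=(n_blobs, 2))
    z = rng.choice(K, size=n_words, p=theta)      # topic z for each word
    words = [rng.choice(V, p=word_probs[t]) for t in z]
    return blobs, words
```

Because blobs and words share the same θ, observing the blobs of an image carries information about which words are likely, which is what the annotation task exploits.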
Mixture of Multi-Modal Latent Dirichlet Allocation Let φ denote the approximate posterior over mixture components, and γc denote the corresponding approximate posterior Dirichlet parameters. The distribution over words given an image (that is, a collection of blobs) is then computed from these variational quantities.
Simple Correspondence Models Goal: build models that can predict words for specific image regions. Method: Step 1: vector-quantize the representations of image regions. Step 2: exploit the analogy with statistical lexicon learning.
Simple Correspondence Models Discrete-data translation: use K-means to vector-quantize the set of features representing each image region, labeling each region with a single token.
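Step 1 can be sketched with a plain K-means over the region feature vectors (a minimal numpy version with a simplistic first-k initialization; `kmeans_tokens` is a hypothetical name, not the authors' code):

```python
import numpy as np

def kmeans_tokens(features, k, n_iter=50):
    """Vector-quantize region feature vectors into k blob tokens via plain K-means."""
    centers = features[:k].astype(float).copy()   # simplistic init: first k points
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iter):
        # assign each region to its nearest center
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned regions
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers
```

After quantization, each image becomes a bag of blob tokens paired with a bag of words, which is exactly the setting of statistical lexicon learning.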
Simple Correspondence Models Problem: missing data. We must construct a joint probability table linking word tokens to blob tokens; however, the data set does not provide explicit correspondences.
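This missing-correspondence problem is the same one faced in statistical machine translation, and EM resolves it by treating the word-blob alignments as hidden variables. A hedged sketch in the spirit of the classic translation-table EM (an illustration, not the authors' exact algorithm):

```python
from collections import defaultdict

def em_translation(pairs, n_iter=20):
    """Estimate p(word | blob token) from images whose word-blob pairing is unknown.

    pairs: list of (words, blob_tokens) per image, e.g. (["sky", "sea"], [0, 1])
    """
    vocab = {w for ws, _ in pairs for w in ws}
    t = defaultdict(lambda: 1.0 / len(vocab))      # uniform init of p(w | b)
    for _ in range(n_iter):
        count = defaultdict(float)                 # expected word-blob co-occurrence counts
        total = defaultdict(float)
        for ws, bs in pairs:                       # E-step: soft alignments within each image
            for w in ws:
                norm = sum(t[(w, b)] for b in bs)
                for b in bs:
                    c = t[(w, b)] / norm
                    count[(w, b)] += c
                    total[b] += c
        for (w, b), c in count.items():            # M-step: renormalize per blob token
            t[(w, b)] = c / total[b]
    return t
```

Images where a word appears with few blobs pin down its translation, and EM propagates that evidence to the ambiguous images.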
Correspondence from a Hierarchical Clustering Model
Correspondence from a Hierarchical Clustering Model Hierarchical clustering models do not explicitly model word-region relationships, but they do encode correspondence to some extent through co-occurrence.
Correspondence from a Hierarchical Clustering Model Problem: because correspondence is encoded only through co-occurrence, different regions in the same image tend to predict similar words.
Simple Correspondence Models Conclusion: none of the methods described above is wholly satisfactory for learning correspondence. Improvement: strengthen the relationship between words and image regions when building up the models.
Integrating Correspondence and Hierarchical Clustering First approach: linking word emission and region emission probabilities with mixture weights
Integrating Correspondence and Hierarchical Clustering Second approach: Paired Word and Region Emission at Nodes
Matching Words with Pictures Goal: to match words with pictures.
Matching Words with Pictures Summary of the two tasks and their methods: Annotation: 1. Multi-Modal Hierarchical Aspect Models 2. Mixture of Multi-Modal Latent Dirichlet Allocation. Correspondence: 1. Discrete-data translation 2. Correspondence from a hierarchical clustering model. Problem: missing correspondence data.
Matching Words with Pictures Two ways to improve the hierarchical model: 1. Linking word emission and region emission probabilities with mixture weights 2. Paired word and region emission at nodes.
Matching Words with Pictures The end THANKS