Matching Words with Pictures


1 Matching Words with Pictures
Chun Li Teng Chao Ji

2 Introduction Learning the joint distribution of image regions and words has many applications. While text and images are separately ambiguous, jointly they tend not to be.

3 Introduction Important points about how users request images:
1. Users request images both by object kind and by identity.
2. Users request images both by what they can see in the picture and by what the picture is about.
3. Queries can be based on image histograms, texture, or overall appearance.
4. Text associated with images is extremely useful.

4 Introduction Several practical applications exist for methods that link text and images:
1. Automated image annotation: archivists receive pictures and annotate them with useful keywords.
2. Browsing support: organize collections so that people can browse them.
3. Auto-illustration: a tool that automatically suggests images to illustrate blocks of text could be valuable to casual users.

5 Introduction Two Main Tasks:
1. Annotation: predict annotations for entire images using all the information present.
2. Correspondence: associate particular words with particular image substructures. This is a distinguishing feature of object recognition.

6 Input Representation and Preprocessing
The features represent major visual properties:
1. Size is represented by the portion of the image covered by the region.
2. Position is represented by the coordinates of the region's center of mass, normalized by the image dimensions.
3. Color is represented by the average and standard deviation of (R, G, B), (L, a, b), and (r = R/(R+G+B), g = G/(R+G+B)).
4. Texture is represented by the average and variance of 16 filter responses.
5. Shape is represented by the ratio of area to perimeter squared, the moment of inertia, and the ratio of the region's area to that of its convex hull.
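A few of these features can be sketched in code. This is an illustrative sketch only: the function name and exact layout are assumptions, and the slide's L*a*b* color, texture, and shape features are omitted for brevity.

```python
import numpy as np

def region_features(image, mask):
    """Compute a partial feature vector for one segmented region.

    image: (H, W, 3) float array of RGB values in [0, 1]
    mask:  (H, W) boolean array selecting the region's pixels
    """
    H, W = mask.shape
    ys, xs = np.nonzero(mask)

    # Size: fraction of the image covered by the region.
    size = len(ys) / (H * W)

    # Position: center of mass, normalized by the image dimensions.
    cy, cx = ys.mean() / H, xs.mean() / W

    # Color: mean and standard deviation of R, G, B over the region.
    rgb = image[mask]                      # (n_pixels, 3)
    color_mean = rgb.mean(axis=0)
    color_std = rgb.std(axis=0)

    # Chromaticity r = R/(R+G+B), g = G/(R+G+B); eps avoids division by zero.
    s = rgb.sum(axis=1, keepdims=True) + 1e-9
    chroma = (rgb[:, :2] / s).mean(axis=0)

    return np.concatenate([[size, cy, cx], color_mean, color_std, chroma])
```

In the full system each region's vector would also include the texture and shape components listed above before clustering or model fitting.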

7 Annotation Models Multi-Modal Hierarchical Aspect Models
Mixture of Multi-Modal Latent Dirichlet Allocation

8 Multi-Modal Hierarchical Aspect Models
Images and co-occurring text are generated by nodes arranged in a tree structure.

9 Multi-Modal Hierarchical Aspect Models
The nodes generate both image regions, using a Gaussian distribution, and words, using a multinomial distribution. Each cluster is associated with a path from a leaf to the root. Nodes close to the root are shared by many clusters, while nodes closer to the leaves are shared by few clusters.

10 Multi-Modal Hierarchical Aspect Models
c indexes clusters; w indexes the words in document d; b indexes the image regions (blobs) in document d; l indexes levels.
D is the set of observations for the document, W is the set of words for the document, and B is the set of blobs for the document.
The exponents are introduced to normalize for differing numbers of words and blobs in each image: N_wd denotes the number of words in document d, while N_w denotes the maximum number of words in any document.
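Putting these symbols together, the document probability for the hierarchical model can be sketched as follows. This is a reconstruction from the definitions on this slide, not the slide's own equation; N_b and N_bd are the assumed blob-side analogues of N_w and N_wd.

```latex
p(D) = \sum_{c} p(c)
  \prod_{w \in W} \Big( \sum_{l} p(w \mid l, c)\, p(l \mid c, d) \Big)^{N_w / N_{wd}}
  \prod_{b \in B} \Big( \sum_{l} p(b \mid l, c)\, p(l \mid c, d) \Big)^{N_b / N_{bd}}
```

Each factor mixes the word (or blob) emission probabilities over the levels l on the cluster's leaf-to-root path, and the exponents equalize the influence of documents with different numbers of words and blobs.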

11 Mixture of Multi-Modal Latent Dirichlet Allocation
A graphical probabilistic model

12 Mixture of Multi-Modal Latent Dirichlet Allocation
α is the parameter of the Dirichlet prior on the per-document topic distributions.
θ is the topic distribution for a document.
z is the topic for a word in the document; w is the specific word.
s is the topic for a blob in the document; b is the specific blob.
M is the number of blobs, N is the number of words, and I is the entire document (the image together with its words).
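As a concrete illustration, the generative process for a single LDA component can be sketched as below. This is a minimal sketch: all dimensions and parameter values are made up, and the full model additionally mixes over clusters c, each with its own parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions (not from the slides):
# K topics, V word types, F blob features, N words, M blobs.
K, V, F, N, M = 3, 10, 2, 5, 4

alpha = np.ones(K)                               # Dirichlet prior parameter
word_topics = rng.dirichlet(np.ones(V), size=K)  # multinomial p(w | z) per topic
blob_means = rng.normal(size=(K, F))             # Gaussian mean per topic

# One document: draw topic proportions, then words and blobs.
theta = rng.dirichlet(alpha)                     # per-document topic distribution
z = rng.choice(K, size=N, p=theta)               # topic for each word
words = np.array([rng.choice(V, p=word_topics[zi]) for zi in z])
s = rng.choice(K, size=M, p=theta)               # topic for each blob
blobs = blob_means[s] + rng.normal(scale=0.1, size=(M, F))
```

The key point of the structure is that words and blobs share the same per-document θ, which is what couples the two modalities.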

13 Mixture of Multi-Modal Latent Dirichlet Allocation
Let φ denote the approximate posterior over mixture components, and γc denote the corresponding approximate posterior Dirichlet. The distribution over words given an image (that is, a collection of blobs) is:
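That distribution can be written, approximately, in terms of the variational quantities just defined. This is a hedged reconstruction, not the slide's own equation:

```latex
p(w \mid B) \approx \sum_{c} \phi(c) \sum_{s} \mathbb{E}_{\gamma_c}[\theta_s]\; p(w \mid s, c),
\qquad
\mathbb{E}_{\gamma_c}[\theta_s] = \frac{\gamma_{c,s}}{\sum_{s'} \gamma_{c,s'}}
```

Intuitively, the blobs determine the posterior over clusters and topics, and words are then predicted by averaging the per-topic word multinomials under that posterior.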

14 Simple Correspondence Models
Goal: build models that can predict words for specific image regions.
Method:
Step 1: vector-quantize the representations of image regions.
Step 2: exploit the analogy with statistical lexicon learning.

15 Simple Correspondence Models
Discrete translation:
1. Use K-means to vector-quantize the set of features representing each image region.
2. Label each region with a single token.
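The vector-quantization step can be sketched with a plain K-means implementation. This is a minimal illustrative sketch, not the authors' code; the function name and parameters are assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain K-means: returns (centroids, labels) for data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each feature vector to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```

Each region's feature vector is then replaced by the index of its nearest centroid, turning the image into a bag of discrete "blob tokens" that can be treated like words in a translation lexicon.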

16 Simple Correspondence Models

17 Simple Correspondence Models
Problem: missing data.
Reason: we must construct a joint probability table linking word tokens to blob tokens, but the data set does not provide explicit correspondences.
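Because the correspondences are latent, the table can be estimated with EM, in the spirit of statistical machine translation lexicon learning (IBM Model 1). This is a sketch of that analogy, not the paper's exact algorithm; all names are illustrative.

```python
import numpy as np

def em_lexicon(pairs, n_words, n_blobs, n_iter=20):
    """Estimate p(word | blob) from co-occurring (words, blobs) pairs.

    pairs: list of (word_ids, blob_ids) per image; which word goes
    with which blob is unobserved.
    """
    # Start from a uniform translation table t[word, blob].
    t = np.full((n_words, n_blobs), 1.0 / n_words)
    for _ in range(n_iter):
        counts = np.zeros_like(t)
        for words, blobs in pairs:
            for w in words:
                # E-step: softly assign word w across this image's blobs.
                p = t[w, blobs]
                p = p / max(p.sum(), 1e-12)
                counts[w, blobs] += p
        # M-step: renormalize expected counts into probabilities per blob.
        col = counts.sum(axis=0, keepdims=True)
        t = np.where(col > 0, counts / np.maximum(col, 1e-12), t)
    return t
```

On toy data where a word reliably co-occurs with one blob token, the table concentrates on the correct pairing even though no single image labels it.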

18 Simple Correspondence Models

19 Simple Correspondence Models

20 Simple Correspondence Models from a Hierarchical Clustering Model

21 Simple Correspondence Models
Hierarchical clustering models do not explicitly model word-region relationships, but they do encode correspondence to some extent through co-occurrence.

22 Simple Correspondence Models
Problem:

23 Simple Correspondence Models

24 Simple Correspondence Models
Conclusion: none of the methods described above is wholly satisfactory for learning correspondence.
Why? The training data never states which word goes with which region.
Solution (improvement): strengthen the relationship between words and image regions when building the model.

25 Integrating Correspondence and Hierarchical Clustering
First approach: linking word emission and region emission probabilities with mixture weights

26 Integrating Correspondence and Hierarchical Clustering
Second approach: Paired Word and Region Emission at Nodes

27 Integrating Correspondence and Hierarchical Clustering

28 Integrating Correspondence and Hierarchical Clustering

29 Matching Words with Pictures
Goal: to match words with pictures.

30 Matching Words with Pictures
Two methods:
Annotation:
- Multi-Modal Hierarchical Aspect Models
- Mixture of Multi-Modal Latent Dirichlet Allocation
Correspondence:
- Discrete data translation
- Correspondence from a hierarchical clustering model
Problem: missing data

31 Matching Words with Pictures
Two ways to improve the hierarchical model:
1. Linking word emission and region emission probabilities with mixture weights
2. Paired word and region emission at nodes

32 Matching Words with Pictures
The End. Thanks!

