Matching Words with Pictures


1 Matching Words with Pictures
Chun Li Teng Chao Ji

2 Introduction Learning the joint distribution of image regions and words has many applications. While text and images are separately ambiguous, jointly they tend not to be.

3 Introduction Important points about how users request images:
1. Users request images both by object kind and by identity.
2. Users request images both by what they can see in the picture and by what the picture is about.
3. Queries can be based on image histograms, texture, or overall appearance.
4. Text associated with images is extremely useful.

4 Introduction Several practical applications exist for methods that link text and images:
1. Automated image annotation: archivists receive pictures and annotate them with useful keywords.
2. Browsing support: organize collections so that people can browse them.
3. Auto-illustration: a tool that automatically suggests images to illustrate blocks of text could be valuable to casual users.

5 Introduction Two Main Tasks:
1. Annotation: predict annotations for entire images using all the information present.
2. Correspondence: associate particular words with particular image substructures. This is a distinguishing feature of object recognition.

6 Input Representation and Preprocessing
The features represent major visual properties:
1. Size is represented by the portion of the image covered by the region.
2. Position is represented by the coordinates of the region's center of mass, normalized by the image dimensions.
3. Color is represented by the average and standard deviation of (R, G, B), (L, a, b), and (r = R/(R+G+B), g = G/(R+G+B)).
4. Texture is represented by the average and variance of 16 filter responses.
5. Shape is represented by the ratio of area to perimeter squared, the moment of inertia, and the ratio of the region's area to that of its convex hull.
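A few of these features can be sketched in code. This is an illustrative sketch only: the function name and exact layout are assumptions, and the slide's L*a*b* color, texture, and shape features are omitted for brevity.

```python
import numpy as np

def region_features(image, mask):
    """Compute a partial feature vector for one segmented region.

    image: (H, W, 3) float array of RGB values in [0, 1]
    mask:  (H, W) boolean array selecting the region's pixels
    """
    H, W = mask.shape
    ys, xs = np.nonzero(mask)

    # Size: fraction of the image covered by the region.
    size = len(ys) / (H * W)

    # Position: center of mass, normalized by the image dimensions.
    cy, cx = ys.mean() / H, xs.mean() / W

    # Color: mean and standard deviation of R, G, B over the region.
    rgb = image[mask]                      # (n_pixels, 3)
    color_mean = rgb.mean(axis=0)
    color_std = rgb.std(axis=0)

    # Chromaticity r = R/(R+G+B), g = G/(R+G+B); eps avoids division by zero.
    s = rgb.sum(axis=1, keepdims=True) + 1e-9
    chroma = (rgb[:, :2] / s).mean(axis=0)

    return np.concatenate([[size, cy, cx], color_mean, color_std, chroma])
```

In the full system each region's vector would also include the texture and shape components listed above before clustering or model fitting.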

7 Annotation Models Multi-Modal Hierarchical Aspect Models
Mixture of Multi-Modal Latent Dirichlet Allocation

8 Multi-Modal Hierarchical Aspect Models
Images and co-occurring text are generated by nodes arranged in a tree structure.

9 Multi-Modal Hierarchical Aspect Models
The nodes generate both image regions, using a Gaussian distribution, and words, using a multinomial distribution. Each cluster is associated with a path from a leaf to the root. Nodes close to the root are shared by many clusters, while nodes closer to the leaves are shared by few clusters.

10 Multi-Modal Hierarchical Aspect Models
c indexes clusters; w indexes the words in document d; b indexes the image regions (blobs) in document d; l indexes levels.
D is the set of observations for the document, W is the set of words for the document, and B is the set of blobs for the document.
The exponents are introduced to normalize for differing numbers of words and blobs in each image: N_wd denotes the number of words in document d, while N_w denotes the maximum number of words in any document.
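Putting these symbols together, the document probability for the hierarchical model can be sketched as follows. This is a reconstruction from the definitions on this slide, not the slide's own equation; N_b and N_bd are the assumed blob-side analogues of N_w and N_wd.

```latex
p(D) = \sum_{c} p(c)
  \prod_{w \in W} \Big( \sum_{l} p(w \mid l, c)\, p(l \mid c, d) \Big)^{N_w / N_{wd}}
  \prod_{b \in B} \Big( \sum_{l} p(b \mid l, c)\, p(l \mid c, d) \Big)^{N_b / N_{bd}}
```

Each factor mixes the word (or blob) emission probabilities over the levels l on the cluster's leaf-to-root path, and the exponents equalize the influence of documents with different numbers of words and blobs.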

11 Mixture of Multi-Modal Latent Dirichlet Allocation
A graphical probabilistic model

12 Mixture of Multi-Modal Latent Dirichlet Allocation
α is the parameter of the Dirichlet prior on the per-document topic distributions.
θ is the topic distribution for a document.
z is the topic for a word in the document; w is the specific word.
s is the topic for a blob in the document; b is the specific blob.
M is the number of blobs, N is the number of words, and I is the entire document (the image together with its words).
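As a concrete illustration, the generative process for a single LDA component can be sketched as below. This is a minimal sketch: all dimensions and parameter values are made up, and the full model additionally mixes over clusters c, each with its own parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions (not from the slides):
# K topics, V word types, F blob features, N words, M blobs.
K, V, F, N, M = 3, 10, 2, 5, 4

alpha = np.ones(K)                               # Dirichlet prior parameter
word_topics = rng.dirichlet(np.ones(V), size=K)  # multinomial p(w | z) per topic
blob_means = rng.normal(size=(K, F))             # Gaussian mean per topic

# One document: draw topic proportions, then words and blobs.
theta = rng.dirichlet(alpha)                     # per-document topic distribution
z = rng.choice(K, size=N, p=theta)               # topic for each word
words = np.array([rng.choice(V, p=word_topics[zi]) for zi in z])
s = rng.choice(K, size=M, p=theta)               # topic for each blob
blobs = blob_means[s] + rng.normal(scale=0.1, size=(M, F))
```

The key point of the structure is that words and blobs share the same per-document θ, which is what couples the two modalities.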

13 Mixture of Multi-Modal Latent Dirichlet Allocation
Let φ denote the approximate posterior over mixture components, and γc denote the corresponding approximate posterior Dirichlet. The distribution over words given an image (that is, a collection of blobs) is:
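That distribution can be written, approximately, in terms of the variational quantities just defined. This is a hedged reconstruction, not the slide's own equation:

```latex
p(w \mid B) \approx \sum_{c} \phi(c) \sum_{s} \mathbb{E}_{\gamma_c}[\theta_s]\; p(w \mid s, c),
\qquad
\mathbb{E}_{\gamma_c}[\theta_s] = \frac{\gamma_{c,s}}{\sum_{s'} \gamma_{c,s'}}
```

Intuitively, the blobs determine the posterior over clusters and topics, and words are then predicted by averaging the per-topic word multinomials under that posterior.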

14 Simple Correspondence Models
Goal: build models that can predict words for specific image regions.
Method:
Step 1: vector-quantize the representations of image regions.
Step 2: exploit the analogy with statistical lexicon learning.

15 Simple Correspondence Models
Discrete translation:
1. Use K-means to vector-quantize the set of features representing each image region.
2. Label each region with a single token.
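The vector-quantization step can be sketched with a plain K-means implementation. This is a minimal illustrative sketch, not the authors' code; the function name and parameters are assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain K-means: returns (centroids, labels) for data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each feature vector to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```

Each region's feature vector is then replaced by the index of its nearest centroid, turning the image into a bag of discrete "blob tokens" that can be treated like words in a translation lexicon.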

16 Simple Correspondence Models

17 Simple Correspondence Models
Problem: missing data.
Reason: we must construct a joint probability table linking word tokens to blob tokens, but the data set does not provide explicit correspondences.
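Because the correspondences are latent, the table can be estimated with EM, in the spirit of statistical machine translation lexicon learning (IBM Model 1). This is a sketch of that analogy, not the paper's exact algorithm; all names are illustrative.

```python
import numpy as np

def em_lexicon(pairs, n_words, n_blobs, n_iter=20):
    """Estimate p(word | blob) from co-occurring (words, blobs) pairs.

    pairs: list of (word_ids, blob_ids) per image; which word goes
    with which blob is unobserved.
    """
    # Start from a uniform translation table t[word, blob].
    t = np.full((n_words, n_blobs), 1.0 / n_words)
    for _ in range(n_iter):
        counts = np.zeros_like(t)
        for words, blobs in pairs:
            for w in words:
                # E-step: softly assign word w across this image's blobs.
                p = t[w, blobs]
                p = p / max(p.sum(), 1e-12)
                counts[w, blobs] += p
        # M-step: renormalize expected counts into probabilities per blob.
        col = counts.sum(axis=0, keepdims=True)
        t = np.where(col > 0, counts / np.maximum(col, 1e-12), t)
    return t
```

On toy data where a word reliably co-occurs with one blob token, the table concentrates on the correct pairing even though no single image labels it.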

18 Simple Correspondence Models

19 Simple Correspondence Models

20 Simple Correspondence Models from a Hierarchical Clustering Model

21 Simple Correspondence Models
Hierarchical clustering models do not explicitly model word-region relationships, but they do encode correspondence to some extent through co-occurrence.

22 Simple Correspondence Models
Problem:

23 Simple Correspondence Models

24 Simple Correspondence Models
Conclusion: none of the methods described above is wholly satisfactory for learning correspondence.
Why? The training data never states which word goes with which region.
Solution (improvement): strengthen the relationship between words and image regions when building the model.

25 Integrating Correspondence and Hierarchical Clustering
First approach: linking word emission and region emission probabilities with mixture weights

26 Integrating Correspondence and Hierarchical Clustering
Second approach: Paired Word and Region Emission at Nodes

27 Integrating Correspondence and Hierarchical Clustering

28 Integrating Correspondence and Hierarchical Clustering

29 Matching Words with Pictures
Goal: to match words with pictures.

30 Matching Words with Pictures
Two methods:
Annotation:
- Multi-Modal Hierarchical Aspect Models
- Mixture of Multi-Modal Latent Dirichlet Allocation
Correspondence:
- Discrete data translation
- Correspondence from a hierarchical clustering model
Problem: missing data

31 Matching Words with Pictures
Two ways to improve the hierarchical model:
1. Linking word emission and region emission probabilities with mixture weights
2. Paired word and region emission at nodes

32 Matching Words with Pictures
The End. Thanks!

