
1 Unsupervised Learning of Visual Sense Models for Polysemous Words
By David Arredondo

2 Polysemy: a problem Many English words have multiple meanings. In particular, many are “visually” polysemous--their sense differences can be explained visually. The paper analyzes five polysemous words: “bass”, “face”, “mouse”, “speaker”, and “watch”.

3 Method and Model: Intuition
The paper reranks images as a preprocessing step for classification. The basis of the reranking is the probability of the "sense" of each image. The sense of an image is derived from the text surrounding the image link. Unfortunately, the quality and quantity of the data surrounding the image link is lacking, so it needs to be supplemented with text-only web pages from a regular web search. Latent dimensions, or topic categories, are formulated from this bag-of-words representation using Latent Dirichlet Allocation (LDA). LDA finds hidden topics, i.e. distributions over discrete data, using a Bayesian formulation.
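This step can be approximated with off-the-shelf tools. Below is a minimal sketch assuming scikit-learn rather than the paper's own LDA implementation; `contexts` is a hypothetical list of text snippets, one per retrieved page or image link.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical text contexts, one per retrieved web page / image link.
contexts = [
    "bass guitar amp low frequency strings",
    "bass fishing lake lure largemouth fish",
    "sea bass recipe fillet grilled fish",
]

# Bag-of-words counts (the paper also strips HTML, rare words and the query word).
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(contexts)

# K topics; the paper sets K = 8 per keyword.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
theta = lda.fit_transform(X)                                        # P(z|d), one row per document
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # P(w|z), one row per topic
```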

4 Method and Model: LDA for topics
The LDA model used is pictured below. Each document is modelled as a mixture of K topics, with topic assignments z. Each document can be thought of as a bag of N words, assumed to be generated from the product of two multinomial distributions with Dirichlet priors. Note that w represents a word, and that phi and theta are the distributions over words and topics respectively. Only the topic distribution is used to rank the images according to sense.
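For reference, the standard LDA generative process the slide describes can be written as follows; alpha and beta are the usual Dirichlet hyperparameters, which the slide does not name explicitly.

```latex
\begin{aligned}
\phi_k     &\sim \mathrm{Dirichlet}(\beta)  && k = 1,\dots,K \\
\theta_d   &\sim \mathrm{Dirichlet}(\alpha) && \text{for each document } d \\
z_{d,n}    &\sim \mathrm{Multinomial}(\theta_d) && n = 1,\dots,N \\
w_{d,n}    &\sim \mathrm{Multinomial}(\phi_{z_{d,n}})
\end{aligned}
```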

5 Method and Model: Dictionary Sense
In order to rank images by sense, they calculate the probability of a sense given a topic using each dictionary entry in WordNet, plus synonyms ("pitch" for "bass"), any hyponyms, and first-level hypernyms ("sound property" for "bass"), treated as a bag of words. P(s|z) is obtained by summing the probabilities of each word in this bag given the topic. With d as the text associated with a web image, the probability of a sense is then P(s|d) = Σ_z P(s|z) P(z|d), where P(z|d) is the topic distribution that LDA infers for d. Note: they also normalise for the length of the text context.
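A small sketch of this sense-ranking computation, reusing `phi`, `theta` and `vectorizer` from the LDA sketch above; `sense_bags` is a hypothetical stand-in for the WordNet-derived bags of words, not the paper's actual sense definitions.

```python
import numpy as np

# Hypothetical WordNet-derived bags of words for two senses of "bass".
sense_bags = {
    "bass_fish":  ["fish", "freshwater", "saltwater", "perch"],
    "bass_sound": ["pitch", "sound", "property", "low", "tone"],
}

vocab = vectorizer.vocabulary_          # word -> column index into phi

def sense_given_topic(bag):
    # P(s|z) up to normalisation: sum of P(w|z) over the words in the sense bag.
    idx = [vocab[w] for w in bag if w in vocab]
    return phi[:, idx].sum(axis=1)      # one value per topic z

def sense_given_doc(bag, d):
    # P(s|d) = sum_z P(s|z) P(z|d), with P(z|d) taken from the LDA fit.
    return float(sense_given_topic(bag) @ theta[d])

# Rank the retrieved images (documents) for one sense, highest probability first.
scores = [sense_given_doc(sense_bags["bass_fish"], d) for d in range(theta.shape[0])]
ranking = np.argsort(scores)[::-1]
```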

6 Visual Sense Model The full model uses images ranked by the probability of a given sense. In particular, it takes the N highest-ranked images as positive training data for the classifier of choice, an RBF-kernel SVM. A baseline model was also run; it uses the images returned by searching for a sense, a sense plus its synonyms, and a sense plus its first-level hypernyms ("mouse", "computer mouse", "mouse electronic device"). These images are then run through the SVM.
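A minimal sketch of this training step, with random placeholder features standing in for the real image descriptors; the array names, sizes and SVM hyperparameters below are illustrative, not from the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Random placeholders for real descriptors (e.g. bag-of-visual-words histograms).
image_features = rng.random((500, 64))     # retrieved images, assumed sorted by P(s|d)
negative_features = rng.random((200, 64))  # e.g. ground-truth images of other objects

N = 100
pos = image_features[:N]                   # top-N ranked images as positives
X = np.vstack([pos, negative_features])
y = np.concatenate([np.ones(len(pos)), np.zeros(len(negative_features))])

clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(X, y)
scores = clf.decision_function(image_features)   # re-score all retrieved images
```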

7 Data Note: the keyword data was collected via Yahoo Image Search with the keywords "bass", "face", "speaker", "mouse", and "watch". The images were human-labelled as 0: unrelated, 1: partial, or 2: good, but only for testing.

8 Features Text data is pruned of all HTML tags, words that appear only once, stop words, and the query word itself; a Porter stemmer is then applied. Image features are extracted after resizing all images to 300x300 pixels and converting them to grayscale. Both edge features (Canny edges) and scale-invariant salient points (Harris-Laplace) were used, with a SIFT descriptor describing the area around each interest point.
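A rough sketch of this image preprocessing with OpenCV; SIFT's own detector stands in here for the Harris-Laplace detector (which requires opencv-contrib), and the file path and Canny thresholds are placeholders rather than the paper's settings.

```python
import cv2

# "image.jpg" is a placeholder path.
img = cv2.imread("image.jpg")
gray = cv2.cvtColor(cv2.resize(img, (300, 300)), cv2.COLOR_BGR2GRAY)

# Edge features (thresholds here are arbitrary).
edges = cv2.Canny(gray, 100, 200)

# SIFT descriptors around detected interest points.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
```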

9 Experiments: Running LDA
Note: The number of topics K is set to 8, which roughly matches the average number of senses per keyword.

10 Classification 1-SENSE uses only the ground-truth images of the other objects as the negative class. MIX-SENSE additionally includes images of the given keyword's other, non-applicable senses in the negative class.
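An illustrative sketch of the two negative-class constructions for the "mouse" classifier; the category names and random placeholder features are hypothetical, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Random placeholder features per category: 50 images of 64 dimensions each.
feats = {name: rng.random((50, 64)) for name in
         ["bass", "face", "speaker", "watch",   # ground truth of the other objects
          "mouse_animal"]}                      # a non-applicable sense of "mouse"

# 1-SENSE: negatives are only the ground-truth images of the other objects.
neg_1sense = np.vstack([feats[k] for k in ["bass", "face", "speaker", "watch"]])

# MIX-SENSE: additionally include the keyword's other, non-applicable senses.
neg_mixsense = np.vstack([neg_1sense, feats["mouse_animal"]])
```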

11 Critique and Discussion
As stated in the paper, not all senses have a clear visual interpretation. This might explain the varying degree of improvement of their model over the baseline for some keywords. They only use results from one search engine, and they did not use search results from Google, the dominant engine in the industry. Applying their method to, and comparing it across, other search engines would have more thoroughly grounded the relevance of their work. Future work could explore other unsupervised learning algorithms, such as Non-negative Matrix Factorization (NMF), or apply the same idea with modern search-engine results and RNNs as classifiers.

