1
Experiments on Using Semantic Distances Between Words in Image Caption Retrieval
Presenter: Cosmin Adrian Bejan
Alan F. Smeaton and Ian Quigley
School of Computer Applications
Dublin City University
2
IR implementation - traditional approach
Represent:
  a user query = a bag of query terms
  a document = a bag of index terms
Compute:
  a degree of similarity between a document and a query, based on the overlap, or number of query terms in common, between them.
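The overlap computation described on this slide can be sketched as follows. This is a minimal illustration, not the system used in the experiments: the function name `overlap_score`, the whitespace tokenisation, and the two sample captions are assumptions made for the example.

```python
def overlap_score(query: str, document: str) -> int:
    """Count the distinct query terms that also occur in the document."""
    # Assumption: lower-casing and whitespace splitting stand in for real indexing.
    query_terms = set(query.lower().split())
    index_terms = set(document.lower().split())
    return len(query_terms & index_terms)


# Hypothetical captions, used only to show the ranking step.
captions = {
    "img1": "children playing on a beach",
    "img2": "a boat moored near the river bank",
}
query = "boat on the river"

# Rank captions by the number of query terms they share with the query.
ranked = sorted(captions, key=lambda k: overlap_score(query, captions[k]), reverse=True)
```

Here `img2` shares three terms with the query ("boat", "the", "river") and `img1` only one ("on"), so `img2` is ranked first.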
3
Problems in IR implementation
Caused by:
  the same words describing different things (“bar”, “bank”)
  different words describing the same thing (“stomach pain” – “belly ache”)
  natural language being fraught with ambiguities at all levels, leading to multiple interpretations of words, phrases, etc.
Common way to address these problems: query expansion
The approach in this paper:
  when computing the degree of similarity between query and document, instead of basing similarity on the terms in common between the two, incorporate a quantitative measure of the semantic similarity between index terms into the measure.
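To make the contrast concrete, here is a minimal sketch of matching that credits semantically close terms instead of only identical ones. The similarity values in `TOY_SIM` are invented for illustration, and the aggregation (sum over query terms of the best match in the document) is an assumption for this example, not the formula used in the paper.

```python
# Hypothetical word-word similarities, chosen only for illustration.
TOY_SIM = {
    frozenset(("stomach", "belly")): 0.75,
    frozenset(("pain", "ache")): 0.5,
}


def sim(a: str, b: str) -> float:
    """Similarity of two terms: 1.0 if identical, else a table lookup (0.0 if unknown)."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def exact_overlap(query_terms, doc_terms) -> int:
    """Traditional measure: number of terms in common."""
    return len(set(query_terms) & set(doc_terms))


def soft_match(query_terms, doc_terms) -> float:
    """Assumed aggregation: each query term contributes its best similarity to any document term."""
    return sum(max((sim(q, d) for d in doc_terms), default=0.0) for q in query_terms)


query_terms = ["stomach", "pain"]
doc_terms = ["belly", "ache"]
```

With these toy values, `exact_overlap` returns 0 for “stomach pain” against “belly ache”, while `soft_match` returns 1.25, so the document is no longer missed.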
4
Measuring semantic distance between words
Knowledge base: hierarchical concept graphs (HCGs), automatically constructed from WordNet
The similarity of two classes or synsets: [formula not preserved in the transcript]
  where the formula refers to the information content of the class c_i, and P(c_i) is the class probability of class c_i
Computing the similarity between two word senses (nouns) can only be done if both are in the same HCG; otherwise they are regarded as being dissimilar.
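The slide names two ingredients, the information content of a class and the class probability P(c_i). A minimal sketch of how such a measure could work is given below. It assumes an information-content measure in the style of Resnik: information content is taken as -log P(c), and the similarity of two classes is the largest information content among the classes that subsume both. The hierarchy, the probabilities, and all function names are hypothetical.

```python
import math

# Hypothetical toy hierarchy: child -> parent (None marks the root of an HCG).
PARENT = {
    "entity": None, "animal": "entity", "dog": "animal", "cat": "animal",
    "vehicle": "entity", "car": "vehicle",
    "shape": None, "circle": "shape",
}

# Hypothetical class probabilities P(c); a class is at least as probable as its children.
P = {
    "entity": 0.8, "animal": 0.4, "dog": 0.2, "cat": 0.2,
    "vehicle": 0.4, "car": 0.4,
    "shape": 0.2, "circle": 0.2,
}


def ancestors(c):
    """Return the class itself followed by every class above it, up to the HCG root."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain


def information_content(c):
    """Assumed definition: -log P(c), so rarer classes carry more information."""
    return -math.log(P[c])


def similarity(c1, c2):
    """Largest information content among classes subsuming both c1 and c2."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    if not common:
        return 0.0  # different HCGs: regarded as dissimilar
    return max(information_content(c) for c in common)
```

In this toy hierarchy "dog" and "cat" meet at "animal", which is more informative than "entity" (where "dog" and "car" meet), so the first pair scores higher. "dog" and "circle" sit in different HCGs and score 0.0.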
5
Experimental Set-up
  Hand-caption 2714 images
  Manually disambiguate polysemous words in the captions
  Manually build a collection of 60 queries
  Compute various query-caption similarity measures using word-word semantic distances
6
Retrieval Strategies [1-2]
Notation:
  query Q = {q_1, q_2, …, q_m}
  caption C = {c_1, c_2, …, c_n}
  where a q_i or a c_j is the original term, used only as a representation for its synset.
  Sim(t_i, t_j) is the similarity between the sense-disambiguated forms of two terms t_i and t_j.
Run1: straightforward statistically-based tf*IDF match between the word forms or strings, i.e. not using word-sense-disambiguated captions or queries.
Run2: where terms in the caption and in the query are both expanded to include other word strings from their sense-disambiguated synsets (query expansion).
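The plain tf*IDF match and its expanded variant can be sketched as below. This is an illustration under stated assumptions: the captions, the single synset in `SYNSETS`, the logarithmic IDF, and the function names are all hypothetical, and the expansion looks up synsets by word string, whereas the experiments used manually sense-disambiguated terms.

```python
import math
from collections import Counter

# Hypothetical captions, already tokenised.
CAPTIONS = {
    "img1": ["car", "parked", "street"],
    "img2": ["automobile", "race", "track"],
    "img3": ["dog", "street"],
}

# Hypothetical synset: word strings that share one sense.
SYNSETS = [{"car", "automobile"}]


def expand(terms):
    """Append the other word strings from each term's synset."""
    out = list(terms)
    for t in terms:
        for syn in SYNSETS:
            if t in syn:
                out.extend(sorted(syn - {t}))
    return out


def idf(term, docs):
    """Assumed IDF form: log(N / document frequency), 0.0 for unseen terms."""
    df = sum(1 for d in docs.values() if term in d)
    return math.log(len(docs) / df) if df else 0.0


def tfidf_score(query, doc, docs):
    """Sum of tf * IDF over the distinct query terms."""
    tf = Counter(doc)
    return sum(tf[q] * idf(q, docs) for q in set(query))


query = ["car"]

# String match only: "automobile" in img2 is invisible to the query "car".
plain = {k: tfidf_score(query, d, CAPTIONS) for k, d in CAPTIONS.items()}

# Expanded match: both query and captions gain the other synset members.
expanded_docs = {k: expand(d) for k, d in CAPTIONS.items()}
expanded = {k: tfidf_score(expand(query), d, expanded_docs) for k, d in expanded_docs.items()}
```

Without expansion `img2` scores zero for the query "car". With expansion on both sides it receives a positive score, while `img3` still scores zero.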
7
Retrieval Strategies [3-5]
Run3
Run4
Run5
  when considering different threshold values for each HCG, given that there is a concentration of usage of concepts from some HCGs (like entity) and hardly any use of others (like shape).
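One way to picture per-HCG thresholds is a cutoff applied to each word-word similarity before it contributes to a caption's score. The sketch below is an assumption about how such a threshold could be applied, not the formula from the slide. The threshold values, the default, and the function name are hypothetical, as is the choice of a stricter cutoff for the heavily used entity HCG.

```python
# Hypothetical per-HCG cutoffs; the numbers are illustrative only.
THRESHOLDS = {"entity": 0.6, "shape": 0.2}
DEFAULT_THRESHOLD = 0.4  # assumed fallback for HCGs without their own cutoff


def thresholded(sim_value: float, hcg: str) -> float:
    """Keep a similarity only if it reaches the cutoff of the HCG it was computed in."""
    cutoff = THRESHOLDS.get(hcg, DEFAULT_THRESHOLD)
    return sim_value if sim_value >= cutoff else 0.0
```

With these values a similarity of 0.5 is discarded inside the entity HCG but kept inside the shape HCG, so the same raw score is treated differently depending on how densely the HCG is used.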
8
Retrieval Strategies [6-8]
Run6
Run7
Run8
9
Experimental Results