Published by Kelley Stanley. Modified over 8 years ago.
1
Scalable Image Annotation with ConceptRank Petra Budíková, Michal Batko, Pavel Zezula
2
Slide 2 Outline
- Search-based annotation
  - Motivation
  - Problem formalization
  - Challenges
- ConceptRank
  - Idea
  - Semantic network construction
  - PageRank and ConceptRank
- Image annotation with ConceptRank
  - MUFIN Image Annotation Framework description
  - Current implementation and parameters
  - Examples
  - Experimental evaluation
- Future work
3
What and why?
4
Slide 4 Motivation
- What is in the image? Possible answers for the example picture:
  - Yellow flower
  - Flower, yellow, dandelion, detail, close-up, nature, plant, beautiful
  - Taraxacum officinale
  - The first dandelion that bloomed this year in front of the White House.
  - nature, dandelion
- Why do I care?
  - Keyword-based image retrieval
  - Impaired users
  - Data summarization
  - Scientific data classification
  - …
5
Slide 5 Problem formalization
- The annotation task is defined by a query image I and a vocabulary V of target concepts
  - Example: V = {flower, animal, person, building}
- The annotation function f_A assigns to each concept c ∈ V a value from [0, 1] that expresses the probability of the concept c being relevant for I
- Depending on the application, only a subset of V may be returned to the user:
  - a fixed number of the most probable concepts
  - concepts with probability higher than a given threshold
  - some advanced selection of interesting concepts
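The two simple selection strategies named on this slide can be sketched in Python as follows. The function names and the example scores are illustrative assumptions and do not come from the presentation.

```python
def top_k(scores, k):
    """Return the k concepts with the highest probability."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [concept for concept, _ in ranked[:k]]


def above_threshold(scores, threshold):
    """Return all concepts whose probability exceeds the threshold."""
    return sorted(c for c, p in scores.items() if p > threshold)


# Hypothetical output of the annotation function f_A for one query image.
scores = {"flower": 0.8, "animal": 0.05, "person": 0.1, "building": 0.05}

print(top_k(scores, 2))
print(above_threshold(scores, 0.5))
```

Which strategy fits depends on the application: a fixed k gives a predictable output size, while a threshold returns nothing when no concept is likely enough.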
6
Slide 6 How can we describe the image?
- Option 1: Classifiers
  - Principles
    - Learning phase: use reliable training data to create classifiers for selected concepts
    - Annotation phase: run classifiers
  - Main advantages
    - mature technologies available (e.g. neural networks)
    - fast
    - high precision and recall
  - Use cases: annotations with fixed vocabulary and reliable training data
    - identification of people
    - classification of cancer cells
    - …
- Option 2: Search-based approach
  - Principles
    - Learning phase: none
    - Annotation phase: similarity search over annotated data + postprocessing
  - Main advantages
    - reduced reliance on cleanly labeled data, utilization of web data
    - no costly learning phase; the annotation phase can be easily adjusted to the user's preferences
    - scalability w.r.t. vocabulary size
  - Use cases: annotations with open/adaptable vocabulary
    - proposing keyword annotations for web image databases, which need to be rich and adapt to the changing vocabulary of users
7
Slide 7 Search-based approach: basic scheme
[Diagram] Query image with vocabulary V = {flower, animal, person, building}
- Content-based image retrieval over the annotated image collection
- Similar annotated images, with keywords such as:
  - Yellow, bloom, pretty
  - Meadow, outdoors, dandelion
  - Mary's garden, summer
- Text processing, using semantic resources
- Candidate keywords with probabilities/scores: Plant 0.3, Flower 0.3, Garden 0.15, Human 0.1, Park 0.1, Animal 0.05
- Selection of the final annotation: flower
8
Slide 8 Search-based approach: challenges
- Selection and preprocessing of the underlying database of annotated images
  - Size vs. quality
- Effective and efficient image search
  - Descriptors, indexing technique
- Image search results processing
  - Baseline: word cloud
  - Advanced: semantic analysis, annotation with hierarchic structure
- Selection of output
  - (user-?)selected level of the hierarchic structure
9
ConceptRank
10
Slide 10 Idea
[Diagram] Content-based image retrieval returns similar annotated images ("Yellow, bloom, pretty"; "Meadow, outdoors, dandelion"; "Mary's garden, summer"); how to get from these keywords to V = {flower, animal, person, building}?
- Baseline word cloud solution: ???
- What would a person do?
  - Search for semantic connections between candidate keywords
    - Flowers bloom; dandelion is a flower; there are usually flowers in a garden; …
  - Based on the connections, estimate probabilities of vocabulary terms
    - "Flower" is rather likely
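A minimal Python sketch of the baseline word cloud shows why it fails on this example. It counts keyword frequencies over the similar images and keeps only keywords that literally occur in the vocabulary. The keyword lists are simplified from the slide ("Mary's garden" is reduced to "garden"), and the function name is an illustrative assumption.

```python
from collections import Counter


def keyword_frequencies(neighbor_keywords):
    """Normalized keyword frequencies over all similar images."""
    counts = Counter(kw.lower() for kws in neighbor_keywords for kw in kws)
    total = sum(counts.values())
    return {kw: n / total for kw, n in counts.items()}


# Keywords of the similar images, simplified from the slide.
neighbors = [
    ["yellow", "bloom", "pretty"],
    ["meadow", "outdoors", "dandelion"],
    ["garden", "summer"],
]
vocabulary = {"flower", "animal", "person", "building"}

freqs = keyword_frequencies(neighbors)
matches = {kw: p for kw, p in freqs.items() if kw in vocabulary}
print(matches)  # no candidate keyword occurs in the vocabulary
```

None of the retrieved keywords is a vocabulary term, even though several of them point to "flower". This gap is what the semantic connections are meant to close.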
11
Slide 11 Idea (cont.)
- What can the computer do?
  - Search for semantic connections between candidate keywords? Yes!
    - Ontologies, WordNet, image dataset statistics, web, …
  - Based on the connections, estimate probabilities of vocabulary terms? Yes!
    - Based on the connections, add new candidates and/or adjust the score of existing candidates
- So, let's try it! Tasks:
  - find a suitable source of semantic information
  - propose an algorithm that uses the selected resource to discover semantic connections between candidate concepts and recomputes their scores
- We want a generic and theoretically sound solution: ConceptRank
12
Slide 12 ConceptRank overview
- Let us assume we have some semantic resource S that contains
  - Semantic objects
  - Relationships between semantic objects
  - Mapping from English words to semantic objects
- For ConceptRank, we need to
  - Transform the input keywords into semantic objects from S
    - Let's call the result "initial candidate objects"
  - Retrieve relationships between candidate objects and, if suitable, add new candidate objects
    - We need a suitable representation for this: semantic networks
  - Compute the probability of candidate objects
    - The actual ConceptRank algorithm
13
Slide 13 Semantic network for annotations
- Graph representation of semantic relationships
  - Nodes: candidate objects
    - Node probability: current probability of the respective candidate concept
  - Edges: relationships between candidate objects
    - Edge weight: "relevance transfer" capacity
    - the weight of the edge from A to B expresses the share of its probability that node A contributes to node B
[Figure: example semantic network over the nodes dog, cat, animal, mouse, computer and keyboard, annotated with node probabilities (e.g. 0.2, 0.1) and edge weights (e.g. 1, 0.5, 0.33)]
14
Slide 14 Building the semantic network
Input: initObjectsWithProb – set of initial objects with probabilities, S – semantic resource, rels – set of interesting relationships
Output: semanticNet – the semantic network
begin
  queue <- initObjectsWithProb.getObjects();
  for (o : queue) do
    semanticNet.addNode(o);
    queue.remove(o);
    for (r : rels) do
      for (o2 : S.getConnectedObjects(o,r)) do
        if (semanticNet.contains(o2)) then
          semanticNet.addEdge(o,o2,r,computeWeight(r,…));
        else if (r.isExpandingRel) then
          queue.add(o2);
          semanticNet.addNode(o2);
          semanticNet.addEdge(o,o2,r,computeWeight(r,…));
        fi
      done
    done
  done
end
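The construction can be sketched in Python on a toy semantic resource. Everything named here (the dictionary layout, `TOY_RESOURCE`, `build_semantic_network`) is an illustrative assumption. Edge weights are left out because the slide does not spell out `computeWeight`. Unlike a literal reading of the pseudocode, the sketch registers all initial objects as nodes before it starts processing, so that edges between two initial objects are not missed; this is an assumption about the intended behaviour.

```python
from collections import deque

# Toy semantic resource (assumption): relation -> object -> connected objects.
TOY_RESOURCE = {
    "hypernym": {"dog": ["animal"], "cat": ["animal"], "mouse": ["animal"]},
    "hyponym": {"animal": ["dog", "cat", "mouse"]},
}
EXPANDING = {"hypernym"}  # relations that are allowed to add new nodes


def build_semantic_network(initial_objects, resource, relations, expanding):
    """Return (nodes, edges); edges are (source, target, relation) tuples."""
    nodes = list(initial_objects)  # keeps insertion order
    known = set(nodes)
    edges = []
    queue = deque(nodes)
    while queue:
        obj = queue.popleft()
        for rel in relations:
            for other in resource.get(rel, {}).get(obj, []):
                if other not in known and rel in expanding:
                    # Expanding relation: add the new object and process it later.
                    known.add(other)
                    nodes.append(other)
                    queue.append(other)
                if other in known:
                    edges.append((obj, other, rel))
    return nodes, edges


nodes, edges = build_semantic_network(
    ["dog", "cat"], TOY_RESOURCE, ["hypernym", "hyponym"], EXPANDING
)
print(nodes)
print(edges)
```

In this run "animal" is added through the expanding hypernym relation, while "mouse" stays out of the network because hyponyms only connect nodes that already exist.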
15
Slide 15 ConceptRank algorithm
- Task: using the probabilities of the initial concepts (obtained from the previous annotation phases) and the semantic network, compute the probability of each node in the network
- Observations:
  - The nodes in the network mutually influence each other's probability
  - The computation of node probabilities needs to be an iterative process
- Goal: a theoretically sound algorithm that finds a balanced state of the iterative process
- Inspiration: Google PageRank algorithm
[Figure: the example semantic network (dog, cat, animal, mouse, computer, keyboard) shown twice, with the node probabilities before and after the computation]
16
Slide 16 PageRank
- Input: Web pages and links represented in a graph
- Output: Importance score of pages
- Algorithm idea:
  - In its simplest form, PageRank is a solution to the recursive equation "a page is important if important pages link to it."
  - The importance of any node is computed as the probability that this node is reached by a random surfer who starts in an arbitrary node of the network graph and moves for a long time.
- Network graph construction:
  - Pages are represented by nodes, hyperlinks by oriented edges.
  - For each node in the graph, the sum of weights of all outgoing edges is 1.
[Figure: example graph with nodes A, B, C and edge weights 0.5 and 1]
17
Slide 17 PageRank (cont.)
- Some of the math behind it:
  - Since the probability of reaching a node depends solely on the probabilities of the referencing nodes, the random surfer model is a Markov process.
  - For Markov processes, it is known that the distribution of the surfer approaches a limiting distribution, provided two conditions are met:
    - the graph is strongly connected (it is possible to get from any node to any other node)
    - there are no dead ends (nodes that have no outgoing edges)
  - To meet these conditions, the random surfer can perform random restarts: with a probability P_restart, he can restart at any moment in any node
- Computation of scores: eigenvector computation over the matrix representation of the adjusted graph
[Figure: the example graph with nodes A, B, C adjusted for P_restart = 0.3; the original edge weights 0.5 and 1 become 0.35 and 0.7, and the added restart edges carry the weights 0.1 and 0.33]
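The random surfer with uniform restarts can be sketched in a few lines of Python. The three-node graph below is an assumption; its topology is not recoverable from the slide figure, only the weights 0.5 and 1 are. The sketch uses repeated multiplication (power iteration) instead of an explicit eigenvector solver.

```python
def pagerank(transition, restart_prob=0.3, iterations=100):
    """transition[i][j] is the probability of moving from node i to node j.

    Every row is assumed to sum to 1, i.e. there are no dead ends.
    """
    n = len(transition)
    probs = [1.0 / n] * n  # the surfer starts in an arbitrary node
    for _ in range(iterations):
        probs = [
            restart_prob / n  # uniform random restart
            + (1 - restart_prob)
            * sum(probs[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return probs


# Hypothetical graph: A -> B (0.5), A -> C (0.5), B -> C (1), C -> A (1).
transition = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
scores = dict(zip("ABC", pagerank(transition)))
print(scores)
```

With P_restart = 0.3, every followed link carries 0.7 of its original weight and every node receives a restart share of 0.3/3 = 0.1, which matches the adjusted weights listed for the slide figure.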
18
Slide 18 ConceptRank vs. PageRank
- Input:
  - PageRank: web pages and hyperlinks
  - ConceptRank: candidate concepts and semantic links
- Output:
  - PageRank: importance score of pages
  - ConceptRank: importance score of candidate concepts
- Similarities:
  - We have nodes and links that can be used to form a graph/network
  - The network can be modelled as a Markov process
  - The random walk intuition makes sense for both problems
    - Random walk on the internet: simulates a randomly surfing user
    - Random walk over keywords: simulates the user's thinking while looking for relevant concepts
- Differences:
  - For ConceptRank, we want to consider initial probabilities associated with nodes
19
Slide 19 Adaptation of initial probabilities into the model
- Random restarts will not be uniformly random
  - Instead, the probability that the walk restarts in a given node corresponds to the initial probability of that node
- The initial probability is determined by the previous steps of the annotation process
  - For concepts found among the keywords of similar images, the initial probability corresponds to the frequency of the concept
  - For concepts that were added during the semantic network building, the initial probability is 0
[Figure: the example semantic network (dog, cat, animal, mouse, computer, keyboard) with restart edges weighted by the initial node probabilities (0.4, 0.35, 0.2, 0.05, and 0 for added nodes)]
20
Slide 20 ConceptRank algorithm
Input: initObjectsWithProb – initial concepts and their probabilities, semanticNet – the semantic network, rels – selected relationships and their weights, restartProb – probability of random surfer restart
Output: nodeProbs – probabilities of network nodes
begin
  // construct the restart vector and matrix
  restartVector <- constant vector of 0 values;
  for (n : semanticNet.getNodes()) do
    if (initObjectsWithProb.contains(n)) then
      restartVector[semanticNet.indexOf(n)] <- initObjectsWithProb.get(n);
    fi
  done
  restartM <- unityVector*restartVector;
  // construct the transition matrix, normalize, solve dead ends
  transitionM <- new Matrix;
  for (r : rels.getRelationshipTypes()) do
    relM <- constructTypeMatrix(semanticNet.getNodes(),semanticNet.getEdges(r));
    transitionM.add(relM*rels.getWeight(r));
  done
  transitionM.normalize();
  for (i=0; i<transitionM.getColumnDimension(); i++) do
    if (transitionM.getColumn(i).getSum() == 0) then
      transitionM.replaceColumn(i, restartVector);
    fi
  done
  // compute the eigenvector
  completeMatrix <- (1-restartProb)*transitionM + restartProb*restartM;
  nodeProbs <- completeMatrix.getPrincipalEigenvector();
end
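A plain-Python sketch of the same steps follows. It is an illustration under stated assumptions, not the authors' implementation: the function name, the edge tuple layout and the toy network are invented for the example, rows rather than columns hold the outgoing weights, and the principal eigenvector is approximated by a fixed number of multiplications instead of an exact solver.

```python
def concept_rank(nodes, edges, rel_weights, init_probs,
                 restart_prob=0.3, iterations=100):
    """edges are (source, target, relation, weight) tuples."""
    n = len(nodes)
    index = {node: i for i, node in enumerate(nodes)}

    # Restart vector: initial probability for initial concepts, 0 for nodes
    # added during network construction (at least one value must be positive).
    restart = [init_probs.get(node, 0.0) for node in nodes]
    total = sum(restart)
    restart = [p / total for p in restart]

    # Weighted sum of the per-relationship matrices; out[i][j] is the
    # relevance transferred from node i to node j.
    out = [[0.0] * n for _ in range(n)]
    for src, dst, rel, weight in edges:
        out[index[src]][index[dst]] += rel_weights[rel] * weight

    # Normalize outgoing weights; dead ends follow the restart vector.
    for i, row in enumerate(out):
        row_sum = sum(row)
        out[i] = [w / row_sum for w in row] if row_sum > 0 else list(restart)

    # Repeated multiplication approximates the principal eigenvector.
    probs = list(restart)
    for _ in range(iterations):
        probs = [
            restart_prob * restart[j]
            + (1 - restart_prob) * sum(probs[i] * out[i][j] for i in range(n))
            for j in range(n)
        ]
    return dict(zip(nodes, probs))


# Toy network (assumption): "animal" was added as a hypernym, so it starts at 0.
nodes = ["dog", "cat", "animal"]
edges = [
    ("dog", "animal", "hypernym", 1.0),
    ("cat", "animal", "hypernym", 1.0),
    ("animal", "dog", "hyponym", 0.5),
    ("animal", "cat", "hyponym", 0.5),
]
ranks = concept_rank(nodes, edges, {"hypernym": 1.0, "hyponym": 1.0},
                     {"dog": 0.6, "cat": 0.4})
print(ranks)
```

Although "animal" has an initial probability of 0, it ends up with the highest score because both initial concepts transfer relevance to it.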
21
Slide 21 Efficiency issues
- For larger sets of similar images, the number of initial keywords, and subsequently the number of nodes in the network, may get high (1000+)
  - Costly construction of the semantic network
  - Costly computation of the ConceptRank
- Therefore, approximations can be used
  - For semantic network construction: limiting the number of initial nodes
  - For ConceptRank computation: a limited number of multiplications by the transfer matrix instead of the exact mathematical computation of the eigenvector
    - Approximation used by Google, known to work very well
22
Putting theory to use
23
Slide 23 The basic annotation scheme again
[Diagram] Query image with vocabulary V = {flower, animal, person, building}
- Content-based image retrieval over the annotated image collection
- Similar annotated images, with keywords such as:
  - Yellow, bloom, pretty
  - Meadow, outdoors, dandelion
  - Mary's garden, summer
- Text processing, using semantic resources: ConceptRank
- Candidate keywords with probabilities/scores: Plant 0.3, Flower 0.3, Garden 0.15, Human 0.1, Park 0.1, Animal 0.05
- Selection of the final annotation: flower
24
Slide 24 MUFIN Image Annotation Framework
- Modular architecture for image annotation
  - There is an extensible set of modules that implement the same interface
    - They can be arbitrarily combined into an "annotation pipeline"
  - There is an "annotation record" object that is passed from one module to another
    - It carries information about the query and candidate keywords, the current estimate of probabilities, and any other knowledge deemed relevant by individual modules
- Clear structure, easy adaptability
  - Upgrade from MPEG7 to DeCAF descriptors = replacing one module without disturbing the others
- MUFIN Image Annotation application
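The modular idea can be illustrated with a small Python sketch. None of the class or method names below come from MUFIN; they are hypothetical stand-ins that only show how modules with a shared interface pass one annotation record along a pipeline.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AnnotationRecord:
    """Object passed from one module to the next (hypothetical layout)."""
    query_image: str
    neighbor_keywords: list = field(default_factory=list)
    candidates: dict = field(default_factory=dict)  # keyword -> probability


class StubSimilaritySearch:
    """Stands in for content-based retrieval; returns fixed keyword lists."""

    def process(self, record):
        record.neighbor_keywords = [["flamingo", "pink"], ["flamingo", "water"]]
        return record


class FrequencyAnalysis:
    """Turns the neighbours' keywords into normalized frequency estimates."""

    def process(self, record):
        counts = Counter(kw for kws in record.neighbor_keywords for kw in kws)
        total = sum(counts.values())
        record.candidates = {kw: n / total for kw, n in counts.items()}
        return record


def run_pipeline(modules, record):
    """Apply every module in order; each one reads and updates the record."""
    for module in modules:
        record = module.process(record)
    return record


record = run_pipeline(
    [StubSimilaritySearch(), FrequencyAnalysis()],
    AnnotationRecord(query_image="query.jpg"),
)
print(record.candidates)
```

Swapping the descriptor, as in the MPEG7 to DeCAF upgrade mentioned on the slide, would correspond to replacing only the similarity search module in the list.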
25
Slide 25 MUFIN Image Annotation – current version
- Objective: annotation with semantic relationships evaluated by ConceptRank
- Basic decisions:
  - Reference dataset: 20M Profiset
    - 20M high-quality images with rich and systematic annotation
    - 20 keywords per image on average
    - Obtained from a commercial website selling stock images
  - Evaluation of visual similarity: DeCAF descriptors
    - State of the art for image content description
    - Indexing: PPP-codes
  - Source of semantic information: WordNet
    - Lexical database of English
    - Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets)
    - Synsets are interlinked by conceptual-semantic and lexical relations
      - Hypernyms, hyponyms, …
26
Slide 26 WordNet ConceptRank details
- Basic objects for semantic analysis: synsets
- Step 1: Transformation of keywords to synsets
  - For keywords with multiple meanings, there are several synsets (e.g. mouse). How do we decide which synset(s) to pick?
    - There is an additional resource that, for most English words, lists the possible synsets together with a score that corresponds to the frequency of use of the keyword in the meaning described by the given synset
    - We take a fixed number of the most probable synsets for each keyword
  - The previous step may retrieve many synsets, which could lead to costly processing of the semantic network
    - Therefore, only a fixed number of the most probable synsets are used to build the network
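The two cut-offs in Step 1 can be sketched in Python. The sense scores, the synset labels and the way a synset probability is derived (keyword probability times the relative sense score) are assumptions made for the illustration; the slide does not state how the scores are combined.

```python
# Hypothetical sense-frequency resource: keyword -> synset label -> score.
SENSE_SCORES = {
    "mouse": {"mouse.rodent": 8, "mouse.device": 2, "mouse.bruise": 1},
    "keyboard": {"keyboard.device": 5},
}


def keywords_to_synsets(keyword_probs, sense_scores, per_keyword, overall):
    """Keep the best synsets per keyword, then the best synsets overall."""
    candidates = {}
    for keyword, prob in keyword_probs.items():
        senses = sense_scores.get(keyword, {})
        best = sorted(senses.items(), key=lambda s: s[1], reverse=True)
        best = best[:per_keyword]
        total = sum(score for _, score in best)
        for synset, score in best:
            # Assumed combination: split the keyword probability among its
            # retained senses in proportion to their scores.
            candidates[synset] = candidates.get(synset, 0.0) + prob * score / total
    ranked = sorted(candidates.items(), key=lambda s: s[1], reverse=True)
    return dict(ranked[:overall])


synsets = keywords_to_synsets(
    {"mouse": 0.5, "keyboard": 0.5}, SENSE_SCORES, per_keyword=2, overall=2
)
print(synsets)
```

With these numbers the rare third sense of "mouse" is dropped by the per-keyword limit, and the device sense of "mouse" is dropped by the overall limit.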
27
Slide 27 WordNet ConceptRank details (cont.)
- Step 2: Construction of WordNet-based semantic network
  - Which relationships are interesting?
    - For now: hypernyms, hyponyms, holonyms, meronyms
  - Which relationships should be used to extend the network and which should be used only to add edges between existing nodes?
    - Extending mode: bottom-up relationships (hypernyms, maybe holonyms)
  - How shall we compute the weights of semantic network edges for each relationship?
    - Bottom-up relationships: edge weight 1
    - Top-down relationships: edge weight 1/(number of child nodes)
[Figure: the example network (dog, cat, animal, mouse, computer, keyboard) with bottom-up edges of weight 1 and top-down edges of weight 0.33 and 0.5]
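The weighting rule is small enough to write down directly. In the Python sketch below, the grouping of holonyms with the bottom-up and meronyms with the top-down relationships follows the slide's wording, while the function name and the decision to pass the child count as an argument are assumptions (the slide does not say how children are counted).

```python
BOTTOM_UP = {"hypernym", "holonym"}
TOP_DOWN = {"hyponym", "meronym"}


def edge_weight(relation, child_count):
    """Weight of one edge, given the number of children of its source node."""
    if relation in BOTTOM_UP:
        return 1.0
    if relation in TOP_DOWN:
        # A parent splits its relevance evenly among its children.
        return 1.0 / child_count
    raise ValueError(f"unknown relationship type: {relation}")


print(edge_weight("hypernym", 3))  # dog -> animal
print(edge_weight("hyponym", 3))   # animal -> dog, one of three children
```

A parent with three children yields top-down edges of weight 1/3, and one with two children yields 1/2, which agrees with the 0.33 and 0.5 shown in the slide figure.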
28
Slide 28 The complete annotation pipeline
- Similarity search
  - Extraction of the DeCAF descriptor from the query image
  - Retrieval of k visual nearest neighbors
- Semantic analysis
  - Frequency analysis of keywords + normalization
  - Transformation of keywords to synsets
  - Construction of WordNet-based semantic network
  - Computation of ConceptRank
- Selection of the final annotation
  - Mapping synsets with probabilities to vocabulary concepts
29
Slide 29 Overview of annotation parameters
- Similarity search
  - # of similar images
- Transformation of keywords to synsets
  - # of most probable synsets per keyword
  - # of most probable synsets that enter the network construction
- Construction of WordNet-based semantic network
  - types of relationships
    - for extending the network
    - for adding edges
  - weights of edges for individual relationships
- Computation of ConceptRank
  - restart probability
  - weights of individual relationship matrices
30
Slide 30 Annotation query example Input: ? Vocabulary: all English words
31
Slide 31 Example: kNN search and initial synsets
- kNN search: k=5
- Keywords to synsets: at most 3 most probable synsets per keyword
- Merge synsets: 20 synsets with the highest probability
- Keywords of the retrieved images:
  - beak, cotswolds, flamingoes (2), head, janes (2), pink, site, slimbridge (2), trust, water, wetlands, wildfowl
  - beak, cotswolds, flamingoes (2), head, janes (2), pink, preening, site, slimbridge (2), trust, water, wetlands, wildfowl
  - american, birds, darwin, flamingo (2), flap, flapping (2), galapagos, greater (2), islands, markings, phoenicopterus, race, ruber, south, wing, wings (2)
  - aythya, drake, duck, sv, swimming
  - ?
- Initial synsets:

  Synset              Probability
  flamingo            0.185
  greater             0.062
  wildfowl            0.062
  Cotswolds           0.062
  Aythya              0.062
  wetland             0.062
  site                0.058
  head                0.049
  pink                0.047
  water               0.046
  trust               0.037
  wings               0.037
  duck                0.034
  Drake               0.031
  drake               0.031
  swimming            0.031
  Galapagos_Islands   0.031
  beak                0.025
  American            0.023
32
Slide 32 Example: semantic network – hypernyms
33
Slide 33 Example: annotation results
- Top 5 keywords, demonstration settings:
  - Flamingoes (4.15), Duck (2.44), Wildfowl (1.74), Birds (1.48), Wetlands (1.41)
- Top 5 keywords, 70 images, 7 synsets/kw, 100 init. synsets, all relationships:
  - Animal (2.68), Bird (2.42), Travel (2.30), Vertebrates (2.04), Swimming (1.42)
34
Slide 34 Experimental evaluation
- ImageCLEF 2014: Scalable Concept Image Annotation
  - Focus on concept-wise scalability
  - No reasonable training data
  - Provided development queries, GT and evaluation scripts
- Vocabulary: aerial airplane baby beach bicycle bird boat bridge building car cartoon castle cat chair child church cityscape closeup cloud cloudless coast countryside daytime desert diagram dog drink drum elder embroidery fire firework fish flower fog food footwear furniture garden grass guitar harbor hat helicopter highway horse indoor instrument lake lightning logo monument moon motorcycle mountain nighttime overcast painting park person plant portrait protest rain rainbow reflection river road sand sculpture sea shadow sign silhouette smoke snow soil space spectacles sport sun sunrise/sunset table teenager toy traffic train tricycle truck underwater unpaved wagon water
- GT for the example query image: countryside daytime grass horse plant
35
Slide 35 Experimental evaluation (cont.)
- Development data results:

  Method                                           MP-c   MR-c   MF-c   MP-s   MR-s   MF-s   MAP-s
  Random baseline                                   2.79   1.03   1.17   3.15   1.91   2.23    8.78
  DISA baseline – freq. analysis, 1 synset per kw  20.96  34.22  22.87  37.30  43.14  38.07   40.59
  DISA baseline with multiple synsets per kw       31.20  36.76  27.79  44.30  51.00  45.00   50.03
  DISA with hyper-hypo                             30.10  36.57  28.75  48.42  58.22  50.26   58.35
  DISA with hyper-hypo-holo-mero                   30.29  36.63  28.98  49.08  59.11  51.00   59.34

- Processing time: 1500 ms on average for the parameters used in the table
  - 1000 ms for descriptor extraction (can be improved)
  - 300 ms for similarity search
- Competition results: a close 2nd place
36
What next?
37
Slide 37 Summary and Future work
- Already done
  - The ConceptRank algorithm
  - Working annotation system
  - Good results in the ImageCLEF competition
- Near future
  - More evaluations
    - Influence of dataset size and quality, approximation params, …
    - Google ground truth
  - Publish or perish
- More distant future
  - Other resources of semantic relationships
    - Ontologies, Word2Vec
  - Relevance feedback
  - Combined architecture: search-based approach and modern NN classifiers