Scalable Image Annotation with ConceptRank Petra Budíková, Michal Batko, Pavel Zezula.


Outline
- Search-based annotation
  - Motivation
  - Problem formalization
  - Challenges
- ConceptRank
  - Idea
  - Semantic network construction
  - PageRank and ConceptRank
  - Image annotation with ConceptRank
- MUFIN Image Annotation
  - Framework description
  - Current implementation and parameters
  - Examples
- Experimental evaluation
- Future work

What and why?

Motivation
- What is in the image?
- Why do I care?
  - Keyword-based image retrieval
  - Impaired users
  - Data summarization
  - Scientific data classification
  - …
[Figure: example descriptions of one image: "Yellow flower"; "Flower, yellow, dandelion, detail, closeup, nature, plant, beautiful"; "Taraxacum officinale"; "The first dandelion that bloomed this year in front of the White House."; "nature", "dandelion"]

Problem formalization
- The annotation task is defined by a query image I and a vocabulary V of target concepts
- The annotation function f_A assigns to each concept c ∈ V a value from [0, 1] that expresses the probability of the concept c being relevant for I
- Depending on the application, only a subset of V may be returned to the user:
  - a fixed number of the most probable concepts
  - concepts with probability higher than a given threshold
  - some advanced selection of interesting concepts
Example vocabulary: V = {flower, animal, person, building}
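The first two output-selection strategies can be sketched in a few lines. This is only an illustration: the function name `select_annotation`, its parameters, and the example scores are assumptions, not part of the presented system.

```python
from typing import Dict, List, Optional


def select_annotation(
    scores: Dict[str, float],
    top_k: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Return concepts ordered by decreasing probability, optionally filtered.

    `scores` plays the role of f_A: it maps each vocabulary concept to a
    value in [0, 1]. Both filters are optional and can be combined.
    """
    # Sort by probability (descending); break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    if threshold is not None:
        ranked = [(c, p) for c, p in ranked if p > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [c for c, _ in ranked]


# Hypothetical output of f_A for V = {flower, animal, person, building}.
scores = {"flower": 0.8, "animal": 0.05, "person": 0.1, "building": 0.05}
top_two = select_annotation(scores, top_k=2)
above_threshold = select_annotation(scores, threshold=0.5)
```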

How can we describe the image?

Option 1: Classifiers
- Principles
  - Learning phase: use reliable training data to create classifiers for selected concepts
  - Annotation phase: run classifiers
- Main advantages
  - mature technologies available (e.g. neural networks)
  - fast
  - high precision and recall
- Use cases: annotations with fixed vocabulary and reliable training data
  - identification of people
  - classification of cancer cells
  - …

Option 2: Search-based approach
- Principles
  - Learning phase: none
  - Annotation phase: similarity search over annotated data + postprocessing
- Main advantages
  - reducing the reliance on cleanly labeled data, utilization of web data
  - no costly learning phase, annotation phase can be easily adjusted to user's preferences
  - scalability w.r.t. vocabulary size
- Use cases: annotations with open/adaptable vocabulary
  - proposing keyword annotations for web image databases, which need to be rich and adapt to the changing vocabulary of users

Search-based approach: basic scheme
[Diagram: query image with vocabulary V = {flower, animal, person, building} → content-based image retrieval over an annotated image collection → similar annotated images ("Yellow, bloom, pretty"; "Meadow, outdoors, dandelion"; "Mary's garden, summer") → text processing using semantic resources → candidate keywords with probabilities/scores (Plant 0.3, Flower 0.3, Garden 0.15, Animal 0.05, Human 0.1, Park 0.1) → selection of the final annotation: flower]

Search-based approach: challenges
- Selection and preprocessing of the underlying database of annotated images
  - Size vs. quality
- Effective and efficient image search
  - Descriptors, indexing technique
- Image search results processing
  - Baseline: word cloud
  - Advanced: semantic analysis, annotation with hierarchic structure
- Selection of output
  - (user?) selected level of the hierarchic structure
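The word-cloud baseline amounts to counting how often each keyword occurs among the similar images and normalizing the counts. A minimal sketch follows; the function name and the neighbor keyword lists are made up for illustration, and counting each keyword once per image is an assumption.

```python
from collections import Counter
from typing import Dict, List


def keyword_frequencies(neighbor_keywords: List[List[str]]) -> Dict[str, float]:
    """Turn keywords of similar images into normalized frequency scores."""
    counts: Counter = Counter()
    for keywords in neighbor_keywords:
        # Lowercase first, then deduplicate, so a keyword counts once per image.
        counts.update({kw.lower() for kw in keywords})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kw: n / total for kw, n in counts.items()}


# Hypothetical keywords of three visually similar images.
neighbors = [
    ["yellow", "flower", "bloom"],
    ["meadow", "Flower", "dandelion"],
    ["garden", "summer"],
]
freqs = keyword_frequencies(neighbors)
```

The resulting scores sum to 1 and can serve as the initial probabilities that later processing steps refine.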

ConceptRank

Idea
- Baseline word cloud solution
  - ???
- What would a person do?
  - Search for semantic connections between candidate keywords
    - Flowers bloom; dandelion is a flower; there are usually flowers in a garden; …
  - Based on the connections, estimate probabilities of vocabulary terms
    - "Flower" is rather likely
[Diagram: content-based image retrieval returns similar annotated images ("Yellow, bloom, pretty"; "Meadow, outdoors, dandelion"; "Mary's garden, summer"); how to get from these keywords to V = {flower, animal, person, building} is marked with "?"]

Idea (cont.)
- What can the computer do?
  - Search for semantic connections between candidate keywords?
    - Yes! Ontologies, WordNet, image dataset statistics, web, …
  - Based on the connections, estimate probabilities of vocabulary terms?
    - Yes! Based on the connections, add new candidates and/or adjust the score of existing candidates
- So, let's try it!
- Tasks:
  - find a suitable source of semantic information
  - propose an algorithm that uses the selected resource to discover semantic connections between candidate concepts and recomputes their scores
- We want a generic and theoretically sound solution: ConceptRank

ConceptRank overview
- Let us assume we have some semantic resource S that contains
  - Semantic objects
  - Relationships between semantic objects
  - Mapping from English words to semantic objects
- For ConceptRank, we need to
  - Transform the input keywords into semantic objects from S
    - Let's call the result "initial candidate objects"
  - Retrieve relationships between candidate objects and, if suitable, add new candidate objects
    - We need a suitable representation for this: semantic networks
  - Compute the probability of candidate objects
    - The actual ConceptRank algorithm

Semantic network for annotations
- Graph representation of semantic relationships
  - Nodes: candidate objects
    - Node probability: current probability of the respective candidate concept
  - Edges: relationships between candidate objects
    - Edge weight: "relevance transfer" capacity
    - the weight of the edge from A to B expresses the share of A's probability that node A contributes to node B
[Figure: example semantic network with nodes dog, cat, animal, mouse, computer, keyboard, annotated with edge weights and node probabilities]

Building the semantic network
Input: initObjectsWithProb – set of initial objects with probabilities, S – semantic resource, rels – set of interesting relationships
Output: semanticNet – the semantic network
begin
  queue <- initObjectsWithProb.getObjects();
  // process objects one by one; expanding relationships may append new objects to the queue
  while (queue is not empty) do
    o <- queue.removeFirst();
    semanticNet.addNode(o);
    for (r : rels) do
      for (o2 : S.getConnectedObjects(o,r)) do
        if (semanticNet.contains(o2)) then
          // both endpoints are already in the network: only add the edge
          semanticNet.addEdge(o,o2,r,computeWeight(r,…));
        else if (r.isExpandingRel) then
          // expanding relationship: add the newly discovered object as well
          queue.add(o2);
          semanticNet.addNode(o2);
          semanticNet.addEdge(o,o2,r,computeWeight(r,…));
        fi
      done
    done
  done
end
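A runnable sketch of the network construction, under stated assumptions: the semantic resource is replaced by a toy `HYPERNYMS` dictionary, all names (`build_semantic_network`, `connected`, `expanding`) are hypothetical, and every edge gets weight 1. Unlike the slide pseudocode, this version registers all initial objects as nodes up front, so an edge between two initial objects is found regardless of processing order.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str, float]  # (source, target, relationship, weight)

# Toy stand-in for the semantic resource S (hypothetical data).
HYPERNYMS: Dict[str, List[str]] = {"dog": ["animal"], "cat": ["animal"]}


def connected(obj: str, rel: str) -> List[str]:
    """Objects connected to `obj` by relationship `rel` in the toy resource."""
    if rel == "hypernym":
        return HYPERNYMS.get(obj, [])
    if rel == "hyponym":
        return [child for child, parents in HYPERNYMS.items() if obj in parents]
    return []


def build_semantic_network(
    init_objects: Iterable[str],
    rels: List[str],
    expanding: Set[str],
    weight: Callable[[str, str, str], float],
) -> Tuple[Set[str], List[Edge]]:
    init_list = list(init_objects)
    nodes: Set[str] = set(init_list)
    edges: List[Edge] = []
    queue = deque(init_list)
    while queue:
        o = queue.popleft()
        for r in rels:
            for o2 in connected(o, r):
                if o2 not in nodes:
                    if r not in expanding:
                        continue  # non-expanding relationships never add nodes
                    nodes.add(o2)
                    queue.append(o2)
                edges.append((o, o2, r, weight(r, o, o2)))
    return nodes, edges


nodes, edges = build_semantic_network(
    ["dog", "cat"],
    rels=["hypernym", "hyponym"],
    expanding={"hypernym"},
    weight=lambda r, a, b: 1.0,
)
```

Here "animal" is added through the expanding hypernym relationship, and the hyponym edges from "animal" back to "dog" and "cat" are added without introducing new nodes.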

ConceptRank algorithm
- Task: Using the probabilities of initial concepts (which were obtained from previous annotation phases) and the semantic network, compute the probability of each node in the network
- Observations:
  - The nodes in the network mutually influence each other's probability
  - The computation of node probabilities needs to be an iterative process
- Goal: theoretically sound algorithm that finds a balanced state of the iterative process
- Inspiration: Google PageRank algorithm
[Figure: the example semantic network (dog, cat, animal, mouse, computer, keyboard) shown twice with different node probabilities]

PageRank
- Input: Web pages and links represented in a graph
- Output: Importance score of pages
- Algorithm idea: In its simplest form, PageRank is a solution to the recursive equation "a page is important if important pages link to it."
  - The importance of any node is computed as the probability that this node is reached by a random surfer who starts in an arbitrary node of the network graph and moves for a long time.
- Network graph construction:
  - Pages are represented by nodes, hyperlinks by oriented edges.
  - For each node in the graph, the sum of weights of all outgoing edges is 1.
[Figure: example graph with nodes A, B, C and edge weights 0.5 and 1]

PageRank (cont.)
- Some math behind:
  - Since the probability of reaching a node depends solely on the probabilities of referencing nodes, the random surfer model is a Markov process.
  - For Markov processes, it is known that the distribution of the surfer approaches a limiting distribution, provided two conditions are met:
    - the graph is strongly connected (it is possible to get from any node to any other node)
    - there are no dead ends (nodes that have no outgoing edges)
  - To meet these conditions, the random surfer can perform random restarts: with a probability P_restart, he can restart at any moment in any node
- Computation of scores: eigenvector computation over the matrix representation of the adjusted graph
[Figure: the example graph with nodes A, B, C before and after adding random restarts with P_restart = 0.3]
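The random-surfer model with uniform restarts can be sketched with a simple power iteration instead of an explicit eigenvector solver. The three-node graph below and the function name `pagerank` are illustrative assumptions; the graph is not necessarily the one drawn on the slide.

```python
from typing import Dict


def pagerank(
    out_edges: Dict[str, Dict[str, float]],
    restart_prob: float = 0.3,
    iterations: int = 100,
) -> Dict[str, float]:
    """Power iteration for PageRank with uniform random restarts.

    `out_edges[u][v]` is the weight of edge u -> v; the outgoing weights of
    each node must sum to 1, and every target must also be a key of the dict.
    """
    nodes = sorted(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Restart part: the surfer jumps to a uniformly chosen node.
        nxt = {v: restart_prob / n for v in nodes}
        for u in nodes:
            targets = out_edges[u]
            if not targets:
                # Dead end: treat it as a forced uniform restart.
                for v in nodes:
                    nxt[v] += (1 - restart_prob) * rank[u] / n
            else:
                for v, w in targets.items():
                    nxt[v] += (1 - restart_prob) * rank[u] * w
        rank = nxt
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links back to A.
graph = {"A": {"B": 0.5, "C": 0.5}, "B": {"C": 1.0}, "C": {"A": 1.0}}
ranks = pagerank(graph, restart_prob=0.3)
```

Because C receives links from both A and B, it ends up with a higher score than B.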

ConceptRank vs. PageRank
- Input:
  - PageRank: web pages and hyperlinks
  - ConceptRank: candidate concepts and semantic links
- Output:
  - PageRank: importance score of pages
  - ConceptRank: importance score of candidate concepts
- Similarities:
  - We have nodes and links that can be used to form a graph/network
  - The network can be modelled as a Markov process
  - The random walk intuition makes sense for both problems
    - Random walk on the internet: simulates a randomly surfing user
    - Random walk over keywords: simulates the user's thinking while looking for relevant concepts
- Differences:
  - For ConceptRank, we want to consider initial probabilities associated with nodes

Adaptation of initial probabilities into the model
- Random restarts will not be uniformly random
  - Instead, the probability that the walk will restart in a given node will correspond to the initial probability of that node
- The initial probability is determined by previous steps of the annotation process
  - For concepts found among the keywords of similar images, the initial probability corresponds to the frequency of the concept
  - For concepts that were added during the semantic network building, the initial probability is 0
[Figure: the example semantic network (dog, cat, animal, mouse, computer, keyboard) with restart probabilities 0.4, 0.35, 0.2, 0.05 and 0 attached to its nodes]

ConceptRank algorithm
Input: initObjectsWithProb – initial concepts and their probabilities, semanticNet – the semantic network, rels – selected relationships and their weights, restartProb – probability of random surfer restart
Output: nodeProbs – probabilities of network nodes
begin
  // construct the restart vector and matrix
  restartVector <- constant vector of 0 values;
  for (n : semanticNet.getNodes()) do
    if (initObjectsWithProb.contains(n)) then
      restartVector[semanticNet.indexOf(n)] <- initObjectsWithProb.get(n);
    fi
  done
  // every column of restartM equals restartVector
  restartM <- restartVector * transpose(unityVector);

  // construct the transition matrix, normalize, solve dead ends
  transitionM <- new Matrix;
  for (r : rels.getRelationshipTypes()) do
    relM <- constructTypeMatrix(semanticNet.getNodes(), semanticNet.getEdges(r));
    transitionM.add(relM*rels.getWeight(r));
  done
  // column-wise normalization: each non-zero column sums to 1
  transitionM.normalize();
  for (i=0; i<transitionM.getColumnDimension(); i++) do
    // a zero column is a dead end: the walk restarts from there
    if (transitionM.getColumn(i).getSum() == 0) then
      transitionM.replaceColumn(i, restartVector);
    fi
  done

  // compute the eigenvector
  completeMatrix <- (1-restartProb)*transitionM + restartProb*restartM;
  nodeProbs <- completeMatrix.getPrincipalEigenvector();
end
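A compact Python sketch of the same computation, using power iteration rather than an explicit eigenvector solver. It simplifies the pseudocode in one respect: edges are passed in with their relationship weights already applied. The function name `concept_rank`, the toy edges, and the initial probabilities are illustrative assumptions, not values taken from the slides.

```python
from typing import Dict, List, Tuple


def concept_rank(
    nodes: List[str],
    edges: List[Tuple[str, str, float]],
    init_probs: Dict[str, float],
    restart_prob: float = 0.3,
    iterations: int = 50,
) -> Dict[str, float]:
    """Random walk whose restarts follow the initial concept probabilities."""
    total = sum(init_probs.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one node needs a positive initial probability")
    # Restart distribution: normalized initial probabilities (0 for added nodes).
    restart = {n: init_probs.get(n, 0.0) / total for n in nodes}

    # Outgoing weights, normalized so that each node distributes exactly 1.
    out: Dict[str, Dict[str, float]] = {n: {} for n in nodes}
    for src, dst, w in edges:
        out[src][dst] = out[src].get(dst, 0.0) + w
    for src in nodes:
        s = sum(out[src].values())
        if s > 0:
            out[src] = {dst: w / s for dst, w in out[src].items()}

    probs = dict(restart)
    for _ in range(iterations):
        nxt = {n: restart_prob * restart[n] for n in nodes}
        for u in nodes:
            if not out[u]:
                # Dead end: the walk restarts according to the restart vector.
                for v in nodes:
                    nxt[v] += (1 - restart_prob) * probs[u] * restart[v]
            else:
                for v, w in out[u].items():
                    nxt[v] += (1 - restart_prob) * probs[u] * w
        probs = nxt
    return probs


# Hypothetical toy network, loosely modelled on the dog/cat/mouse example.
nodes = ["dog", "cat", "animal", "mouse", "computer", "keyboard"]
edges = [
    ("dog", "animal", 1.0), ("cat", "animal", 1.0),
    ("mouse", "animal", 0.5), ("mouse", "computer", 0.5),
    ("keyboard", "computer", 1.0),
    ("animal", "dog", 1 / 3), ("animal", "cat", 1 / 3), ("animal", "mouse", 1 / 3),
    ("computer", "mouse", 0.5), ("computer", "keyboard", 0.5),
]
init = {"dog": 0.4, "cat": 0.35, "mouse": 0.2, "keyboard": 0.05}
probs = concept_rank(nodes, edges, init, restart_prob=0.3)
```

In this toy run "animal" starts with probability 0 but collects relevance from "dog", "cat" and "mouse", so it ends up ranked above "computer".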

Efficiency issues
- For larger sets of similar images, the number of initial keywords and subsequently the number of nodes in the network may get high (1000+)
  - Costly construction of the semantic network
  - Costly computation of the ConceptRank
- Therefore, approximations can be used
  - For semantic network construction: limiting the number of initial nodes
  - For ConceptRank computation: a limited number of multiplications by the transfer matrix instead of the exact mathematical computation of the eigenvector
    - Approximation used by Google, known to work very well

Putting theory to use

The basic annotation scheme again
[Diagram: query image with vocabulary V = {flower, animal, person, building} → content-based image retrieval over an annotated image collection → similar annotated images ("Yellow, bloom, pretty"; "Meadow, outdoors, dandelion"; "Mary's garden, summer") → text processing using semantic resources → candidate keywords with probabilities/scores (Plant 0.3, Flower 0.3, Garden 0.15, Animal 0.05, Human 0.1, Park 0.1) → selection of the final annotation: flower; the label "ConceptRank" is added to the diagram]

MUFIN Image Annotation Framework
- Modular architecture for image annotation
  - There is an extensible set of modules that implement the same interface
    - Can be arbitrarily combined into an "annotation pipeline"
  - There is an "annotation record" object that is passed from one module to another
    - Carries information about query and candidate keywords, current estimate of probabilities, and any other knowledge deemed relevant by individual modules
- Clear structure, easy adaptability
  - Upgrade from MPEG7 to DeCAF descriptors = replacing one module without disturbing others
- MUFIN Image Annotation application
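The module-and-record design can be illustrated with a short sketch. This is not the MUFIN API: the class `AnnotationRecord`, the `Module` type, and both toy modules are hypothetical names chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class AnnotationRecord:
    """State passed from one module to the next."""
    query_image: str
    candidates: Dict[str, float] = field(default_factory=dict)
    extras: Dict[str, Any] = field(default_factory=dict)


# Every module implements the same interface: record in, record out.
Module = Callable[[AnnotationRecord], AnnotationRecord]


def run_pipeline(record: AnnotationRecord, modules: List[Module]) -> AnnotationRecord:
    for module in modules:
        record = module(record)
    return record


def fake_similarity_search(record: AnnotationRecord) -> AnnotationRecord:
    # Stand-in for content-based retrieval: returns hard-coded keywords.
    record.extras["neighbor_keywords"] = [["flamingo", "pink"], ["flamingo", "water"]]
    return record


def frequency_analysis(record: AnnotationRecord) -> AnnotationRecord:
    counts = Counter(kw for kws in record.extras["neighbor_keywords"] for kw in kws)
    total = sum(counts.values())
    record.candidates = {kw: n / total for kw, n in counts.items()}
    return record


result = run_pipeline(
    AnnotationRecord("query.jpg"),
    [fake_similarity_search, frequency_analysis],
)
```

Swapping the descriptor or the semantic analysis then means replacing one entry in the module list while the others stay untouched.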

MUFIN Image Annotation: current version
- Objective:
  - Annotation with semantic relationships evaluated by ConceptRank
- Basic decisions:
  - Reference dataset: 20M Profiset
    - 20M high-quality images with rich and systematic annotation
    - 20 keywords per image on average
    - Obtained from a commercial web-site selling stock images
  - Evaluation of visual similarity: DeCAF descriptors
    - State-of-the-art for image content description
  - Indexing: PPP-codes
  - Source of semantic information: WordNet
    - Lexical database of English
    - Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets)
    - Synsets are interlinked by conceptual-semantic and lexical relations
      - Hypernyms, hyponyms, …

WordNet ConceptRank details
- Basic objects for semantic analysis: synsets
- Step 1: Transformation of keywords to synsets
  - Keywords with multiple meanings (e.g. mouse) map to several synsets. How do we decide which synset(s) to pick?
    - There is an additional resource that, for most English words, lists the possible synsets together with a score that corresponds to the frequency of use of the keyword in the meaning described by the given synset
    - We take a fixed number of the most probable synsets for each keyword
  - There may be many synsets retrieved by the previous step, which could lead to costly processing of the semantic network
    - Therefore, only a fixed number of the most probable synsets are used to build the network
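Step 1 can be sketched as follows. The sense-frequency table and synset identifiers are made up, and splitting a keyword's probability among its senses in proportion to the sense scores is an assumption of this sketch; the slides do not specify how the two values are combined.

```python
from typing import Dict


def keywords_to_synsets(
    keyword_probs: Dict[str, float],
    sense_scores: Dict[str, Dict[str, float]],
    per_keyword: int = 3,
    max_total: int = 20,
) -> Dict[str, float]:
    """Map keywords to their most probable synsets and keep the best overall."""
    synset_probs: Dict[str, float] = {}
    for keyword, prob in keyword_probs.items():
        # Most frequent senses first; ties broken by synset name.
        senses = sorted(
            sense_scores.get(keyword, {}).items(),
            key=lambda kv: (-kv[1], kv[0]),
        )[:per_keyword]
        total = sum(score for _, score in senses)
        if total <= 0:
            continue  # keyword unknown to the sense resource
        for synset, score in senses:
            synset_probs[synset] = synset_probs.get(synset, 0.0) + prob * score / total
    # Only the overall most probable synsets enter the network construction.
    best = sorted(synset_probs.items(), key=lambda kv: (-kv[1], kv[0]))[:max_total]
    return dict(best)


# Hypothetical inputs: keyword frequencies and per-sense usage scores.
keyword_probs = {"mouse": 0.5, "dog": 0.5}
sense_scores = {
    "mouse": {"mouse.animal": 3.0, "mouse.device": 1.0},
    "dog": {"dog.animal": 1.0},
}
initial_synsets = keywords_to_synsets(keyword_probs, sense_scores, max_total=2)
```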

WordNet ConceptRank details (cont.)
- Step 2: Construction of WordNet-based semantic network
  - Which relationships are interesting?
    - For now: hypernyms, hyponyms, holonyms, meronyms
  - Which relationships should be used to extend the network and which should be used only to add edges between existing nodes?
    - Extending mode: bottom-up relationships (hypernyms, maybe holonyms)
  - How shall we compute the weights of semantic network edges for each relationship?
    - Bottom-up relationships: edge weight 1
    - Top-down relationships: edge weight 1/(number of child nodes)
[Figure: the example semantic network (dog, cat, animal, mouse, computer, keyboard) with edge weights 1 on bottom-up edges and 0.33 or 0.5 on top-down edges]
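The edge-weight rule fits in a few lines. The grouping of relationships into `BOTTOM_UP` and `TOP_DOWN`, the function name, and the toy child lists are assumptions made for this sketch.

```python
from typing import Dict, List

BOTTOM_UP = {"hypernym", "holonym"}
TOP_DOWN = {"hyponym", "meronym"}


def edge_weight(relation: str, source: str, children: Dict[str, List[str]]) -> float:
    """Weight of an edge leaving `source` along `relation`.

    `children[source]` lists the nodes reachable from `source` by the given
    top-down relationship within the current network.
    """
    if relation in BOTTOM_UP:
        return 1.0
    if relation in TOP_DOWN:
        n = len(children.get(source, []))
        # Split the transferred relevance evenly among the child nodes.
        return 1.0 / n if n else 0.0
    raise ValueError(f"unknown relationship: {relation}")


# Hypothetical network fragment: "animal" has three hyponyms.
children = {"animal": ["dog", "cat", "mouse"]}
w_up = edge_weight("hypernym", "dog", children)
w_down = edge_weight("hyponym", "animal", children)
```

With three children, each top-down edge from "animal" gets weight 1/3, which matches the 0.33 shown in the example figure.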

The complete annotation pipeline
- Similarity search
  - Extraction of the DeCAF descriptor from the query image
  - Retrieval of k visual nearest neighbors
- Semantic analysis
  - Frequency analysis of keywords + normalization
  - Transformation of keywords to synsets
  - Construction of WordNet-based semantic network
  - Computation of ConceptRank
- Selection of the final annotation
  - Mapping synsets with probabilities to vocabulary concepts

Overview of annotation parameters
- Similarity search
  - # of similar images
- Transformation of keywords to synsets
  - # of most probable synsets per keyword
  - # of most probable synsets that enter the network construction
- Construction of WordNet-based semantic network
  - types of relationships
    - for extending network
    - for adding edges
  - weights of edges for individual relationships
- Computation of ConceptRank
  - restart probability
  - weights of individual relationship matrices

Annotation query example
- Input: [query image]
- Vocabulary: all English words

Example: kNN search and initial synsets
- kNN search: k=5
- Keywords to synsets: at most 3 most probable synsets per keyword
- Merge synsets: 20 synsets with the highest probability

Keywords of the retrieved images:
- beak, cotswolds, flamingoes (2), head, janes (2), pink, site, slimbridge (2), trust, water, wetlands, wildfowl
- beak, cotswolds, flamingoes (2), head, janes (2), pink, preening, site, slimbridge (2), trust, water, wetlands, wildfowl
- american, birds, darwin, flamingo (2), flap, flapping (2), galapagos, greater (2), islands, markings, phoenicopterus, race, ruber, south, wing, wings (2)
- aythya, drake, duck, sv, swimming

Initial synsets:
Synset            | Probability
flamingo          | 0.185
greater           | 0.062
wildfowl          | 0.062
Cotswolds         | 0.062
Aythya            | 0.062
wetland           | 0.062
site              | 0.058
head              | 0.049
pink              | 0.047
water             | 0.046
trust             | 0.037
wings             | 0.037
duck              | 0.034
Drake             | 0.031
drake             | 0.031
swimming          | 0.031
Galapagos_Islands | 0.031
beak              | 0.025
American          | 0.023

Example: semantic network – hypernyms

Example: annotation results
- Top 5 keywords, demonstration settings: Flamingoes (4.15), Duck (2.44), Wildfowl (1.74), Birds (1.48), Wetlands (1.41)
- Top 5 keywords, 70 images, 7 synsets/kw, 100 init. synsets, all relationships: Animal (2.68), Bird (2.42), Travel (2.30), Vertebrates (2.04), Swimming (1.42)

Experimental evaluation
- ImageCLEF 2014: Scalable Concept Image Annotation
  - Focus on concept-wise scalability
  - No reasonable training data
  - Provided development queries, ground truth (GT) and evaluation scripts
- Vocabulary: aerial airplane baby beach bicycle bird boat bridge building car cartoon castle cat chair child church cityscape closeup cloud cloudless coast countryside daytime desert diagram dog drink drum elder embroidery fire firework fish flower fog food footwear furniture garden grass guitar harbor hat helicopter highway horse indoor instrument lake lightning logo monument moon motorcycle mountain nighttime overcast painting park person plant portrait protest rain rainbow reflection river road sand sculpture sea shadow sign silhouette smoke snow soil space spectacles sport sun sunrise/sunset table teenager toy traffic train tricycle truck underwater unpaved wagon water
- Example GT for one query image: countryside daytime grass horse plant

Experimental evaluation (cont.)
- Development data results

Method                                           | MP-c  | MR-c  | MF-c  | MP-s  | MR-s  | MF-s  | MAP-s
Random baseline                                  |  2.79 |  1.03 |  1.17 |  3.15 |  1.91 |  2.23 |  8.78
DISA baseline (freq. analysis, 1 synset per kw)  | 20.96 | 34.22 | 22.87 | 37.30 | 43.14 | 38.07 | 40.59
DISA baseline with multiple synsets per kw       | 31.20 | 36.76 | 27.79 | 44.30 | 51.00 | 45.00 | 50.03
DISA with hyper-hypo                             | 30.10 | 36.57 | 28.75 | 48.42 | 58.22 | 50.26 | 58.35
DISA with hyper-hypo-holo-mero                   | 30.29 | 36.63 | 28.98 | 49.08 | 59.11 | 51.00 | 59.34

- Processing time: 1500 ms on average for parameters used in the table
  - 1000 ms for descriptor extraction (can be improved)
  - 300 ms for similarity search
- Competition results: a close 2nd place

What next?

Summary and Future work
- Already done
  - The ConceptRank algorithm
  - Working annotation system
  - Good results in the ImageCLEF competition
- Near future
  - More evaluations
    - Influence of dataset size and quality, approximation params, …
    - Google ground truth
  - Publish or perish
- More distant future
  - Other resources of semantic relationships
    - Ontologies, Word2Vec
  - Relevance feedback
  - Combined architecture: search-based approach and modern NN classifiers
