Semantics of words and images Presented by Gal Zehavi & Ilan Gendelman.

Similar presentations
Clustering Art & Learning the Semantics of Words and Pictures Manigantan Sethuraman.

LEARNING SEMANTICS OF WORDS AND PICTURES TEJASWI DEVARAPALLI.
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki
Presented by, Biswaranjan Panda and Moutupsi Paul Beyond Nouns -Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers Ref.
PARTITIONAL CLUSTERING
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Inference Network Approach to Image Retrieval Don Metzler R. Manmatha Center for Intelligent Information Retrieval University of Massachusetts, Amherst.
Image Retrieval Basics Uichin Lee KAIST KSE Slides based on “Relevance Models for Automatic Image and Video Annotation & Retrieval” by R. Manmatha (UMASS)
Segmentation and Fitting Using Probabilistic Methods
1 Content-Based Retrieval (CBR) -in multimedia systems Presented by: Chao Cai Date: March 28, 2006 C SC 561.
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Image Search Presented by: Samantha Mahindrakar Diti Gandhi.
Automatic Image Annotation and Retrieval using Cross-Media Relevance Models J. Jeon, V. Lavrenko and R. Manmathat Computer Science Department University.
Expectation Maximization Method Effective Image Retrieval Based on Hidden Concept Discovery in Image Database By Sanket Korgaonkar Masters Computer Science.
Prénom Nom Document Analysis: Data Analysis and Clustering Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
Visual Querying By Color Perceptive Regions Alberto del Bimbo, M. Mugnaini, P. Pala, and F. Turco University of Florence, Italy Pattern Recognition, 1998.
WORD-PREDICTION AS A TOOL TO EVALUATE LOW-LEVEL VISION PROCESSES Prasad Gabbur, Kobus Barnard University of Arizona.
Video Google: Text Retrieval Approach to Object Matching in Videos Authors: Josef Sivic and Andrew Zisserman University of Oxford ICCV 2003.
Switch to Top-down Top-down or move-to-nearest Partition documents into ‘k’ clusters Two variants “Hard” (0/1) assignment of documents to clusters “soft”
Presented by Zeehasham Rasheed
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Object Class Recognition using Images of Abstract Regions Yi Li, Jeff A. Bilmes, and Linda G. Shapiro Department of Computer Science and Engineering Department.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Probabilistic Latent Semantic Analysis
Multiple Object Class Detection with a Generative Model K. Mikolajczyk, B. Leibe and B. Schiele Carolina Galleguillos.
Review Rong Jin. Comparison of Different Classification Models  The goal of all classifiers Predicating class label y for an input x Estimate p(y|x)
김덕주 (Duck Ju Kim). Problems What is the objective of content-based video analysis? Why supervised identification has limitation? Why should use integrated.
Object Recognition as Machine Translation Matching Words and Pictures Heather Dunlop : Advanced Perception April 17, 2006.
SIEVE—Search Images Effectively through Visual Elimination Ying Liu, Dengsheng Zhang and Guojun Lu Gippsland School of Info Tech,
Image Annotation and Feature Extraction
Exploiting Ontologies for Automatic Image Annotation M. Srikanth, J. Varner, M. Bowden, D. Moldovan Language Computer Corporation
CS654: Digital Image Analysis Lecture 3: Data Structure for Image Analysis.
1 CS 430 / INFO 430 Information Retrieval Lecture 8 Query Refinement and Relevance Feedback.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Querying Structured Text in an XML Database By Xuemei Luo.
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
Presenter: Shanshan Lu 03/04/2010
Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.
IEEE Int'l Symposium on Signal Processing and its Applications 1 An Unsupervised Learning Approach to Content-Based Image Retrieval Yixin Chen & James.
A Model for Learning the Semantics of Pictures V. Lavrenko, R. Manmatha, J. Jeon Center for Intelligent Information Retrieval Computer Science Department,
Object Recognition a Machine Translation Learning a Lexicon for a Fixed Image Vocabulary Miriam Miklofsky.
Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.
Chapter 23: Probabilistic Language Models April 13, 2004.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Externally growing self-organizing maps and its application to database visualization and exploration.
Gang WangDerek HoiemDavid Forsyth. INTRODUCTION APROACH (implement detail) EXPERIMENTS CONCLUSION.
Lecture 6 Spring 2010 Dr. Jianjun Hu CSCE883 Machine Learning.
Object Recognition Part 2 Authors: Kobus Barnard, Pinar Duygulu, Nado de Freitas, and David Forsyth Slides by Rong Zhang CSE 595 – Words and Pictures Presentation.
Information Retrieval and Organisation Chapter 16 Flat Clustering Dell Zhang Birkbeck, University of London.
Image Classification for Automatic Annotation
Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning Rong Jin, Joyce Y. Chai Michigan State University Luo Si Carnegie.
Exploiting Ontologies for Automatic Image Annotation Munirathnam Srikanth, Joshua Varner, Mitchell Bowden, Dan Moldovan Language Computer Corporation SIGIR.
MindReader: Querying databases through multiple examples Yoshiharu Ishikawa (Nara Institute of Science and Technology, Japan) Ravishankar Subramanya (Pittsburgh.
Yixin Chen and James Z. Wang The Pennsylvania State University
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
A Patent Document Retrieval System Addressing Both Semantic and Syntactic Properties Liang Chen*,Naoyuki Tokuda+, Hisahiro Adachi+ *University of Northern.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Nearest Neighbour and Clustering. Nearest Neighbour and clustering Clustering and nearest neighbour prediction technique was one of the oldest techniques.
Unsupervised Learning Part 2. Topics How to determine the K in K-means? Hierarchical clustering Soft clustering with Gaussian mixture models Expectation-Maximization.
Automated Information Retrieval
Image Retrieval and Annotation via a Stochastic Modeling Approach
Content-Based Image Retrieval
Content-Based Image Retrieval Readings: Chapter 8:
Project Implementation for ITCS4122
Matching Words with Pictures
MIS2502: Data Analytics Clustering and Segmentation
Presentation transcript:

Semantics of words and images Presented by Gal Zehavi & Ilan Gendelman

What is semantics?  Semantics – “a branch of philosophy dealing with the relations between signs and what they refer to (their meaning)” (Webster)

Content 1) Motivation

Motivation Motivation  Object recognition  Better segment aggregation  Image reconstruction [Figure: example image with regions labelled waterfall, bear, fish]

Motivation More motivation…  Data mining: Content-Based Image Retrieval (CBIR): higher precision / higher recall / quicker search Applications  Biomedicine (X-ray, pathology, CT, MRI, …)  Government (radar, aerial, trademark, …)  Commercial (fashion catalogs, journalism, …)  Cultural (museums, art galleries, …)  Education and training  Entertainment, WWW (100 billion images?!), …

Motivation Even more motivation…  Auto-illustrator (e.g. given the text of Moby Dick, pick matching images)  Auto-annotator (e.g. an image annotated: ocean, helicopter, shark, man, bridge, mountain)

Content 1) Motivation 2) Introduction to Semantics

Introduction to Semantics Different aspects of semantics…

Introduction to Semantics Specific Objects  The elephant played chess

Introduction to Semantics Family of Objects  The tree stood alone on the hill.  This car is as fast as the wind.

Introduction to Semantics Scenarios  A couple on the beach: easy to imagine (statistically clear), but the variety is large…

Introduction to Semantics Semantics from Context The captain was on the bridge.

Introduction to Semantics Abstract Semantics  Vacation  Strength  Experience ?

Content 1) Motivation 2) Introduction to Semantics 3) Difficulties

Difficulties Difficulties in Text-Image Semantics  Ambiguity (is it corals? a couple, beach, sunset?)  Level of abstraction: the same photo may be annotated as men, bottle, liquid / drivers, car race, champagne / competitors, race, alcohol / M. Schumacher, Formula 1, Dom-Pérignon / celebration, winning, happiness

Content 1) Motivation 2) Introduction to Semantics 3) Difficulties 4) Possible Approaches

Possible approaches Possible Approaches  Search by segment features / image feature similarity: + low complexity; - missing real image semantics  Semantics through user interaction: + refined visualisation; + higher level of abstraction; - a complex user interface is required  Query by example: + relating to the user's visualisation; - missing real image semantics  Searching images by text only: + uses existing text-search infrastructure; + no image-specific processing; - images must appear in a textual context; - missing real image semantics and features

Content 1) Motivation 2) Introduction to Semantics 3) Difficulties 4) Possible Approaches 5) Models

Models Models  Object recognition as machine translation  "Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary", P. Duygulu, K. Barnard, J. F. G. de Freitas and D. Forsyth (2002)  Learning semantics by hierarchical clustering of words and image regions  "Learning the Semantics of Words and Pictures", K. Barnard, D. Forsyth (2001)  "Clustering Art", K. Barnard, P. Duygulu, D. Forsyth (2001)

Content 1) Motivation 2) Introduction to Semantics 3) Difficulties 4) Possible Approaches 5) Models  Model #1: Object recognition as machine translation

Model #1 Object recognition as machine translation  Description: learning a lexicon for a fixed image vocabulary

Model #1 Our goal is to describe a scene using the lexicon we learned. [Figure: scene with regions labelled sand, sky, sea, mountain/forest, rock]

Model #1 How do we do it?  By applying a method similar to a language translator: translating words by using the correspondence across many sentence pairs, and building statistics from it.

Model #1 The blob notion  First we segment the image into regions using the normalized-cuts method.

Model #1 Assigning Regions to Blobs  How do we assign a region to a blob? 1) Eliminate regions smaller than a threshold 2) Define a set of features 3) Discretise each feature distribution 4) Cluster the finite-dimensional feature vectors 5) Cluster = blob

Model #1 In the article's experiment  We discretise the 33 different features using the k-means algorithm  The following features are included: region color, convexity, standard deviation, first moment, region orientation energy, region size, location

Model #1  To apply the method to images, we need to discretise them  Using the Corel database: 371 words and 4,500 images with 5 to 10 segments each  ~35,000 segments clustered into 500 blobs

Model #1 The k-means algorithm  Fitting the best k centers to a continuous histogram  Given a distribution of n-dimensional vectors, we can create clusters

Model #1 K-means  Input: continuous data; output: discrete data [Figure: some data points] (Acknowledgments to Andrew W. Moore, Carnegie Mellon University)

Model #1 K-means  Iterative algorithm (see the sketch below): 1) Choose k (e.g. 5) 2) Randomly guess k center locations 3) Each data point "belongs" to its nearest center 4) Each center finds the centroid of its points 5) The centroid defines the new center 6) Iterate… (Acknowledgments to Andrew W. Moore, Carnegie Mellon University)
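
A minimal NumPy sketch of this loop; the data shapes, names, and the 500-blob usage note are illustrative, not the presenters' code:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means on an (n_points, n_dims) array X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random guess
    for _ in range(iters):
        # Each data point "belongs" to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Each center moves to the centroid of the points assigned to it.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels

# E.g. quantising 33-dimensional region feature vectors into 500 blobs:
# blob_centers, blob_ids = kmeans(region_features, k=500)
```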

Model #1 K-means  An example Acknowledgments to Andrew W. Moore, Carnegie Mellon University

Model #1  Now for every image we have a set of blobs  Match with a set of words  So the translation can begin

Model #1 Notation  The set of words: $w = (w^n_1, w^n_2, \dots, w^n_m)$, where $m$ is the size of the word string  The set of blobs: $b = (b^n_1, b^n_2, \dots, b^n_l)$, where $l$ is the size of the blob string  The alignment: $a^n = (a^n_1, a^n_2, \dots, a^n_m)$, where $n$ indexes the $n$-th image  The event $a^n_j = i$ means that the $j$-th word in the possible translation translates the $i$-th blob

Model #1 More about the alignment space $A(w,b)$  For $b = (b_1, b_2, \dots, b_l)$ and $w = (w_1, w_2, \dots, w_m)$, an alignment $a = (a_1, a_2, \dots, a_m)$ is a sequence taking values in $\{0, \dots, l\}$, representing a discrete function from word positions to blob indices (0 standing for the NULL blob)
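
A tiny illustration of how fast this space grows: an alignment assigns one of the $l+1$ blob indices (including 0) to each of the $m$ word positions, so $|A(w,b)| = (l+1)^m$. The sizes below are toy values:

```python
from itertools import product

m, l = 3, 4  # toy sizes: 3 words, 4 blobs
alignments = list(product(range(l + 1), repeat=m))  # all maps {1..m} -> {0..l}
assert len(alignments) == (l + 1) ** m              # 125 alignments already
print(alignments[:3])  # (0, 0, 0), (0, 0, 1), (0, 0, 2)
```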

Model #1 The likelihood function  We call the conditional probability $p(w \mid b)$ the likelihood function, since it gives us the probability distribution that a set of words is the translation, given a set of blobs. How do we generate it?

Model #1  Generating a translation word by word for a given set of blobs $b = (b_1, b_2, b_3)$, with lexicon: (book) (chair) (sky) (tree) (sun) (fish) (ship) (ring) (sea) (cloud) …  First pick the string size $m$ with $P(m \mid b)$, then alternate alignment and word choices: $P(a_1 \mid b, m)$, then $P(w_1 \mid a_1, b, m)$ ("sky"); $P(a_2 \mid a_1, w_1, b, m)$, then $P(w_2 \mid a_2, w_1, b, m)$ ("cloud"); $P(a_3 \mid a_2, w_2, b, m)$, then $P(w_3 \mid a_3, w_2, b, m)$ ("sun")  Altogether: $P(w, a \mid b) = P(m \mid b)\, P(a_1 \mid b, m)\, P(w_1 \mid a_1, b, m)\, P(a_2 \mid a_1, w_1, b, m)\, P(w_2 \mid a_2, w_1, b, m)\, P(a_3 \mid a_2, w_2, b, m)\, P(w_3 \mid a_3, w_2, b, m)$

Model #1  Finally, summing over all alignments, we get, without loss of generality: $p(w \mid b) = \sum_{a \in A(w,b)} P(w, a \mid b)$  The problem with this formulation is the enormous number of system parameters

Model #1 A simpler model  Assumptions: 1) Disregard the context of the blobs and the words 2) Assume that the alignment is affected only by the position of the translating word 3) Assume that translated strings can have any length

Model #1  Under the simpler model, with the same lexicon and given blob set, the factors collapse: the string size becomes $P(m \mid b) = \text{const}$; the word term becomes a translation table depending only on the aligned blob, $P(w_3 \mid a_3, w_2, b, m) = t(w_3 \mid b_{a_3})$ (e.g. emitting "sun"); and the alignment term depends only on the position, $P(a_3 \mid a_2, w_2, b, m) = P(a_3 \mid 3, b, m)$
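
Putting the three assumptions together (a sketch in the style of IBM Model 1, assuming the constant length term and a uniform alignment term are absorbed into the proportionality), the likelihood factorizes as

$$p(w \mid b) \;\propto\; \prod_{j=1}^{m} \sum_{i=0}^{l} t(w_j \mid b_i),$$

leaving the translation-table entries $t(w \mid b)$ as the only parameters to learn.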

Model #1  Our mathematical goal is to build a probability table that gives, for a given blob, the distribution over all possible word translations [Table: example entries for blobs (e.g. a circle-like blob): sun 0.9, ball 0.95, man 0.01, earth 0.89]

Model #1

 E-step: define the expectation of the complete-data log-likelihood, and compute it. For this model the expectation only needs the alignment posteriors, which (in the standard derivation) are $p(a_j = i \mid w, b) = t(w_j \mid b_i) \big/ \sum_{i'=0}^{l} t(w_j \mid b_{i'})$

Model #1  M-step: maximizing the expectation we computed. Taking into consideration the constraints $\sum_w t(w \mid b) = 1$ for every blob $b$, we can use Lagrange multipliers to maximize the likelihood function. The obtained Lagrangian is $\mathcal{L} = E[\log\text{-likelihood}] + \sum_b \lambda_b \big(1 - \sum_w t(w \mid b)\big)$, and the equations to be solved for maximization are $\partial \mathcal{L} / \partial t(w \mid b) = 0$

Model #1  Solving this set of equations yields update rules (normalized expected counts) that are iterated until convergence
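
A minimal EM sketch for the translation table in the style of IBM Model 1; the toy corpus, the NULL convention, and the fixed iteration count are illustrative assumptions, not the paper's exact setup:

```python
from collections import defaultdict

# Toy corpus: each image is (blob ids, keywords).
corpus = [
    ([1, 2], ["sun", "sky"]),
    ([1, 3], ["sun", "sea"]),
    ([2, 3], ["sky", "sea"]),
]
NULL = 0  # blob 0 plays the role of the NULL token

blobs = {NULL} | {b for bs, _ in corpus for b in bs}
words = {w for _, ws in corpus for w in ws}

# Uniform initialisation of t(word | blob).
t = {b: {w: 1.0 / len(words) for w in words} for b in blobs}

for _ in range(20):                        # EM iterations
    count = defaultdict(float)             # expected co-occurrence counts
    total = defaultdict(float)             # per-blob normalisers
    for bs, ws in corpus:
        candidates = [NULL] + bs           # every image also offers NULL
        for w in ws:
            # E-step: posterior that each candidate blob aligns to word w.
            z = sum(t[b][w] for b in candidates)
            for b in candidates:
                p = t[b][w] / z
                count[(b, w)] += p
                total[b] += p
    # M-step: re-normalise expected counts into probabilities.
    for b in blobs:
        for w in words:
            if total[b] > 0:
                t[b][w] = count[(b, w)] / total[b]

print({w: round(p, 2) for w, p in t[1].items()})  # mass should gather on "sun"
```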

Model #1 Further refinements  Some words may not be predicted with the highest probability for any blob: choose a smaller lexicon and rerun the process  Assign the NULL word when $p(\text{NULL} \mid \text{blob})$ exceeds a threshold

Model #1 Indistinguishable words  Visually indistinguishable (e.g. polar – bear)  Practically indistinguishable (e.g. mare/foals – horse)  Entangled correspondence  Remedy: clustering similar words and rerunning the process

Model #1 - Results Experimental Results Settings:  4,500 Corel images, with 4-5 keywords each  371-word vocabulary  typically 5-10 regions per image  500 blobs  33 features for each region

Model #1 - Results Annotation Recall / Precision  500 test images; only 80 words were ever predicted [Chart: precision and recall for original words, refitted words, and clustered words]

Model #1 - Results Correspondence  100 test images, null threshold 0.2 [Chart: prediction rate for original words and for clustered words; dark blue – total # of times a blob predicts the word, which is one of the image keywords; light blue – average # of times a blob predicts the word correctly in the right place]

Model #1 - Results Some Results  Successful results

Model #1 - Results Some Results  Unsuccessful results

Model #1 - Results Some Results  Assigning null

Model #1 - Results Some Results  Clustering words – 1st iteration

Model #1 - Results Some Results  Clustering words – 2nd iteration

Model #1 - Results Some Results  Clustering words – 3rd iteration

Content 1) Motivation 2) Introduction to Semantics 3) Difficulties 4) Possible Approaches 5) Models  Model #1: Object recognition as machine translation  Model #2: Learning semantics by hierarchical clustering of words and image regions

Model #2 Learning semantics by hierarchical clustering of words and image segments  Description: Statistical modeling of words and image feature occurrence and co-occurrence, organizing image collections into clusters using a hierarchical model

Model #2 Objective  Indexing image databases by integrating the semantic information provided by visual features and associated text  Organize images in a way that exposes as much semantic structure to the user as possible

Model #2  Given: a set of images + associated text for each image → Processing → Indexed data  Then: Browsing / Search by query items (image / text) / Auto-Annotate / Auto-Illustrate

Model #2 Hierarchical Structure  Encourages semantic perception - levels of generalization (general → specific)  Useful structure for browsing  Natural data organization (coarse → fine)

Model #2 Hierarchical Structure  Hierarchy of occurrences

Model #2 Initial Processing  Image segmentation  Segments → blobs  Per image: item counts → distribution histogram

Model #2 Hierarchy Creation  Histogram discretization: clustering the (items × occurrences) histograms with k-means

Model #2 Hierarchy Creation  Building the tree graph by levels [Figure: a 4-level tree]

Model #2 Hierarchy Creation  Creating hierarchy trees from P(co-occurrences); eliminating branches with P < threshold, or keeping a fixed tree fan-out [Figure: a 3-level tree]

Model #2 Hierarchy Creation  Cluster = leaf; each leaf = the path leading to it, i.e. a cluster of items (see the sketch below)
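
A minimal sketch of the leaf-equals-path idea; the Node structure is hypothetical, only to make the notion concrete:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    items: List[str]                          # words/blobs this node emits
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

def cluster_items(leaf: Node) -> List[str]:
    """A cluster is the set of items emitted along the root-to-leaf path."""
    path_items, node = [], leaf
    while node is not None:
        path_items = node.items + path_items  # prepend so root items come first
        node = node.parent
    return path_items

root = Node(items=["sky"])
leaf = Node(items=["sun", "sea", "waves"], parent=root)
root.children.append(leaf)
print(cluster_items(leaf))  # ['sky', 'sun', 'sea', 'waves']
```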

Model #2 Hierarchy Creation  Modelling data as being generated by the nodes along a path: nodes near the root emit general items (sky), lower nodes emit specific ones (sun, sea, waves)

Model #2 Hierarchy Creation  Adjacent clusters share the upper nodes (e.g. both emit sky) and differ in the lower, more specific nodes (sun, sea, waves vs. rocks)

Model #2 Indexing  Document indexing (normalized): $p(c \mid d) = \dfrac{\#\{\text{document items in cluster } c\}}{\#\{\text{document items}\}}$

Model #2 Indexing  Calculating $P(c)$: $P(c) = \frac{1}{n_d} \sum_d P(c \mid d)$, where $n_d$ = total # of documents
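
A small sketch of these two quantities, assuming we already know which cluster each document item falls into; the names and toy data are illustrative:

```python
from collections import Counter

def p_cluster_given_doc(item_clusters):
    """p(c|d): fraction of the document's items that fall in cluster c."""
    n = len(item_clusters)
    return {c: k / n for c, k in Counter(item_clusters).items()}

def p_cluster(docs):
    """P(c) = (1 / n_d) * sum over documents of p(c|d)."""
    n_d = len(docs)
    prior = Counter()
    for doc in docs:
        for c, p in p_cluster_given_doc(doc).items():
            prior[c] += p / n_d
    return dict(prior)

docs = [["c1", "c1", "c2"], ["c2", "c2", "c3"]]
print(p_cluster_given_doc(docs[0]))  # {'c1': 0.667, 'c2': 0.333}
print(p_cluster(docs))               # average of the per-document shares
```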

Model #2 Using the Model  Browsing: using the tree structure and the items at each node (e.g. Ocean → Dolphins / Whales / Corals / etc.; Dolphins → Tail / Head / etc.)

Model #2 Using the Model  Search: given a set of observations Q, compute a probability value P(Q|d) for each document d in the database; apply a threshold; return the set of documents matching the observations Q

Model #2 Using the Model  Search: the conditional probability $p(Q \mid d)$ (the likelihood function) is built from per-node emission probabilities, with the probability of an item in a node estimated as $\#\{\text{items from the node in the document}\} / \#\{\text{document items}\}$; roughly, $p(Q \mid d) = \prod_{q \in Q} \sum_{n} p(q \mid n)\, p(n \mid d)$ over nodes $n$
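
A sketch of the search step built on that estimate: score every document by log p(Q|d) and keep those above a threshold. The frequency-based estimate and the smoothing constant are simplifying assumptions, not the paper's exact computation:

```python
import math

def log_p_query_given_doc(query, doc_items, vocab_size, eps=0.5):
    """log p(Q|d); add-eps smoothing keeps unseen items from zeroing it out."""
    n = len(doc_items)
    logp = 0.0
    for q in query:
        # Slide's estimate: (# items from matching nodes in d) / (# items in d),
        # approximated here by the item's frequency in the document.
        p = (doc_items.count(q) + eps) / (n + eps * vocab_size)
        logp += math.log(p)
    return logp

def search(query, database, vocab_size, threshold):
    """Return the documents whose likelihood for Q clears the threshold."""
    return [doc_id for doc_id, items in database.items()
            if log_p_query_given_doc(query, items, vocab_size) >= threshold]

db = {"d1": ["tiger", "grass", "water"], "d2": ["sky", "plane"]}
print(search(["tiger", "water"], db, vocab_size=10, threshold=-4.0))  # ['d1']
```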

Model #2 - Results Experimental Results Settings:  Corel database: 300 Corel images with 4-5 keywords each; 64 clusters  SF Fine Art Museum database: training on 8,405 museum images with attached text; 3,319-word vocabulary; 256 clusters  typically 5-10 regions per image  Processing ~ hours

Model #2 - Results Browsing Results  Do the clusters found make sense to humans? Judging 64 real clusters against 64 random sets: 94% accuracy

Model #2 - Results Browsing Results  Successful clusters

Model #2 - Results Browsing Results  An unsuccessful cluster

Model #2 - Results Browsing Results  Does clustering on image segments and words have an advantage over either alone? [Figures: clustering by text only; clustering by image features only]

Model #2 - Results Browsing Results  Clustering by both text and image features

Model #2 - Results Search Results query: tiger, river tiger, cat, water, grass tiger, cat, water, trees tiger, cat, water, grass tiger, cat, grass, forest tiger, cat, water, grass

Model #2 - Results Auto-Annotation  Associating words with images  Actual keywords vs. predictions, e.g.: grass, tiger, cat, forest → tiger, grass, cat, people, water, Bengal, buildings; hippo, bull, mouth, walk → water, hippos, rhino, river, grass, reflection, plain; flower, coralberry, leaves, plant → fish, reef, church, wall, people, water, landscape

Content 1) Motivation 2) Introduction to Semantics 3) Difficulties 4) Possible Approaches 5) Models  Model #1: Object recognition as machine translation  Model #2: Learning semantics by clustering of words and image regions  Summary

The End