Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers
Paper by Abhinav Gupta and Larry S. Davis, University of Maryland, College Park
Presented by Biswaranjan Panda and Moutupsi Paul



Outline
– Richer linguistic descriptions of images make learning of object appearance models from weakly labeled images more reliable.
– Constructing visually grounded models for parts of speech other than nouns provides contextual models that make labeling new images more reliable.
– This talk is therefore about simultaneous learning of object appearance models and context models for scene analysis.
[Figure: example image with regions (car, officer, road) and the caption "A officer on the left of car checks the speed of other cars on the road"; further relationship examples such as Larger(tiger, cat), Larger(A, B), and Above(A, B) on bear/water/field regions.]

Co-occurrence Relationships (Problems)
– With noun annotations alone, word-to-region correspondence is ambiguous: for images annotated {Car, Road}, swapping which region is Car and which is Road explains the co-occurrences equally well.
[Figure: the same images under two competing region-to-word assignments, Hypothesis 1 and Hypothesis 2, exchanging the Car and Road labels.]

Beyond Nouns – Exploit Relationships
– Use annotated text to extract nouns and relationships between nouns: from "A officer on the left of car checks the speed of other cars on the road" (regions: car, officer, road), extract On(car, road) and Left(officer, car).
– Constrain the correspondence problem using the relationships: given On(Car, Road), the assignment with the car region on the road region is more likely, and the reverse is less likely.
[Figure: the two candidate Car/Road assignments, labeled More Likely and Less Likely.]
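The relationship-extraction step above can be sketched as simple pattern matching over the annotation text. This is a hypothetical illustration, not the paper's parser: the preposition lexicon, function names, and nearest-noun heuristic are all assumptions.

```python
import re

# Hypothetical preposition lexicon mapping surface phrases to relation names.
PREPOSITIONS = {
    "on the left of": "Left",
    "on": "On",
    "above": "Above",
    "below": "Below",
}

def extract_relations(caption, nouns):
    """Return (Relation, subject, object) triples found in the caption.

    Longer phrases are matched first and blanked out, so "on" does not
    re-match inside "on the left of". The subject is the nearest known noun
    to the left of the preposition, the object the nearest to the right.
    """
    relations = []
    text = caption
    for phrase, name in sorted(PREPOSITIONS.items(), key=lambda p: -len(p[0])):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        while True:
            m = re.search(pattern, text)
            if m is None:
                break
            left = [n for n in nouns if n in text[:m.start()]]
            right = [n for n in nouns if n in text[m.end():]]
            if left and right:
                relations.append((name, left[-1], right[0]))
            # blank out the match so shorter prepositions cannot re-match it
            text = text[:m.start()] + "#" * (m.end() - m.start()) + text[m.end():]
    return relations

caption = "A officer on the left of car checks the speed of other cars on the road"
print(extract_relations(caption, ["officer", "car", "road"]))
# → [('Left', 'officer', 'car'), ('On', 'car', 'road')]
```

On the slide's example caption this recovers exactly the two relations used above, Left(officer, car) and On(car, road).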

Beyond Nouns – Overview
– Learn classifiers for both nouns and relationships simultaneously.
  – Classifiers for relationships are based on differential features.
– Learn priors on possible relationships between pairs of nouns, which leads to better labeling performance: above(sky, water) is plausible, above(water, sky) is not.
[Figure: sky/water images illustrating above(sky, water) versus above(water, sky).]

Representation
– Each image is first segmented into regions.
– Regions are represented by feature vectors based on:
  – Appearance (RGB, intensity)
  – Shape (convexity, moments)
– Models for nouns are based on features of the regions.
– Relationship models are based on differential features:
  – Difference of average intensity
  – Difference in location
– Assumption: for convex objects, each relationship model is based on one differential feature, so learning relationship models involves feature selection.
– Each image is also annotated with nouns and a few relationships between those nouns (e.g., "B below A").
[Figure: two regions A and B illustrating the relationship "B below A".]
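The region and differential features above can be sketched as follows. This is a minimal sketch under assumed feature choices (the function names and the specific features are mine, not the paper's implementation).

```python
import numpy as np

def region_features(pixels, centroid):
    """Features of one region. pixels: (N, 3) RGB array; centroid: (row, col)."""
    return {
        "avg_rgb": pixels.mean(axis=0),            # appearance
        "avg_intensity": pixels.mean(),            # appearance
        "centroid": np.asarray(centroid, float),   # location
    }

def differential_features(fa, fb):
    """Differential features of the ordered region pair (A, B), which feed
    the relationship classifiers (e.g., Above/Below use the vertical offset)."""
    return {
        "d_intensity": fa["avg_intensity"] - fb["avg_intensity"],
        "d_y": fa["centroid"][0] - fb["centroid"][0],  # rows grow downward
        "d_x": fa["centroid"][1] - fb["centroid"][1],
    }

# Toy sky/sea regions: uniform bright pixels high in the image vs. dark low.
sky = region_features(np.full((100, 3), 200.0), centroid=(10, 64))
sea = region_features(np.full((100, 3), 80.0), centroid=(90, 64))
d = differential_features(sky, sea)
# sky is brighter than sea (d_intensity > 0) and above it (d_y < 0)
```

With one differential feature per relationship, a relationship classifier can be as simple as a threshold on the selected feature, which is what makes the feature-selection view of learning natural here.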

Learning the Model – Chicken-and-Egg Problem
– Learning models of nouns and relationships requires solving the correspondence problem (the assignment problem).
– Solving the correspondence problem requires some model of nouns and relationships (the learning problem).
– Chicken-and-egg problem: we treat the assignment as missing data and formulate an EM approach.
[Figure: image annotated On(car, road), with the assignment problem and the learning problem feeding each other.]

EM Approach – Learning the Model
– E-step: compute the noun assignment given the object and relationship models from the previous iteration.
– M-step: for the noun assignment computed in the E-step, find the new maximum-likelihood parameters by learning both relationship and object classifiers.
– To initialize EM, any image annotation approach with localization can be used, such as the translation-based model described in [1].
[1] Duygulu, P., Barnard, K., de Freitas, N., Forsyth, D.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. ECCV (2002)
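The alternation above can be sketched as a toy EM loop. Everything here is schematic and assumed: the nearest-mean noun models stand in for the paper's classifiers, and the real M-step also refits the relationship models, which this sketch omits.

```python
import numpy as np

def e_step(region_feats, noun_models):
    """Assign each region the noun whose current model scores it highest."""
    return {rid: max(noun_models, key=lambda n: noun_models[n](x))
            for rid, x in region_feats.items()}

def m_step(region_feats, assignment):
    """Refit each noun model; here a trivial nearest-mean scorer per noun."""
    models = {}
    for noun in set(assignment.values()):
        xs = [region_feats[r] for r, a in assignment.items() if a == noun]
        mean = np.mean(xs, axis=0)
        models[noun] = (lambda m: (lambda x: -np.linalg.norm(x - m)))(mean)
    return models

# Toy run: two regions, image annotated {car, road}, crude initial models
# (e.g., from a translation-based annotation model, as the slide suggests).
feats = {"r1": np.array([0.0, 0.0]), "r2": np.array([10.0, 10.0])}
models = {"car": lambda x: -np.linalg.norm(x - np.array([2.0, 2.0])),
          "road": lambda x: -np.linalg.norm(x - np.array([8.0, 8.0]))}
for _ in range(3):  # EM iterations: assignment and models reinforce each other
    assignment = e_step(feats, models)
    models = m_step(feats, assignment)
```

After a few iterations the assignment stabilizes ({"r1": "car", "r2": "road"} here), illustrating how treating the assignment as missing data resolves the chicken-and-egg dependency.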

Inference Model
– Each image is segmented into regions, and each region is represented by a noun node (n₁, n₂, n₃, …).
– Every pair of noun nodes is connected by a relationship edge (r₁₂, r₁₃, r₂₃, …) whose likelihood is obtained from differential features.
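Scoring a labeling under this node-plus-edge model can be sketched as below. This is a hypothetical illustration of the model structure, not the paper's inference procedure: the names are mine, and exhaustive enumeration only works for tiny label sets.

```python
import itertools
import math

def score(labels, noun_ll, rel_ll, unseen=math.log(1e-3)):
    """Sum noun log-likelihoods at the nodes and relationship
    log-likelihoods on every pair of nodes (the edges)."""
    s = sum(noun_ll[r][n] for r, n in labels.items())
    for (r1, n1), (r2, n2) in itertools.combinations(labels.items(), 2):
        s += rel_ll.get((r1, r2), {}).get((n1, n2), unseen)
    return s

def best_labeling(regions, nouns, noun_ll, rel_ll):
    """Brute-force search over all joint labelings (toy-sized only)."""
    candidates = (dict(zip(regions, combo))
                  for combo in itertools.product(nouns, repeat=len(regions)))
    return max(candidates, key=lambda L: score(L, noun_ll, rel_ll))

# Appearance alone is ambiguous (equal noun log-likelihoods), but the
# learned prior for above(sky, water) on the edge breaks the tie.
noun_ll = {"top": {"sky": 0.0, "water": 0.0},
           "bottom": {"sky": 0.0, "water": 0.0}}
rel_ll = {("top", "bottom"): {("sky", "water"): math.log(0.9)}}
best = best_labeling(["top", "bottom"], ["sky", "water"], noun_ll, rel_ll)
# → {'top': 'sky', 'bottom': 'water'}
```

The example makes the earlier sky/water point concrete: the relationship edge, not the region appearance, is what disambiguates the labeling.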

Experimental Evaluation – Corel 5K Dataset
– Evaluation is based on the Corel 5K dataset [1].
– 850 training images with tags and manually labeled relationships; vocabulary of 173 nouns and 19 relationships.
– We use the same segmentations and feature vectors as [1].
– Quantitative evaluation of training is based on 150 randomly chosen images; quantitative evaluation of the labeling algorithm (testing) is based on 100 test images.

Resolution of Correspondence Ambiguities
– Evaluate the performance of our approach for resolving correspondence ambiguities in the training dataset.
– Performance is evaluated in terms of two measures [2]:
  – Range semantics: the percentage of each word correctly labeled by the algorithm (Sky is treated the same as Car).
  – Frequency correct: the number of regions correctly labeled by the algorithm (Sky occurs more frequently than Car).
[Figure: training images compared between Duygulu et al. [1] and our approach, with relationships such as below(birds, sun), above(sun, sea), brighter(sun, sea), below(waves, sun); above(statue, rocks), ontopof(rocks, water), larger(water, statue); below(flowers, horses), ontopof(horses, field), below(flowers, foals).]
[1] Duygulu, P., Barnard, K., de Freitas, N., Forsyth, D.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. ECCV (2002)
[2] Barnard, K., Fan, Q., Swaminathan, R., Hoogs, A., Collins, R., Rondot, P., Kaufhold, J.: Evaluation of localized semantics: data, methodology and experiments. Univ. of Arizona, TR-2005 (2005)
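The two measures above can be sketched directly (function and variable names are mine). Frequency correct weights every region equally, so frequent words like Sky dominate; range semantics averages per-word accuracy, so Car counts as much as Sky.

```python
def frequency_correct(truth, predicted):
    """Fraction of regions whose predicted label matches the ground truth."""
    hits = sum(1 for t, p in zip(truth, predicted) if t == p)
    return hits / len(truth)

def range_semantics(truth, predicted):
    """Mean per-word accuracy over the words present in the ground truth."""
    per_word = []
    for w in set(truth):
        idx = [i for i, t in enumerate(truth) if t == w]
        per_word.append(sum(predicted[i] == w for i in idx) / len(idx))
    return sum(per_word) / len(per_word)

# Labeling everything "sky" looks good by frequency but poor by range:
truth     = ["sky", "sky", "sky", "sky", "car"]
predicted = ["sky", "sky", "sky", "sky", "sky"]
print(frequency_correct(truth, predicted))  # 0.8
print(range_semantics(truth, predicted))    # 0.5 (sky: 1.0, car: 0.0)
```

The toy run shows why both measures are reported: a degenerate labeler can score well on one and badly on the other.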

Resolution of Correspondence Ambiguities (contd.)
– Compared performance with IBM Model 1 [3] and Duygulu et al. [1].
– Show the importance of prepositions and comparators by bootstrapping our EM algorithm.
[Figure: (a) frequency correct and (b) semantic range for the compared approaches.]

Examples of Labeling Test Images
[Figure: test images labeled by Duygulu et al. (2002) versus our approach.]

Evaluation of Labeling Test Images
– Evaluate labeling performance against the ground-truth annotations from the Corel 5K dataset, comparing the set of annotations produced by the algorithm with the set of ground-truth annotations for each image.
– Detection thresholds are chosen so that the number of missed labels is approximately equal for the two approaches; labeling accuracy is then compared.
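The set comparison above amounts to per-image precision and recall between the two annotation sets; a minimal sketch (my formulation, not the paper's code):

```python
def precision_recall(truth_labels, predicted_labels):
    """Precision/recall of the algorithm's annotation set for one image
    against the ground-truth annotation set."""
    truth, pred = set(truth_labels), set(predicted_labels)
    tp = len(truth & pred)  # labels both predicted and in the ground truth
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

p, r = precision_recall({"water", "sky", "tree"}, {"water", "sky", "grass"})
# tp = 2, so precision = recall = 2/3 here
```

Equalizing the number of missed labels (false negatives) across the two approaches, as the slide describes, makes their recalls comparable so that precision differences reflect labeling accuracy.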

Precision–Recall
[Table: per-word recall and precision of [1] versus ours, for the words Water, Grass, Clouds, Buildings, Sun, Sky, and Tree.]

Conclusions
– Richer natural-language descriptions of images make it easier to build appearance models for nouns.
– Models for prepositions and adjectives then provide contextual models for labeling new images.