Learning Models for Object Recognition from Natural Language Descriptions
Presenters: Sagardeep Mahapatra (108771077), Keerti Korrapati (108694316)



Goal
Learning models for visual object recognition from natural language descriptions alone.

Why learn models from natural language?
- Manually collecting and labeling large image sets is difficult.
- A new training set must be created for each new category.
- Finding images for fine-grained object categories is hard, e.g., individual species of plants and animals.
- Detailed visual descriptions, however, may be readily available.

Outline
- Datasets for training and testing
- Natural language processing methods
- Template filling
- Extraction of visual attributes from test images
- Scoring an image against the learnt template models
- Results
- Observations

Dataset
- Text descriptions associated with ten species of butterflies from the eNature guide are used to construct the template models.
- Butterflies were chosen because they have distinctive visual features such as wing colors and spots.
- Images downloaded from Google for each of the ten butterfly categories form the test set.
- The ten species: Danaus plexippus, Heliconius charitonius, Heliconius erato, Junonia coenia, Lycaena phlaeas, Nymphalis antiopa, Papilio cresphontes, Pieris rapae, Vanessa atalanta, Vanessa cardui.

Natural Language Processing
Goal: convert the unstructured data in the descriptions into structured templates.
The text contains factual but unstructured information, which is extracted via information extraction techniques.

Template Filling
- Text is tokenized into words.
- Tokens are tagged with parts of speech (using the C&C tagger).
- Custom transformations are applied to correct known mistakes; these are required because the eNature guide tends to suppress some information.
- Chunks of text matching pre-defined tag sequences are extracted, e.g., noun phrases ('wings have blue spots') and adjective phrases ('wings are black').
- Extracted phrases are filtered through lists of colors, patterns, and positions to fill the template slots.

Pipeline: Tokenization -> Part-of-Speech Tagging -> Custom Transformations -> Chunking -> Template Filling
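The pipeline above can be sketched in a few lines. This is a minimal, hypothetical stand-in: the vocabularies below are illustrative (the system's real lists are larger), and the real system chunks on POS tag sequences from the C&C tagger rather than the simple keyword scan used here.

```python
import re

# Hypothetical (much abbreviated) slot-filler vocabularies
COLORS = {"black", "blue", "orange", "white", "red", "yellow"}
PATTERNS = {"spot", "spots", "band", "bands", "stripe", "stripes"}

def fill_template(description):
    """Toy template filling: scan a description for color words and
    record them as the dominant wing color or as spot colors when a
    pattern word ('spots') follows. The real system uses POS tagging
    and tag-sequence chunking instead of this keyword scan."""
    tokens = re.findall(r"[a-z]+", description.lower())
    template = {"dominant_color": None, "spots": []}
    for i, tok in enumerate(tokens):
        if tok in COLORS:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if nxt in PATTERNS:
                template["spots"].append(tok)   # e.g. 'blue spots'
            elif template["dominant_color"] is None:
                template["dominant_color"] = tok  # e.g. 'wings are black'
    return template

print(fill_template("Wings are black with blue spots near the margin"))
```

Running this on the example phrase fills both slots: 'black' becomes the dominant color and 'blue' is recorded as a spot color.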

Visual Processing
Performed on two attributes of butterflies: the dominant wing color and colored spots.

1) Image Segmentation
- Variation in the background can pose challenges during image classification.
- Hence, each butterfly image is segmented from the background using the 'star shape' graph-cut approach.

2) Spot Detection (using a spot classifier)
- Hand-marked butterfly images with no prior class information form the training set for the spot classifier.
- Candidate regions likely to be spots are extracted using a Difference-of-Gaussians interest-point operator.
- Image descriptors (SIFT features) are extracted around each candidate spot to classify it as spot or non-spot.

3) Color Modelling
- Required to connect the color names of dominant wing colors and spot colors in the learnt templates to image observations.
- For each color name c_i, a probability distribution p(z | c_i) is learnt from training butterfly images, where z is a pixel color observation in the L*a*b* color space.
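A simple way to realize p(z | c_i) is a per-color Gaussian fitted to labeled training pixels. The sketch below uses a diagonal Gaussian and made-up L*a*b* training pixels for the color name 'blue'; the paper's actual color models and training data may differ.

```python
import math

def gaussian_pdf(z, mean, var):
    """Diagonal Gaussian density over an L*a*b* pixel z = (L, a, b)."""
    p = 1.0
    for zi, mi, vi in zip(z, mean, var):
        p *= math.exp(-(zi - mi) ** 2 / (2 * vi)) / math.sqrt(2 * math.pi * vi)
    return p

def learn_color_model(pixels):
    """Fit a per-channel mean and variance from labeled training
    pixels, a stand-in for the learnt distribution p(z | c_i)."""
    n = len(pixels)
    mean = [sum(px[d] for px in pixels) / n for d in range(3)]
    var = [sum((px[d] - mean[d]) ** 2 for px in pixels) / n + 1e-6
           for d in range(3)]
    return mean, var

# Hypothetical training pixels for the color name 'blue' (L*, a*, b*)
blue_pixels = [(32, 79, -108), (30, 75, -100), (35, 80, -110)]
mean, var = learn_color_model(blue_pixels)

blueish = gaussian_pdf((31, 78, -105), mean, var)   # near the training pixels
grayish = gaussian_pdf((50, 0, 0), mean, var)       # far from them
print(blueish > grayish)
```

A blue-ish observation scores far higher under the 'blue' model than a gray one, which is exactly the signal the generative model needs when matching spot and wing colors to template color names.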

Generative Model
Given an input image I, the probability of the image given a butterfly category B_i is computed as a product over the spot and wing observations:

  P(I | B_i) ∝ p(dominant wing color | B_i) · Π_s p(spot_s | B_i)

- Spot color name prior: equal priors are assigned to all spot colors.
- Dominant color name prior.
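The scoring rule can be sketched as a product (sum in log space) over the wing-color observation and each template spot color. The templates and the per-color likelihood numbers below are illustrative placeholders, not values from the paper.

```python
import math

def score_image(template, wing_likelihoods, spot_likelihoods):
    """Log-probability of an image under one category's template,
    as a product over the wing and spot observations. The likelihood
    dicts map color names to p(observation | color) for this image
    (illustrative numbers, not learnt models)."""
    log_p = math.log(wing_likelihoods[template["dominant_color"]])
    for spot_color in template["spots"]:
        # equal priors over spot colors: each spot simply contributes
        # its color likelihood under the template's expected color
        log_p += math.log(spot_likelihoods[spot_color])
    return log_p

# Hypothetical learnt templates for two of the ten categories
templates = {
    "Nymphalis antiopa": {"dominant_color": "black", "spots": ["blue"]},
    "Pieris rapae": {"dominant_color": "white", "spots": ["black"]},
}

# Illustrative color likelihoods computed from one test image
wing = {"black": 0.7, "white": 0.1}
spots = {"blue": 0.6, "black": 0.2}

best = max(templates, key=lambda b: score_image(templates[b], wing, spots))
print(best)  # the image's black wings and blue spots favor Nymphalis antiopa
```

Classification then amounts to taking the category whose template maximizes this score, as the `max` call above does.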

Experimental Results
Two sets of experiments were performed:
- Performance of human beings in recognizing butterflies from textual descriptions, since this may reasonably be considered an upper bound.
- Performance of the proposed method.

Human Performance

Performance of the proposed method

Observations
- The accuracy of the proposed method was comparable to that of non-native English speakers.
- The accuracy of the proposed method exceeded 80 percent for four categories.
- Classifying 'Heliconius charitonius' was the hardest both for humans and for the ground-truth and learnt templates.
- Performance with ground-truth templates was comparable to that with the learnt templates, so errors introduced into the templates by the NLP methods had little impact.

Thank You