Pictures and Words

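Vision and language in human brain (figure: visual areas V1, LOC (lateral occipital complex), FFA (fusiform face area) and PPA (parahippocampal place area); language areas Broca's area and Wernicke's area)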

Vision and language in human brain figure modified from:

Vision and language in human brain figure modified from: (Translation: “This is not a pipe.”) ?

What can you see in a glance of a scene? (Fei-Fei, Iyer, Koch, Perona, JoV, 2007)

Free-recall responses, with presentation times (PT) of 27, 40, 67, 107 and 500 ms:
– Subject RW: “I think I saw two people on a field.”
– Subject IV: “Outdoor scene. There were some kind of animals, maybe dogs or horses, in the middle of the picture. It looked like they were running in the middle of a grassy field.”
– Subject AI: “two people, whose profile was toward me. looked like they were on a field of some sort and engaged in some sort of sport (their attire suggested soccer, but it looked like there was too much contact for that).”
– Subject SM: “Some kind of game or fight. Two groups of two men? The foreground pair looked like one was getting a fist in the face. Outdoors seemed like because I have an impression of grass and maybe lines on the grass? That would be why I think perhaps a game, rough game though, more like rugby than football because the pairs weren't in pads and helmets, though I did get the impression of similar clothing. maybe some trees? in the background.”
– Subject KM: “This was a picture with some dark splotches in it. Yeah...that's about it.”
Fei-Fei, Iyer, Koch, Perona, JoV, 2007

Section outline
– Early “pictures and words” work
– Content-based retrieval
– Beyond nouns, towards total scene annotation

“Pictures and words”
– Barnard, Duygulu, de Freitas, Forsyth, Blei, Jordan, Matching words and pictures, JMLR, 2003
– Duygulu, Barnard, de Freitas, Forsyth, Object Recognition as Machine Translation: Learning a lexicon for a fixed image vocabulary, ECCV, 2003
– Blei & Jordan, Modeling annotated data, ACM SIGIR, 2003
– Chang, Goh, Sychay, & Wu, Soft annotation using Bayes point machines, IEEE Transactions on Circuits and Systems for Video Technology, 2003
– Goh, Chang, & Cheng, Ensemble of SVM-based classifiers for annotation, 2003
– …

Barnard et al. JMLR, 2005
Images are composed of multimodal “concepts”. Images are clustered based on priors over concepts. Learning determines localized concept models from global annotations.
– Addresses the correspondence problem
– One possible assumption: concept models simultaneously generate both a word and a blob
(Example annotation: sun, sky, water, waves.) Slide courtesy of Kobus Barnard (1 hour ago!)

Barnard et al. JMLR, 2005 (example annotation: sun, sky, water, waves). Slide courtesy of Kobus Barnard (1 hour ago!)
A generative model for assembling image data sets from multimodal clusters:
– Choose an image cluster c using p(c)
– Choose multimodal concept clusters s using p(s|c)
– From each multimodal cluster, sample blob features from a Gaussian, p(b|s), and words from a multinomial, p(w|s)
– (Skip with some probability to account for mismatched numbers of words and blobs)
– For a given correspondence*
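The sampling steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name `sample_image`, the array layout, the fixed number of concepts per image and the independent skip probabilities are all hypothetical simplifications. Because each concept draws both a blob and a word, the word–blob correspondence is known by construction here; in real data only the global annotation is observed, which is the correspondence problem the model has to solve.

```python
import numpy as np

def sample_image(p_c, p_s_given_c, blob_means, blob_covs, p_w_given_s,
                 n_concepts, p_skip_word=0.0, p_skip_blob=0.0, rng=None):
    """Sample one (cluster, blobs, words) triple from a toy multimodal model.

    Hypothetical parameter layout (an assumption, not the paper's notation):
    p_c:          (C,) prior over image clusters
    p_s_given_c:  (C, S) distribution over concepts for each image cluster
    blob_means:   (S, D) Gaussian mean of blob features for each concept
    blob_covs:    (S, D, D) Gaussian covariance for each concept
    p_w_given_s:  (S, V) multinomial over vocabulary words for each concept
    """
    rng = np.random.default_rng() if rng is None else rng
    c = rng.choice(len(p_c), p=p_c)                    # image cluster ~ p(c)
    blobs, words = [], []
    for _ in range(n_concepts):
        s = rng.choice(p_s_given_c.shape[1], p=p_s_given_c[c])  # concept ~ p(s|c)
        # Each concept emits a blob and a word; either may be skipped so the
        # numbers of words and blobs in an image need not match.
        if rng.random() >= p_skip_blob:                # blob ~ N(mean_s, cov_s)
            blobs.append(rng.multivariate_normal(blob_means[s], blob_covs[s]))
        if rng.random() >= p_skip_word:                # word ~ Mult(p(w|s))
            words.append(rng.choice(p_w_given_s.shape[1], p=p_w_given_s[s]))
    return c, np.array(blobs), words
```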

Barnard et al. JMLR, 2005

Section outline Early “pictures and words” work Content-based retrieval Beyond nouns, towards total scene annotation

Content-based retrieval (example images and their tags: “Rose, Flower, Petals, Australian Floribunda Rose, Love, Corolla” and “Tower, France, Eiffel Tower, Paris, Elegance, Symmetry”) Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang

Literature – MANY!!!
– A. W. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-Based Image Retrieval at the End of the Early Years, IEEE Trans. Pattern Analysis and Machine Intelligence, 22(12).
– R. Datta, D. Joshi, J. Li, and J. Z. Wang, Image Retrieval: Ideas, Influences, and Trends of the New Age, ACM Computing Surveys, vol. 40, no. 2, pp. 5:1-60, 2008.

Try out Alipr (

Automatic Image Annotation: ALIP Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang


Automatic Image Annotation: ALIP 2D-MHMM: Two-dimensional multi-resolution hidden Markov model Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang

Automatic Image Annotation: ALIP
Annotation process: classification results form the basis, and salient words appearing in the classification results are favored.
Example annotations:
– Building, sky, lake, landscape, Europe, tree
– Food, indoor, cuisine, dessert
– Snow, animal, wildlife, sky, cloth, ice, people
Slide courtesy of Ritendra Datta, Jia Li, James Z. Wang
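The idea that annotation words are drawn from the best-matching categories, with salient words favored, can be illustrated with a short Python sketch. It is an assumption-laden stand-in, not ALIP's actual word-selection statistic: the function `annotate`, the `top_k` cutoff and the count-times-log-rarity score are hypothetical choices. Words that recur across several top-ranked categories score higher, and words listed by almost every category (such as "sky") are down-weighted.

```python
import math
from collections import Counter

def annotate(ranked_categories, category_words, top_k=5, n_words=5):
    """Pick annotation words from the top-k categories returned by a classifier.

    ranked_categories: category ids sorted by classifier score, best first
    category_words:    dict mapping category id -> set of keywords
    (Hypothetical scoring, not the exact ALIP statistic.)
    """
    n_total = len(category_words)
    # How many categories in the whole vocabulary list each word.
    df = Counter(w for words in category_words.values() for w in words)
    # How often each word appears among the best-matching categories.
    counts = Counter(w for cat in ranked_categories[:top_k]
                     for w in category_words[cat])
    # Frequent-in-top-k raises the score; common-everywhere lowers it.
    scores = {w: n * math.log(1 + n_total / df[w]) for w, n in counts.items()}
    # Sort by score, breaking ties alphabetically for deterministic output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:n_words]
```

For example, with the top two categories both listing "snow" and "sky", "snow" ranks first, while "sky" drops below words that appear only once because every category in the vocabulary lists it.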

Section outline
– Early “pictures and words” work
– Content-based retrieval
– Beyond nouns, towards total scene annotation
  – Prepositions: A. Gupta and L. S. Davis, Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers, ECCV, 2008
  – Objects, scenes, activities: L.-J. Li and L. Fei-Fei, What, where and who? Classifying events by scene and object recognition, ICCV, 2007; L.-J. Li, R. Socher and L. Fei-Fei, Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework, CVPR, 2009


Gupta & Davis, ECCV, 2008 “Beyond nouns”


Gupta & Davis, ECCV, 2008

Section outline
– Early “pictures and words” work
– Content-based retrieval
– Beyond nouns, towards total scene annotation
  – Prepositions: A. Gupta and L. S. Davis, Beyond Nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers, ECCV, 2008
  – Objects, scenes, activities: L.-J. Li and L. Fei-Fei, What, where and who? Classifying events by scene and object recognition, ICCV, 2007; L.-J. Li, R. Socher and L. Fei-Fei, Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework, CVPR, 2009

What, where and who? Classifying events by scene and object recognition L-J Li & L. Fei-Fei, ICCV 2007

Scene pathway + object pathway → event (figure: the scene pathway is labelled the “where” pathway and the object pathway the “what” pathway; PFC) L.-J. Li & L. Fei-Fei ICCV 2007

Scene pathway (example: “Polo Field”) L.-J. Li & L. Fei-Fei ICCV 2007; Fei-Fei & Perona, CVPR, 2005

Object pathway (example: O = ‘horse’) L.-J. Li & L. Fei-Fei ICCV 2007; L.-J. Li, G. Wang & L. Fei-Fei, CVPR, 2007; G. Wang & L. Fei-Fei, CVPR, 2006; L. Cao & L. Fei-Fei, ICCV, 2007
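How evidence from the two pathways might be combined into an event decision can be sketched in a few lines of Python. This is a deliberately simplified naive-Bayes-style combination, assuming the scene evidence and the object evidence are conditionally independent given the event; it is not the full ICCV 2007 model, and the function name `classify_event` and the dictionary inputs are hypothetical.

```python
import math

def classify_event(log_prior, scene_loglik, object_loglik):
    """Return the most probable event and the normalized posterior.

    Assumes log p(E | scene, objects) = log p(E) + log p(scene | E)
    + log p(objects | E) + const, i.e. conditional independence given E.
    Each argument is a dict mapping event name -> log-probability.
    """
    joint = {e: log_prior[e] + scene_loglik[e] + object_loglik[e]
             for e in log_prior}
    # Log-sum-exp keeps the normalization numerically stable.
    m = max(joint.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in joint.values()))
    posterior = {e: math.exp(v - log_z) for e, v in joint.items()}
    return max(posterior, key=posterior.get), posterior
```

With a “polo field”-like scene score favoring polo over sailing (0.7 vs 0.3) and a ‘horse’ detection favoring polo (0.9 vs 0.1) under equal priors, the posterior for polo is 0.315 / 0.33, about 0.95: each pathway alone is ambiguous, but together they are decisive.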

The 3W stories: what, who, where L.-J. Li & L. Fei-Fei ICCV 2007

Classification, annotation, segmentation (figure example: class: Polo; labels: Athlete, Horse, Grass, Trees, Sky, Saddle) L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009

Total Scene. Our model: a hierarchical representation of the image and its semantic contents.
(Figure: noisy images and tags are used, after initialization, to learn a generative model that is then used for recognition. Example outputs:
– Class: Polo; tags: Athlete, Horse, Grass, Trees, Sky, Saddle; regions: Horse, Sky, Tree, Grass, Athlete
– Class: Rock climbing; tags: Athlete, Mountain, Trees, Rock, Sky, Ascent; regions: Sky, Athlete, Tree, Mountain, Rock
– Class: Sailing; tags: Athlete, Sailboat, Trees, Water, Sky, Wind; regions: Sky, Athlete, Water, Tree, Sailboat)
L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009


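Total Scene. The model: a hierarchical representation of the image and its semantic contents.
(Graphical model figure with a visual branch and a text branch: class C, e.g. Polo; object O, e.g. horse; further variables R, X, Z, S and T, with plates N_F, A_r, N_r, N_t and D. A “switch variable” marks whether a tag, e.g. “Horse”, is visible or not visible in the image, and a “connector variable” links the text branch to the visual branch. Example tags: Athlete, Horse, Grass, Trees, Sky, Saddle.)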
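The roles of the switch and connector variables can be illustrated with a toy tag generator in Python. Everything here is a hypothetical simplification, not the CVPR 2009 parameterization: the function `sample_tags`, a single fixed `p_visible` for the switch, a uniform choice of region for the connector, and a separate word distribution for non-visual tags (words such as “Saddle” or “Wind”, which the figure suggests need not correspond to a region). A visible tag is tied to a region and drawn from that region's object; a non-visible tag is drawn without any visual support.

```python
import numpy as np

def sample_tags(region_objects, p_visible, p_word_given_object,
                p_nonvisual_word, n_tags, rng=None):
    """Toy generator for image tags with a switch and a connector variable.

    region_objects:      object index assigned to each segmented region
    p_visible:           probability that the switch says "visible"
    p_word_given_object: (O, V) word distribution for each object
    p_nonvisual_word:    (V,) word distribution for non-visible tags
    Returns a list of (word index, region index or None) pairs.
    """
    rng = np.random.default_rng() if rng is None else rng
    tags = []
    for _ in range(n_tags):
        if rng.random() < p_visible:                   # switch: visible
            r = rng.integers(len(region_objects))      # connector: pick a region
            o = region_objects[r]
            w = rng.choice(p_word_given_object.shape[1], p=p_word_given_object[o])
            tags.append((int(w), int(r)))
        else:                                          # switch: not visible
            w = rng.choice(len(p_nonvisual_word), p=p_nonvisual_word)
            tags.append((int(w), None))
    return tags
```

During learning, the switch and connector would be latent and inferred, which is what lets noisy Internet tags supervise region labels without every tag having to appear in the image.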


Total Scene. Learning needs a good initial “guesstimate” of O (graphical model figure: C, O, R, X, Z, T and S, with plates N_F, A_r, N_r and N_t). Data: scene/event images from the Internet. L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009

Total Scene. Scene/event images from the Internet (example tags: Athlete, Horse, Grass, Tree, Saddle, Wind) + generative model.
Automatic semi-supervised learning: a small number of initialized images + a large number of uninitialized images.
L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009
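The “small number of initialized images + large number of uninitialized images” recipe follows the general pattern of semi-supervised EM. The Python sketch below shows that pattern on a 1-D Gaussian mixture, which is a stand-in for illustration only and not the Total Scene model: the function name `semi_supervised_em` and the toy data are hypothetical. Labeled points keep their class throughout; the first M-step uses only them (the initialization), and each later E-step assigns soft responsibilities to the unlabeled points.

```python
import numpy as np

def semi_supervised_em(x_lab, y_lab, x_unlab, n_classes, n_iter=50):
    """EM for a 1-D Gaussian mixture seeded by a few labeled points.

    Every class must have at least one labeled point.
    Returns mixing weights, means and variances per class.
    """
    x = np.concatenate([x_lab, x_unlab])
    n_lab = len(x_lab)
    # Labeled rows are one-hot; unlabeled rows start at zero, so the first
    # M-step is estimated from the labeled (initialized) data alone.
    resp = np.zeros((len(x), n_classes))
    resp[np.arange(n_lab), y_lab] = 1.0
    for _ in range(n_iter):
        # M-step: weighted mixing weights, means and variances.
        nk = resp.sum(axis=0)
        pi = nk / nk.sum()
        mu = resp.T @ x / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        # E-step: Gaussian log-posteriors, normalized with log-sum-exp.
        logp = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                - (x[:, None] - mu) ** 2 / (2 * var))
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
        resp[n_lab:] = post[n_lab:]      # labeled rows stay fixed
    return pi, mu, var
```

Four labeled points are enough to anchor the two components; the hundreds of unlabeled points then refine the means and variances.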


8 event/scene classes: Badminton, Bocce, Croquet, Polo, Rock climbing, Rowing, Sailing, Snowboarding

Total Scene: some sample results (classes shown: Croquet, Bocce, Snowboarding, Polo, Sailing, Badminton, Rock Climbing, Rowing) L-J Li, R. Socher & L. Fei-Fei, CVPR, 2009
