Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework. Li-Jia Li, Richard Socher, Li Fei-Fei

Presentation transcript:

1 Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework. Li-Jia Li, Richard Socher, Li Fei-Fei

2 City, Travel, Pagoda, Sunrise, Sunshine, Sun

3 City, Travel, Pagoda, Sunrise, Sunshine, Sun. Weber et al 00; Fergus et al 03; Felzenszwalb et al 04; Fei-Fei et al 05; Sivic et al 05; Bosch et al 06; Oliva et al 01; Lazebnik et al 06; Shi et al 00; Felzenszwalb et al 04; Sali et al 99; Winn et al 05; Kumar et al 05; Cao et al 07; Russell et al 06; Todorovic et al 06; Duygulu et al 02; Barnard et al 03; Blei et al 03; Gupta et al 08; Alipr (Li et al 03); Sudderth et al 05. Segmentation, Classification, Annotation. Remark: approaches shown in yellow are compared with our model in the experiments.

4 City, Travel, Pagoda, Sunrise, Sunshine, Sun. Weber et al 00; Fergus et al 03; Felzenszwalb et al 04; Fei-Fei et al 05; Sivic et al 05; Bosch et al 06; Oliva et al 01; Lazebnik et al 06; Shi et al 00; Felzenszwalb et al 04; Sali et al 99; Winn et al 05; Kumar et al 05; Cao et al 07; Russell et al 06; Todorovic et al 06; Duygulu et al 02; Barnard et al 03; Blei et al 03; Gupta et al 08; Alipr (Li et al 03); Sudderth et al 05. Segmentation, Classification, Annotation: Total Scene Understanding

5 Application

6 Classification, Annotation, Segmentation: mutually beneficial!

7 Classification, Annotation, Segmentation. Athlete, Horse, Grass, Trees, Sky, Saddle. Horse. Class: Polo

8 Classification, Annotation, Segmentation. Horse, Sky, Tree, Grass. Athlete, Horse, Grass, Trees, Sky, Saddle. Horse, Athlete. Class: Polo

9 Classification, Annotation, Segmentation. Horse. Athlete, Horse, Grass, Trees, Sky, Saddle

10 Related Work. Tu et al 03 (Annotation, Segmentation): Horse, Sky, Tree, Grass, Horse, Athlete. Li & Fei-Fei 07 (Annotation, Classification): Sky, Grass, Horse, Athlete, Horse; Class: Polo. Heitz et al 08 (Classification, Segmentation): Tree; Class: Polo

11 Outline: Learning, Model, Recognition & Experiment. Classification, Annotation, Segmentation

12 Athlete, Horse, Grass, Trees, Sky, Saddle. Graphical model (figure): C, Nr, O, R, NF, X, Ar, Nt, Z, S, T, D

13 Joint distribution of the random variables: Visual Component and Text Component. Class: Polo. Athlete, Horse, Grass, Trees, Sky, Saddle. Graphical model (figure): C, D

14 Visual and Text Components. Class: Polo. Graphical model (figure): C, O, D

15 Visual and Text Components. Class: Polo. Color, Location, Texture, Shape. Graphical model (figure): C, O, R, NF, D

16 Visual and Text Components. Class: Polo. Graphical model (figure): C, O, R, NF, X, Ar, D

17 Visual and Text Components. Class: Polo. “Connector variable” Z. Athlete, Horse, Grass, Trees, Sky, Saddle. Graphical model (figure): C, O, R, NF, X, Ar, Z, Nr, Nt, D

18 Visual and Text Components. Class: Polo. “Connector variable” Z. “Switch variable” S: Visible / Not visible. Athlete, Horse, Grass, Trees, Sky, Saddle. Horse, Athlete, Horse. Graphical model (figure): C, O, R, NF, X, Ar, Z, S, Nr, Nt, D

19 Visual and Text Components. Class: Polo. “Connector variable” Z. “Switch variable” S: Visible / Not visible. T: Horse. Athlete, Horse, Grass, Trees, Sky, Saddle. Graphical model (figure): C, O, R, NF, X, Ar, Z, S, T, Nr, Nt, D
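The model slides above build up a generative story one variable at a time. A minimal sketch of ancestral sampling from such a model follows, under loud assumptions: the dependency structure (O given C, R and X given O, T given the object picked by Z when S is "visible", otherwise T given C) is one reading of the slides, and every probability table, the codebook size, and the helper names `draw`, `draw_codeword`, and `generate_image` are hypothetical. It is not the authors' implementation.

```python
import random

# Hypothetical toy parameters for illustration; not values learned in the talk.
CLASS_PRIOR = {"polo": 0.5, "sailing": 0.5}
OBJECT_GIVEN_CLASS = {
    "polo": {"horse": 0.4, "athlete": 0.3, "grass": 0.3},
    "sailing": {"sailboat": 0.4, "water": 0.4, "sky": 0.2},
}
NONVISIBLE_TAG_GIVEN_CLASS = {"polo": {"wind": 1.0}, "sailing": {"wind": 1.0}}
FEATURE_TYPES = ["color", "location", "texture", "shape"]  # feature types named on slide 15
N_CODEWORDS = 8   # assumed codebook size for every discrete feature
P_VISIBLE = 0.8   # assumed prior that a tag refers to a visible object


def draw(dist, rng):
    """Sample one key from a dict mapping outcome -> probability."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def draw_codeword(obj, rng):
    """Toy appearance model (an assumption): each object favors one codeword."""
    favored = sum(map(ord, obj)) % N_CODEWORDS
    return favored if rng.random() < 0.7 else rng.randrange(N_CODEWORDS)


def generate_image(n_regions, n_patches, n_tags, rng):
    """Ancestral sampling: class C, then per-region O, R, X, then per-tag S, Z, T."""
    c = draw(CLASS_PRIOR, rng)
    regions = []
    for _ in range(n_regions):
        o = draw(OBJECT_GIVEN_CLASS[c], rng)                      # object O
        r = {f: draw_codeword(o, rng) for f in FEATURE_TYPES}     # region features R
        x = [draw_codeword(o, rng) for _ in range(n_patches)]     # patches X
        regions.append({"O": o, "R": r, "X": x})
    tags = []
    for _ in range(n_tags):
        visible = rng.random() < P_VISIBLE       # switch variable S
        if visible:
            z = rng.randrange(n_regions)         # connector variable Z picks a region
            t = regions[z]["O"]                  # simplification: tag equals object name
        else:
            z = None
            t = draw(NONVISIBLE_TAG_GIVEN_CLASS[c], rng)
        tags.append({"S": visible, "Z": z, "T": t})
    return {"C": c, "regions": regions, "tags": tags}
```

Calling `generate_image(3, 4, 5, random.Random(0))` returns one synthetic image with three regions and five tags; visible tags always name the object of the region their connector points to.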

20 Outline: Learning, Model, Recognition & Experiment. Graphical model (figure): Visual, Text; C, Nr, O, R, NF, X, Ar, Nt, Z, S, T

21 Learning: exact inference is intractable! Relationship of the random variables. Graphical model (figure): Visual, Text; C, Nr, O, R, NF, X, Ar, Nt, Z, S, T

22 Relationship of the random variables: top-down force; bottom-up force from visual information; bottom-up force from text information. Collapsed Gibbs Sampling (R. Neal, 2000). Graphical model (figure): Visual, Text; C, Nr, O, R, NF, X, Ar, Nt, Z, S, T
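Slide 22 names collapsed Gibbs sampling as the learning procedure. The sketch below shows the technique on a much simpler, LDA-style stand-in, not on the model from the talk: each region carries one observed codeword and one latent object label, and the multinomial parameters are integrated out under symmetric Dirichlet priors. The function name `collapsed_gibbs`, the priors `alpha` and `beta`, and the data layout are all assumptions made for illustration.

```python
import random


def collapsed_gibbs(images, n_objects, vocab_size,
                    alpha=0.5, beta=0.1, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for a toy model (not the talk's update equations).

    images: list of images, each a list of observed codeword ids (one per region).
    Returns the latent object assignments and the count tables.
    """
    rng = random.Random(seed)
    img_obj = [[0] * n_objects for _ in images]              # regions per (image, object)
    obj_word = [[0] * vocab_size for _ in range(n_objects)]  # codewords per object
    obj_total = [0] * n_objects
    assign = []

    # Random initialization of every latent object label.
    for d, regions in enumerate(images):
        labels = []
        for w in regions:
            k = rng.randrange(n_objects)
            labels.append(k)
            img_obj[d][k] += 1
            obj_word[k][w] += 1
            obj_total[k] += 1
        assign.append(labels)

    for _ in range(n_iters):
        for d, regions in enumerate(images):
            for i, w in enumerate(regions):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                img_obj[d][k] -= 1
                obj_word[k][w] -= 1
                obj_total[k] -= 1
                # Conditional with the Dirichlet-multinomial parameters integrated out.
                weights = [
                    (img_obj[d][j] + alpha)
                    * (obj_word[j][w] + beta) / (obj_total[j] + vocab_size * beta)
                    for j in range(n_objects)
                ]
                k = rng.choices(range(n_objects), weights=weights, k=1)[0]
                # Add the new assignment back.
                assign[d][i] = k
                img_obj[d][k] += 1
                obj_word[k][w] += 1
                obj_total[k] += 1
    return assign, img_obj, obj_word, obj_total
```

Only counts are stored, which is what "collapsed" refers to here: the per-image object proportions and per-object codeword distributions are never sampled explicitly.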

23 Scene/Event images from the Internet. There is no object-text correspondence… Athlete, Horse, Grass, Tree, Saddle

24 Scene/Event images from the Internet. Our model builds the correspondence… Athlete, Horse, Grass, Tree, Saddle. Graphical model (figure): C, Nr, O, R, NF, X, Ar, Nt, Z, S, T, D

25 Scene/Event images from the Internet. However, a big obstacle is that many objects always co-occur: ? ? ? Athlete, Horse, Grass, Trees, Sky, Saddle. Athlete, Horse, Grass, Ball

26 Scene/Event images from the Internet. One solution: a good initialization of O. Grass, Athlete, Horse. Athlete, Horse, Grass, Trees, Sky, Saddle. Graphical model (figure): C, R, NF, X, Ar, Nr, Z, Nt, T, S, O

27 Scene/Event images from the Internet. Initializing O: obtain Internet images for each O. Object images

28 Initializing O: train an object detector for each O. Any object detection & segmentation algorithm. Object images; Event/Scene images. Graphical model (figure): C, R, NF, X, Ar, Nr, Z, Nt, T, S, O, D

29 Initialize O in the scene image with the trained object detectors. Black-box object detection & segmentation: any object detection & segmentation algorithm. Object images; Event/Scene images. Graphical model (figure): C, R, NF, X, Ar, Nr, Z, Nt, T, S, O, D

30 Initialize O in the scene image with the trained object detectors. Black-box object detection & segmentation: Cao & Fei-Fei, 2007 (figure): θ, C, X, R, O, Nr, Ar. Our Model. Object images; Event/Scene images. Graphical model (figure): C, R, NF, X, Ar, Nr, Z, Nt, T, S, O, D

31 Auto-semi-supervised learning: small # of initialized images + large # of uninitialized images. Our Model. Small # of initialized images: Athlete, Horse, Grass, Tree, Saddle, Wind. Large # of uninitialized images: Athlete, Rock, Grass, Tree, Sky, Rope; Athlete, Snow, Tree, Sky, Snowboard. Scene/Event images. Graphical model (figure): C, R, NF, X, Ar, Nr, Z, Nt, T, S, O, D
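The initialization slides describe seeding O in a small number of scene images from trained object detectors and leaving a large number of images uninitialized. A minimal sketch of that bookkeeping follows. It assumes, for illustration only, that uninitialized regions start from uniformly random object labels (the slides do not say how they start), and the function name `initialize_objects` and its arguments are hypothetical.

```python
import random


def initialize_objects(images, detector_labels, n_objects, seed=0):
    """Build the starting object label for every region of every image.

    images: list of images, each a list of regions (payload is ignored here).
    detector_labels: dict mapping image index -> list of object ids, one per
        region, as produced by some black-box detector (hypothetical input).
    Images missing from detector_labels get random labels (an assumption).
    """
    rng = random.Random(seed)
    init = []
    for idx, regions in enumerate(images):
        if idx in detector_labels:
            labels = list(detector_labels[idx])
            if len(labels) != len(regions):
                raise ValueError(f"image {idx}: expected one label per region")
        else:
            labels = [rng.randrange(n_objects) for _ in regions]
        init.append(labels)
    return init
```

The returned labels can then serve as the starting state of a sampler, so that the few detector-seeded images anchor the object identities while the remaining images are resolved during learning.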

32 Outline: Learning, Model, Recognition & Experiment (Dataset, Learned Model, Results). Small # of automatically initialized images: Athlete, Horse, Grass, Tree, Saddle, Wind. Large # of uninitialized images: Athlete, Rock, Grass, Tree, Sky, Rope; Athlete, Snow, Tree, Sky, Snowboard. Graphical model (figure): Visual, Text; C, Nr, O, R, NF, X, Ar, Nt, Z, S, T

33 8 Event/Scene Classes: Badminton, Bocce, Croquet, Polo. Remark: tags are not used during testing

34 8 Event/Scene Classes: Rock climbing, Rowing, Sailing, Snowboarding

35 Learned model: O. Graphical model (figure): C, Nr, O, R, NF, X, Ar, Nt, Z, S, T, D

36 Learned model: R. Athlete, Grass, Horse. Graphical model (figure): C, Nr, O, R, NF, X, Ar, Nt, Z, S, T, D

37 Learned model: S. Graphical model (figure): C, Nr, O, R, NF, X, Ar, Nt, Z, S, T, D

38 Classification, Annotation, Segmentation. 8-way classification: 54%

39 Classification, Annotation, Segmentation. Alipr: Li et al 03; Corr LDA: Blei et al 03

40 Classification, Annotation, Segmentation

41 Effect of top-down class context. Horse. Full Model (figure): C, O, R, X, Z, T, S. Model w/o top-down class (figure): O, R, X, Z, T, S

42 Learning, Model, Recognition & Experiment. Small # of automatically initialized images: Athlete, Horse, Grass, Tree, Saddle, Wind. Large # of uninitialized images: Athlete, Rock, Grass, Tree, Sky, Rope; Athlete, Snow, Tree, Sky, Snowboard. Graphical model (figure): Visual, Text; C, Nr, O, R, NF, X, Ar, Nt, Z, S, T. Sky, Athlete, Tree, Mountain, Rock. Class: Rock climbing. Athlete, Mountain, Tree, Rock, Sky, Ascent. Sky, Athlete, Water, Tree, Sailboat. Class: Sailing. Athlete, Sailboat, Tree, Water, Sky, Wind. Tree, Athlete, Snowboard, Snow. Class: Snowboarding. Athlete, Snowboard, Tree, Snow, Sky, Powder

43 Thanks to Prof. Silvio Savarese, Juan Carlos Niebles, Chong Wang, Barry Chai, Min Sun, Bangpeng Yao, Hao Su, Jia Deng, the anonymous reviewers, and you.