Putting Context into Vision Derek Hoiem September 15, 2004.


Questions to Answer What is context? How is context used in human vision? How is context currently used in computer vision? Conclusions

What is context? Any data or meta-data not directly produced by the presence of an object: nearby image data; scene information; the presence and locations of other objects.

How do we use context?

Attention Are there any live fish in this picture?

Clues for Function What is this?

Clues for Function What is this? Now can you tell?

Low-Res Scenes What is this?

Low-Res Scenes What is this? Now can you tell?

More Low-Res What are these blobs?

More Low-Res The same pixels! (a car)

Why is context useful? Objects defined at least partially by function Trees grow in ground Birds can fly (usually) Door knobs help open doors

Why is context useful? Objects defined at least partially by function Context gives clues about function Not rooted in the ground → not a tree Object in sky → {cloud, bird, UFO, plane, superman} Door knobs always on doors

Why is context useful? Objects defined at least partially by function Context gives clues about function Objects like some scenes better than others Toilets like bathrooms Fish like water

Why is context useful? Objects defined at least partially by function Context gives clues about function Objects like some scenes better than others Many objects are used together and, thus, often appear together Kettle and stove Keyboard and monitor

How is context used in computer vision?

Neighbor-based Context A Markov Random Field (MRF) incorporates contextual constraints between sites (or nodes) and their neighbors
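To make the neighbor constraint concrete, here is a minimal sketch of one simple way to label a grid MRF: iterated conditional modes (ICM) with a Potts-style penalty for disagreeing with 4-connected neighbors. ICM, the penalty `beta`, the unary costs, and the toy 3x3 grid are all illustrative assumptions, not something taken from the slides or the cited papers.

```python
# Minimal sketch: iterated conditional modes (ICM) on a 4-connected grid MRF.
# The unary costs, the Potts penalty `beta`, and the toy grid are assumptions.

def neighbors(r, c, rows, cols):
    """Yield the 4-connected neighbors of site (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            yield rr, cc


def icm(unary, beta=1.0, sweeps=5):
    """unary[r][c][k] is the cost of giving label k to site (r, c)."""
    rows, cols, n_labels = len(unary), len(unary[0]), len(unary[0][0])
    # Start from the labeling that ignores context entirely.
    labels = [[min(range(n_labels), key=lambda k: unary[r][c][k])
               for c in range(cols)] for r in range(rows)]
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                def cost(k):
                    # Local cost plus a penalty per disagreeing neighbor.
                    disagree = sum(labels[rr][cc] != k
                                   for rr, cc in neighbors(r, c, rows, cols))
                    return unary[r][c][k] + beta * disagree
                best = min(range(n_labels), key=cost)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


# Toy 3x3 image: every site prefers label 0 except the center,
# whose local evidence weakly prefers label 1.
unary = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
unary[1][1] = [0.6, 0.0]

context_free = [[min((0, 1), key=lambda k: unary[r][c][k]) for c in range(3)]
                for r in range(3)]
smoothed = icm(unary, beta=1.0)
```

In this toy case the context-free labeling marks the center as label 1, while the neighbor penalty outweighs the weak local evidence and flips it to label 0.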

Blobs and Words – Carbonetto 2004 Neighbor-based context (MRF) useful even when training data is not fully supervised Learns models of objects given captioned images …

Discriminative Random Fields – Kumar 2003 Using data surrounding the label site (not just at the label site) improves results Buildings vs. Non-Buildings

Multi-scale Conditional Random Field (mCRF) – He 2004 [Figure panels: raw image; independent data-based labels; local context; scene context]

mCRF Final decision based on: classification (local data-based); local labels (what relation nearby objects have to each other); image-wide labels (captures coarse scene context)
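One way to picture a decision that draws on three sources is a product of per-label scores that is then renormalized. The sketch below is a deliberately simplified, per-site illustration: the function name, the label set, and all numbers are hypothetical, and the full model in the cited paper defines its local and image-wide terms over patterns of labels rather than over a single site.

```python
# Simplified per-site sketch: combine three sources by multiplying their
# per-label scores and renormalizing. All names and numbers are hypothetical.

def combine_experts(local_classifier, regional, global_scene):
    """Return a normalized distribution over the labels of one site."""
    product = {label: local_classifier[label] * regional[label] * global_scene[label]
               for label in local_classifier}
    z = sum(product.values())
    return {label: value / z for label, value in product.items()}


# The local classifier cannot tell sky from water; context breaks the tie.
local_classifier = {"sky": 0.5, "water": 0.5}
regional = {"sky": 0.6, "water": 0.4}      # nearby labels
global_scene = {"sky": 0.8, "water": 0.2}  # coarse image-wide layout

posterior = combine_experts(local_classifier, regional, global_scene)
best = max(posterior, key=posterior.get)
```

Multiplying means any one source can veto a label by giving it a score near zero, which is the main design difference from averaging the three.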

mCRF Results

Neighbor-based Context References
P. Carbonetto, N. de Freitas, and K. Barnard, “A Statistical Model for General Contextual Object Recognition,” ECCV, 2004
S. Kumar and M. Hebert, “Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification,” ICCV, 2003
X. He, R. Zemel, and M. Carreira-Perpiñán, “Multiscale Conditional Random Fields for Image Labeling,” CVPR, 2004

Scene-based Context Average pictures containing heads at three scales

Context Priming – Torralba 2001/2003 [Diagram labels: image measurements; local evidence (what everyone uses); context (generally ignored); object priming (object presence); focus of attention (location); scale selection (scale); pose and shape priming (pose)]

Getting the Gist of a Scene Simple representation Spectral characteristics (e.g., Gabor filters) with coarse description of spatial arrangement PCA reduction Probabilities modeled with mixture of Gaussians (2003) or logistic regression (Murphy 2003)
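The "coarse description of spatial arrangement" step can be sketched as averaging a filter-response map over a small grid of cells. The code below shows only that pooling step on a single, made-up energy map; the Gabor filter bank and the PCA reduction are omitted, and the function name and grid size are assumptions for illustration.

```python
# Sketch of the spatial-pooling step only: average one filter-energy map over
# a coarse grid. The filter bank and PCA are omitted; the data are made up.

def coarse_layout(energy, grid=2):
    """Average `energy` (list of rows) over grid x grid cells, row-major.

    Assumes the map height and width are divisible by `grid`.
    """
    rows, cols = len(energy), len(energy[0])
    cell_h, cell_w = rows // grid, cols // grid
    features = []
    for gr in range(grid):
        for gc in range(grid):
            cell = [energy[r][c]
                    for r in range(gr * cell_h, (gr + 1) * cell_h)
                    for c in range(gc * cell_w, (gc + 1) * cell_w)]
            features.append(sum(cell) / len(cell))
    return features


# Toy 4x4 energy map: some response top-left, stronger response bottom-right.
energy = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 2, 2],
          [0, 0, 2, 2]]
gist_like = coarse_layout(energy, grid=2)
```

A full descriptor would concatenate one such vector per filter (scale and orientation) before dimensionality reduction.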

Context Priming Results Focus of Attention Object Presence Scale Selection Small Large

Using the Forest to See the Trees – Murphy (2003) Adaboost Patch-Based Object Detector → Detector Confidence at Location/Scale; Gist of Scene (boosted regression) → Expected Location/Scale of Object; Combine (logistic regression) → Object Probability at Location/Scale
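The final "combine" step on this slide is a logistic regression over two inputs, which can be sketched as follows. The weights, the input scaling, and the function names are hypothetical; in practice the weights would be fit on training data, and the two inputs would come from the detector and the gist-based predictor.

```python
import math

# Sketch of the combination step: a logistic function of the detector's
# confidence and a context-derived score for the same location/scale.
# The weights below are hypothetical placeholders, not learned values.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def object_probability(detector_confidence, context_score,
                       weights=(-4.0, 3.0, 3.0)):
    """Return P(object) at one location/scale from the two cues."""
    bias, w_detector, w_context = weights
    return sigmoid(bias + w_detector * detector_confidence
                   + w_context * context_score)


# Same detector confidence, evaluated where the scene context does and
# does not expect the object.
p_supported = object_probability(0.7, 0.9)
p_unsupported = object_probability(0.7, 0.1)
```

With these placeholder weights, the same detector response lands above 0.5 where context agrees and below 0.5 where it does not.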

Object Detection + Scene Context Results Often doesn’t help that much May be due to poor use of context Assumes independence of context and local evidence Only uses expected location/scale from context [Result panels: Keyboard, Screen, People]

Scene-based Context References
E. Adelson, “On Seeing Stuff: The Perception of Materials by Humans and Machines,” SPIE, 2001
B. Bose and E. Grimson, “Improving Object Classification in Far-Field Video,” ECCV, 2004
K. Murphy, A. Torralba, and W. Freeman, “Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes,” NIPS, 2003
U. Rutishauser, D. Walther, C. Koch, and P. Perona, “Is bottom-up attention useful for object recognition?,” CVPR, 2004
A. Torralba, “Contextual Priming for Object Detection,” IJCV, 2003
A. Torralba and P. Sinha, “Statistical Context Priming for Object Detection,” ICCV, 2001
A. Torralba, K. Murphy, W. Freeman, and M. Rubin, “Context-Based Vision System for Place and Object Recognition,” ICCV, 2003

Object-based Context

Mutual Boosting – Fink (2003) [Diagram labels: local window; contextual window; raw image; object likelihoods (eyes, faces); filters; confidence]

Mutual Boosting Results Learned Features First-Stage Classifier (MIT+CMU)

Contextual Models using BRFs – Torralba 2004 Template features Build structure of CRF using boosting Other objects’ locations’ likelihoods propagate through network

Labeling a Street Scene

Labeling an Office Scene F = local G = compatibility
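The split into a local term F and a compatibility term G can be illustrated with a toy belief update in which a confidently detected object raises the belief in a related, harder object. This is a loose sketch under stated assumptions, not the boosted random field procedure itself: the update rule `sigmoid(F + G)`, the single hand-set compatibility weight, and the object scores are all hypothetical.

```python
import math

# Toy illustration of "F = local, G = compatibility": each object's belief is
# a logistic function of its local score plus influence from other objects.
# The update rule, weights, and scores are assumptions for illustration.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def update_beliefs(local_F, compat_weights, rounds=3):
    """local_F[name] is a local evidence score.

    compat_weights[(src, dst)] scales how much src's belief shifts dst.
    """
    beliefs = {name: sigmoid(f) for name, f in local_F.items()}
    for _ in range(rounds):
        updated = {}
        for name, f in local_F.items():
            # Map each source belief from [0, 1] to [-1, 1] before weighting.
            g = sum(w * (2.0 * beliefs[src] - 1.0)
                    for (src, dst), w in compat_weights.items() if dst == name)
            updated[name] = sigmoid(f + g)
        beliefs = updated
    return beliefs


# The screen is easy to detect locally; the keyboard is not.
local_F = {"screen": 3.0, "keyboard": -0.5}
compat = {("screen", "keyboard"): 2.0}

local_only_keyboard = sigmoid(local_F["keyboard"])
beliefs = update_beliefs(local_F, compat)
```

Here the keyboard starts below 0.5 on local evidence alone and ends above 0.5 once the screen's belief is taken into account.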

Object-based Context References
M. Fink and P. Perona, “Mutual Boosting for Contextual Inference,” NIPS, 2003
A. Torralba, K. Murphy, and W. Freeman, “Contextual Models for Object Detection using Boosted Random Fields,” AI Memo, 2004

What else can be done?

Scene Structure Improve understanding of scene structure Floor, walls, ceiling Sky, ground, roads, buildings

Semantics vs. Low-level Image Statistics [Diagram labels: image statistics (high-D); semantics (low-D); statistics useful to OD (high-D); presence of object; low-level vs. semantics]

Putting it all together Scene Gist Object Detection Basic Structure Identification Scene Recognition Context Priming Neighbor-Based Context Object-Based Context Scene-Based Context

Summary: Neighbor-based context. Using nearby labels is essential for “complete labeling” tasks. Using nearby labels is useful even without completely supervised training data. Using nearby labels and nearby data is better than using nearby labels alone. Labels can be used to extract local and scene context.

Summary: Scene-based context. The “gist” representation is suitable for focusing attention or determining the likelihood of object presence. Scene structure would provide additional useful information (but is difficult to extract). A scene label would provide additional useful information.

Summary: Object-based context. Even simple methods of using other objects’ locations improve results (Fink). Using BRFs, systems can automatically learn to find easier objects first and to use those objects as context for other objects.

Conclusions (general): Few object detection researchers use context. Context, when used effectively, can improve results dramatically. A more integrated approach to the use of context and data could improve image understanding.

References
E. Adelson, “On Seeing Stuff: The Perception of Materials by Humans and Machines,” SPIE, 2001
B. Bose and E. Grimson, “Improving Object Classification in Far-Field Video,” ECCV, 2004
P. Carbonetto, N. de Freitas, and K. Barnard, “A Statistical Model for General Contextual Object Recognition,” ECCV, 2004
M. Fink and P. Perona, “Mutual Boosting for Contextual Inference,” NIPS, 2003
X. He, R. Zemel, and M. Carreira-Perpiñán, “Multiscale Conditional Random Fields for Image Labeling,” CVPR, 2004
S. Kumar and M. Hebert, “Discriminative Random Fields: A Discriminative Framework for Contextual Interaction in Classification,” ICCV, 2003
J. Lafferty, A. McCallum, and F. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” ICML, 2001
K. Murphy, A. Torralba, and W. Freeman, “Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes,” NIPS, 2003
U. Rutishauser, D. Walther, C. Koch, and P. Perona, “Is bottom-up attention useful for object recognition?,” CVPR, 2004
A. Torralba, “Contextual Priming for Object Detection,” IJCV, 2003
A. Torralba and P. Sinha, “Statistical Context Priming for Object Detection,” ICCV, 2001
A. Torralba, K. Murphy, and W. Freeman, “Contextual Models for Object Detection using Boosted Random Fields,” AI Memo, 2004
A. Torralba, K. Murphy, W. Freeman, and M. Rubin, “Context-Based Vision System for Place and Object Recognition,” ICCV, 2003