Cascaded Classification Models

Cascaded Classification Models: Combining Models for Holistic Scene Understanding
Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller
Stanford University
NIPS 2008, December 11, 2008

Outline: Understanding Scene Understanding; Related Work; CCM Framework; Results

Human View of a “Scene”
“A car passes a bus on the road, while people walk past a building.”
Labels in the image: BUILDING, PEOPLE, BUS, CAR, ROAD

Computer View of a “Scene”
Subtask outputs: BUILDING, ROAD (region labels); STREET SCENE (scene category)
Can we integrate all of these subtasks, so that the whole > the sum of its parts?

Related Work
Intrinsic Images [Barrow and Tenenbaum, 1978], [Tappen et al., 2005]
Hoiem et al., “Closing the Loop in Scene Interpretation”, 2008
We want to focus more on “semantic” classes.
We want to be flexible to using outside models.
Problems with Hoiem et al.: 1) outputs were required to be in the form of an image; 2) it used the author's own models, developed over the previous years; 3) at joint learning time, only “surfaces” and “edges/occlusions” were learned, while the other models were pre-trained.

How Should We Integrate?
Single joint model over all variables
  Pros: tighter interactions, more designer control
  Cons: need expertise in each of the subtasks
Simple, flexible combination of existing models
  Pros: state-of-the-art models, easier to extend
  Requires: limited “black-box” interface to components
  Cons: missing some of the modeling power
Component models: DETECTION (Dalal & Triggs, 2006), REGION LABELING (Gould et al., 2007), DEPTH RECONSTRUCTION (Saxena et al., 2007)

Other Opportunities for Integration
Audio signals: source separation, speaker recognition, speech recognition
Text understanding, for the sentence “Mr. Obama sent himself an important reminder.”
  Part-of-speech tagger: noun, verb, adj
  Semantic role identification: Verb: sent; Sender: Mr. Obama; Receiver: himself; Content: reminder
  Anaphora resolution

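Cascaded Classification Models
Tasks: object detection (DET), region labeling (REG), 3D reconstruction (REC)
Image features: fDET, fREG, fREC
Independent models: DET0, REG0, REC0, each using only its own image features
Context-aware models: DET1, REG1, REC1, each using its image features plus the outputs of the independent models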

Integrated Model for Scene Understanding
Subtasks: object detection, region labeling, depth reconstruction, scene categorization
I’ll show you these.

Basic Object Detection
Sliding-window detection: compute a score for each detection window W and report a detection when Score(W) > 0.5.
Classes (figure legend): car, person, motorcycle, boat, sheep, cow
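A minimal sketch of the sliding-window loop on the slide above. The scoring function `toy_score`, the window size, and the stride are hypothetical stand-ins; the talk uses an off-the-shelf detector as the scorer, and only the 0.5 threshold comes from the slide.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int) -> List[Window]:
    """Enumerate every fixed-size window that fits inside the image."""
    return [(x, y, win_w, win_h)
            for y in range(0, img_h - win_h + 1, stride)
            for x in range(0, img_w - win_w + 1, stride)]


def detect(windows: List[Window], score_fn: Callable[[Window], float],
           threshold: float = 0.5) -> List[Tuple[Window, float]]:
    """Keep the windows whose score exceeds the threshold."""
    scored = [(w, score_fn(w)) for w in windows]
    return [(w, s) for (w, s) in scored if s > threshold]


def toy_score(w: Window) -> float:
    # Hypothetical scorer: confident only for the window anchored at (8, 4).
    return 0.9 if (w[0], w[1]) == (8, 4) else 0.1


windows = sliding_windows(img_w=16, img_h=8, win_w=8, win_h=4, stride=4)
detections = detect(windows, toy_score)
```

A real detector would also scan several window scales and suppress overlapping detections; both are left out here to keep the loop visible.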

Context-Aware Object Detection
Context features for a window W:
  From scene category: MAP category, marginals (e.g., scene type: urban scene)
  From region labels: how much of each label is in a window adjacent to W (e.g., % of “building” above W)
  From depths: mean and variance of depths, estimate of “true” object size (e.g., variance of depths in W)
Final classifier: P(Y) = Logistic(Φ(W))
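The final classifier above can be sketched as a logistic regression over a feature vector Φ(W) that stacks the base detector score with context cues. The particular features chosen, the weight values, and the explicit weight vector are assumptions made for illustration; the slide only states P(Y) = Logistic(Φ(W)).

```python
import math
from typing import List


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def context_features(detector_score: float, frac_building_above: float,
                     depth_variance: float, p_urban: float) -> List[float]:
    """Phi(W): bias term, base detector score, and three context cues."""
    return [1.0, detector_score, frac_building_above, depth_variance, p_urban]


def detection_probability(phi: List[float], weights: List[float]) -> float:
    """P(Y) = Logistic(w . Phi(W)) with hypothetical learned weights w."""
    return logistic(sum(w * f for w, f in zip(weights, phi)))


# Hypothetical weights; in practice they would be fit on training windows.
weights = [-2.0, 3.0, 1.0, -0.5, 0.5]

# Same base detector score, different context.
phi_supported = context_features(0.6, 0.8, 0.1, 0.9)
phi_unsupported = context_features(0.6, 0.0, 2.0, 0.1)

p_supported = detection_probability(phi_supported, weights)
p_unsupported = detection_probability(phi_unsupported, weights)
```

With these made-up weights, two windows with identical detector scores end up with different probabilities once the surrounding labels and depths are taken into account.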

Region Labeling: CRF Model
Label each pixel as one of {‘grass’, ‘road’, ‘sky’, etc.}
Conditional Markov random field (CRF) over superpixels:
  Singleton potentials: log-linear function of boosted detector scores for each class
  Pairwise potentials: affinity of classes appearing together, conditioned on (x, y) location within the image
[Gould et al., IJCV 2007]
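To make the singleton and pairwise terms concrete, here is a toy energy computation over three superpixels in a chain, with brute-force MAP inference. All costs are invented, the pairwise term is a simple label-disagreement penalty rather than the location-conditioned affinity described above, and real models use approximate inference instead of enumeration.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

LABELS = ("sky", "grass")


def crf_energy(labels: Sequence[str],
               unary: List[Dict[str, float]],
               edges: List[Tuple[int, int]],
               pairwise: Dict[Tuple[str, str], float]) -> float:
    """Sum of singleton costs plus pairwise costs over neighboring superpixels."""
    energy = sum(unary[i][lab] for i, lab in enumerate(labels))
    energy += sum(pairwise[(labels[i], labels[j])] for i, j in edges)
    return energy


# Hypothetical singleton costs (lower is better), top to bottom of the image.
unary = [{"sky": 0.2, "grass": 1.5},
         {"sky": 0.9, "grass": 0.8},
         {"sky": 1.6, "grass": 0.1}]
edges = [(0, 1), (1, 2)]
# Hypothetical pairwise costs: neighbors with different labels pay 0.5.
pairwise = {(a, b): (0.0 if a == b else 0.5) for a in LABELS for b in LABELS}

# Brute-force MAP: feasible only because the toy problem has 2^3 labelings.
best = min(product(LABELS, repeat=len(unary)),
           key=lambda labs: crf_energy(labs, unary, edges, pairwise))
best_energy = crf_energy(best, unary, edges, pairwise)
```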

Context-Aware Region Labeling
Where is the grass?
Additional feature: relative location map

Depth Reconstruction: CRF
Label each pixel with its distance from the camera
Conditional Markov random field (CRF) over superpixels
Continuous variables
Models depth as a linear function of features with pairwise smoothness constraints
[Saxena et al., PAMI 2008]
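One simple way to write down "a linear function of features with pairwise smoothness constraints" is the quadratic objective below. This is an illustrative assumption, not the exact form used by Saxena et al.: the symbols $\theta$ (regression weights), $x_i$ (features of superpixel $i$), $\lambda$ (smoothness weight), and $\mathcal{N}$ (neighboring superpixel pairs) are introduced here only for the sketch.

```latex
E(d) = \sum_{i} \left( d_i - \theta^{\top} x_i \right)^2
     + \lambda \sum_{(i,j) \in \mathcal{N}} \left( d_i - d_j \right)^2
```

The first term pulls each depth toward its feature-based prediction; the second penalizes depth differences between neighboring superpixels.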

Depth Reconstruction with Context
Context cues from region labels: grass is horizontal; sky is far away
Treat the depth model as a black box: find d*
Reoptimize depths with the new constraints:
dCCM = argmin_d α||d - d*|| + β||d - dCONTEXT||
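If both norms in the reoptimization are taken to be squared L2 norms (an assumption; the slide does not specify the norm), the minimizer has a closed form: each depth is the weighted average (α d* + β dCONTEXT) / (α + β). The sketch below applies that formula with made-up depths and weights.

```python
from typing import List


def reoptimize_depths(d_star: List[float], d_context: List[float],
                      alpha: float, beta: float) -> List[float]:
    """Minimize alpha*||d - d_star||^2 + beta*||d - d_context||^2 elementwise.

    Setting the gradient to zero gives d = (alpha*d_star + beta*d_context) / (alpha + beta).
    """
    if alpha + beta <= 0:
        raise ValueError("alpha + beta must be positive")
    return [(alpha * a + beta * b) / (alpha + beta)
            for a, b in zip(d_star, d_context)]


# Hypothetical depths in meters for two superpixels: one grass, one sky.
d_star = [10.0, 40.0]      # output of the black-box depth model
d_context = [12.0, 80.0]   # depths suggested by the context cues
d_ccm = reoptimize_depths(d_star, d_context, alpha=3.0, beta=1.0)
```

With α three times larger than β, the result stays closer to the black-box estimate while still moving the sky superpixel farther away.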

Training
Notation: I: image; f: image features (fD, fS, fZ); Ŷ: output labels (ŶD, ŶS, ŶZ); *: groundtruth
Training regimes:
  Independent
  Ground: groundtruth input
[Diagrams: I → fD, fS, fZ → ŶD, ŶS, ŶZ for each regime]

Training
CCM training regime:
  Later models can ignore the mistakes of previous models
  Training realistically emulates the testing setup
  Allows disjoint datasets
K-CCM: a CCM with K levels of classifiers
[Diagram: I → fD, fS, fZ → ŶD, ŶS, ŶZ → next-level ŶD]
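The CCM training regime can be sketched as a loop over levels in which each model is trained on its own features plus the predicted (not groundtruth) outputs of all models from the previous level. Everything named here is hypothetical: `train_ccm`, `predict_ccm`, and the trivial `mean_trainer` stand in for real black-box learners, and the sketch assumes every task is labeled on every training image, which is a simplification of the disjoint-dataset case.

```python
from typing import Callable, Dict, List

Features = List[float]
Model = Callable[[Features], float]
Trainer = Callable[[List[Features], List[float]], Model]


def train_ccm(features: Dict[str, List[Features]],
              targets: Dict[str, List[float]],
              trainer: Trainer, num_levels: int) -> List[Dict[str, Model]]:
    """Train a K-level cascade; level k sees level k-1 outputs of all tasks."""
    tasks = sorted(features)
    n = len(features[tasks[0]])
    prev: List[List[float]] = [[] for _ in range(n)]  # no context at level 0
    levels: List[Dict[str, Model]] = []
    for _ in range(num_levels):
        inputs = {t: [features[t][i] + prev[i] for i in range(n)] for t in tasks}
        models = {t: trainer(inputs[t], targets[t]) for t in tasks}
        # Context for the next level is the predicted output, as at test time.
        prev = [[models[t](inputs[t][i]) for t in tasks] for i in range(n)]
        levels.append(models)
    return levels


def predict_ccm(levels: List[Dict[str, Model]],
                features_one: Dict[str, Features]) -> Dict[str, float]:
    """Run one example through every level of the cascade."""
    tasks = sorted(features_one)
    prev: Features = []
    outputs: Dict[str, float] = {}
    for models in levels:
        outputs = {t: models[t](features_one[t] + prev) for t in tasks}
        prev = [outputs[t] for t in tasks]
    return outputs


seen_dims: List[int] = []  # input dimensionality seen by each trained model


def mean_trainer(xs: List[Features], ys: List[float]) -> Model:
    # Toy black box: ignores its inputs and predicts the mean target.
    seen_dims.append(len(xs[0]))
    mean = sum(ys) / len(ys)
    return lambda x: mean


features = {"det": [[0.1], [0.9]], "reg": [[0.2, 0.3], [0.7, 0.8]]}
targets = {"det": [0.0, 1.0], "reg": [1.0, 1.0]}
levels = train_ccm(features, targets, mean_trainer, num_levels=2)
prediction = predict_ccm(levels, {"det": [0.5], "reg": [0.4, 0.6]})
```

The recorded input sizes show the cascade structure: the second-level models receive two extra inputs, one per task output from the first level.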

Experiments
DS1: 422 images, fully labeled; categorization, detection, multi-class segmentation; 5-fold cross-validation
DS2: 1745 images, disjoint labels; detection, multi-class segmentation, 3D reconstruction; 997 train, 748 test

CCM Results – DS1
[Figure panels: categories, pedestrian, car, region labels, motorbike, boat]

CCM Results – DS2
Detection (classes in order: Car, Person, Bike, Boat, Sheep, Cow):
  INDEP  0.357  0.267  0.410  0.096  0.319  0.395
  2-CCM  0.364  0.272  0.212  0.289  0.415  (only five of the six values survive in this transcript)
Depth: INDEP 16.7m, 2-CCM 15.4m
Regions:
         Tree   Road   Grass  Water  Sky    Building  FG
  INDEP  0.541  0.702  0.859  0.444  0.924  0.436     0.828
  2-CCM  0.581  0.692  0.860  0.565  0.930  0.489     0.819
Boats & Water: road/water confusion matrices
  INDEP        Pred. Road  Pred. Water
  True Road    4946        251
  True Water   1150        2144
  2-CCM        Pred. Road  Pred. Water
  True Road    4878        322
  True Water   820         2730
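As a worked example of reading the road/water confusion matrices, the snippet below computes the fraction of true water that each model labels as water. The counts are copied from the slide; the helper name `row_rate` is made up here.

```python
from typing import Dict, Tuple

# Rows are true classes; each tuple is (predicted road, predicted water).
indep: Dict[str, Tuple[int, int]] = {"road": (4946, 251), "water": (1150, 2144)}
ccm: Dict[str, Tuple[int, int]] = {"road": (4878, 322), "water": (820, 2730)}


def row_rate(row: Tuple[int, int], index: int) -> float:
    """Fraction of a true class (one row) assigned to the given predicted column."""
    return row[index] / sum(row)


water_indep = row_rate(indep["water"], 1)  # 2144 / 3294
water_ccm = row_rate(ccm["water"], 1)      # 2730 / 3550
road_indep = row_rate(indep["road"], 0)    # 4946 / 5197
road_ccm = row_rate(ccm["road"], 0)        # 4878 / 5200
```

Water is recovered about 65% of the time by the independent model and about 77% of the time by the 2-CCM, while the road rate drops slightly from about 95% to about 94%.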

Example Results
[Figure: INDEPENDENT vs. CCM]

Example Results
[Figure panels: independent objects, independent regions, CCM objects, CCM regions]

CCM Summary
The various subtasks of computer vision do indeed interact through context cues.
A simple framework can allow off-the-shelf, black-box methods to improve each other.
Can we train in more sophisticated ways?
  Downstream models re-train upstream ones
  Something like EM for missing labels
Other applications

Thanks!