Extracting Simple Verb Frames from Images Toward Holistic Scene Understanding Prof. Daphne Koller Research Group Stanford University Geremy Heitz DARPA.

Presentation transcript:

Extracting Simple Verb Frames from Images Toward Holistic Scene Understanding Prof. Daphne Koller Research Group Stanford University Geremy Heitz DARPA CLLR Workshop December 2, 2008

Grand Goal: Scene Understanding. Example descriptions: “man wearing a backpack, smoking a cigarette, walking a dog on a sidewalk” (labeled objects: Man, Dog, Backpack, Cigarette); “A cow walking through the grass on a pasture by the sea”.

Understanding Verb Frames. Frame: to walk. Examples: “a man is walking on a sidewalk”; “a dog is walking on a sidewalk”. Labeled objects: Man, Dog, Backpack, Cigarette, Building, Sidewalk. Primitives: Objects, Parts, Surfaces, Regions. Interactions: Context, Actions. Methods exist to extract these, but we need both to do a better job and to get them all at once. Modeling verb frames requires understanding the interactions between primitives, which fit well into the framework of graphical models.

Outline: Extracting the Primitives; Qualitative 3D Scene Layout; Modeling Relationships; Learning Frames; Refined Characterization of Objects

Computer View of a “Scene”: BUILDING, ROAD, STREET SCENE

Object Detection. Object classes: Car, Person, Motorcycle, Boat, Sheep, Cow. Detection window W: Score(W) > 0.5.
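The thresholding rule on this slide can be sketched in a few lines. This is a minimal illustration, not the detector used in the talk: the `Window` type, the `detect` function, and the example scores are all hypothetical, and only the 0.5 threshold comes from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Window:
    """A candidate detection window W with a classifier score (hypothetical type)."""
    label: str
    x: int
    y: int
    width: int
    height: int
    score: float


def detect(windows: List[Window], threshold: float = 0.5) -> List[Window]:
    """Keep windows with Score(W) > threshold, highest score first."""
    kept = [w for w in windows if w.score > threshold]
    return sorted(kept, key=lambda w: w.score, reverse=True)


# Made-up candidate windows, for illustration only.
candidates = [
    Window("car", 10, 40, 120, 60, 0.91),
    Window("person", 200, 30, 40, 110, 0.48),
    Window("cow", 90, 80, 100, 70, 0.73),
]
detections = detect(candidates)
```

A real detector would also apply non-maximum suppression to overlapping windows; that step is omitted here because the slide does not mention it.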

Finding the Primitives Jointly. Scene: SEASIDE PASTURE. Regions: GRASS, SKY. Geometry: Grass = Flat, Sky = Far, FG (foreground) = Vertical. Region statistics: 40% Grass, 30% Sky… Object counts: 1 cow, 2 boats… [Heitz et al., NIPS 2008a]

Results – TAS Model: Base Detector vs. Contextual Detector [Heitz et al., ECCV 2008]

Qualitative 3D Scene Layout. Primitives imply a certain 3D layout of the scene, although absolute depth may not be preserved. For example: sky is a far, vertical plane; water and road are horizontal planes; objects “pop up” from the image.
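One way to picture this label-to-layout idea is a simple lookup from region label to a qualitative (orientation, depth) pair. The table below encodes only the three examples named on the slide; the function name, the "unknown" depth value, and the choice to treat every unlisted label as a vertical pop-up object are assumptions made for this sketch.

```python
from typing import Dict, Tuple

# (orientation, depth) per region label; entries follow the slide's examples.
# Depth is qualitative only, so "unknown" marks labels the slide says nothing about.
LAYOUT: Dict[str, Tuple[str, str]] = {
    "sky": ("vertical", "far"),
    "water": ("horizontal", "unknown"),
    "road": ("horizontal", "unknown"),
}


def qualitative_layout(label: str) -> Tuple[str, str]:
    """Return a qualitative 3D description for a region label.

    Assumption: any label not listed above is an object that "pops up"
    from the image as a vertical surface.
    """
    return LAYOUT.get(label, ("vertical", "unknown"))
```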

Modeling Relationships: Beside, In front of, On. We have explored how to model 2D relationships, and we should be able to extend this to 3D relationships. [Gould et al., IJCV 2008] [Heitz et al., ECCV 2008]

Outline: Extracting the Primitives; Qualitative 3D Scene Layout; Modeling Relationships; Learning Frames; Refined Characterization of Objects

Learning Semantics: Verb Frames. Given primitives, rough layout, and relationships, let’s learn subjects, verbs, and objects for frames of the form: The [S] [V] the [O]. [S], [O]: CAR, ROAD, COW, GRASS, PERSON, APPLE, … [V]: WALKS ON, EATS, DRIVES ON, JUMPS OVER, THROWS, …

The CAR DRIVES ON the ROAD
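The "The [S] [V] the [O]" template can be sketched as a small data structure plus a search over candidate triples. This is only an illustration under stated assumptions: the `VerbFrame` class, the `best_frame` function, and the toy scores are hypothetical, and the talk does not say how frames are actually scored or learned. The vocabularies are the ones listed on the slide.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Tuple


@dataclass(frozen=True)
class VerbFrame:
    """One filled-in frame of the form 'The [S] [V] the [O].'"""
    subject: str
    verb: str
    obj: str

    def sentence(self) -> str:
        return f"The {self.subject} {self.verb} the {self.obj}."


# Vocabularies as listed on the slide (the slide's lists are open-ended).
NOUNS = ["CAR", "ROAD", "COW", "GRASS", "PERSON", "APPLE"]
VERBS = ["WALKS ON", "EATS", "DRIVES ON", "JUMPS OVER", "THROWS"]


def best_frame(scores: Dict[Tuple[str, str, str], float]) -> VerbFrame:
    """Return the highest-scoring (S, V, O) triple; unscored triples count as 0.0."""
    candidates = [(s, v, o) for s, v, o in product(NOUNS, VERBS, NOUNS) if s != o]
    s, v, o = max(candidates, key=lambda t: scores.get(t, 0.0))
    return VerbFrame(s, v, o)


# Made-up compatibility scores standing in for a learned model.
toy_scores = {
    ("CAR", "DRIVES ON", "ROAD"): 0.9,
    ("COW", "EATS", "GRASS"): 0.4,
}
frame = best_frame(toy_scores)
print(frame.sentence())  # The CAR DRIVES ON the ROAD.
```

Exhaustive enumeration is fine for a vocabulary this small; a larger one would need the image evidence to prune candidates first.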

Refined Characterization. We need to know that the white stick is a cigarette, and where the man’s mouth is, in order to determine that he’s smoking.

Refined Object Characterization. Set of “keypoint” landmarks. Outline shape defined by connecting contour. [Heitz et al., NIPS 2008b, IJCV in submission]

Results: Giraffe, Llama, Rhino

Mammals: Eating, Standing, Running, Standing [Heitz et al., NIPS 2008b, IJCV in submission]

Activity Recognition: Eating vs. Drinking. 1) Localize the landmarks of the cow, including the head. 2) Extract a histogram of “stuff” in a window around the head landmark (e.g., Grass, Cow). 3) Make a decision: Eating.
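Steps 2 and 3 of this pipeline can be sketched as follows, taking the head landmark from step 1 as given. Everything here is a hedged illustration: the function names, the grid-of-labels input, and the decision rule (more grass than water near the head means eating) are assumptions, and associating "water" with drinking is a guess that the slide does not state.

```python
from collections import Counter
from typing import Dict, List, Tuple


def stuff_histogram(label_map: List[List[str]],
                    center: Tuple[int, int],
                    radius: int) -> Dict[str, float]:
    """Normalized histogram of region labels in a square window around center (row, col)."""
    row, col = center
    counts: Counter = Counter()
    for r in range(max(0, row - radius), min(len(label_map), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(label_map[r]), col + radius + 1)):
            counts[label_map[r][c]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def classify_activity(hist: Dict[str, float]) -> str:
    """Toy decision rule (assumed): compare grass and water mass near the head."""
    grass, water = hist.get("grass", 0.0), hist.get("water", 0.0)
    if grass == 0.0 and water == 0.0:
        return "Unknown"
    return "Eating" if grass >= water else "Drinking"


# A tiny made-up per-pixel label map; the head landmark is assumed to come from step 1.
label_map = [
    ["sky", "sky", "sky", "sky"],
    ["cow", "cow", "grass", "grass"],
    ["cow", "cow", "grass", "grass"],
    ["grass", "grass", "grass", "grass"],
]
head = (2, 2)
hist = stuff_histogram(label_map, head, radius=1)
activity = classify_activity(hist)
```

In practice the decision in step 3 would more plausibly be a classifier trained on such histograms than a fixed comparison.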

Activity Recognition with People: Running, Walking, Standing, Hitting. The pose of the person is one of the important factors. We also need to recognize the objects the person interacts with.

How far can we take this? Front legs off ground = Jumping; Ball near hands = Throwing; Apple near mouth = Eating.
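Read literally, these three cues amount to a small rule table. The sketch below writes them as code purely for illustration: the `SceneCues` fields and `infer_action` are hypothetical names, the boolean cues are assumed to come from upstream landmark and object detectors, and the order in which rules are checked is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SceneCues:
    """Boolean cues assumed to be produced by upstream detectors (hypothetical)."""
    front_legs_off_ground: bool = False
    ball_near_hands: bool = False
    apple_near_mouth: bool = False


def infer_action(cues: SceneCues) -> Optional[str]:
    """Map the slide's three cues to actions; the check order is an assumption."""
    if cues.front_legs_off_ground:
        return "Jumping"
    if cues.ball_near_hands:
        return "Throwing"
    if cues.apple_near_mouth:
        return "Eating"
    return None  # No rule fired.
```

For example, `infer_action(SceneCues(apple_near_mouth=True))` returns `"Eating"`.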

Does phased learning help? Cartoon/Caricature: exaggerates the most salient features of the object class. Simple BG: real object with no confusing clutter. Cluttered BG: object in standard pose on natural background. Articulated: once we have built a strong appearance model, can we learn complicated articulations?

Our Related Papers
G. Elidan, B. Packer, G. Heitz, and D. Koller. Convex Point Estimation using Undirected Bayesian Transfer Hierarchies. UAI.
S. Gould, J. Rodgers, D. Cohen, G. Elidan, and D. Koller. Multi-Class Segmentation with Relative Location Prior. IJCV, 2008.
S. Gould, P. Baumstarck, M. Quigley, A. Ng, and D. Koller. Integrating Visual and Range Data for Robotic Object Detection. ECCV Workshop M2SFA2.
G. Heitz and D. Koller. Learning Spatial Context: Using Stuff to Find Things. ECCV, 2008.
G. Heitz, S. Gould, A. Saxena, and D. Koller. Cascaded Classification Models: Combining Models for Holistic Scene Understanding. NIPS, 2008.
G. Heitz, G. Elidan, B. Packer, and D. Koller. Shape-based Object Localization for Descriptive Classification. NIPS, 2008.