Coherent Scene Understanding with 3D Geometric Reasoning
Jiyan Pan
12/3/2012


Task
– Detect objects
– Identify surface regions
– Estimate ground plane
– Infer gravity direction
Geometrically coherent in the 3D world
3D geometric context

Coordinate system
[Figure: camera center, image plane, and ground plane, annotated with the symbols x_b, d_b, d_t, γ, n_v, θ, x_t, n_p, h_p, n_g, α, H, f]
Quantities shown: (inverse) gravity, ground plane orientation, ground plane height, object vertical orientation, real world height, object depth, focal length, object pitch and roll angles, object landmarks
Deterministic relationships
Variables of global 3D geometries: n_g, n_p, h_p
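The deterministic relationships on this slide were shown as a figure and did not survive extraction. As a hedged illustration of the kind of relationship involved, the sketch below uses the textbook pinhole similar-triangles relation between real world height H, focal length f, and object depth. It assumes an upright object viewed with no camera pitch or roll; the function name and parameters are hypothetical and are not taken from the slides.

```python
def depth_from_height(focal_length_px, real_height_m, image_height_px):
    """Depth of an upright object under a pinhole camera.

    Similar triangles give image_height / focal_length = real_height / depth.
    Assumption (not from the slides): zero camera pitch and roll.
    """
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    return focal_length_px * real_height_m / image_height_px


if __name__ == "__main__":
    # A 1.7 m tall object that spans 170 px with an 800 px focal length.
    print(depth_from_height(800.0, 1.7, 170.0))
```

The relation is non-linear in the unknowns once H and the depth are both uncertain, which is one reason the later slides avoid solving all equations jointly.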

Coordinate system (same figure as the previous slide)
Probabilistic relationships
– Derived from appearance
– Prior knowledge

Can we solve them all for a coherent solution?
– Non-linear
– Non-deterministic
– Even invalid equations from false detections

[Diagram: global 3D context and local 3D context; local entities marked √ or X]

“Chicken and egg” problem:
– Local entities could be validated by global 3D context
– Global 3D context is induced from local entities
[Diagram: global 3D context and local 3D context, with local entities marked √ or X]

Possible solution (All in PGM)
Put both global 3D geometries and local entities in a probabilistic graphical model (PGM) [1]
– Precision issue: Have to quantize continuous variables
– Complexity issue: Pairwise potential would contain up to ~1e6 entries: 100 (pitch) × 100 (roll) × 100 (height)
[Diagram: nodes Ground, Gravity, o_1, o_2, ..., o_k]
[1] D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. IJCV, 2008
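The ~1e6 figure is simply the product of the bin counts on the slide. A minimal sketch of that arithmetic follows; the dictionary keys and the function name are illustrative, not part of the original deck.

```python
from math import prod


def joint_state_count(bins_per_variable):
    """Number of joint states when each continuous variable is quantized."""
    return prod(bins_per_variable.values())


if __name__ == "__main__":
    bins = {"pitch": 100, "roll": 100, "height": 100}
    print(joint_state_count(bins))
```

Halving the bin width of every variable multiplies the count by 8, which shows how precision and complexity pull against each other in this formulation.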

Possible solution (Fixed global geometries as hypotheses)
Task much easier under a fixed hypothesis of global 3D geometries
[Diagram: nodes Ground, Gravity, o_1, o_2, ..., o_k, with links marked ×]

Possible solution (Fixed global geometries as hypotheses)
Task much easier under a fixed hypothesis of global 3D geometries
[Diagram: object nodes o_1, o_2, ..., o_k under hypotheses ω_1, ω_2, ω_3]
How to generate global 3D geometry hypotheses?

Possible solution (Hypotheses by exhaustive search)
Exhaustive search over the quantized space of global 3D geometries [2]
– Computational complexity tends to limit search space
[2] S. Bao et al. Toward coherent object detection and scene layout understanding. IVC, 2011

Possible solution (Hypotheses by Hough voting)
Each local entity casts a vote in the Hough voting space of the global 3D geometries, and peaks are selected [3]
– False detections could corrupt the votes
– Would applying EM help? Not likely, if false detections overwhelm
[Diagram: local entities L_1 to L_7]
[3] M. Sun et al. Object detection with geometrical context feedback loop. BMVC, 2010
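To make the failure mode concrete, here is a minimal one-dimensional Hough voting sketch. It is a toy under stated assumptions, not the method of [3]: each local entity votes for a single scalar (say, a pitch angle in degrees), the bin width is arbitrary, and the vote values are made up for illustration.

```python
from collections import Counter


def hough_peak(votes, bin_width):
    """Return the center of the most-voted bin (ties go to the lower bin)."""
    counts = Counter(int(v // bin_width) for v in votes)
    best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return (best_bin + 0.5) * bin_width


if __name__ == "__main__":
    true_votes = [11.0, 12.5, 13.0, 12.0]            # entities that agree
    few_false = [41.0, 42.0, 43.0]                   # a few false detections
    many_false = [41.0, 42.0, 43.0, 44.0, 41.5]      # false detections dominate

    print(hough_peak(true_votes + few_false, 5.0))   # peak stays near the truth
    print(hough_peak(true_votes + many_false, 5.0))  # peak moves to the wrong bin
```

Once the false detections outnumber the consistent entities in a bin, the selected peak follows them, which is the corruption the slide warns about.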


Our solution
We take a RANSAC-like approach: Randomly mix the contributions of local entities
– Compared to averaging over all local entities: More robust against outliers
– Compared to directly using estimates from each single local entity: More robust against noise
[Diagram: local entities L_1 to L_7]
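The slide does not spell out how the mixing is done, so the following is only a sketch under assumptions: each local entity is taken to supply a 3D direction estimate (for example of gravity), and a hypothesis is the normalized, randomly weighted sum of a random subset of those estimates. The subset size, the uniform weights, and all numeric values are hypothetical choices made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def angular_error_deg(a, b):
    """Angle in degrees between two 3-vectors."""
    a, b = normalize(a), normalize(b)
    cos = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.degrees(math.acos(cos))


def random_mixture_hypotheses(estimates, num_hypotheses, subset_size, rng):
    """Each hypothesis mixes a random subset with random positive weights."""
    hypotheses = []
    for _ in range(num_hypotheses):
        subset = rng.sample(estimates, subset_size)
        weights = [rng.random() + 1e-9 for _ in subset]
        mixed = [sum(w * e[i] for w, e in zip(weights, subset)) for i in range(3)]
        hypotheses.append(normalize(mixed))
    return hypotheses


if __name__ == "__main__":
    truth = (0.0, 1.0, 0.0)
    # Five noisy but valid estimates and two outliers from false detections.
    estimates = [(0.03, 1.0, -0.02), (-0.04, 1.0, 0.01), (0.02, 1.0, 0.05),
                 (-0.01, 1.0, -0.03), (0.05, 1.0, 0.02),
                 (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    hyps = random_mixture_hypotheses(estimates, 200, 3, random.Random(0))
    average = normalize([sum(e[i] for e in estimates) for i in range(3)])
    print("best mixture error:", min(angular_error_deg(h, truth) for h in hyps))
    print("average-of-all error:", angular_error_deg(average, truth))
    print("best individual error:", min(angular_error_deg(e, truth) for e in estimates))
```

Mixtures that happen to draw only valid entities average out their noise without being pulled by the outliers, which is the intuition behind both bullet points above.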

[Plot: Gravity direction. x-axis: number of random mixtures; y-axis: minimum hypothesis error; curves: Individual, Mixture, Average]

[Plot: Ground plane orientation. x-axis: number of random mixtures; y-axis: minimum hypothesis error; curves: Individual, Mixture, Average]


3D geometric context #1: Common ground (global)
[Figure: ground plane and ground plane orientation; objects marked valid or invalid (#1)]

3D geometric context #2: Gravity direction (global)
[Figure: (inverse) gravity, ground plane orientation, ground plane; object marked invalid (#2)]

3D geometric context #3: Depth ordering (local)
[Figure: (inverse) gravity, ground plane orientation, ground plane; objects marked incompatible (#3)]

3D geometric context #4: Space occupancy (local)
[Figure: (inverse) gravity, ground plane orientation, ground plane; objects marked incompatible (#4)]

Global geometric compatibility for an object (given a global 3D geometry hypothesis):
– Orientation
– Height
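The compatibility formulas on this slide were images and are not recoverable from the transcript. Purely as an illustration of what an orientation term could look like, the sketch below scores the angle between an object's vertical orientation n_v and the hypothesized (inverse) gravity n_g with a Gaussian falloff. The Gaussian form, the 10 degree default width, and the function name are all assumptions, not the author's definition.

```python
import math


def orientation_compatibility(n_v, n_g, sigma_deg=10.0):
    """Score in (0, 1]: 1 when n_v is parallel to n_g, decaying with the angle.

    Assumed form (not from the slides): exp(-angle^2 / (2 * sigma^2)).
    """
    norm_v = math.sqrt(sum(c * c for c in n_v))
    norm_g = math.sqrt(sum(c * c for c in n_g))
    cos = sum(a * b for a, b in zip(n_v, n_g)) / (norm_v * norm_g)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    return math.exp(-(angle ** 2) / (2.0 * sigma_deg ** 2))


if __name__ == "__main__":
    upright = (0.0, 1.0, 0.0)
    tilted = (0.2, 1.0, 0.0)
    print(orientation_compatibility(upright, upright))
    print(orientation_compatibility(tilted, upright))
```

A height term could follow the same pattern, comparing the real world height implied by the hypothesis against prior knowledge of the object class.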

Global geometric compatibility for a surface (given a global 3D geometry hypothesis):
– Orientation: local estimates vs. the hypothesized global orientations
– Location: horizontal surface region vs. ground horizon

Local geometric compatibility for two objects (given a global 3D geometry hypothesis):
– Depth ordering
– Space occupancy

Objective function of the CRF (given a global 3D geometry hypothesis)
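The objective function itself was an image and is missing from the transcript. The sketch below shows only the generic shape such an objective usually takes: a sum of per-entity (unary) costs and pairwise costs over binary keep/reject labels, minimized here by brute force. All cost values, names, and the brute-force solver are assumptions for illustration.

```python
from itertools import product


def crf_energy(labels, unary, pairwise):
    """Sum of unary costs plus pairwise costs for one labeling (0=reject, 1=keep)."""
    energy = sum(unary[i][label] for i, label in enumerate(labels))
    energy += sum(cost[labels[i]][labels[j]] for (i, j), cost in pairwise.items())
    return energy


def best_labeling(unary, pairwise):
    """Exhaustively search all binary labelings; fine only for a handful of entities."""
    candidates = product((0, 1), repeat=len(unary))
    return min(candidates, key=lambda labels: crf_energy(labels, unary, pairwise))


if __name__ == "__main__":
    # unary[i] = (cost of rejecting entity i, cost of keeping it) under one hypothesis
    unary = [(1.0, 0.2), (1.0, 0.3), (0.4, 0.6)]
    # Entities 1 and 2 are locally incompatible: keeping both is heavily penalized.
    pairwise = {(1, 2): [[0.0, 0.0], [0.0, 5.0]]}
    print(best_labeling(unary, pairwise))
```

Because the global geometry is fixed by the hypothesis, the unary and pairwise terms become plain numbers, which is what makes the per-hypothesis inference tractable.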

[Diagram: local 3D context and global 3D context, with local entities marked √ or X; the best hypothesis is selected]

[Result images; legend:]
– 3D reasoning agrees with raw detector
– 3D reasoning recovers detection rejected by raw detector
– 3D reasoning rejects detection accepted by raw detector


3D geometric reasoning improves object detection performance
[Plot: ROC curves for the Deformable Part Model detector. x-axis: false positives per image; y-axis: true positive rate; curves: Baseline, Hoiem, Ours]
D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. IJCV, 2008

3D geometric reasoning improves object detection performance
[Plot: ROC curves for the Dalal-Triggs detector. x-axis: false positives per image; y-axis: true positive rate; curves: Baseline, Hoiem, Ours]
D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. IJCV, 2008

3D geometric reasoning improves object detection performance
Improvement in AP over baseline detector:
Ours    10.4%
Hoiem   4.8%
Sun     5.1%
M. Sun et al. Object detection with geometrical context feedback loop. BMVC, 2010
D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. IJCV, 2008

Horizon estimation median error:
Ours    2.05°
Hoiem   3.15°
Sun     2.41°
M. Sun et al. Object detection with geometrical context feedback loop. BMVC, 2010
D. Hoiem, A. A. Efros, and M. Hebert. Putting objects in perspective. IJCV, 2008


Contributions of different geometric context
[Plot: detection ROC curve. x-axis: false positives per image; y-axis: true positive rate; curves: Det, Det+IdvlGeo, Det+PairGeo, Det+FullGeo]

Benefit is mutual
                          Error in gravity direction   Error in ground orientation
Vanishing points alone    2.62°                        4.85°
Whole system              2.05°                        2.21°

Extensions
– Improved depth ordering constraint
– Local geometric constraints involving vertical surfaces
– Multiple supporting planes
– Using more prior knowledge of objects
– Utilizing semantic categories of surface regions

[Figure: two examples, each with a closer object and a farther object, showing the occlusion mask of the farther object and the intersection region of the two object masks. Question posed: fully cover? One example is marked √, the other X]
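One possible reading of the figure labels is that a depth ordering is accepted when the occlusion mask of the farther object fully covers the region where the two object masks intersect. The sketch below encodes that reading with masks as sets of pixel coordinates; the interpretation, the set representation, and the function name are assumptions, since the slide gives no formula.

```python
def depth_order_consistent(closer_mask, farther_mask, farther_occlusion_mask):
    """True if every pixel shared by both objects is marked occluded on the farther one."""
    intersection = closer_mask & farther_mask
    return intersection <= farther_occlusion_mask


if __name__ == "__main__":
    closer = {(0, 0), (0, 1)}
    farther = {(0, 1), (0, 2)}
    print(depth_order_consistent(closer, farther, {(0, 1)}))  # overlap is occluded
    print(depth_order_consistent(closer, farther, set()))     # overlap is not occluded
```

With no overlap between the two masks the check passes trivially, so it only constrains object pairs that actually overlap in the image.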

Occlusion: bottleneck in our system
– Missed detection
– Erroneous estimation of local properties
– Less effective depth ordering constraint

Generalized Hough voting: better at handling occlusions
K. Rematas et al. CORP 2011
B. Leibe et al. IJCV 2008

Occlusion-and-geometry-aware Hough voting


So far we have treated the entire region labeled as "vertical" as a whole

Decompose vertical region into surface segments
– Occlusion boundary recovery (Hoiem et al. IJCV’11)
– Vanishing line sweeping (Lee et al. CVPR’09)

[Figure: ground plane, inverse gravity, vertical surface candidate 1, vertical surface candidate 2; marked √]

[Figure: ground plane, inverse gravity, vertical surface candidate 1, vertical surface candidate 2; marked X]

[Figure: ground plane, inverse gravity, vertical surface candidate, object candidate; marked √]

[Figure: ground plane, inverse gravity, vertical surface candidate; marked X]

Given object layout, erect surfaces one by one “Interpretation by synthesis” (Gupta et al. ECCV’10)

[Figure: multiple supporting planes: supporting plane 1, supporting plane 2, ground plane]

[Figure labels: w, l, β]

Spring 2013 (ICCV’13 submission)
– Improved depth ordering constraint
– Using more prior knowledge of objects
– Multiple supporting planes
Fall 2013 (CVPR’14 submission)
– Local geometric constraints involving vertical surfaces
– Utilizing semantic categories of surface regions
Spring 2014
– Thesis writing

Expected Contributions
– Systematically model the relationships among global and local geometric variables
– Develop a RANSAC-CRF scheme to handle non-linear, non-deterministic, and possibly invalid relationships
– Occlusion-and-geometry-aware object detection for finer depth order reasoning
– Joint reasoning among global geometries, surface segments, and objects

Thank you!