Depth Estimation via Scene Classification Vladimir Nedović 28-05-2008 with: Arnold Smeulders & Jan-Mark Geusebroek (UvA) André.


1 Depth Estimation via Scene Classification
Vladimir Nedović, 28-05-2008, vnedovic@science.uva.nl
with: Arnold Smeulders & Jan-Mark Geusebroek (UvA), André Redert (Philips Research)

2 Order in Pollock's Chaos
- seems chaotic, but there is structure - same as in natural image statistics
  - Jackson Pollock, Blue Poles: Number 11, 1952
  - R.P. Taylor, A.P. Micolich and D. Jonas, Fractal Analysis of Pollock's Drip Paintings, Nature, vol. 399, p. 422 (1999)
- viewpoint constraints understood, influence on film art
  - Pre-perspective (Gothic art, before 1430): Simone Martini (1285-1344)
  - Post-perspective (Quattrocento, after 1430): Sandro Botticelli, Annunciation, 1489-90
- ‘modal’ scene configurations – structures orthogonal to each other
  - Know any tilted buildings?
  - W. Richards, A. Jepson and J. Feldman, Priors, Preferences and Categorical Percepts, in Perception as Bayesian Inference, pp. 80-111, 1996.

3 Outline
- Introduction
- Related work
- Our approach
- Preliminary classification
- Conclusions

4 Introduction
- The context: fully automatic 2D-to-3D conversion of video data for 3DTV
- GOAL: quickly obtain an approximate but visually pleasing 3D model from a single image
- We know about stereo, structure from motion, etc., but can we also derive depth from a single image?
  - humans can, right?
- Can we exploit some constraints?
  - is the data really chaotic?
  - what about perceptual limitations of viewers?

5 Related work
- Related work (1): Torralba & Oliva
  - showed that depth can be derived from structure, itself derived from natural image statistics (IEEE PAMI 2001)
- Related work (2): Hoiem (Carnegie Mellon Univ.)
  - obtained 3D orientation of scene surfaces using machine learning (ICCV 2005)
  - improved object detection (CVPR 2006 best paper) + accounted for occlusions to derive relative ordering of elements (ICCV 2007)
- Related work (3): Saxena (Stanford Univ.)
  - 3D mesh from ML on low-level features (no classes)
- BUT:
  - outdoor images only + assumes sky & ground are always present
  - i.e. accounts for less than half of all possibilities

6 Our approach
- Our approach: depth estimation via geometric scene classification
  - i.e. holistic, not pixel-based
- Separate a visual scene into its two constituent elements: consider objects separately from the stage on which they act
- Determine the 3D stage model first
- Stage ≈ first approximation of global depth
  - reduces subsequent (finer) depth processing tasks
  - can guide other processes, e.g. object localization & recognition
- V. Nedović et al., ICCV 2007
- [Figure labels: object, stage]

7 Our approach - stage models -
- For the stage, a rough depth model is sufficient
- Exploit geometric structure of images, which reduces the number of possible configurations
  - regularities arise from:
    - natural image statistics -> texture gradients
    - viewpoint constraints -> perspective
    - modal configurations & film rules -> orthogonality
- Only a few configurations are prominent => the first step in depth estimation can be stage classification

8 Our approach - stage hierarchy -
- Structure of the visual world leads to only 15 geometric scene types
- Influence of structure identical indoors & outdoors => such distinction unnecessary
- Three-level hierarchy
  - perform classification in steps: first determine the geometric neighbourhood, then proceed further

9 Our approach - three-level hierarchy -
- 2-3 sub-stages per stage, accounting for variability in parameters
- geometry at the bottom is so constrained that pre-defined crude depth maps are already possible
  - i.e. no parameter estimation needed!
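
The step-wise descent through a three-level hierarchy, ending in a pre-defined depth map, might be sketched as below. This is only an illustration under stated assumptions: the tree, the names `group_a`, `stage_a1`, `sub_1` and so on, the threshold rules standing in for trained classifiers, and the two depth templates are all hypothetical and do not come from the slides.

```python
def ramp_depth_map(width, height):
    """Hypothetical template: far (1.0) at the top row, near (0.0) at the bottom row."""
    return [[1.0 - y / (height - 1)] * width for y in range(height)]


def flat_depth_map(width, height):
    """Hypothetical template: a single fronto-parallel plane at mid depth."""
    return [[0.5] * width for _ in range(height)]


def leaf(template):
    """A sub-stage: holds only its pre-defined depth template."""
    return {"depth_map": template}


def node(classify, children):
    """An internal node: a decision function plus the children it chooses among."""
    return {"classify": classify, "children": children}


# Toy three-level tree (group -> stage -> sub-stage).
# All names and threshold rules are placeholders for trained classifiers.
TREE = node(
    lambda f: "group_a" if f[0] > 0.5 else "group_b",
    {
        "group_a": node(lambda f: "stage_a1", {
            "stage_a1": node(lambda f: "sub_1" if f[1] > 0.5 else "sub_2", {
                "sub_1": leaf(ramp_depth_map),
                "sub_2": leaf(flat_depth_map),
            }),
        }),
        "group_b": node(lambda f: "stage_b1", {
            "stage_b1": node(lambda f: "sub_1", {
                "sub_1": leaf(flat_depth_map),
            }),
        }),
    },
)


def classify_and_assign_depth(tree, features, width, height):
    """Descend from the root to a leaf, then instantiate that leaf's depth template."""
    path = []
    current = tree
    while "children" in current:
        choice = current["classify"](features)
        path.append(choice)
        current = current["children"][choice]
    return path, current["depth_map"](width, height)


path, depth = classify_and_assign_depth(TREE, [0.9, 0.8], width=4, height=3)
print(path)
```

Because every leaf carries a fixed template, nothing is fitted once the leaf is reached, which mirrors the "no parameter estimation" point on the slide.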

10 Preliminary classification (1)
- Proof of concept with a single feature type
  - natural image statistics-based Weibull features (i.e. texture gradients)
- TRECVID dataset of TV news used for evaluation
  - A.F. Smeaton et al., “Evaluation campaigns and TRECVid”, 8th ACM Int’l Workshop on Multimedia Info. Retrieval, 2006.
- Features extracted based on a 4x4 region grid over the image
  - two features per region => 64 features in total
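
Grid-based Weibull feature extraction of this kind could look roughly like the sketch below. It is not the authors' implementation. The assumptions are: derivatives are plain finite differences, the Weibull fit is a least-squares line on the Weibull probability plot, and a (scale, shape) pair is fitted separately to horizontal and vertical derivatives in each region, which is one way a 4x4 grid could yield the 64 values quoted on the slide. The function names are hypothetical.

```python
import math


def fit_weibull(samples):
    """Estimate (scale, shape) by least squares on the Weibull probability plot."""
    xs = sorted(s for s in samples if s > 0.0)
    n = len(xs)
    if n < 2 or xs[0] == xs[-1]:
        return (0.0, 0.0)  # degenerate region: no usable contrast
    # Median-rank plotting positions F_i = (i - 0.3) / (n + 0.4), i = 1..n,
    # linearised as ln(-ln(1 - F)) = shape * ln(x) - shape * ln(scale).
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(1.0 - (i + 0.7) / (n + 0.4))) for i in range(n)]
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    shape = s_uv / s_uu
    scale = math.exp(mean_u - mean_v / shape)
    return (scale, shape)


def grid_weibull_features(image, grid=4):
    """Fit Weibull parameters to |dx| and |dy| in each cell of a grid x grid layout.

    `image` is a list of rows of grey values. Returns grid * grid * 4 numbers.
    """
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            # Forward differences, clipped so they never read past the image edge.
            dx = [abs(image[y][x + 1] - image[y][x])
                  for y in range(y0, y1) for x in range(x0, min(x1, width - 1))]
            dy = [abs(image[y + 1][x] - image[y][x])
                  for y in range(y0, min(y1, height - 1)) for x in range(x0, x1)]
            features.extend(fit_weibull(dx))
            features.extend(fit_weibull(dy))
    return features
```

The resulting flat list can be passed directly to a classifier as one feature vector per image.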

11 Preliminary classification (2)
- Support Vector Machines (SVM) classifier based on a 1 vs. 1 multi-class approach
- [Results charts not reproduced; panels:]
  - individual stages (results of symmetrical variants combined)
  - stage groups
  - two-step classification, average within group (assuming super-stage is known)
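
The 1 vs. 1 multi-class scheme can be illustrated with the sketch below: one binary classifier per pair of classes, combined by majority vote. To stay self-contained it uses a nearest-class-mean rule as a stand-in for the binary SVM, so it shows only the voting structure, not SVM training. The class labels and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def train_pair(samples_a, samples_b):
    """Stand-in binary learner (NOT an SVM): stores the two class means."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    return mean(samples_a), mean(samples_b)


def predict_pair(model, x):
    """Return 0 if x is closer to the first class mean, otherwise 1."""
    dists = [sum((m - v) ** 2 for m, v in zip(centre, x)) for centre in model]
    return 0 if dists[0] <= dists[1] else 1


def train_one_vs_one(samples, labels):
    """Train one binary model for every unordered pair of classes."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return {
        (a, b): train_pair(by_class[a], by_class[b])
        for a, b in combinations(sorted(by_class), 2)
    }


def predict_one_vs_one(models, x):
    """Let every pairwise model vote and return the class with the most votes."""
    votes = Counter()
    for pair, model in models.items():
        votes[pair[predict_pair(model, x)]] += 1
    return votes.most_common(1)[0][0]


samples = [[0.0, 0.0], [0.2, 0.1], [5.0, 0.0], [5.1, 0.2], [0.0, 5.0], [0.1, 5.2]]
labels = ["stage_a", "stage_a", "stage_b", "stage_b", "stage_c", "stage_c"]
models = train_one_vs_one(samples, labels)
print(predict_one_vs_one(models, [4.8, 0.3]))
```

With K classes this scheme trains K(K-1)/2 pairwise models, so 15 stages would need 105 of them.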

12 Conclusions (1)
- We need a fast & approximate solution:
  - do only what is necessary, viewers may not perceive it anyway
  - generalize where possible, to reduce the problem at every step
- Separate a scene into a stage and the objects
- Determine the stage 3D model first
  - rough model is sufficient
  - plus, structure greatly reduces the number of possible configurations
  - and, stage will help us to locate and process objects

13 Conclusions (2)
- Therefore, we can use scene classification as the first step in depth estimation
- Due to structure, we can create simple models that fit TV data
  - 15 stages are sufficient
  - no need to distinguish between indoor & outdoor

14 Conclusions (3)
- Our approach: three-step classification
  - geometry at the bottom constrained enough, so we can already assign pre-defined depth maps
  - no parameter estimation necessary
- Proof of concept demonstrated with a single feature type
  - performance much better than chance
  - but enhancements needed (more features etc.)

15 Questions?

