Visual Perception of 3D Shape. Roland W. Fleming (Max Planck Institute for Biological Cybernetics) and Manish Singh (Rutgers University – New Brunswick)
The problem of 3D perception: Bishop Berkeley ( ): "It is I think agreed by all that distance of itself, and immediately, cannot be seen. For distance being a line directed end-wise to the eye, it projects only one point in the fund of the eye, which point remains invariably the same whether the distance be longer or shorter."
The problem of 3D perception: The optics of the eye project the 3D world onto a 2D image plane on the retina. What we as behaving organisms care about is the 3D structure of the world. Unfortunately, the projection from 3D to 2D is not invertible. [Figure labels: Image (2D), World (3D)]
The problem of 3D perception: Multiple surfaces are consistent with any given image, so 3D shape perception is fundamentally ambiguous. It is an inference from incomplete information.
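The non-invertibility can be illustrated numerically. The following is a minimal sketch, assuming an idealized pinhole camera; the function name `project` and the example coordinates are hypothetical and not taken from the slides. It shows two different 3D points on the same line of sight landing on the same image point.

```python
import numpy as np

def project(points, focal_length=1.0):
    """Pinhole projection of Nx3 points (X, Y, Z) onto the plane Z = focal_length."""
    points = np.asarray(points, dtype=float)
    return focal_length * points[:, :2] / points[:, 2:3]

# Two different 3D points on the same line of sight through the pinhole.
near = np.array([0.5, -0.25, 2.0])
far = 3.0 * near  # same direction from the eye, three times as distant

image_points = project(np.stack([near, far]))
print(image_points)  # both rows are the same 2D location
```

Because every point along the ray maps to one image location, the depth of the point cannot be read off the image alone.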
Ambiguities in 3D Perception: Necker Cube. 2 dominant interpretations.
Ambiguities in 3D Perception: 2 dominant interpretations. Only a handful of legal interpretations are generally experienced. Why? Note that neither of these two interpretations is a correct perspective projection!
Philosophical Schools Constructivism (e.g. Helmholtz, Gregory, Rock) – vision is ill-posed: sensory data are impoverished – the world we see is a construction – perception is a process of inductive inference – Extra-retinal information and assumptions about the world play a central role Direct Perception (e.g. Gibson) – “ambient optic array” contains sufficient information to support action – we perceive the world directly, through active interaction – the relevant information is global and comparative
Philosophical Schools Gestalt Perception (e.g. Koffka, Metzger, Köhler) – vision is all about structure – the interpretation that we experience is determined by the interaction of simple rules describing the organization of the interpretation – the simplest interpretation is favoured: Prägnanz
Explaining the Necker Cube: 2 dominant interpretations. Constructivism: the percepts are the most probable interpretations. Direct Perception: the relevant image information specifies these interpretations, but such ambiguous images are rarely encountered in the real world, and we normally resolve the ambiguity through interaction. Gestalt: the percepts are the simplest, ‘most orderly’ interpretations.
Perception Pipeline: image; cues (shading, texture), each with its own shape estimate; priors (“Surfaces are generally smooth”, “Texture tends to be isotropic”, “Light usually comes from above”)
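One common way to formalize a pipeline of this kind is reliability-weighted averaging of Gaussian estimates. This is an assumption brought in for illustration, not a model stated in the slides. In the sketch below, the function name, the slant values, and the prior centred on zero slant are all hypothetical stand-ins for the cue estimates and priors listed above.

```python
import numpy as np

def combine_gaussian_estimates(means, variances):
    """Inverse-variance (precision) weighted combination of independent Gaussian estimates."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    combined_variance = 1.0 / precisions.sum()
    combined_mean = combined_variance * (precisions * means).sum()
    return combined_mean, combined_variance

# Hypothetical slant estimates in degrees: shading cue, texture cue,
# and a weak prior centred on zero slant (illustrative numbers only).
slant, variance = combine_gaussian_estimates(
    means=[40.0, 30.0, 0.0],
    variances=[100.0, 25.0, 400.0],
)
print(slant, variance)
```

Under these assumptions the combined estimate is pulled towards the more reliable cue, and its variance is lower than that of any single input.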
Generic Viewpoint Assumption Koenderink & van Doorn (1979). Binford (1981). Freeman (1994).
Image-based material editing: Khan, Reinhard, Fleming & Bülthoff (2006). Transactions on Graphics: Proceedings of SIGGRAPH 06. © ACM SIGGRAPH. Given a single photograph as input, modify the material appearance of an object. A physically correct solution is not possible, so aim for a ‘perceptually correct’ solution. Exploit assumptions of human vision to develop heuristics. [Panels: transparency, re-textured]
Crude Shape Reconstruction: With light from the side, shadows and the intensity gradient lead to substantial distortions of the face. [Panels: original, reconstructed depths]
Importance of viewpoint: Substantial errors in depth reconstruction are not visible in the transformed image. [Panels: transformed image, correct viewpoint]
Seen from Above
Hollow Mask Illusion: Convexity and familiarity combine to yield a strong sense that the mask is convex, even when it is concave. But note that the apparent lighting and shape are different. [Panels: convex, concave, transition]
Bas-Relief Ambiguity: Scenes related to one another by an affine transformation are indistinguishable from one another. Belhumeur, Kriegman & Yuille (1997)
Bas-Relief Ambiguity: Belhumeur, Kriegman & Yuille (1997) showed that shape-from-shading information is fundamentally ambiguous. For direct illumination, scenes that are related to one another by an affine transformation (scaling + shearing of depth, with the light source and surface albedo transformed accordingly) yield pixel-for-pixel identical images. Despite this, we rarely experience any ambiguity in the perception of shaded objects. Everyday perception gives us the impression that we see objects in a correct and stable way. But do we? Koenderink and colleagues have shown that perceived shape varies considerably from day to day, with the percepts typically related to one another by an affine transformation.
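The ambiguity can be checked numerically. The sketch below assumes a Lambertian surface, a single distant light, and an orthographic view; the function names, the Gaussian bump, and the parameter values (`lam`, `mu`, `nu`) are hypothetical choices for illustration. Depth is transformed as z' = lam·z + mu·x + nu·y, and the light and albedo are transformed to compensate.

```python
import numpy as np

def scaled_normals(height, albedo):
    """Albedo-scaled unit surface normals of a height map z(x, y)."""
    zy, zx = np.gradient(height)
    normals = np.stack([-zx, -zy, np.ones_like(height)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    return albedo[..., None] * normals

def shade(b, light):
    """Lambertian shading with attached shadows clamped to zero."""
    return np.maximum(b @ light, 0.0)

def bas_relief_demo(lam=0.5, mu=0.3, nu=-0.2, size=32):
    y, x = np.mgrid[0:size, 0:size].astype(float)
    z = 8.0 * np.exp(-((x - size / 2) ** 2 + (y - size / 2) ** 2) / 60.0)
    albedo = np.ones_like(z)
    light = np.array([0.3, 0.2, 1.0])
    G = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [mu, nu, lam]])

    b = scaled_normals(z, albedo)
    original_image = shade(b, light)

    # Transformed scene: flattened and sheared depth, new albedo, new light.
    z_new = lam * z + mu * x + nu * y
    albedo_new = np.linalg.norm(b @ np.linalg.inv(G), axis=-1)
    light_new = G @ light
    transformed_image = shade(scaled_normals(z_new, albedo_new), light_new)
    return original_image, transformed_image, z, z_new

original_image, transformed_image, z, z_new = bas_relief_demo()
print(np.abs(original_image - transformed_image).max())  # numerically negligible
```

The two height maps differ substantially, yet the rendered images agree to floating-point precision under these assumptions.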
Light from Above: In the absence of other information to indicate shape or lighting direction, the brain assumes light comes from above. [Panels: “light” from below, “light” from above]
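Why a lighting assumption is needed at all can be seen in a small numerical sketch. It assumes Lambertian shading, an orthographic view, and image rows that increase downward (so "above" means a negative y component in the light vector); all names and values are hypothetical. A bump lit from above and a dent lit from below produce the same image.

```python
import numpy as np

def lambertian_image(height, light):
    """Lambertian shading of a height map under a distant light."""
    zy, zx = np.gradient(height)
    normals = np.stack([-zx, -zy, np.ones_like(height)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    light = np.asarray(light, dtype=float)
    light = light / np.linalg.norm(light)
    return np.clip(normals @ light, 0.0, None)

y, x = np.mgrid[-1:1:64j, -1:1:64j]
bump = 0.5 * np.exp(-(x ** 2 + y ** 2) / 0.1)  # convex: bulges towards the viewer
dent = -bump                                   # concave: same profile, inverted

light_from_above = [0.0, -1.0, 1.0]  # assumption: row 0 is the top of the image
light_from_below = [0.0, 1.0, 1.0]

bump_lit_from_above = lambertian_image(bump, light_from_above)
dent_lit_from_below = lambertian_image(dent, light_from_below)
bump_lit_from_below = lambertian_image(bump, light_from_below)
print(np.abs(bump_lit_from_above - dent_lit_from_below).max())
```

Since the image alone cannot separate these two scenes, fixing the assumed light direction is what selects the convex or the concave reading.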
Linear Perspective
Bounding Contours © Dejan Todorović, Adapted and used with permission
Structure from Motion: Individual frames carry a relatively weak sense of 3D shape. It is only through optic flow (motion) that the shape is revealed.
Shape from Texture: The pattern of compressions and rarefactions across the image indicates something about the 3D shape.
Shape from Texture: Isotropic compression of textures due to distance
Shape from Texture: Anisotropic compression of textures due to slant
Anisotropic compression specifies surface orientation up to a 180° ambiguity in the surface tilt. This means we can experience perceptual flips (bistability) when there are no other cues to specify convexity vs. concavity. Under orthographic projection, there is no isotropic compression and no convergence, so we can see the red line as lying either on a ridge or in a valley.
Under perspective projection, isotropic compression (scale gradient) and convergence cues resolve the ambiguity. We experience the red line as lying on a ridge, and not in a valley.
Assumptions in Shape from Texture: Homogeneous: the statistics of the texture are uniform from location to location. This is necessary to ensure that changes in the statistics of the texture observed in the image are due solely to the process of projection into the image plane and are not intrinsic to the texture itself. Isotropic: the texture does not have a dominant local orientation. This is necessary to ensure that anisotropic compressions are aligned with the depth gradient of the surface.
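The isotropy assumption and the 180° tilt ambiguity can be made concrete with a single circular texture element. The sketch below assumes orthographic projection and a perfectly circular texel. The function names and the slant and tilt values are hypothetical, and the moment-based estimator is only one simple way to read off the foreshortening.

```python
import numpy as np

def project_texel(slant_deg, tilt_deg, samples=360):
    """Orthographic image of a unit circle lying on a plane with the given slant and tilt."""
    slant, tilt = np.radians([slant_deg, tilt_deg])
    t = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    circle = np.stack([np.cos(t), np.sin(t)])        # 2 x samples
    foreshorten = np.diag([np.cos(slant), 1.0])      # compression along the tilt direction
    rotate = np.array([[np.cos(tilt), -np.sin(tilt)],
                       [np.sin(tilt), np.cos(tilt)]])
    return rotate @ foreshorten @ circle

def slant_tilt_from_texel(points):
    """Recover slant and tilt (mod 180 degrees) from the second moments of the projected texel."""
    moments = points @ points.T / points.shape[1]
    eigenvalues, eigenvectors = np.linalg.eigh(moments)  # ascending order
    slant = np.degrees(np.arccos(np.sqrt(eigenvalues[0] / eigenvalues[1])))
    minor_axis = eigenvectors[:, 0]
    tilt = np.degrees(np.arctan2(minor_axis[1], minor_axis[0])) % 180.0
    return slant, tilt

slant_a, tilt_a = slant_tilt_from_texel(project_texel(60.0, 30.0))
slant_b, tilt_b = slant_tilt_from_texel(project_texel(60.0, 210.0))
print(slant_a, tilt_a, slant_b, tilt_b)
```

Both surfaces, with tilts 180° apart, yield the same ellipse, so the estimator returns the same slant and the same tilt modulo 180°. If the texel were not isotropic to begin with, the recovered slant would be biased.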
Illusory distortions of shape Inspired by Todd & Thaler VSS 05
Interaction of light with surface
Matte Glossy Mirrored
Confounding Effects of Illumination: Identical materials can lead to very different images. Different materials can lead to very similar images. Images © Ron O. Dror. All rights reserved.
Ambiguity between illumination and Shape
Classical Shape from Shading: The visual system estimates surface orientation from image intensity. [Figure labels: image, reflectance map]
Classical Shape from Shading. Problem: image intensity is ambiguous. Image intensity is a scalar but surface orientation is a vector. Recovering orientation from intensity is under-constrained. A large amount of computer vision research proposes ways to reduce this ambiguity.
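The under-constraint is easy to demonstrate for a Lambertian reflectance map, which is an assumption made here for simplicity. In the sketch below (hypothetical function names and values), every normal on a cone around the light direction yields the same intensity, so a single intensity value cannot identify the orientation.

```python
import numpy as np

def lambertian_intensity(normal, light, albedo=1.0):
    """Image intensity of a Lambertian patch: albedo times the cosine of the incidence angle."""
    normal = normal / np.linalg.norm(normal)
    light = light / np.linalg.norm(light)
    return albedo * max(float(normal @ light), 0.0)

def normals_on_cone(light, angle_deg, count=8):
    """Unit normals that all make the same angle with the light direction."""
    light = light / np.linalg.norm(light)
    helper = np.array([1.0, 0.0, 0.0]) if abs(light[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(light, helper)
    u /= np.linalg.norm(u)
    v = np.cross(light, u)  # u, v span the plane perpendicular to the light
    angle = np.radians(angle_deg)
    azimuths = np.linspace(0.0, 2.0 * np.pi, count, endpoint=False)
    return [np.cos(angle) * light + np.sin(angle) * (np.cos(a) * u + np.sin(a) * v)
            for a in azimuths]

light = np.array([0.2, 0.3, 1.0])
candidates = normals_on_cone(light, angle_deg=35.0)
intensities = [lambertian_intensity(n, light) for n in candidates]
print(intensities)  # eight different orientations, one intensity value
```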
Classical Shape from Shading. Problem: the reflectance map is unknown. Circular logic: estimating the reflectance map requires knowing the geometry. Under typical viewing conditions, it is unclear how well subjects can estimate the reflectance map.
Classical Shape from Shading. Problem: predicting human perception. There is no principled way of predicting when human shape perception should succeed or fail. Successes are attributed to correct estimation of the reflectance map, errors to incorrect estimates of the reflectance map. But why and when should this occur?
Alternative approach: Use image measurements other than intensity. Use the kinds of image measurements the visual system employs at the front end. [Figure labels: image, reflectance map]
Mirrors: No stereopsis. No diffuse shading. No texture. Nothing but a distorted reflection of the world surrounding the object! Yet we perceive the 3D shape. How? Fleming, Torralba & Adelson (2004). Journal of Vision.
Curvatures determine distortions: highly curved
Curvatures determine distortions: slightly curved. Anisotropies in surface curvature lead to powerful distortions of the reflected world.
Eigenvectors of Hessian matrix Intrinsic principal curvatures
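The Hessian side of this comparison can be sketched with finite differences on a depth map. The names and the test surface below are hypothetical. Note that the Hessian of depth is a view-dependent measure: it matches the intrinsic principal curvatures only where the surface is close to fronto-parallel.

```python
import numpy as np

def hessian_eigen(depth):
    """Per-pixel eigenvalues and eigenvectors of the 2x2 Hessian of a depth map.

    Eigenvector components are ordered (x, y); eigenvalues are in ascending order.
    """
    dy, dx = np.gradient(depth)
    dyy, dyx = np.gradient(dy)
    dxy, dxx = np.gradient(dx)
    mixed = 0.5 * (dxy + dyx)  # symmetrize the mixed derivative
    hessian = np.stack([np.stack([dxx, mixed], axis=-1),
                        np.stack([mixed, dyy], axis=-1)], axis=-2)
    return np.linalg.eigh(hessian)

# Hypothetical test surface: curved along x, flat along y (a cylinder-like ridge).
size, k = 21, 0.1
y, x = np.mgrid[0:size, 0:size].astype(float)
depth = -0.5 * k * (x - size // 2) ** 2

eigenvalues, eigenvectors = hessian_eigen(depth)
centre = size // 2
print(eigenvalues[centre, centre])         # second derivatives: -k along x, 0 along y
print(eigenvectors[centre, centre, :, 0])  # direction of strongest curvature (x axis)
```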
image depths
Population codes
Orientation fields Ground truth
3D shape appears to be conveyed by the continuously varying patterns of orientation across the image of a surface
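An orientation field of this kind can be approximated in a few lines. The slides refer to population codes; the sketch below swaps in a structure tensor as a simpler stand-in, so it should be read as an illustration rather than the method behind the figures. Function names, the box filter, and the window radius are hypothetical choices.

```python
import numpy as np

def box_blur(values, radius):
    """Separable box filter with edge padding (keeps the input shape)."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(values, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def orientation_field(image, radius=3):
    """Local image orientation (radians in [0, pi)) and anisotropy (0 to 1) per pixel."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    jxx = box_blur(gx * gx, radius)
    jyy = box_blur(gy * gy, radius)
    jxy = box_blur(gx * gy, radius)
    gradient_angle = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    orientation = (gradient_angle + np.pi / 2.0) % np.pi  # structure runs across the gradient
    anisotropy = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / (jxx + jyy + 1e-12)
    return orientation, anisotropy

# Hypothetical test image: vertical stripes, so the local orientation is 90 degrees.
y, x = np.mgrid[0:32, 0:32].astype(float)
stripes = np.sin(0.5 * x)
orientation, anisotropy = orientation_field(stripes)
print(np.degrees(orientation[16, 16]), anisotropy[16, 16])
```

The anisotropy value plays the role of a confidence measure: it is near 1 where one orientation dominates and near 0 where the local structure is isotropic.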
Beyond specularity Specular reflection Diffuse reflection
Orientations in shading
Orientation fields in shading
Reflectance as Illumination Mirrors in an increasingly blurry world
highly curved
slightly curved Anisotropies in surface curvature lead to anisotropies in the image.
Light Warps Vergne, Pacanowski, Barla, Granier & Schlick (2009). Light Warping for enhanced Surface Depiction in SIGGRAPH ’09: ACM SIGGRAPH 2009 Papers. © ACM SIGGRAPH 2009, All rights reserved.
Apparent Ridges Judd, Durand & Adelson (2007). Apparent Ridges for Line Drawing. ACM Transactions on Graphics: Proceedings of SIGGRAPH © ACM SIGGRAPH 2007, All rights reserved.
Texture vs. Reflectance
“Shape from Smear”
Higher level shape properties: Neither object is physically unstable (falling over). But one “affords being toppled” more than the other.
Perceived Shape is Multi-Scale Coarse Mid Fine
Perceived Shape is Multi-Scale: Mesh Saliency. Lee, C. H., Varshney, A. & Jacobs, D. W., Mesh saliency, in SIGGRAPH '05: ACM SIGGRAPH 2005 Papers, pp (New York, NY, USA: ACM, 2005). © ACM SIGGRAPH 2005, All rights reserved.
Perceived Shape is Multi-Scale: Mesh Saliency. [Panels: coarse spatial scale, fine spatial scale] Applications: level of detail, hiding watermarks, viewpoint selection.
Conclusions: There are many different cues to 3D shape, which the human visual system can draw on under typical viewing conditions. Most cues are ambiguous or unreliable if considered in isolation. The secret of conveying shape effectively is to provide multiple cues. Orientation fields may be an important common language in human shape processing. There are probably many other applications in CG that can exploit this. Many of the assumptions made by human vision can be exploited in computer graphics applications. Richer, more perceptual representations of geometry are an exciting challenge for the future.