Using Cloud Shadows to Infer Scene Structure and Camera Calibration Nathan Jacobs, Brian Bies, Robert Pless jacobsn@cse.wustl.edu http://www.cse.wustl.edu/~jacobsn/
time-lapse videos in the wild Today there are hundreds of thousands of outdoor cameras, sitting still and watching the world pass by. I am interested in geolocating and calibrating these cameras and in inferring properties of the scenes they view. webcam dataset: http://amos.cse.wustl.edu
Here is a typical time-lapse video of a partly cloudy day captured by a static camera. Today I will show you how to turn video like this into a depth map.
related work
Bouguet and Perona. ICCV 98
Caspi and Werman. CVPR 06
Kawasaki and Furukawa. IJCV 08
Sunkavalli, Romeiro, Matusik, Zickler and Pfister. CVPR 08
Koppal and Narasimhan. PAMI 08
Shen and Tan. CVPR 09
Schechner, Narasimhan and Nayar. CVPR 01
Narasimhan and Nayar. CVPR 03
He, Sun and Tang. CVPR 09
Zhang, Curless and Seitz. CVPR 03
Swirski, Schechner, Herzberg and Negahdaripour. ICCV 09
There is a lot of related work on inferring geometric properties of scenes using natural phenomena. Here I will highlight a few of the types of cues others have focused on. First is work that estimates scene geometry by tracking shadows cast by stationary objects. Second is work that uses photometric cues to estimate surface normals. Third is work that reasons about the relationship of haze and fog to depth. And, finally, there is work that looks at how stochastic structured-light patterns can be used to recover scene structure.
outline: from clouds to depth maps
spatial cue: nearby points see similar clouds; depth estimation using a gradient descent optimization algorithm
temporal delay cue: wind pushes clouds across the scene; depth estimation using linear constraints followed by a search
spatial cue First Law of Geography: Everything is related to everything else, but near things are more related than distant things. -Waldo Tobler The spatial cue is an instance of the first law of geography, quoted above.
spatial cue The reason this applies to our setting is that the closer two points are in the world, the more likely they are to be under the shadow of the same cloud.
temporal correlation is related to distance We can see this relationship in the following false-color image. This image was constructed by… These correlation maps are the input to our algorithm. Our algorithm works by explicitly modeling this relationship.
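A correlation map like the one on this slide can be computed directly from a stack of time-lapse frames. The following sketch (with a synthetic cloud signal standing in for real video) correlates one reference pixel's intensity time series with every other pixel's:

```python
import numpy as np

def correlation_map(frames, ref_row, ref_col):
    """Correlate one pixel's intensity time series with every other pixel's.

    frames: array of shape (T, H, W) of grayscale time-lapse frames.
    Returns an (H, W) map of Pearson correlations in [-1, 1].
    """
    T, H, W = frames.shape
    X = frames.reshape(T, H * W).astype(float)
    X -= X.mean(axis=0)                      # zero-mean each pixel's series
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0                  # guard against constant pixels
    ref = X[:, ref_row * W + ref_col] / norms[ref_row * W + ref_col]
    return (ref @ X / norms).reshape(H, W)

# synthetic stand-in for video: every pixel mixes a shared "cloud" signal
# with independent noise, and pixels farther from (0, 0) share less signal
rng = np.random.default_rng(0)
cloud = rng.standard_normal(200)
frames = np.empty((200, 4, 4))
for r in range(4):
    for c in range(4):
        w = 1.0 / (1.0 + r + c)
        frames[:, r, c] = w * cloud + (1 - w) * rng.standard_normal(200)
cmap = correlation_map(frames, 0, 0)
```

As expected, correlation with the reference pixel falls off with distance, which is exactly the relationship the algorithm models.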
What is the relationship between correlation and distance? The big question is “…” The answer is… it depends. [plot: pixel intensities over time]
algorithm overview: Nonmetric Multidimensional Scaling with projective constraints
compute the correlation between pairs of pixels
estimate the focal length and create an initial planar depth map
iterate until convergence:
use the current depth map to compute the distance between points
estimate the correlation-to-distance mapping
update the depth map to minimize error in the distances
[plot: correlation vs. distance]
detail in buildings; no post-processing
estimating the correlation-to-distance mapping
expected value of distance given correlation
estimate the mapping using monotonic regression:
non-parametric
constrained to be monotonically decreasing
minimize the L1 norm (linear programming solution)
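A minimal sketch of the monotonic regression step, using a least-squares pool-adjacent-violators fit as a stand-in for the paper's L1 linear-programming formulation (the (correlation, distance) samples below are synthetic):

```python
import numpy as np

def fit_decreasing(corr, dist):
    """Monotonically decreasing non-parametric fit via pool-adjacent-violators.

    Enforces the same constraint as the paper (estimated distance never
    increases as correlation increases) but minimizes squared error
    instead of the L1 norm.
    """
    order = np.argsort(corr)
    y = -dist[order]                       # negate so we fit an increasing fn
    vals, wts = [], []
    for v in y:
        vals.append(v); wts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:
            v2, w2 = vals.pop(), wts.pop() # pool adjacent violating blocks
            v1, w1 = vals.pop(), wts.pop()
            vals.append((w1 * v1 + w2 * v2) / (w1 + w2))
            wts.append(w1 + w2)
    fitted_sorted = np.repeat(vals, wts)
    fitted = np.empty(len(dist), dtype=float)
    fitted[order] = -fitted_sorted         # undo negation and sorting
    return fitted

# synthetic samples with a noisy decreasing trend: high correlation,
# small distance; low correlation, large distance
rng = np.random.default_rng(1)
corr = rng.uniform(0, 1, 300)
dist = 100 * (1 - corr) + rng.normal(0, 5, 300)
fitted = fit_decreasing(corr, dist)
```

The fitted step function can then serve as the correlation-to-distance mapping inside the NMDS iteration.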
improving an existing depth map [diagram labels: depths, correlation-to-distance mapping, distances implied by the depth map, weights]
improving an existing depth map [diagram: pixel rays and the camera location]
recap of the spatial cue assume: monotonically decreasing relationship between correlation and distance algorithm: compute temporal similarity between pairs of pixels use modified NMDS to estimate a depth map
temporal delay cue Now I will describe the temporal delay cue. Here is how it works: point y sees roughly the same pattern of clouds as point x, but with a temporal delay; point z has the same temporal delay as y but sees a slightly different part of the clouds. This is the temporal delay cue.
linear constraints on location [diagram: points x and y and wind vector W; each quantity is unknown, given, or estimated from images]
wind direction
estimating delay: find the temporal delay that maximizes correlation
[false-color map] hue: estimated temporal delay; brightness: confidence in the estimate
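The delay-estimation step can be sketched as a search over integer lags for the one that maximizes correlation between two pixels' intensity series; the signal and the 7-frame shift below are synthetic stand-ins for real pixel intensities:

```python
import numpy as np

def best_delay(a, b, max_lag=20):
    """Find the temporal delay of series b relative to a that maximizes
    correlation. A positive delay means b sees the cloud pattern later.
    Returns (delay, peak_correlation)."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.mean(a[:n - lag] * b[lag:])   # b shifted later in time
        else:
            c = np.mean(a[-lag:] * b[:n + lag])  # b shifted earlier in time
        if c > best_corr:
            best_lag, best_corr = lag, c
    return best_lag, best_corr

# synthetic check: pixel b sees the same cloud signal 7 frames after pixel a
rng = np.random.default_rng(2)
signal = rng.standard_normal(507)
a = signal[7:]            # a sees the cloud pattern first
b = signal[:500]          # b[t] == a[t - 7]
lag, corr = best_delay(a, b)
```

The peak correlation value doubles as the confidence (brightness) channel of the false-color delay map.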
linear constraints on depth [diagram: points x, y, z and wind vector W; each quantity is unknown, given, or estimated from images] a rank-deficient set of linear constraints
from constraints to a depth map The remaining uncertainty reduces to a one-dimensional search (along the null space of the constraints): a much simpler optimization.
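A toy illustration of this idea, with a hypothetical rank-deficient constraint system and an assumed prior on the mean scene depth; the one-dimensional search runs along the single null-space direction:

```python
import numpy as np

# toy rank-deficient system: constraints fix the *differences* between four
# depths but leave an overall shift undetermined (hypothetical numbers)
A = np.array([[1., -1., 0., 0.],
              [0., 1., -1., 0.],
              [0., 0., 1., -1.]])
b = np.array([2., 3., 1.])

# minimum-norm particular solution plus the null space of the constraints
d0, *_ = np.linalg.lstsq(A, b, rcond=None)
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
null_vec = Vt[rank]                     # one-dimensional null space

# one-dimensional search along the null space: pick the shift whose mean
# depth best matches a (hypothetical) prior of 10 units
ts = np.linspace(-50, 50, 1001)
costs = [abs((d0 + t * null_vec).mean() - 10.0) for t in ts]
depth = d0 + ts[int(np.argmin(costs))] * null_vec
```

Every candidate along the line satisfies the constraints exactly, so the search only has to resolve the single remaining degree of freedom.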
another depth map from the delay cue I just want to take a moment to emphasize what I think is one of the most exciting aspects of the temporal delay cue. Because the delay constraints are based on wind velocity, the depths we estimate have units of meters. Estimating metric depth from a single camera view is a notoriously challenging task, and we have introduced a new cue that makes it possible.
summary: depth from clouds
two new cues for depth estimation
spatial cue: works with very low frame rates; NMDS + projective constraints
temporal delay cue: requires a higher frame rate; simpler optimization; possibility of metric depth
What is it? Why can we do it? Why is it cool? Why is it important?
Questions? acknowledgements funding: NSF IIS-0546383 time lapse sequences: Martin Setvak Nathan Jacobs http://www.cse.wustl.edu/~jacobsn/
finding an initial depth map [figure labels: pairwise distances, correlation, lowest error]
the ambiguity in the depth map
null space search null space of constraints