Stereo
CSE 576, Ali Farhadi. Several slides from Larry Zitnick and Steve Seitz.

Why do we perceive depth?

What do humans use as depth cues? Motion
Convergence: When looking at an object close to us, our eyes point slightly inward. This difference in the direction of the eyes is called convergence. This depth cue is effective only at short distances (less than 10 meters).
Binocular Parallax: Because our eyes see the world from slightly different locations, the images sensed by the two eyes are slightly different. This difference in the sensed images is called binocular parallax. The human visual system is very sensitive to these differences, and binocular parallax is the most important depth cue for medium viewing distances. A sense of depth can be achieved from binocular parallax even when all other depth cues are removed.
Monocular Movement Parallax: If we close one eye, we can still perceive depth by moving our head. This works because the human visual system can extract depth information from two similar images sensed one after the other, in the same way it combines the two images from the two eyes.
Focus Accommodation: Accommodation is the tension of the muscle that changes the focal length of the eye's lens, bringing objects at different distances into focus. This depth cue is quite weak; it is effective only at short viewing distances (less than 2 meters) and in combination with other cues.
Marko Teittinen http://www.hitl.washington.edu/scivw/EVE/III.A.1.c.DepthCues.html

What do humans use as depth cues? Image cues
Retinal Image Size: When the real size of an object is known, our brain compares the sensed size of the object to this real size, and thus acquires information about the object's distance.
Linear Perspective: When looking down a straight, level road, we see the parallel sides of the road meet at the horizon. This effect is often visible in photos, and it is an important depth cue called linear perspective.
Texture Gradient: The closer we are to an object, the more detail we can see of its surface texture. Objects with smooth textures are therefore usually interpreted as being farther away. This is especially true when the surface texture spans the whole distance from near to far.
Overlapping: When objects block each other from our sight, we know that the occluding object is closer to us. The object whose outline looks more continuous is perceived as lying closer.
Aerial Haze: Mountains on the horizon always look slightly bluish or hazy. The reason is small water and dust particles in the air between the eye and the mountains. The farther away the mountains, the hazier they look.
Shades and Shadows: When we know the location of a light source and see objects casting shadows on other objects, we learn that the object shadowing another is closer to the light source. Since most illumination comes from above, we tend to resolve ambiguities using this assumption; three-dimensional-looking computer user interfaces are a nice example of this. Also, bright objects seem closer to the observer than dark ones.
Jonathan Chiu, Marko Teittinen http://www.hitl.washington.edu/scivw/EVE/III.A.1.c.DepthCues.html

The amount of horizontal movement is inversely proportional to the distance from the camera.

Cameras. Thin lens equation: 1/z_o + 1/z_i = 1/f, where z_o is the distance to the object, z_i the distance to the image plane, and f the focal length. Any object point satisfying this equation is in focus; manipulating the equation shows the shape of the in-focus region.

Depth from Stereo. Goal: recover depth by finding the image coordinate x' that corresponds to x. [Figure: a 3D point X projects to x and x' in two cameras with centers C and C', focal length f, baseline B, and depth z.]

Depth from disparity. Disparity is inversely proportional to depth. [Figure: point X at depth z, camera centers O and O' separated by baseline B.]
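As a quick sketch of why (a standard similar-triangles argument, using the figure's notation; d denotes the disparity x − x'):

    % Similar triangles relate image displacement to baseline and depth:
    \[
      d \;=\; x - x' \;=\; \frac{B\,f}{z}
      \qquad\Longleftrightarrow\qquad
      z \;=\; \frac{B\,f}{d}
    \]

So doubling the depth halves the disparity, which is why distant points barely shift between the two views.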

Depth from Stereo. Goal: recover depth by finding the image coordinate x' that corresponds to x. Sub-problems:
Calibration: how do we recover the relation of the cameras (if not already known)?
Correspondence: how do we search for the matching point x'?

Correspondence Problem. We have two images taken from cameras with different intrinsic and extrinsic parameters. How do we match a point in the first image to a point in the second? How can we constrain our search?

Key idea: Epipolar constraint. Potential matches for x have to lie on the corresponding line l'. Potential matches for x' have to lie on the corresponding line l.

Epipolar geometry: notation.
Baseline: the line connecting the two camera centers.
Epipoles: the intersections of the baseline with the image planes; equivalently, the projections of the other camera center.
Epipolar plane: a plane containing the baseline (a 1D family).
Epipolar lines: the intersections of an epipolar plane with the image planes (they always come in corresponding pairs).

Example: Converging cameras

Example: Motion parallel to image plane

Example: Forward motion What would the epipolar lines look like if the camera moves directly forward?

Example: Motion perpendicular to image plane. Points move along lines radiating from the epipole: the "focus of expansion". The epipole is at the principal point.

Example: Forward motion. The epipole has the same coordinates in both images. Points move along lines radiating from the "focus of expansion".

Epipolar constraint. If we observe a point x in one image, where can the corresponding point x' be in the other image?

Epipolar constraint. Potential matches for x have to lie on the corresponding epipolar line l'. Potential matches for x' have to lie on the corresponding epipolar line l.

Epipolar constraint example

Camera parameters How many numbers do we need to describe a camera? We need to describe its pose in the world We need to describe its internal parameters

A Tale of Two Coordinate Systems. Two important coordinate systems: 1. the world coordinate system; 2. the camera coordinate system, with origin at the center of projection (COP). [Figure: world axes x, y, z ("The World") and camera axes u, v, w.]

Camera parameters. To project a point (x, y, z) in world coordinates into a camera: first transform (x, y, z) into camera coordinates, which requires knowing the camera position and orientation (in world coordinates); then project into the image plane, which requires knowing the camera intrinsics. These can all be described with matrices.

Camera parameters. A camera is described by several parameters: the translation T of the optical center from the origin of world coordinates; the rotation R of the image plane; and the focal length f, principal point (x'_c, y'_c), and pixel size (s_x, s_y). The first two (blue on the slide) are called "extrinsics"; the rest (red) are "intrinsics". Projection equation: the projection matrix models the cumulative effect of all these parameters, and it is useful to decompose it into a series of operations (intrinsics, projection, rotation, translation). The definitions of these parameters are not completely standardized, especially the intrinsics; they vary from one book to another.

Extrinsics. How do we get the camera to "canonical form"? (Center of projection at the origin, x-axis pointing right, y-axis pointing up, z-axis pointing backwards.) Step 1: translate by -c. How do we represent translation as a matrix multiplication?

Extrinsics. How do we get the camera to "canonical form"? (Center of projection at the origin, x-axis pointing right, y-axis pointing up, z-axis pointing backwards.) Step 1: translate by -c. Step 2: rotate by R (a 3x3 rotation matrix).
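In homogeneous coordinates, translation becomes a matrix multiplication, so the two steps compose into a single 4x4 world-to-camera transform (a standard construction; c is the camera center in world coordinates):

    % Step 2 (rotate by R) composed with Step 1 (translate by -c):
    \[
      \begin{bmatrix} R & 0 \\ 0 & 1 \end{bmatrix}
      \begin{bmatrix} I & -c \\ 0 & 1 \end{bmatrix}
      =
      \begin{bmatrix} R & -Rc \\ 0 & 1 \end{bmatrix},
      \qquad t = -Rc .
    \]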

Perspective projection converts 3D rays in the camera coordinate system to pixel coordinates (the intrinsics). In general, K is an upper-triangular matrix; a common parameterization is

    K = [ f   s    c_x ]
        [ 0   a f  c_y ]
        [ 0   0    1   ]

a: aspect ratio (1 unless pixels are not square)
s: skew (0 unless pixels are shaped like rhombi/parallelograms)
(c_x, c_y): principal point ((0,0) unless the optical axis doesn't intersect the projection plane at the origin)

Projection matrix. P factors into intrinsics x projection x rotation x translation: P = K [I | 0] R T, where T translates the world by -c, R rotates into the camera frame, [I | 0] performs the canonical projection, and K applies the intrinsics (equivalently, P = K [R | t] with t = -Rc).

Projection matrix. In homogeneous image coordinates, x ~ P X: the pixel coordinates are the projection of the world point, up to scale.
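A minimal numpy sketch of the whole projection pipeline (illustrative values only; the names K, R, c and the specific numbers are assumptions, not from the slides):

    import numpy as np

    # Intrinsics: focal length f, unit aspect, no skew, principal point (cx, cy).
    f, cx, cy = 800.0, 320.0, 240.0
    K = np.array([[f, 0, cx],
                  [0, f, cy],
                  [0, 0, 1.0]])

    # Extrinsics: rotation R (here: identity) and camera center c in world coords.
    R = np.eye(3)
    c = np.array([0.1, 0.0, 0.0])
    t = -R @ c                        # world-to-camera translation

    # Projection matrix P = K [R | t]  (3x4).
    P = K @ np.hstack([R, t[:, None]])

    # Project a world point X (homogeneous), then divide by the third coordinate.
    X = np.array([0.5, 0.2, 4.0, 1.0])
    x_h = P @ X
    x_pix = x_h[:2] / x_h[2]
    print(x_pix)                      # pixel coordinates of the projection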

Epipolar constraint example

Epipolar constraint: Calibrated case. Assume that the intrinsic and extrinsic parameters of the cameras are known. We can multiply the projection matrix of each camera (and the image points) by the inverse of the calibration matrix to get normalized image coordinates. We can also set the global coordinate system to the coordinate system of the first camera; then the projection matrices of the two cameras can be written as [I | 0] and [R | t].

Epipolar constraint: Calibrated case. Write X = (x, 1)^T, and let X' = RX + t be X expressed in the second camera's coordinate system. We can identify the non-homogeneous 3D vectors X and X' with the homogeneous coordinate vectors x and x' of the projections of the point into the two respective images. The vectors Rx, t, and x' are coplanar.

Epipolar constraint: Calibrated case. The vectors Rx, t, and x' are coplanar, so x' · (t × Rx) = 0, i.e., x'^T [t]_× (R x) = 0. Writing E = [t]_× R gives the epipolar constraint x'^T E x = 0, where E is the Essential Matrix (Longuet-Higgins, 1981).

Epipolar constraint: Calibrated case. Properties of E:
E x is the epipolar line associated with x (l' = E x).
E^T x' is the epipolar line associated with x' (l = E^T x').
E e = 0 and E^T e' = 0.
E is singular (rank two).
E has five degrees of freedom.
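A small numpy sketch (with a synthetic pose of my choosing) that builds E = [t]_× R and verifies the epipolar constraint x'^T E x = 0 for a sample 3D point:

    import numpy as np

    def cross_matrix(t):
        # [t]_x, the skew-symmetric matrix with [t]_x v = t x v.
        return np.array([[0, -t[2], t[1]],
                         [t[2], 0, -t[0]],
                         [-t[1], t[0], 0]])

    # Synthetic relative pose: small rotation about y, translation along x.
    theta = 0.1
    R = np.array([[np.cos(theta), 0, np.sin(theta)],
                  [0, 1, 0],
                  [-np.sin(theta), 0, np.cos(theta)]])
    t = np.array([1.0, 0.0, 0.0])

    E = cross_matrix(t) @ R

    # A 3D point in camera-1 coordinates and its normalized projections.
    X = np.array([0.3, -0.2, 5.0])
    X2 = R @ X + t
    x = X / X[2]        # normalized homogeneous image point in camera 1
    x2 = X2 / X2[2]     # ... in camera 2

    print(x2 @ E @ x)   # ~0 up to floating-point error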

Epipolar constraint: Uncalibrated case. The calibration matrices K and K' of the two cameras are unknown. We can write the epipolar constraint in terms of unknown normalized coordinates: x̂'^T E x̂ = 0, where x̂ = K⁻¹x and x̂' = K'⁻¹x'.

Epipolar constraint: Uncalibrated case. Substituting the normalized coordinates gives x'^T F x = 0 with F = K'^(-T) E K⁻¹, where F is the Fundamental Matrix (Faugeras and Luong, 1992).

Epipolar constraint: Uncalibrated case. Properties of F:
F x is the epipolar line associated with x (l' = F x).
F^T x' is the epipolar line associated with x' (l = F^T x').
F e = 0 and F^T e' = 0.
F is singular (rank two).
F has seven degrees of freedom.

The eight-point algorithm. Each correspondence gives one linear equation in the nine entries of F; stacking n ≥ 8 correspondences yields a homogeneous system A f = 0. Minimize ||A f||² under the constraint ||F||² = 1: the solution is the eigenvector of A^T A with the smallest eigenvalue (equivalently, the last right singular vector of A).

The eight-point algorithm. Meaning of the error: the sum of squared algebraic distances between points x'_i and epipolar lines F x_i (or between points x_i and epipolar lines F^T x'_i). Nonlinear approach: minimize the sum of squared geometric distances instead.

Problem with the eight-point algorithm: poor numerical conditioning. This can be fixed by rescaling the data.

The normalized eight-point algorithm (Hartley, 1995):
1. Center the image data at the origin, and scale it so the mean squared distance between the origin and the data points is 2 pixels.
2. Use the eight-point algorithm to compute F from the normalized points.
3. Enforce the rank-2 constraint (for example, take the SVD of F and zero out the smallest singular value).
4. Transform the fundamental matrix back to the original units: if T and T' are the normalizing transformations in the two images, then the fundamental matrix in the original coordinates is T'^T F T.
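A compact numpy sketch of these four steps (a sketch under the slide's formulation; pts1 and pts2 are assumed to be n x 2 arrays of matched pixel coordinates, n >= 8):

    import numpy as np

    def normalize(pts):
        # Translate the centroid to the origin and scale so the mean
        # squared distance from the origin is 2 (Hartley normalization).
        centroid = pts.mean(axis=0)
        d2 = ((pts - centroid) ** 2).sum(axis=1).mean()
        s = np.sqrt(2.0 / d2)
        T = np.array([[s, 0, -s * centroid[0]],
                      [0, s, -s * centroid[1]],
                      [0, 0, 1.0]])
        pts_h = np.column_stack([pts, np.ones(len(pts))])
        return (T @ pts_h.T).T, T

    def eight_point(pts1, pts2):
        p1, T1 = normalize(pts1)
        p2, T2 = normalize(pts2)
        # Each match gives one row of A in A f = 0 (from x2^T F x1 = 0).
        A = np.column_stack([
            p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
            p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
            p1[:, 0], p1[:, 1], np.ones(len(p1))])
        # Minimize ||A f|| subject to ||f|| = 1: last right singular vector.
        _, _, Vt = np.linalg.svd(A)
        F = Vt[-1].reshape(3, 3)
        # Enforce rank 2 by zeroing the smallest singular value.
        U, S, Vt = np.linalg.svd(F)
        F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
        # Undo the normalization: F_orig = T2^T F T1.
        return T2.T @ F @ T1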

Comparison of estimation algorithms:

                 8-point      Normalized 8-point   Nonlinear least squares
    Av. Dist. 1  2.33 pixels  0.92 pixel           0.86 pixel
    Av. Dist. 2  2.18 pixels  0.85 pixel           0.80 pixel

Moving on to stereo… Fuse a calibrated binocular stereo pair to produce a dense depth map. [Figure: image 1, image 2, dense depth map.] Many of these slides adapted from Steve Seitz and Lana Lazebnik.

Depth from disparity (recap). Disparity is inversely proportional to depth: d = x − x' = Bf/z. [Figure: point X at depth z, camera centers O and O' separated by baseline B.]

Basic stereo matching algorithm. If necessary, rectify the two stereo images to transform epipolar lines into scanlines. For each pixel x in the first image: find the corresponding epipolar scanline in the right image; search the scanline and pick the best match x'; compute the disparity x − x' and set depth(x) = fB/(x − x').
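A bare-bones sketch of this loop (assuming already-rectified grayscale images left and right as 2D float arrays; the window size and disparity range are illustrative):

    import numpy as np

    def block_matching_disparity(left, right, max_disp=64, win=5):
        """SSD block matching along scanlines of rectified images."""
        h, w = left.shape
        r = win // 2
        disp = np.zeros((h, w))
        for y in range(r, h - r):
            for x in range(r + max_disp, w - r):
                patch = left[y - r:y + r + 1, x - r:x + r + 1]
                best, best_d = np.inf, 0
                # Candidate matches sit to the left in the right image.
                for d in range(max_disp):
                    cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                    ssd = ((patch - cand) ** 2).sum()
                    if ssd < best:
                        best, best_d = ssd, d
                disp[y, x] = best_d
        return disp

    # Depth then follows from disparity: z = f * B / d (for d > 0).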

Basic stereo matching algorithm. For each pixel in the first image: find the corresponding epipolar line in the right image; search along the epipolar line and pick the best match; triangulate the matches to get depth information. Simplest case: the epipolar lines are scanlines. When does this happen?

Simplest case: parallel images. The epipolar constraint simplifies with R = I and t = (T, 0, 0): the y-coordinates of corresponding points are the same, so the epipolar lines are horizontal scanlines.

Stereo image rectification. Reproject the image planes onto a common plane parallel to the line between the camera centers; pixel motion is purely horizontal after this transformation. This takes two homographies (3x3 transforms), one for each input image. C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.

Example: unrectified vs. rectified image pair.

Correspondence search. Slide a window along the right scanline and compare the contents of that window with the reference window in the left image. Matching cost: SSD or normalized correlation. [Figure: left and right scanlines; matching cost plotted against disparity.]

Correspondence search. [Figure: the same scanline search using SSD as the matching cost.]

Correspondence search. [Figure: the same scanline search using normalized correlation as the matching cost.]
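The two matching costs compared above, as small numpy helpers (a sketch; a and b are assumed to be equally sized window arrays):

    import numpy as np

    def ssd(a, b):
        # Sum of squared differences: lower is a better match.
        return ((a - b) ** 2).sum()

    def ncc(a, b):
        # Normalized cross-correlation: higher is a better match;
        # invariant to affine changes in intensity (gain and bias).
        a0 = a - a.mean()
        b0 = b - b.mean()
        return (a0 * b0).sum() / (np.linalg.norm(a0) * np.linalg.norm(b0) + 1e-12)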

Effect of window size (W = 3 vs. W = 20). Smaller window: more detail, but more noise. Larger window: less noise and smoother disparity maps, but less detail, and it fails near boundaries.

Failures of correspondence search: occlusions and repetition; textureless surfaces; non-Lambertian surfaces and specularities.

Results with window search Data Window-based matching Ground truth

How can we improve window-based matching? So far, matches are independent for each point What constraints or priors can we add?

Stereo constraints/priors.
Uniqueness: for any point in one image, there should be at most one matching point in the other image.
Ordering: corresponding points should be in the same order in both views. [Figure: a configuration where the ordering constraint doesn't hold.]

Priors and constraints.
Uniqueness: for any point in one image, there should be at most one matching point in the other image.
Ordering: corresponding points should be in the same order in both views.
Smoothness: we expect disparity values to change slowly (for the most part).

Stereo as energy minimization. What defines a good stereo correspondence?
1. Match quality: each pixel should find a good match in the other image.
2. Smoothness: if two pixels are adjacent, they should (usually) move by about the same amount.

Stereo as energy minimization. A better objective function combines a match cost and a smoothness cost: the match cost wants each pixel to find a good match in the other image, and the smoothness cost wants adjacent pixels to (usually) move by about the same amount.

Stereo as energy minimization.
Match cost: the SSD distance between the windows centered at I(x, y) and at J(x + d(x, y), y), summed over all pixels.
Smoothness cost: a penalty V(d_p, d_q) summed over the set ε of neighboring pixel pairs (4-connected or 8-connected neighborhood).
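Putting the two terms together (notation assumed, following the slide's structure):

    % Data term: per-pixel window matching cost at the chosen disparity.
    % Smoothness term: penalty on disparity differences of neighbors.
    \[
      E(d) \;=\; \sum_{(x,y)} C\bigl(x, y, d(x,y)\bigr)
      \;+\; \lambda \sum_{(p,q)\in\mathcal{E}} V\bigl(d_p, d_q\bigr)
    \]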

Smoothness cost. Two common choices for the penalty V(d_p, d_q): the L1 distance |d_p − d_q|, and the "Potts model" (0 if d_p = d_q, 1 otherwise).

Dynamic programming. The energy can be minimized independently per scanline using dynamic programming (DP). Define D(x, y, d) as the minimum cost of a solution for the scanline prefix ending at column x such that d(x, y) = d.
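The per-scanline recurrence this defines (a standard formulation; C is the match cost and V the smoothness penalty from the previous slides):

    \[
      D(x, y, d) \;=\; C(x, y, d)
      \;+\; \min_{d'} \bigl[\, D(x-1, y, d') + \lambda\, V(d, d') \,\bigr]
    \]
    % Sweep left to right, then backtrack from argmin_d D(w, y, d)
    % to read off the disparities for the whole scanline.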

Energy minimization via graph cuts. [Figure: each pixel is a node connected to label nodes d1, d2, d3 (the disparities); edge weights encode the match and smoothness costs.]

Energy minimization via graph cuts. A graph cut deletes enough edges so that each pixel is connected to exactly one label node. The cost of a cut is the sum of the deleted edge weights, and finding the minimum-cost cut is equivalent to finding the global minimum of the energy function.

Stereo as energy minimization. C(x, y, d) is the disparity space image (DSI). [Figure: left image I(x, y), right image J(x, y), and the DSI slice for scanline y = 141, with axes x and d.]

Stereo as energy minimization. Simple pixel/window matching: choose the minimum of each column in the DSI independently, d(x, y) = argmin_d C(x, y, d).

Matching windows. Common similarity measures (W1, W2 are the two windows):
Sum of Absolute Differences (SAD): Σ |W1 − W2|
Sum of Squared Differences (SSD): Σ (W1 − W2)²
Zero-mean SAD: Σ |(W1 − mean(W1)) − (W2 − mean(W2))|
Locally scaled SAD: Σ |W1 − (mean(W1)/mean(W2)) W2|
Normalized Cross Correlation (NCC): Σ W1·W2 / sqrt(Σ W1² · Σ W2²)
[Figure: SAD, SSD, and NCC results compared with ground truth.]
http://siddhantahuja.wordpress.com/category/stereo-vision/

Before & After. [Figure: before (window-based matching), graph cuts, ground truth.] Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001. For the latest and greatest: http://www.middlebury.edu/stereo/

Real-time stereo. Used for robot navigation (and other tasks): the Nomad robot searches for meteorites in Antarctica. http://www.frc.ri.cmu.edu/projects/meteorobot/index.html Several software-based real-time stereo techniques have been developed (most based on simple discrete search).

Why does stereo fail? Fronto-parallel surfaces: window matching assumes depth is constant within the region of local support.

Why does stereo fail? Monotonic ordering: points along an epipolar scanline are assumed to appear in the same order in both stereo images. Occlusion: all points are assumed to be visible in each image.

Why does stereo fail? Image brightness constancy: assuming Lambertian surfaces, the brightness of corresponding points in the stereo images is assumed to be the same.

Why does stereo fail? Match Uniqueness: For every point in one stereo image, there is at most one corresponding point in the other image.

Stereo reconstruction pipeline. Steps: calibrate cameras, rectify images, compute disparity, estimate depth. What will cause errors? Camera calibration errors, poor image resolution, occlusions, violations of brightness constancy (specular reflections), large motions, and low-contrast image regions.

Choosing the stereo baseline. What's the optimal baseline? Too small: large depth error. Too large: difficult search problem. [Figure: with a small baseline, a whole range of scene points projects to the same pair of pixels, within the width of a pixel.]
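One way to make "large depth error" precise (standard error propagation from z = Bf/d; δd is the matching error in pixels):

    % Differentiate z = B f / d with respect to the disparity d:
    \[
      \frac{\partial z}{\partial d} \;=\; -\frac{B f}{d^{2}} \;=\; -\frac{z^{2}}{B f}
      \qquad\Rightarrow\qquad
      |\delta z| \;\approx\; \frac{z^{2}}{B f}\,|\delta d|
    \]
    % Depth error grows quadratically with depth and shrinks with baseline,
    % which is why a too-small baseline gives large depth error.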

Multi-view stereo ?

Beyond two-view stereo The third view can be used for verification

Using more than two images. Multi-View Stereo for Community Photo Collections. M. Goesele, N. Snavely, B. Curless, H. Hoppe, S. Seitz. Proceedings of ICCV 2007.