Advanced Computer Vision

Presentation transcript:

Advanced Computer Vision Chapter 7 STRUCTURE FROM MOTION Presented by Prof. Chiou-Shann Fuh & Jia-Yau Shiau (蕭家堯) r05943148@ntu.edu.tw Structure from Motion

Advanced Computer Vision https://goo.gl/x2Usuk

What Is Structure from Motion? In brief: a method for creating 3D models from 2D pictures of an object.

Example: Picture 1, Picture 2

Example (cont.): 3D model created from the two images.

What Is Structure from Motion? In brief: a method for creating 3D models from 2D pictures of an object. Formally: the automatic recovery of camera motion and scene structure from two or more images. It is a self-calibration technique, also called automatic camera tracking or match moving.

What Is Structure from Motion? Let K, R, t denote the camera parameters, X the 3D position of the object, and x its 2D position in the image. Then: Triangulation: estimate X given x, K, R, t. Camera calibration: estimate K, R, t given x, X. Structure from motion: estimate K, R, t, X given only x.

What Is Structure from Motion? 2D feature tracking; 3D estimation; optimization; geometry fitting; essential matrix; triangulation; bundle adjustment. https://www.youtube.com/watch?v=b3GH7VGZi1c

Agenda: Camera Parameters; Triangulation; Two-Frame Structure from Motion; Factorization; Bundle Adjustment; Constrained Structure and Motion

Camera Parameters: pinhole camera model; Canon EF 16-35mm f/2.8L II USM. Simply put: given the camera pose, compute the object's position in 3D.

Camera Parameters: intrinsic matrix K, rotation R, translation t.
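
A minimal sketch of how K, R, and t act together under the pinhole model, assuming the convention x ~ K [R | t] X that appears later in these slides. The helper name `project` and the numeric intrinsics (focal length 800 px, principal point at (320, 240)) are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D world point X to pixel coordinates via x ~ K [R | t] X."""
    X_cam = R @ X + t          # world coordinates -> camera coordinates
    x_h = K @ X_cam            # homogeneous pixel coordinates
    return x_h[:2] / x_h[2]    # perspective division

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                  # camera axes aligned with the world axes
t = np.zeros(3)                # camera center at the world origin
X = np.array([0.5, -0.25, 4.0])

x = project(K, R, t, X)        # pixel position of X in this camera
```

With these numbers the point lands at pixel (420, 190): each coordinate is scaled by f / Z and shifted by the principal point.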

What Is Structure from Motion? 2D feature tracking; 3D estimation; optimization; geometry fitting; triangulation.

7.1 Triangulation: The problem of estimating a point's 3D location when it is seen from multiple cameras is known as triangulation. It is the converse of the pose estimation problem. Given the projection matrices, 3D points can be computed from their measured image positions in two or more views. Simply put: given the camera poses, compute the object's position in 3D.

7.1 Triangulation: Find the 3D point p that lies closest to all of the 3D rays corresponding to the 2D matching feature locations {x_j} observed by cameras {P_j = K_j [R_j | t_j]}, where t_j = -R_j c_j and c_j is the jth camera center. (K, R, t are the camera parameters.)

7.1 Triangulation: Suppose camera j

7.1 Triangulation (figure labels: p, c, d). Goal: minimize the squared distance between p and q.

Method 1 (cont.): The squared distance between p and q_j is denoted r_j^2. The optimal value for p, which lies closest to all of the rays, can be computed as a regular least-squares problem by summing over all the r_j^2 and finding the value of p that minimizes the sum.
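
A minimal NumPy sketch of this least-squares idea, assuming each ray is given by its camera center c_j and a direction v_j (which could be obtained by back-projecting x_j through K_j and R_j). For a unit direction v_j, the nearest point on ray j is q_j = c_j + (v_j v_j^T)(p - c_j), so r_j^2 = ||(I - v_j v_j^T)(p - c_j)||^2, and setting the gradient of the sum to zero gives a 3×3 linear system. The function name and the test rays are illustrative assumptions.

```python
import numpy as np

def triangulate_rays(centers, directions):
    """Return the point p minimizing the summed squared distance to all rays."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, v in zip(centers, directions):
        v = v / np.linalg.norm(v)          # unit ray direction
        M = np.eye(3) - np.outer(v, v)     # projector onto the plane perpendicular to the ray
        A += M                             # accumulates sum_j (I - v_j v_j^T)
        b += M @ c                         # accumulates sum_j (I - v_j v_j^T) c_j
    return np.linalg.solve(A, b)

# Synthetic check: three cameras looking at one known point (noise-free rays).
p_true = np.array([1.0, 2.0, 10.0])
centers = [np.array([0.0, 0.0, 0.0]),
           np.array([2.0, 0.0, 0.0]),
           np.array([0.0, 3.0, 0.0])]
directions = [p_true - c for c in centers]

p_est = triangulate_rays(centers, directions)
```

With noise-free rays the estimate coincides with the true point; with noisy rays it is the point of minimum summed squared distance, which is not the same as minimizing re-projection error in the images.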

Method 2 (cont.)

Example: Triangulation https://www.youtube.com/watch?v=uq9SEJxZiUg

What Is Structure from Motion? 2D feature tracking; 3D estimation; optimization; geometry fitting.

Epipolar Geometry

Epipolar Geometry: epipolar plane.

Epipolar Geometry: epipolar line (used to search for corresponding points).

Epipolar Geometry: baseline. All epipolar planes pass through the baseline.

Epipolar Geometry: epipole. All epipolar lines pass through the epipole.

Two-Frame Structure from Motion (cont.): The projective geometry formed by viewing the same object from two different viewpoints is called epipolar geometry. Warning: mathematics ahead.

Two-Frame Structure from Motion (cont.): The cross product yields a vector perpendicular to both inputs; the dot product of two perpendicular vectors is 0.

Two-Frame Structure from Motion (cont.): Take the cross product with t.

Two-Frame Structure from Motion (cont.): The fundamental matrix incorporates the camera parameters. https://www.csie.ntu.edu.tw/~cyy/courses/vfx/05spring/lectures/scribe/08scribe2.pdf
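
A short numerical check of the epipolar constraint, written as a sketch under the usual textbook assumptions rather than as the slides' own derivation. If a point has coordinates p0 in camera 0 and p1 = R p0 + t in camera 1, taking the cross product of both sides with t removes the t term, and taking the dot product with p1 then gives zero. In normalized image coordinates this reads x1^T E x0 = 0 with E = [t]x R, and in pixel coordinates x1^T F x0 = 0 with F = K^-T E K^-1. The pose, the intrinsics, and the assumption that both views share the same K are illustrative.

```python
import numpy as np

def skew(t):
    """Return [t]x, the matrix such that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical relative pose: small rotation about the y axis plus a translation.
a = 0.1
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.2, 0.1])
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

E = skew(t) @ R                       # essential matrix
K_inv = np.linalg.inv(K)
F = K_inv.T @ E @ K_inv               # fundamental matrix (same K in both views)

p0 = np.array([0.3, -0.4, 5.0])       # 3D point in camera-0 coordinates
p1 = R @ p0 + t                       # the same point in camera-1 coordinates
x0_hat, x1_hat = p0 / p0[2], p1 / p1[2]     # normalized image coordinates
x0_pix, x1_pix = K @ x0_hat, K @ x1_hat     # homogeneous pixel coordinates

res_E = x1_hat @ E @ x0_hat           # epipolar residual, normalized coordinates
res_F = x1_pix @ F @ x0_pix           # epipolar residual, pixel coordinates
```

Both residuals vanish up to floating-point error, which is the sense in which E encodes only the relative pose while F also absorbs the intrinsics.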

Two-Frame Structure from Motion (cont.): In the matrix, some entries are products of image measurements, such as x_i0 y_i1, and others are direct image measurements (or even the identity). Given N ≥ 8 such equations, we can compute an estimate (up to scale) for the entries of E.
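
A minimal sketch of the eight-point idea, assuming matches are given in normalized homogeneous coordinates and that each match contributes one linear equation x1^T E x0 = 0 in the nine entries of E. The null vector of the stacked system is taken from an SVD, and the result is then projected onto the set of valid essential matrices (two equal singular values, one zero). Function names, the synthetic pose, and the random points are illustrative assumptions.

```python
import numpy as np

def skew(t):
    """Return [t]x, the matrix such that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def eight_point(x0, x1):
    """Estimate E (up to scale) from N >= 8 matches given as (N, 3) arrays."""
    # Row i holds the products x1_i[a] * x0_i[b], matching E flattened row by row.
    Z = np.stack([np.outer(b, a).ravel() for a, b in zip(x0, x1)])
    _, _, Vt = np.linalg.svd(Z)
    E = Vt[-1].reshape(3, 3)              # right singular vector of the smallest singular value
    U, S, Vt = np.linalg.svd(E)
    s = 0.5 * (S[0] + S[1])
    return U @ np.diag([s, s, 0.0]) @ Vt  # enforce singular values (s, s, 0)

# Synthetic, noise-free test data (hypothetical pose and points).
rng = np.random.default_rng(0)
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([1.0, 0.0, 0.3])
E_true = skew(t) @ R

P0 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(12, 3))
P1 = P0 @ R.T + t
x0 = P0 / P0[:, 2:3]
x1 = P1 / P1[:, 2:3]

E_est = eight_point(x0, x1)
residuals = np.einsum("ni,ij,nj->n", x1, E_est, x0)
```

The estimate agrees with E_true only up to scale and sign, which is why the overall scale of the translation cannot be read off from E.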

Two-Frame Structure from Motion (cont.) https://www.csie.ntu.edu.tw/~cyy/courses/vfx/16spring/lectures/

Singular Value Decomposition (Brief) https://ccjou.wordpress.com/2009/09/01/%E5%A5%87%E7%95%B0%E5%80%BC%E5%88%86%E8%A7%A3-svd/

Two-Frame Structure from Motion (cont.): If the measurements are noisy, the terms that are products of measurements have their noise amplified by the other element in the product, which leads to poor scaling. To deal with this, it is suggested that the point coordinates be translated and scaled so that their centroid lies at the origin and their variance is unity.

Two-Frame Structure from Motion (cont.): Here n is the number of points. Once the essential matrix has been computed from the transformed coordinates, the original essential matrix E can be recovered by undoing the two normalizing transformations.
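
A sketch of this normalization under one common convention, stated here as an assumption: each point set is mapped by a 3×3 transform T so that the transformed coordinates have zero centroid and satisfy sum(x~^2 + y~^2) = 2n. If E~ is estimated from the transformed points x~0 = T0 x0 and x~1 = T1 x1, then x~1^T E~ x~0 = x1^T (T1^T E~ T0) x0, so the original matrix is E = T1^T E~ T0. The function names and the random test data are illustrative.

```python
import numpy as np

def normalizing_transform(pts):
    """pts: (n, 2) array. Return the 3x3 similarity T that centers and rescales the points."""
    mu = pts.mean(axis=0)
    s = np.sqrt(2.0 * len(pts) / np.sum((pts - mu) ** 2))
    return np.array([[s, 0.0, -s * mu[0]],
                     [0.0, s, -s * mu[1]],
                     [0.0, 0.0, 1.0]])

def denormalize(E_tilde, T0, T1):
    """Map a matrix estimated in normalized coordinates back to the original ones."""
    return T1.T @ E_tilde @ T0

# Hypothetical pixel measurements in two images.
rng = np.random.default_rng(1)
n = 20
pts0 = rng.uniform(0.0, 640.0, size=(n, 2))
pts1 = rng.uniform(0.0, 640.0, size=(n, 2))
T0, T1 = normalizing_transform(pts0), normalizing_transform(pts1)

h0 = np.c_[pts0, np.ones(n)]          # homogeneous coordinates
h1 = np.c_[pts1, np.ones(n)]
h0_t, h1_t = h0 @ T0.T, h1 @ T1.T     # transformed coordinates

E_tilde = rng.normal(size=(3, 3))     # stand-in for a matrix estimated from h0_t, h1_t
E = denormalize(E_tilde, T0, T1)
```

The stand-in E_tilde is random on purpose: the identity between the two bilinear forms holds for any 3×3 matrix, so the check does not depend on the estimator used.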

Two-Frame Structure from Motion (cont.): Once the essential matrix has been recovered, the direction of the translation vector t can be estimated. The absolute distance between the two cameras can never be recovered from pure image measurements alone. Ground control points in photogrammetry (knowledge about absolute camera or point positions, or distances) are required to establish the final scale, position, and orientation.
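
A small sketch of how the translation direction can be read off from E, assuming the form E = [t]x R. Since t^T [t]x = 0, the vector t spans the left null space of E, so it is the left singular vector belonging to the zero singular value. Only a unit vector with ambiguous sign comes out, which matches the statement that absolute scale is unrecoverable. The pose values are illustrative.

```python
import numpy as np

def skew(t):
    """Return [t]x, the matrix such that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def translation_direction(E):
    """Return a unit vector along t (sign ambiguous) from an essential matrix."""
    U, _, _ = np.linalg.svd(E)
    return U[:, 2]                    # left singular vector of the smallest singular value

# Hypothetical relative pose.
a = 0.3
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a), np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([2.0, -1.0, 0.5])
E = skew(t) @ R

t_hat = translation_direction(E)
t_unit = t / np.linalg.norm(t)
```

Resolving the sign (and choosing among the candidate rotations) is usually done by requiring reconstructed points to lie in front of both cameras, which this sketch leaves out.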

Pure Translation: Figure 7.9: Pure translation camera motion results in visual motion where all the points move towards (or away from) a common focus of expansion (FOE).

Focus of Expansion https://www.youtube.com/watch?v=GIUDAZLfYhY

Pure Translation (cont.): Known rotation: the resulting essential matrix E is (in the noise-free case) skew-symmetric and can be estimated more directly by setting e_ij = -e_ji and e_ii = 0. Two-point parallax now suffices to estimate the FOE.

Pure Translation (cont.): A more direct derivation of the FOE estimate can be obtained by minimizing the triple product, which is equivalent to finding the null space of a set of linear equations in the FOE coordinates.
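
A sketch of the null-space formulation, assuming pure translation (R = I) and homogeneous image points x_i0, x_i1. Each match must satisfy (x_i0 × x_i1) · e = 0, because the two viewing rays and the translation direction are coplanar; stacking the cross products as rows and taking the smallest right singular vector gives e up to scale. The function name and the synthetic motion are illustrative assumptions.

```python
import numpy as np

def estimate_foe(x0, x1):
    """Estimate the FOE (homogeneous, up to scale) from (N, 3) matched points, N >= 2."""
    A = np.cross(x0, x1)              # row i is x_i0 x x_i1
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]                     # null vector of the stacked equations

# Synthetic forward motion with a slight sideways drift (hypothetical values).
rng = np.random.default_rng(2)
t = np.array([0.2, 0.1, 1.0])
P0 = rng.uniform([-2.0, -2.0, 5.0], [2.0, 2.0, 15.0], size=(6, 3))
P1 = P0 + t                           # pure translation, no rotation
x0 = P0 / P0[:, 2:3]
x1 = P1 / P1[:, 2:3]

e = estimate_foe(x0, x1)
foe = e[:2] / e[2]                    # FOE in normalized image coordinates
```

In this noise-free example the FOE sits at (t_x / t_z, t_y / t_z) = (0.2, 0.1); with only two matches the same code still works, in line with the two-point parallax remark.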

Pure Translation (cont.): When a large number of points at infinity are available (i.e., when the camera motion is small compared to the distance to far-away objects), this suggests a strategy: pick a pair of points to estimate a rotation, hoping that both points lie at infinity (very far from the camera). Then compute the FOE and check whether the residual error is small and whether the motions towards or away from the epipole (FOE) are all in the same direction.

Pure Rotation: This results in a degenerate estimate of the essential matrix E and of the translation direction. If the rotation matrix is known, the estimates for the FOE will be degenerate, since each point in the second image is then just the rotated version of its match in the first image, and hence the system of equations for the FOE is degenerate.

Example: Structure from Motion https://www.youtube.com/watch?v=SATijfXnshg CSE 576, Spring 2008

Self-calibration https://www.youtube.com/watch?v=p1kCR1i2nF0 Auto-calibration was developed for converting a projective reconstruction into a metric one, which is equivalent to recovering the unknown calibration matrix K_j associated with each image. In the presence of additional information about the scene, different methods can be applied. If there are parallel lines in the scene, three or more vanishing points, which are the images of points at infinity, can be used to establish the homography for the plane at infinity, from which focal length and rotation can be recovered.

Self-calibration (cont.): In the absence of external information, consider all sets of camera matrices P_j = K_j [R_j | t_j] projecting world coordinates p_i = (X_i, Y_i, Z_i, W_i) into screen coordinates x_ij ~ P_j p_i. Consider transforming the 3D scene {p_i} through an arbitrary 4×4 projective transformation, yielding a new model consisting of transformed points.
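
A small numerical illustration of why this transformation matters, sketched under the assumption that the transformed points are p' = H p for an invertible 4×4 matrix H. Multiplying each camera matrix on the right by the inverse of H gives P' = P H^-1, and the projections are unchanged because P' p' = P H^-1 H p = P p. Images alone therefore cannot distinguish the two models. The symbol H and the random values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

P = rng.normal(size=(3, 4))                              # an arbitrary 3x4 camera matrix
H = np.eye(4) + rng.uniform(-0.1, 0.1, size=(4, 4))      # invertible projective transformation
p = np.array([0.5, -0.2, 3.0, 1.0])                      # homogeneous 3D point

x = P @ p                                 # projection in the original model
P_new = P @ np.linalg.inv(H)              # transformed camera
p_new = H @ p                             # transformed point
x_new = P_new @ p_new                     # projection in the transformed model
```

The two projections agree to floating-point precision, so extra constraints on K_j (such as zero skew or a known optical center) are what self-calibration relies on to single out a metric reconstruction.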

Self-calibration (cont.): A technique exists that can recover the focal lengths (f_0, f_1) of both images from the fundamental matrix F in a two-frame reconstruction. It assumes that the camera has zero skew, a known aspect ratio, and a known optical center. Most cameras have square pixels and an optical center near the middle of the image, and are more likely to deviate from the simple camera model because of radial distortion. Problems occur when images have been cropped off-center.

Application: View Morphing: An application of basic two-frame structure from motion, also known as view interpolation. It is used to generate a smooth 3D animation from one view of a 3D scene to another. To create such a transition, smoothly interpolate the camera matrices, i.e., the camera position, orientation, and focal lengths. A more pleasing effect is obtained by easing in and easing out the camera parameters. https://www.youtube.com/watch?v=pPlkdRK8haU

Application: View Morphing (cont.): Triangulate the set of matched feature points in each image. To generate in-between frames, establish a full set of 3D correspondences or 3D models for each reference view. As the 3D points are re-projected into their intermediate views, pixels can be mapped from their original source images to their new views using an affine or projective mapping. The final image is then composited using a linear blend of the two reference images, as with usual morphing.

Factorization: When processing video sequences, we often get extended feature tracks from which it is possible to recover the structure and motion using a process called factorization.

Factorization (cont.): Figure 7.10: 3D reconstruction of a rotating ping pong ball using factorization (Tomasi and Kanade 1992): (a) sample image with tracked features overlaid; (b) sub-sampled feature motion stream; (c) two views of the reconstructed 3D model.

Factorization (cont.): A disadvantage of factorization approaches is that they require a complete set of tracks, i.e., each point must be visible in each frame, in order to work.

Factorization (cont.): matrix dimensions of the terms in the projection equation: 2×1, 2×3, 3×1, 2×1.

Factorization (cont.): Warning: mathematics ahead.

Factorization (cont.): Shifting. Suppose we have n features in one image:

Factorization (cont.): Now, we have m images:

Factorization (cont.) https://www.csie.ntu.edu.tw/~cyy/courses/vfx/16spring/lectures/
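
A minimal sketch of the factorization step in the spirit of Tomasi and Kanade, assuming an affine (e.g., orthographic) camera and complete tracks. The n features in m images are stacked into a 2m × n measurement matrix, each row is shifted to zero mean (which removes the per-frame translation), and a rank-3 SVD splits the result into a 2m × 3 motion matrix M and a 3 × n structure matrix S. The result is only defined up to a 3×3 linear ambiguity; the metric upgrade that resolves it is omitted here. Names and synthetic data are illustrative.

```python
import numpy as np

def factorize(W):
    """W: (2m, n) stacked feature tracks. Return M (2m, 3), S (3, n), singular values."""
    W_c = W - W.mean(axis=1, keepdims=True)       # shift each row to zero mean
    U, s, Vt = np.linalg.svd(W_c, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root                           # motion (camera) factor
    S = root[:, None] * Vt[:3]                    # structure (shape) factor
    return M, S, s

# Synthetic affine projections: m frames, n points (hypothetical values).
rng = np.random.default_rng(4)
m, n = 5, 12
M_true = rng.normal(size=(2 * m, 3))
S_true = rng.normal(size=(3, n))
offsets = rng.normal(size=(2 * m, 1))             # per-row image translation
W = M_true @ S_true + offsets

M, S, s = factorize(W)
W_c = W - W.mean(axis=1, keepdims=True)
```

In the noise-free case the centered measurement matrix has exactly rank 3, so the fourth singular value is numerically zero; with noisy tracks, truncating to rank 3 gives the best least-squares approximation.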

Perspective and Projective Factorization: A disadvantage of factorization is that it cannot deal with perspective cameras. One remedy is to perform an initial affine (e.g., orthographic) reconstruction and then correct for the perspective effects in an iterative manner.

What Is Structure from Motion? 2D feature tracking; 3D estimation; optimization; geometry fitting.

Bundle Adjustment: The most accurate way to recover structure and motion is to perform robust nonlinear minimization of the measurement (re-projection) errors, which is known in the photogrammetry (and computer vision) communities as bundle adjustment. Our feature location measurement x_ij now depends not only on the point (track) index i but also on the camera pose index j: x_ij = f(p_i, R_j, c_j, K_j). The 3D point positions p_i are also updated simultaneously.

Bundle Adjustment (cont.): Minimize the error between the predicted x_ij = f(p_i, R_j, c_j, K_j) and the real measurement.

Levenberg-Marquardt: a blend of Newton's method (in practice, its Gauss-Newton approximation) and the steepest descent method. https://blog.csdn.net/liu14lang/article/details/53991897
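
A compact sketch of a Levenberg-Marquardt loop, shown on a toy curve-fitting problem rather than on a full bundle adjustment. It assumes one common damping variant, which scales the damping by the diagonal of J^T J: a small damping factor gives nearly a Gauss-Newton step, a large one gives a short gradient-descent-like step. The function names, the update schedule (divide or multiply the damping by 10), and the exponential model are illustrative choices.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, n_iter=50, lam=1e-3):
    """Minimize sum(residual(x)**2) starting from x0."""
    x = np.asarray(x0, dtype=float)
    cost = np.sum(residual(x) ** 2)
    for _ in range(n_iter):
        r, J = residual(x), jacobian(x)
        A = J.T @ J                                    # Gauss-Newton approximation of the Hessian
        damped = A + lam * np.diag(np.diag(A))         # damping blends in a scaled gradient step
        step = np.linalg.solve(damped, -J.T @ r)
        new_cost = np.sum(residual(x + step) ** 2)
        if new_cost < cost:                            # accept: trust the quadratic model more
            x, cost, lam = x + step, new_cost, lam / 10.0
        else:                                          # reject: fall back towards steepest descent
            lam *= 10.0
    return x

# Toy problem: fit y = a * exp(b * u) to noise-free samples with a = 2, b = -1.5.
u = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * u)

def residual(x):
    return x[0] * np.exp(x[1] * u) - y

def jacobian(x):
    return np.stack([np.exp(x[1] * u), x[0] * u * np.exp(x[1] * u)], axis=1)

x_fit = levenberg_marquardt(residual, jacobian, x0=[1.0, 0.0])
```

In bundle adjustment the residual would be the stacked re-projection errors over all observations and x would hold every camera and point parameter, but the accept/reject logic stays the same.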

Exploiting Sparsity: Large bundle adjustment problems, such as those involving 3D scenes reconstructed from thousands of Internet photographs, can require solving nonlinear least-squares problems with millions of measurements. Structure from motion is a bipartite problem in structure and motion: each feature point x_ij in a given image depends on one 3D point position p_i and one 3D camera pose (R_j, c_j).
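
A sketch of the resulting Jacobian sparsity pattern, assuming 6 pose parameters per camera, 3 parameters per point, fixed intrinsics, and 2 residual rows per observation. Because each observation touches exactly one camera block and one point block, every row has only 9 potentially nonzero entries regardless of problem size. The parameter layout (all cameras first, then all points) and the function name are illustrative assumptions.

```python
import numpy as np

def ba_sparsity(n_cams, n_pts, observations, cam_dim=6, pt_dim=3):
    """Boolean mask of possibly nonzero Jacobian entries.

    observations: list of (i, j) pairs meaning point i is seen in camera j.
    """
    rows = 2 * len(observations)
    cols = cam_dim * n_cams + pt_dim * n_pts
    mask = np.zeros((rows, cols), dtype=bool)
    for k, (i, j) in enumerate(observations):
        r = slice(2 * k, 2 * k + 2)                           # the two rows (x and y residual)
        mask[r, cam_dim * j: cam_dim * (j + 1)] = True        # camera-j block
        start = cam_dim * n_cams + pt_dim * i
        mask[r, start: start + pt_dim] = True                 # point-i block
    return mask

# Small example: 3 cameras, 4 points, every point visible in every camera.
n_cams, n_pts = 3, 4
observations = [(i, j) for j in range(n_cams) for i in range(n_pts)]
mask = ba_sparsity(n_cams, n_pts, observations)
fill = mask.mean()                                            # fraction of possibly nonzero entries
```

Even in this tiny example only 30% of the entries can be nonzero, and the fraction shrinks as cameras and points are added, which is what sparse solvers and Schur-complement tricks take advantage of.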

Uncertainty and Ambiguity: Structure from motion involves the estimation of many highly coupled parameters, often with no known "ground truth" components. The estimates produced by structure from motion algorithms can therefore exhibit large amounts of uncertainty. Example: the bas-relief ambiguity, which makes it hard to simultaneously estimate the 3D depth of a scene and the amount of camera motion.

Reconstruction from Internet Photos: A widely used application of structure from motion is the reconstruction of 3D objects and scenes from video sequences and collections of images. Before the structure from motion computation can begin, it is first necessary to establish sparse correspondences between different pairs of images and then to link such correspondences into feature tracks, which associate individual 2D image features with global 3D points. https://www.youtube.com/watch?v=i7ierVkXYa8

Reconstruction from Internet Photos (cont.): For the reconstruction process, it is important to select a good pair of images with a significant amount of out-of-plane parallax, to ensure that a stable reconstruction can be obtained.

Reconstruction from Internet Photos (cont.): Figure 7.15: Incremental structure from motion: starting with an initial two-frame reconstruction of the Trevi Fountain, batches of images are added using pose estimation, and their positions (along with the 3D model) are refined using bundle adjustment.

Reconstruction from Internet Photos (cont.): Figure 7.16: 3D reconstructions produced by the incremental structure from motion algorithm: (a) cameras and point cloud from Trafalgar Square; (b) cameras and points overlaid on an image from the Great Wall of China; (c) overhead view of a reconstruction of the Old Town Square in Prague registered to an aerial photograph.

Constrained Structure and Motion: If the object of interest is rotating around a fixed but unknown axis, specialized techniques can be used to recover this motion. In other situations, the camera itself may be moving in a fixed arc around some center of rotation. Specialized capture setups, such as mobile stereo camera rigs or moving vehicles equipped with multiple fixed cameras, can also take advantage of the knowledge that the individual cameras are mostly fixed with respect to the capture rig.

Constrained Structure and Motion (cont.): Line-based techniques: Pairwise epipolar geometry cannot be recovered from line matches alone, even if the cameras are calibrated. To see this, consider projecting the set of lines in each image into a set of 3D planes in space. You can move the two cameras around into any configuration and still obtain a valid reconstruction for the 3D lines.

Constrained Structure and Motion (cont.): When lines are visible in three or more views, the trifocal tensor can be used to transfer lines from one pair of images to another. The trifocal tensor can also be computed on the basis of line matches alone. For triples of images, the trifocal tensor is used to verify that the lines are in geometric correspondence before evaluating the correlations between line segments.

Constrained Structure and Motion (cont.): Figure 7.18: Two images of a toy house along with their matched 3D line segments.

End of Today