Presentation transcript:

Mobile Robot Localization and Mapping with Uncertainty using Scale-Invariant Visual Landmarks Paper – Stephen Se, David Lowe, Jim Little Presentation – Nicholas Moya

Decoding the Title Visual SLAM using SIFT features as landmarks. SLAM: Simultaneous Localization and Mapping. SIFT: Scale-Invariant Feature Transform. The robot finds SIFT features in its environment and uses them as landmarks to build a map and localize itself within that map.

Overview Previous Work SIFT Estimating Motion Results Improvements Conclusion

Previous Work

Previous Work SLAM definition: the process of simultaneously tracking the position of a mobile robot relative to its environment while building a map of that environment. Problem: the robot starts in an unknown location, observes landmarks, estimates its own location and the landmark locations, then builds a map of landmarks and uses them for localization. Early attempts were sensor-based: sonar – fast but crude; laser – accurate but slow to respond; vision – processor-intensive but higher accuracy and resolution.

Previous Work Recent attempts have been vision-based, but not without problems: monocular approaches create gaps in the map over long image sequences, could not run in real time, and were not computationally efficient enough for large environments. The approach in this paper uses high-level image features (corners) that are scale-invariant and distinctive enough to allow for more efficient map-building algorithms.

New SLAM Algorithm Identify SIFT landmarks (Scale-Invariant Feature Transform). Match landmarks between frames from different cameras using a trinocular stereo vision system (stereo means using more than one viewpoint) to obtain the 3D position of each landmark. Match landmarks between past and future frames to maintain a 3D map of SIFT features that is constantly updated and adaptive to changing environments. Estimate motion and position so the robot can localize itself within the map.

SIFT

SIFT SIFT definition: a method of determining local features in images, specifically features that are invariant to image translation, scaling, and rotation, and partially invariant to illumination changes and affine projection. Landmarks are determined by extracting SIFT features from each of the three camera images in a frame and stereo-matching them among the images.

Generating SIFT Features SIFT feature locations are determined by identifying repeatable points in a pyramid of scaled images (DoG: Difference of Gaussians): (1) smooth the image with a Gaussian kernel (σ = √2); (2) DoG image = original image − smoothed image; (3) the DoG image is resampled with 1.5-pixel spacing to produce the next level in the pyramid; (4) repeat until the pyramid is complete. Features are identified by detecting maxima and minima in the DoG pyramid.
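A minimal sketch of the pyramid construction just described, assuming a grayscale NumPy image. The σ = √2 smoothing, the DoG subtraction, the 1.5-pixel resampling, and the extrema detection follow the slide; the number of levels and the per-level 3x3 extrema test are illustrative simplifications.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom, maximum_filter, minimum_filter

def dog_pyramid(image, levels=4, sigma=np.sqrt(2), spacing=1.5):
    """Build a Difference-of-Gaussians (DoG) pyramid from a grayscale float image."""
    pyramid = []
    current = np.asarray(image, dtype=np.float64)
    for _ in range(levels):
        smoothed = gaussian_filter(current, sigma)        # step 1: Gaussian smoothing
        pyramid.append(current - smoothed)                # step 2: DoG = original - smoothed
        current = zoom(smoothed, 1.0 / spacing, order=1)  # step 3: resample at 1.5-pixel spacing
    return pyramid                                        # step 4: repeat until the pyramid is complete

def local_extrema(dog_image):
    """Mark pixels that are maxima or minima of their 3x3 neighbourhood in one DoG level."""
    is_max = dog_image == maximum_filter(dog_image, size=3)
    is_min = dog_image == minimum_filter(dog_image, size=3)
    return is_max | is_min
```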

Matching using the Triclops system Three cameras are arranged in an 'L' shape. SIFT features are matched by passing several constraints: vertically, within 1 pixel of each other; horizontally, within 20 pixels of each other; orientations must agree within 20 degrees; and the feature must be found within one scale level of the other (at most one level higher or lower). The features that pass are kept as landmarks. The (x, y, z) coordinates of the landmarks give the position of the robot relative to its environment.
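A small sketch of the constraint tests above for one camera pair of the 'L'-shaped rig (here the horizontally separated pair). The thresholds mirror the slide; the Feature container, its field names, and the disparity sign convention are illustrative assumptions, not the Triclops API.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    row: float          # image row in pixels
    col: float          # image column in pixels
    scale: int          # pyramid level at which the feature was found
    orientation: float  # orientation in degrees

def horizontal_pair_match(left: Feature, right: Feature) -> bool:
    """Constraint checks for the horizontally separated camera pair."""
    disparity = left.col - right.col
    return (abs(left.row - right.row) <= 1.0                       # vertically: within 1 pixel
            and 0.0 < disparity <= 20.0                            # horizontally: within 20 pixels
            and abs(left.orientation - right.orientation) <= 20.0  # within 20 degrees of orientation
            and abs(left.scale - right.scale) <= 1)                # at most one pyramid level apart
```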

Ego-Motion Estimation

Ego-Motion Estimation After SIFT stereo matching, the following information is known for each SIFT feature: (rm, cm), the measured image coordinates in the reference camera; (s, o, d), the scale, orientation, and disparity of the feature (disparity: think of parallax); and (X, Y, Z), the 3D coordinates of the landmark relative to the camera. SIFT features must be matched between frames to estimate motion. But how do we match SIFT features between frames?

Matching SIFT Features Predict where the feature will appear from its relative 3D position ((u0, v0) are the image center coordinates), predict the expected disparity (I is the interocular distance, f the focal length), and predict the expected scale. Then search for a matching SIFT landmark based on these criteria: within a 10x10 pixel region; scales S and S' within 20% of each other; orientation within 20 degrees; disparities d and d' within 20%.
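The slide's prediction equations did not survive the transcript, so the sketch below uses a standard pinhole camera model to illustrate them. The symbols (u0, v0), f, and I follow the slide, but the sign conventions and the scale prediction (scale proportional to Z_prev/Z) are assumptions, not the paper's verbatim formulas.

```python
def predict_feature(X, Y, Z, u0, v0, f, I, s_prev, Z_prev):
    """Predict image position, disparity and scale of a known landmark after robot motion."""
    c = u0 + f * X / Z          # expected column (pinhole projection)
    r = v0 + f * Y / Z          # expected row
    d = f * I / Z               # expected disparity, I = interocular distance
    s = s_prev * Z_prev / Z     # expected scale: a closer landmark appears at a larger scale
    return r, c, s, d

def is_frame_match(pred, obs):
    """Apply the search criteria from the slide to predicted vs. observed (r, c, s, o, d)."""
    r, c, s, o, d = pred
    r2, c2, s2, o2, d2 = obs
    return (abs(r - r2) <= 5 and abs(c - c2) <= 5        # within a 10x10 pixel region
            and abs(s - s2) / max(s, s2) <= 0.2          # scales S and S' within 20%
            and abs(o - o2) <= 20                        # orientation within 20 degrees
            and abs(d - d2) / max(d, d2) <= 0.2)         # disparities d and d' within 20%
```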

Least-Squares Minimization Least squares is used to determine camera movement: it tells us the x, y, z, yaw, pitch and roll that bring the matched SIFT features into best alignment. P = (x, y, z, yaw, pitch, roll) is the motion vector; e_ri and e_ci are the row and column errors of the ith match; a hat denotes the least-squares estimate; E_i is the residual error of the ith match. Features with a large E_i are eliminated.
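A generic sketch of this step. It assumes the linearized row/column error equations of all matches have already been stacked into a matrix A (two rows per match) and a vector b, solves for the motion P, and re-solves after dropping matches with a large residual; the max_residual threshold is an illustrative value, not the paper's.

```python
import numpy as np

def estimate_motion(A, b, max_residual=2.0):
    """Solve for the 6-DoF motion P = (x, y, z, yaw, pitch, roll) and drop outliers."""
    P, *_ = np.linalg.lstsq(A, b, rcond=None)      # least-squares estimate (the "hat" solution)
    errors = (A @ P - b).reshape(-1, 2)            # (row error e_ri, column error e_ci) per match
    E = np.linalg.norm(errors, axis=1)             # residual error E_i of each match
    inliers = E < max_residual                     # features with a large E_i are eliminated
    if not inliers.all():
        keep = np.repeat(inliers, 2)               # two equation rows per match
        P, *_ = np.linalg.lstsq(A[keep], b[keep], rcond=None)
    return P, inliers
```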

Figure: b – original position, a – 10° left, c – 10° right, e – 60 cm in front, d – 60 cm in front and 10° left, f – 60 cm in front and 10° right.

Database Each matched SIFT feature is logged in a database of landmarks used to match features in future views. Specifically, [x, y, z, s, o, l] is logged for each landmark. Each landmark has a miss count: the number of times it was expected to be matched but was not. If the miss count reaches a threshold N, the landmark is pruned from the database.
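A bare-bones sketch of this bookkeeping: each entry carries position, scale, and orientation plus a miss count, and entries are pruned once the count reaches the threshold N. The class and field names are placeholders, and the slide's l field (used later for the consecutive miss count) is omitted here.

```python
from dataclasses import dataclass

@dataclass
class Landmark:
    x: float
    y: float
    z: float
    s: float             # scale
    o: float             # orientation
    miss_count: int = 0  # times the landmark was expected to be matched but was not

class LandmarkDatabase:
    def __init__(self, miss_threshold: int):
        self.N = miss_threshold
        self.landmarks: list[Landmark] = []

    def add(self, lm: Landmark) -> None:
        self.landmarks.append(lm)

    def record_miss(self, lm: Landmark) -> None:
        lm.miss_count += 1

    def prune(self) -> None:
        """Remove landmarks whose miss count has reached the threshold N."""
        self.landmarks = [lm for lm in self.landmarks if lm.miss_count < self.N]
```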

Results

Experiment Set the height-change parameter to 0 (the robot is not ascending or descending in elevation). Manually drive the robot one loop around a chair and back. This tests only SIFT features and ego-motion, not navigation. In the end, 249 frames of 320x240 pixels were taken, with 3590 SIFT landmarks logged; each iteration took 0.5 seconds, with 40-60 matches per frame.

Improvements

Improvements – SIFT Database Include a consecutive miss count l and a general miss count m in the data for each SIFT landmark, plus n, the number of times it has been seen. Only include SIFT features with n >= 3 (a feature must be seen 3 or more times to be logged); this discards false positives due to noise. SIFT features are not invariant to changes in point of view (for example, if the orientation changes by more than 20 degrees), so the same object might go unmatched because it looks different from a different viewpoint. Add a view vector v that holds the scale and orientation (s, o) of the SIFT feature for each viewing direction.
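A sketch of the extended database entry suggested above, with the consecutive miss count l, general miss count m, sighting count n, and a per-view list of (scale, orientation) pairs standing in for the view vector v. Resetting l on a successful match is an assumption; the field names follow the slide.

```python
from dataclasses import dataclass, field

@dataclass
class LandmarkRecord:
    x: float
    y: float
    z: float
    l: int = 0          # consecutive miss count
    m: int = 0          # general (total) miss count
    n: int = 0          # number of times the landmark has been seen
    views: list[tuple[float, float]] = field(default_factory=list)  # (s, o) per viewing direction

    def add_view(self, s: float, o: float) -> None:
        """Record one more sighting, with the scale and orientation seen from this direction."""
        self.n += 1
        self.l = 0       # a successful match resets the consecutive miss count (assumption)
        self.views.append((s, o))

    def reliable(self) -> bool:
        return self.n >= 3   # only landmarks seen 3 or more times are kept
```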

Improvements – Error Modeling Quantify the reliability of the estimates by incorporating a covariance matrix into each SIFT landmark in the database and employing a Kalman filter for each landmark to update its state. Also use a covariance matrix for the robot position, gathered from the robot's odometry data (odometry: using data from motion sensors to estimate the change in position over time). When a match is found between frames, the landmark's covariance matrix is augmented with the robot pose covariance matrix and then combined with the existing covariance matrix in the database.
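A hedged sketch of the per-landmark update described above: the landmark's 3x3 covariance is inflated by the robot-pose covariance and then fused with the new measurement via a Kalman gain. The additive augmentation and the identity measurement model (the 3D position is observed directly) are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def update_landmark(mean, cov, measurement, meas_cov, robot_pose_cov):
    """Fuse a new 3D observation of a landmark with its current database estimate."""
    # Augment the landmark covariance with the robot pose uncertainty (assumed additive).
    predicted_cov = cov + robot_pose_cov
    # Kalman gain with an identity measurement model: we observe the 3D position directly.
    K = predicted_cov @ np.linalg.inv(predicted_cov + meas_cov)
    new_mean = mean + K @ (measurement - mean)
    new_cov = (np.eye(3) - K) @ predicted_cov
    return new_mean, new_cov
```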

Conclusion

Conclusion Vision-based SLAM using SIFT. Time and memory efficiency are very important. The algorithm runs at 2 Hz on 320x240 images, tested on a 700 MHz processor.