Outdoor SLAM using Visual Appearance and Laser Ranging
P. Newman, D. Cole and K. Ho, ICRA 2006
Presented by Jackie Libby. Advisor: George Kantor

Main Idea
- Laser and camera combined in a 3D SLAM system
- Laser: builds the 3D point-cloud map
- Camera: detects loop closure from sequences of images
- First working implementation in an outdoor environment

Outline
- SLAM framework
- Laser data representation
- How a loop is detected with vision
- How a loop is closed once detected
- Results

SLAM representation: state
- x(k+1|k) = state vector of vehicle poses from t = 1, …, k+1, given observations from t = 1, …, k
- x_vn = nth vehicle pose in the state vector
- u(k+1) = odometry between timesteps k and k+1
- ⊕ = SE(3) transformation composition operator, so the motion model is x_v(k+1|k) = x_v(k|k) ⊕ u(k+1)

SLAM representation: covariance
- P_v = covariance of the newly added vehicle state (bottom left)
- P_vp = ? (my guess: cross-covariance between the new state and all previous states)

Scan-match framework
- Nodding "yes-yes" laser: returns planar scans at different elevations
- Each vehicle pose corresponds to a laser scan: x_v(k) → S_k (timesteps are not constant)
- Matching two scans yields a rigid transformation: x_v(i), x_v(j) → S_i, S_j → T_i,j
- Each T_i,j is treated as an observation → EKF update equations
- For sequential i, j → T_i,j can replace the odometry u
- For loop closing → j >> i, and T_i,j is used to correct the position

State vector with growing uncertainty
[Figure: outdoor data set on an x, y, z grid (20 m marks), with 1-sigma x, y, z uncertainty ellipsoids showing the effect of long loops]
How to detect loop closure:
1) SLAM pdf, small Mahalanobis distance (gating sketch below): the ellipsoids are overconfident → this method fails
2) Vision system: matches image sequences by similarity in appearance
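
A minimal sketch of the Mahalanobis gate in method 1; the function name, arguments, and the chi-square threshold are my own illustrative choices, not from the paper:

```python
import numpy as np

def mahalanobis_gate(x_i, x_j, P_ij, thresh=7.815):
    """Declare poses x_i, x_j "possibly the same place" if their
    Mahalanobis distance is small. thresh is an assumed chi-square
    value (3 dof, 5%); P_ij is the covariance of x_i - x_j. As noted
    above, the SLAM ellipsoids become overconfident on long loops,
    so this gate fails exactly when it is needed most."""
    d = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return d @ np.linalg.inv(P_ij) @ d < thresh
```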

My work with SLAM: CASC
- Advisor: George Kantor
- With Sanjiv Singh, Marcel Bergerman, Ben Grocholsky, Brad Hamner
- Retroreflective cones
- "No-no" nodding SICK laser

Detecting loop closure (high level)
- Assumption: if two images look similar, they were taken close together in space; if they are not close in time, a loop is detected
- Problem: a single image pair can be a false positive
  - "repetitious low-level descriptors" / "common texture": leaves, bricks on buildings
  - "background similarity" / "common large-scale features": plants, windows
- Solution: sequences of image pairs increase confidence

Similarity between two images: the pipeline
Image I_u → detect interest points → SIFT descriptors → clustered into words → vocabulary = set of all words → assign a weight to each word → image = vector of weights → similarity with image I_v

Detect interest points (slide credit: Martial Hebert)
- Harris Affine detector: scale invariant, affine invariant
- Example: corners → still a corner from any angle or scale

SIFT descriptors {d_1, …, d_n} (slide credit: Martial Hebert)
- 16x16 pixel window around each interest point
- Assign each pixel a gradient orientation (one of 8 values)
- For each 4x4 sub-window, make a histogram of orientations
- 16 histograms * 8 values = 128 = dimension of the SIFT vector (see the sketch below)
- (ignore the blue circles in the figure)
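
A minimal Python sketch of the descriptor layout just described, under stated assumptions: it omits real SIFT's rotation to the dominant orientation, Gaussian weighting, trilinear interpolation, and clamping/renormalization, and the function name is my own:

```python
import numpy as np

def sift_like_descriptor(patch):
    """Histogram-of-gradient-orientations layout from the slide:
    a 16x16 patch, 4x4 sub-windows, 8 orientation bins each,
    giving 16 * 8 = 128 dimensions."""
    gy, gx = np.gradient(patch.astype(float))     # per-pixel gradients
    mag = np.hypot(gx, gy)                        # gradient magnitudes
    # Quantize each pixel's gradient orientation into one of 8 bins.
    bins = (np.arctan2(gy, gx) % (2 * np.pi)) // (np.pi / 4)
    bins = bins.astype(int) % 8
    desc = np.zeros(128)
    for by in range(4):                           # 4x4 grid of sub-windows
        for bx in range(4):
            sb = bins[4*by:4*by+4, 4*bx:4*bx+4].ravel()
            sm = mag[4*by:4*by+4, 4*bx:4*bx+4].ravel()
            hist = np.bincount(sb, weights=sm, minlength=8)
            desc[(by*4 + bx)*8:(by*4 + bx)*8 + 8] = hist
    return desc / (np.linalg.norm(desc) + 1e-12)  # unit length
```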

SIFT Descriptors {d 1, …, d n }Clustered into words,Vocab = set of all words SIFT Descriptors: n is different for different images word, = {d 1, …, d k } clustering happens in an offline learning process Vocabulary, V future work: different vocabularies for different settings urban, park, indoors

Vocabulary = set of all words → weight for each word
- The more frequent the word, the less descriptive it is
- Inverse frequency weighting scheme*: w_i = ln(N / n_i) (code below)
  - w_i = weight for index word i
  - N = total number of images
  - n_i = number of images in which this word appears
* K. S. Jones, "Exhaustivity and specificity," Journal of Documentation, vol. 28, no. 1, pp. 11–21, 1972.
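
The weighting in code form, assuming the standard inverse document frequency formula reconstructed above:

```python
import numpy as np

def idf_weights(n_i, N):
    """w_i = ln(N / n_i): words appearing in many images get a low
    weight; rare, descriptive words get a high one. n_i holds, per
    vocabulary word, the number of images containing it (assumed > 0)."""
    return np.log(N / np.asarray(n_i, dtype=float))
```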

Weight for each word → image = vector of weights
- The image has been transformed into a vector (sketch below)
- The vector is long (length |V|), but many of its elements are zero
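
A minimal sketch of turning one image into its weight vector, reusing the hypothetical quantize and idf_weights helpers from the previous sketches:

```python
import numpy as np

def image_vector(word_indices, weights):
    """Count how often each visual word occurs in the image, then scale
    by the word weights w_i. The result has length |V| but is mostly
    zeros, as noted above."""
    counts = np.bincount(word_indices, minlength=len(weights))
    return counts * weights
```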

Similarity between images
- Cosine distance: S(I_u, I_v) = (v_u · v_v) / (|v_u| |v_v|), so cos(0) = 1 for identical directions
- The closer the vectors → the smaller the angle between them → the greater the similarity (code below)
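
The cosine similarity as a one-liner, for completeness:

```python
import numpy as np

def similarity(v_u, v_v):
    """S(I_u, I_v) = cos of the angle between the two weight vectors:
    1 for identical directions, near 0 for unrelated images."""
    return float(np.dot(v_u, v_v) /
                 (np.linalg.norm(v_u) * np.linalg.norm(v_v) + 1e-12))
```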

Pipeline recap: Image I_u → detect interest points → SIFT descriptors {d_1, …, d_n} → clustered into words → vocabulary = set of all words → weight for each word → image = vector of weights → similarity with image I_v

Similarity Matrix
- Similarity matrix M, with M_i,j = S(i,j); darker means more similar (construction sketch below)
- Axes = timesteps (the same for x and y): each image is compared against every other one, so the main diagonal is the line of reflection
- Loop closure = off-diagonal streaks: a_i ↔ b_i, a_i+1 ↔ b_i+1, …
- Boxes = false positives, a one-to-many mapping: a_i ↔ b_i+1, b_i+2, b_i+3, …; b_i ↔ a_i+1, a_i+2, a_i+3, …
- Causes: vehicle stopped; repetitive low-level structure (windows, bricks, leaves); distant images
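
A sketch of building M from the per-image weight vectors; normalizing the rows first makes each entry exactly the cosine similarity:

```python
import numpy as np

def similarity_matrix(image_vectors):
    """M[i, j] = S(i, j) for every image pair. image_vectors is an
    (n_images, |V|) array, one weight vector per row. M is symmetric
    about the main diagonal; loop closures appear as off-diagonal
    streaks, false positives as boxes."""
    V = np.asarray(image_vectors, dtype=float)
    Vn = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    return Vn @ Vn.T
```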

Sequence extraction (finding streaks)
- Modified Smith-Waterman algorithm (dynamic programming; sketch below)
- Penalty terms avoid boxes but allow curved lines (i.e., changes in velocity)
- The α term allows gaps
- Maximum H_i,j = η_A,B, the maximal cumulative similarity
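
A plain Smith-Waterman-style accumulation, as a sketch only: the paper's modifications (the box-avoiding penalty terms and the α gap term) are not reproduced, and the gap penalty value is an assumption:

```python
import numpy as np

def sequence_score(M, gap=0.1):
    """Dynamic-programming accumulation over the similarity matrix.
    Diagonal moves extend a streak by the local similarity; row/column
    moves model velocity changes and pay a gap penalty; scores never
    go negative, so weak regions reset to zero."""
    n, m = M.shape
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i, j] = max(0.0,
                          H[i-1, j-1] + M[i-1, j-1],  # extend diagonally
                          H[i-1, j] - gap,            # skip an image in one sequence
                          H[i, j-1] - gap)            # ... or in the other
    return H.max()  # eta_{A,B}: maximal cumulative similarity
```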

Removing "common mode similarity" (finding boxes)
- Decompose M into a sum of outer products (sketch below)
- "Dominant structure" = repetitive structure; the dominant structure lies in the largest eigenvalues/eigenvectors
- Analogy to eigenfaces; the figure shows the first three outer products
- More repetition → more range in the eigenvalues
- How many components to remove is chosen from the relative significance of the eigenvalues by maximizing entropy (formulas on the original slide did not survive transcription)
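
A sketch of the decomposition step, assuming a fixed number k of dominant components to subtract instead of the paper's entropy criterion:

```python
import numpy as np

def remove_common_mode(M, k=1):
    """Write the symmetric matrix M as a sum of outer products
    (eigendecomposition) and subtract the k most dominant rank-1
    terms, where repetitive "common mode" texture concentrates."""
    vals, vecs = np.linalg.eigh(M)             # M = sum_i vals[i] * v_i v_i^T
    order = np.argsort(np.abs(vals))[::-1]     # most dominant first
    M_clean = M.copy()
    for i in order[:k]:
        M_clean -= vals[i] * np.outer(vecs[:, i], vecs[:, i])
    return M_clean
```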

Sequence Significance
- Problem: a maximal H_i,j = η_A,B does not by itself mean there is a loop
- Solution: randomly shuffle the rows and columns of M and recompute η_A,B (sketch below)
- Look at the distribution of shuffled scores: an extreme value distribution (EVD)
- This gives the probability that the sequence could be random (η_A,B = real score, η = random score)
- Threshold at 0.5%
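
An empirical stand-in for the shuffling test; it reuses sequence_score from the earlier sketch and replaces the fitted extreme value distribution with a simple tail count:

```python
import numpy as np

def significance(M, eta_real, trials=200, seed=0):
    """Shuffle the rows and columns of M, recompute eta each time, and
    report how often chance reaches the real score. A small value
    (the paper thresholds at 0.5%) means the streak is unlikely to be
    random, so the loop hypothesis is accepted."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    etas = [sequence_score(M[np.ix_(rng.permutation(n), rng.permutation(m))])
            for _ in range(trials)]
    return float(np.mean(np.asarray(etas) >= eta_real))
```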

Estimating Loop Closure Geometry
- We have detected a loop (with an image sequence); now how do we close it? (How do we find T_i,j?)
- One solution: iterative scan matching
  - η_A,B → a_i, b_j → x_vi, x_vj → S_i, S_j → T_i,j
  - Problem: local minima
- Better solution: projective model (sketch below)
  - Essential matrix; 5-point algorithm with a RANSAC loop
  - Use the lasers to remove the scale ambiguity, then fine-tune with iterative scan matching
  - The quality of the final scan match is another quality check
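
A hedged sketch of the projective route, using OpenCV's 5-point algorithm inside a RANSAC loop; the helper name and the RANSAC threshold are my assumptions, and, as the slide says, the translation comes out only up to scale:

```python
import cv2
import numpy as np

def camera_loop_transform(pts_i, pts_j, K):
    """pts_i, pts_j: matched (N, 2) pixel coordinates from the two
    loop-closing images; K: 3x3 camera intrinsics. Returns R and a
    unit-norm t; laser data must then fix the scale of t, and
    iterative scan matching fine-tunes the result."""
    E, mask = cv2.findEssentialMat(pts_i, pts_j, K,
                                   method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_i, pts_j, K, mask=mask)
    return R, t  # t is a direction only: scale is unobservable
```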

Enforcing loop closure
- Naïve method: a single EKF update step; this only works for small errors, because of the linear approximation
- Better method: constrained non-linear optimization over incremental changes (toy sketch below)
  - [x_v1, …, x_vn] → relative transforms [T_1,2, …, T_n-1,n, T_n,1] with covariances [Σ_1,2, …, Σ_n,1] (from scan matching)
  - Want new poses [T*_1,2, …, T*_n-1,n, T*_n,1]
  - Minimize the Mahalanobis distance of the new transforms from the measured ones, subject to the constraint that composing them around the loop returns to the start (the slide's equations did not survive transcription)
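
A toy planar (SE(2)) version of the constrained relaxation, using a generic SLSQP solver; the paper works in SE(3), so this is only an illustrative sketch of the idea:

```python
import numpy as np
from scipy.optimize import minimize

def close_loop_se2(T_meas, covs):
    """T_meas: list of (dx, dy, dtheta) relative transforms around the
    loop, including the loop-closing edge; covs: matching 3x3
    covariances from scan matching. Finds adjusted transforms T* that
    minimize the Mahalanobis distance to the measurements while
    composing exactly back to the start."""
    z = np.concatenate(T_meas)
    infos = [np.linalg.inv(S) for S in covs]

    def loop_error(x):
        pose = np.zeros(3)                     # compose all transforms;
        for k in range(len(T_meas)):           # a consistent loop ends
            dx, dy, dth = x[3*k:3*k+3]         # back at the origin
            c, s = np.cos(pose[2]), np.sin(pose[2])
            pose += np.array([c*dx - s*dy, s*dx + c*dy, dth])
        return pose

    def cost(x):
        return sum((x[3*k:3*k+3] - z[3*k:3*k+3]) @ infos[k] @
                   (x[3*k:3*k+3] - z[3*k:3*k+3]) for k in range(len(T_meas)))

    res = minimize(cost, z, method='SLSQP',
                   constraints=[{'type': 'eq', 'fun': loop_error}])
    return np.split(res.x, len(T_meas))
```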

Results
[Figure: resulting estimated map and vehicle trajectory]
- Successfully applied to several data sets
- 98% of the runtime is spent on laser registration → the bottleneck; the system runs at about 1/3 real time
- The most expensive part of the vision subsystem is the feature detector/descriptor (Harris Affine / SIFT)

Conclusions and future work
Conclusions
- A SLAM system for outdoor applications
- Works in a challenging urban environment
- Complementary vision/laser system: vision for loop closing, laser data for geometry and map building
- First working implementation
Future work
- The SLAM formulation is not efficient
- Laser scan matching is the bottleneck
- Learning vocabularies for distinct domains (urban, park, indoors); use a different similarity matrix if the domain switches