Visual Object Recognition Tutorial (Chicago)
Bastian Leibe, Computer Vision Laboratory, ETH Zurich
Kristen Grauman, Department of Computer Sciences, University of Texas at Austin
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
   ― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Features
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions
Recognition with Local Features
Image content is transformed into local features (e.g. SIFT) that are invariant to translation, rotation, and scale.
Goal: verify whether they belong to a consistent configuration.
Slide credit: David Lowe
Finding Consistent Configurations
Global spatial models:
- Generalized Hough Transform [Lowe99]
- RANSAC [Obdrzalek02, Chum05, Nister06]
Basic assumption: the object is planar.
- The assumption is often justified in practice
- Valid for many structures on buildings
- Sufficient for small viewpoint variations on 3D objects
Hough Transform
Origin: detection of straight lines in clutter.
Basic idea: each candidate point votes for all lines that it is consistent with.
- Votes are accumulated in a quantized array
- Local maxima correspond to candidate lines
Representation of a line:
- The usual form y = ax + b has a singularity around 90°.
- Better parameterization: x cos(θ) + y sin(θ) = ρ
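As a concrete illustration of the voting scheme, here is a minimal sketch of line detection with the (ρ, θ) parameterization. The edge-point input format and the bin resolutions are assumptions made for the example, not part of the tutorial.

```python
import numpy as np

def hough_lines(edge_points, img_diag, n_theta=180, n_rho=200):
    """Accumulate votes in a quantized (rho, theta) array.

    edge_points: iterable of (x, y) coordinates of edge pixels.
    img_diag:    image diagonal length, used to bound rho.
    """
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rhos = np.linspace(-img_diag, img_diag, n_rho)
    accumulator = np.zeros((n_rho, n_theta), dtype=int)

    for x, y in edge_points:
        # Each point votes for every line x*cos(theta) + y*sin(theta) = rho
        rho_vals = x * np.cos(thetas) + y * np.sin(thetas)
        rho_idx = np.clip(np.searchsorted(rhos, rho_vals), 0, n_rho - 1)
        accumulator[rho_idx, np.arange(n_theta)] += 1

    # Local maxima in the accumulator correspond to candidate lines
    return accumulator, rhos, thetas
```

A real detector would additionally smooth the accumulator and apply non-maximum suppression before reading out the peaks.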
Examples
Hough transform for a square (left) and a circle (right).
Hough Transform: Noisy Line
Problem: finding the true maximum.
Figure: tokens (input points, left) and votes in the (θ, ρ) accumulator (right).
Slide credit: David Lowe
Hough Transform: Noisy Input
Problem: lots of spurious maxima.
Figure: tokens (input points, left) and votes in the (θ, ρ) accumulator (right).
Slide credit: David Lowe
Generalized Hough Transform [Ballard81]
Generalization to an arbitrary contour or shape:
- Choose a reference point for the contour (e.g. its center).
- For each point on the contour, remember where it is located w.r.t. the reference point: store the radius r and the angle relative to the contour tangent.
Recognition: whenever you find a contour point, compute the tangent angle and vote for all possible reference points.
- Instead of a reference point, one can also vote for a transformation.
- The same idea can be used with local features!
Slide credit: Bernt Schiele
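A minimal sketch of the R-table idea follows, under the simplifying assumption that the tangent/gradient orientation at each contour point is already known; the orientation quantization into bins is an assumption of the example.

```python
import math
from collections import defaultdict
import numpy as np

def build_r_table(contour_points, reference_point, n_bins=36):
    """Training: store displacements to the reference point, indexed by
    quantized contour (tangent) orientation."""
    rx, ry = reference_point
    r_table = defaultdict(list)
    for (x, y, orientation) in contour_points:   # orientation in radians
        b = int((orientation % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins
        r_table[b].append((rx - x, ry - y))
    return r_table

def ght_vote(edge_points, r_table, acc_shape, n_bins=36):
    """Recognition: every edge point votes for all reference points that
    are consistent with its orientation."""
    acc = np.zeros(acc_shape, dtype=int)
    for (x, y, orientation) in edge_points:
        b = int((orientation % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins
        for dx, dy in r_table[b]:
            cx, cy = int(round(x + dx)), int(round(y + dy))
            if 0 <= cx < acc_shape[1] and 0 <= cy < acc_shape[0]:
                acc[cy, cx] += 1
    return acc   # peaks correspond to likely reference-point locations
```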
Gen. Hough Transform with Local Features
For every feature, store possible "occurrences":
- Object identity
- Pose
- Relative position
For a new image, let the matched features vote for possible object positions.
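The same voting idea with local features, as a rough sketch. The match record layout (each match carries the image feature's location, scale, and orientation plus the model feature's stored offset to the object center) and the bin sizes are assumptions for illustration only.

```python
import numpy as np
from collections import defaultdict

def vote_object_center(matches, pos_bin=16, scale_bins=(0.5, 1.0, 2.0, 4.0)):
    """Each matched feature votes for a coarse (x_center, y_center, scale) bin."""
    votes = defaultdict(int)
    for m in matches:
        rel_scale = m["img_scale"] / m["model_scale"]
        dtheta = m["img_orientation"] - m["model_orientation"]
        # Rotate and scale the stored offset into the test-image frame
        dx, dy = m["model_offset"]
        c, s = np.cos(dtheta), np.sin(dtheta)
        cx = m["img_x"] + rel_scale * (c * dx - s * dy)
        cy = m["img_y"] + rel_scale * (s * dx + c * dy)
        key = (int(cx // pos_bin), int(cy // pos_bin),
               int(np.searchsorted(scale_bins, rel_scale)))
        votes[key] += 1
    return votes   # bins with many votes are object hypotheses
```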
When is the Hough Transform Useful?
Textbooks wrongly imply that it is useful mostly for finding lines; in fact, it can be very effective for recognizing arbitrary shapes or objects.
The key to efficiency is to have each feature (token) determine as many parameters as possible.
- For example, lines can be detected much more efficiently from small edge elements (or points with local gradients) than from just points.
- For object recognition, each token should predict location, scale, and orientation (a 4D voting array).
Bottom line: the Hough transform can extract feature groupings from clutter in linear time!
Slide credit: David Lowe
3D Object Recognition with the Gen. Hough Transform [Lowe99]
- Typically only 3 feature matches are needed for recognition.
- Extra matches provide robustness.
- An affine model can be used for planar objects.
Slide credit: David Lowe
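To make the affine-model step concrete, here is a hedged sketch of estimating a 2D affine transformation from three or more point correspondences by linear least squares; this is the standard formulation, not code from the tutorial.

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Estimate a 2D affine transform mapping model_pts -> image_pts.

    model_pts, image_pts: (N, 2) arrays with N >= 3 correspondences.
    Returns a 2x3 matrix [A | t] minimizing the least-squares error.
    """
    model_pts = np.asarray(model_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    n = model_pts.shape[0]
    # Each correspondence gives two linear equations in the six unknowns
    # (a11, a12, a21, a22, tx, ty).
    A = np.zeros((2 * n, 6))
    b = image_pts.reshape(-1)
    A[0::2, 0:2] = model_pts   # x-equation: a11*mx + a12*my + tx = ix
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = model_pts   # y-equation: a21*mx + a22*my + ty = iy
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    a11, a12, a21, a22, tx, ty = params
    return np.array([[a11, a12, tx], [a21, a22, ty]])
```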
View Interpolation [Lowe01]
Training:
- Training views from similar viewpoints are clustered based on feature matches.
- Matching features between adjacent views are linked.
Recognition:
- Feature matches may be spread over several training viewpoints.
- Use the known links to "transfer votes" to other viewpoints.
Slide credit: David Lowe
Recognition Using View Interpolation [Lowe01]
Slide credit: David Lowe
Location Recognition [Lowe04]
Figure: training views.
Slide credit: David Lowe
Applications
Sony Aibo (Evolution Robotics), SIFT usage:
- Recognize the docking station
- Communicate with visual cards
Other uses:
- Place recognition
- Loop closure in SLAM
Slide credit: David Lowe
RANSAC (RANdom SAmple Consensus) [Fischler81]
- Randomly choose a minimal subset of data points necessary to fit a model (a sample).
- Points within some distance threshold t of the model form its consensus set; the size of the consensus set is the model's support.
- Repeat for N samples; the model with the biggest support is the most robust fit.
  - Points within distance t of the best model are inliers.
  - Fit the final model to all inliers.
Slide credit: David Lowe
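A generic RANSAC loop as a minimal sketch; fit_fn, error_fn, and the parameter defaults are placeholders introduced for the example, not part of the tutorial.

```python
import numpy as np

def ransac(data, fit_fn, error_fn, n_min, n_iters=1000, threshold=3.0):
    """Generic RANSAC loop.

    data:     (N, d) array of observations.
    fit_fn:   fits a model to a set of points, returns model parameters.
    error_fn: returns per-point residuals for a model.
    n_min:    minimal number of points needed to fit the model.
    """
    rng = np.random.default_rng()
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(n_iters):
        sample = data[rng.choice(len(data), size=n_min, replace=False)]
        model = fit_fn(sample)
        inliers = error_fn(model, data) < threshold   # consensus set
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    # Refit on all inliers of the best hypothesis
    if best_model is not None and best_inliers.any():
        best_model = fit_fn(data[best_inliers])
    return best_model, best_inliers
```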
Slide credit: David Forsyth
RANSAC: How Many Samples?
How many samples are needed?
- Suppose w is the fraction of inliers (points on the line).
- n points are needed to define a hypothesis (n = 2 for lines).
- k samples are chosen.
Probability that a single sample of n points is all inliers: w^n
Probability that all k samples fail: (1 − w^n)^k
Choose k high enough to keep this below the desired failure rate.
Slide credit: David Lowe
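Solving (1 − w^n)^k ≤ 1 − p for k gives k ≥ log(1 − p) / log(1 − w^n). A small sketch of that calculation; the p = 0.99 default mirrors the table on the next slide.

```python
import math

def ransac_num_samples(inlier_fraction, sample_size, p_success=0.99):
    """Smallest k such that (1 - w^n)^k <= 1 - p_success."""
    fail_prob_per_sample = 1.0 - inlier_fraction ** sample_size
    return math.ceil(math.log(1.0 - p_success) / math.log(fail_prob_per_sample))

# Example: 50% outliers, line fitting (n = 2) -> 17 samples
print(ransac_num_samples(inlier_fraction=0.5, sample_size=2))
```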
RANSAC: Computed k (p = 0.99)
Table: required number of samples k as a function of the sample size n and the proportion of outliers (5%, 10%, 20%, 25%, 30%, 40%, 50%).
Slide credit: David Lowe
After RANSAC
- RANSAC divides the data into inliers and outliers and yields an estimate computed from a minimal set of inliers.
- Improve this initial estimate by estimating over all inliers (e.g. with standard least-squares minimization).
- But this may change the set of inliers, so alternate fitting with re-classification as inlier/outlier.
Slide credit: David Lowe
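A hedged sketch of that refinement loop, reusing the hypothetical fit_fn/error_fn interface from the RANSAC sketch above (data is assumed to be a NumPy array).

```python
def refine_with_all_inliers(data, model, fit_fn, error_fn,
                            threshold=3.0, max_rounds=10):
    """Alternate least-squares refitting with inlier re-classification."""
    inliers = error_fn(model, data) < threshold
    for _ in range(max_rounds):
        model = fit_fn(data[inliers])            # refit on all current inliers
        new_inliers = error_fn(model, data) < threshold
        if (new_inliers == inliers).all():       # converged: inlier set stable
            break
        inliers = new_inliers
    return model, inliers
```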
Example: Finding Feature Matches
Find the best stereo match within a square search window (here 300 pixels²).
Global transformation model: epipolar geometry (from Hartley & Zisserman).
Slide credit: David Lowe
Example: Finding Feature Matches (cont'd)
Find the best stereo match within a square search window (here 300 pixels²).
Global transformation model: epipolar geometry (from Hartley & Zisserman).
Figure: matches before RANSAC (left) and after RANSAC (right).
Slide credit: David Lowe
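For instance, with OpenCV one can fit the epipolar geometry robustly and keep only the RANSAC-consistent matches. This is a generic illustration rather than the tutorial's own code; pts1/pts2 and the input files are assumed, hypothetical matched keypoint coordinates.

```python
import numpy as np
import cv2

# pts1, pts2: (N, 2) float arrays of putative matches between the two views
pts1 = np.load("matches_left.npy")    # hypothetical input files
pts2 = np.load("matches_right.npy")

# Robustly fit the fundamental matrix; the mask marks RANSAC inliers
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                 ransacReprojThreshold=3.0, confidence=0.99)
inliers1 = pts1[mask.ravel() == 1]
inliers2 = pts2[mask.ravel() == 1]
print(f"{len(inliers1)} of {len(pts1)} matches survive the epipolar check")
```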
Comparison: Gen. Hough Transform vs. RANSAC

Gen. Hough Transform
- Advantages:
  - Very effective for recognizing arbitrary shapes or objects
  - Can handle a high percentage of outliers (>95%)
  - Extracts groupings from clutter in linear time
- Disadvantages:
  - Quantization issues
  - Only practical for a small number of dimensions (up to 4)
- Improvements available:
  - Probabilistic extensions
  - Continuous voting space

RANSAC
- Advantages:
  - General method suited to a large range of problems
  - Easy to implement
  - Independent of the number of dimensions
- Disadvantages:
  - Only handles a moderate number of outliers (<50%)
- Many variants available, e.g.:
  - PROSAC: Progressive Sample Consensus [Chum05]
  - Preemptive RANSAC [Nister05]

[Leibe08]
Example Applications
Mobile tourist guide:
- Self-localization
- Object/building recognition
- Photo/video augmentation
Example: Aachen Cathedral
[Quack, Leibe, Van Gool, CIVR'08]
Web Demo: Movie Poster Recognition
- 50,000 movie posters indexed
- Query-by-image from a mobile phone, available in Switzerland
Application: Large-Scale Retrieval [Philbin CVPR'07]
Figure: query (left) and results from 5k Flickr images (right); a demo is available for the 100k set.
Application: Image Auto-Annotation [Quack CIVR'08]
Left: Wikipedia image; right: closest match from Flickr.
Examples: Moulin Rouge, Tour Montparnasse, Colosseum, Viktualienmarkt, Maypole, Old Town Square (Prague).
Outline
1. Detection with Global Appearance & Sliding Windows
2. Local Invariant Features: Detection & Description
3. Specific Object Recognition with Local Features
   ― Coffee Break ―
4. Visual Words: Indexing, Bags of Words Categorization
5. Matching Local Features
6. Part-Based Models for Categorization
7. Current Challenges and Research Directions