Object Recognition and Augmented Reality (10/31/13)
Computational Photography, Derek Hoiem, University of Illinois
Image: Dali, “Swans Reflecting Elephants”

Last class: Image Stitching
1. Detect keypoints
2. Match keypoints
3. Use RANSAC to estimate a homography
4. Project onto a surface and blend

Augmented reality
Insert and/or interact with objects in the scene:
– Project by Karen Liu
– Responsive characters in AR
– KinectFusion
Overlay information on a display:
– Tagging reality
– Layar
– Google Goggles
– T2 video (13:23)

Adding fake objects to real video
Approach:
1. Recognize and/or track points that give you a coordinate frame
2. Apply a homography (flat texture) or perspective projection (3D model) to put the object into the scene
Main challenge: dealing with lighting, shadows, and occlusion. A minimal compositing sketch for the flat-texture case follows below.
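A sketch of the flat-texture case, assuming a 3x3 homography H mapping texture coordinates (column vectors) to frame coordinates has already been estimated; the variable names (frame, texture, H) are illustrative, and both images are assumed uint8:

% Warp a flat texture into a video frame with a given homography H.
tform = projective2d(H');                      % toolbox uses the row-vector convention
view  = imref2d([size(frame,1) size(frame,2)]);
warped = imwarp(texture, tform, 'OutputView', view);
mask   = imwarp(true(size(texture,1), size(texture,2)), tform, 'OutputView', view);
mask3  = repmat(mask, [1 1 size(frame,3)]);    % apply the mask to every color channel
composite = frame;
composite(mask3) = warped(mask3);              % note: no lighting, shadow, or occlusion handling

This handles only the geometric part; the lighting, shadow, and occlusion challenges named above need more machinery.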

Information overlay
Approach:
1. Recognize an object that you’ve seen before
2. Retrieve info and overlay it
Main challenge: how to match reliably and efficiently?

Today: how to quickly find images in a large database that match a given image region?

Let’s start with interest points
Compute interest points (or keypoints) for every image in the database and for the query. [Figure: query image and database images with detected keypoints]

Simple idea
See how many keypoints are close to keypoints in each other image. [Figure: a correct match yields lots of matches; a wrong match yields few or none]
But this will be really, really slow!
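To see why, here is a brute-force sketch of that comparison for a single database image (Dq and Db hold the query's and the image's SIFT descriptors as rows; thresh is an assumed squared-distance threshold, and the usual ratio test is omitted for brevity):

% Count query keypoints that have a close match in one database image.
thresh = 0.5;                    % assumed squared-distance threshold
count = 0;
for n = 1:size(Dq, 1)
    d2 = sum(bsxfun(@minus, Db, Dq(n, :)).^2, 2);   % distance to every descriptor
    if min(d2) < thresh
        count = count + 1;
    end
end

That is roughly (query keypoints) x (image keypoints) distance computations per image, repeated over the whole database.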

Key idea 1: “Visual Words” Cluster the keypoint descriptors

Key idea 1: “Visual Words”
K-means algorithm, illustrated:
1. Randomly select K centers
2. Assign each point to the nearest center
3. Compute a new center (mean) for each cluster
4. Repeat steps 2–3 until the assignments stop changing

K-means: MATLAB code

function C = kmeans(X, K)
% Initialize cluster centers to be randomly sampled points
[N, d] = size(X);
rp = randperm(N);
C = X(rp(1:K), :);

lastAssignment = zeros(N, 1);
while true
    % Assign each point to the nearest cluster center
    bestAssignment = zeros(N, 1);
    mindist = Inf*ones(N, 1);
    for k = 1:K
        for n = 1:N
            dist = sum((X(n, :) - C(k, :)).^2);
            if dist < mindist(n)
                mindist(n) = dist;
                bestAssignment(n) = k;
            end
        end
    end

    % Break if the assignment is unchanged
    if all(bestAssignment == lastAssignment), break; end
    lastAssignment = bestAssignment;

    % Move each cluster center to the mean of the points assigned to it
    % (dim argument keeps single-point clusters as rows; empty clusters
    % are not handled in this teaching version)
    for k = 1:K
        C(k, :) = mean(X(bestAssignment == k, :), 1);
    end
end
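For example, to build a small vocabulary from SIFT descriptors stacked as rows of a matrix (names illustrative):

descriptors = double(allSiftDescriptors);   % M x 128
C = kmeans(descriptors, 1000);              % 1000 visual-word centers

In practice the vocabularies are far larger, and a hierarchical scheme such as the vocabulary tree discussed later is used instead of flat k-means.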

K-means Demo

Key idea 1: “Visual Words”
Cluster the keypoint descriptors. Assign each descriptor to a cluster number.
– What does this buy us? Each descriptor was 128-dimensional floating point; now it is 1 integer (easy to match!)
– Is there a catch? We need a lot of clusters (e.g., 1 million) if we want points in the same cluster to be very similar, and points that really are similar might end up in different clusters.

Key idea 1: “Visual Words”
Cluster the keypoint descriptors. Assign each descriptor to a cluster number. Represent an image region with a count of these “visual words”. An image is a good match if it has a lot of the same visual words as the query region.
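A sketch of the assignment step, assuming C holds the cluster centers from the k-means code above and D holds one image's SIFT descriptors (names illustrative):

% Assign each descriptor to its nearest center ("visual word") and
% build the bag-of-words histogram for the image.
% D: M x 128 descriptors, C: K x 128 cluster centers.
M = size(D, 1);
K = size(C, 1);
words = zeros(M, 1);
for n = 1:M
    d2 = sum(bsxfun(@minus, C, D(n, :)).^2, 2);   % squared distance to each center
    [~, words(n)] = min(d2);
end
h = accumarray(words, 1, [K 1]);   % word counts ("term frequencies")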

Naïve matching is still too slow
Imagine matching 1,000,000 images, each with 1,000 keypoints.

Key idea 2: Inverted file
Like a book index: keep a list of all the words (visual words) and all the pages (images) that contain them. Rank database images by their tf-idf score.
tf-idf (term frequency - inverse document frequency) weight of a word in a document:
tf-idf = (# times the word appears in the document / # words in the document) × log(# documents / # documents that contain the word)
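A minimal inverted-file sketch under assumed inputs (wordsPerImage{i} lists the visual-word ids in database image i; queryWordList lists those in the query); for brevity it scores shared words by idf alone rather than computing the full tf-idf ranking:

numWords = size(C, 1);                 % vocabulary size (from the k-means step)
numImages = numel(wordsPerImage);

% Build the inverted file: for each word, which images contain it
invFile = cell(numWords, 1);
docFreq = zeros(numWords, 1);          % # images containing each word
for i = 1:numImages
    w = unique(wordsPerImage{i});
    docFreq(w) = docFreq(w) + 1;
    for j = 1:numel(w)
        invFile{w(j)}(end+1) = i;      % append image i to word w(j)'s list
    end
end
idf = log(numImages ./ max(docFreq, 1));

% Query: touch only the images that share a word with the query
scores = zeros(numImages, 1);
qw = unique(queryWordList);
for j = 1:numel(qw)
    scores(invFile{qw(j)}) = scores(invFile{qw(j)}) + idf(qw(j));
end
[~, ranked] = sort(scores, 'descend');  % best-matching images first

The key property is that query time scales with the lengths of the touched word lists, not with the size of the database.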

Fast visual search
“Scalable Recognition with a Vocabulary Tree”, Nister and Stewenius, CVPR 2006. “Video Google”, Sivic and Zisserman, ICCV 2003.

110,000,000 images in 5.8 seconds. Slide credit: Nister.

Key ideas
Visual words: cluster descriptors (e.g., k-means).
Inverted file: quick lookup of images given visual words.
tf-idf (term frequency - inverse document frequency):
tf-idf = (# times the word appears in the document / # words in the document) × log(# documents / # documents that contain the word)

Recognition with a vocabulary tree (k-tree)
Following slides by David Nister (CVPR 2006).
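The tree is built by hierarchical k-means: cluster all descriptors into k groups, then recursively cluster within each group. Quantizing a new descriptor then costs only (branch factor) × (depth) distance computations instead of one per leaf. A minimal descent sketch, with an assumed node struct (the fields centers, child, and wordId are illustrative):

% Descend a vocabulary tree to quantize one descriptor d.
% node.centers : k x 128 centers of the node's children
% node.child   : 1 x k cell array of child nodes ({} at a leaf)
% node.wordId  : visual-word id (meaningful only at a leaf)
function id = treeQuantize(node, d)
while ~isempty(node.child)
    d2 = sum(bsxfun(@minus, node.centers, d).^2, 2);
    [~, k] = min(d2);          % nearest child at this level
    node = node.child{k};
end
id = node.wordId;
end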

Performance

More words is better
A larger vocabulary improves both retrieval quality and speed. [Plot: performance vs. vocabulary size, for several branch factors]

Higher branch factor works better (but slower)

Can we be more accurate?
So far, we treat each image as containing a “bag of words”, with no spatial information. [Illustration: two images with similar word counts but different spatial layouts] Which matches better?

Can we be more accurate?
So far, we treat each image as containing a “bag of words”, with no spatial information. Real objects have consistent geometry.

Final key idea: geometric verification
Goal: given a set of possible keypoint matches, figure out which ones are geometrically consistent. How can we do this?

Final key idea: geometric verification
RANSAC for an affine transform. Repeat N times:
1. Randomly choose 3 matching pairs
2. Estimate the affine transformation
3. Predict the remaining points and count the “inliers”
A minimal sketch follows below.
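A RANSAC sketch under assumed inputs: pts1 and pts2 are M x 2 matched keypoint locations, N is the number of trials, and thresh is the inlier distance in pixels (all names illustrative):

function [bestA, bestInliers] = ransacAffine(pts1, pts2, N, thresh)
M = size(pts1, 1);
bestInliers = []; bestA = [];
for iter = 1:N
    % 1. Randomly choose 3 matching pairs (the minimum for an affine fit)
    s = randperm(M, 3);
    % 2. Estimate the 2x3 affine transform A:  [x2 y2] = [x1 y1 1] * A'
    X = [pts1(s, :), ones(3, 1)];
    A = (X \ pts2(s, :))';          % exact solve on 3 pairs (degenerate if collinear)
    % 3. Predict all points and count inliers
    pred = [pts1, ones(M, 1)] * A';
    err = sqrt(sum((pred - pts2).^2, 2));
    inliers = find(err < thresh);
    if numel(inliers) > numel(bestInliers)
        bestInliers = inliers;
        bestA = A;
    end
end
end

A candidate image is then accepted (or re-ranked) according to how many inliers survive.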

Application: large-scale retrieval [Philbin CVPR’07]
Query on the left; results on a 5K-image dataset on the right (demo available for 100K). Slide credit: K. Grauman, B. Leibe.

Application: image auto-annotation [Quack CIVR’08]
Left: Wikipedia image. Right: closest match from Flickr. Examples: Moulin Rouge, Tour Montparnasse, Colosseum, Viktualienmarkt, Maypole, Old Town Square (Prague). Slide credit: K. Grauman, B. Leibe.

Example applications [Quack, Leibe, Van Gool, CIVR’08]
Mobile tourist guide: self-localization, object/building recognition, photo/video augmentation (e.g., Aachen Cathedral). Slide credit: B. Leibe.

Video Google system (Sivic & Zisserman, ICCV 2003)
1. Collect all words within the query region
2. Use the inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification
[Figure: query region and retrieved frames] Demo online at: e/index.html. Slide credit: K. Grauman, B. Leibe.

Summary: uses of interest points
Interest points can be detected reliably in different images at the same 3D location:
– DoG (difference-of-Gaussian) interest points are localized in x, y, and scale
– SIFT is robust to rotation and small deformation
Interest points provide correspondence:
– For image stitching
– For defining coordinate frames for object insertion
– For object recognition and retrieval

Next class
Opportunities of scale: stuff you can do with millions of images
– Texture synthesis of large regions
– Recovering GPS coordinates
– Etc.