Object Recognition Szeliski Chapter 14.

Slides:



Advertisements
Similar presentations
Distinctive Image Features from Scale-Invariant Keypoints David Lowe.
Advertisements

Part 4: Combined segmentation and recognition by Rob Fergus (MIT)
Fitting: The Hough transform. Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not.
1 Part 1: Classical Image Classification Methods Kai Yu Dept. of Media Analytics NEC Laboratories America Andrew Ng Computer Science Dept. Stanford University.
Instructor: Mircea Nicolescu Lecture 17
CS4670 / 5670: Computer Vision Bag-of-words models Noah Snavely Object
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Unsupervised Learning of Visual Object Categories Michael Pfeiffer
Face Recognition and Detection
Image alignment Image from
Discriminative and generative methods for bags of features
Bag-of-features models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Beyond bags of features: Part-based models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Fitting: The Hough transform
Robust and large-scale alignment Image from
Object Detection and Recognition
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
How many object categories are there? Biederman 1987.
Statistical Recognition Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Kristen Grauman.
1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.
Lecture 28: Bag-of-words models
Recognition Readings Szeliski Chapter 14 The “Margaret Thatcher Illusion”, by Peter Thompson.
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Object Recognition: History and Overview Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and Jean Ponce.
Object Recognition by Parts Object recognition started with line segments. - Roberts recognized objects from line segments and junctions. - This led to.
Project 4 out today –help session today –photo session today Project 2 winners Announcements.
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2005 with a lot of slides stolen from Steve Seitz and.
Bag-of-features models
Object Recognition: Conceptual Issues Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and K. Grauman.
Object Recognition: Conceptual Issues Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, and K. Grauman.
Visual Object Recognition Rob Fergus Courant Institute, New York University
Computer Vision I Instructor: Prof. Ko Nishino. Today How do we recognize objects in images?
Pattern Recognition. Introduction. Definitions.. Recognition process. Recognition process relates input signal to the stored concepts about the object.
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2006 with a lot of slides stolen from Steve Seitz and.
Project 2 due today Project 3 out today –help session today Announcements.
Object Recognition by Parts Object recognition started with line segments. - Roberts recognized objects from line segments and junctions. - This led to.
Machine learning & category recognition Cordelia Schmid Jakob Verbeek.
Review: Intro to recognition Recognition tasks Machine learning approach: training, testing, generalization Example classifiers Nearest neighbor Linear.
Distinctive Image Features from Scale-Invariant Keypoints By David G. Lowe, University of British Columbia Presented by: Tim Havinga, Joël van Neerbos.
So far focused on 3D modeling
Keypoint-based Recognition Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 03/04/10.
Internet-scale Imagery for Graphics and Vision James Hays cs195g Computational Photography Brown University, Spring 2010.
Fitting: The Hough transform. Voting schemes Let each feature vote for all the models that are compatible with it Hopefully the noise features will not.
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Classification 1: generative and non-parameteric methods Jakob Verbeek January 7, 2011 Course website:
MSRI workshop, January 2005 Object Recognition Collected databases of objects on uniform background (no occlusions, no clutter) Mostly focus on viewpoint.
Fitting: The Hough transform
Epitomic Location Recognition A generative approach for location recognition K. Ni, A. Kannan, A. Criminisi and J. Winn In proc. CVPR Anchorage,
11/26/2015 Copyright G.D. Hager Class 2 - Schedule 1.Optical Illusions 2.Lecture on Object Recognition 3.Group Work 4.Sports Videos 5.Short Lecture on.
CVPR2013 Poster Detecting and Naming Actors in Movies using Generative Appearance Models.
Li Fei-Fei, UIUC Rob Fergus, MIT Antonio Torralba, MIT Recognizing and Learning Object Categories ICCV 2005 Beijing, Short Course, Oct 15.
CS654: Digital Image Analysis
Part 4: combined segmentation and recognition Li Fei-Fei.
Recognition Readings – C. Bishop, “Neural Networks for Pattern Recognition”, Oxford University Press, 1998, Chapter 1. – Szeliski, Chapter (eigenfaces)
776 Computer Vision Jan-Michael Frahm Spring 2012.
776 Computer Vision Jan-Michael Frahm Spring 2012.
Announcements Project 2 due tomorrow night Project 3 out Wednesday
SFM under orthographic projection
Lecture 14: Introduction to Recognition
Lecture 25: Introduction to Recognition
Announcements Project 1 artifact winners
Lecture 26: Faces and probabilities
Brief Review of Recognition + Context
CS4670: Intro to Computer Vision
Announcements Project 2 artifacts Project 3 due Thursday night
Announcements Project 4 out today Project 2 winners help session today
Where are we? We have covered: Project 1b was due today
Announcements Artifact due Thursday
Announcements Artifact due Thursday
The “Margaret Thatcher Illusion”, by Peter Thompson
Presentation transcript:

Object Recognition Szeliski Chapter 14

Recognition

Using context (Russell, Torralba, Liu et al. 2007). Simultaneous recognition and segmentation (Shotton, Winn, Rother et al. 2009) Location recognition (Philbin, Chum, Isard et al. 2007) Recognition

Slides from Rick Szeliski Recognition problems What is it? Object and scene recognition Who is it? Identity recognition Where is it? Object detection What are they doing? Activities All of these are classification problems Choose one class from a list of possible candidates Recognition Slides from Rick Szeliski

Slides from Rick Szeliski What is recognition? A different taxonomy from [Csurka et al. 2006]: Recognition Where is this particular object? Categorization What kind of object(s) is(are) present? Content-based image retrieval Find me something that looks similar Detection Locate all instances of a given class Recognition Slides from Rick Szeliski

Slides from Rick Szeliski Readings Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition Fergus, R. , Perona, P. and Zisserman, A. International Journal of Computer Vision, Vol. 71(3), 273-303, March 2007 MIT course http://people.csail.mit.edu/torralba/courses/6.870/6.870.recognition.htm Recognition Slides from Rick Szeliski

Slides from Rick Szeliski Sources Steve Seitz, CSE 455/576, previous quarters Fei-Fei, Fergus, Torralba, CVPR’2007 course Efros, CMU 16-721 Learning in Vision Freeman, MIT 6.869 Computer Vision: Learning Linda Shapiro, CSE 576, Spring 2007 Recognition Slides from Rick Szeliski

Today’s lecture Object Detection Known object recognition [Lowe] Bag of keypoints [Csurka etc.] Location recognition [Schindler et al.] Deformable object/category recognition [Fergus et al.] Recognition by segmentation Recognition

Object Detection How to recognize each person in this image? Every possible sub-window? Effective special-purpose detectors: rapidly find likely regions where particular objects might occur. How to recognize each person in this image? Recognition

Object Detection Face Detection General object detection More successful Built in digital cameras to enhance auto-focus Video conference to control pan-tilt heads … General object detection Pedestrian Cars. Recognition

Face Detection Every pixel? Scale? Tutorials Too slow in practice http://people.csail.mit.edu/torralba/shortCourseRLOC/ General detection and recognition http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html Face recognition Recognition

Face Recognition and Detection Face detection You can ask people to see what they come up with How to tell if a face is present? CSE 576, Spring 2008 Face Recognition and Detection 12

Face Recognition and Detection Skin detection skin Skin pixels have a distinctive range of colors Corresponds to region(s) in RGB color space Skin classifier A pixel X = (R,G,B) is skin if it is in the skin (color) region How to find this region? CSE 576, Spring 2008 Face Recognition and Detection

Face Recognition and Detection Skin detection Learn the skin region from examples Manually label skin/non pixels in one or more “training images” Plot the training data in RGB space skin pixels shown in orange, non-skin pixels shown in gray some skin pixels may be outside the region, non-skin pixels inside. CSE 576, Spring 2008 Face Recognition and Detection

Face Recognition and Detection Skin classifier Given X = (R,G,B): how to determine if it is skin or not? Nearest neighbor find labeled pixel closest to X Find plane/curve that separates the two classes popular approach: Support Vector Machines (SVM) Data modeling fit a probability density/distribution model to each class CSE 576, Spring 2008 Face Recognition and Detection

Face Recognition and Detection Probability X is a random variable P(X) is the probability that X achieves a certain value called a PDF probability distribution/density function a 2D PDF is a surface 3D PDF is a volume continuous X discrete X CSE 576, Spring 2008 Face Recognition and Detection

Probabilistic skin classification Model PDF / uncertainty Each pixel has a probability of being skin or not skin Skin classifier Given X = (R,G,B): how to determine if it is skin or not? Choose interpretation of highest probability Where do we get and ? CSE 576, Spring 2008 Face Recognition and Detection

Learning conditional PDF’s We can calculate P(R | skin) from a set of training images It is simply a histogram over the pixels in the training images each bin Ri contains the proportion of skin pixels with color Ri This doesn’t work as well in higher-dimensional spaces. Why not? Approach: fit parametric PDF functions common choice is rotated Gaussian center covariance CSE 576, Spring 2008 Face Recognition and Detection

Learning conditional PDF’s We can calculate P(R | skin) from a set of training images But this isn’t quite what we want Why not? How to determine if a pixel is skin? We want P(skin | R) not P(R | skin) How can we get it? CSE 576, Spring 2008 Face Recognition and Detection

Face Recognition and Detection Bayes rule what we measure (likelihood) domain knowledge (prior) In terms of our problem: what we want (posterior) normalization term What can we use for the prior P(skin)? Domain knowledge: P(skin) may be larger if we know the image contains a person For a portrait, P(skin) may be higher for pixels in the center Learn the prior from the training set. How? CSE 576, Spring 2008 Face Recognition and Detection P(skin) is proportion of skin pixels in training set

Face Recognition and Detection Bayesian estimation likelihood posterior (unnormalized) Bayesian estimation Goal is to choose the label (skin or ~skin) that maximizes the posterior ↔ minimizes probability of misclassification this is called Maximum A Posteriori (MAP) estimation CSE 576, Spring 2008 Face Recognition and Detection

Skin detection results CSE 576, Spring 2008 Face Recognition and Detection

General classification This same procedure applies in more general circumstances More than two classes More than one dimension Example: face detection Here, X is an image region dimension = # pixels each face can be thought of as a point in a high dimensional space H. Schneiderman, T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars". CVPR 2000 CSE 576, Spring 2008 Face Recognition and Detection

Face Detection Feature-based Template-based Appearance-based Distinctive image features: eyes, nose, mouth Geometrical arrangement Template-based Active shape model (ASM), active appearance model (AAM) Requires good initialization near a real face Appearance-based Search for likely candidates Refine using cascade of more expensive but selective detection algorithm Rely heavily on training classifiers using labeled faces Recognition

Slides from Rick Szeliski Recognition Slides from Rick Szeliski

CVPR 2007 Minneapolis, Short Course, June 17 Recognizing and Learning Object Categories: Year 2007 Li Fei-Fei, Princeton Rob Fergus, MIT Antonio Torralba, MIT (see other slide deck)

Slides from Rick Szeliski Today’s lecture Known object recognition [Lowe] Bag of keypoints [Csurka etc.] Location recognition [Schindler et al.] Deformable object/category recognition [Fergus et al.] Recognition by segmentation Recognition Slides from Rick Szeliski

Single object recognition Slides from Rick Szeliski

Single object recognition Lowe, et al. 1999, 2003 Mahamud and Herbert, 2000 Ferrari, Tuytelaars, and Van Gool, 2004 Rothganger, Lazebnik, and Ponce, 2004 Moreels and Perona, 2005 … Recognition Slides from Rick Szeliski

Planar object recognition [Lowe] Use SIFT features Verify affine (or homography) geometric alignment Recognition Slides from Rick Szeliski

Planar object recognition [Lowe] Use SIFT features Verify affine (or homography) geometric alignment Recognition Slides from Rick Szeliski

3D object recognition [Lowe] Extract object outlines with background subtraction Recognition Slides from Rick Szeliski

3D object recognition [Lowe] Use 3 matches to recognize Use additional matches for verification Tolerant to occlusions Recognition Slides from Rick Szeliski

Feature-based recognition How can we scale to millions of objects? Comparison to all stored objects/features is infeasible. Answer: quantize features into words [Csurka et al. 04] use information retrieval (inverted index) use metric tree for faster quantization (NN) [Nister & Stewenius 05] Recognition Slides from Rick Szeliski

Slides from Rick Szeliski Today’s lecture Known object recognition [Lowe] Bag of keypoints [Csurka etc.] Location recognition [Schindler et al.] Deformable object/category recognition [Fergus et al.] Recognition by segmentation Recognition Slides from Rick Szeliski

Part 1: Bag-of-words models CVPR 2007 Minneapolis, Short Course, June 17 (see other slide deck) Part 1: Bag-of-words models by Li Fei-Fei (Princeton)

Slides from Rick Szeliski Today’s lecture Known object recognition [Lowe] Bag of keypoints [Csurka etc.] Location recognition [Schindler et al.] Deformable object/category recognition [Fergus et al.] Recognition by segmentation Recognition Slides from Rick Szeliski

How to scale to 106s of images? Make “word” generation even more efficient: “Vocabulary tree” Recognition Slides from Rick Szeliski

Scalable Recognition with a Vocabulary Tree David Nistér, Henrik Stewénius Recognition Slides from Rick Szeliski

Slides from Rick Szeliski Vocabulary Tree Recognition Slides from Rick Szeliski

Slides from Rick Szeliski We then run k-means on the descriptor space. In this setting, k defines what we call the branch-factor of the tree, which indicates how fast the tree branches. In this illustration, k is three. We then run k-means again, recursively on each of the resulting quantization cells. This defines the vocabulary tree, which is essentially a hierarchical set of cluster centers and their corresponding Voronoi regions. We typically use a branch-factor of 10 and six levels, resulting in a million leaf nodes. We lovingly call this the Mega-Voc. Recognition Slides from Rick Szeliski

Slides from Rick Szeliski Performance One of the reasons we managed to take the system this far is that we have worked with rigorous performance testing against a database with ground truth. We now have a benchmark database with over 10000 images grouped in sets of four images of the same object. We query on one of the images in the group and see how many of the other images in the group make it to the top of the query. In the paper, we have tested this up to a million images, by embedding the ground truth database into a database encompassing all the frames of 10 feature length movies. Recognition Slides from Rick Szeliski

Slides from Rick Szeliski http://vis.uky.edu/~stewe/ukbench/ We have just posted the whole 10000 image test set on the web for everyone to use. Just Google my webpage and you will find it under the title Recognition Benchmark. Recognition Slides from Rick Szeliski

Slides from Rick Szeliski Location Recognition Can we apply this to recognizing your location from a cell-phone photo? Recognition Slides from Rick Szeliski

City-Scale Location Recognition Grant Schindler, Matthew Brown, and Richard Szeliski CVPR’2007

Slides from Rick Szeliski The Problem Recognition Slides from Rick Szeliski

Slides from Rick Szeliski Main idea Find N-best matches in vocabulary tree Recognition Slides from Rick Szeliski

Slides from Rick Szeliski Other ideas Use only informative features (ignore trees…) Integrate matches with adjacent (streetside) neighbors Recognition Slides from Rick Szeliski

Slides from Rick Szeliski Today’s lecture Known object recognition [Lowe] Bag of keypoints [Csurka etc.] Location recognition [Schindler et al.] Deformable object/category recognition [Fergus et al.] Recognition by segmentation Recognition Slides from Rick Szeliski

Part 2: part-based models CVPR 2007 Minneapolis, Short Course, June 17 (see other slide deck) Part 2: part-based models by Rob Fergus (MIT)

Slides from Rick Szeliski Today’s lecture Known object recognition [Lowe] Bag of keypoints [Csurka etc.] Location recognition [Schindler et al.] Deformable object/category recognition [Fergus et al.] Recognition by segmentation Recognition Slides from Rick Szeliski

Part 4: Combined segmentation and recognition CVPR 2007 Minneapolis, Short Course, June 17 Part 4: Combined segmentation and recognition by Rob Fergus (MIT)

Slides from Rick Szeliski Aim Given an image and object category, segment the object Object Category Model Segmentation Cow Image Segmented Cow Segmentation should (ideally) be shaped like the object e.g. cow-like obtained efficiently in an unsupervised manner able to handle self-occlusion Recognition Slides from Rick Szeliski Slide from Kumar ‘05

Implicit Shape Model - Liebe and Schiele, 2003 Matched Codebook Entries Probabilistic Voting Interest Points Voting Space (continuous) Segmentation Refined Hypotheses (uniform sampling) Backprojected Hypotheses Backprojection of Maxima Recognition Slides from Rick Szeliski

Other topics: context (scenes) Antonio Torralba, Contextual Priming for Object Detection, IJCV(53), No. 2, July 2003, pp. 169-191 Recognition Slides from Rick Szeliski

Slides from Rick Szeliski New work: tiny images Recognition Slides from Rick Szeliski

Datasets and object collections CVPR 2007 Minneapolis, Short Course, June 17 (see other slide deck) Datasets and object collections

Summary of object recognition Known object recognition [Lowe] Bag of keypoints [Csurka etc.] Location recognition [Schindler et al.] Deformable object/category recognition [Fergus et al.] Recognition by segmentation Context and scenes Recognition Slides from Rick Szeliski