Object Recognition

So what does object recognition involve?

Verification: is that a bus?

Detection: are there cars?

Identification: is that a picture of Mao?

Object categorization: sky, building, flag, wall, banner, bus, cars, face, street lamp

Challenges 1: viewpoint variation (Michelangelo)

Challenges 2: illumination slide credit: S. Ullman

Challenges 3: occlusion Magritte, 1957

Challenges 4: scale

Challenges 5: deformation Xu, Beihong 1943

Challenges 7: intra-class variation

Two main approaches Part-based Global sub-window

Global Approaches Aligned images become vectors x_1, x_2, x_3, … in a high-dimensional space

Global Approaches Training involves some dimensionality reduction over the vectors x_1, x_2, x_3, … in the high-dimensional space, and produces a detector

Detection
–Scale / position range to search over
–Combine detection over space and scale.
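The detection procedure on these slides — scan a fixed-size window over positions, repeat at several scales, and combine the responses — can be written down in a few lines. A minimal sketch, with a hypothetical `score_fn` standing in for a trained detector:

```python
import numpy as np

def sliding_window_detect(image, window, score_fn, scales=(1.0, 0.5), step=4, thresh=0.9):
    """Scan a score function over positions and scales; return detections
    as (row, col, scale, score) tuples in original-image coordinates."""
    detections = []
    wh, ww = window
    for s in scales:
        # Coarsely resample the image instead of resizing the window (nearest neighbour).
        h, w = int(image.shape[0] * s), int(image.shape[1] * s)
        rows = (np.arange(h) / s).astype(int)
        cols = (np.arange(w) / s).astype(int)
        scaled = image[np.ix_(rows, cols)]
        for r in range(0, h - wh + 1, step):
            for c in range(0, w - ww + 1, step):
                score = score_fn(scaled[r:r + wh, c:c + ww])
                if score > thresh:
                    detections.append((int(r / s), int(c / s), s, score))
    return detections
```

For example, with `score_fn = lambda patch: patch.mean()`, a bright 8×8 square on a dark background is found at its true position.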

PROJECT 1

Turk and Pentland, 1991; Belhumeur et al.; Schneiderman et al.; Viola and Jones, 2000; Keren et al.; Osadchy et al.; Amit and Geman, 1999; LeCun et al.; Belongie and Malik, 2002; Schneiderman et al.; Agarwal and Roth, 2002; Poggio et al., 1993

Object Detection
Problem: locate instances of an object category in a given image. This is an asymmetric classification problem!
–Background: very large; complex (thousands of categories); large prior to appear in an image; easy to collect (but not easy to learn from examples).
–Object (category): relatively small; simple (single category); small prior; hard to collect.

Intuition
We have a prior on the distribution of all natural images. Denote by H the acceptance region of a classifier. We propose to minimize Pr(all images) (≈ Pr(background)) inside H, except for the object samples. [figure: the object class as a small region inside the set of all images / background]

Distribution of Natural Images – Boltzmann distribution
The prior assigns lower probability to images with a larger smoothness energy, Pr(I) ∝ exp(−E(I)); the energy E(I) can be computed in the frequency domain.

Antiface
[figure: in the space Ω, the acceptance region defined by d contains the object images and lies in a low-probability region of natural images]

Main Idea
An Anti-Face detector is defined as a vector d satisfying:
–|(d, x)| is small for all x in the positive class
–d is smooth, hence |(d, y)| is large on average for a random natural image y
Claim: for random natural images viewed as unit vectors, |(d, y)| is large on average.

Discrimination
If x is an image and Ω is the target class: |(d, x)| is SMALL for x ∈ Ω and LARGE otherwise.

Cascade of Independent Detectors
[figures: example cascades; most windows are rejected after only 7 or even 4 inner products]
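The anti-face decision rule can be sketched in a few lines (all names are illustrative, and the detectors are assumed to be precomputed smooth unit vectors): an image is accepted as an object only if its inner product with every detector is small in magnitude, and the cascade rejects as soon as one detector responds strongly.

```python
import numpy as np

def antiface_classify(x, detectors, eps=0.1):
    """Accept x as an object instance only if |(d, x)| < eps for every
    anti-face detector d; reject (early exit) as soon as one detector fires.
    Smooth detectors give large responses on random natural images."""
    x = x / np.linalg.norm(x)          # view images as unit vectors
    for d in detectors:
        if abs(np.dot(d, x)) >= eps:   # large response -> not the object
            return False
    return True
```

The early exit is what makes the cascade cheap: most background windows are rejected after the first few inner products.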

Example
[figures: samples from the training set, and 4 Anti-Face detectors]

4 Anti-face Detectors

Eigenface method with the subspace of dimension 100

Ensemble Learning
Bagging
–Reshuffle your training data to create k different training sets and learn f_1(x), f_2(x), …, f_k(x)
–Combine the k different classifiers by majority voting: f_FINAL(x) = sign[ (1/k) Σ f_i(x) ]
Boosting
–Assign different weights to training samples in a “smart” way so that different classifiers pay more attention to different samples
–Weighted majority voting; the weight of an individual classifier is proportional to its accuracy
–AdaBoost (1996) was influenced by bagging, and it is superior to bagging
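The bagging recipe above can be sketched as follows (a toy illustration; `train_fn` is a hypothetical user-supplied weak learner that maps a dataset to a ±1 classifier):

```python
import random

def bagging_predict(train_fn, data, x, k=5, seed=0):
    """Bagging: train k classifiers on bootstrap resamples of `data` and
    combine their {-1,+1} predictions on x by unweighted majority vote."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(k):
        resample = [rng.choice(data) for _ in data]   # sample with replacement
        f = train_fn(resample)                        # train_fn: dataset -> (x -> -1 or +1)
        votes += f(x)
    return 1 if votes > 0 else -1
```

Each classifier sees a slightly different resampled training set, which is what decorrelates their errors before the vote.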

Boosting - Motivation
It is usually hard to design an accurate classifier that generalizes well. However, it is usually easy to find many “rule of thumb” weak classifiers
–A classifier is weak if it is only slightly better than random guessing
Can we combine several weak classifiers to produce an accurate classifier?
–A question people have been working on since the 1980s

AdaBoost
Let’s assume we have a 2-class classification problem with labels y_i ∈ {−1, 1}. AdaBoost produces a discriminant function g(x) = Σ_t α_t f_t(x), where f_t(x) is the t-th “weak” classifier. The final classifier is the sign of the discriminant function: f_final(x) = sign[g(x)].

Idea Behind Ada Boost Algorithm is iterative Maintains distribution of weights over the training examples Initially distribution of weights is uniform At successive iterations, the weight of misclassified examples is increased, forcing the weak learner to focus on the hard examples in the training set
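The iterative re-weighting described above can be sketched on a 1-D toy problem, with threshold stumps as the weak classifiers (a minimal illustration, not a production implementation):

```python
import math

def adaboost(points, labels, stumps, rounds=10):
    """AdaBoost on a toy 1-D problem.  `stumps` is a pool of weak classifiers
    h(x) -> {-1,+1}.  Each round: pick the stump with the lowest weighted
    error, give it a vote alpha, then re-weight the training examples so the
    misclassified ones get more attention in the next round."""
    n = len(points)
    w = [1.0 / n] * n                                  # initially uniform weights
    ensemble = []                                      # list of (alpha, stump)
    for _ in range(rounds):
        errs = [sum(wi for wi, x, y in zip(w, points, labels) if h(x) != y)
                for h in stumps]
        e, h = min(zip(errs, stumps), key=lambda t: t[0])
        if e >= 0.5:
            break                                      # no stump better than chance
        e = max(e, 1e-12)                              # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - e) / e)            # classifier weight ~ its accuracy
        ensemble.append((alpha, h))
        w = [wi * math.exp(-alpha * y * h(x))          # boost misclassified examples
             for wi, x, y in zip(w, points, labels)]
        z = sum(w)
        w = [wi / z for wi in w]                       # renormalise the distribution
    # discriminant g(x) = sum_t alpha_t * f_t(x); final classifier is its sign
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

For instance, with stump thresholds at 0.5, 1.5, and 2.5, the points 0, 1 (label −1) and 2, 3 (label +1) are separated by the stump at 1.5.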

PROJECT 2

Training with a Small Number of Examples
The majority of object detection methods require a large number of training examples.
Goal: design a classifier that can learn from a small number of examples.
Using a small number of examples with existing classifiers leads to overfitting: the classifier learns the training examples by heart and performs poorly on unseen examples.

Linear SVM
[figures: maximal margin between Class 1 and Class 2 with enough training data, vs. not enough training data]

Linear SVM – Detection Task [figure: Class 1 vs. Class 2]

Maximum margin with prior [figure: object class]

PROJECT 4

Part-Based Approaches Object Bag of ‘words’ Constellation of parts

Bag of ‘words’: analogy to documents

Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
Keywords: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel

China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Keywords: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value

Interest Point Detectors Basic requirements: –Sparse –Informative –Repeatable Invariance –Rotation –Scale (Similarity) –Affine
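As a concrete illustration of an interest operator with these properties, here is a minimal sketch of the Harris corner response (sparse, repeatable, rotation invariant; a real implementation would smooth the gradient products with a Gaussian window, for which a box filter stands in here):

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    locally averaged matrix of gradient products.  Large positive R marks
    corners; negative R marks edges; ~0 marks flat regions."""
    gy, gx = np.gradient(img.astype(float))            # image gradients
    ixx, iyy, ixy = gx * gx, gy * gy, gx * gy          # gradient products

    def box(a):                                        # 3x3 box average (zero-padded)
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0

    sxx, syy, sxy = box(ixx), box(iyy), box(ixy)
    det = sxx * syy - sxy * sxy
    tr = sxx + syy
    return det - k * tr * tr
```

On a synthetic step image, the response at the corner of the bright region exceeds the response along its edges.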

Popular Detectors
Scale invariant: Harris-Laplace; Difference of Gaussians; Laplacian of Gaussians; Scale Saliency (Kadir-Brady)
Affine invariant: Harris-Laplace Affine; Difference of Gaussians Affine; Laplacian of Gaussians Affine; Affine Saliency (Kadir-Brady)
There are many others. See:
1) “Scale and affine invariant interest point detectors”, K. Mikolajczyk, C. Schmid, IJCV, Volume 60.
2) “A comparison of affine region detectors”, K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool.

Representation of appearance: Local Descriptors Invariance –Rotation –Scale –Affine Insensitive to small deformations Illumination invariance –Normalize out

SIFT – Scale Invariant Feature Transform
Descriptor overview:
–Determine scale (by maximizing the DoG in scale and in space) and local orientation (the dominant gradient direction); use this scale and orientation to make all further computations invariant to scale and rotation.
–Compute gradient orientation histograms of several small windows (128 values for each point).
–Normalize the descriptor to make it invariant to intensity change.
David G. Lowe, “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, 60(2), 2004.
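The gradient-orientation-histogram step can be sketched for a single sub-window (a simplified illustration of the idea, not Lowe's exact implementation; it omits the spatial and orientation interpolation, and a full descriptor concatenates 16 such sub-windows × 8 bins = 128 values):

```python
import numpy as np

def orientation_histogram(patch, bins=8):
    """Magnitude-weighted gradient orientation histogram of one sub-window,
    the building block of a SIFT-style descriptor."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                              # gradient magnitude
    ang = np.arctan2(gy, gx) % (2 * np.pi)              # orientation in [0, 2*pi)
    idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist            # normalise -> intensity invariance
```

On a horizontal intensity ramp, all gradients point in one direction, so the histogram's mass lands in a single bin.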

Feature Detection and Representation
Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03] → normalize patch → compute SIFT descriptor [Lowe ’99]. Slide credit: Josef Sivic

… Feature Detection and Representation

Codewords dictionary formation …

Vector quantization … Slide credit: Josef Sivic

Codewords dictionary formation Fei-Fei et al. 2005

Image patch examples of codewords Sivic et al. 2005
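The two steps above — clustering local descriptors into a codewords dictionary, then vector-quantizing an image's descriptors against it — can be sketched with a toy k-means (illustrative only; real systems cluster tens of thousands of SIFT descriptors):

```python
import numpy as np

def build_codebook(descriptors, k=3, iters=20, seed=0):
    """Toy k-means over local descriptors; the cluster centres are the
    'visual words' of the codebook."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest centre
        d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if (labels == j).any():                    # keep old centre if cluster empties
                centres[j] = descriptors[labels == j].mean(0)
    return centres

def bow_histogram(descriptors, centres):
    """Vector-quantize descriptors against the codebook and return the
    normalised bag-of-words histogram that represents the image."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(1), minlength=len(centres)).astype(float)
    return hist / hist.sum()
```

With two well-separated descriptor clusters and k = 2, the resulting histogram splits the mass evenly between the two visual words.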

Learning: compute the bag-of-words vector X for each image; train an SVM classifier on positive and negative examples.
Classification: compute the vector X of a new image and apply the SVM.

Recognition: represent the image by its vector X and evaluate SVM(X) → contains object / doesn't contain object.
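A sketch of this learning/recognition loop on bag-of-words vectors; a nearest-centroid classifier stands in here for the SVM, purely to keep the example self-contained:

```python
import numpy as np

def train_nearest_centroid(pos_hists, neg_hists):
    """Tiny stand-in for the SVM stage: learn one centroid per class from
    bag-of-words histograms; classify a new histogram by the nearer centroid."""
    mu_pos = np.mean(pos_hists, axis=0)
    mu_neg = np.mean(neg_hists, axis=0)

    def classify(h):
        if np.linalg.norm(h - mu_pos) < np.linalg.norm(h - mu_neg):
            return "contains object"
        return "doesn't contain object"
    return classify
```

A real pipeline would train a (kernel) SVM on the same histograms; the representation stays identical, only the decision function changes.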

PROJECT 3

Pros/Cons
Pros
–Fast and simple.
–Insensitive to pose variation.
–No segmentation required during learning.
Cons
–No localization.
–Requires a discriminative or empty background.

Constellation of Parts
An object in an image is represented by a collection of parts, characterized by both their visual appearances and locations. Object categories are modeled by the appearance and spatial distributions of these characteristic parts.

The correspondence problem
A model with P parts and an image with N possible locations for each part: N^P combinations! Slide credit: Rob Fergus

How to model location? Explicit: Probability density functions Implicit: Voting scheme

Explicit shape model
Probability densities
–Continuous (Gaussians)
–Analogy with springs
Parameters of the model: μ and Σ
–Independence corresponds to zeros in Σ
Slide credit: Rob Fergus
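Evaluating a part configuration under such a Gaussian shape model amounts to computing a Gaussian log-density over the concatenated part locations (a sketch; in practice μ and Σ are learned from training data, and zeros in Σ encode independence between part coordinates):

```python
import numpy as np

def shape_log_likelihood(locations, mu, cov):
    """Log-density of a part configuration under a Gaussian shape model
    with mean mu and covariance cov over the stacked part coordinates."""
    x = np.asarray(locations, float).ravel()
    d = x - mu                                          # deviation from the mean shape
    k = len(mu)
    _, logdet = np.linalg.slogdet(cov)                  # log|cov|, numerically stable
    return -0.5 * (k * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))
```

The quadratic term d' Σ⁻¹ d is the "spring energy" of the analogy on the slide: configurations far from the mean shape are penalised.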

Different graph structures: fully connected, star structure, tree structure. Inference is O(N^6) for the fully connected graph vs. O(N^2) for the star. Sparser graphs cannot capture all interactions between parts. Slide credit: Rob Fergus

Implicit shape model (Leibe and Schiele ’03, ’05)
Learning:
–Learn an appearance codebook: cluster over interest points on training images
–Learn spatial occurrence distributions (x, y, s): match the codebook to training images and record matching positions on the object (the centroid is given)
Recognition: interest points → matched codebook entries → probabilistic voting; use Hough-space voting to find the object
Slide credit: Rob Fergus
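The Hough-space voting step can be sketched as follows (illustrative: `matches` pairs each matched codebook entry's image position with the centroid offsets recorded for that codeword during training; votes are accumulated on an integer grid, and scale is omitted for brevity):

```python
import numpy as np

def hough_vote_centroid(matches, grid=(20, 20)):
    """Implicit-shape-model style voting: each matched codebook entry at
    image position p casts votes for the object centroid at p + offset,
    one vote per offset recorded for that codeword during training."""
    acc = np.zeros(grid)                                   # Hough accumulator
    for (px, py), offsets in matches:
        for (ox, oy) in offsets:                           # learned spatial occurrences
            cx, cy = px + ox, py + oy
            if 0 <= cx < grid[0] and 0 <= cy < grid[1]:
                acc[cx, cy] += 1
    return np.unravel_index(acc.argmax(), acc.shape)       # strongest centroid hypothesis
```

Matches that agree on the object centre reinforce the same accumulator cell, so the peak of the accumulator localizes the object.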

Pros/Cons
Pros
–Principled modeling
–Models appearance and shape
–Provides localization
Cons
–Computationally expensive
–Small number of parts (when learning on unsegmented images), or requires a bounding box during learning.

Weak Shape Model
Models part arrangements. Allows many parts while the model remains computationally efficient. Context distributions: see each part in the context of the other parts.

PROJECT 4