Object Recognition. We will discuss: features, classifiers, and an example 'winning' system.

Object Classes

Class / Non-class

Features and Classifiers. Same features with different classifiers; same classifier with different features.

Generic Features: simple (wavelets), complex (Geons).

Class-specific Features: Common Building Blocks

Optimal Class Components? Large features are too rare; small features are found everywhere. Find features that carry the highest amount of information.

Entropy. For a binary variable x with values 0, 1 and probabilities p = 0.5, 0.5, the entropy is H(x) = −∑ p(x) log2 p(x) = 1 bit.

Mutual Information I(C,F). For the class variable C and a feature F: I(F,C) = H(C) − H(C|F).
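As a rough illustration of these quantities, the sketch below computes H(C), H(C|F), and I(C,F) from a joint count table for a binary class and a binary feature (fragment present or absent); the function names and the toy counts are made up for the example.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution p (zero entries are skipped)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint_counts):
    """I(C,F) = H(C) - H(C|F) from a joint count table (rows: classes, cols: feature values)."""
    joint = joint_counts / joint_counts.sum()   # joint probabilities P(C, F)
    p_c = joint.sum(axis=1)                     # marginal P(C)
    p_f = joint.sum(axis=0)                     # marginal P(F)
    h_c = entropy(p_c)
    # H(C|F) = sum_f P(f) * H(C | F = f)
    h_c_given_f = sum(p_f[j] * entropy(joint[:, j] / p_f[j])
                      for j in range(joint.shape[1]) if p_f[j] > 0)
    return h_c - h_c_given_f

# Binary class C and binary feature F (fragment detected / not detected):
counts = np.array([[40, 10],    # class images:     F present, F absent
                   [ 5, 45]])   # non-class images: F present, F absent
print(mutual_information(counts))   # a high value marks an informative fragment
```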

Optimal classification features. Theoretically, maximizing the delivered information minimizes the classification error. In practice, informative object components can be identified in training images.

Selecting Fragments

Adding a New Fragment (max-min selection). The gain from a candidate fragment is ∆MI = MI[existing fragments + candidate; class] − MI[existing fragments; class]. Select: max over i of min over k of ∆MI(F_i, F_k) (the minimum is over the existing fragments, the maximum over the entire candidate pool).
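A minimal sketch of this max-min selection loop, assuming two hypothetical helpers: mi(f), the mutual information of a fragment with the class, and delta_mi(f, g), the additional information f contributes given that g is already selected.

```python
def select_fragments(candidates, mi, delta_mi, n_select):
    """Greedy max-min fragment selection (a sketch).

    mi(f)          -> mutual information of fragment f with the class
    delta_mi(f, g) -> additional MI gained by adding f when g is already selected
    """
    # Start with the single most informative fragment.
    selected = [max(candidates, key=mi)]
    pool = [f for f in candidates if f is not selected[0]]

    while len(selected) < n_select and pool:
        # For each candidate, take its worst-case (minimum) gain over the
        # already-selected fragments; pick the candidate whose worst case is best.
        best = max(pool, key=lambda f: min(delta_mi(f, g) for g in selected))
        selected.append(best)
        pool.remove(best)
    return selected
```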

Highly Informative Face Fragments

Horse-class features. Car-class features. Pictorial features learned from examples.

Fragments with positions: on all detected fragments within their regions.

Star model: detected fragments 'vote' for the center location; find the location with the maximal vote.
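A possible sketch of the voting step, with made-up inputs: a list of fragment detections and a table of learned offsets from each fragment to the object center.

```python
import numpy as np

def vote_for_center(detections, offsets, image_shape):
    """Star-model voting (a sketch): each detected fragment votes for the object
    center through its learned offset; return the (row, col) with the most votes.

    detections : list of (fragment_id, row, col) fragment detections
    offsets    : dict fragment_id -> (d_row, d_col) offset from fragment to center
    """
    votes = np.zeros(image_shape, dtype=int)
    for frag_id, r, c in detections:
        dr, dc = offsets[frag_id]
        cr, cc = int(round(r + dr)), int(round(c + dc))
        if 0 <= cr < image_shape[0] and 0 <= cc < image_shape[1]:
            votes[cr, cc] += 1
    return np.unravel_index(np.argmax(votes), votes.shape)

# Toy example: three fragments whose offsets all point at (50, 60).
offsets = {0: (10, 0), 1: (0, -15), 2: (-5, 5)}
detections = [(0, 40, 60), (1, 50, 75), (2, 55, 55)]
print(vote_for_center(detections, offsets, (100, 100)))   # -> (50, 60)
```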

Bag of words

Bag of visual words: a large collection of image patches.

Each class has its own histogram of visual words.
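A sketch of building such histograms, assuming the patch descriptors have already been extracted; scikit-learn's KMeans is used here to build the visual-word codebook (one possible choice, not necessarily the tool used in the lecture).

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_patch_descriptors, n_words=200):
    """Cluster patch descriptors from training images into visual words."""
    km = KMeans(n_clusters=n_words, n_init=10, random_state=0)
    km.fit(all_patch_descriptors)
    return km

def word_histogram(image_descriptors, codebook):
    """Represent one image as a normalized histogram over the visual words."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```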

SVM – linear separation in feature space

Optimal Separation: SVM vs. Perceptron. Find a separating plane such that the closest points are as far away as possible.

The Margin
Separating line: w∙x + b = 0
Far line: w∙x + b = +1
Their distance: w∙∆x = +1
Separation: |∆x| = 1/|w|
Margin: 2/|w|

Max Margin Classification. The examples are vectors x_i; the labels y_i are +1 for class, −1 for non-class. Maximize the margin 2/|w|, or equivalently (the form usually used) minimize ½|w|² subject to y_i (w∙x_i + b) ≥ 1 for all i. How do we solve such a constrained optimization?

Using Lagrange multipliers: minimize
L_P = ½|w|² − ∑_i α_i [ y_i (w∙x_i + b) − 1 ]
with α_i ≥ 0 the Lagrange multipliers.

Minimize L_P: set all derivatives to 0. The derivative with respect to w gives w = ∑_i α_i y_i x_i, and the derivative with respect to b gives ∑_i α_i y_i = 0; the conditions α_i ≥ 0 also hold. Dual formulation: put these back into L_P and maximize the Lagrangian w.r.t. the α_i under the above conditions.

Dual formulation. A mathematically equivalent formulation: we can maximize the Lagrangian with respect to the α_i. After the manipulations above, a nice concise optimization remains:
maximize L_D = ∑_i α_i − ½ ∑_{i,j} α_i α_j y_i y_j (x_i∙x_j)
subject to α_i ≥ 0 and ∑_i α_i y_i = 0.

SVM in simple matrix form: maximize ∑_i α_i − ½ αᵀHα, where H is a simple 'data matrix': H_ij = y_i y_j (x_i∙x_j). We first find the α; from this we can find w, b, and the support vectors. Final classification: w∙x + b = ∑_i α_i y_i (x_i∙x) + b, because w = ∑_i α_i y_i x_i. Only the support vectors (points with α_i > 0) are used.
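For very small toy problems the dual can be solved directly with a general-purpose optimizer; the sketch below builds H, solves the hard-margin dual with SciPy's SLSQP, and recovers w, b, and the support vectors. Real SVM packages use specialized solvers (e.g. SMO), so this is only an illustration with made-up data.

```python
import numpy as np
from scipy.optimize import minimize

def train_linear_svm_dual(X, y):
    """Hard-margin SVM dual for small toy data:
    maximize sum(a) - 0.5 a^T H a  with  a_i >= 0 and sum(a_i y_i) = 0,
    where H_ij = y_i y_j (x_i . x_j)."""
    n = len(y)
    H = (y[:, None] * y[None, :]) * (X @ X.T)

    objective = lambda a: 0.5 * a @ H @ a - a.sum()   # minimize the negative dual
    grad = lambda a: H @ a - np.ones(n)
    constraints = {'type': 'eq', 'fun': lambda a: a @ y}
    bounds = [(0, None)] * n
    res = minimize(objective, np.zeros(n), jac=grad,
                   bounds=bounds, constraints=constraints, method='SLSQP')

    alpha = res.x
    w = (alpha * y) @ X                    # w = sum_i alpha_i y_i x_i
    sv = alpha > 1e-6                      # the support vectors
    b = np.mean(y[sv] - X[sv] @ w)         # b from the support vectors
    return w, b, alpha

# Toy separable data: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b, alpha = train_linear_svm_dual(X, y)
print(np.sign(X @ w + b))   # should reproduce y
```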

Full story, separable case: classify x by the sign of w∙x + b, or equivalently use ∑_i α_i y_i (x_i∙x) + b.

Quadratic Programming (QP). Minimize (with respect to x) a quadratic objective ½ xᵀQx + cᵀx, subject to one or more constraints of the form Ax ≤ b (inequality constraints) and Ex = d (equality constraints).

The non-separable case

It turns out that we can get a very similar formulation of the problem and solution if we penalize incorrect classifications in a certain way. The penalty is C ξ_i, where ξ_i ≥ 0 is the distance of the misclassified point from the respective plane. We now minimize the margin objective together with a penalty on the misclassifications:
minimize ½|w|² + C ∑_i ξ_i subject to y_i (w∙x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0.
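In practice this soft-margin problem is rarely solved by hand; the sketch below uses scikit-learn's SVC on made-up toy data just to show the role of C, which weights the penalty ∑ ξ_i against the margin term.

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data that is not linearly separable (one point sits in the wrong cluster).
X = np.array([[2, 2], [3, 3], [2.5, 2.0], [-2, -2], [-3, -3], [2.4, 2.1]], dtype=float)
y = np.array([ 1,      1,      1,          -1,       -1,       -1])

# Small C: wide margin, more violations tolerated; large C: fewer violations tolerated.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(C, clf.n_support_, clf.score(X, y))
```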

Kernel Classification

Using kernels. A kernel K(x, x′) is associated with a mapping x → φ(x). We could compute φ(x) and perform a linear classification in the target space. It turns out that this can be done directly using kernels, without ever computing the mapping; the results are equivalent. The optimal separation in the target space is the same as what we get with the procedure below, which is identical to the linear case with the kernel replacing the dot-product.

In the dual optimization, use K(x_i, x_j) in place of x_i∙x_j. For classification, use ∑_i α_i y_i K(x_i, x) + b.
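The sketch below checks this formula numerically with an RBF kernel: scikit-learn's SVC stores α_i y_i for the support vectors in dual_coef_, so the hand-computed ∑_i α_i y_i K(x_i, x) + b should match its decision function (the toy data and the gamma value are arbitrary).

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(A, B, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2), computed for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# XOR-like data: not linearly separable, but separable with an RBF kernel.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([1, 1, -1, -1])
clf = SVC(kernel='rbf', gamma=0.5, C=10.0).fit(X, y)

# Reproduce the decision function by hand: f(x) = sum_i alpha_i y_i K(x_i, x) + b.
K = rbf_kernel(clf.support_vectors_, X)           # rows: support vectors, cols: queries
f_manual = clf.dual_coef_ @ K + clf.intercept_    # dual_coef_ holds alpha_i * y_i
print(np.allclose(f_manual.ravel(), clf.decision_function(X)))   # True
```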

Summary points:
Linear separation with the largest margin, f(x) = w∙x + b
Dual formulation, f(x) = ∑_i α_i y_i (x_i∙x) + b
Natural extension to non-separable classes
Extension through kernels, f(x) = ∑_i α_i y_i K(x_i, x) + b

Felzenszwalb, McAllester, Ramanan, CVPR 2008: A Discriminatively Trained, Multiscale, Deformable Part Model.

Object model using HoG. A bicycle and its 'root filter'. The root filter is a patch of HoG descriptors: the image is partitioned into 8×8-pixel cells, and in each cell we compute a histogram of gradient orientations.
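A sketch of computing such a HoG descriptor with scikit-image (one possible implementation; the exact parameters used in the paper may differ).

```python
from skimage import color, data
from skimage.feature import hog

# Grayscale test image; 8x8-pixel cells, 9 orientation bins per cell,
# with block normalization over 2x2 groups of cells.
image = color.rgb2gray(data.astronaut())
descriptor = hog(image,
                 orientations=9,
                 pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2),
                 feature_vector=True)
print(descriptor.shape)   # one long vector of block-normalized cell histograms
```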

Using patches with HoG descriptors and classification by SVM

Dealing with scale: multi-scale analysis. The filter is searched over a pyramid of HoG descriptors to deal with the unknown scale.

Adding Parts. A part is P_i = (F_i, v_i, s_i, a_i, b_i): F_i is the filter for the i-th part; v_i is the center of a box of possible positions for part i relative to the root position; s_i is the size of this box; a_i and b_i are two-dimensional vectors specifying the coefficients of a quadratic function that scores each possible placement of the i-th part. That is, a_i and b_i are two numbers each, and the penalty for a deviation (∆x, ∆y) from the expected location is a_i1 ∆x + a_i2 ∆y + b_i1 ∆x² + b_i2 ∆y².

Bicycle model: root filter, part filters, spatial map. Person model.

Match Score. The full score of a potential match is
∑_i F_i∙H_i + ∑_i ( a_i1 x_i + a_i2 y_i + b_i1 x_i² + b_i2 y_i² )
F_i∙H_i is the appearance part; (x_i, y_i) is the deviation of part p_i from its expected location in the model, which gives the spatial part.
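A small sketch of evaluating this score for one hypothesized placement; the argument names are made up, and in a real detector the appearance scores F_i∙H_i and the deformation coefficients come from the trained model.

```python
import numpy as np

def match_score(appearance_scores, part_deviations, a, b):
    """Score of one placement in a part-based model (a sketch).

    appearance_scores : F_i . H_i for the root filter and each part
    part_deviations   : (n_parts, 2) deviations (x_i, y_i) of each part from
                        its expected location relative to the root
    a, b              : (n_parts, 2) learned deformation coefficients
    """
    dx, dy = part_deviations[:, 0], part_deviations[:, 1]
    spatial = a[:, 0] * dx + a[:, 1] * dy + b[:, 0] * dx**2 + b[:, 1] * dy**2
    return np.sum(appearance_scores) + np.sum(spatial)

# Tiny example: root + 2 parts, one part slightly off its expected location.
score = match_score(np.array([1.2, 0.8, 0.6]),
                    np.array([[0.0, 0.0], [1.0, -1.0]]),
                    a=np.zeros((2, 2)),
                    b=np.full((2, 2), -0.1))   # negative b acts as a quadratic penalty
print(score)   # 2.6 - 0.2 = 2.4
```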

Using SVM. The score of a match can be expressed as the dot-product of a vector β of coefficients with a feature vector ψ computed from the image: Score = β∙ψ. The vectors ψ are used to train an SVM classifier: β∙ψ > +1 for class examples, β∙ψ < −1 for non-class examples.

β∙ψ > +1 for class examples, β∙ψ < −1 for non-class examples. However, ψ depends on the placement z, that is, on the values of ∆x_i, ∆y_i. We need to take the best ψ over all placements (in their notation, f); classification then uses β∙f > 1.

Finding β (SVM training). In analogy to classical SVMs, we would like to train from labeled examples D = (⟨x_1, y_1⟩, ..., ⟨x_n, y_n⟩) by optimizing the following objective function:
L_D(β) = ½|β|² + C ∑_i max(0, 1 − y_i f_β(x_i)).

Recognition. Search with gradient descent over the placement; this also includes the levels in the hierarchy. Start with the root filter and find locations of high score for it. For these high-scoring locations, search for the optimal placement of the parts at a level with twice the resolution of the root filter, using gradient descent. With the optimal placement, use β∙ψ > +1 for class examples, β∙ψ < −1 for non-class examples.

Training: positive examples with bounding boxes around the objects, and negative examples. Learn the root filter using SVM. Define a fixed number of parts, at locations of high energy in the root filter's HoG. Use these to start the iterative learning.

Hard Negatives. The set M of hard negatives for a known β and data set D: these are the support vectors (y∙f = 1) or the misses (y∙f < 1). Optimal SVM training does not need all the examples; hard examples are sufficient. For a given β, use the positive examples plus C hard examples. Use this data to compute β by standard SVM training. Iterate (with a new set of C hard examples).
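A rough sketch of this bootstrapping loop with a generic linear SVM (scikit-learn's LinearSVC here, as a stand-in for the paper's solver); the positives, the negative pool, and the cache size are placeholders.

```python
import numpy as np
from sklearn.svm import LinearSVC

def mine_hard_negatives(clf, negatives, cache_size):
    """Hard negatives for a trained classifier: negatives that are support
    vectors (y*f = 1) or misses (y*f < 1), i.e. score f(x) >= -1."""
    scores = clf.decision_function(negatives)
    mask = scores >= -1.0
    hard = negatives[mask]
    order = np.argsort(-scores[mask])          # hardest (highest-scoring) first
    return hard[order][:cache_size]

def train_with_hard_negatives(positives, negative_pool, n_rounds=5, cache_size=500):
    """Iterate: train on positives + current hard negatives, then re-mine."""
    rng = np.random.default_rng(0)
    hard = negative_pool[rng.choice(len(negative_pool), size=cache_size, replace=False)]
    clf = None
    for _ in range(n_rounds):
        X = np.vstack([positives, hard])
        y = np.hstack([np.ones(len(positives)), -np.ones(len(hard))])
        clf = LinearSVC(C=1.0).fit(X, y)
        hard = mine_hard_negatives(clf, negative_pool, cache_size)
    return clf
```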

All images contain at least 1 bike

Correct person detections

Difficult images, medium results. About 0.5 precision at 0.5 recall

All images contain at least 1 bird

Average precision: roughly, an AP of 0.3 means that in a test with 1000 class images, 300 of the top 1000 detections will be true class examples (recall = precision = 0.3).
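As a sanity check on this reading of AP, the sketch below scores synthetic detections (1000 class and 4000 non-class images with overlapping, purely made-up score distributions) and compares scikit-learn's average precision with the precision among the top 1000 detections.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Made-up detector scores: class images tend to score higher than non-class images.
rng = np.random.default_rng(0)
y_true = np.hstack([np.ones(1000), np.zeros(4000)])
scores = np.hstack([rng.normal(0.6, 1.0, 1000),
                    rng.normal(0.0, 1.0, 4000)])

print("AP =", round(average_precision_score(y_true, scores), 2))

# Precision among the top 1000 detections (compare with the recall at that cutoff):
top_1000 = np.argsort(-scores)[:1000]
print("precision@1000 =", y_true[top_1000].mean())
```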

Future Directions. Dealing with a very large number of classes: ImageNet, 15,000 categories, 12 million images. To consider: human-level performance for at least one class.