
Presentation on theme: "Object Recognizing"— Presentation transcript:

1 Object Recognizing

2 Recognition Features Classifiers Example ‘winning’ system

3 Object Classes We perceive the world in terms of objects, belonging to different classes. What are the differences between dogs and cats? What is common across different examples of objects in a class: shoes, trees, …?

4 Individual Recognition

5 Object parts Automatic, or query-driven
[Figure: a car with labeled parts: windows, mirror, door knob, headlights, bumper, front and back wheels.] Mirror and door knob are probably 'query-driven'.

6 Class Non-class

7 Class Non-class

8 Features and Classifiers
Same features with different classifiers Same classifier with different features

9 Generic Features Simple (wavelets) Complex (Geons)

10 Class-specific Features: Common Building Blocks

11 Optimal Class Components?
Large features are too rare; small features are found everywhere. Find features that carry the highest amount of information.

12 Mutual Information
[Figure: the class entropy H(C), and the conditional entropies H(C | F=1) and H(C | F=0) after observing the feature F.]
I(C;F) = H(C) – H(C|F)

13 Mutual Information I(C;F)
[Figure: example class and feature images.] I(C;F) = H(C) – H(C|F)
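As a concrete check of this formula, here is a minimal sketch computing I(C;F) directly from a joint table p(C,F); the joint distribution numbers are made up for illustration:

```python
import math

def mutual_information(joint):
    """I(C;F) = H(C) - H(C|F), computed as sum p(c,f) log2( p(c,f) / (p(c)p(f)) )."""
    p_c = [sum(row) for row in joint]            # marginal over the class
    p_f = [sum(col) for col in zip(*joint)]      # marginal over the feature
    mi = 0.0
    for c in range(len(joint)):
        for f in range(len(joint[c])):
            p = joint[c][f]
            if p > 0:
                mi += p * math.log2(p / (p_c[c] * p_f[f]))
    return mi

# Hypothetical joint distribution: the feature fires mostly inside the class.
joint = [[0.4, 0.1],   # C = class:      p(F=1)=0.4, p(F=0)=0.1
         [0.1, 0.4]]   # C = non-class:  p(F=1)=0.1, p(F=0)=0.4
print(round(mutual_information(joint), 3))  # -> 0.278
```

An uninformative feature (joint = product of marginals) gives I(C;F) = 0, matching the intuition that observing F then removes no class entropy.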

14 Optimal classification features
Theoretically: maximizing the delivered information minimizes the classification error. In practice: informative object components can be identified in training images.

15 KL Classification Error
P(C,F) determines the best attainable classification error. [Figure: generative model: C is drawn from p(C), features F from p(F|C); the classifier recovers C from p(C|F).]
Starting from p(C,F): choose C according to p(C) and generate the features F according to p(F|C). Once F is observed, you 'recover' C according to q(C|F), where q is your model. The error is the expected KL distance between the true distribution (the selected C) and q(C|F). In the specific case where the model is correct, q = p, the error reduces to H(C|F). A small residual entropy H(C|F) is therefore a necessary condition, but you also need q to approximate p. If the set of features is very large, or the model is complex, this requires a large number of training points; with limited training data you need fewer features and simpler models.

16 Selecting Fragments

17 Adding a New Fragment (max-min selection)
ΔMI = MI[new fragment added to the current set ; class] – MI[current set ; class]
Select: Maxi Mink ΔMI (Fi, Fk) (min over the existing fragments Fk, max over the entire pool of candidates Fi)
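The max-min step can be sketched as below; `delta_mi` and the gain numbers are hypothetical stand-ins for the MI gain ΔMI(Fi, Fk) that would be measured on training images:

```python
def max_min_select(pool, selected, delta_mi):
    """One max-min step: for each candidate fragment in the pool, take the
    minimum MI gain over the already-selected fragments, then pick the
    candidate with the largest such minimum."""
    best, best_gain = None, float("-inf")
    for f in pool:
        gain = min(delta_mi(f, k) for k in selected)
        if gain > best_gain:
            best, best_gain = f, gain
    return best, best_gain

# Toy gain table (made-up numbers): delta_mi(f, k) is the MI added by
# fragment f given that fragment k is already in the selected set.
gains = {("f1", "a"): 0.30, ("f1", "b"): 0.05,
         ("f2", "a"): 0.20, ("f2", "b"): 0.15}
pick, g = max_min_select(["f1", "f2"], ["a", "b"], lambda f, k: gains[(f, k)])
print(pick, g)  # f2 wins: its worst-case gain (0.15) beats f1's (0.05)
```

Note how f1's larger best-case gain does not help it: the min over existing fragments punishes redundancy with any one of them.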

18 Horse-class features Car-class features Pictorial features
Learned from examples

19 Fragments with positions
On all detected fragments within their regions

20 Variability of Airplanes Detected

21 Class-fragments and Activation
The red bars are faces; the others are cars (blue) and horses. The PFS is the face area; the collateral sulcus is, I think, the 'place' area. The so-called random fragments are taken from random locations, but many of them are also informative. Malach et al. 2008

22 Bag of words

23 Bag of visual words A large collection of image patches

24 Generate a dictionary using K-means clustering
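A minimal K-means sketch for building the dictionary. The toy "patches" are synthetic Gaussian clusters, and the simple spread-out initialization is an assumption for the demo, not part of the slides:

```python
import numpy as np

def kmeans(patches, k, iters=20):
    """Minimal K-means for building a visual-word dictionary.
    patches: (n, d) array of patch descriptor vectors."""
    # deterministic init: k patches spread evenly through the data
    idx = np.linspace(0, len(patches) - 1, k).astype(int)
    centers = patches[idx].copy()
    for _ in range(iters):
        # assign every patch to its nearest center (squared distance)
        d = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned patches
        for j in range(k):
            if (labels == j).any():
                centers[j] = patches[labels == j].mean(0)
    return centers, labels

# Two well-separated synthetic clusters standing in for patch descriptors.
rng = np.random.default_rng(1)
patches = np.vstack([rng.normal(0.0, 0.1, (50, 8)),
                     rng.normal(5.0, 0.1, (50, 8))])
centers, labels = kmeans(patches, k=2)
```

The returned centers are the dictionary's visual words; each input patch is then represented by the index of its nearest word.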

25

26 Each class has its own word histogram
Limited or no geometry. Simple and popular, but no longer state of the art.
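The per-image histogram is built by quantizing each descriptor to its nearest visual word; the tiny dictionary and descriptors below are hypothetical:

```python
import numpy as np

def word_histogram(descriptors, dictionary):
    """Assign each descriptor to its nearest visual word and return the
    normalized word-count histogram for the image."""
    d = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    words = d.argmin(1)                      # nearest word per descriptor
    hist = np.bincount(words, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()

# Hypothetical 3-word dictionary and a toy image with 4 descriptors.
dictionary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
descs = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, 0.0], [0.0, 0.9]])
hist = word_histogram(descs, dictionary)
print(hist)  # -> [0.25 0.5  0.25]
```

Classification then compares such histograms (e.g. with an SVM), discarding the geometry of where the words occurred.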

27 Class II

28 HoG Descriptor Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection. What is (c) here? Mention normalization (the HoG blocks are contrast-normalized).
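A rough sketch of the per-cell orientation histogram that HoG builds (unsigned gradients, 9 bins); the block-level contrast normalization that Dalal-Triggs also apply is omitted here:

```python
import numpy as np

def cell_hog(cell, bins=9):
    """Orientation histogram for one cell: gradient magnitudes voted into
    orientation bins over [0, 180) degrees (unsigned gradients)."""
    gy, gx = np.gradient(cell.astype(float))          # image gradients
    mag = np.hypot(gx, gy)                            # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0      # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, 180), weights=mag)
    return hist

# A vertical step edge: all gradient energy has horizontal direction,
# i.e. orientation near 0 degrees, so it lands in the first bin.
cell = np.zeros((8, 8))
cell[:, 4:] = 1.0
h = cell_hog(cell)
print(h.argmax())  # -> 0
```

The full descriptor concatenates these cell histograms over a detection window, after normalizing groups of cells (blocks).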

29 SIFT: Scale-Invariant Feature Transform
MSER: Maximally Stable Extremal Regions. SURF: Speeded-Up Robust Features. Cross-correlation, … HoG and SIFT are the most widely used. MSER: you find connected regions Ri as level sets of the intensity, that is, image regions where I(x) < t. A region is stable if changing t by some delta changes the region only a little.

30 SVM – linear separation in feature space

31 Optimal Separation Perceptron SVM
How do we actually find this optimal plane? We need to find a plane that maximizes the minimal separation: find a separating plane such that the closest points are as far from it as possible.
Perceptron: Frank Rosenblatt, Principles of Neurodynamics, 1962.
SVM: The Nature of Statistical Learning Theory, 1995.

32 The Margin
[Figure: the two classes (+1, -1), the separating line, and the two margin lines.]
We can always call the separating line w ∙ x + b = 0 (w ∙ x = 0 is the same plane, but through the origin); we can also have the same plane with a larger w and smaller b. So we can choose a pair w, b that makes the far line w ∙ x + b = 1.
Calculating the margin:
Separating line: w ∙ x + b = 0
Far line: w ∙ x + b = +1
Their difference: w ∙ ∆x = +1
Separation: |∆x| = 1/|w|
Margin: 2/|w|
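A quick numeric check of the 2/|w| formula, with a hypothetical plane in the canonical scaling (closest points on each side satisfy w ∙ x + b = ±1):

```python
import numpy as np

# Separating plane w.x + b = 0, scaled so the margin lines are w.x + b = +/-1.
w = np.array([3.0, 4.0])   # |w| = 5
b = -2.0

margin = 2.0 / np.linalg.norm(w)               # margin = 2/|w| = 0.4
# distance of any point from the plane: |w.x + b| / |w|
x_plus = np.array([0.6, 0.3])                  # w.x + b = 1: a point on the far line
dist = abs(w @ x_plus + b) / np.linalg.norm(w)
print(margin, dist)  # 0.4 0.2: the margin line sits at half the margin from the plane
```

Scaling w and b by the same factor moves the lines w ∙ x + b = ±1 but not the plane itself, which is why the canonical scaling can always be chosen.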

33 Max Margin Classification
The examples are vectors xi. The labels yi are +1 for class, -1 for non-class. (Equivalently, the constraints are usually written yi(w ∙ xi + b) ≥ 1.) How do we solve such a constrained optimization? Answer: Lagrange multipliers.

34 Using Lagrange multipliers:
Minimize LP = ½|w|² – ∑ αi [yi(w ∙ xi + b) – 1], with αi ≥ 0 the Lagrange multipliers. What is the first line here, why do I need it? LP is the Lagrangian 'primal'.

35 Minimizing the Lagrangian
Minimize LP: set the derivatives with respect to the primal variables to 0:
∂LP/∂w = 0 gives w = ∑ αi yi xi
∂LP/∂b = 0 gives ∑ αi yi = 0
Dual formulation: maximize the Lagrangian with respect to the αi, subject to the two conditions above.

36 Dual formulation
A mathematically equivalent formulation: maximize the Lagrangian with respect to the αi. After manipulations it takes a concise matrix form, and the matrix is a simple 'data matrix'. The duality is via the Kuhn-Tucker theorem; we will not go through all the details. At the optimal point all derivatives are 0: L is minimal in the primal variables and maximal with respect to the Lagrange multipliers. Equivalently, one maximizes with respect to the alphas under the derived constraints.
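The manipulations referred to above can be written out; this is the standard substitution, consistent with w = ∑ αi yi xi on the next slide:

```latex
L_P = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i\left[y_i(w\cdot x_i + b) - 1\right],
\qquad \alpha_i \ge 0
\frac{\partial L_P}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i,
\qquad
\frac{\partial L_P}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0
\text{Substituting back gives the dual:}\quad
L_D = \sum_i \alpha_i
      - \tfrac{1}{2}\sum_{i,j}\alpha_i \alpha_j\, y_i y_j\,\langle x_i, x_j\rangle
```

The quadratic term is exactly ½ αᵀHα with Hij = yi yj ⟨xi, xj⟩, which is the 'data matrix' form on the following slide.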

37 SVM: in simple matrix form
We first find the α. From this we can find: w, b, and the support vectors. The matrix H is a simple ‘data matrix’: Hij = yiyj <xi∙xj> For class and non-class points, construct their Data Matrix H, and solve a maximization problem The duality is via the Kuhn-Tucker theorem. Will not get all the details. At the optimal point, all derivatives are 0. The L is minimal in the primal variable x, and maximal with respect to the Lagrange multipliers. You can also express this as maximizing with respect to the alphas, and using derived constraints. The final solution, it terms of the support vectors it seems to be longer, but useful to know that the solution is expressible in terms of the support vectors, alpha = 0 for all the others. Final classification: w∙x + b ∑αi yi <xi x> + b Because w = ∑αi yi xi Only <xi x> with support vectors are used

38 Full story – separable case
Classification of a new data point x: sgn ( ∑ αi yi <xi ∙ x> + b )
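This decision rule can be sketched directly in terms of the support vectors; the multipliers below are hypothetical, not the output of a real QP solve:

```python
import numpy as np

def svm_classify(x, support_x, support_y, alphas, b):
    """Decision for a new point using only the support vectors:
    sign( sum_i alpha_i y_i <x_i, x> + b )."""
    score = sum(a * y * (xi @ x)
                for a, y, xi in zip(alphas, support_y, support_x)) + b
    return 1 if score >= 0 else -1

# Toy solution: support vectors at (1,0) and (-1,0) with equal multipliers
# give w = sum alpha_i y_i x_i = (1, 0) and b = 0.
support_x = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
support_y = [+1, -1]
alphas = [0.5, 0.5]
print(svm_classify(np.array([2.0, 1.0]), support_x, support_y, alphas, b=0.0))   # +1
print(svm_classify(np.array([-0.5, 3.0]), support_x, support_y, alphas, b=0.0))  # -1
```

Points with α = 0 never enter the sum, which is why only the support vectors need to be stored.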

39 Quadratic Programming QP
Minimize (with respect to x) a quadratic form ½ xᵀQx + cᵀx, subject to one or more constraints of the form Ax ≤ b (inequality constraints) or Ex = d (equality constraints). The problem can be solved in polynomial time for positive-definite Q (it is NP-hard otherwise). Our H is positive semi-definite: it has the form BᵀB (where Bᵀ is the transpose), since for any u, uᵀ(BᵀB)u = (Bu)ᵀ(Bu) = |v|² ≥ 0 for the vector v = Bu.
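The B^T B argument can be checked numerically on random data (the points and labels below are arbitrary):

```python
import numpy as np

# The dual matrix H_ij = y_i y_j <x_i, x_j> can be written B^T B with
# column i of B equal to y_i x_i, so u^T H u = |Bu|^2 >= 0 for every u.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))            # 5 points in R^3
y = np.array([1.0, -1.0, 1.0, 1.0, -1.0])
B = (y[:, None] * X).T                 # column i is y_i x_i
H = B.T @ B                            # H_ij = y_i y_j <x_i, x_j>

eigvals = np.linalg.eigvalsh(H)
print(eigvals.min() >= -1e-10)         # all eigenvalues non-negative: H is PSD
```

With 5 points in R^3 the matrix has rank at most 3, so some eigenvalues are exactly 0: H is positive semi-definite rather than strictly positive definite.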

40 Full story: non-separable case
The penalty is Cξi, where ξi ≥ 0 is the distance of the misclassified point from the respective margin plane and C ≥ 0 is the penalty weight. Classification of a new data point x: sgn ( ∑ αi yi <xi ∙ x> + b )

41 Kernel Classification

42

43 Full story – Kernel case
Hij = K(xi, xj). Classification of a new data point x: sgn ( ∑ αi yi K(xi, x) + b )
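The kernel decision rule is the same sum with the inner product replaced by K; here with an RBF kernel and hypothetical support multipliers (not from a real QP solve):

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """K(a, b) = exp(-gamma |a - b|^2), a common kernel choice."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_classify(x, support_x, support_y, alphas, b, kernel):
    """Kernel decision: sign( sum_i alpha_i y_i K(x_i, x) + b )."""
    score = sum(a * y * kernel(xi, x)
                for a, y, xi in zip(alphas, support_y, support_x)) + b
    return 1 if score >= 0 else -1

# Toy support set: one positive word near the origin, one negative far away.
support_x = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
support_y = [+1, -1]
alphas = [1.0, 1.0]
print(kernel_classify(np.array([0.2, 0.1]),
                      support_x, support_y, alphas, 0.0, rbf_kernel))  # +1
```

No explicit w exists here: the separating surface lives in the kernel's implicit feature space, yet classification still only needs K(xi, x) against the support vectors.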

44 Felzenszwalb Algorithm
Felzenszwalb, McAllester, Ramanan (CVPR): A Discriminatively Trained, Multiscale, Deformable Part Model. There are many implementation details; we will describe the main points.

45 Using patches with HoG descriptors and classification by SVM
This is from Dalal-Triggs: HoG + SVM. Felzenszwalb extends it: the scheme also uses HoG, but the description is more complex and includes parts and their locations. What is the Φ here? I think you simply include the coefficient b, since we only have w∙f without b here. Person model: HoG.

46 Object model using HoG A bicycle and its 'root filter'
I'll start by introducing the root filter, HoG, and SVM, then go to the full scheme with the parts and 'latent' SVM. The root filter is a patch of HoG descriptors: the image is partitioned into 8×8-pixel cells, and in each cell we compute a histogram of gradient orientations.

47 Dealing with scale: multi-scale analysis
The filter is searched over a pyramid of HoG descriptors, to deal with the unknown scale.

48 Adding Parts A part Pi = (Fi, vi, si, ai, bi).
Fi is the filter for the i-th part; vi is the center of a box of possible positions for part i relative to the root position; si is the size of this box; ai and bi are two-dimensional vectors specifying the coefficients of a quadratic function scoring each possible placement of the i-th part. That is, ai and bi are two numbers each, and the penalty for a deviation ∆x, ∆y from the expected location is a1∆x + a2∆y + b1∆x² + b2∆y². In the object model, Fi is a vector of coefficients for part i, learned by SVM during training. So vi is 2 numbers, si (a scale) is 1 number, ai is 2 numbers, and bi is 2 numbers.

49 Bicycle model: root, parts, spatial map
A bicycle model and a person model: the root, parts, and spatial map. The filter in the model is the vector of SVM weights; the figure shows the orientations that have positive w values.

50 We see the root and parts at two levels of the hierarchy
(We see 4 levels; the root, in light blue, and the parts, in yellow, sit on two levels of the hierarchy: the algorithm looks for a root at level i and for parts two levels below it.) On the top right we see the root, the parts, and the 'spatial maps' for the parts, which score a1∆x + a2∆y + b1∆x² + b2∆y².

51 Match Score The full score of a potential match is:
∑ Fi ∙ Hi + ∑ (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²). Fi ∙ Hi is the appearance part. xi, yi is the deviation of part pi from its expected location in the model; this is the spatial part. In the object model, Fi is a vector of coefficients for part i, learned by SVM during training (the coefficient vector w). Hi is a HoG descriptor extracted from the input image for part i.
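The score formula is easy to evaluate once a placement is fixed; the filter responses, deviations, and deformation coefficients below are made-up numbers for illustration:

```python
def match_score(appearance, deviations, deform):
    """Score of a placement in the part model:
    sum_i F_i . H_i + sum_i (a_i1 dx + a_i2 dy + b_i1 dx^2 + b_i2 dy^2).
    The quadratic coefficients are typically negative, penalizing parts
    that drift from their expected location."""
    score = sum(appearance)                       # sum of F_i . H_i terms
    for (dx, dy), (a1, a2, b1, b2) in zip(deviations, deform):
        score += a1 * dx + a2 * dy + b1 * dx ** 2 + b2 * dy ** 2
    return score

# Hypothetical two-part example.
appearance = [2.0, 1.5]                  # F_i . H_i per part
deviations = [(0.0, 0.0), (1.0, -1.0)]   # (dx, dy) per part
deform = [(0.0, 0.0, -0.5, -0.5),        # quadratic penalty coefficients
          (0.0, 0.0, -0.5, -0.5)]
print(match_score(appearance, deviations, deform))  # 3.5 - 1.0 = 2.5
```

The second part's unit deviation in each direction costs 0.5 + 0.5 = 1.0, trading appearance score against geometric consistency.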

52 Recognition
Search with gradient descent over the placement; this includes the levels in the hierarchy. Start with the root filter and find locations with a high score for it. For these high-scoring locations, search for the optimal placement of the parts at a level with twice the resolution of the root filter, using gradient descent. Essentially, maximize ∑ Fi ∙ Hi + ∑ (ai1 xi + ai2 yi + bi1 xi² + bi2 yi²) over placements (xi, yi). Fi, ai, bi are in the model, from learning; the xi, yi are the part locations we optimize over. Hi is the HoG descriptor at the selected location, so it also changes with the placement. I think they first compute response maps for Fi ∙ Hi. Final decision: β∙ψ > θ implies class.

53

54 Learn root filter using SVM
Training: positive examples with bounding boxes around the objects, plus negative examples. Learn the root filter using SVM. Define a fixed number of parts at locations of high energy in the root filter's HoG, and use these to start the iterative learning.
Parts: they use a fixed number in a simple-minded scheme. For 6 parts, they define the area of a part as (S × 0.8)/6, where S is the size of the object box: they take 80% of the area and divide it into 6 parts. For the first part, they look for a window of this area (searching over aspect ratio and location) that has maximal HoG energy in the box. They then zero the energy in the sub-window selected for the first part and repeat the process. This gives initial placements for the roots and parts in all positive images, to which they can apply the iterations in (1), (2) above.

55 Using SVM: The score of a match can be expressed as the dot product of a vector β of coefficients with an image descriptor: Score = β∙ψ. Here z is the placement: for a specific placement z (all the ∆xi, ∆yi), the image descriptor is a vector ψ. Using the vectors ψ to train an SVM classifier: β∙ψ > 1 for class examples, β∙ψ < -1 for non-class examples.

56 β∙ψ > 1 for class examples
However, ψ depends on the placement z, that is, on the values of ∆xi, ∆yi. We need to take the best ψ over all placements; call it f. Classification then uses β∙f > 1.

57 Finding β, SVM training:
In analogy to classical SVMs, we train from labeled examples D = (<x1, y1>, …, <xn, yn>), the training data. The algorithm optimizes the following objective function: we learn β from the training examples, but do not run exactly an SVM; we minimize β with a penalty for misclassifications. The penalty in the sum above is 0 for a correct classification, i.e. when f > +1 with y = 1 for positives, or f < -1 with y = -1 for negatives. On the whole we find a minimal β, as in SVM, subject to minimal classification errors on the training data.

58 Hard Negatives The set M of hard negatives for a known β and data set D: the examples at the margin (y ∙ f = 1) or misses (y ∙ f < 1). Optimal SVM training does not need all the examples; the hard examples are sufficient. But this assumes β is known, and β needs to be learned. So the algorithm is: start with the positive examples P; with an initial assignment of β, select C hard negatives from D by the criterion above; find the optimal β* on these, using their latent-SVM training; then iterate: with this β, find a new set of hard negatives and update β*.
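The selection criterion can be sketched as a filter over the negative pool; the linear score and the 1-D feature values below are hypothetical:

```python
def hard_negatives(beta, negatives, score):
    """Negatives on or inside the margin: y * f(x) <= 1 with y = -1,
    i.e. score(beta, x) >= -1. These are the support vectors (y*f = 1)
    and the misses (y*f < 1)."""
    return [x for x in negatives if -1 * score(beta, x) <= 1]

# Toy linear score f(x) = beta * x on 1-D features (all labeled -1).
score = lambda beta, x: beta * x
beta = 1.0
negatives = [-3.0, -0.5, 0.2, -2.0]
hard = hard_negatives(beta, negatives, score)
print(hard)  # -> [-0.5, 0.2]: the ones the current beta nearly or actually misclassifies
```

Negatives scored far below the margin (here -3.0 and -2.0) are dropped: they cannot become support vectors for the current β, so retraining on the hard subset alone gives the same solution.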

59

60 Comments: relations to Star and to SVM
Like a Star model, it is a collection of parts, at expected locations, where parts are defined by image patches The decision about a part detection is done by an SVM, <F H> where F are the learned coefficients and H is the part HoG descriptor The locations of the parts are learned by so-called Latent SVM. The part location is selected to maximize the SVM score The scheme creates a scale pyramid and searches over the best scales.

61 ‘Pascal Challenge’ Airplanes
Obtaining human-level performance?

62 All images contain at least 1 bike

63 Bike Recognition

64 Future challenges: dealing with a very large number of classes
ImageNet: 15,000 categories, 12 million images. To consider: human-level performance for at least one class.

