Object Classes. Most recent work is at the object level.
We perceive the world in terms of objects belonging to different classes. What are the differences between dogs and cats? What is common across different examples of objects in a class (shoes, trees, …)?
We want in addition: 1. Individual Recognition
2. Object parts and sub-parts Called: Full Interpretation
[Figure: a car with labeled parts: window, mirror, door knob, headlight, front and back wheels, bumper. Mirror and door knob are probably 'query driven'.]
3. Action recognition (which 2 are different?)
Two here are non-drinking.
4. Agent interactions (six numbered examples in the figure)
Class vs. non-class examples
Is this an airplane?
Unsupervised Training Data
Features and Classifiers
In a DNN, the features are produced by the top layer of the net. Previous work explored a broad range of features.
Features used in the past: Generic Features
Simple (wavelets); complex (geons)
Marr–Nishihara; Binford 1971: generalized cylinders
MarrNet, 2017
Last column: these are rotated versions of the object in the image.
Past Class-specific Features: Common Fragments
Optimal Features: Mutual Information I(C;F)
Class C, feature F: I(C;F) = H(C) − H(C|F)
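As a hedged illustration (not from the lecture), a minimal NumPy sketch of this criterion: estimating I(C;F) = H(C) − H(C|F) for a binary fragment detection F and a binary class label C on toy data. Function names and numbers are my own.

```python
import numpy as np

def entropy(p):
    """Entropy (in bits) of a discrete distribution given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(f, c):
    """I(C;F) = H(C) - H(C|F) for two binary 0/1 arrays of equal length."""
    h_c = entropy(np.bincount(c, minlength=2) / len(c))
    h_c_given_f = 0.0
    for v in (0, 1):
        mask = (f == v)
        if mask.any():
            cond = np.bincount(c[mask], minlength=2) / mask.sum()
            h_c_given_f += mask.mean() * entropy(cond)
    return h_c - h_c_given_f

# Toy data: the fragment fires mostly on class images.
c = np.array([1, 1, 1, 1, 0, 0, 0, 0])
f = np.array([1, 1, 1, 0, 0, 0, 1, 0])
print(mutual_information(f, c))  # higher value = more informative fragment
```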
Star model Detected fragments ‘vote’ for the center location
Find the location with the maximal vote. In variations, a popular state-of-the-art scheme.
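A minimal sketch of the voting step, assuming NumPy and toy fragment detections of my own: each detection adds its score at the center location predicted by its learned offset, and the peak of the vote map is taken as the object center.

```python
import numpy as np

H, W = 100, 100
votes = np.zeros((H, W))

# (fragment position, learned offset from fragment to object center, detection score)
detections = [((30, 40), (10, 5), 0.9),
              ((55, 50), (-15, -5), 0.8),
              ((42, 44), (-2, 1), 0.7)]

for (y, x), (dy, dx), score in detections:
    cy, cx = y + dy, x + dx
    if 0 <= cy < H and 0 <= cx < W:
        votes[cy, cx] += score          # accumulate weighted votes for the center

center = np.unravel_index(np.argmax(votes), votes.shape)
print("predicted object center:", center)
```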
Hierarchies of sub-fragments (a ‘deep net’)
Detect each part by its simpler sub-parts. Repeat at multiple levels to obtain a hierarchy of parts and sub-parts.
Example Hierarchies
Classification by a Feature Hierarchy
p(c, X, F) = p(c) Π p(x_i | x_i⁻) p(F_i | x_i), where x_i⁻ denotes the parent of node x_i in the hierarchy (nodes x_1 … x_5 in the figure).
Global optimum can be found by max-sum message passing (two-pass computation)
X* = argmax_X p(c, X, F) = argmax_X p(c) Π p(x_i | x_i⁻) p(F_i | x_i)
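A hedged sketch of the two-pass computation on a chain of parts (a special case of the tree), in NumPy with random toy potentials: a forward pass of max-messages followed by back-tracking recovers the maximizing part locations.

```python
import numpy as np

n_parts, n_locations = 4, 5
rng = np.random.default_rng(0)
log_unary = rng.normal(size=(n_parts, n_locations))                  # log p(F_i | x_i)
log_pair = rng.normal(size=(n_parts - 1, n_locations, n_locations))  # log p(x_i | x_{i-1})

# Forward (leaf-to-root) pass: best score of the prefix ending at each location.
msg = np.zeros((n_parts, n_locations))
back = np.zeros((n_parts, n_locations), dtype=int)
msg[0] = log_unary[0]
for i in range(1, n_parts):
    scores = msg[i - 1][:, None] + log_pair[i - 1]   # [previous location, current location]
    back[i] = scores.argmax(axis=0)
    msg[i] = scores.max(axis=0) + log_unary[i]

# Backward (root-to-leaf) pass: back-track the maximizing locations.
X = np.zeros(n_parts, dtype=int)
X[-1] = msg[-1].argmax()
for i in range(n_parts - 1, 0, -1):
    X[i - 1] = back[i, X[i]]
print("MAP part locations:", X)
```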
Context: Parts and Objects. Results of the two-pass computation.
Outside = crops of the input image.
Current use of probabilistic graphical models
HoG Descriptor. Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection. Note the per-block normalization of the orientation histograms. SIFT is similar, with different details and multi-scale.
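A hedged sketch of computing a HoG descriptor with scikit-image; the cell and block parameters below are the standard Dalal-Triggs settings, not values taken from the slides.

```python
from skimage import data, color
from skimage.feature import hog

# A built-in sample image, converted to grayscale.
image = color.rgb2gray(data.astronaut())

descriptor = hog(image,
                 orientations=9,          # bins of the orientation histogram
                 pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2),
                 block_norm='L2-Hys')     # per-block normalization

print(descriptor.shape)  # one long vector of locally-normalized orientation histograms
```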
SVM – linear separation in feature space
Optimal Separation: Perceptron vs. SVM
How do we actually find this optimal plane? We need to find a plane that maximizes the minimal separation. Perceptron: Frank Rosenblatt, Principles of Neurodynamics, 1962. SVM: Vapnik, The Nature of Statistical Learning Theory, 1995. Find a separating plane such that the closest points are as far from it as possible.
The Margin. Separating line: w ∙ x + b = 0; margin lines at +1 and −1.
We can always write the separating plane as w ∙ x + b = 0 (w ∙ x = 0 is the same plane through the origin), and the same plane is obtained when w and b are rescaled together. We can therefore choose the pair (w, b) that makes the nearest points on either side lie on the lines w ∙ x + b = ±1. Calculating the margin: separating line w ∙ x + b = 0, far line w ∙ x + b = +1; for the step ∆x between them (along w), w ∙ ∆x = +1, so the separation is |∆x| = 1/|w| and the full margin is 2/|w|.
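A minimal sketch, assuming scikit-learn and a toy 2-D data set of my own, of fitting a (nearly) hard-margin linear SVM and reading the margin 2/|w| off the learned weight vector.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],     # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])    # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)    # very large C ~ hard-margin separation
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("separating plane: w . x + b = 0, w =", w, ", b =", b)
print("margin 2/|w| =", 2 / np.linalg.norm(w))
```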
Using patches with HoG descriptors and classification by SVM
This is from Dalal–Triggs: HoG + SVM. Felzenszwalb extends this: it also uses HoG, but the description is more complex and includes parts and their locations. What is Φ here? Probably the bias coefficient b is simply folded into the feature map, since the score is written as w ∙ Φ without a separate b. Person model: HoG.
DPM: Adding Parts. In the object model, F_i is the vector of filter coefficients for part i, learned by the SVM during training; v_i (the part's anchor position) is 2 numbers, s_i (a scale) is 1 number, and the deformation coefficients a_i and b_i are 2 numbers each.
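A hedged sketch (my own illustration, not the authors' code) of the part-score term in a DPM-style model: the part-filter response at a candidate placement minus a deformation cost, using a_i for the linear and b_i for the quadratic displacement terms as in the slide.

```python
import numpy as np

def part_score(filter_response, anchor, a, b):
    """filter_response: 2-D map of the part-filter response over candidate placements.
    Returns the best placement and its deformation-penalized score."""
    H, W = filter_response.shape
    best, best_pos = -np.inf, None
    for y in range(H):
        for x in range(W):
            dy, dx = y - anchor[0], x - anchor[1]            # displacement from the anchor
            deform = a[0] * dx + a[1] * dy + b[0] * dx ** 2 + b[1] * dy ** 2
            s = filter_response[y, x] - deform
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos, best

rng = np.random.default_rng(1)
resp = rng.normal(size=(8, 8))   # stand-in for a HoG part-filter response map
print(part_score(resp, anchor=(4, 4), a=(0.0, 0.0), b=(0.1, 0.1)))
```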
Bicycle model: root, parts, spatial map
A bicycle model and a person model: the root, the parts, and the spatial map. The filters in the model are the vectors of SVM weights; the figure shows the orientations that have positive w values.
Deep Learning
ImageNet
AlexNet
On the history of deep learning
A Neural Network Model A network of ‘neurons’ with multiple layers
Repeating structure of linear and non-linear stages. Automatic learning of the weights between units.
The McCulloch–Pitts neuron (1943)
ReLU
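A minimal sketch of a single unit in its modern form: a weighted sum of the inputs followed by the ReLU non-linearity (weights and inputs below are arbitrary illustration values).

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def neuron(x, w, b):
    """Output of one unit: non-linearity applied to a weighted sum of its inputs."""
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.3])
b = 0.1
print(neuron(x, w, b))
```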
Perceptron learning yj = f(xj)
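A hedged sketch of the perceptron learning rule on a toy, linearly separable data set of my own: when the thresholded output y = f(w ∙ x) disagrees with the target, the weights are moved toward (or away from) the example.

```python
import numpy as np

def f(z):                      # threshold non-linearity
    return 1 if z >= 0 else 0

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, 0, 0])     # target labels
w = np.zeros(2)
b = 0.0
eta = 0.1                      # learning rate

for epoch in range(20):
    for x, target in zip(X, t):
        y = f(np.dot(w, x) + b)
        w += eta * (target - y) * x    # updates only on mistakes
        b += eta * (target - y)

print("learned weights:", w, "bias:", b)
```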
Back-propagation 1986
LeNet, 1998. Essentially the same architecture as the current generation of convolutional networks.
MNIST data set
Hinton, Trends in Cognitive Sciences, 2007
The goal: unsupervised learning with Restricted Boltzmann Machines, combining a generative model with inference. CNNs, by contrast, are feed-forward and massively supervised.
Basic structure of deep nets.
Not detailed here, but make sure you know the layer structure and the repeating three-stage arrangement (typically convolution, non-linearity, pooling).
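A hedged sketch, assuming PyTorch, of this repeating arrangement; the layer sizes below are illustrative only, not taken from the lecture.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    """One repeated stage: linear (convolution), non-linearity, pooling."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # linear stage
        nn.ReLU(),                                          # non-linear stage
        nn.MaxPool2d(2),                                    # pooling stage
    )

net = nn.Sequential(
    block(3, 16),
    block(16, 32),
    block(32, 64),
    nn.Flatten(),
    nn.Linear(64 * 4 * 4, 10),   # classifier on top of the repeated blocks
)

x = torch.randn(1, 3, 32, 32)    # one 32x32 RGB image
print(net(x).shape)              # torch.Size([1, 10])
```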