Object Classes. Most recent work is at the object level.
We perceive the world in terms of objects belonging to different classes. What are the differences between dogs and cats? What is common across different examples of objects in a class (shoes, trees, …)?
We want in addition: 1. Individual Recognition
2. Object parts and sub-parts Called: Full Interpretation
[Figure: a car with labeled parts: window, mirror, door knob, headlight, front and back wheels, bumper. Mirror and door knob are probably 'query driven'.]
3. Action recognition (which 2 are different?)
Two here are non-drinking.
4. Agent interactions (six numbered examples in the figure)
Class vs. non-class examples
Is this an airplane?
Unsupervised Training Data
Features and Classifiers
In a DNN, the features are produced by the top layer of the net. Previous work explored a broad range of features.
Features used in the past: Generic Features
Simple (wavelets); complex (geons)
Marr–Nishihara; Binford 1971: generalized cylinders
MarrNet, 2017
Last column: these are rotated versions of the object in the image.
Past Class-specific Features: Common Fragments
Optimal Features: Mutual Information I(C;F)
Class C, feature F: I(C;F) = H(C) − H(C|F)
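As a hedged illustration (not from the lecture), a minimal NumPy sketch of this criterion: estimating I(C;F) = H(C) − H(C|F) for a binary fragment detection F and a binary class label C on toy data. Function names and numbers are my own.

```python
import numpy as np

def entropy(p):
    """Entropy (in bits) of a discrete distribution given as an array of probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(f, c):
    """I(C;F) = H(C) - H(C|F) for two binary 0/1 arrays of equal length."""
    h_c = entropy(np.bincount(c, minlength=2) / len(c))
    h_c_given_f = 0.0
    for v in (0, 1):
        mask = (f == v)
        if mask.any():
            cond = np.bincount(c[mask], minlength=2) / mask.sum()
            h_c_given_f += mask.mean() * entropy(cond)
    return h_c - h_c_given_f

# Toy data: the fragment fires mostly on class images.
c = np.array([1, 1, 1, 1, 0, 0, 0, 0])
f = np.array([1, 1, 1, 0, 0, 0, 1, 0])
print(mutual_information(f, c))  # higher value = more informative fragment
```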
Star model Detected fragments ‘vote’ for the center location
Find the location with the maximal vote. In variations, a popular state-of-the-art scheme.
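A minimal sketch of the voting step, assuming NumPy and toy fragment detections of my own: each detection adds its score at the center location predicted by its learned offset, and the peak of the vote map is taken as the object center.

```python
import numpy as np

H, W = 100, 100
votes = np.zeros((H, W))

# (fragment position, learned offset from fragment to object center, detection score)
detections = [((30, 40), (10, 5), 0.9),
              ((55, 50), (-15, -5), 0.8),
              ((42, 44), (-2, 1), 0.7)]

for (y, x), (dy, dx), score in detections:
    cy, cx = y + dy, x + dx
    if 0 <= cy < H and 0 <= cx < W:
        votes[cy, cx] += score          # accumulate weighted votes for the center

center = np.unravel_index(np.argmax(votes), votes.shape)
print("predicted object center:", center)
```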
Hierarchies of sub-fragments (a ‘deep net’)
Detect each part by its simpler sub-parts. Repeat at multiple levels to obtain a hierarchy of parts and sub-parts.
Example Hierarchies
Classification by a Feature Hierarchy
p(c, X, F) = p(c) Π p(x_i | x_i⁻) p(F_i | x_i), where x_i⁻ denotes the parent of node x_i in the hierarchy (nodes x_1 … x_5 in the figure).
Global optimum can be found by max-sum message passing (two-pass computation)
X* = argmax_X p(c, X, F) = argmax_X p(c) Π p(x_i | x_i⁻) p(F_i | x_i)
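A hedged sketch of the two-pass computation on a chain of parts (a special case of the tree), in NumPy with random toy potentials: a forward pass of max-messages followed by back-tracking recovers the maximizing part locations.

```python
import numpy as np

n_parts, n_locations = 4, 5
rng = np.random.default_rng(0)
log_unary = rng.normal(size=(n_parts, n_locations))                  # log p(F_i | x_i)
log_pair = rng.normal(size=(n_parts - 1, n_locations, n_locations))  # log p(x_i | x_{i-1})

# Forward (leaf-to-root) pass: best score of the prefix ending at each location.
msg = np.zeros((n_parts, n_locations))
back = np.zeros((n_parts, n_locations), dtype=int)
msg[0] = log_unary[0]
for i in range(1, n_parts):
    scores = msg[i - 1][:, None] + log_pair[i - 1]   # [previous location, current location]
    back[i] = scores.argmax(axis=0)
    msg[i] = scores.max(axis=0) + log_unary[i]

# Backward (root-to-leaf) pass: back-track the maximizing locations.
X = np.zeros(n_parts, dtype=int)
X[-1] = msg[-1].argmax()
for i in range(n_parts - 1, 0, -1):
    X[i - 1] = back[i, X[i]]
print("MAP part locations:", X)
```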
Context: Parts and Objects. Results of the two-pass computation.
Outside = crops of the input image.
Current use of probabilistic graphical models
HoG Descriptor. Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection. Note the per-block normalization of the orientation histograms. SIFT is similar, with different details and multi-scale.
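A hedged sketch of computing a HoG descriptor with scikit-image; the cell and block parameters below are the standard Dalal-Triggs settings, not values taken from the slides.

```python
from skimage import data, color
from skimage.feature import hog

# A built-in sample image, converted to grayscale.
image = color.rgb2gray(data.astronaut())

descriptor = hog(image,
                 orientations=9,          # bins of the orientation histogram
                 pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2),
                 block_norm='L2-Hys')     # per-block normalization

print(descriptor.shape)  # one long vector of locally-normalized orientation histograms
```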
SVM – linear separation in feature space
Optimal Separation: Perceptron vs. SVM
How do we actually find this optimal plane? We need to find a plane that maximizes the minimal separation. Perceptron: Frank Rosenblatt, Principles of Neurodynamics, 1962. SVM: Vapnik, The Nature of Statistical Learning Theory, 1995. Find a separating plane such that the closest points are as far from it as possible.
The Margin. Separating line: w ∙ x + b = 0; margin lines at +1 and −1.
We can always write the separating plane as w ∙ x + b = 0 (w ∙ x = 0 is the same plane through the origin), and the same plane is obtained when w and b are rescaled together. We can therefore choose the pair (w, b) that makes the nearest points on either side lie on the lines w ∙ x + b = ±1. Calculating the margin: separating line w ∙ x + b = 0, far line w ∙ x + b = +1; for the step ∆x between them (along w), w ∙ ∆x = +1, so the separation is |∆x| = 1/|w| and the full margin is 2/|w|.
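A minimal sketch, assuming scikit-learn and a toy 2-D data set of my own, of fitting a (nearly) hard-margin linear SVM and reading the margin 2/|w| off the learned weight vector.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],     # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])    # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)    # very large C ~ hard-margin separation
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("separating plane: w . x + b = 0, w =", w, ", b =", b)
print("margin 2/|w| =", 2 / np.linalg.norm(w))
```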
Using patches with HoG descriptors and classification by SVM
This is from Dalal–Triggs: HoG + SVM. Felzenszwalb extends this: it also uses HoG, but the description is more complex and includes parts and their locations. What is Φ here? Probably the bias coefficient b is simply folded into the feature map, since the score is written as w ∙ Φ without a separate b. Person model: HoG.
DPM: Adding Parts. In the object model, F_i is the vector of filter coefficients for part i, learned by the SVM during training; v_i (the part's anchor position) is 2 numbers, s_i (a scale) is 1 number, and the deformation coefficients a_i and b_i are 2 numbers each.
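A hedged sketch (my own illustration, not the authors' code) of the part-score term in a DPM-style model: the part-filter response at a candidate placement minus a deformation cost, using a_i for the linear and b_i for the quadratic displacement terms as in the slide.

```python
import numpy as np

def part_score(filter_response, anchor, a, b):
    """filter_response: 2-D map of the part-filter response over candidate placements.
    Returns the best placement and its deformation-penalized score."""
    H, W = filter_response.shape
    best, best_pos = -np.inf, None
    for y in range(H):
        for x in range(W):
            dy, dx = y - anchor[0], x - anchor[1]            # displacement from the anchor
            deform = a[0] * dx + a[1] * dy + b[0] * dx ** 2 + b[1] * dy ** 2
            s = filter_response[y, x] - deform
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos, best

rng = np.random.default_rng(1)
resp = rng.normal(size=(8, 8))   # stand-in for a HoG part-filter response map
print(part_score(resp, anchor=(4, 4), a=(0.0, 0.0), b=(0.1, 0.1)))
```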
Bicycle model: root, parts, spatial map
A bicycle model and a person model: the root, the parts, and the spatial map. The filters in the model are the vectors of SVM weights; the figure shows the orientations that have positive w values.
Deep Learning
ImageNet
AlexNet
On the history of deep learning
A Neural Network Model A network of ‘neurons’ with multiple layers
Repeating structure of linear and non-linear stages. Automatic learning of the weights between units.
The McCulloch–Pitts neuron (1943)
ReLU
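A minimal sketch of a single unit in its modern form: a weighted sum of the inputs followed by the ReLU non-linearity (weights and inputs below are arbitrary illustration values).

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def neuron(x, w, b):
    """Output of one unit: non-linearity applied to a weighted sum of its inputs."""
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.3])
b = 0.1
print(neuron(x, w, b))
```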
Perceptron learning yj = f(xj)
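A hedged sketch of the perceptron learning rule on a toy, linearly separable data set of my own: when the thresholded output y = f(w ∙ x) disagrees with the target, the weights are moved toward (or away from) the example.

```python
import numpy as np

def f(z):                      # threshold non-linearity
    return 1 if z >= 0 else 0

X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, 0, 0])     # target labels
w = np.zeros(2)
b = 0.0
eta = 0.1                      # learning rate

for epoch in range(20):
    for x, target in zip(X, t):
        y = f(np.dot(w, x) + b)
        w += eta * (target - y) * x    # updates only on mistakes
        b += eta * (target - y)

print("learned weights:", w, "bias:", b)
```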
Back-propagation 1986
LeNet, 1998. Essentially the same architecture as the current generation of convolutional networks.
MNIST data set
Hinton, Trends in Cognitive Sciences, 2007
The goal: unsupervised learning with Restricted Boltzmann Machines, combining a generative model with inference. CNNs, by contrast, are feed-forward and massively supervised.
Basic structure of deep nets.
Not detailed here, but make sure you know the layer structure and the repeating three-stage arrangement (typically convolution, non-linearity, pooling).
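A hedged sketch, assuming PyTorch, of this repeating arrangement; the layer sizes below are illustrative only, not taken from the lecture.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    """One repeated stage: linear (convolution), non-linearity, pooling."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # linear stage
        nn.ReLU(),                                          # non-linear stage
        nn.MaxPool2d(2),                                    # pooling stage
    )

net = nn.Sequential(
    block(3, 16),
    block(16, 32),
    block(32, 64),
    nn.Flatten(),
    nn.Linear(64 * 4 * 4, 10),   # classifier on top of the repeated blocks
)

x = torch.randn(1, 3, 32, 32)    # one 32x32 RGB image
print(net(x).shape)              # torch.Size([1, 10])
```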