Unsupervised Learning for Recognition Pietro Perona California Institute of Technology & Università di Padova 11th British Machine Vision Conference – Manchester, September 2001

Representation and Learning for Visual Object Recognition Pietro Perona California Institute of Technology & Università di Padova First SIAM-EMS Conference – Berlin, 6 Sept. 2001

Representation and Learning for Visual Object Recognition Pietro Perona California Institute of Technology & Università di Padova University of Plymouth, 10 Sept. 2001

OBJECTS ANIMALS INANIMATE PLANTS MAN-MADE NATURAL VERTEBRATE ….. MAMMALS BIRDS GROUSE BOAR TAPIR CAMERA

S. Thorpe et al., Nature 1996; J. Braun et al., J. Neurosci. 1998; Fei Fei Li et al., unpublished. [animal / not animal]

Issues: Representation Recognition Learning

Meet the xyz

Spot the xyz

Meet the Boletus Edulis

Object categories: individual objects; ‘visual’ categories; ‘functional’ categories

Variability within a category Intrinsic Deformation

Part similarity

Importance of ‘mutual position’

SVD (2)

Model: constellation of Parts. Fischler & Elschlager, 1973; Yuille, ‘91; Brunelli & Poggio, ‘93; Lades, v.d. Malsburg et al., ‘93; Cootes, Lanitis, Taylor et al., ‘95; Amit & Geman, ‘95, ‘99; Perona et al., ‘95, ‘96, ‘98, ‘00; Tanaka et al., 1993; Perrett & Oram, 1993

A B D C Deformations

Presence / Absence of Features occlusion

Background clutter

Generative probabilistic model (model parameters, with examples):
1. Object part positions: object shape pdf, e.g. $p(x) = G(x \mid \mu, \Sigma)$.
2. Part absence: detector specification and probability of detection, e.g. 0.8, 0.9, 0.6.
3a. Number of false detections: probability of N detections, $p_{\mathrm{Poisson}}(N_1 \mid \lambda_1)$, $p_{\mathrm{Poisson}}(N_2 \mid \lambda_2)$, $p_{\mathrm{Poisson}}(N_3 \mid \lambda_3)$.
3b. Position of false detections: clutter pdf of location, $p(x) = A^{-1}$ (uniform).
Result: final image.
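The four steps of the generative model can be read as a sampling procedure. The following is a minimal sketch, not the authors' implementation: it assumes each part position is drawn from an independent isotropic Gaussian (the slides use a joint shape density), and the names `sample_poisson`, `sample_image`, `part_sigma` and all numeric values other than the detection probabilities 0.8, 0.9, 0.6 are hypothetical.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw from a Poisson distribution (Knuth's method; fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_image(part_means, part_sigma, p_detect, clutter_rates, width, height, rng):
    """Return, for each part type, a list of (x, y, is_foreground) detections."""
    detections = []
    for (mx, my), p_d, lam in zip(part_means, p_detect, clutter_rates):
        found = []
        # 1. Object part position (assumption: independent isotropic Gaussian).
        x, y = rng.gauss(mx, part_sigma), rng.gauss(my, part_sigma)
        # 2. Part absence: the detector fires on the part with probability p_d.
        if rng.random() < p_d:
            found.append((x, y, True))
        # 3a. The number of false detections is Poisson distributed.
        # 3b. Their positions are uniform over the image area.
        for _ in range(sample_poisson(lam, rng)):
            found.append((rng.uniform(0, width), rng.uniform(0, height), False))
        detections.append(found)
    return detections

rng = random.Random(0)
image = sample_image(
    part_means=[(30.0, 40.0), (70.0, 40.0), (50.0, 70.0)],  # hypothetical layout
    part_sigma=2.0,
    p_detect=[0.8, 0.9, 0.6],
    clutter_rates=[2.0, 1.0, 3.0],  # hypothetical Poisson rates
    width=100.0, height=100.0, rng=rng,
)
```

Each entry of `image` mixes at most one true part detection with a Poisson-distributed number of clutter detections, which is the ambiguity the recognition step has to resolve.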

Affine Shape. Translation, rotation and scaling: Euclidean shape. Add weak perspective projection: affine shape. What is the probability density for the affine shape variables? [Figure: feature space, Euclidean shape, affine shape]

Affine Shape Density [Leung, Burl & Perona ’98]. Gaussian figure space density: [equation missing]. Affine shape density: [equation missing]. (1) Exact if N is odd; (2) a good approximation if the probability that the basis points flip sign is low. [Figure labels: good / careful!]

Example Affine Shape Densities: model points; shape density (ground truth); shape density (approximation)

Generative probabilistic model (model parameters, with examples):
1. Object part positions: foreground pdf, e.g. $p(x) = G(x \mid \mu, \Sigma)$.
2. Part absence: probability of detection, e.g. 0.8, 0.9.
3a. Number of false detections: probability of N detections, $p_{\mathrm{Poisson}}(N_1 \mid \lambda_1)$, $p_{\mathrm{Poisson}}(N_2 \mid \lambda_2)$, $p_{\mathrm{Poisson}}(N_3 \mid \lambda_3)$.
3b. Position of false detections: background pdf of location, $p(x) = A^{-1}$ (uniform).
Result: final image.

Detection by likelihood ratio: P(object | data) vs. P(clutter | data) [From Burl et al. – ICCV’95, CVPR’96]
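A likelihood-ratio detector of this kind can be sketched for a toy two-part model. This is an illustration under stated assumptions, not the method of Burl et al.: the shape term is an isotropic Gaussian on the displacement between parts A and B, clutter is Poisson in number and uniform in position, and hypotheses with a missing part are ignored for brevity. Under those assumptions the ratio for one pairing reduces to $p_A\, p_B\, G(\Delta)\, \mathrm{area} / (\lambda_A \lambda_B)$. All function and parameter names are hypothetical.

```python
import math
from itertools import product

def log_gauss2d(dx, dy, sigma):
    """Log of an isotropic 2D Gaussian density at offset (dx, dy)."""
    return -(dx * dx + dy * dy) / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)

def best_log_ratio(dets_a, dets_b, offset, sigma, p_a, p_b, lam_a, lam_b, area):
    """Maximize log[P(data | object) / P(data | clutter)] over all A-B pairings."""
    # Terms that do not depend on which detections are chosen.
    const = math.log(p_a * p_b * area / (lam_a * lam_b))
    best_score, best_pair = -math.inf, None
    for (ax, ay), (bx, by) in product(dets_a, dets_b):
        # Shape term: how well the A-to-B displacement matches the model offset.
        score = const + log_gauss2d(bx - ax - offset[0], by - ay - offset[1], sigma)
        if score > best_score:
            best_score, best_pair = score, ((ax, ay), (bx, by))
    return best_score, best_pair

params = dict(offset=(30.0, 0.0), sigma=2.0, p_a=0.9, p_b=0.8,
              lam_a=2.0, lam_b=2.0, area=100.0 * 100.0)

# One plausible pairing hidden among clutter detections.
score_obj, pair_obj = best_log_ratio([(20, 20), (60, 75)], [(50, 21), (10, 90)], **params)
# Clutter only: no pairing matches the expected displacement.
score_bg, _ = best_log_ratio([(20, 20)], [(10, 90)], **params)
```

A positive score favours the object hypothesis and a negative score favours clutter; in practice the decision threshold would be tuned on validation data rather than fixed at zero.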

Learning Models ‘Manually’: obtain set of training images; label parts by hand, train detectors; learn model from labeled parts; choose parts

Unsupervised learning

Unsupervised detector training - 1: Highly textured neighborhoods are selected automatically. This produces 100-1000 patterns per image.

Unsupervised detector training - 2: “Pattern Space” (100+ dimensions)

Unsupervised detector training - 3: 100-1000 images, ~100 detectors

Parameter Estimation Take training images. Consider set of detectors… Apply detectors…..

Parameter Estimation. Signal? Clutter? Correspondence? Chicken-and-egg problem with shape and correspondence. Use EM. Optimize for representation (ML on generative models).

ML using EM
1. Start from the current estimate of the pdf.
2. Assign probabilities to constellations in each image (Image 1, Image 2, ..., Image i): large P for some, small P for others.
3. Use probabilities as weights to reestimate parameters. Example for $\mu$: (large P) · x + (small P) · x + … = new estimate of $\mu$.
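The weighted re-estimation in step 3 can be illustrated with a one-dimensional toy problem. This is a minimal sketch under simplifying assumptions that are not in the slides: each image contains exactly one true candidate, clutter is uniform (so it cancels when weights are normalized within an image), the spread `sigma` is held fixed, and only the mean is re-estimated. The data values and the name `em_mean` are hypothetical.

```python
import math

def em_mean(images, mu, sigma, iterations=30):
    """Re-estimate the mean position of the object from ambiguous candidates."""
    for _ in range(iterations):
        weighted_sum, weight_total = 0.0, 0.0
        for candidates in images:
            # E-step: probability of each candidate under the current estimate,
            # normalized so that the weights within one image sum to 1.
            likes = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in candidates]
            norm = sum(likes)
            for x, like in zip(candidates, likes):
                w = like / norm
                # M-step accumulation: large P contributes a lot, small P a little.
                weighted_sum += w * x
                weight_total += w
        mu = weighted_sum / weight_total
    return mu

# Each inner list holds the candidates found in one image: one lies near 5.0,
# the rest are clutter (hypothetical data).
images = [[5.1, 0.7, 9.2], [4.8, 2.5], [5.3, 8.8, 1.1], [4.9, 7.0]]
mu_hat = em_mean(images, mu=3.0, sigma=1.0)
```

Starting from a poor initial guess of 3.0, the estimate drifts toward the cluster of consistent candidates near 5 because those candidates receive increasingly large weights.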

Final Part Selection: preselected parts (~100); Choice 1, Choice 2, ...; parameter estimation for each choice; Model 1, Model 2, ...; predict / measure model performance (validation set or directly from model)

Frontal Views of Faces: 200 images (100 training, 100 testing); 30 people, different for training and testing

Face images

Background images

Learned face model. Test error: 6% (4 parts). [Figure panels: preselected parts; parts in model; model foreground pdf; sample detection]

Rear Views of Cars: 200 images (100 training, 100 testing); only one image per car; high-pass filtered

Learned car model. Test error: 13% (5 parts). [Figure panels: preselected parts; parts in model; model foreground pdf; sample detection]

Detections of Cars

Background Images

“Wildcard” Parts

Parts Shape Context

Dilbert vs. 77 examples 125 examples

Dilbert model. Test error: 15% (4 parts). [Figure panels: preselected parts; parts in model; model foreground pdf; sample detection]

Manual vs. Automatic Part Design & Selection. Task: ‘E’ vs. no ‘E’. Manual: ~16% error. Automatic: ~7% error. [Figure annotations: similar to manual; used in best models]

“Strictly Unsupervised” Learning (Single Class)
Training set: 100% faces (so far) | 66% faces | 50% faces
Test error:   6%                  | 10%       | 12%

Which Part Size and Scale? 1:2, 1:4, 1:8, 1:16. Trade-off: informativity vs. occlusion sensitivity.

Multi-Scale Experiment: Gaussian pyramid (levels 1-6); preselected parts

Multi-Scale: Detection Performance. Test error: single scale 6% (4 parts); multi-scale 11% (5 parts)

Occlusion Experiment: Are learning and detection possible under partial occlusion? Occlusion in training and testing. Test error: no occlusion 6% (4 parts); occlusion 18% (5 parts)

View - Based 3D Model

Background Examples

Test Images with Faces

3D Orientation Tuning [Figure: orientation tuning, % correct vs. angle in degrees, from frontal (0°) through 45° to profile (90°); tuning ranges shown include 30° - 60°, 75° - 105° and -15° - 105°]

Johansson’s experiments [‘70s]

What is your brain doing? Input: $X_i(t)$. Output. Combinatorial; missing features; noise.

From trajectories to labels. Input: $x_i, v_i$. Output: $L_i$ = EL, $i = 1, \dots, M$

Representation dilemma X WL (t) ??? 2 PROPOSALS: A B

What is this???

Probabilistic approach to learning: learn joint p.d.f. Pr(data | labels); labelling by maximizing likelihood. Unfortunately:
– High-dimensional p.d.f. is cumbersome (62 variables -> $10^3$ - $10^4$ parameters); need lots of learning examples
– Search cost: $M!$ (try all labellings), e.g. $M = 16$ -> $16! \approx 2 \times 10^{13}$
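Both figures on this slide are easy to check numerically. The sketch below assumes, for the parameter count, that the joint p.d.f. is a full-covariance Gaussian over the 62 variables; the slides do not state the density family, so that part is an illustration only.

```python
import math

M = 16
# Exhaustive search tries every assignment of M labels to M points.
labellings = math.factorial(M)

n_vars = 62
# Assumption: full-covariance Gaussian, i.e. a mean vector plus a symmetric
# covariance matrix with n(n + 1) / 2 free entries.
gaussian_params = n_vars + n_vars * (n_vars + 1) // 2

print(f"{M}! = {labellings:.2e} labellings")
print(f"{gaussian_params} parameters for a {n_vars}-dimensional Gaussian")
```

Under that assumption the parameter count falls inside the $10^3$ - $10^4$ range quoted on the slide.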

Approximate decomposition: human body as kinematic chain. Markov property: fewer parameters. Find global max with dynamic programming (polynomial cost). $\Pr(A, B, C, D, E) = \Pr(A, B, C)\,\Pr(D \mid B, C)\,\Pr(E \mid C, D)$
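The decomposition above is what makes dynamic programming applicable: variables can be maximized out one at a time instead of enumerating every joint assignment. The following sketch compares both approaches on randomly generated, hypothetical factor tables with `K` candidate values per variable; the tables stand in for the three terms of the factorization and are not normalized probabilities.

```python
import itertools
import random

K = 4  # hypothetical number of candidate values per variable
rng = random.Random(1)
vals = range(K)

# Hypothetical non-negative tables standing in for the three factors.
f_abc = {k: rng.random() for k in itertools.product(vals, repeat=3)}  # Pr(A, B, C)
f_dbc = {k: rng.random() for k in itertools.product(vals, repeat=3)}  # Pr(D | B, C), keyed (d, b, c)
f_ecd = {k: rng.random() for k in itertools.product(vals, repeat=3)}  # Pr(E | C, D), keyed (e, c, d)

def brute_force():
    """Enumerate all K**5 joint assignments."""
    return max(
        f_abc[a, b, c] * f_dbc[d, b, c] * f_ecd[e, c, d]
        for a, b, c, d, e in itertools.product(vals, repeat=5)
    )

def dynamic_programming():
    """Maximize out E, then D, then the remaining clique: order K**3 work."""
    # Eliminate E: best value of the last factor for each (c, d).
    m_e = {(c, d): max(f_ecd[e, c, d] for e in vals) for c in vals for d in vals}
    # Eliminate D: best continuation for each (b, c).
    m_d = {(b, c): max(f_dbc[d, b, c] * m_e[c, d] for d in vals)
           for b in vals for c in vals}
    # Maximize the remaining clique over (a, b, c).
    return max(f_abc[a, b, c] * m_d[b, c]
               for a, b, c in itertools.product(vals, repeat=3))

best_exhaustive = brute_force()
best_dp = dynamic_programming()
```

Both functions return the same maximum, but the dynamic-programming version never touches more than three variables at once, which is where the polynomial cost comes from.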

Triangulated decomposition (by hand): $10^2$ - $10^3$ parameters; Markov property; solve in $O(M^4)$. [See also recent results on turbo-decoding and Bayesian inference] [Figure: body-part labels LE, LS, LH, H, N, LK, LA, LF, LW, with triangles numbered 1-14]

Training sequences

Unsupervised model [Figure: means and correlations for parts A-L]

Positive example

Negative example 1

Negative example 2

Person walking left-to-right?

Learning for visual recognition:
Supervised [Manual alignment/correspondence of training examples]
Unsupervised (1 class) [Training images contain examples of 1 class + clutter]
Unsupervised (multi-class) [Turn your camera on, come back one year later]

OBJECTS ANIMALS INANIMATE PLANTS MAN-MADE NATURAL VERTEBRATE ….. MAMMALS BIRDS GROUSE BOAR TAPIR CAMERA

Discovering multiple classes: Cars (rear and side view); Leaves (three species); Human Heads (90° viewing range)

Preselected Parts for Mixture Models: Heads, Cars, Leaves

Mixture Model of Heads

Tuning of Mixture Models

Summary: probabilistic constellation models; learning based on Maximum Likelihood; unsupervised learning of object categories; 3D invariance; biological motion

Main accomplices: Markus Weber, Thomas Leung, Max Welling, Yang Song, Michael Burl

References [available from: www.vision.caltech.edu]: CVPR98 (affine shape); FG00 (viewpoint invariance); ECCV00 (EM algorithm for unsupervised learning); CVPR00 (learning of multiple classes); ECCV00, CVPR00, NIPS01, CVPR01 (biological motion). Funded by: National Science Foundation, Sloan Foundation, INTEL
