1
On Visual Recognition
Jitendra Malik, Computer Vision Group, University of California, Berkeley
2
From Pixels to Perception
[Figure: a tiger scene labeled at the scene level (outdoor, wildlife), the object level (tiger, grass, water, sand), and the part level (head, eye, mouth, legs, back, tail, shadow).]
3
Object Category Recognition
4
Defining Categories
What is a "visual category"?
– Not semantic
– Working hypothesis: two instances of the same category must have "correspondence", i.e. one can be morphed into the other (e.g. four-legged animals)
– Biederman's estimate of roughly 30,000 basic visual categories
5
Facts from Biological Vision
– Timing
– Abstraction/Generalization
– Taxonomy and Partonomy
6
Detection can be very fast
On a task of judging animal vs. no animal, humans can make mostly correct saccades in 150 ms (Kirchner & Thorpe, 2006).
– This is comparable to the synaptic delays along the retina, LGN, V1, V2, V4, IT pathway.
– It doesn't rule out feedback, but it shows that feed-forward processing alone is very powerful.
7
As Soon as You Know It Is There, You Know What It Is
(Grill-Spector & Kanwisher, Psychological Science, 2005)
8
Abstraction/Generalization
– Configurations of oriented contours
– Considerable tolerance for small deformations
9
Attneave's Cat (1954)
Line drawings convey most of the information.
10
Taxonomy and Partonomy
– Taxonomy: e.g. cats are in the family Felidae, which in turn is in the class Mammalia. Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces.
– Partonomy: objects have parts, which have subparts, and so on. The human body contains the head, which in turn contains the eyes.
– These notions apply equally well to scenes and to activities.
– Psychologists have argued that there is a "basic level" at which categorization is fastest (Eleanor Rosch et al.). In a partonomy, each level contributes useful information for recognition.
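Both kinds of hierarchy can be represented as simple trees. The sketch below is purely illustrative; the Node class and the example entries are not from the talk:

# Illustrative sketch only: taxonomy and partonomy as simple trees.
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def descendant_names(self):
        # Depth-first list of everything below this node.
        names = []
        for child in self.children:
            names.append(child.name)
            names.extend(child.descendant_names())
        return names

# Taxonomy: class -> family -> species (levels of categorization).
mammalia = Node("Mammalia", [Node("Felidae", [Node("cat"), Node("tiger")])])
# Partonomy: object -> part -> subpart.
body = Node("human body", [Node("head", [Node("eyes"), Node("mouth")])])

print(mammalia.descendant_names())  # ['Felidae', 'cat', 'tiger']
print(body.descendant_names())      # ['head', 'eyes', 'mouth']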
11
Matching with Exemplars
– Use exemplars as templates
– Correspond features between query and exemplar
– Evaluate similarity score
(The query image is compared against a database of templates.)
12
Matching with Exemplars (continued)
The same pipeline applied to a query image against the database of templates; the best matching template is a helicopter.
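A minimal sketch of this pipeline in Python, assuming descriptors have already been computed for each image; the chi-squared cost and the greedy nearest-descriptor correspondence are simplifications for illustration, not the exact matching procedure used in the talk:

import numpy as np

# Illustrative sketch only: nearest-exemplar matching over descriptor sets.
def chi2_cost(a, b, eps=1e-9):
    # Chi-squared distance between two histogram-like descriptors.
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

def match_score(query_descs, exemplar_descs):
    # For each query descriptor, take the cost of its best correspondence
    # in the exemplar and sum these costs (lower = more similar).
    return sum(min(chi2_cost(q, e) for e in exemplar_descs) for q in query_descs)

def best_exemplar(query_descs, database):
    # database: dict mapping exemplar label -> (n_points, n_bins) descriptor array.
    return min(database, key=lambda label: match_score(query_descs, database[label]))

With a database of labelled exemplars, best_exemplar returns the label of the closest template (e.g. a helicopter in the slide's example).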
13
3D objects using multiple 2D views
View selection algorithm from Belongie, Malik & Puzicha (2001).
14
Error vs. Number of Views
15
Three Big Ideas
– Correspondence based on local shape/appearance descriptors
– Deformable template matching
– Machine learning for finding discriminative features
16
Three Big Ideas
– Correspondence based on local shape/appearance descriptors
– Deformable template matching
– Machine learning for finding discriminative features
17
Comparing Pointsets
18
Shape Context (Belongie, Malik & Puzicha, 2001)
A compact representation of the distribution of points relative to each point: count the number of points inside each log-polar bin (e.g. Count = 4, Count = 10, ...).
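A minimal sketch of computing shape context histograms for a 2D point set, assuming the standard log-polar binning; the bin edges and normalization are simplified relative to the original paper:

import numpy as np

# Illustrative sketch only: one log-polar histogram per point.
def shape_contexts(points, n_r=5, n_theta=12):
    points = np.asarray(points, dtype=float)
    N = len(points)
    diff = points[None, :, :] - points[:, None, :]     # diff[i, j] = vector from point i to point j
    dist = np.linalg.norm(diff, axis=2)                # pairwise distances
    theta = np.arctan2(diff[..., 1], diff[..., 0])     # pairwise angles

    mean_dist = dist[dist > 0].mean()                  # normalize radii by the mean distance
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1) * mean_dist
    t_edges = np.linspace(-np.pi, np.pi, n_theta + 1)

    descs = np.zeros((N, n_r * n_theta))
    for i in range(N):
        mask = np.arange(N) != i                       # exclude the point itself
        r_bin = np.digitize(dist[i, mask], r_edges) - 1
        t_bin = np.clip(np.digitize(theta[i, mask], t_edges) - 1, 0, n_theta - 1)
        ok = (r_bin >= 0) & (r_bin < n_r)              # drop points outside the outer radius
        hist = np.zeros((n_r, n_theta))
        np.add.at(hist, (r_bin[ok], t_bin[ok]), 1)
        descs[i] = hist.ravel()
    return descs

For a point set sampled from an object's contour, each row of the result is that point's log-polar histogram, which can then be compared with a chi-squared cost as in the exemplar-matching sketch above.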
19
Shape Context
20
Geometric Blur: a Local Appearance Descriptor (Berg & Malik, 2001)
– Compute sparse channels from the image
– Extract a patch in each channel
– Apply a spatially varying blur and sub-sample
– The descriptor is robust to small affine distortions
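A minimal sketch of the spatially varying blur step in Python, assuming a single sparse channel (e.g. an oriented edge map) and a blur width that grows linearly with distance from the feature point; approximating this by precomputing one Gaussian blur per sampling radius is a common shortcut, not necessarily the exact procedure of Berg & Malik:

import numpy as np
from scipy.ndimage import gaussian_filter

# Illustrative sketch only: geometric-blur-style sampling around a feature point.
def geometric_blur_patch(channel, center, sample_radii=(4, 8, 16), n_angles=8, alpha=0.5, beta=1.0):
    # channel: 2D float array (one sparse channel, e.g. an edge map).
    # center: (row, col) of the feature point.
    # Blur width sigma = alpha * radius + beta; precompute one blurred copy per radius.
    sigmas = [alpha * r + beta for r in sample_radii]
    blurred = [gaussian_filter(channel, s) for s in sigmas]

    descriptor = []
    r0, c0 = center
    for i, radius in enumerate(sample_radii):
        for a in range(n_angles):
            angle = 2 * np.pi * a / n_angles
            r = int(round(r0 + radius * np.sin(angle)))
            c = int(round(c0 + radius * np.cos(angle)))
            if 0 <= r < channel.shape[0] and 0 <= c < channel.shape[1]:
                descriptor.append(blurred[i][r, c])    # sample the appropriately blurred channel
            else:
                descriptor.append(0.0)                 # sample falls outside the image
    return np.array(descriptor)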
21
Three Big Ideas
– Correspondence based on local shape/appearance descriptors
– Deformable template matching
– Machine learning for finding discriminative features
22
Modeling shape variation in a category
D'Arcy Thompson, On Growth and Form (1917): studied transformations between the shapes of organisms.
23
Matching Example (model vs. target)
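Deformable template matching typically alternates between estimating correspondences and estimating a warp that aligns model to target. A minimal sketch of the warp-estimation step, using a least-squares affine fit as a stand-in for the thin-plate-spline warps usually paired with shape contexts:

import numpy as np

# Illustrative sketch only: fit an affine warp from matched point pairs.
def fit_affine(model_pts, target_pts):
    # model_pts, target_pts: (N, 2) arrays of corresponding points.
    # Solve target ~= model @ A.T + t in the least-squares sense.
    N = len(model_pts)
    X = np.hstack([model_pts, np.ones((N, 1))])              # (N, 3): [x, y, 1]
    params, *_ = np.linalg.lstsq(X, target_pts, rcond=None)  # (3, 2)
    A, t = params[:2].T, params[2]
    return A, t

def warp(points, A, t):
    return points @ A.T + t

# Toy usage: recover a known rotation + translation from matched points.
rng = np.random.default_rng(0)
model = rng.random((20, 2))
theta = 0.3
A_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
target = model @ A_true.T + np.array([1.0, -0.5])
A_est, t_est = fit_affine(model, target)
print(np.allclose(warp(model, A_est, t_est), target, atol=1e-8))  # True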
24
Handwritten Digit Recognition: MNIST error rates
MNIST, 60,000 training examples:
– linear classifier: 12.0%
– 40 PCA + quadratic classifier: 3.3%
– 1000 RBF + linear classifier: 3.6%
– K-NN: 5%
– K-NN (deskewed): 2.4%
– K-NN (tangent distance): 1.1%
– SVM: 1.1%
– LeNet-5: 0.95%
MNIST, 600,000 training examples (with distortions):
– LeNet-5: 0.8%
– SVM: 0.8%
– Boosted LeNet-4: 0.7%
MNIST, 20,000 training examples:
– K-NN with shape context matching: 0.63%
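The 0.63% entry is a K-nearest-neighbor classifier under a shape-context matching cost rather than a pixel-space metric. A minimal sketch of how any precomputed distance matrix plugs into K-NN with scikit-learn; the distance computation itself (e.g. the shape context matching cost) is assumed to be supplied separately:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Illustrative sketch only: K-NN classification from precomputed distances.
def knn_predict(train_dist, train_labels, test_to_train_dist, k=3):
    # train_dist: (n_train, n_train) pairwise distances among training digits.
    # test_to_train_dist: (n_test, n_train) distances from test to training digits.
    clf = KNeighborsClassifier(n_neighbors=k, metric="precomputed")
    clf.fit(train_dist, train_labels)
    return clf.predict(test_to_train_dist)

# Toy usage with random symmetric "distances", just to show the shapes involved.
rng = np.random.default_rng(0)
D = rng.random((10, 10))
D = (D + D.T) / 2
np.fill_diagonal(D, 0)
labels = np.array([0, 1] * 5)
print(knn_predict(D, labels, rng.random((3, 10))))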
26
EZ-Gimpy Results
171 of 192 images correctly identified: 92%.
Example words read: horse, smile, canvas, spade, join, here.
27
Three Big Ideas
– Correspondence based on local shape/appearance descriptors
– Deformable template matching
– Machine learning for finding discriminative features
28
Discriminative learning (Frome, Singer & Malik, 2006)
– Learn weights on patch features in training images
– These give distance functions from training images to any other images
– Used for browsing, retrieval, and classification
29
Triplets: learn from relative similarity
Given a triplet (image i, image j, image k), we want image-to-image distances, built from feature-to-image distances, and we compare the image-to-image distances (e.g. is i closer to j or to k?).
30
Focal image version: fix a focal image i and compare it to images j and k. Let d_ij be the vector of feature-to-image distances from the patch features of the focal image i to image j (and d_ik likewise for image k). The triplet is summarized by the difference vector x_ijk = d_ik - d_ij; in the slide's example, d_ik = (0.8, 0.2, ...) and d_ij = (0.3, 0.4, ...) give x_ijk = (0.5, -0.2, ...).
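A minimal sketch of building d_ij and x_ijk in Python, assuming each image is represented by a set of patch descriptors and taking the feature-to-image distance to be the nearest-descriptor distance (an assumption for illustration, not necessarily the exact construction used in the paper):

import numpy as np

# Illustrative sketch only: triplet difference vectors from patch descriptors.
def feature_to_image_distances(focal_feats, other_feats):
    # focal_feats: (m, d) descriptors of the focal image's patches.
    # other_feats: (n, d) descriptors of the other image's patches.
    # Entry p = distance from focal patch p to its nearest patch in the other image.
    diffs = focal_feats[:, None, :] - other_feats[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1)          # shape (m,)

def triplet_vector(focal_feats, feats_j, feats_k):
    # x_ijk = d_ik - d_ij, one entry per patch feature of the focal image.
    d_ij = feature_to_image_distances(focal_feats, feats_j)
    d_ik = feature_to_image_distances(focal_feats, feats_k)
    return d_ik - d_ij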
31
Large-margin formulation
– Slack variables, as in a soft-margin SVM
– The weight vector w is constrained to be positive
– L2 regularization
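Putting these bullets together, the learning problem presumably takes the standard large-margin triplet form below (reconstructed from the bullets above, so constants and details may differ from the paper):

\begin{aligned}
\min_{w,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{(i,j,k)} \xi_{ijk} \\
\text{s.t.}\quad & w^{\top} x_{ijk} \ge 1 - \xi_{ijk}, \qquad x_{ijk} = d_{ik} - d_{ij}, \\
& \xi_{ijk} \ge 0, \qquad w \ge 0.
\end{aligned}

The constraint forces the weighted distance from the focal image to the dissimilar image k to exceed the distance to the similar image j by a margin, with the slack variables absorbing violations.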
32
Caltech-101 [Fei-Fei et al., 2004]: 102 classes, 31-300 images per class.
33
Retrieval example: a query image and its retrieval results.
34
Caltech-101 classification results (see Manik Varma's talks for the best results yet).
35
With 15 training images per class: 63.2% accuracy.
36
Conclusion
– Correspondence based on local shape/appearance descriptors
– Deformable template matching
– Machine learning for finding discriminative features
– Integrating perceptual organization and recognition