Published by Dayna Dawson. Modified over 9 years ago.
1
Visual Grouping and Recognition
Jitendra Malik, University of California at Berkeley
2
Collaborators
Grouping: Jianbo Shi (CMU), Serge Belongie, Thomas Leung (Compaq CRL)
Ecological Statistics: Charless Fowlkes, David Martin, Xiaofeng Ren
Recognition: Serge Belongie, Jan Puzicha
3
From images to objects Labeled sets: tiger, grass etc
4
What enables us to parse a scene?
–Low level cues: Color/texture, Contours, Motion
–Mid level cues: T-junctions, Convexity
–High level cues: Familiar Object, Familiar Motion
5
Grouping factors
6
But is segmentation a meaningful problem? Difficult to define formally, but humans are remarkably consistent…
7
Human Segmentations (1)
8
Human Segmentations (2)
9
Consistency
[Figure: three segmentations A, B, C of the same image]
A, C are refinements of B. A, C are mutual refinements. A, B, C represent the same percept. Attention accounts for differences.
[Figure: segmentation tree rooted at Image; node labels: BG, L-bird, R-bird, grass, bush, far, head, eye, beak, body]
Perceptual organization forms a tree: two segmentations are consistent when they can be explained by the same segmentation tree (i.e. they could be derived from a single perceptual organization).
10
Ecological Statistics of image segmentation
Measure the conditional probability distribution of various grouping cues in human segmented images (Brunswik 1950)
Design algorithm for incorporating multiple cues for image segmentation
11
Proximity
12
Similarity of brightness (cf. Coughlan & Yuille, Geman & Jedynak)
13
Convexity
14
Region Area
Compare to Alvarez, Gousseau, Morel
Power-law fit: $y = K x^{-\alpha}$, $\alpha = 1.008$
15
Lengths of curves
16
Image Segmentation as Graph Partitioning
Build a weighted graph G = (V, E) from the image. V: image pixels. E: connections between pairs of nearby pixels.
Partition the graph so that similarity within each group is large and similarity between groups is small: Normalized Cuts [Shi & Malik 97]
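As a rough illustration of the graph construction on this slide, the sketch below builds edge weights for a tiny grayscale image. The Gaussian brightness weighting, the `radius` and `sigma` parameters, and the function name `build_graph` are assumptions made for illustration; the slides do not specify how the weights are computed.

```python
import math

def build_graph(image, radius=1, sigma=0.1):
    """Return {(p, q): weight} for pairs of nearby pixels p < q.

    image  : list of rows of brightness values in [0, 1]
    radius : pixels are connected if they differ by at most `radius`
             in both row and column (assumed neighbourhood)
    sigma  : brightness scale of the assumed Gaussian similarity
    """
    rows, cols = len(image), len(image[0])
    edges = {}
    for r in range(rows):
        for c in range(cols):
            p = r * cols + c  # node index of pixel (r, c)
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) == (0, 0):
                        continue
                    if not (0 <= rr < rows and 0 <= cc < cols):
                        continue
                    q = rr * cols + cc
                    if p < q:  # store each undirected edge once
                        diff = image[r][c] - image[rr][cc]
                        edges[(p, q)] = math.exp(-(diff * diff) / (sigma * sigma))
    return edges

# Hypothetical 2x3 image: a dark region on the left, a bright column on the right.
image = [[0.0, 0.0, 1.0],
         [0.0, 0.0, 1.0]]
graph = build_graph(image)
```

Edges inside the dark region get weight 1, while edges crossing into the bright column get weights close to 0, which is the structure a partitioning step can then exploit.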
17
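Normalized Cut, A measure of dissimilarity
Minimum cut is not appropriate since it favors cutting small pieces.
Normalized Cut, Ncut:
$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$$
where cut(A,B) is the total weight of edges between A and B, and assoc(A,V) is the total weight of edges from A to all nodes in V [Shi & Malik 97].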
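To make the contrast between minimum cut and normalized cut concrete, here is a small sketch on a made-up five-node graph. It uses the Shi and Malik definition, in which the cut weight between the two groups is divided by each group's total connection weight to the whole graph. The weight matrix and the helper names `cut`, `assoc`, and `ncut` are illustrative assumptions.

```python
def cut(W, A, B):
    """Total weight of edges running between node sets A and B."""
    return sum(W[i][j] for i in A for j in B)

def assoc(W, A):
    """Total weight of edges from nodes in A to every node in the graph."""
    return sum(W[i][j] for i in A for j in range(len(W)))

def ncut(W, A, B):
    """Normalized cut: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)."""
    c = cut(W, A, B)
    return c / assoc(W, A) + c / assoc(W, B)

# Hypothetical chain 0-1-2-3-4 with symmetric weights:
# strong links 0-1 and 2-3, a weak link 1-2, and a very weak link 3-4.
W = [[0.0, 1.0, 0.0, 0.0, 0.0],
     [1.0, 0.0, 0.3, 0.0, 0.0],
     [0.0, 0.3, 0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 0.0, 0.2],
     [0.0, 0.0, 0.0, 0.2, 0.0]]

small = ([4], [0, 1, 2, 3])        # slice off the single loosely attached node
balanced = ([0, 1], [2, 3, 4])     # split between the two strongly linked groups

print("cut :", cut(W, *small), cut(W, *balanced))
print("ncut:", ncut(W, *small), ncut(W, *balanced))
```

On this toy graph the plain cut is smaller for the single-node split, whereas the normalized cut is smaller for the balanced split, because isolating one node makes the first ratio equal to 1.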
18
Normalized Cut as a Generalized Eigenvalue Problem
After simplification, we get [equation not preserved in this transcript]
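For reference, the form published by Shi and Malik is given below, offered as an assumption about what this slide showed rather than a recovered copy of it. The symbols are not named in the surviving text: $W$ is the matrix of edge weights, $D$ is the diagonal matrix with $D_{ii} = \sum_j W_{ij}$, and $y$ is a real-valued relaxation of the partition indicator vector.

```latex
\min_{A,B}\,\mathrm{Ncut}(A,B)
  \;=\; \min_{y}\; \frac{y^{T}(D - W)\,y}{y^{T} D\, y}
  \quad\text{subject to}\quad y^{T} D\,\mathbf{1} = 0,
\qquad\text{which relaxes to}\qquad
(D - W)\,y \;=\; \lambda\, D\, y .
```

In that formulation the eigenvector with the second smallest eigenvalue is thresholded to obtain the two groups.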
19
Cue-Integration for Image Segmentation [Malik, Belongie, Shi, Leung 1999]
20
On image segmentation..
Humans are quite consistent, so model the goal as emulating their behavior.
Ecological statistics of grouping cues can be learned from image data.
We now have a generic image segmentation algorithm (code available) which can be applied for MPEG-4/7 compression and object recognition.
21
Framework for Recognition
(1) Segmentation: Pixels → Segments. Over-segmentation necessary; under-segmentation fatal.
(2) Association: Segments → Regions. Enumerate: # of size k regions in image with n segments is ~(4**k)*n/k.
(3) Matching: Regions → Prototypes. ~10 views/object. Matching tolerant to pose/illumination changes, intra-category variation, error in previous steps.
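The region-count estimate quoted on this slide, ~(4**k)*n/k, can be evaluated directly to see how fast the number of candidate regions grows with k. The function name and the choice of n = 100 segments are assumptions for illustration only.

```python
def approx_region_count(n_segments, k):
    """Slide's estimate of the number of size-k regions: (4**k) * n / k."""
    return (4 ** k) * n_segments / k

n = 100  # hypothetical number of segments in one image
for k in range(1, 6):
    print(f"k={k}: ~{approx_region_count(n, k):.0f} candidate regions")
```

The count is linear in n but grows roughly fourfold with each extra segment in a region, so enumeration stays practical only for small k.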
22
Matching regions to views
GOAL: obtain small misclassification error using few views
Matching allowing deformations of prototype views makes this possible
23
Matching with original and deformed prototypes
[Figure columns: Prototype, Test, Error]
24
Deforming Biological Shapes D’Arcy Thompson: On Growth and Form, 1917 –studied transformations between shapes of organisms
25
Find correspondences between points on shape
Estimate transformation
Measure similarity
[Figure panels: model, target, ...]
26
Finding correspondences between shapes
Each shape is represented by a set of sample points
Each sample point has a descriptor – the shape context
Define cost Wij for matching point i on first shape with point j on second shape.
Solve for correspondence as optimum assignment.
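The "optimum assignment" step can be illustrated on a tiny cost matrix. The sketch below uses exhaustive search over permutations, which is only a stand-in that is feasible for a handful of points; it is not the solver used in the talk. The cost values and the name `best_assignment` are made up for the example.

```python
from itertools import permutations

def best_assignment(cost):
    """Find the one-to-one matching with minimum total cost by brute force.

    cost[i][j] is the cost of matching point i on the first shape to
    point j on the second shape. Runs in O(n!) time, so tiny n only.
    """
    n = len(cost)

    def total(perm):
        return sum(cost[i][perm[i]] for i in range(n))

    best = min(permutations(range(n)), key=total)
    return list(best), total(best)

# Hypothetical costs: points 0 and 1 both prefer target 0, so matching
# each point to its individually cheapest target would not be one-to-one.
W = [[0.1, 0.2, 0.9],
     [0.1, 0.8, 0.9],
     [0.9, 0.9, 0.3]]

matching, cost_sum = best_assignment(W)
print(matching, round(cost_sum, 3))
```

Here `matching[i]` is the target index assigned to point i. Realistic point counts call for a polynomial-time linear assignment solver instead of enumeration.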
27
Shape Context
Count the number of points inside each bin, e.g.: Count = 4, Count = 10, ...
Compact representation of the distribution of points relative to each point
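A minimal sketch of such a descriptor follows, assuming log-polar bins around the reference point. The bin counts (5 radial, 12 angular), the radius limits, and the normalization by mean pairwise distance are assumptions chosen for illustration; the slide only says that points are counted per bin.

```python
import math

def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Histogram of the other points' positions relative to points[index].

    Returns a flat list of n_r * n_theta counts (radial bin major).
    Distances are divided by the mean pairwise distance of the point set
    (an assumed normalization); points beyond r_max are ignored.
    """
    pair_dists = [math.hypot(q[0] - p[0], q[1] - p[1])
                  for i, p in enumerate(points) for q in points[i + 1:]]
    mean_dist = sum(pair_dists) / len(pair_dists)

    # Outer edges of logarithmically spaced radial bins, ending at r_max.
    edges = [r_min * (r_max / r_min) ** ((b + 1) / n_r) for b in range(n_r)]

    hist = [0] * (n_r * n_theta)
    px, py = points[index]
    for j, (qx, qy) in enumerate(points):
        if j == index:
            continue
        r = math.hypot(qx - px, qy - py) / mean_dist
        if r > edges[-1]:
            continue  # outside the outermost ring
        r_bin = next(b for b, e in enumerate(edges) if r <= e)
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        t_bin = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
        hist[r_bin * n_theta + t_bin] += 1
    return hist

# Hypothetical sample points: the corners of a unit square.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
shifted = [(x + 5, y + 5) for x, y in square]
h = shape_context(square, 0)
h_shifted = shape_context(shifted, 0)
```

Because only relative positions enter the histogram, translating the whole shape leaves the descriptor unchanged, as the `shifted` example shows.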
29
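Comparing Shape Contexts
Compute matching costs using the chi-squared test statistic:
$$C_{ij} = \frac{1}{2} \sum_{k} \frac{\left[h_i(k) - h_j(k)\right]^2}{h_i(k) + h_j(k)}$$
where $h_i(k)$ and $h_j(k)$ are the normalized shape context histograms at points i and j.
Recover correspondences by solving the linear assignment problem with costs $C_{ij}$ [Jonker & Volgenant 1987]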
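The chi-squared cost named on this slide can be sketched as a short function over two histograms. Normalizing each histogram to sum to 1 and skipping bins that are empty in both are assumptions of this sketch, as is the function name `chi2_cost`.

```python
def chi2_cost(g, h):
    """Chi-squared distance: 0.5 * sum_k (g[k] - h[k])**2 / (g[k] + h[k]).

    g and h are bin-count histograms of equal length with at least one
    non-zero entry each. Both are normalized to sum to 1 first; bins
    that are empty in both histograms contribute nothing.
    """
    sum_g, sum_h = sum(g), sum(h)
    total = 0.0
    for a, b in zip(g, h):
        a, b = a / sum_g, b / sum_h
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

# Hypothetical bin counts for three sample points.
same = chi2_cost([4, 10, 0, 2], [4, 10, 0, 2])      # identical histograms
disjoint = chi2_cost([3, 0], [0, 5])                # no overlapping mass
partial = chi2_cost([4, 10, 0, 2], [2, 9, 3, 2])    # somewhere in between
```

With normalized histograms the cost ranges from 0 for identical distributions to 1 for distributions with no bins in common, so costs are comparable across point pairs.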
30
Matching Example
[Figure panels: model, target]
31
Synthetic Test Results
[Figure panels: Fish - deformation + noise; Fish - deformation + outliers. Methods compared: ICP, Shape Context, RPM]
32
Measuring Shape Similarity
Image appearance around matched points
–color or gray-level window
–orientation
Shape context differences at matched points
Bending Energy
33
COIL Object Database
34
Editing: Prototypes
Human Shape Perception
Computational Needs for K-NN
35
Prototype Selection: Coil-20
36
MNIST Handwritten Digits
37
Handwritten Digit Recognition
MNIST 60 000:
–linear: 12.0%
–40 PCA + quad: 3.3%
–1000 RBF + linear: 3.6%
–K-NN: 5%
–K-NN (deskewed): 2.4%
–K-NN (tangent dist.): 1.1%
–SVM: 1.1%
–LeNet 5: 0.95%
MNIST 600 000 (distortions):
–LeNet 5: 0.8%
–SVM: 0.8%
–Boosted LeNet 4: 0.7%
MNIST 20 000:
–K-NN, Shape Context matching: 0.63%
39
Results: Digit Recognition 1-NN classifier using: Shape context + 0.3 * bending + 1.6 * image appearance
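The weighted sum on this slide can be read as a single distance used by a nearest-neighbor rule. The sketch below assumes the three cost terms have already been computed between the query and each stored prototype; the numbers, labels, and function names are invented for illustration.

```python
def combined_distance(shape_context_cost, bending_energy, appearance_cost):
    """Slide's weighting: shape context + 0.3 * bending + 1.6 * appearance."""
    return shape_context_cost + 0.3 * bending_energy + 1.6 * appearance_cost

def classify_1nn(query_costs, labels):
    """Return the label of the prototype with the smallest combined distance.

    query_costs[i] is a (shape_context, bending, appearance) tuple measured
    between the query image and prototype i.
    """
    distances = [combined_distance(*costs) for costs in query_costs]
    nearest = min(range(len(distances)), key=distances.__getitem__)
    return labels[nearest]

# Hypothetical costs between one query digit and three prototypes.
query_costs = [(0.20, 0.50, 0.10),
               (0.15, 1.00, 0.30),
               (0.40, 0.10, 0.20)]
labels = ["3", "8", "5"]
predicted = classify_1nn(query_costs, labels)
```

In this made-up case the second prototype has the lowest shape context cost on its own, yet the first prototype wins once bending energy and appearance are included.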
40
Results: Digit Recognition (Detail)
42
Trademark Similarity
43
Future work..
Indexing based on color/texture/shape features before correspondence matching
Integrate segmentation and recognition
44
Computing cost on a Pentium PC
Segmentation: 2 minutes/image (200x100)
Matching: 0.2 sec/match (100 points)
45
Given a 10^4 speedup..
5K object categories/sec
Humans can recognize 10K-100K objects, so we could be in the ballpark of human level vision by 2020.
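One way to arrive at the 5K figure is sketched below. It combines the 0.2 sec/match cost and the ~10 views/object quoted earlier in the deck with the 10^4 speedup; that this is how the number was derived is an assumption, since the slide does not show the arithmetic.

```python
seconds_per_match = 0.2    # matching cost quoted for a Pentium PC
speedup = 10 ** 4          # hypothesized hardware/algorithm speedup
views_per_object = 10      # assumed from the "~10 views/object" estimate

matches_per_second = speedup / seconds_per_match
categories_per_second = matches_per_second / views_per_object

print(f"{matches_per_second:.0f} matches/sec")
print(f"{categories_per_second:.0f} object categories/sec")
```

Under these assumptions the query could be compared against every view of about five thousand object categories each second.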