Published by Cuthbert Norton. Modified over 9 years ago.
1
Leo Zhu, CSAIL, MIT
Joint work with Chen, Yuille, Freeman and Torralba
2
How to deal with image complexity
A general framework for different vision tasks
Rich representation and tractable computation
Pattern Theory: Grenander 94. Compositionality: Geman 02, 06. Stochastic Grammar: Zhu and Mumford 06.
3
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization
Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning
RCM-1: Deformable Object
RCM-2: Articulated Object
RCM-3: Scene (Entire Image)
4
Flat MRF
Nodes: object parts
Edges: spatial relations
Limitations: short-range interaction; sparse
6
x: image
y: (position, scale, orientation)
graph = (nodes, edges)
a: index of a node
b: child of a
f: appearance potentials on node a
g: potentials on edges (a, b)
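The notation above suggests a score that adds appearance terms over nodes and pairwise terms over parent-child edges. The slide's own formula did not survive in this transcript, so the following is only a minimal sketch under that additive assumption. The function name `score`, the call signatures of `f` and `g`, and the tuple encoding of a state are hypothetical.

```python
def score(image, states, edges, f, g):
    """Sum node terms f and edge terms g (assumed additive form, not the slide's formula)."""
    # states maps a node index a to its state y_a.
    node_term = sum(f(image, a, y_a) for a, y_a in states.items())
    # edges lists (a, b) pairs where b is a child of a.
    edge_term = sum(g(states[a], states[b]) for a, b in edges)
    return node_term + edge_term

# Toy usage with made-up potentials; a state is assumed to be (px, py, scale, orientation).
states = {"root": (0.0, 0.0, 1.0, 0.0), "leaf": (1.0, 0.0, 1.0, 0.0)}
edges = [("root", "leaf")]
toy_f = lambda image, a, y_a: 1.0                 # constant appearance score
toy_g = lambda y_a, y_b: -abs(y_a[0] - y_b[0])    # penalize horizontal offset
total = score(None, states, edges, toy_f, toy_g)
```

Keeping `f` and `g` as plain callables leaves the choice of features open, which matches the talk's claim that one form serves several tasks.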
7
Recursion
x: image; y: (position, scale, orientation)
Vertical independence
Self-similarity
8
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization
Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning
9
Inference task
Recursive optimization (recursion)
Polynomial-time complexity
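One way to read "recursive optimization" with polynomial cost is max-sum dynamic programming on a tree: each node reports, for each of its candidate states, the best score of its subtree, and the parent reuses those tables. The sketch below assumes a tree-structured model and a finite candidate list per node; it is an illustration, not the authors' implementation, and all names (`best_score`, `unary`, `pairwise`) are hypothetical.

```python
def best_score(node, children, candidates, unary, pairwise):
    """Return {state: best subtree score} for `node`, computed bottom-up."""
    # Recurse first so that each child's table is computed exactly once.
    child_tables = [best_score(c, children, candidates, unary, pairwise)
                    for c in children.get(node, [])]
    table = {}
    for s in candidates[node]:
        total = unary(node, s)
        for tbl in child_tables:
            # Children are independent given the parent state, so maximize each separately.
            total += max(pairwise(s, cs) + v for cs, v in tbl.items())
        table[s] = total
    return table

# Toy tree: a root with two leaves, two candidate states per node.
children = {"root": ["a", "b"]}
candidates = {"root": [0, 1], "a": [0, 1], "b": [0, 1]}
toy_unary = {("root", 1): 0.5, ("a", 1): 1.0, ("b", 0): 1.0}
unary = lambda n, s: toy_unary.get((n, s), 0.0)
pairwise = lambda parent, child: -abs(parent - child)
root_table = best_score("root", children, candidates, unary, pairwise)
```

With S candidates per node and N nodes, the work is on the order of N times S squared, which is the sense in which this kind of recursion stays polynomial.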
10
Supervised learning
Perceptron algorithm (MLE, max-margin SVM)
Parameter estimation needs fast inference.
Collins 02. Taskar et al. 04.
11
Goal: estimate the parameter vector.
Input: a set of training images with ground truth.
Initialize the parameter vector.
Training algorithm (Collins 02):
Loop over training samples, i = 1 to N:
Step 1: find the best y using inference.
Step 2: update the parameters.
End of loop.
Inference is critical for learning.
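The loop above follows the general shape of the structured perceptron of Collins 02. The update formulas are not preserved in this transcript, so the sketch below uses the standard textbook update (add the features of the ground truth, subtract the features of the prediction). Brute-force search over `candidates` stands in for the fast recursive inference the talk relies on; `structured_perceptron`, `candidates`, and `features` are hypothetical names.

```python
def structured_perceptron(samples, candidates, features, epochs=5):
    """samples: list of (x, y_true); candidates(x): list of possible y;
    features(x, y): list of numbers. Returns the learned weight list."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [0.0] * len(features(*samples[0]))       # initialize the parameter vector
    for _ in range(epochs):
        for x, y_true in samples:
            # Step 1: inference under the current weights (brute force in this toy).
            y_hat = max(candidates(x), key=lambda y: dot(w, features(x, y)))
            # Step 2: update only when the prediction is wrong.
            if y_hat != y_true:
                phi_true, phi_hat = features(x, y_true), features(x, y_hat)
                w = [wi + t - h for wi, t, h in zip(w, phi_true, phi_hat)]
    return w

# Toy problem: x and y are bits, features are a one-hot code of the (x, y) pair.
def toy_features(x, y):
    vec = [0, 0, 0, 0]
    vec[2 * x + y] = 1
    return vec

toy_samples = [(0, 1), (1, 0)]
toy_candidates = lambda x: [0, 1]
weights = structured_perceptron(toy_samples, toy_candidates, toy_features)

def predict(x):
    return max(toy_candidates(x),
               key=lambda y: sum(a * b for a, b in zip(weights, toy_features(x, y))))
```

Because Step 1 runs once per sample per pass, the cost of training is dominated by inference, which is why the slides stress that inference must be fast.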
12
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization (Polynomial-time)
Learning: Supervised Parameter Estimation
RCM-1: Deformable Object
13
Potentials for appearance
Features: [Gabor, Edge, …]
14
Potentials for shape: triplet descriptors over (position, scale, orientation)
18
Segmentation (accuracy of pixel labeling): the proportion of correct pixel labels (object or non-object).
Parsing (average position error of matching): the average distance between the leaf-node positions in the ground truth and those estimated in the parse tree.

Methods            Testing  Segmentation  Parsing  Speed
RCM-1              228      94.7          16       23s
Ren (Berkeley)     172      91
Winn (LOCUS)       200      93
Levin and Weiss    N/A      95
Kumar (OBJ CUT)    5        96
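The two evaluation measures defined above can be sketched directly from their definitions. This is a minimal illustration that assumes binary masks stored as nested lists and leaf positions stored as matched lists of (x, y) pairs; the function names are hypothetical and the Euclidean distance is an assumption, since the slide only says "distance".

```python
import math

def pixel_accuracy(pred_mask, true_mask):
    """Proportion of pixels whose object / non-object label matches the ground truth."""
    total = correct = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            total += 1
            correct += int(p == t)
    return correct / total

def average_position_error(pred_points, true_points):
    """Mean Euclidean distance between matched leaf-node positions."""
    dists = [math.hypot(px - tx, py - ty)
             for (px, py), (tx, ty) in zip(pred_points, true_points)]
    return sum(dists) / len(dists)

# Toy usage: one wrong pixel out of four, and one leaf displaced by a 3-4-5 triangle.
acc = pixel_accuracy([[1, 0], [1, 1]], [[1, 0], [0, 1]])
err = average_position_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])
```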
19
Multi-level precision-recall curves quantify the recognition performance of object parts.
High-level regularity (more parts) helps recognition by removing ambiguity.
20
Modeling (Representation): Recursive Compositional Models (RCMs)
Inference (Computing): Recursive Optimization (Polynomial-time)
Learning: Supervised Parameter Estimation; Unsupervised Recursive Learning
RCM-1: Deformable Object
21
Task: given 10 training images with no labeling, no alignment, and highly ambiguous features:
Induce the structure (nodes and edges).
Estimate the parameters.
Combinatorial explosion problem
Correspondence is unknown
22
Multi-level dictionary (layer-wise greedy)
Bottom-up and top-down recursive procedure
Three principles: Recursive Composition, Suspicious Coincidence, Competitive Exclusion
Recursion
Barlow 94.
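The slides do not spell out how a suspicious coincidence is scored. One common reading, in the spirit of Barlow, is that two parts are worth composing when they co-occur much more often than independence would predict, that is, when p(a, b) is well above p(a) p(b). The sketch below implements only that ratio test on toy data; the threshold, the data layout, and the name `suspicious_pairs` are all assumptions and not the authors' procedure.

```python
from collections import Counter
from itertools import combinations

def suspicious_pairs(observations, ratio_threshold=2.0):
    """observations: list of sets of part labels seen together.
    Keeps pairs whose joint rate is at least ratio_threshold times the independent rate."""
    n = len(observations)
    single, joint = Counter(), Counter()
    for obs in observations:
        single.update(obs)
        joint.update(combinations(sorted(obs), 2))
    kept = {}
    for (a, b), count in joint.items():
        ratio = (count / n) / ((single[a] / n) * (single[b] / n))
        if ratio >= ratio_threshold:
            kept[(a, b)] = ratio
    return kept

# Toy data: "a" and "b" always appear together; "c" and "d" mostly appear alone.
toy_obs = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"c"}, {"d"}, {"c"}, {"d"}, {"e"}]
kept = suspicious_pairs(toy_obs)
```

In this toy run only the ("a", "b") pair passes the test, since "c" and "d" co-occur less often than chance would predict.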
24
Composition, Clustering, Suspicious Coincidence, Competitive Exclusion
25
Unified representation (RCMs) and learning
Bridge the gap between generic features and specific object structures
26
Level  Composition  Clusters   Suspicious Coincidence  Competitive Exclusion  Seconds
0                                                      4                      1
1      167,431      14,684     262                     48                     117
2      2,034,851    741,662    995                     116                    254
3      2,135,467    1,012,777  305                     53                     99
4      236,955      72,620     30                      2                      9

More sharing
30
Fill in missing parts
Examine every node from top to bottom
32
Methods       Testing  Segmentation  Parsing  Speed
Unsupervised  316      93.3                   17s
Supervised    228      94.7          16       23s
33
More classes/viewpoints -> more training/detection cost
34
Not enough data for rare viewpoints/classes
35
Joint multi-class multi-view learning
Appearance sharing
Part sharing
36
120 templates: 5 viewpoints & 26 classes
46
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization (Polynomial-time)
Learning: Supervised Parameter Estimation
RCM-1: Deformable Object
RCM-2: Articulated Object
47
y = (switch, position, scale, orientation)
Composition
Switch: multiple poses
50
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization (Polynomial-time)
Learning: Supervised Parameter Estimation
RCM-1: Deformable Object
RCM-2: Articulated Object
RCM-3: Scene (Entire Image)
51
Task: Image Segmentation and Labeling
52
Flat MRF: object labeling (recognition only).
Lack of long-range interactions.
Lack of region-level properties.
High-order potentials -> heavy computation
Geman and Geman 84. L. Zhu et al. NIPS 08.
53
Flat MRF: object labeling (recognition only).
Joint segmentation-recognition template
Geman and Geman 84. L. Zhu et al. NIPS 08.
54
(segmentation, object) pair: chicken-and-egg of segmentation and recognition.
Multi-level low-dimensional abstraction
Global: gist of scene, object layout
Local: concurrent shape and appearance
Coarse to fine
55
y = (segmentation, object); example objects: Horse, Grass
f: appearance likelihood (homogeneity; object texture and color)
g: object layout prior (layer-wise consistency; object co-occurrence; segmentation prior)
Recursion
56
State space: C = 21 classes; D = 30 templates; K = 3 classes per template
Inference: recursive optimization
Supervised learning: perceptron
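The three constants above give a feel for why the state space stays manageable. As a rough worked example, assume a node's state is one of D templates together with one of C class labels for each of the K regions of that template, chosen independently. That counting rule is an assumption made here for illustration (the slide lists only the constants), and `states_per_node` is a hypothetical helper.

```python
def states_per_node(num_classes: int, num_templates: int, labels_per_template: int) -> int:
    # Assumed counting rule: pick a template, then one class per region of that template.
    return num_templates * num_classes ** labels_per_template

count = states_per_node(num_classes=21, num_templates=30, labels_per_template=3)
print(count)  # 277830
```

Under this assumption each node has a few hundred thousand candidate states, far fewer than the number of possible per-pixel labelings of the region it covers.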
59
Comparisons

Method                            Average  Global
TextonBoost (Shotton et al. 04)   57.7     72.2; 69 (Classifier)
PLSA-MRF (Verbeek and Triggs)     64       73.5
AutoContext (Tu 08)               68       77.7
Classifier only                   67.2     75.9
RCM-3                             74.5     81.4

Implementation Details

Dataset  Classes  Size  Training Size  Training Time  Testing Time
MSRC     21       591   45%            55h            30s
60
RCM-1, RCM-2, RCM-3
Triplets of Parts; Triplets of Segments
Boundary only; Region + Boundary
61
Principle: Recursive Composition
Composition -> complexity decomposition
Recursion -> universal rules (self-similarity)
Recursion and Composition -> sparseness
One formula for different tasks.
Key: the representation of visual patterns, i.e. y. Low dimension, simple potentials.
Scaling up: practical image understanding system
62
Long Zhu, Yuanhao Chen, Antonio Torralba, William Freeman, Alan Yuille. Part and Appearance Sharing: Recursive Compositional Models for Multi-View Multi-Object Detection. CVPR 2010.
Long Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan Yuille. Recursive Segmentation and Recognition Templates for 2D Parsing. NIPS 2008.
Long Zhu, Chenxi Lin, Haoda Huang, Yuanhao Chen, Alan Yuille. Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion. ECCV 2008.
Long Zhu, Yuanhao Chen, Yifei Lu, Chenxi Lin, Alan Yuille. Max Margin AND/OR Graph Learning for Parsing the Human Body. CVPR 2008.
Long Zhu, Yuanhao Chen, Xingyao Ye, Alan Yuille. Structure-Perceptron Learning of a Hierarchical Log-Linear Model. CVPR 2008.
Yuanhao Chen, Long Zhu, Chenxi Lin, Alan Yuille, Hongjiang Zhang. Rapid Inference on a Novel AND/OR Graph for Object Detection, Segmentation and Parsing. NIPS 2007.
Long Zhu, Alan L. Yuille. A Hierarchical Compositional System for Rapid Object Detection. NIPS 2005.
64
Polynomial-time inference (recursion)
Supervised learning
Perceptron algorithm (MLE, max-margin SVM)
Parameter estimation needs fast inference.
Collins 02. Taskar et al. 04.
67
Task: find a small dictionary D (sparse coding).
Multi-level dictionary (layer-wise greedy)
Bottom-up and top-down recursive procedure
Recursion
Barlow 94.