1
Probabilistic Models for Parsing Images Xiaofeng Ren University of California, Berkeley
2
Parsing Images
Scene labels: outdoor, wildlife. Regions: Tiger, Grass, Water, Sand. Parts: tail, eye, legs, head, back, shadow, mouse.
3
A Classical View of Visual Processing
Low-level image processing: pixels & pixel features.
Mid-level perceptual organization: contours & regions.
High-level recognition: objects & scenes (Tiger, Grass, Water, Sand).
4
Models for Parsing Images
Low-level image processing: pixels.
Mid-level perceptual organization: contours & regions.
High-level recognition: objects & scenes.
Goal: a unified framework incorporating all levels of abstraction.
5
Probabilistic Models for Images
Markov Random Fields [Geman & Geman 84]: labels defined on pixels.
Applications: image restoration, edge detection, texture synthesis, segmentation, super-resolution, contour completion, ...
Limitation: very limited representational power. Empirical evidence against pixel-based MRF [Ren & Malik 02].
6
Where is Structure? Our perception of structure is disrupted. We cannot efficiently reason about structure if we cannot represent it.
7
Outline Parsing Images Building a Mid-level Representation Probabilistic Models for Mid-level Vision Contour Completion Figure/Ground Organization Combining Mid- and High-level Vision Object Segmentation Finding People Conclusion & Future Work
9
Local Edge Detection Use the Pb (probability of boundary) edge detector: combining local brightness, texture and color contrasts.
10
Piece-wise Linear Approximation Recursively split the boundaries (using angles) until each piece is approximately straight
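The recursive splitting described on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the function name `split_contour`, the exact angle criterion (how far the angle endpoint-point-endpoint deviates from 180 degrees), and the 20-degree threshold are all assumptions made for the example.

```python
import math

def split_contour(points, max_dev_deg=20.0):
    """Recursively split a polyline until each piece is roughly straight.

    points: list of (x, y) tuples ordered along a contour.
    Returns the breakpoints, including both endpoints.
    Assumed criterion: for each interior point p, measure how far the angle
    (first endpoint) - p - (last endpoint) deviates from a straight 180 degrees.
    """
    if len(points) <= 2:
        return list(points)
    a, b = points[0], points[-1]
    best_i, best_dev = None, 0.0
    for i in range(1, len(points) - 1):
        p = points[i]
        v1 = (a[0] - p[0], a[1] - p[1])
        v2 = (b[0] - p[0], b[1] - p[1])
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            continue  # p coincides with an endpoint; angle undefined
        cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
        cos_angle = max(-1.0, min(1.0, cos_angle))
        dev = 180.0 - math.degrees(math.acos(cos_angle))
        if dev > best_dev:
            best_i, best_dev = i, dev
    if best_i is None or best_dev <= max_dev_deg:
        return [a, b]  # straight enough: keep a single segment
    left = split_contour(points[:best_i + 1], max_dev_deg)
    right = split_contour(points[best_i:], max_dev_deg)
    return left + right[1:]  # drop the duplicated split point

# An L-shaped contour is split once, at its corner.
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(split_contour(corner))
```

The resulting breakpoints are the kind of line segments that could then be handed to a triangulation step as constrained edges.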
11
Constrained Delaunay Triangulation (CDT) A variant of the standard Delaunay Triangulation Keeps a given set of edges in the triangulation Widely used in geometric modeling and finite elements.
13
Scale Invariance of CDT
14
The CDT Graph: Summary
Pixels → Superpixels: millions of pixels reduced to 1000 edges.
Properties: fast to compute; scale-invariant; completes gaps; little loss of structure; longer ranges of interaction.
Principle of Uniform Connectedness: use homogeneous regions as entry-level units in perceptual organization. [Palmer and Rock 94]
[Ren & Malik; ICCV 2003] [Ren, Fowlkes & Malik; ICCV 2005]
15
Analogy with Natural Language Parsing
Language: Letters → Words → Phrases → Sentences & Paragraphs.
Vision: Pixels → Superpixels → Contours & Regions → Objects & Scenes.
17
Mid-level Vision
It is not low-level vision (which can be computed independently in a local neighborhood).
It is not high-level vision (which assumes knowledge of particular object categories & scenes).
Problems in mid-level vision: curvilinear grouping, figure/ground organization, region segmentation.
18
Mid-level Vision
Problems in mid-level vision: curvilinear grouping, figure/ground organization, region segmentation.
19
Curvilinear Grouping Boundaries are smooth in nature! A number of associated visual phenomena Good continuation Visual completion Illusory contours
20
Beyond Local Edge Detection There is psychophysical evidence that we are approaching the limit of local edge detection Smoothness of boundaries in natural images provides an important contextual cue.
21
Inference on the CDT Graph
Each edge e of the CDT graph carries a binary variable $X_e \in \{0,1\}$ (1: boundary, 0: non-boundary).
Goal: estimate the marginal $P(X_e)$.
Random field: defines a joint probability distribution on all $\{X_e\}$.
22
Conditional Random Fields (CRF)
Undirected graphical model with potential functions in the exponential family, defined over $X=\{X_1,X_2,\ldots,X_m\}$.
Edge potentials: the exponential of weighted edge features.
Junction potentials: the exponential of weighted junction features.
[Pietra, Pietra & Lafferty 97] [Lafferty, McCallum & Pereira 01]
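For reference, one common way to write a model of this kind is sketched below. The symbols are assumptions introduced here for illustration, not the slide's own notation: $\phi_e$ stands for an edge feature with weight $\beta$, $\psi_{V,j}$ for the junction features at vertex $V$ with weights $\alpha_j$, and $Z(I)$ for the normalizing constant given image $I$.

```latex
P(X \mid I) \;=\; \frac{1}{Z(I)} \,
  \prod_{e} \exp\bigl(\beta\, \phi_e(X_e, I)\bigr) \,
  \prod_{V} \exp\Bigl(\sum_{j} \alpha_j\, \psi_{V,j}(X_V, I)\Bigr),
\qquad X = \{X_1, X_2, \ldots, X_m\}
```

Here $X_V$ denotes the labels of the edges incident to junction $V$, so each junction potential couples only a small neighborhood of variables.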
23
Edge Potential: Local Contrast
The edge feature is the average contrast on each edge e; the edge potential is the exponential of its weighted value.
24
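Junction Potential: Degree
The degree of a junction depends on the assignments of $\{X_e\}$ on its incident edges:
deg=0 (no lines):      0 0 0
deg=1 (line ending):   1 0 0
deg=2 (continuation):  1 0 1
deg=3 (T-junction):    1 1 1
Junction feature j is the indicator $\mathbf{1}(\deg = j)$; the junction potential is the exponential of the weighted features.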
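The degree feature on this slide amounts to counting the incident edges labeled as boundary and turning that count into a one-hot indicator. A small sketch, with hypothetical helper names (`junction_degree`, `degree_features`) and an assumed maximum degree of 3:

```python
def junction_degree(incident_labels):
    """Degree of a junction: number of incident edges labeled 1 (boundary)."""
    return sum(1 for x in incident_labels if x == 1)

def degree_features(incident_labels, max_degree=3):
    """One-hot indicator features: entry j is 1 exactly when deg == j."""
    deg = junction_degree(incident_labels)
    return [1 if deg == j else 0 for j in range(max_degree + 1)]

# The four cases listed on the slide.
NAMES = {0: "no lines", 1: "line ending", 2: "continuation", 3: "T-junction"}
for labels in [(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 1, 1)]:
    d = junction_degree(labels)
    print(labels, "deg =", d, NAMES[d], degree_features(labels))
```

Because the features depend on the joint assignment of all incident edges, a learned weight per degree can penalize rare configurations such as line endings.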
25
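Junction Potential: Continuity
For deg=2 (continuation, e.g. assignment 1 0 1), the continuity feature is $g(\cdot)\cdot\mathbf{1}(\deg = 2)$, where $g$ scores the continuity of the two boundary edges meeting at the junction.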
26
Learning the Parameters
Learned parameter values: 2.46, 0.87, 1.14, 0.01.
Mid-level representation + probabilistic framework + large annotated datasets.
Compare to [Geman and Geman 84].
27
Evaluation: Precision vs Recall
Detections are matched to the groundtruth.
$\text{Precision} = \frac{\text{matched pairs}}{\text{total detections}}$, $\text{Recall} = \frac{\text{matched pairs}}{\text{total groundtruth}}$
High threshold: few detections. Low threshold: lots of detections.
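The two ratios and the effect of the threshold can be illustrated with a toy example. The sketch assumes the matching to groundtruth has already been done, so each detection carries a score and a flag saying whether it was matched to a distinct groundtruth element; the function name and the data are made up for illustration.

```python
def precision_recall(detections, num_groundtruth, threshold):
    """Precision and recall at one threshold.

    detections: list of (score, matched) pairs; `matched` is True when the
    detection was paired with a distinct groundtruth element (assumed given).
    """
    kept = [matched for score, matched in detections if score >= threshold]
    matched_pairs = sum(kept)
    precision = matched_pairs / len(kept) if kept else 1.0  # convention for "no detections"
    recall = matched_pairs / num_groundtruth
    return precision, recall

toy = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
for thr in (0.7, 0.1):
    print(thr, precision_recall(toy, num_groundtruth=4, threshold=thr))
```

Sweeping the threshold from high to low traces out the precision-recall curve: precision tends to fall as recall rises.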
28
Curvilinear grouping improves boundary detection, both for low-recall and high-recall Horse dataset of [Borenstein and Ullman 02], 175 images training, 175 testing “Mid-level vision is useful” [Ren, Fowlkes & Malik; ICCV 2005]
29
[Figure panels: Image | Pb | CRF]
30
[Figure panels: Image | Pb | CRF]
31
Mid-level Vision
Problems in mid-level vision: curvilinear grouping, figure/ground organization, region segmentation.
33
Figure/Ground Organization
A contour belongs to one of the two (but not both) abutting regions.
Example: Figure (face) with Ground (shapeless), or Figure (goblet) with Ground (shapeless).
Important for the perception of shape.
34
Inference on the CDT Graph
$X_e \in \{-1,1\}$ (1: Left is Figure; -1: Right is Figure).
Local Model: Convexity, Parallelism, ...
Global Model: Consistency at T-junctions.
35
Results (using human segmentations) [Ren, Fowlkes & Malik; ECCV 2006]
Chance                                          50.0%
Baseline Size/Convexity                         55.6%
Local Shapemes                                  64.8%
Averaging shapemes on segmentation boundaries   72.0%
Shapemes + CRF                                  78.3%
Dataset Consistency                             88.0%
36
Models for Contour Labeling Tiger Grass Water Sand Labels {Xe} Curvilinear Grouping Figure/Ground Assignment Contours & Regions Objects & Scenes Pixels Superpixels CRF
37
Line Labeling
Labels: > : contour direction; + : convex edge; - : concave edge.
Possible junctions act as constraints, giving a constraint satisfaction problem (CSP) [Clowes 1971, Huffman 1971; Waltz 1972].
Reviving the old tradition with modern technologies, for more realistic applications.
38
Parsing Images Tiger Grass Water Sand Add region-based variables and cues Joint contour and region inference Add high-level knowledge (objects) Contours & Regions Objects & Scenes Pixels Superpixels
39
Object Segmentation … Object-specific cues: Shape Region support Color/Texture …
40
Inference on the CDT Graph
Contour variables $\{X_e\}$, region variables $\{Y_t\}$, object variable $\{Z\}$ (encoding location, scale, pose, etc.).
Integrating $\{X_e\}$, $\{Y_t\}$ and $\{Z\}$: low/mid/high-level cues.
41
Grouping Cues
Low-level cues:
$L_1(X_e \mid I)$: edge energy along edge e.
$L_2(Y_s, Y_t \mid I)$: brightness/texture similarity between two regions s and t.
Mid-level cues:
$M_1(X_V \mid I)$: edge collinearity and junction frequency at vertex V.
$M_2(X_e, Y_s, Y_t)$: consistency between edge e and two adjoining regions s and t.
High-level cues:
$H_1(Y_t \mid I)$: texture similarity of region t to exemplars.
$H_2(Y_t, Z \mid I)$: compatibility of region support with pose.
$H_3(X_e, Z \mid I)$: compatibility of local edge shape with pose.
42
Cue Integration in CRF Estimate the marginal posteriors of X, Y and Z
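One plausible way to combine the seven grouping cues into a single model is a log-linear form, sketched below. The plain unweighted sum is an assumption made here for illustration; only the cue names $L_1, L_2, M_1, M_2, H_1, H_2, H_3$ come from the presentation, and a real model would likely attach a learned weight to each term.

```latex
P(X, Y, Z \mid I) \;\propto\; \exp\Bigl(
    \sum_{e} L_1(X_e \mid I)
  + \sum_{s,t} L_2(Y_s, Y_t \mid I)
  + \sum_{V} M_1(X_V \mid I)
  + \sum_{e} M_2(X_e, Y_s, Y_t)
  + \sum_{t} H_1(Y_t \mid I)
  + \sum_{t} H_2(Y_t, Z \mid I)
  + \sum_{e} H_3(X_e, Z \mid I)
\Bigr)
```

In the $M_2$ term, $s$ and $t$ are the two regions adjoining edge $e$. The marginal posteriors of $X$, $Y$ and $Z$ are then obtained by summing this joint distribution over the remaining variables.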
43
Object knowledge helps a lot Mid-level Cues still useful [Ren, Fowlkes & Malik; NIPS 2005]
44
[Figure panels: Input | Input Pb | Output Contour | Output Figure]
45
[Figure panels: Input | Input Pb | Output Contour | Output Figure]
46
Finding People The challenges: Pose articulation + self-occlusion Clothing Lighting Clutter ……
47
Finding People: Top-Down
Top-down approaches (Objects & Scenes → Pixels):
3D model-based: fails most of the time.
2D template-based: needs lots of training data.
48
Finding People: Bottom-Up
[Diagram: Pixels → Superpixels → Contours & Regions → Objects & Scenes]
49
[Ren, Berg & Malik; ICCV 2005]
51
Tracking People as Blobs Blob tracking != Rectangle tracking … k-1, k, k+1, … Figure/Ground Segmentation Object Background Appearance Model Temporal Coherence
54
Preliminary Results Tracking = Repeated Segmentation (video)
55
Conclusion Constrained Delaunay Triangulation (CDT) Conditional Random Fields (CRF) Quantitative evaluations Integration of mid-level with high-level vision
56
Future Work
A richer and more consistent mid-level representation.
Higher-order potential functions.
Using mid-level representation for general object recognition.
A high-fidelity tracking system.
Finding people in static images.
57
Thank You
58
Acknowledgements
Joint work with Charless Fowlkes, Alex Berg, and Jitendra Malik.
References
X. Ren, C. Fowlkes and J. Malik. Figure/Ground Assignment in Natural Images. In ECCV 2006.
X. Ren, C. Fowlkes and J. Malik. Cue Integration in Figure/Ground Labeling. In NIPS 2005.
X. Ren, A. Berg and J. Malik. Recovering Human Body Configurations using Pairwise Constraints between Parts. In ICCV 2005.
X. Ren, C. Fowlkes and J. Malik. Scale-Invariant Contour Completion using Conditional Random Fields. In ICCV 2005.
X. Ren and J. Malik. Learning a Classification Model for Segmentation. In ICCV 2003.
X. Ren and J. Malik. A Multi-Scale Probability Model for Contour Completion based on Image Statistics. In ECCV 2002.
62
Finding People from Bottom-Up Detecting parts Superpixels Assembling parts Integer Quadratic Programming (IQP) Objects & Scenes Pixels Superpixels
63
Finding People in Video Contours & Regions Pixels Superpixels Additional information: Motion Appearance Temporal consistency How much can we do without object model (blob tracking)? ……
65
I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have "327"? No. I have sky, house, and trees. ---- Max Wertheimer, 1923
66
Learning the Parameters
Maximum-likelihood estimation in CRF, given the groundtruth labeling on the CDT graph.
Gradient descent works well.
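As a reminder of why gradient-based learning is natural here, the standard maximum-likelihood gradient for a log-linear CRF is sketched below. The notation is assumed for illustration: $X^{*}$ is the groundtruth labeling, $f_k$ the $k$-th feature summed over the graph, and $\theta_k$ its weight.

```latex
P(X \mid I; \theta) = \frac{1}{Z(I;\theta)} \exp\Bigl(\sum_k \theta_k f_k(X, I)\Bigr),
\qquad
\frac{\partial}{\partial \theta_k} \log P(X^{*} \mid I; \theta)
  = f_k(X^{*}, I) \;-\; \mathbb{E}_{P(X \mid I;\theta)}\bigl[f_k(X, I)\bigr]
```

The expectation term requires marginals of the model, which is where an approximate inference procedure would typically be plugged in.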
67
Global Consistency
[Figure: figure/ground labelings at junctions. Common: F G F F G G. Uncommon: F G F G G F.]
Use junction potentials to encode junction type.
68
[Figure panels: Image | Groundtruth | Local | Global]
69
Results (without human segmentations)
Chance                                          50.0%
Baseline Size/Convexity                         N/A
Local Shapemes                                  64.9%
Averaging shapemes on segmentation boundaries   66.5%
Shapemes + CRF                                  68.9%
Dataset Consistency                             88.0%
70
[Figure panels: Image | Pb | Local | Global]
72
Detecting Parts: CDT
Candidate parts as parallel line segments (Ebenbreite).
Automatic scale selection from bottom-up.
Feature combination with a logistic classifier.
73
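Assembling Parts: IQP
Candidates $\{C_i\}$, parts $\{L_j\}$.
An assignment maps each part to a candidate: $(L_{j_1}, C_{i_1})$ and $(L_{j_2}, C_{i_2})$, where $C_{i_1}$ is the candidate assigned to $L_{j_1}$ and $C_{i_2}$ the candidate assigned to $L_{j_2}$.
A cost is defined for each partial assignment $\{(L_{j_1}, C_{i_1}), (L_{j_2}, C_{i_2})\}$.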
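To make the structure of the objective concrete, the sketch below scores a full assignment as the sum of costs over pairs of (part, candidate) assignments and finds the best one by exhaustive search. This is not an IQP solver and not the presenter's method: the brute-force search, the function name `best_assignment`, and the toy cost are all assumptions for illustration, usable only on tiny problems.

```python
from itertools import combinations, permutations

def best_assignment(parts, candidates, pair_cost):
    """Pick one distinct candidate per part, minimizing the summed pairwise cost.

    pair_cost(p, cp, q, cq) is the cost of the partial assignment
    {(p, cp), (q, cq)}. Exhaustive search over all assignments.
    """
    best, best_cost = None, float("inf")
    for chosen in permutations(candidates, len(parts)):
        assign = dict(zip(parts, chosen))
        cost = sum(pair_cost(p, assign[p], q, assign[q])
                   for p, q in combinations(parts, 2))
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost

# Toy example (made up): candidates live on a line, and the second part
# should sit exactly one unit to the right of the first.
position = {"c0": 0.0, "c1": 1.0, "c2": 5.0}

def toy_cost(p, cp, q, cq):
    return abs((position[cq] - position[cp]) - 1.0)

print(best_assignment(["upper_arm", "lower_arm"], ["c0", "c1", "c2"], toy_cost))
```

The objective is quadratic in the assignment indicators because each cost term involves two assignments at once, which is what makes an integer quadratic programming formulation a natural fit.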
74
Testing the Markov Assumption
The Markov model for contours [Mumford 1994, Williams & Jacobs 1995]:
Curvature = white noise (independent).
Tangent direction $t$ = random walk: $P(t(s+1) \mid t(s), \ldots) = P(t(s+1) \mid t(s))$.
Inference by dynamic programming.
75
Testing the Markov Assumption
Segment the contours at high-curvature positions.
If the Markov assumption holds:
at each step, a high-curvature event happens with probability p;
high-curvature events are independent from step to step.
Therefore, if L is the length of the contour segment between high-curvature points, $P(L=k) = p(1-p)^k$.
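The prediction $P(L=k) = p(1-p)^k$ can be checked numerically with a short simulation of independent high-curvature events. The function names, the value p = 0.2, and the sample size are arbitrary choices for this sketch.

```python
import random

def geometric_pmf(k, p):
    """P(L = k) = p * (1 - p)**k: k steps with no event, then one event."""
    return p * (1.0 - p) ** k

def simulate_segment_lengths(p, n_segments, seed=0):
    """Simulate segment lengths under independent per-step events."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_segments):
        k = 0
        while rng.random() >= p:  # no high-curvature event at this step
            k += 1
        lengths.append(k)
    return lengths

p = 0.2
lengths = simulate_segment_lengths(p, n_segments=20000)
for k in range(5):
    empirical = lengths.count(k) / len(lengths)
    print(k, round(empirical, 3), round(geometric_pmf(k, p), 3))
```

Under this model the probability drops by the constant factor (1 - p) with every extra step, i.e. exponentially in L, which is the behavior to be compared against measured contour statistics.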
76
Berkeley Segmentation Dataset [Martin, Fowlkes, Tal and Malik, ICCV 2001] 1,000 images, >14,000 segmentations
77
Exponential vs Power Law
[Plot: probability vs. contour segment length L. The Markov assumption corresponds to an exponential law; scale invariance corresponds to a power law.]
78
Scale Invariance Arbitrary viewing distance Hierarchy of Parts Finger Leg Torso
79
A Scale-Invariant Representation Tiger Grass Water Sand Scale Space Re-scale ? A scale-invariant representation for contours
80
Gap-Filling Property of CDT A typical scenario of contour completion low contrast high contrast CDT picks the “ right ” edge, completing the gap
81
No Loss of Structure
Using $P_{human}$, the soft groundtruth label defined on CDT graphs: precision close to 100%.
Pb averaged over CDT edges: no worse than the original Pb.
Increase in asymptotic recall rate: completion of gradientless contours.
82
Uniform Connectedness Connected regions of homogeneous properties (brightness, color, texture) are perceived as entry-level units. [Palmer & Rock, 1994] “Classical principles of grouping operate after UC, creating superordinate units consisting of two or more entry-level units.” “… UC (uniform connectedness) cannot be reduced to grouping principles, because it is not a form of grouping at all…”
83
Local Model
“Bi-gram” model: contrast + continuity; binary classification (0,0) vs (1,1); logistic classifier.
“Tri-gram” model: [Diagram: edge L with its Pb value and label Xe]
84
Building a CRF Model
$X=\{X_1,X_2,\ldots,X_m\}$; estimate the marginal of each $X_i$.
What are the features? Edge features: low-level “edgeness” (Pb). Junction features: junction type, continuity.
How to make inference? Loopy Belief Propagation.
How to learn the parameters? Gradient Descent on Max. Likelihood.
85
Junction and Continuity
Junction types $(\deg_g, \deg_c)$, for example: (1,0), (0,2), (1,2), (0,0).
Continuity term for degree-2 junctions: $\deg_g + \deg_c = 2$.
86
Interpreting the Parameters
Learned values: 2.46, 0.87, 1.14, 0.01, -0.59, -0.98.
Line endings and junctions are rare.
Completed edges are weak.
87
Continuity improves boundary detection in both low-recall and high-recall ranges Global inference helps; mostly in low-recall/high-precision Roughly speaking, CRF>Local>CDT only>Pb
90
[Figure panels: Image | Pb | Local | Global]
91
Figure/Ground Principles
Convexity, Parallelism, Surroundedness, Symmetry, Common Fate, Familiar Configuration, ...
[Figure: example regions labeled F (figure) and G (ground)]
92
Figure/Ground Dataset
93
Figure/Ground Assignment in Natural Images Local Model Use shapemes (prototypical local shapes) to capture contextual information Global Model Use CRF to enforce consistency at junctions
94
Shapemes: Prototypical Local Shapes …… local shapes collect cluster Average shape in each shapeme cluster
95
Shapemes for F/G Discrimination
Which side is Figure, left (L) or right (R)?
Example shapemes with their left-is-figure rates: L:93.84%, L:49.80%, L:89.59%, L:11.69%, L:66.52%, L:4.98%.
Train a logistic classifier to linearly combine the shapeme cues.
96
CRF for Figure/Ground
$F=\{F_1,F_2,\ldots,F_m\}$, $F_i \in \{\text{Left},\text{Right}\}$.
Put potential functions at junctions; one feature for each junction type.
Example junction types: { (F,G),(G,F),(F,G) }, { (G,F),(F,G) }, { (F,G),(F,G),(F,G) }.
97
Results
98
CDT vs K-Neighbor An alternative scheme for completion: connect to k-nearest neighbor vertices, subject to visibility CDT achieves higher asymptotic recall rates
99
Inference w/ Belief Propagation
Loopy Belief Propagation: just like belief propagation; iterates message passing until convergence. It lacks theoretical foundations and is known to have convergence issues, but is becoming popular in practice; typically applied on a pixel grid.
Works well on CDT graphs: converges fast (<10 iterations) and produces empirically sound results.
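The message-passing loop described on this slide can be sketched for a small pairwise model with binary variables. This is a generic sum-product illustration, not the presenter's code: it handles only pairwise potentials (the junction potentials in the talk couple more than two variables), assumes strictly positive potentials, and all names are hypothetical. A brute-force routine is included so the result can be checked on a chain, where belief propagation is exact.

```python
from itertools import product

def loopy_bp(unary, pairwise, max_iters=50, tol=1e-9):
    """Sum-product BP. unary[i] = [phi(0), phi(1)];
    pairwise[(i, j)][xi][xj] = psi(xi, xj). Returns (beliefs, iterations)."""
    nbrs = {i: [] for i in unary}
    for (i, j) in pairwise:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, xi, j, xj):
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # One message per directed edge, initialized uniform.
    msgs = {(i, j): [0.5, 0.5] for i in nbrs for j in nbrs[i]}
    iterations = 0
    for iterations in range(1, max_iters + 1):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    term = unary[i][xi] * psi(i, xi, j, xj)
                    for k in nbrs[i]:
                        if k != j:
                            term *= msgs[(k, i)][xi]
                    total += term
                out.append(total)
            s = out[0] + out[1]
            new[(i, j)] = [out[0] / s, out[1] / s]
        delta = max((abs(new[m][x] - msgs[m][x]) for m in msgs for x in (0, 1)),
                    default=0.0)
        msgs = new
        if delta < tol:
            break
    beliefs = {}
    for i in nbrs:
        b = [unary[i][0], unary[i][1]]
        for k in nbrs[i]:
            b = [b[0] * msgs[(k, i)][0], b[1] * msgs[(k, i)][1]]
        beliefs[i] = [b[0] / (b[0] + b[1]), b[1] / (b[0] + b[1])]
    return beliefs, iterations

def brute_force_marginals(unary, pairwise):
    """Exact marginals by enumerating every joint assignment."""
    nodes = sorted(unary)
    marg = {i: [0.0, 0.0] for i in nodes}
    for values in product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, values))
        w = 1.0
        for i in nodes:
            w *= unary[i][x[i]]
        for (i, j), table in pairwise.items():
            w *= table[x[i]][x[j]]
        for i in nodes:
            marg[i][x[i]] += w
    return {i: [m[0] / (m[0] + m[1]), m[1] / (m[0] + m[1])] for i, m in marg.items()}
```

On a graph with cycles the same loop runs unchanged, but the beliefs are only approximations of the true marginals, which is the caveat the slide raises.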
100
Shape Context
Count the number of edge points inside each bin of a “log-polar” histogram (e.g. count=4, count=6).
[Belongie, Malik & Puzicha, ICCV 2001] [Berg & Malik, CVPR 2001]
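The binning idea can be sketched as a log-polar histogram of edge points around a reference point. The bin counts, the radius range, and the function name below are assumptions chosen for illustration, not the settings used in the cited work.

```python
import math

def shape_context(center, points, n_r=4, n_theta=8, r_min=1.0, r_max=16.0):
    """Log-polar histogram of `points` around `center`.

    Returns an n_r x n_theta grid of counts. Radial bins are uniform in
    log(r); points with r < r_min or r >= r_max are ignored.
    """
    hist = [[0] * n_theta for _ in range(n_r)]
    log_span = math.log(r_max / r_min)
    for (x, y) in points:
        dx, dy = x - center[0], y - center[1]
        r = math.hypot(dx, dy)
        if r < r_min or r >= r_max:
            continue
        r_bin = min(int(n_r * math.log(r / r_min) / log_span), n_r - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)  # angle in [0, 2*pi)
        t_bin = int(n_theta * theta / (2 * math.pi)) % n_theta
        hist[r_bin][t_bin] += 1
    return hist

edge_points = [(3, 1), (-1, 5), (1, -10), (0.5, 0.1), (30, 0)]
for row in shape_context((0, 0), edge_points):
    print(row)
```

Log-spaced radial bins make the descriptor more sensitive to nearby points than to distant ones, which is the usual motivation for the log-polar layout.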
101
Compare to DDMCMC
We try to solve the same problem: a unified framework for image parsing.
Mid-level representation: CDT vs “atomic regions”.
Probabilistic model: discriminative vs generative.
Inference mechanism: belief propagation vs MCMC.
Quantitative evaluation.
We try to develop models step by step.