Probabilistic Models for Parsing Images Xiaofeng Ren University of California, Berkeley.

1 Probabilistic Models for Parsing Images Xiaofeng Ren University of California, Berkeley

2 Parsing Images Tiger, Grass, Water, Sand; outdoor, wildlife. Tiger: tail, eye, legs, head, back, shadow, mouth

3 A Classical View of Visual Processing Pixels & Pixel Features Contours & Regions Tiger Grass Water Sand Objects & Scenes Low-level Image Processing Mid-level Perceptual Organization High-level Recognition

4 Models for Parsing Images Pixels Contours & Regions Objects & Scenes Low-level Image Processing Mid-level Perceptual Organization High-level Recognition A unified framework incorporating all levels of abstraction

5 Probabilistic Models for Images  Markov Random Fields [Geman & Geman 84]: pixels → labels  Applications: image restoration, edge detection, texture synthesis, segmentation, super-resolution, contour completion, …  Very limited representational power; empirical evidence against pixel-based MRFs [Ren & Malik 02]

6 Where is Structure? Our perception of structure is disrupted. We cannot efficiently reason about structure if we cannot represent it.

7 Outline  Parsing Images  Building a Mid-level Representation  Probabilistic Models for Mid-level Vision  Contour Completion  Figure/Ground Organization  Combining Mid- and High-level Vision  Object Segmentation  Finding People  Conclusion & Future Work

8 Outline  Parsing Images  Building a Mid-level Representation  Probabilistic Models for Mid-level Vision  Contour Completion  Figure/Ground Organization  Combining Mid- and High-level Vision  Object Segmentation  Finding People  Conclusion & Future Work

9 Local Edge Detection  Use the Pb (probability of boundary) edge detector: combining local brightness, texture and color contrasts.

10 Piece-wise Linear Approximation  Recursively split the boundaries (using angles) until each piece is approximately straight
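The recursive splitting step can be sketched as follows; the split-at-largest-turning-angle rule and the ~15 degree threshold are illustrative assumptions, not necessarily the exact criterion used in the talk.

```python
import math

def turning_angle(a, b, c):
    """Absolute change in direction from segment a->b to segment b->c."""
    a1 = math.atan2(b[1] - a[1], b[0] - a[0])
    a2 = math.atan2(c[1] - b[1], c[0] - b[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def split_polyline(points, angle_thresh=0.26):  # ~15 degrees
    """Recursively split a polyline until each piece is roughly straight.

    Splits at the interior vertex with the largest turning angle whenever
    that angle exceeds `angle_thresh`; returns indices of kept vertices.
    """
    def recurse(lo, hi):
        if hi - lo < 2:
            return [lo, hi]
        k = max(range(lo + 1, hi),
                key=lambda i: turning_angle(points[i - 1], points[i], points[i + 1]))
        if turning_angle(points[k - 1], points[k], points[k + 1]) <= angle_thresh:
            return [lo, hi]  # already approximately straight
        return recurse(lo, k)[:-1] + recurse(k, hi)  # drop duplicated split index
    return recurse(0, len(points) - 1)
```

The retained vertices become the constrained edges fed to the triangulation on the next slide.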

11 Constrained Delaunay Triangulation (CDT)  A variant of the standard Delaunay triangulation  Keeps a given set of edges in the triangulation  Widely used in geometric modeling and finite element analysis


13 Scale Invariance of CDT

14 The CDT Graph: Summary  Millions of pixels → ~1000 edges  Fast to compute, scale-invariant, completes gaps, little loss of structure  Pixels → Superpixels. Principle of Uniform Connectedness: use homogeneous regions as entry-level units in perceptual organization [Palmer and Rock 94]  Longer ranges of interaction [Ren & Malik; ICCV 2003] [Ren, Fowlkes & Malik; ICCV 2005]

15 Analogy with Natural Language Parsing  Language: Letters → Words → Phrases → Sentences & Paragraphs  Vision: Pixels → Superpixels → Contours & Regions → Objects & Scenes

16 Outline  Parsing Images  Building a Mid-level Representation  Probabilistic Models for Mid-level Vision  Contour Completion  Figure/Ground Organization  Combining Mid- and High-level Vision  Object Segmentation  Finding People  Conclusion & Future Work

17 Mid-level Vision  It is not low-level vision (which can be computed independently in a local neighborhood)  It is not high-level vision (which assumes knowledge of particular object categories & scenes)  Problems in mid-level vision: curvilinear grouping, figure/ground organization, region segmentation

18 Mid-level Vision Curvilinear grouping Figure/ground organization Region segmentation  Problems in mid-level vision

19 Curvilinear Grouping  Boundaries are smooth in nature!  A number of associated visual phenomena Good continuation Visual completion Illusory contours

20 Beyond Local Edge Detection  There is psychophysical evidence that we are approaching the limit of local edge detection  Smoothness of boundaries in natural images provides an important contextual cue.

21 Inference on the CDT Graph  Xe ∈ {0,1} for each CDT edge e (1: boundary, 0: non-boundary)  Estimate the marginal P(Xe)  Random field: defines a joint probability distribution over all {Xe}

22 Conditional Random Fields (CRF) [Pietra, Pietra & Lafferty 97] [Lafferty, McCallum & Pereira 01]  Undirected graphical model with potential functions in the exponential family, where X={X1,X2,…,Xm}  Edge potentials exp(α·φ) and junction potentials exp(β·ψ)

23 Edge Potential: Local Contrast  Potentials exp(α·φ), where φ = average contrast (Pb) on each edge e

24 Junction Potential: Degree  The degree of a junction depends on the assignments of the incident {Xe}:  deg=0 (no lines): {0,0,0}  deg=1 (line ending): {1,0,0}  deg=2 (continuation): {1,0,1}  deg=3 (T-junction): {1,1,1}  Feature ψj = 1(deg=j), with potentials exp(β·ψ)
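A minimal sketch of the degree feature: the junction degree is just the number of incident CDT edges labeled as boundary, and the potential exponentiates a learned per-degree weight (the `beta` values in any example are illustrative placeholders, not the learned parameters).

```python
import math

def junction_degree(edge_labels):
    """Degree = number of incident CDT edges labeled 1 (boundary)."""
    return sum(edge_labels)

def junction_potential(edge_labels, beta):
    """Potential exp(beta[d]) for junction degree d.

    `beta` holds one weight per degree; degrees past len(beta)-1 are
    clamped, a simplification for this sketch.
    """
    d = min(junction_degree(edge_labels), len(beta) - 1)
    return math.exp(beta[d])
```

For instance, a junction with incident labels {1,0,1} has degree 2 (a continuation), so its potential is exp(beta[2]).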

25 Junction Potential: Continuity  For degree-2 junctions (continuation, {1,0,1}), add a continuity term ψ = g(θ)·1(deg=2), a function g of the turning angle θ

26 Learning the Parameters  Learned values: 2.46, 0.87, 1.14, 0.01  Mid-level representation + probabilistic framework + large annotated datasets  Compare to [Geman and Geman 84]

27 Evaluation: Precision vs Recall  Match detections to groundtruth  Precision = matched pairs / total detections  Recall = matched pairs / total groundtruth  High threshold: few detections; low threshold: lots of detections
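The two measures can be written directly from the matching counts; a minimal sketch (the empty-denominator convention is an assumption of this sketch, not part of the benchmark definition):

```python
def precision_recall(matched, total_detections, total_groundtruth):
    """Precision = matched / total detections; Recall = matched / total groundtruth.

    Returning 1.0 for an empty denominator is a convention assumed here.
    """
    precision = matched / total_detections if total_detections else 1.0
    recall = matched / total_groundtruth if total_groundtruth else 1.0
    return precision, recall
```

Sweeping the detector threshold trades the two off: a high threshold yields few detections (high precision, low recall), a low threshold the reverse.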

28 Curvilinear grouping improves boundary detection, in both the low-recall and high-recall regimes. Horse dataset of [Borenstein and Ullman 02]: 175 training images, 175 testing. “Mid-level vision is useful” [Ren, Fowlkes & Malik; ICCV 2005]

29 (Figure: results shown as Image, Pb, and CRF.)

30 (Figure: results shown as Image, Pb, and CRF.)

31 Mid-level Vision Curvilinear grouping Figure/ground organization Region segmentation  Problems in mid-level vision

32 Mid-level Vision Curvilinear grouping Figure/ground organization Region segmentation  Problems in mid-level vision

33 Figure/Ground Organization  A contour belongs to one of the two (but not both) abutting regions: e.g. Figure (face) vs Ground (shapeless), or Figure (goblet) vs Ground (shapeless)  Important for the perception of shape

34 Inference on the CDT Graph  Xe ∈ {-1,1} (1: left is figure; -1: right is figure)  Local model: convexity, parallelism, …  Global model: consistency at T-junctions

35 Results (using human segmentations)  Chance: 50.0%  Baseline size/convexity: 55.6%  Local shapemes: 64.8%  Averaging shapemes on segmentation boundaries: 72.0%  Shapemes + CRF: 78.3%  Dataset consistency: 88.0% [Ren, Fowlkes & Malik; ECCV 2006]

36 Models for Contour Labeling  CRF over contour labels {Xe}: curvilinear grouping and figure/ground assignment, at the Contours & Regions level of the Pixels → Superpixels → Contours & Regions → Objects & Scenes hierarchy

37 Line Labeling  > : contour direction, + : convex edge, - : concave edge  Constraint satisfaction (CSP) over the possible junctions [Clowes 1971, Huffman 1971; Waltz 1972]  Reviving the old tradition with modern technologies, for more realistic applications

38 Parsing Images  Add region-based variables and cues  Joint contour and region inference  Add high-level knowledge (objects)

39 Object Segmentation … Object-specific cues:  Shape  Region support  Color/Texture …

40 Inference on the CDT Graph  Contour variables {Xe}  Region variables {Yt}  Object variable Z, encoding location, scale, pose, etc.  Integrating {Xe}, {Yt} and Z: low/mid/high-level cues

41 Grouping Cues  Low-level cues:  L1(Xe|I): edge energy along edge e  L2(Ys,Yt|I): brightness/texture similarity between two regions s and t  Mid-level cues:  M1(XV|I): edge collinearity and junction frequency at vertex V  M2(Xe,Ys,Yt): consistency between edge e and two adjoining regions s and t  High-level cues:  H1(Yt|I): texture similarity of region t to exemplars  H2(Yt,Z|I): compatibility of region support with pose  H3(Xe,Z|I): compatibility of local edge shape with pose

42 Cue Integration in CRF Estimate the marginal posteriors of X, Y and Z

43 Object knowledge helps a lot; mid-level cues are still useful [Ren, Fowlkes & Malik; NIPS 2005]

44 (Figure: Input, Input Pb, Output Contour, Output Figure.)

45 (Figure: Input, Input Pb, Output Contour, Output Figure.)

46 Finding People The challenges:  Pose articulation + self-occlusion  Clothing  Lighting  Clutter ……

47 Finding People: Top-Down (Objects & Scenes → Pixels)  3D model-based: fails most of the time  2D template-based: needs lots of training data

48 Finding People: Bottom-Up (Diagram: Pixels → Superpixels → Contours & Regions → Objects & Scenes.)

49 [Ren, Berg & Malik; ICCV 2005]


51 Tracking People as Blobs  Blob tracking != rectangle tracking  Frames …, k-1, k, k+1, …  Figure/ground segmentation  Object/background appearance model  Temporal coherence


54 Preliminary Results Tracking = Repeated Segmentation (video)

55 Conclusion  Constrained Delaunay Triangulation (CDT)  Conditional Random Fields (CRF)  Quantitative evaluations  Integration of mid-level with high-level vision

56 Future Work Contours & Regions Objects & Scenes Pixels Superpixels  A richer and more consistent mid-level representation  Higher-order potential functions  Using mid-level representation for general object recognition  A high-fidelity tracking system  Finding people in static images

57 Thank You

58  Acknowledgements  Joint work with Charless Fowlkes, Alex Berg, and Jitendra Malik.  References  X. Ren, C. Fowlkes and J. Malik. Figure/Ground Assignment in Natural Images. In ECCV 2006.  X. Ren, C. Fowlkes and J. Malik. Cue Integration in Figure/Ground Labeling. In NIPS 2005.  X. Ren, A. Berg and J. Malik. Recovering Human Body Configurations using Pairwise Constraints between Parts. In ICCV 2005.  X. Ren, C. Fowlkes and J. Malik. Scale-Invariant Contour Completion using Conditional Random Fields. In ICCV 2005.  X. Ren and J. Malik. Learning a Classification Model for Segmentation. In ICCV 2003.  X. Ren and J. Malik. A Multi-Scale Probability Model for Contour Completion based on Image Statistics. In ECCV 2002.


62 Finding People from Bottom-Up  Detecting parts  Superpixels  Assembling parts  Integer Quadratic Programming (IQP) Objects & Scenes Pixels Superpixels

63 Finding People in Video  Additional information: motion, appearance, temporal consistency, …  How much can we do without an object model (blob tracking)?


65 I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have "327"? No. I have sky, house, and trees. ---- Max Wertheimer, 1923

66 Learning the Parameters  Maximum-likelihood estimation in the CRF: let X* denote the groundtruth labeling on the CDT graph  Gradient descent works well
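For a CRF in the exponential family, the maximum-likelihood gradient has the standard empirical-minus-expected form; a sketch, writing X* for the groundtruth labeling and f_k for the potential features (the expectation is what loopy belief propagation approximates):

```latex
P(X \mid I) = \frac{1}{Z(I)} \exp\!\Big(\sum_k \lambda_k f_k(X, I)\Big),
\qquad
\ell(\lambda) = \sum_k \lambda_k f_k(X^{*}, I) - \log Z(I),
\qquad
\frac{\partial \ell}{\partial \lambda_k}
  = f_k(X^{*}, I) - \mathbb{E}_{P(X \mid I)}\big[f_k(X, I)\big].
```

The gradient vanishes when the model's expected feature counts match those of the groundtruth labeling, which is why simple gradient descent works well here.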

67 Global Consistency  Some figure/ground label configurations around a junction are common, others uncommon  Use junction potentials to encode junction type

68 (Figure: Image, Groundtruth, Local, Global.)

69 Results (without human segmentations)  Chance: 50.0%  Baseline size/convexity: N/A  Local shapemes: 64.9%  Averaging shapemes on segmentation boundaries: 66.5%  Shapemes + CRF: 68.9%  Dataset consistency: 88.0%

70 (Figure: Image, Pb, Local, Global.)

71 Outline  Parsing Images  Building a Mid-level Representation  Probabilistic Models for Mid-level Vision  Contour Completion  Figure/Ground Organization  Combining Mid- and High-level Vision  Object Segmentation  Finding People  Conclusion & Future Work

72 Detecting Parts: CDT  Candidate parts as parallel line segments (Ebenbreite)  Automatic scale selection from bottom-up  Feature combination with a logistic classifier

73 Assembling Parts: IQP  Candidates {Ci}, parts {Lj}  An assignment σ maps each part to a candidate: Ci = σ(Lj)  Cost for a partial assignment, e.g. {(Lj1, Ci1), (Lj2, Ci2)}, combines per-part and pairwise terms
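In IQP form, an assignment is a 0/1 indicator vector over (part, candidate) pairs, scored by linear per-pair costs plus quadratic pairwise-compatibility terms; a minimal sketch (the cost structure shown is illustrative, not the exact formulation from the talk):

```python
def assignment_cost(x, c, Q):
    """Cost of a (partial) part-to-candidate assignment in IQP form.

    x: 0/1 indicators over candidate (part, detection) pairs
    c: linear costs (e.g. per-pair detector scores)
    Q: pairwise costs between co-assigned pairs
    cost(x) = sum_i c[i]*x[i] + sum_{i,j} Q[i][j]*x[i]*x[j]
    """
    n = len(x)
    linear = sum(c[i] * x[i] for i in range(n))
    quadratic = sum(Q[i][j] * x[i] * x[j]
                    for i in range(n) for j in range(n))
    return linear + quadratic
```

Minimizing this cost over binary x, subject to each part receiving one candidate, is the integer quadratic program.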

74 Testing the Markov Assumption  The Markov model for contours [Mumford 1994, Williams & Jacobs 1995]:  Curvature = white noise (independent)  Tangent direction t = random walk: P( t(s+1) | t(s),… ) = P( t(s+1) | t(s) )  Enables dynamic programming

75 Testing the Markov Assumption  Segment the contours at high-curvature positions  If the Markov assumption holds:  Each step, a high-curvature event happens with probability p  High-curvature events are independent from step to step  Therefore, if L is the length of a contour segment between high-curvature points, P(L=k) = p(1-p)^k
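The prediction is a geometric law: successive length probabilities fall off by the constant factor (1-p), i.e. exponentially, whereas the power law observed in natural images decays far more slowly. A small sketch of the predicted law:

```python
import random

def geometric_pmf(p, k):
    """Segment-length law implied by the Markov assumption: a high-curvature
    event ends the segment at each step with probability p, so
    P(L = k) = p * (1 - p)**k."""
    return p * (1.0 - p) ** k

def simulate_segment_lengths(p, n, seed=0):
    """Draw n segment lengths by flipping the per-step termination coin."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n):
        k = 0
        while rng.random() >= p:
            k += 1
        lengths.append(k)
    return lengths
```

Plotting a histogram of simulated lengths on log axes gives a straight line against k (exponential), which is exactly what the Berkeley Segmentation Dataset contours do not show.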

76 Berkeley Segmentation Dataset [Martin, Fowlkes, Tal and Malik, ICCV 2001] 1,000 images, >14,000 segmentations

77 Exponential vs Power Law (Plot: probability vs contour segment length L. The empirical distribution follows a power law, consistent with scale invariance, rather than the exponential law predicted by the Markov assumption.)

78 Scale Invariance  Arbitrary viewing distance  Hierarchy of Parts Finger Leg Torso

79 A Scale-Invariant Representation  Re-scaling an image (scale space) should not change the parse  Goal: a scale-invariant representation for contours

80 Gap-Filling Property of CDT  A typical scenario of contour completion: a low-contrast gap between high-contrast edges  CDT picks the "right" edge, completing the gap

81 No Loss of Structure  Use P_human, the soft groundtruth label defined on CDT graphs: precision close to 100%  Pb averaged over CDT edges: no worse than the original Pb  Increase in asymptotic recall rate: completion of gradientless contours

82 Uniform Connectedness Connected regions of homogeneous properties (brightness, color, texture) are perceived as entry-level units. [Palmer & Rock, 1994] “Classical principles of grouping operate after UC, creating superordinate units consisting of two or more entry-level units.” “… UC (uniform connectedness) cannot be reduced to grouping principles, because it is not a form of grouping at all…”

83 Local Model  "Bi-gram" model: contrast + continuity; binary classification (0,0) vs (1,1) with a logistic classifier  "Tri-gram" model: extends the same classification to longer context along the contour

84 Building a CRF Model  X={X1,X2,…,Xm}; estimate P(Xi)  What are the features?  Edge features: low-level "edgeness" (Pb)  Junction features: junction type, continuity  How to make inference? Loopy belief propagation  How to learn the parameters? Gradient descent on maximum likelihood

85 Junction and Continuity  Junction types (deg_g, deg_c): e.g. (1,0), (0,2), (1,2), (0,0)  Continuity term for degree-2 junctions, deg_g + deg_c = 2

86 Interpreting the Parameters  Learned values: 2.46, 0.87, 1.14, 0.01, -0.59, -0.98  Line endings and junctions are rare  Completed edges are weak

87 Continuity improves boundary detection in both the low-recall and high-recall ranges  Global inference helps, mostly in the low-recall/high-precision range  Roughly speaking, CRF > Local > CDT only > Pb


90 (Figure: Image, Pb, Local, Global.)

91 Figure/Ground Principles  Convexity  Parallelism  Surroundedness  Symmetry  Common fate  Familiar configuration  …

92 Figure/Ground Dataset

93 Figure/Ground Assignment in Natural Images  Local Model  Use shapemes (prototypical local shapes) to capture contextual information  Global Model  Use CRF to enforce consistency at junctions

94 Shapemes: Prototypical Local Shapes  Collect local shapes and cluster them  Average shape in each shapeme cluster

95 Shapemes for F/G Discrimination  Which side is figure? (Examples with classifier scores for "left is figure": 93.84%, 49.80%, 89.59%, 11.69%, 66.52%, 4.98%.)  Train a logistic classifier to linearly combine the shapeme cues
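A minimal sketch of the combination step: each matched shapeme contributes a soft figure/ground vote, and a logistic function of their weighted sum gives P(left is figure). Any weights and bias used below are illustrative placeholders, not the learned values.

```python
import math

def figure_left_probability(shapeme_scores, weights, bias=0.0):
    """Logistic combination of per-shapeme figure/ground cues.

    shapeme_scores: soft votes (e.g. each matched shapeme's estimate that
    the left side is figure); weights/bias would be learned from data.
    """
    z = bias + sum(w * s for w, s in zip(weights, shapeme_scores))
    return 1.0 / (1.0 + math.exp(-z))
```

Training amounts to fitting the weights by logistic regression on labeled contours; the local scores then become the singleton cues inside the global CRF.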

96 CRF for Figure/Ground  F={F1,F2,…,Fm}, Fi ∈ {Left, Right}  Put potential functions at junctions  One feature for each junction type, e.g. { (F,G),(G,F),(F,G) }, { (G,F),(F,G) }, { (F,G),(F,G),(F,G) }

97 Results

98 CDT vs K-Neighbor  An alternative scheme for completion: connect to k-nearest-neighbor vertices, subject to visibility  CDT achieves higher asymptotic recall rates

99 Inference w/ Belief Propagation  Loopy belief propagation:  Just like belief propagation: iterates message passing until convergence  Lacks theoretical guarantees and is known to have convergence issues, but is becoming popular in practice; typically applied on the pixel grid  Works well on CDT graphs:  Converges fast (<10 iterations)  Produces empirically sound results

100 Shape Context  Count the number of edge points inside each "log-polar" bin (e.g. count=4, count=6) [Belongie, Malik & Puzicha, ICCV 2001] [Berg & Malik, CVPR 2001]
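A minimal sketch of the log-polar histogram; the bin counts and radii below are illustrative defaults, not the parameters used in the cited work:

```python
import math

def shape_context(points, center, n_r=3, n_theta=4, r_max=2.0):
    """Log-polar histogram of edge points around `center`.

    Radial bins are log-spaced between r_min = r_max / 2**n_r and r_max;
    points at the center or beyond r_max are ignored. Returns a flat list
    of n_r * n_theta counts (radial-bin-major).
    """
    hist = [0] * (n_r * n_theta)
    r_min = r_max / 2 ** n_r
    for (x, y) in points:
        dx, dy = x - center[0], y - center[1]
        r = math.hypot(dx, dy)
        if r == 0 or r > r_max:
            continue
        # log-spaced radial bin: equal bins in log(r / r_min)
        frac = math.log(max(r, r_min) / r_min) / math.log(r_max / r_min)
        rbin = min(n_r - 1, int(frac * n_r))
        theta = math.atan2(dy, dx) % (2 * math.pi)
        tbin = min(n_theta - 1, int(theta / (2 * math.pi) * n_theta))
        hist[rbin * n_theta + tbin] += 1
    return hist
```

The log spacing makes the descriptor more sensitive to nearby edge points than to distant ones, which is what makes it useful as a local figure/ground and matching cue.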

101 Compare to DDMCMC  We try to solve the same problem: a unified framework for image parsing  Mid-level representation: CDT vs "atomic regions"  Probabilistic model: discriminative vs generative  Inference mechanism: belief propagation vs MCMC  Quantitative evaluation  We try to develop models step by step

