Probabilistic Models for Parsing Images Xiaofeng Ren University of California, Berkeley.

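Parsing Images
[Figure: an example image parsed at several levels. Scene labels: outdoor, wildlife. Region labels: Tiger, Grass, Water, Sand. Part labels: tail, eye, legs, head, back, shadow, mouse.]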

A Classical View of Visual Processing
- Low-level Image Processing: Pixels & Pixel Features
- Mid-level Perceptual Organization: Contours & Regions
- High-level Recognition: Objects & Scenes (e.g. Tiger, Grass, Water, Sand)

Models for Parsing Images
- A unified framework incorporating all levels of abstraction
- Low-level Image Processing: Pixels
- Mid-level Perceptual Organization: Contours & Regions
- High-level Recognition: Objects & Scenes

Probabilistic Models for Images
- Markov Random Fields [Geman & Geman 84]: Pixels and Labels
- Used for image restoration, edge detection, texture synthesis, segmentation, super-resolution, contour completion, ...
- Very limited representational power
- Empirical evidence against pixel-based MRF [Ren & Malik 02]

Where is Structure? Our perception of structure is disrupted. We cannot efficiently reason about structure if we cannot represent it.

Outline
- Parsing Images
- Building a Mid-level Representation
- Probabilistic Models for Mid-level Vision
  - Contour Completion
  - Figure/Ground Organization
- Combining Mid- and High-level Vision
  - Object Segmentation
  - Finding People
- Conclusion & Future Work

Local Edge Detection
- Use the Pb (probability of boundary) edge detector, which combines local brightness, texture, and color contrasts.

Piece-wise Linear Approximation
- Recursively split the boundaries (using angles) until each piece is approximately straight.
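
The recursive splitting step can be sketched as follows. This is a minimal illustration, not the method from the talk: the splitting criterion (the bend angle at an interior point relative to the piece's endpoints), the 20-degree threshold, and the function names are all assumptions made for the example. It expects a contour with at least two points.

```python
import math

def turning_angle(a, p, b):
    """Angle in radians between directions a->p and p->b (0 means straight)."""
    v1 = (p[0] - a[0], p[1] - a[1])
    v2 = (b[0] - p[0], b[1] - p[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        return 0.0
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.acos(max(-1.0, min(1.0, cos_t)))

def split_contour(points, max_angle=math.radians(20)):
    """Return breakpoint indices so that each piece is approximately straight.

    Assumed criterion: split a piece at the interior point where the path
    start -> point -> end bends the most, if that bend exceeds max_angle.
    """
    def recurse(lo, hi):
        if hi - lo < 2:
            return []  # no interior points left to split at
        k = max(range(lo + 1, hi),
                key=lambda i: turning_angle(points[lo], points[i], points[hi]))
        if turning_angle(points[lo], points[k], points[hi]) <= max_angle:
            return []  # piece is straight enough
        return recurse(lo, k) + [k] + recurse(k, hi)

    last = len(points) - 1
    return [0] + recurse(0, last) + [last]

# An L-shaped contour is split once, at its corner (index 2).
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
breakpoints = split_contour(corner)
```

The resulting breakpoints define the straight pieces that would be handed to the triangulation as constrained edges.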

Constrained Delaunay Triangulation (CDT)
- A variant of the standard Delaunay Triangulation
- Keeps a given set of edges in the triangulation
- Widely used in geometric modeling and finite elements

Scale Invariance of CDT

The CDT Graph: Summary
- Millions of pixels → 1000 edges
- Fast to compute
- Scale-invariant
- Completes gaps
- Little loss of structure
- Longer ranges of interaction
- Pixels → Superpixels. Principle of Uniform Connectedness: use homogeneous regions as entry-level units in perceptual organization. [Palmer and Rock 94]
- [Ren & Malik; ICCV 2003] [Ren, Fowlkes & Malik; ICCV 2005]

Analogy with Natural Language Parsing
- Language: Letters → Words → Phrases → Sentences & Paragraphs
- Images: Pixels → Superpixels → Contours & Regions → Objects & Scenes

Mid-level Vision
- It is not low-level vision (which can be computed independently in a local neighborhood).
- It is not high-level vision (which assumes knowledge of particular object categories and scenes).
- Problems in mid-level vision: curvilinear grouping, figure/ground organization, region segmentation

Mid-level Vision
- Problems in mid-level vision: curvilinear grouping, figure/ground organization, region segmentation

Curvilinear Grouping
- Boundaries are smooth in nature!
- A number of associated visual phenomena: good continuation, visual completion, illusory contours

Beyond Local Edge Detection
- There is psychophysical evidence that we are approaching the limit of local edge detection.
- Smoothness of boundaries in natural images provides an important contextual cue.

Inference on the CDT Graph
- Each edge $e$ carries a binary variable $X_e \in \{0, 1\}$ (1: boundary, 0: non-boundary).
- Random field: defines a joint probability distribution on all $\{X_e\}$.
- Estimate the marginal $P(X_e)$.

Conditional Random Fields (CRF)
- Undirected graphical model with potential functions in the exponential family, where $X = \{X_1, X_2, \dots, X_m\}$
- Edge potentials: $\exp(\alpha_i \phi_i)$
- Junction potentials: $\exp(\beta_j \psi_j)$
- [Pietra, Pietra & Lafferty 97] [Lafferty, McCallum & Pereira 01]

Edge Potential: Local Contrast
- Edge potentials: $\exp(\alpha_i \phi_i)$
- $\phi$ = average contrast on each edge $e$

Junction Potential: Degree
- The degree of the junction depends on the assignments of $\{X_e\}$:
  - deg=0 (no lines): 0 0 0
  - deg=1 (line ending): 1 0 0
  - deg=2 (continuation): 1 0 1
  - deg=3 (T-junction): 1 1 1
- Features: $\psi_j = \mathbf{1}(\deg = j)$
- Junction potentials: $\exp(\beta_j \psi_j)$
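
A small sketch of the degree features described above, assuming a junction is given as the list of 0/1 labels on its incident edges. The function names and the weight values in the usage line are hypothetical and chosen only for illustration.

```python
def junction_degree(edge_labels):
    """Degree of a junction: number of incident edges labeled 1 (boundary)."""
    return sum(edge_labels)

def degree_features(edge_labels, max_degree=3):
    """Indicator features 1(deg = j) for j = 0 .. max_degree."""
    deg = junction_degree(edge_labels)
    return [1 if deg == j else 0 for j in range(max_degree + 1)]

def junction_log_potential(edge_labels, weights):
    """Log of the junction potential: weighted sum of the indicator features.

    weights[j] is the (hypothetical) weight attached to degree j.
    """
    feats = degree_features(edge_labels, max_degree=len(weights) - 1)
    return sum(w * f for w, f in zip(weights, feats))

# The four cases on the slide, with made-up weights for degrees 0..3.
example_weights = [0.0, -1.0, 0.5, -0.5]
continuation = junction_log_potential([1, 0, 1], example_weights)
```

Because exactly one indicator fires per junction, the log-potential simply selects the weight for the junction's current degree.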

Junction Potential: Continuity
- Applies to deg=2 (continuation) junctions, e.g. 1 0 1
- $\psi = g(\theta) \cdot \mathbf{1}(\deg = 2)$

Learning the Parameters
- Learned values: 2.46, 0.87, 1.14, 0.01
- Mid-level representation + probabilistic framework + large annotated datasets
- Compare to [Geman and Geman 84]

Evaluation: Precision vs Recall
- Detections are matched to groundtruth.
- Precision = matched pairs / total detections
- Recall = matched pairs / total groundtruth
- High threshold: few detections. Low threshold: lots of detections.
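
The two ratios can be traced along a threshold sweep as in the sketch below. It makes a simplifying assumption that is not from the talk: each detection already carries a flag saying whether it was paired one-to-one with a groundtruth element, so the matching step itself is not shown (a real benchmark would typically redo the matching at every threshold). The function name and the toy data are illustrative.

```python
def precision_recall_curve(detections, total_groundtruth, thresholds):
    """Compute (threshold, precision, recall) triples.

    detections: list of (score, is_matched) pairs; is_matched is assumed to
    come from a one-to-one matching against groundtruth done elsewhere.
    """
    curve = []
    for t in thresholds:
        kept = [matched for score, matched in detections if score >= t]
        num_matched = sum(kept)
        # With no detections kept, precision is reported as 1.0 by convention.
        precision = num_matched / len(kept) if kept else 1.0
        recall = num_matched / total_groundtruth
        curve.append((t, precision, recall))
    return curve

toy_detections = [(0.9, True), (0.8, True), (0.4, False), (0.3, True)]
curve = precision_recall_curve(toy_detections, total_groundtruth=4,
                               thresholds=[0.5, 0.2])
```

On the toy data, the high threshold keeps two detections (both matched), while the low threshold keeps all four and trades precision for recall.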

- Curvilinear grouping improves boundary detection, both for low-recall and high-recall.
- Horse dataset of [Borenstein and Ullman 02]: 175 images for training, 175 for testing
- “Mid-level vision is useful” [Ren, Fowlkes & Malik; ICCV 2005]

[Figure panels: Image, Pb, CRF]

Mid-level Vision
- Problems in mid-level vision: curvilinear grouping, figure/ground organization, region segmentation

Figure/Ground Organization
- A contour belongs to one of the two (but not both) abutting regions.
- Important for the perception of shape
- [Figure: two readings of the same contours. Figure (face) with ground (shapeless), or figure (goblet) with ground (shapeless).]

Inference on the CDT Graph
- Each edge $e$ carries a variable $X_e \in \{-1, 1\}$ (1: left is figure, -1: right is figure).
- Local model: convexity, parallelism, ...
- Global model: consistency at T-junctions

Results (using human segmentations) [Ren, Fowlkes & Malik; ECCV 2006]
Method                                          Accuracy
Chance                                          50.0%
Baseline Size/Convexity                         55.6%
Local Shapemes                                  64.8%
Averaging shapemes on segmentation boundaries   72.0%
Shapemes + CRF                                  78.3%
Dataset Consistency                             88.0%

Models for Contour Labeling
- A CRF over contour labels $\{X_e\}$ covers both curvilinear grouping and figure/ground assignment.
- [Diagram: Pixels → Superpixels → Contours & Regions → Objects & Scenes, with example labels Tiger, Grass, Water, Sand]

Line Labeling
- Labels: > contour direction, + convex edge, - concave edge
- Possible junctions act as constraints: a constraint satisfaction problem (CSP) [Clowes 1971, Huffman 1971; Waltz 1972]
- Reviving the old tradition with modern technologies, for more realistic applications

Parsing Images
- Add region-based variables and cues
- Joint contour and region inference
- Add high-level knowledge (objects)
- [Diagram: Pixels → Superpixels → Contours & Regions → Objects & Scenes, with example labels Tiger, Grass, Water, Sand]

Object Segmentation
- Object-specific cues: shape, region support, color/texture, ...

Inference on the CDT Graph
- Contour variables $\{X_e\}$
- Region variables $\{Y_t\}$
- Object variable $\{Z\}$, encoding location, scale, pose, etc.
- Integrating $\{X_e\}$, $\{Y_t\}$ and $\{Z\}$: low/mid/high-level cues

Grouping Cues
- Low-level cues
  - $L_1(X_e \mid I)$: edge energy along edge $e$
  - $L_2(Y_s, Y_t \mid I)$: brightness/texture similarity between two regions $s$ and $t$
- Mid-level cues
  - $M_1(X_V \mid I)$: edge collinearity and junction frequency at vertex $V$
  - $M_2(X_e, Y_s, Y_t)$: consistency between edge $e$ and two adjoining regions $s$ and $t$
- High-level cues
  - $H_1(Y_t \mid I)$: texture similarity of region $t$ to exemplars
  - $H_2(Y_t, Z \mid I)$: compatibility of region support with pose
  - $H_3(X_e, Z \mid I)$: compatibility of local edge shape with pose

Cue Integration in CRF Estimate the marginal posteriors of X, Y and Z

- Object knowledge helps a lot.
- Mid-level cues are still useful.
- [Ren, Fowlkes & Malik; NIPS 2005]

[Figure panels: Input, Input Pb, Output Contour, Output Figure]

Finding People
- The challenges: pose articulation + self-occlusion, clothing, lighting, clutter, ...

Finding People: Top-Down
- Top-down approaches go directly from Pixels to Objects & Scenes:
  - 3D model-based: fails most of the time
  - 2D template-based: needs lots of training data
- [Diagram: Pixels → Superpixels → Contours & Regions → Objects & Scenes]

Finding People: Bottom-Up
[Diagram, built up in steps: Pixels → Superpixels → Contours & Regions → Objects & Scenes]

[Ren, Berg & Malik; ICCV 2005]

Tracking People as Blobs
- Blob tracking != Rectangle tracking
- [Diagram: frames ..., k-1, k, k+1, ... Labels: Figure/Ground Segmentation, Object and Background Appearance Model, Temporal Coherence]

Preliminary Results Tracking = Repeated Segmentation (video)

Conclusion
- Constrained Delaunay Triangulation (CDT)
- Conditional Random Fields (CRF)
- Quantitative evaluations
- Integration of mid-level with high-level vision

Future Work
- A richer and more consistent mid-level representation
- Higher-order potential functions
- Using mid-level representation for general object recognition
- A high-fidelity tracking system
- Finding people in static images

Thank You

Acknowledgements
- Joint work with Charless Fowlkes, Alex Berg, and Jitendra Malik.
References
- X. Ren, C. Fowlkes and J. Malik. Figure/Ground Assignment in Natural Images. In ECCV 2006.
- X. Ren, C. Fowlkes and J. Malik. Cue Integration in Figure/Ground Labeling. In NIPS 2005.
- X. Ren, A. Berg and J. Malik. Recovering Human Body Configurations using Pairwise Constraints between Parts. In ICCV 2005.
- X. Ren, C. Fowlkes and J. Malik. Scale-Invariant Contour Completion using Conditional Random Fields. In ICCV 2005.
- X. Ren and J. Malik. Learning a Classification Model for Segmentation. In ICCV 2003.
- X. Ren and J. Malik. A Multi-Scale Probability Model for Contour Completion based on Image Statistics. In ECCV 2002.

Finding People from Bottom-Up
- Detecting parts: Superpixels
- Assembling parts: Integer Quadratic Programming (IQP)

Finding People in Video
- Additional information: motion, appearance, temporal consistency, ...
- How much can we do without an object model (blob tracking)?

I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have "327"? No. I have sky, house, and trees. ---- Max Wertheimer, 1923

Learning the Parameters
- Maximum-likelihood estimation in CRF, given the groundtruth labeling on the CDT graph
- Gradient descent works well
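
For reference, the gradient used in maximum-likelihood training of an exponential-family model has a standard form. The notation here is assumed for illustration and is not taken from the slides: the CRF is written as $P(X \mid I, \Theta) = \frac{1}{Z(I, \Theta)} \exp\big(\sum_k \theta_k f_k(X, I)\big)$, and $X^*$ denotes the groundtruth labeling.

```latex
\frac{\partial}{\partial \theta_k} \log P(X^* \mid I, \Theta)
  = f_k(X^*, I) \;-\; \mathbb{E}_{P(X \mid I, \Theta)}\big[ f_k(X, I) \big]
```

Each component is the feature value on the groundtruth minus its expectation under the current model. The expectation depends on marginals of the model, which on a loopy graph can only be approximated, for example by belief propagation.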

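Global Consistency
- [Figure: junctions with figure (F) and ground (G) labels on each side of the incident contours, showing common and uncommon configurations]
- Use junction potentials to encode junction type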

[Figure panels: Image, Groundtruth, Local, Global]

Results (without human segmentations)
Method                                          Accuracy
Chance                                          50.0%
Baseline Size/Convexity                         N/A
Local Shapemes                                  64.9%
Averaging shapemes on segmentation boundaries   66.5%
Shapemes + CRF                                  68.9%
Dataset Consistency                             88.0%

[Figure panels: Image, Pb, Local, Global]

Detecting Parts: CDT
- Candidate parts as parallel line segments (Ebenbreite)
- Automatic scale selection from bottom-up
- Feature combination with a logistic classifier

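Assembling Parts: IQP
- Candidates $\{C_i\}$, parts $\{L_j\}$
- An assignment $\sigma$ maps parts to candidates: $(L_{j_1}, C_{i_1} = \sigma(L_{j_1}))$, $(L_{j_2}, C_{i_2} = \sigma(L_{j_2}))$
- A cost is defined for each partial assignment $\{(L_{j_1}, C_{i_1}), (L_{j_2}, C_{i_2})\}$.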

Testing the Markov Assumption
- The Markov model for contours [Mumford 1994, Williams & Jacobs 1995]:
  - Curvature = white noise (independent)
  - Tangent direction $t$ = random walk: $P(t(s+1) \mid t(s), \dots) = P(t(s+1) \mid t(s))$
- Dynamic programming

Testing the Markov Assumption
- Segment the contours at high-curvature positions.
- If the Markov assumption holds:
  - At each step, a high-curvature event happens with probability $p$.
  - High-curvature events are independent from step to step.
  - Therefore, if $L$ is the length of the contour segment between high-curvature points, $P(L = k) = p(1-p)^k$.
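
A quick numerical illustration of what the formula above predicts: under the Markov assumption the ratio of successive probabilities is constant (an exponential law), whereas a power law has ratios that drift toward 1. The value p = 0.1 and the power-law exponent 2 are arbitrary choices for this sketch.

```python
def markov_length_pmf(k, p):
    """P(L = k) = p (1 - p)^k: k steps with no high-curvature event, then one."""
    return p * (1.0 - p) ** k

def successive_ratios(values):
    """Ratios values[i+1] / values[i]."""
    return [b / a for a, b in zip(values, values[1:])]

p = 0.1  # arbitrary per-step probability of a high-curvature event
markov = [markov_length_pmf(k, p) for k in range(50)]
markov_ratios = successive_ratios(markov)      # every ratio equals 1 - p

# Unnormalized power law k^(-2) for comparison (exponent chosen arbitrarily).
power = [k ** -2.0 for k in range(1, 51)]
power_ratios = successive_ratios(power)        # ratios increase toward 1
```

On a log-probability versus length plot the first sequence is a straight line, while the power law only becomes straight when the length axis is also logarithmic.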

Berkeley Segmentation Dataset [Martin, Fowlkes, Tal and Malik, ICCV 2001] 1,000 images, >14,000 segmentations

Exponential vs Power Law
[Plot: probability vs contour segment length L. The Markov assumption predicts an exponential law; scale invariance predicts a power law.]

Scale Invariance
- Arbitrary viewing distance
- Hierarchy of parts: Finger, Leg, Torso

A Scale-Invariant Representation
- [Diagram: the example image (Tiger, Grass, Water, Sand) in scale space. Re-scale?]
- A scale-invariant representation for contours

Gap-Filling Property of CDT
- A typical scenario of contour completion [figure labels: low contrast, high contrast]
- CDT picks the “right” edge, completing the gap

No Loss of Structure
- Use $P_{human}$, the soft groundtruth label defined on CDT graphs: precision close to 100%
- Pb averaged over CDT edges: no worse than the original Pb
- Increase in asymptotic recall rate: completion of gradientless contours

Uniform Connectedness
- Connected regions of homogeneous properties (brightness, color, texture) are perceived as entry-level units. [Palmer & Rock, 1994]
- “Classical principles of grouping operate after UC, creating superordinate units consisting of two or more entry-level units.”
- “… UC (uniform connectedness) cannot be reduced to grouping principles, because it is not a form of grouping at all…”

Local Model
- “Bi-gram” model: contrast + continuity; binary classification, (0,0) vs (1,1); logistic classifier
- “Tri-gram” model [diagram labels: 1, 2, L; Pb_L = Xe]

Building a CRF Model
- What are the features?
  - Edge features: low-level “edgeness” (Pb)
  - Junction features: junction type, continuity
- How to make inference?
  - Loopy Belief Propagation
- How to learn the parameters?
  - Gradient descent on maximum likelihood
- $X = \{X_1, X_2, \dots, X_m\}$; estimate the marginal of each $X_i$

Junction and Continuity
- Junction types $(\deg_g, \deg_c)$, with pictured examples: $(1, 0)$, $(0, 2)$, $(1, 2)$, $(0, 0)$
- Continuity term for degree-2 junctions: $\deg_g + \deg_c = 2$

Interpreting the Parameters
- Learned values: 2.46, 0.87, 1.14, 0.01, -0.59, -0.98
- Line endings and junctions are rare
- Completed edges are weak

- Continuity improves boundary detection in both low-recall and high-recall ranges.
- Global inference helps, mostly in the low-recall/high-precision range.
- Roughly speaking, CRF > Local > CDT only > Pb

[Figure panels: Image, Pb, Local, Global]

Figure/Ground Principles
- Convexity
- Parallelism
- Surroundedness
- Symmetry
- Common Fate
- Familiar Configuration
- ...
[Figure: example contours with figure (F) and ground (G) sides marked]

Figure/Ground Dataset

Figure/Ground Assignment in Natural Images
- Local model: use shapemes (prototypical local shapes) to capture contextual information
- Global model: use CRF to enforce consistency at junctions

Shapemes: Prototypical Local Shapes
- Collect local shapes, then cluster them
- [Figure: average shape in each shapeme cluster]

Shapemes for F/G Discrimination
- Which side is Figure, left (L) or right (R)?
- [Figure: example shapemes, each annotated with a percentage for L: 93.84%, 49.80%, 89.59%, 11.69%, 66.52%, 4.98%]
- Train a logistic classifier to linearly combine the shapeme cues
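
The last step, a logistic classifier that linearly combines cues, can be sketched in plain Python. Everything below is illustrative: the two-dimensional toy cue vectors, the learning rate, and the number of epochs are made up, and real shapeme cues would come from the clustering described in the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that the left side is figure, given cue vector x."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean negative log-likelihood."""
    n = len(features[0])
    m = len(features)
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict(w, b, x) - y      # derivative w.r.t. the logit
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

# Toy data: label 1 means "left is figure" (values are invented).
toy_cues = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9],
            [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
toy_labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(toy_cues, toy_labels)
```

The learned weights indicate how strongly each cue votes for "left is figure"; the sigmoid turns the weighted sum into a probability.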

CRF for Figure/Ground
- $F = \{F_1, F_2, \dots, F_m\}$, $F_i \in \{\text{Left}, \text{Right}\}$
- Put potential functions at junctions
- One feature for each junction type, e.g. { (F,G),(G,F),(F,G) }, { (G,F),(F,G) }, { (F,G),(F,G),(F,G) }

Results

CDT vs K-Neighbor
- An alternative scheme for completion: connect to k-nearest neighbor vertices, subject to visibility
- CDT achieves higher asymptotic recall rates

Inference w/ Belief Propagation
- Loopy Belief Propagation
  - Just like belief propagation
  - Iterates message passing until convergence
  - Lacks theoretical foundations and is known to have convergence issues
  - However, becoming popular in practice
  - Typically applied on a pixel grid
- Works well on CDT graphs
  - Converges fast (<10 iterations)
  - Produces empirically sound results
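
A compact sketch of sum-product loopy belief propagation on a pairwise binary model, to make "iterates message passing until convergence" concrete. This is a generic textbook formulation, not the talk's implementation: the CRF in the talk has junction potentials over more than two edges, while this example assumes pairwise potentials only. The data layout and function names are assumptions.

```python
from itertools import product

def loopy_bp(num_nodes, unary, pairwise, max_iters=50, tol=1e-9):
    """Approximate marginals [P(x_i=0), P(x_i=1)] by synchronous message passing.

    unary[i][x]            : potential of node i taking value x
    pairwise[(i, j)][x][y] : potential of edge (i, j) with x_i = x, x_j = y
    """
    neighbors = {i: [] for i in range(num_nodes)}
    for (i, j) in pairwise:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def edge_pot(i, j, xi, xj):
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    # messages[(i, j)][xj] is the message from node i to node j.
    messages = {}
    for (i, j) in pairwise:
        messages[(i, j)] = [0.5, 0.5]
        messages[(j, i)] = [0.5, 0.5]

    for _ in range(max_iters):
        new_messages = {}
        for (i, j) in messages:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xi in (0, 1):
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][xi]
                    total += unary[i][xi] * edge_pot(i, j, xi, xj) * incoming
                out.append(total)
            norm = out[0] + out[1]
            new_messages[(i, j)] = [out[0] / norm, out[1] / norm]
        change = max(abs(new_messages[e][x] - messages[e][x])
                     for e in messages for x in (0, 1))
        messages = new_messages
        if change < tol:
            break

    beliefs = []
    for i in range(num_nodes):
        b = [unary[i][0], unary[i][1]]
        for k in neighbors[i]:
            b = [b[x] * messages[(k, i)][x] for x in (0, 1)]
        norm = b[0] + b[1]
        beliefs.append([b[0] / norm, b[1] / norm])
    return beliefs

def exact_marginals(num_nodes, unary, pairwise):
    """Brute-force marginals for tiny graphs, used only as a sanity check."""
    totals = [[0.0, 0.0] for _ in range(num_nodes)]
    for assignment in product((0, 1), repeat=num_nodes):
        weight = 1.0
        for i, x in enumerate(assignment):
            weight *= unary[i][x]
        for (i, j), table in pairwise.items():
            weight *= table[assignment[i]][assignment[j]]
        for i, x in enumerate(assignment):
            totals[i][x] += weight
    return [[t[0] / (t[0] + t[1]), t[1] / (t[0] + t[1])] for t in totals]

# A 3-node chain (a tree), where belief propagation is exact.
chain_unary = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
smooth = [[2.0, 1.0], [1.0, 2.0]]
chain_pairwise = {(0, 1): smooth, (1, 2): smooth}
bp = loopy_bp(3, chain_unary, chain_pairwise)
exact = exact_marginals(3, chain_unary, chain_pairwise)
```

On a tree such as the chain above the beliefs match the exact marginals; on graphs with cycles the same updates are run, but the result is only an approximation and convergence is not guaranteed.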

Shape Context
- Count the number of edge points inside each bin of a “log-polar” histogram (e.g. count=4, count=6)
- [Belongie, Malik & Puzicha, ICCV 2001] [Berg & Malik, CVPR 2001]
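
The bin counting can be sketched as below. The bin layout (3 radial by 8 angular bins, radii from 1 to 64 pixels) and the function name are assumptions for the example; the cited papers define their own settings.

```python
import math

def shape_context(center, edge_points, num_radial=3, num_angular=8,
                  r_min=1.0, r_max=64.0):
    """Log-polar histogram of edge points around `center`.

    Radial bins are uniform in log(r) between r_min and r_max; angular bins
    are uniform in angle. Points outside [r_min, r_max) are ignored.
    """
    hist = [[0] * num_angular for _ in range(num_radial)]
    log_span = math.log(r_max / r_min)
    for (x, y) in edge_points:
        dx, dy = x - center[0], y - center[1]
        r = math.hypot(dx, dy)
        if r < r_min or r >= r_max:
            continue
        r_bin = int(num_radial * math.log(r / r_min) / log_span)
        r_bin = min(r_bin, num_radial - 1)   # guard against rounding at the rim
        theta = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = int(num_angular * theta / (2 * math.pi)) % num_angular
        hist[r_bin][a_bin] += 1
    return hist

# Three points fall in different bins; the fourth is too far away to count.
toy_points = [(2, 0), (0, 10), (-40, -1), (100, 0)]
hist = shape_context((0, 0), toy_points)
```

The log-radial spacing makes the descriptor more sensitive to nearby edge points than to distant ones.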

Compare to DDMCMC
- We try to solve the same problem: a unified framework for image parsing
- Mid-level representation: CDT vs “atomic regions”
- Probabilistic model: discriminative vs generative
- Inference mechanism: belief propagation vs MCMC
- Quantitative evaluation
- We try to develop models step by step
