
Slide 1: Scale-Invariant Random Fields for Mid-level Vision. Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, Computer Vision Group, University of California at Berkeley.

Slide 2: Overview
Scale Invariance in Natural Images
A Scale-Invariant Representation from the bottom up using Constrained Delaunay Triangulation (CDT)
Scale-Invariant Models for Contour Completion
–A baseline local model
–A global model using Conditional Random Fields (CRF)
–Results and Quantitative Evaluations
Extensions of the CDT/CRF Framework
–Image segmentation
–Figure/ground organization
–Finding people in a single image

Slide 3: Overview (agenda slide repeated).

Slide 4: Scale Invariance. An intuitive understanding; a mathematical characterization: two image regions X1 and X2 observed at different scales follow the same family of distributions, F_X1(·) = F(·; s1) and F_X2(·) = F(·; s2), differing only in the scale parameter.

Slide 5: Sources of Scale Invariance. Arbitrary viewing distance; hierarchy of parts (finger, leg, torso).

Slide 6: Scale Invariance in Natural Images
Scale Space Theory and Gaussian Pyramids
–Witkin, Lindeberg, Koenderink, …
Empirical Studies: Power Laws in Natural Images
–Power spectra (Ruderman, 1994)
–Wavelet coefficients (e.g. Simoncelli)
–Probabilistic models (e.g. Mumford)
–Power laws in the statistics of boundary contours [Ren and Malik 2002]

Slide 7: How to Handle Scale Invariance?
Top-down search
Bottom-up scale selection
–Scale space (e.g. zero crossings of the Laplacian)
–Lowe's SIFT operator
–Our boundary-oriented approach based on piecewise approximation of edges and constrained Delaunay triangulation (CDT)
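The bottom-up scale-selection idea (extrema of the scale-normalized Laplacian, as in scale-space theory and SIFT) can be sketched numerically. This toy example, not from the talk, applies a scale-normalized second-derivative-of-Gaussian filter to a synthetic 1-D Gaussian blob; for such a blob the analytic response maximum is at sigma = sqrt(2)·sigma0.

```python
import numpy as np

def log_kernel(sigma):
    # Scale-normalized second derivative of a Gaussian (1-D analogue of the LoG)
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    d2g = (x**2 / sigma**4 - 1.0 / sigma**2) * g
    return sigma**2 * d2g                  # gamma-normalization with gamma = 1

# A 1-D Gaussian "blob" of known width sigma0
sigma0 = 6.0
x = np.arange(-100, 101, dtype=float)
signal = np.exp(-x**2 / (2 * sigma0**2))

# Scale selection: the scale with the strongest normalized response at the center
scales = np.arange(2.0, 20.0, 0.1)
center = len(x) // 2
responses = [abs(np.convolve(signal, log_kernel(s), mode="same")[center])
             for s in scales]
best = scales[int(np.argmax(responses))]
print(best)  # close to sqrt(2) * sigma0 ~ 8.49
```

The selected scale tracks the blob width automatically, which is the property the slide's bottom-up approaches rely on.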

Slide 8: Overview (agenda slide repeated).

Slide 9: Low-level Contour Detection [Martin, Fowlkes & Malik 02]
Use Pb (probability of boundary) as input
–Combines local brightness, texture and color cues
–Trained from human-marked segmentation boundaries
–Outperforms existing local boundary detectors, including Canny and Canny's hysteresis thresholding

Slide 10: Piecewise Linear Approximation. Recursively split the boundaries until each piece is approximately straight.
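The recursive splitting above can be sketched as a Douglas-Peucker-style procedure. The slide does not state the exact straightness criterion, so maximum point-to-chord distance is an assumed stand-in:

```python
import numpy as np

def split_polyline(points, tol):
    """Recursively split a polyline until each piece is approximately straight.

    points: (n, 2) sequence; tol: max allowed point-to-chord distance.
    Returns indices of the kept breakpoints.
    """
    points = np.asarray(points, dtype=float)

    def rec(i, j):
        if j <= i + 1:
            return [i, j]
        p, q = points[i], points[j]
        chord = q - p
        n = np.linalg.norm(chord)
        seg = points[i + 1:j] - p
        # perpendicular distance of each interior point to the chord p-q
        d = np.abs(chord[0] * seg[:, 1] - chord[1] * seg[:, 0]) / max(n, 1e-12)
        if d.max() <= tol:
            return [i, j]                    # piece is approximately straight
        k = i + 1 + int(np.argmax(d))        # split at the farthest point
        left, right = rec(i, k), rec(k, j)
        return left[:-1] + right             # merge, dropping duplicated index

    return rec(0, len(points) - 1)

# An L-shaped contour: the split lands on the corner
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
print(split_polyline(pts, tol=0.1))  # [0, 3, 5]
```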

Slide 11: Delaunay Triangulation
Standard in computational geometry
Dual of the Voronoi diagram
The unique triangulation that maximizes the minimum angle, avoiding long skinny triangles
Efficient and simple randomized algorithms exist
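The empty-circumcircle definition behind these properties can be checked directly on a tiny point set. This brute-force enumeration (only practical for a handful of points) accepts a triple as a Delaunay triangle iff no other point lies strictly inside its circumcircle:

```python
import itertools, math

def circumcircle(a, b, c):
    # Circumcenter and radius of triangle abc (None for collinear points)
    ax, ay = a; bx, by = b; cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)

def delaunay_triangles(pts):
    # Empty-circumcircle test, exhaustively over all triples
    tris = []
    for i, j, k in itertools.combinations(range(len(pts)), 3):
        cc = circumcircle(pts[i], pts[j], pts[k])
        if cc is None:
            continue
        center, r = cc
        if all(math.dist(center, pts[m]) >= r - 1e-9
               for m in range(len(pts)) if m not in (i, j, k)):
            tris.append((i, j, k))
    return tris

pts = [(0, 0), (2, 0), (2, 2), (0, 3)]
print(delaunay_triangles(pts))  # [(0, 1, 2), (0, 2, 3)]
```

Of the two diagonals of this quadrilateral, the Delaunay one is exactly the one giving the larger minimum angle.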

Slide 12: Constrained Delaunay Triangulation. A variant of the standard Delaunay triangulation that keeps a given set of edges in the triangulation.

Slide 13: The "Gap-Filling" Property of CDT. In a typical contour-completion scenario, with a low-contrast gap between high-contrast segments, CDT picks the "right" edge, completing the gap.

Slide 14: (figure only)

Slide 15: The CDT Graph. Scale-invariant; fast to compute; fewer than 1,000 edges; completes gaps; little loss of structure.

Slide 16: Berkeley Segmentation Dataset [Martin et al, ICCV 2001]: 1,000 images, more than 14,000 segmentations.

Slide 17: Evaluating Boundary Operators
Precision-recall curves
–Threshold the output boundary map
–Bipartite matching with the ground truth
With m pixels on human-marked boundaries, n detected pixels above a given threshold, and k matched pairs:
–Precision = k/n, the percentage of detections that are true positives
–Recall = k/m, the percentage of ground truth that is detected
CDT edges are projected back to the pixel grid for evaluation.
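The precision/recall bookkeeping above can be sketched as follows. The benchmark uses optimal bipartite matching; this sketch substitutes greedy nearest-first matching within a distance tolerance to keep the example short:

```python
import math

def precision_recall(detected, truth, tol=1.0):
    # Greedy one-to-one matching of detected boundary pixels to ground truth
    # (a simplification of the benchmark's optimal bipartite matching).
    pairs = sorted(
        ((math.dist(d, t), i, j)
         for i, d in enumerate(detected)
         for j, t in enumerate(truth) if math.dist(d, t) <= tol),
        key=lambda x: x[0])
    used_d, used_t, k = set(), set(), 0
    for _, i, j in pairs:
        if i not in used_d and j not in used_t:
            used_d.add(i); used_t.add(j); k += 1
    precision = k / len(detected) if detected else 1.0   # k / n
    recall = k / len(truth) if truth else 1.0            # k / m
    return precision, recall

truth = [(0, 0), (0, 1), (0, 2), (0, 3)]   # m = 4 ground-truth pixels
detected = [(0, 0), (0, 1), (5, 5)]        # n = 3 detections, one spurious
print(precision_recall(detected, truth))   # (0.6666666666666666, 0.5)
```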

Slide 18: No Loss of Structure. Using P_human, the soft ground-truth label defined on CDT graphs, precision is close to 100%. Pb averaged over CDT edges is no worse than the original Pb. The increase in asymptotic recall rate comes from the completion of gradientless contours.

Slide 19: CDT vs. K-Neighbor Completion. An alternative completion scheme connects each vertex to its k nearest neighbor vertices, subject to visibility; CDT achieves higher asymptotic recall rates.

Slide 20: The CDT Graph (summary slide repeated).

Slide 21: "Superpixels". Idea: use oversegmentation as preprocessing and work with homogeneous regions instead of pixels. Empirical validation [Ren and Malik 2003]: how much do we lose when we go from pixels to superpixels?

Slide 22: So the CDT graph is nice to work with. We can develop sophisticated models on top of the CDT graph, e.g. for mid-level vision.

Slide 23: Mid-level Vision
What is mid-level vision?
–It's not low-level vision (e.g. edge detection)
–It's not high-level vision (e.g. object recognition)
Key problems in mid-level vision: region grouping / image segmentation; contour grouping / visual completion; figure/ground organization

Slide 24: Mid-level Vision (slide repeated).

Slide 25: Boundary Detection. Edge detection: 20 years after Canny. Pb (probability of boundary): learning to combine brightness, color and texture contrasts. There is psychophysical evidence that we might have been approaching the limit of local edge detection.

Slide 26: Curvilinear Continuity
Boundaries in nature are smooth
A number of associated phenomena
–Good continuation
–Visual completion
–Illusory contours
Well studied in human vision
–Wertheimer, Kanizsa, von der Heydt, Kellman, Field, Geisler, …
Extensively explored in computer vision
–Shashua, Zucker, Mumford, Williams, Jacobs, Elder, Jermyn, Wang, …
Is the net effect of completion positive or negative? There has been a lack of quantitative evaluation.

Slide 27: Overview (agenda slide repeated).

Slide 28: Inference on the CDT Graph. Each CDT edge e carries a binary variable X_e; local inference treats each X_e independently, while global inference reasons over all of them jointly (figure).

Slide 29: Baseline Local Model. The "bi-gram" model combines contrast and continuity features and treats completion as binary classification, (0,0) vs. (1,1), with a logistic classifier; a "tri-gram" model extends this to triples of edges.
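The baseline's logistic classifier over contrast and continuity features can be sketched with synthetic data. The features, numbers and training data here are purely illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for CDT edges: feature 1 is average Pb (contrast) along
# the edge, feature 2 a continuity score with neighboring edges.
# Labels: 1 = true boundary edge, 0 = not.
n = 400
y = rng.integers(0, 2, n)
pb = np.where(y == 1, rng.normal(0.7, 0.15, n), rng.normal(0.3, 0.15, n))
cont = np.where(y == 1, rng.normal(0.6, 0.2, n), rng.normal(0.4, 0.2, n))
X = np.column_stack([np.ones(n), pb, cont])   # bias + two features

# Logistic regression fit by plain gradient ascent on the log-likelihood
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.1 * X.T @ (y - p) / n

acc = np.mean((1 / (1 + np.exp(-X @ w)) > 0.5) == y)
print(acc)  # well above chance on this nearly separable data
```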

Slide 30: Conditional Random Fields [Pietra, Pietra & Lafferty 97; Lafferty, McCallum & Pereira 01]. Variables X = {X_1, X_2, …, X_m}. For each edge i, define a set of features {g_1, g_2, …, g_h} and an exponential potential function at edge i; for each junction j, define a set of features {f_1, f_2, …, f_k} and an exponential potential function at junction j.

Slide 31: Conditional Random Fields. An undirected graphical model with potential functions in the exponential family: the edge potentials and junction potentials together define a probability distribution over X = {X_1, X_2, …, X_m} as their normalized product, and we estimate the marginal posteriors P(X_i | I).
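The normalized-product-of-potentials construction can be made concrete on a toy model: three binary edge variables meeting at one junction, with per-edge weights and a junction potential that favors "continuation". All numbers are illustrative; the marginals are computed by exhaustive summation, which is only feasible at this size:

```python
import itertools, math

theta = [1.2, 0.4, -0.3]          # per-edge weights (e.g. driven by Pb)

def junction_score(x):
    # favor continuation: exactly 0 or 2 edges turned on at the junction
    return 0.8 if sum(x) in (0, 2) else -0.8

def score(x):
    return sum(t * xi for t, xi in zip(theta, x)) + junction_score(x)

states = list(itertools.product([0, 1], repeat=3))
Z = sum(math.exp(score(x)) for x in states)        # partition function

# Marginal posterior of each edge, P(X_i = 1 | I)
marginals = [sum(math.exp(score(x)) for x in states if x[i] == 1) / Z
             for i in range(3)]
print([round(m, 3) for m in marginals])
```

The edge with the largest weight ends up with the largest marginal, since the junction term is symmetric across edges.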

Slide 32: Building a CRF Model
What are the features?
–Edge features: Pb, G
–Junction features: junction type, continuity
How to make inference? Loopy belief propagation.
How to learn the parameters? Gradient descent on the maximum-likelihood objective.
Estimate P(X_i | I) over X = {X_1, X_2, …, X_m}.

Slide 33: Junctions and Continuity. Junction types are indexed by (deg_g, deg_c): examples include (1,0), (0,2), (1,2) and (0,0). A continuity term applies at degree-2 junctions, those with deg_g + deg_c = 2.

Slide 34: Inference w/ Belief Propagation
Loopy belief propagation
–Just like belief propagation: iterates message passing until convergence
–Lacks theoretical foundations and is known to have convergence issues, but is becoming popular in practice
–Typically applied on the pixel grid
Works well on CDT graphs
–Converges fast (fewer than 10 iterations)
–Produces empirically sound results
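The sum-product updates behind loopy BP can be illustrated on a 3-node chain of binary variables, where the graph is a tree and BP is exact; that gives a simple correctness check against brute force. On loopy CDT graphs the same updates are just iterated to convergence. Potentials here are arbitrary toy numbers:

```python
import itertools
import numpy as np

# Chain X1 - X2 - X3 of binary variables
unary = [np.array([1.0, 2.0]), np.array([1.5, 0.5]), np.array([1.0, 1.0])]
pair = np.array([[2.0, 0.5], [0.5, 2.0]])   # shared smoothness potential

# Messages: m[i->2][x2] = sum_{x_i} unary[i][x_i] * pair[x_i, x2]
m12 = unary[0] @ pair                        # from X1 to X2
m32 = unary[2] @ pair                        # from X3 to X2
belief2 = unary[1] * m12 * m32               # belief at X2
belief2 /= belief2.sum()

# Brute-force marginal of X2 for comparison
weights = {x: unary[0][x[0]] * unary[1][x[1]] * unary[2][x[2]]
              * pair[x[0], x[1]] * pair[x[1], x[2]]
           for x in itertools.product([0, 1], repeat=3)}
Z = sum(weights.values())
exact = np.array([sum(w for x, w in weights.items() if x[1] == v)
                  for v in (0, 1)]) / Z
print(np.allclose(belief2, exact))  # True
```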

Slide 35: Maximum Likelihood Learning. Maximum-likelihood estimation in the CRF, with the ground-truth labeling on the CDT graph as the target. Many optimization techniques are possible (gradient descent, iterative scaling, conjugate gradient, …); gradient descent works well.
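For exponential-family models like this CRF, the log-likelihood gradient has the classic form (empirical feature counts) minus (model expected feature counts). A minimal sketch on a toy 3-variable model, with expectations computed by enumeration and made-up "ground-truth" labelings:

```python
import itertools, math

states = list(itertools.product([0, 1], repeat=3))

def feats(x):
    # features: the three variables plus a "continuation" indicator
    return [x[0], x[1], x[2], 1.0 if sum(x) in (0, 2) else 0.0]

data = [(1, 1, 0), (1, 0, 1), (1, 0, 0), (0, 0, 0)]   # toy ground truth
emp = [sum(feats(x)[k] for x in data) / len(data) for k in range(4)]

def loglik(theta):
    Z = sum(math.exp(sum(t * f for t, f in zip(theta, feats(x)))) for x in states)
    return sum(sum(t * f for t, f in zip(theta, feats(x))) - math.log(Z)
               for x in data) / len(data)

theta = [0.0] * 4
before = loglik(theta)
for _ in range(200):
    w = {x: math.exp(sum(t * f for t, f in zip(theta, feats(x)))) for x in states}
    Z = sum(w.values())
    exp_f = [sum(w[x] * feats(x)[k] for x in states) / Z for k in range(4)]
    # gradient ascent step: empirical minus expected features
    theta = [t + 0.5 * (e, m)[0] - 0.5 * m if False else t + 0.5 * (e - m)
             for t, e, m in zip(theta, emp, exp_f)]
print(loglik(theta) > before)  # True: ascent climbs the concave objective
```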

Slide 36: Interpreting the Parameters. The learned weights (e.g. 2.46, 0.87, 1.14, 0.01, −0.59, −0.98 across junction types) show that line endings and junctions are rare and that completed edges are weak.

Slide 37: Quantitative Evaluation
Baseball player dataset [Mori et al 04]
–30 news photos of baseball players in various poses; 15 training and 15 testing
Horse dataset [Borenstein & Ullman 02]
–350 images of standing horses facing left; 175 training and 175 testing
Berkeley Segmentation Dataset [Martin et al 01]
–300 Corel images of various natural scenes and ~2,500 segmentations; 200 training and 100 testing

Slide 38: Continuity improves boundary detection in both low-recall and high-recall ranges. Global inference helps, mostly in the low-recall/high-precision range. Roughly speaking: CRF > Local > CDT only > Pb.

Slides 39–40: (result figures)

Slides 41–43: qualitative results (columns: Image, Pb, Local, Global).

Slide 44: Conclusion I
Constrained Delaunay Triangulation is a scale-invariant discretization of images with little loss of structure.
Moving from ~100,000 pixels to fewer than 1,000 edges, CDT achieves great statistical and computational efficiency.
Curvilinear continuity improves boundary detection:
–The local model of continuity is simple yet very effective
–Global inference of continuity further improves performance
–Conditional random fields with loopy belief propagation work well on CDT graphs
Mid-level vision is useful.

Slides 45–46: Overview (agenda slide repeated).

Slide 47: Summary. Conditional random fields on triangulated images, trained to integrate low-, mid- and high-level grouping cues.

Slide 48: Inference on the CDT Graph. Contour variables {X_e}, region variables {Y_t} and an object variable {Z}; integrating {X_e}, {Y_t} and {Z} combines low-, mid- and high-level cues.

Slide 49: Grouping Cues
Low-level cues
–Edge energy along edge e: L1(X_e | I)
–Brightness/texture similarity between two regions s and t: L2(Y_s, Y_t | I)
Mid-level cues
–Edge collinearity and junction frequency at vertex V: M1(X_V | I)
–Consistency between edge e and two adjoining regions s and t: M2(X_e, Y_s, Y_t)
High-level cues
–Texture similarity of region t to exemplars: H1(Y_t | I)
–Compatibility of region support with pose: H2(Y_t, Z | I)
–Compatibility of local edge shape with pose: H3(X_e, Z | I)

Slide 50: Conditional Random Fields for Cue Integration. Estimate the marginal posteriors of X, Y and Z.

Slide 51: Encoding Object Knowledge. Region-based: a support mask relates region variables Y_t to the object variable Z. Edge-based: shapemes relate edge variables X_e to Z.

Slide 52: H3(X_e, Z | I): local shape and pose. Let S(x,y) be the shapeme at image location (x,y) and (x_o, y_o) the object location in Z. Each shapeme i (e.g. a horizontal line, or vertical pairs) has a distribution ON(x,y,i) over positions relative to the object. Compute the average log-likelihood S_ON(e, Z) along edge e; S_OFF(e, Z) is defined similarly.

Slides 53–54: (figures)

Slides 55–57: Results (columns: Input, Input Pb, Output Contour, Output Figure).

Slide 58: Conclusion II
Constrained Delaunay Triangulation provides a scale-invariant discrete structure which enables efficient probabilistic inference.
Conditional random fields combine joint contour and region grouping and can be efficiently trained.
Mid-level cues are useful for figure/ground segmentation, even when object-specific cues are present.

Slide 59: Overview (agenda slide repeated).

Slide 60: Figure/Ground Organization
A classical problem in Gestalt psychology [Rubin 1921]: "perceptual organization after grouping"
Gestalt principles for figure/ground
–Surroundedness, size, convexity, parallelism, symmetry, lower-region, common fate, familiar configuration, …
Very few computational studies: [Hinton 86], [von der Heydt 93], …

Slide 61: Shapemes: Prototypical Shapes. Local figure/ground cues: size/convexity, parallelism, …

Slide 62: Reasoning about T-junctions. Some local assignments of figure (F) and ground (G) labels around a T-junction are common in natural images, while others are uncommon (figure).

Slide 63: CRF for Figure/Ground. Variables F = {F_1, F_2, …, F_m} with F_i ∈ {Left, Right}, and one feature for each junction type.

Slide 64: Results on Natural Images (error rates)
–Chance: 50%
–Baseline size/convexity: 44%
–Local model w/ shapemes: 36%
Using human segmentations:
–Averaging local cues on human-marked boundaries: 29%
–CRF w/ junction type and continuity: 22%
Using low-level edges:
–Averaging local cues: 34%
–CRF w/ junction type and continuity: 32%
Dataset inconsistency: 12%

Slides 65–66: (result figures)

Slide 67: Overview (agenda slide repeated).

Slide 68: Finding People
The challenges: pose, clothing, lighting, clutter, …
The setting: a single image, no pose restriction
Approaches: top-down holistic template matching; assembly of 2D parts from the bottom up
The assembly problem: dynamic programming on a tree model; brute-force search / pruning; sampling

Slide 69: Our Pipeline
Preprocessing with Constrained Delaunay Triangulation
Detecting candidate parts using parallelism
Learning pairwise constraints between parts
Assembling parts by Integer Quadratic Programming (IQP)
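The assembly step optimizes a quadratic cost over discrete part-to-candidate assignments: unary terms score how well each candidate fits each part, and pairwise terms score part-pair compatibility. The paper solves this as an IQP; the toy instance below (made-up parts, candidates and costs) is small enough for exhaustive search:

```python
import itertools

parts = ["torso", "left-leg"]
candidates = [0, 1, 2]                      # candidate segment indices
unary = {("torso", 0): 0.2, ("torso", 1): 1.0, ("torso", 2): 0.9,
         ("left-leg", 0): 0.8, ("left-leg", 1): 0.3, ("left-leg", 2): 0.4}
pairwise = {(0, 1): 0.1, (0, 2): 0.9, (1, 2): 0.5,
            (1, 0): 0.6, (2, 0): 0.6, (2, 1): 0.5,
            (0, 0): 5.0, (1, 1): 5.0, (2, 2): 5.0}   # forbid sharing a candidate

def cost(assign):
    # total cost = per-part unary terms + pairwise compatibility term
    c = sum(unary[(p, a)] for p, a in zip(parts, assign))
    return c + pairwise[(assign[0], assign[1])]

best = min(itertools.product(candidates, repeat=len(parts)), key=cost)
print(best)  # (0, 1): cheap torso candidate 0 with compatible leg candidate 1
```

Real instances have far too many assignments to enumerate, which is why the quadratic objective is handed to an IQP solver instead.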

Slide 70: Detecting Parts with Parallelism. Candidate parts as parallel line segments (Ebenbreite); automatic scale selection from the bottom up; feature combination with a logistic classifier.

Slide 71: Assembly as Assignment. Given candidates {C_i} and parts {L_j}, an assignment δ maps each part to a candidate, C_i = δ(L_j); the cost of a partial assignment such as {(L_j1, C_i1), (L_j2, C_i2)} combines per-part and pairwise terms.

Slides 72–74: (result figures)

Slide 75: Conclusion (again)
Scale invariance is a key issue in vision, and we develop a scale-invariant representation using Constrained Delaunay Triangulation (CDT).
The CDT graph is a superpixel representation with little loss of structure and enables efficient training and testing of probabilistic models.
On top of the CDT graph we use Conditional Random Fields (CRF) to solve mid-level vision problems including contour completion, image segmentation and figure/ground organization.
By quantitative evaluation on large datasets of natural images, we show that our CRF models significantly improve performance.

Slide 76: Thank You

Slides 77–79: (blank)

Slide 80: Contour Geometry
First-order Markov model [Mumford 94, Williams & Jacobs 95]
–Curvature: white noise (independent from position to position)
–Tangent t(s): random walk
–Markov assumption: the tangent at the next position, t(s+1), depends only on the current tangent t(s)
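The model on this slide is easy to simulate: draw i.i.d. Gaussian curvature samples and accumulate them into a tangent random walk. The noise level is arbitrary; the check below confirms the defining property, that curvature is (empirically) uncorrelated from step to step:

```python
import random

random.seed(0)

# First-order Markov contour model: curvature is white noise,
# so the tangent t(s) performs a random walk.
n = 20000
sigma = 0.1
curv = [random.gauss(0.0, sigma) for _ in range(n)]
tangent = [0.0]
for c in curv:
    tangent.append(tangent[-1] + c)   # t(s+1) = t(s) + curvature noise

# Lag-1 autocorrelation of curvature should be near zero (white noise)
mean = sum(curv) / n
var = sum((c - mean) ** 2 for c in curv) / n
lag1 = sum((curv[i] - mean) * (curv[i + 1] - mean) for i in range(n - 1)) / (n - 1)
print(abs(lag1 / var) < 0.05)  # True
```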

Slide 81: Contours are Smooth. P(t(s+1) | t(s)): the marginal distribution of tangent change (figure).

Slide 82: Testing the Markov Assumption. Segment the contours at high-curvature positions.

Slide 83: Prediction: Exponential
If the first-order Markov assumption holds:
–At every step there is a constant probability p that a high-curvature event will occur
–High-curvature events are independent from step to step
–Let L be the length of a segment between high-curvature points; then P(L ≥ k) = (1−p)^k and P(L = k) = p(1−p)^k
–L has an exponential (geometric) distribution
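The predicted distribution can be verified by simulation: with a constant per-step event probability p, segment lengths are geometric, and the empirical tail matches P(L ≥ k) = (1−p)^k. The value of p is arbitrary here:

```python
import random

random.seed(1)

p = 0.2   # per-step probability of a high-curvature event

def segment_length():
    # Count steps until the next high-curvature event
    L = 0
    while random.random() >= p:
        L += 1
    return L

lengths = [segment_length() for _ in range(100000)]
k = 5
empirical = sum(1 for L in lengths if L >= k) / len(lengths)
predicted = (1 - p) ** k
print(round(empirical, 3), round(predicted, 3))  # both near 0.328
```

The empirical power-law finding on the next slide is exactly what this prediction fails to match, which is the point of the comparison.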

Slide 84: Empirical: Power Law. The observed distribution of contour segment length L follows a power law, not an exponential (log-log plot of probability vs. length).

Slide 85: Scale Invariance of CDT (figure).

