1
Visual Grouping and Recognition
Jitendra Malik, U.C. Berkeley
2
Collaborators
Grouping: Jianbo Shi (CMU), Serge Belongie (UCSD), Thomas Leung (Fuji)
Database of human segmented images and ecological statistics: David Martin, Charless Fowlkes, Xiaofeng Ren
Recognition: Serge Belongie, Jan Puzicha
4
The visual system performs
Inference of lightness, shape and spatial relations
Perceptual Organization
Active interaction with environment
5
A brief history of vision science
1850-1900
–Trichromacy, stereopsis, eye movements, contrast, visual acuity, ...
1900-1950
–Apparent movement, grouping, figure-ground, ...
1950-2000
–Ecological optics, geometrical analysis of shape cues, physiology of V1 and extra-striate areas, ...
6
Physiological Optics 1840-1894
7
The Empiricist-Nativist debate
8
The debate... (and sometimes both were right!)
Helmholtz argued that perception is unconscious inference. Associations are learned through experience.
Hering proposed physiological mechanisms: opponent color channels, contrast mechanisms, conjunctive and disjunctive eye movements, ...
9
The Twentieth Century...
The Gestalt movement emphasized perceptual organization.
–Grouping
–Figure/ground
–Configuration effects on perception of brightness and lightness
10
Gibson’s ecological optics (1950)
Emphasized richness of information about shape and surface layout available to a moving observer
–Optical flow
–Texture gradients
–(and the classical cues such as stereopsis, etc.)
12
Visual Processing Areas
13
The visual system performs
Inference of lightness, shape and spatial relations
Perceptual Organization
Active interaction with environment
14
From Images to Objects
15
What enables us to parse a scene?
–Low-level cues: color/texture, contours, motion
–Mid-level cues: T-junctions, convexity
–High-level cues: familiar object, familiar motion
16
Grouping factors
17
Grouping Factors
18
The Figure-Ground Problem
19
Focus of this talk
Provide a mathematical foundation for the grouping problem in terms of the ecological statistics of natural images.
–This research agenda was first proposed by Egon Brunswik, more than 50 years ago, who sought to justify Gestalt grouping factors in probabilistic terms.
20
Outline of talk
Creating a dataset of human segmented images
Measuring ecological statistics of various Gestalt grouping factors
Using these measurements to calibrate and validate approaches to grouping
21
Outline of talk
Creating a dataset of human segmented images
Measuring ecological statistics of various Gestalt grouping factors
Using these measurements to calibrate and validate approaches to grouping
22
What kind of segmentations?
What is a valid segmentation?
Is there a correct segmentation?
What granularity?
23
The Image Dataset
1000 Corel images
–Photographs of natural scenes
–Texture is common
–Large variety of subject matter
–481 x 321 pixels, 24-bit color
28
Establishing Ground Truth
Def: Segmentation = Partition of image pixels into exclusive sets
Custom tool to facilitate manual segmentation
–Java application, on website
Multiple segmentations/image
Currently: 1000 images, 5000 segmentations, 20 subjects
–Data collection ongoing
Naïve subjects (UCB undergrads) given simple, non-technical instructions
29
Directions to Image Segmentors
You will be presented a photographic image
Divide the image into some number of segments, where the segments represent “things” or “parts of things” in the scene
The number of segments is up to you, as it depends on the image. Something between 2 and 30 is likely to be appropriate.
It is important that all of the segments have approximately equal importance.
32
Segmentations are not identical
33
But are they consistent?
34
Perceptual organization produces a hierarchy
[Figure: segmentation hierarchy tree. Root: image; children: background, left bird, right bird; lower nodes: grass, bush, far, head, eye, beak, body]
Each subject picks a cross section from this hierarchy
35
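Quantifying inconsistency
How much is segmentation $S_1$ a refinement of segmentation $S_2$ at pixel $p_i$?
$E(S_1, S_2, p_i) = \frac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$
where $R(S, p_i)$ is the region of segmentation $S$ that contains pixel $p_i$
[Figure: segmentations S1 and S2, one a refinement of the other]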
36
Segmentation Error Measure
One-way Local Refinement Error:
$LRE(S_1, S_2, p_i) = \frac{\|R(S_1, p_i) \setminus R(S_2, p_i)\|}{\|R(S_1, p_i)\|}$
Segmentation Error, defined to allow refinement in either direction at each pixel:
$SE(S_1, S_2) = \frac{1}{n} \sum_i \min\{LRE(S_1, S_2, p_i),\ LRE(S_2, S_1, p_i)\}$
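The error measure above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: it assumes each segmentation is an integer label map of the same shape, and the function names `local_refinement_error` and `segmentation_error` are made up for this example.

```python
import numpy as np

def local_refinement_error(s1, s2):
    """Per-pixel LRE(S1, S2, p): fraction of p's S1 region lying outside p's S2 region."""
    s1 = np.asarray(s1).ravel()
    s2 = np.asarray(s2).ravel()
    # Relabel regions to 0..k-1 so they can index a contingency table.
    _, a = np.unique(s1, return_inverse=True)
    _, b = np.unique(s2, return_inverse=True)
    # joint[i, j] = number of pixels in S1 region i and S2 region j.
    joint = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(joint, (a, b), 1)
    size1 = joint.sum(axis=1)      # |R(S1, p)| for each S1 region
    overlap = joint[a, b]          # |R(S1, p) intersect R(S2, p)| for each pixel
    return (size1[a] - overlap) / size1[a]

def segmentation_error(s1, s2):
    """SE(S1, S2): mean over pixels of the smaller of the two one-way errors."""
    lre12 = local_refinement_error(s1, s2)
    lre21 = local_refinement_error(s2, s1)
    return float(np.minimum(lre12, lre21).mean())

# A segmentation and a refinement of it are fully consistent under this measure.
coarse = np.array([[0, 0], [1, 1]])
fine = np.array([[0, 1], [2, 2]])
print(segmentation_error(coarse, fine))
```

Because the minimum is taken per pixel, a subject who segments more finely than another is not penalized, which matches the idea that each subject picks a cross section of one hierarchy.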
37
Distribution of SE over Dataset
38
Gray, Color, InvNeg Datasets
Explore how various high/low-level cues affect the task of image segmentation by subjects
–Color = full color image
–Gray = luminance image
–InvNeg = inverted negative luminance image
39
[Figure panels: Color, Gray, InvNeg]
41
[Figure panels: Color, Gray, InvNeg]
42
Gray vs. Color vs. InvNeg Segmentations
SE (gray, gray) = 0.047
SE (gray, color) = 0.047
SE (gray, invneg) = 0.059
Color may affect attention, but doesn’t seem to affect perceptual organization
InvNeg seems to interfere with high-level cues
2500 gray segmentations, 2500 color segmentations, 200 invneg segmentations
43
Outline of talk
Creating a dataset of human segmented images
Measuring ecological statistics of various Gestalt grouping factors
Using these measurements to calibrate and validate approaches to grouping
44
Natural images aren’t generic signals
Filter statistics are far from Gaussian
–Ruderman 1994, 1997
–Field, Olshausen 1996
–Huang, Mumford 1999, 2000
–Buccigrossi, Simoncelli 1999
These properties (e.g. scale-invariance, sparsity, heavy tails) can be exploited for image compression.
45
P (SameSegment | Proximity)
46
P (SameSegment | Luminance)
47
Quantifying the power of cues
Bayes Risk
Mutual information
48
Bayes Risk for Proximity Cue
49
Mutual information
$I(x; y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$
where x is a cue and y is an indicator of being in the same segment
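Both measures of cue power can be computed from a joint histogram of the cue value and the same-segment indicator. The sketch below assumes the cue has been quantized into bins and that `joint_counts[b, y]` counts pixel pairs with cue bin `b` and indicator `y`. The function name, the binning and the choice of base-2 logarithms (bits) are assumptions of this example, not details given in the slides.

```python
import numpy as np

def bayes_risk_and_mutual_info(joint_counts):
    """Return (Bayes risk, mutual information in bits) for a (bins x 2) count table."""
    p = np.asarray(joint_counts, dtype=float)
    p = p / p.sum()                       # joint distribution p(x, y)
    # The optimal classifier picks the likelier y in each bin, so it errs
    # with probability min_y p(x, y) in that bin.
    risk = p.min(axis=1).sum()
    px = p.sum(axis=1, keepdims=True)     # marginal over y, shape (bins, 1)
    py = p.sum(axis=0, keepdims=True)     # marginal over x, shape (1, 2)
    independent = px @ py                 # p(x) p(y)
    nz = p > 0                            # skip empty cells: 0 * log 0 = 0
    mi = (p[nz] * np.log2(p[nz] / independent[nz])).sum()
    return float(risk), float(mi)

# An uninformative cue: risk 0.5, zero mutual information.
print(bayes_risk_and_mutual_info([[1, 1], [1, 1]]))
```

A lower Bayes risk and a higher mutual information both indicate a more informative cue, which is how the cue comparisons in this part of the talk should be read.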
50
Bayes Risk for Various Cues Given Proximity
51
Mutual Information for Various Cues Given Proximity
52
Power of various cues
Cue                   Bayes Risk   Mutual Info.
Proximity             0.335        0.044
Luminance             0.369        0.016
Color                 0.369        0.014
Intervening Contour   0.303        0.081
Texture               0.300        0.112
53
Spatial priors on image regions and contours
54
Distribution of Region Area
$y = K x^{-\alpha}$, $\alpha = 0.913$
55
Distribution of length Decompose contours at high curvature extrema
56
Distribution of Length
57
Slope = 2.05 in log-log plot
I.e., frequency $\propto 1/(\text{length})^2$ (for region area it’s roughly $1/\text{area}$)
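A power law $y = K x^{-\alpha}$ is a straight line in log-log coordinates, so the exponent can be read off with a least-squares line fit. The sketch below is a minimal illustration under that assumption; the function name `fit_power_law` and the synthetic data are made up here, and the slides do not say which fitting procedure was actually used.

```python
import numpy as np

def fit_power_law(x, frequency):
    """Fit frequency = K * x**(-alpha) by linear regression in log-log space."""
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(frequency, dtype=float))
    slope, intercept = np.polyfit(log_x, log_y, 1)
    return float(np.exp(intercept)), float(-slope)   # (K, alpha)

# Synthetic contour-length histogram that follows the reported exponent exactly.
lengths = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
K, alpha = fit_power_law(lengths, 3.0 * lengths ** -2.05)
print(K, alpha)
```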
58
Conditioned on Region Size
59
Scale invariance of contour statistics
Chi-square distance
0        0.0409   0.0538
0.0409   0        0.0531
0.0538   0.0531   0
60
Marginal Distribution of Curvature
61
Distribution of Region Convexity
62
Outline of talk
Creating a dataset of human segmented images
Measuring ecological statistics of various Gestalt grouping factors
Using these measurements to calibrate and validate approaches to grouping
63
Computational Mechanisms for Visual Grouping
Jitendra Malik, Serge Belongie, Jianbo Shi, Thomas Leung
U.C. Berkeley
64
Edge-based image segmentation
Edge detection by gradient operators
Linking by dynamic programming, voting, relaxation, …
Montanari 71, Parent&Zucker 89, Guy&Medioni 96, Shaashua&Ullman 88, Williams&Jacobs 95, Geiger&Kumaran 96, Heitger&von der Heydt 93
-Natural for encoding curvilinear grouping
-Hard decisions often made prematurely
-Produce meaningless clutter in textured regions
65
Edges in textured regions are meaningless clutter
[Figure panels: image, orientation energy]
66
Region-based image segmentation
1970s produced region growing, split-and-merge, etc.
1980s led to approaches based on a global criterion for image segmentation
–Markov Random Fields e.g. Geman&Geman 84
–Variational approaches e.g. Mumford&Shah 89
–Expectation-Maximization e.g. Ayer&Sawhney 95, Weiss 97
Global method, but computational complexity precludes exact MAP estimation
–Curvilinear grouping not easily enforced
–Unable to handle line-drawings
–Problems due to local minima
67
Our Approach
Global decision good, local bad
–Formulate as hierarchical graph partitioning
Efficient computation
–Draw on ideas from spectral graph theory to define an eigenvalue problem which can be solved for finding segmentation.
Develop suitable encoding of visual cues in terms of graph weights.
68
Image Segmentation as Graph Partitioning
Build a weighted graph G=(V,E) from image
V: image pixels
E: connections between pairs of nearby pixels
Partition graph so that similarity within group is large and similarity between groups is small -- Normalized Cuts [Shi&Malik 97]
69
Normalized Cuts as a Spring-Mass system
Each pixel is a point mass; each connection is a spring.
Fundamental modes are generalized eigenvectors of
$(D - W)\, y = \lambda D\, y$
70
Some Terminology for Graph Partitioning How do we bipartition a graph:
71
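Normalized Cut, a measure of dissimilarity
Minimum cut is not appropriate since it favors cutting small pieces.
Normalized Cut, Ncut:
$Ncut(A, B) = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)}$
where cut(A, B) is the total weight of edges between A and B, and assoc(A, V) is the total weight of edges from nodes in A to all nodes in the graph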
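A small numeric example shows why the normalization matters. The sketch below assumes the standard definitions from the Normalized Cuts paper: cut(A, B) is the summed weight of edges crossing the partition, and assoc(A, V) is the summed weight of all edges leaving nodes of A. The toy graph and the function names are made up for this illustration.

```python
import numpy as np

def cut_value(W, in_a):
    """Total weight of edges between group A (mask True) and group B (mask False)."""
    in_a = np.asarray(in_a, dtype=bool)
    return float(np.asarray(W, dtype=float)[in_a][:, ~in_a].sum())

def ncut_value(W, in_a):
    """Ncut(A, B) = cut/assoc(A, V) + cut/assoc(B, V) for a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    in_a = np.asarray(in_a, dtype=bool)
    cut = cut_value(W, in_a)
    assoc_a = W[in_a].sum()     # connections from A to every node
    assoc_b = W[~in_a].sum()    # connections from B to every node
    return float(cut / assoc_a + cut / assoc_b)

# Two tight pairs {0, 1} and {2, 3}, weakly linked, plus a dangling node 4.
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 1.0), (2, 3, 1.0), (0, 2, 0.1), (1, 3, 0.1), (3, 4, 0.1)]:
    W[i, j] = W[j, i] = w

isolate_4 = np.array([False, False, False, False, True])
balanced = np.array([True, True, False, False, False])
print(cut_value(W, isolate_4), cut_value(W, balanced))    # plain cut prefers isolating node 4
print(ncut_value(W, isolate_4), ncut_value(W, balanced))  # Ncut prefers the balanced split
```

Splitting off the single weakly attached node has the smaller raw cut, but its Ncut exceeds 1 because the isolated group has almost no association with the graph.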
72
Normalized Cut and Normalized Association Minimizing similarity between the groups, and maximizing similarity within the groups can be achieved simultaneously.
73
Solving the Normalized Cut problem
Exact discrete solution to Ncut is NP-complete even on regular grid
–[Papadimitriou’97]
Drawing on spectral graph theory, good approximation can be obtained by solving a generalized eigenvalue problem.
74
Some definitions
75
Normalized Cut As Generalized Eigenvalue problem Rewriting Normalized Cut in matrix form:
76
More math…
77
Normalized Cut As Generalized Eigenvalue problem after simplification, we get
78
Normalized Cut As Generalized Eigenvalue problem
The eigenvector with the second smallest eigenvalue of the generalized eigensystem
$(D - W)\, y = \lambda D\, y$
is the solution to the constrained Rayleigh quotient:
$\min_y \frac{y^T (D - W)\, y}{y^T D\, y}$ subject to $y^T D \mathbf{1} = 0$
79
Interpretation as a Dynamical System
The equivalent spring-mass system:
The generalized eigenvectors are the fundamental modes of oscillation.
80
Video
81
Computational Aspects
Solving for the generalized eigensystem:
(D-W) is of size N x N, but it is sparse with O(N) nonzero entries, where N is the number of pixels.
Using Lanczos algorithm.
82
Overall Procedure
Construct a weighted graph G=(V,E) from an image
Connect each pair of pixels, and assign graph edge weight
Solve for the smallest few eigenvectors
Recursively subdivide if Ncut value is below a pre-specified value.
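The eigenvector step of this procedure can be sketched on a toy graph. The example assumes W is a symmetric matrix of non-negative weights and D is the diagonal matrix of its row sums. It uses a dense solver from NumPy for brevity, where a real image would call for a sparse Lanczos solver. Thresholding the eigenvector at zero is one simple splitting rule chosen here for illustration, and the function name is hypothetical.

```python
import numpy as np

def ncut_bipartition(W):
    """Split a graph using the second-smallest generalized eigenvector of (D - W) y = lambda D y."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                       # node degrees (diagonal of D)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Substituting y = D^{-1/2} z gives the symmetric standard problem
    # (I - D^{-1/2} W D^{-1/2}) z = lambda z, with the same eigenvalues.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)    # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]              # second-smallest generalized eigenvector
    return y > 0, float(eigvals[1])

# Two tight pairs {0, 1} and {2, 3}, weakly linked, plus a dangling node 4.
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 1.0), (2, 3, 1.0), (0, 2, 0.1), (1, 3, 0.1), (3, 4, 0.1)]:
    W[i, j] = W[j, i] = w

labels, lam = ncut_bipartition(W)
print(labels, lam)
```

To follow the recursive step, the same function would be applied again to the weight submatrix of each group, stopping according to the Ncut threshold mentioned above.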
83
Normalized Cuts Approach
Global decision good, local bad
–Formulate as hierarchical graph partitioning
Efficient computation
–Draw on ideas from spectral graph theory to define an eigenvalue problem which can be solved for finding segmentation.
Develop suitable encoding of visual cues in terms of graph weights.
84
Cue Integration
Graph weight based on texton histograms
Graph weight based on intervening contour
85
Filters for Texture Description
Elongated directional Gaussian derivatives
2nd derivative and Hilbert transform
L1 normalized for scale invariance
6 orientations, 3 scales
Zero mean
86
Textons K-means on vectors of filter responses
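As a rough sketch of this step, the code below clusters per-pixel filter-response vectors with a plain k-means loop and treats the cluster centers as textons. It assumes `responses` has one row per pixel and one column per filter. The function name, the random initialization and the fixed iteration count are choices made for this example rather than details from the talk.

```python
import numpy as np

def kmeans_textons(responses, k, iters=20, seed=0):
    """Cluster filter-response vectors; returns (texton centers, per-pixel texton labels)."""
    responses = np.asarray(responses, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct response vectors picked at random.
    centers = responses[rng.choice(len(responses), size=k, replace=False)]

    def assign(c):
        # Squared Euclidean distance from every vector to every center.
        d2 = ((responses[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(iters):
        labels = assign(centers)
        for c in range(k):
            members = responses[labels == c]
            if len(members):                 # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return centers, assign(centers)

# Four toy "pixels" with 2-D responses forming two obvious groups.
toy = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = kmeans_textons(toy, k=2)
print(labels, np.bincount(labels, minlength=2))   # labels and the texton histogram
```

Counting labels inside a window, as in the last line, gives the texton histogram that the later slides compare between pixels.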
87
Textons (cont.)
88
Benefits of the Texton Representation
Discrete point sets well suited to tools of computational geometry, point process statistics
Defining Local Scale Selection
Measuring Texture Similarity
89
Texton Histograms
[Figure: texton histograms at points i, j, k; chi-square test values 0.1 and 0.8 between histogram pairs]
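The chi-square comparison of two texton histograms can be sketched as follows. The form used here, half the sum of squared differences divided by the bin sums, is a common convention for histogram comparison and is an assumption of this example, since the slide's own formula did not survive extraction. The function name is hypothetical.

```python
import numpy as np

def chi_square_distance(h_i, h_j):
    """0.5 * sum_k (h_i[k] - h_j[k])**2 / (h_i[k] + h_j[k]), skipping bins empty in both."""
    h_i = np.asarray(h_i, dtype=float)
    h_j = np.asarray(h_j, dtype=float)
    total = h_i + h_j
    nz = total > 0
    return float(0.5 * np.sum((h_i[nz] - h_j[nz]) ** 2 / total[nz]))

# Normalized histograms: similar textures give a small distance, disjoint ones give 1.
print(chi_square_distance([0.5, 0.5, 0.0], [0.4, 0.6, 0.0]))
print(chi_square_distance([1.0, 0.0], [0.0, 1.0]))
```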
90
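Intervening Contours
Two pixels with no contour between them are more likely to belong to the same region than two pixels separated by a contour.
[Figure: pixel pairs with and without an intervening contour; pixel labels not recoverable]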
91
Estimating the weight for the contour cue
[Figure panels: Image, Orientation Energy]
Estimate the weight from the maximum orientation energy along segment ij
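One way to turn this idea into code is sketched below. It samples the orientation-energy map along the straight segment between two pixels, takes the maximum, and maps it to a weight. The exponential mapping and the parameter `sigma` are assumptions made purely for illustration, because the slide's formula was lost; the function name is also hypothetical.

```python
import numpy as np

def intervening_contour_weight(oe, p_i, p_j, sigma=0.1):
    """Weight in (0, 1] that drops when strong orientation energy lies between p_i and p_j."""
    oe = np.asarray(oe, dtype=float)
    (r0, c0), (r1, c1) = p_i, p_j
    # Sample roughly one point per pixel step along the segment ij.
    n = int(max(abs(r1 - r0), abs(c1 - c0))) + 1
    rows = np.rint(np.linspace(r0, r1, n)).astype(int)
    cols = np.rint(np.linspace(c0, c1, n)).astype(int)
    max_oe = oe[rows, cols].max()
    # ASSUMPTION: exponential fall-off; the original mapping is not given in the slides.
    return float(np.exp(-max_oe / sigma))

# Toy energy map with a vertical contour in column 2.
oe = np.zeros((5, 5))
oe[:, 2] = 1.0
print(intervening_contour_weight(oe, (2, 0), (2, 1)))   # same side of the contour
print(intervening_contour_weight(oe, (2, 0), (2, 4)))   # contour in between
```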
92
Orientation Energy
Gaussian 2nd derivative and its Hilbert pair
Can detect combination of bar and edge features; also insensitive to linear shading [Perona&Malik 90]
Multiple scales
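A one-dimensional sketch may help make the quadrature-pair idea concrete. It builds a Gaussian second-derivative filter, obtains its Hilbert pair through the FFT, and sums the squared responses. The kernel radius of 4 sigma, the FFT-based Hilbert transform and the edge padding are simplifications assumed for this example; the talk's filters are 2-D, oriented and multi-scale.

```python
import numpy as np

def quadrature_pair(sigma):
    """Even filter (Gaussian 2nd derivative, zero mean) and its odd Hilbert pair."""
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    even = (x ** 2 / sigma ** 4 - 1 / sigma ** 2) * g
    even -= even.mean()
    # Discrete Hilbert transform: multiply the spectrum by -i * sign(frequency).
    spectrum = np.fft.fft(even)
    odd = np.fft.ifft(-1j * np.sign(np.fft.fftfreq(len(even))) * spectrum).real
    return even, odd

def orientation_energy_1d(signal, sigma=2.0):
    """Sum of squared even and odd filter responses at every sample."""
    even, odd = quadrature_pair(sigma)
    radius = len(even) // 2
    # Repeat the border values so the ends of the signal do not look like edges.
    padded = np.pad(np.asarray(signal, dtype=float), radius, mode="edge")
    r_even = np.convolve(padded, even, mode="valid")
    r_odd = np.convolve(padded, odd, mode="valid")
    return r_even ** 2 + r_odd ** 2

# A step edge between samples 49 and 50: energy should peak there and vanish in flat areas.
step = np.zeros(100)
step[50:] = 1.0
oe = orientation_energy_1d(step)
print(int(np.argmax(oe)))
```

The odd filter responds to edges and the even filter to bars, so the summed energy responds to both kinds of feature with a single peak.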
93
Challenges of Cue Integration
Contour cue tends to fragment textured regions
Texture cue tends to create 1D regions from contours
94
Texture as a problem for contour processing
[Figure panels: image, orientation energy]
95
Contour as a problem for texture processing Segmentation based on Gaussian Mixture Model EM
96
Cue Integration
Gate contour vs. texture cue based on region-boundary vs. region-interior label
Compute boundary vs. interior label using statistical test on region uniformity
Multiply to get combined weight
97
Motion Segmentation with Normalized Cuts Networks of spatial-temporal connections:
98
Motion Segmentation with Normalized Cuts
Motion “proto-volume” in space-time
Group correspondence
99
Results video
100
Results
102
Stereoscopic data
103
Framework for Recognition
(1) Segmentation: Pixels → Segments
(2) Association: Segments → Regions
(3) Matching: Regions → Prototypes
Over-segmentation necessary; under-segmentation fatal
Enumerate: # of size k regions in image with n segments is ~(4**k)*n/k
~10 views/object. Matching tolerant to pose/illumination changes, intra-category variation, error in previous steps