Slide 1: Context Aware Spatial Priors using Entity Relations (CASPER)
Geremy Heitz, Jonathan Laserson, Daphne Koller
DAGS, December 10th, 2007
Slide 2: Outline
- Goal – Scene Understanding
- Existing Methods
- CASPER
- Preliminary Experiments
- Future Direction – Going Discriminative
Slide 3: [Figure: street scene with labeled objects: Building, Tree, Car]
Slide 4: Representation
[Figure: street scene with labeled objects: Building, Tree, Car, Building, Car]
- l = the bag of object categories
- ρ = the locations of the object centroids
- We model P(ρ, l). Why? Because we use a generative model:
  P(ρ, l | I) ∝ P(ρ, l) P(I | ρ, l), where I is the image
Slide 5: [Figure: two candidate labelings of the same scene over Building, Tree, Car objects]
- Which one makes more sense?
- Does context matter?
Slide 6: Can context help object recognition?
- LOOPS
Slide 7: Outline
- Goal – Scene Understanding
- Existing Methods
- CASPER
- Preliminary Experiments
- Future Direction – Going Discriminative
Slide 8: Fixed-Order Model
- Each image has the same bag of objects (example: 1 car, 2 buildings, 1 tree)
- Object centroids are drawn jointly:
  P(ρ, l) = 1{l = l_fixed_order} P(ρ | l)
- Similar to constellation models (Fergus)
- Problem: we don't always know the exact set of objects
Slide 9: TDP (Sudderth, 2005)
- Each image has a different bag of objects
- Object centroids are drawn independently:
  P(ρ, l) = P(l) Π_i P(ρ_i | l_i)
- Problems:
  - This doesn't take pairwise constraints into account
  - We have lost context (see the sketch below)
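To make the independence concrete, here is a minimal Python sketch; the class list, means, and covariances are invented for illustration. The joint score decomposes into per-object terms, so no pairwise constraint can ever enter, which is exactly the lost context:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical per-class Gaussians over absolute centroid location
# (all parameters are illustrative, not learned values from the talk).
CLASS_PRIOR = {
    "car":      multivariate_normal(mean=[0.5, 0.8], cov=0.02 * np.eye(2)),
    "building": multivariate_normal(mean=[0.5, 0.3], cov=0.05 * np.eye(2)),
}

def independent_log_prob(centroids, labels, log_p_l=0.0):
    """log P(rho, l) = log P(l) + sum_i log P(rho_i | l_i).
    Each centroid is scored on its own; no pairwise terms appear."""
    return log_p_l + sum(CLASS_PRIOR[l].logpdf(rho)
                         for rho, l in zip(centroids, labels))
```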
Slide 10: Outline
- Goal – Scene Understanding
- Existing Methods
- CASPER
- Preliminary Experiments
- Future Direction – Going Discriminative
Slide 11: CASPER
- Each image has a different bag of objects
- Object centroids are drawn jointly given l:
  P(ρ, l) = P(l) P(ρ | l)
- Questions:
  - How do we represent P(l)?
  - How do we represent P(ρ | l)?
  - How do we learn? How do we infer?
Slide 12: P(l)
- Dirichlet Process (we don't want to get into that now)
- Other options: Multinomial, Uniform
Slide 13: P(ρ | l) – Desiderata
- Correlations between the ρ's
- Sharing of parameters between l's
- Intuitive parameterization
- A continuous multivariate distribution
- Easy to learn parameters, easy to evaluate likelihood, easy to condition
- Gaussian?
Slide 14: Multivariate Gaussian – Options
- Learn a different Gaussian for every l
  - Can't share parameters
  - Large (infinite) number of possible l's
- Gaussian Process: ρ(x) ~ GP(μ(x), K(x, x'))
  - Every finite set of x's produces a Gaussian: [ρ(x_1), ρ(x_2), …, ρ(x_k)] ~ Gaussian
  - x_t is a hidden function of the class l_t
  - μ(x_t) = A x_t
  - K(x_t, x_t') = c exp(-||B(x_t - x_t')||^2)
  - Two objects of the same class -> same x? (see the sketch below)
  - Is correlation the natural space?
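A small numpy sketch of the GP option under the kernel above; the latent codes X and the matrices A and B are invented for illustration. Note that giving two same-class objects the same x drives their correlation to 1 and makes K singular, hence the jitter term:

```python
import numpy as np

def gp_cov(X, B, c=1.0):
    """K(x_t, x_t') = c * exp(-||B(x_t - x_t')||^2) on all pairs of rows of X."""
    BX = X @ B.T
    d2 = ((BX[:, None, :] - BX[None, :, :]) ** 2).sum(-1)
    return c * np.exp(-d2)

# Hypothetical latent codes for [car, car, building]; the two cars share x.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
A = np.array([[0.5, 0.2], [0.1, 0.6]])
mu = X @ A.T                                   # mean mu(x_t) = A x_t, one row per object
K = gp_cov(X, B=np.eye(2)) + 1e-6 * np.eye(3)  # jitter: duplicate x's make K singular
rng = np.random.default_rng(0)
rho_x = rng.multivariate_normal(mu[:, 0], K)   # sample the centroids' x-coordinates
```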
Slide 15: Car Spatial Distribution – Options
- "Singleton expert" P(ρ_i | l_i): a Gaussian over absolute object location
- "Pairwise expert" P(ρ_i - ρ_j | l_i, l_j): a Gaussian over the offset between objects
- Each expert can be one of K mixture components (sketched below)
[Figure: Tree/Car offset components, k = 1, k = 2, k = 1]
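A hedged sketch of the two expert types in code; the example offset component at the bottom is invented to mirror the figure, not a learned value:

```python
import numpy as np
from scipy.stats import multivariate_normal

class SingletonExpert:
    """Gaussian over the absolute centroid of a single object: P(rho_i | l_i)."""
    def __init__(self, mean, cov):
        self.dist = multivariate_normal(mean, cov)
    def logpdf(self, rho_i):
        return self.dist.logpdf(rho_i)

class PairwiseExpert:
    """Gaussian over the offset rho_i - rho_j: P(rho_i - rho_j | l_i, l_j).
    One of K such components can be chosen per object pair."""
    def __init__(self, mean, cov):
        self.dist = multivariate_normal(mean, cov)
    def logpdf(self, rho_i, rho_j):
        return self.dist.logpdf(np.asarray(rho_i) - np.asarray(rho_j))

# Illustrative component: a tree sitting about 0.2 units left of a car.
tree_left_of_car = PairwiseExpert(mean=[-0.2, 0.0], cov=0.01 * np.eye(2))
```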
Slide 16: CASPER P(ρ | l)
- How do we use the experts? Introduce an auxiliary variable d: P(ρ | d, l)
- d tells us which experts are 'on'
- For each edge e = (l_i, l_j), d_e indexes all possible experts for this edge; the default is a uniform expert
- P(ρ | d, l) ∝ POE_d, where POE_d = Π P(ρ_i | l_i) Π P(ρ_i - ρ_j | d_ij, l_i, l_j)
- A product of Gaussians is a Gaussian (assembled in code below)
[Figure: scene graph over Building, Tree, Car, Building, Car]
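Because every expert is Gaussian, POE_d is Gaussian, and its (μ_d, Σ_d) can be assembled in information (precision) form. This is one plausible implementation for 2-D centroids, not the talk's code:

```python
import numpy as np

def poe_joint(n, singletons, pairwise):
    """Joint Gaussian over n 2-D centroids implied by a product of experts.
    singletons: {i: (mean, cov)} on rho_i; pairwise: {(i, j): (mean, cov)}
    on rho_i - rho_j. Returns (mu_d, Sigma_d)."""
    Lam = np.zeros((2 * n, 2 * n))        # joint precision: experts' precisions add
    eta = np.zeros(2 * n)                 # joint information vector
    for i, (m, C) in singletons.items():
        P = np.linalg.inv(C)
        Lam[2*i:2*i+2, 2*i:2*i+2] += P
        eta[2*i:2*i+2] += P @ np.asarray(m)
    for (i, j), (m, C) in pairwise.items():
        P = np.linalg.inv(C)
        for a, sa in ((i, 1.0), (j, -1.0)):
            for b, sb in ((i, 1.0), (j, -1.0)):
                Lam[2*a:2*a+2, 2*b:2*b+2] += sa * sb * P
            eta[2*a:2*a+2] += sa * (P @ np.asarray(m))
    # Each offset expert alone is rank-deficient (it only constrains a
    # difference); enough experts together make Lam invertible.
    Sigma = np.linalg.inv(Lam)
    return Sigma @ eta, Sigma
```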
Slide 17: CASPER P(ρ | d, l)
- POE_d = Z_d N(ρ; μ_d, Σ_d)
- P(ρ | d, l) = N(ρ; μ_d, Σ_d) = (1/Z_d) POE_d
- P(d | l) ∝ Z_d (Multinomial)
- P(ρ, d | l) ∝ POE_d
- Example (a Car1–Car2–Car3 chain): P(ρ, d | l) ∝ P(ρ_2 - ρ_1 | d_12) P(ρ_3 - ρ_2 | d_32)
- Two assignments d_1 and d_2 can satisfy P(ρ | d_1, l) = P(ρ | d_2, l) while Z_{d_2} > Z_{d_1}, hence POE_{d_2} > POE_{d_1}
[Figure: two expert assignments d_1, d_2 over the three-car chain]
Slide 18: Learning the Experts
- Training set with supervised (ρ, l) pairs (one pair for each image)
- Gibbs sampling over the hidden variables d_e:
  - Loop over the edges
  - Update the experts' sufficient statistics with each update (one sweep is sketched below)
- Does it converge? Not as well as we want it to; work in progress
[Figure: scene graph over Building, Tree, Car, Building, Car]
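A sketch of one Gibbs sweep over the edge variables. The experts' logpdf/add/remove methods (incremental sufficient-statistic updates) are hypothetical stand-ins for the actual implementation:

```python
import numpy as np

def gibbs_sweep_edges(edges, experts, rho, assign, rng):
    """Resample each edge's expert index d_e given the current offsets,
    refreshing that expert's sufficient statistics as we go.
    edges: list of (i, j); assign[e]: current expert index for edge e."""
    for e, (i, j) in enumerate(edges):
        offset = rho[i] - rho[j]
        experts[assign[e]].remove(offset)   # take this edge out of its stats
        logp = np.array([ex.logpdf(offset) for ex in experts])
        p = np.exp(logp - logp.max())       # normalize in log space for stability
        assign[e] = rng.choice(len(experts), p=p / p.sum())
        experts[assign[e]].add(offset)      # fold the edge back in
    return assign
```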
Slide 19: Outline
- Goal – Scene Understanding
- Existing Methods
- CASPER
- Preliminary Experiments
- Future Direction – Going Discriminative
Slide 20: Preliminary Experiments
- LabelMe datasets: STREETS and BEDROOMS
Slide 21: Features and Instances
[Figure: interest points (*) over a scene; one car instance with centroid ρ_t]
- FEATURES: Harris interest operator -> y_i; SIFT descriptor -> w_i; instance membership -> t_i
- INSTANCES: centroid -> ρ_t; class label -> l_t
- Observed: (y_i, w_i, t_i), (ρ_t, l_t)
- P(I | ρ, l) = P(y, w | ρ, l) (feature pipeline sketched below)
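A hedged OpenCV sketch of the feature pipeline on this slide (Harris interest points, then SIFT descriptors at those points); the threshold and keypoint size are illustrative defaults, not the talk's settings:

```python
import cv2
import numpy as np

def extract_features(gray):
    """Harris interest operator -> locations y_i; SIFT -> descriptors w_i.
    `gray` is an 8-bit grayscale image."""
    resp = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    ys, xs = np.where(resp > 0.01 * resp.max())          # keep strong corners
    kps = [cv2.KeyPoint(float(x), float(y), 8.0) for x, y in zip(xs, ys)]
    sift = cv2.SIFT_create()
    kps, desc = sift.compute(gray, kps)                  # w_i: 128-D SIFT
    return np.array([kp.pt for kp in kps]), desc         # y_i, w_i
```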
Slide 22: What do the true ρ's look like?
[Figure: empirical offset scatter plots: Car -> Car, Lamp -> Lamp, Bed -> Lamp]
Slide 23: Learning/Inference in the Full Model
- TDP-style three-stage Gibbs (skeleton below):
  - Assign features to instances (sample t_i for every feature)
  - Assign expert components (sample d_e for every edge)
  - Assign instances to classes (sample l_t, ρ_t for every instance)
- Training: supervise the (t, l) variables; Gibbs over d and ρ
- Testing: introduce new images; Gibbs over (t, l, d, ρ) of the new images
- Independent-TDP: the ρ's are independent
- CASPER-TDP: the ρ's are distributed according to CASPER
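The three stages written out as skeleton Python; the image container and the three conditional samplers are placeholders, since the talk does not spell out their forms:

```python
def gibbs_iteration(images, model, rng):
    """One pass of the three-stage sampler. `model.sample_*` are
    hypothetical conditional samplers; supervised variables are
    simply never resampled."""
    for img in images:
        for i in img.features:                  # stage 1: feature -> instance
            img.t[i] = model.sample_t(img, i, rng)
        for e in img.edges:                     # stage 2: expert components
            img.d[e] = model.sample_d(img, e, rng)
        for t in img.instances:                 # stage 3: instance class + centroid
            img.l[t], img.rho[t] = model.sample_rho_l(img, t, rng)
```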
Slide 24: Learned Experts
Slide 25: [Figure: sampled interest points and their instance memberships (y_i, w_i, t_i)]
Slide 26: [Figure panels: IMAGE, GROUND TRUTH, IND – N = 0.1, IND – N = 0.5]
Slide 27: Evaluation – Generative Model
- "Synthetic appearance": visual words give a strong indicator of the class
- Evaluated on detection performance: precision/recall; F1 score for centroid and class identification (one plausible metric is sketched below)
- Results here are with Independent-TDP
- Can we hope to do this well?

F1 by appearance-noise level N:

Class      N = 0.1   N = 0.3   N = 0.5
Bed        0.6111    0.6286    0.5882
Lamp       0.3077    0.1667    0.0000
Painting   0.5333    0.3333    0.2857
Window     0.9091    0.7692    0.5455
Table      0.6667    0.4211    0.3529
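One plausible reading of the detection F1 (centroid and class identification); the matching radius is an assumption, since the talk does not state one:

```python
import numpy as np

def detection_f1(pred, truth, max_dist=0.1):
    """Greedy matching: a prediction counts as a true positive if an unused
    ground-truth object of the same class lies within max_dist of its
    centroid. pred/truth: lists of (class_label, centroid ndarray)."""
    used, tp = set(), 0
    for cls, rho in pred:
        for k, (tcls, trho) in enumerate(truth):
            if k not in used and tcls == cls and np.linalg.norm(rho - trho) <= max_dist:
                used.add(k)
                tp += 1
                break
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(truth) if truth else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```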
Slide 28: Evaluation – Context
- Independent-TDP vs. CASPER-TDP at N = 0.5
- Why isn't context helping here?

F1 at N = 0.5:

Class      Independent   CASPER
Bed        0.5882        0.5714
Lamp       0.0000        0.0000
Painting   0.2857        0.1333
Window     0.5455        0.4000
Table      0.3529        0.1250
Slide 29: Problems with this Setup
- Bad feelings about the supervised setting (detection):
  - Our model is not trained to maximize detection ability
  - We will lose to many/most discriminative approaches
  - Context is NOT the main reason why TDP fails
- Unsupervised setting:
  - Likelihood? Does anyone care?
  - Object discovery? Context is a lower-order consideration
- How would we show that CASPER > Independent?
Slide 30: Outline
- Goal – Scene Understanding
- Existing Methods
- CASPER
- Preliminary Experiments
- Future Direction – Going Discriminative
Slide 31: Going Discriminative
- Up to now we have been generative: P(I, ρ, l) = P(I | ρ, l) P(ρ, l)
- How do we convert this into a discriminative model?
  - Include the CASPER distribution over (ρ, l)
  - Include a term with boosted object detectors
  - Slap on a partition function:
    P(ρ, l | I) = (1/Z) × CASPER × DETECTORS
Slide 32: Discriminative Framework
- Boosted detectors "over-detect"
- Each "candidate" has a location ρ_t, a class variable l_t, and a detection score D_I(l_t)
- P(ρ, l | I) ∝ P(ρ, l) Π_t D_I(l_t) (scoring sketched below)
- Goal: reassign the detection candidates to classes, respecting both the "detection strength" and the context between objects
[Figure: two face candidates, D_I(face) = 0.92 and D_I(face) = 0.09]
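A minimal sketch of the unnormalized score being maximized; casper_log_prior stands in for CASPER's joint log-density, and allowing a 'background' label to switch a candidate off is our assumption, not something stated on the slide:

```python
import numpy as np

def log_posterior(labels, rho, casper_log_prior, det_scores):
    """Unnormalized log P(rho, l | I) = log P(rho, l) + sum_t log D_I(l_t).
    det_scores[t][c]: boosted-detector score of candidate t for class c
    (possibly including a hypothetical 'background' class)."""
    return casper_log_prior(rho, labels) + sum(
        np.log(det_scores[t][c]) for t, c in enumerate(labels))
```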
Slide 33: Similarities to Steve's Work
- "Over-detection" using boosted detectors, but some detections don't make sense in context
- 3D information allows him to "sort out" which detections are correct
Slide 34: CASPER Learning/Inference
- Gibbs inference (one conditional is sketched below):
  - Loop over images
  - Loop over detection candidates t: sample (l_t | everything else)
  - Loop over pairs of candidates: sample (d_e | everything else)
- Training: l_t is known; Gibbs over the d_e
- Evaluation: precision/recall for detections
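A sketch of the per-candidate conditional for l_t, trading off detector strength against pairwise context; every name here is illustrative:

```python
import numpy as np

def sample_label(t, labels, rho, det_scores, pair_logp, classes, rng):
    """Sample (l_t | everything else). pair_logp(c, c2, offset) stands in
    for the currently selected experts' log-density on rho_t - rho_u."""
    logp = []
    for c in classes:
        lp = np.log(det_scores[t][c])           # detection strength
        for u, c2 in enumerate(labels):
            if u != t:                          # context with other candidates
                lp += pair_logp(c, c2, rho[t] - rho[u])
        logp.append(lp)
    logp = np.array(logp)
    p = np.exp(logp - logp.max())
    return classes[rng.choice(len(classes), p=p / p.sum())]
```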
Slide 35: Possible Datasets
Slide 36: Short-Term Plan
- Learn the boosted detectors
- Determine our baseline performance
- Add Gibbs inference
- Submit to a conference that is far, far away… (ICML = Helsinki, Finland)
Slide 38: Alternate Names
- Spatial Priors for Arbitrary Groups of Objects
Slide 39: Product of Experts – Precision-Space View
- P_1(x) = N(x; a, A), P_2(x) = N(x; b, B)
- P_1(x) P_2(x) = Z N(x; c, C), where:
  - Z = N(a; b, A + B)
  - C^-1 = A^-1 + B^-1
  - c = C (A^-1 a + B^-1 b)
- What does this mean? The precision matrices of the experts ADD
- Even if each expert has a singular precision A^-1, the sum is still PSD
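The identity is easy to check numerically; the sketch below verifies the product rule at one test point with made-up parameters:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

a, A = np.array([0.0, 0.0]), np.array([[1.0, 0.3], [0.3, 1.0]])
b, B = np.array([1.0, -1.0]), np.array([[0.5, 0.0], [0.0, 2.0]])

C = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))   # precisions add
c = C @ (np.linalg.inv(A) @ a + np.linalg.inv(B) @ b)
Z = mvn(mean=b, cov=A + B).pdf(a)                        # Z = N(a; b, A + B)

x = np.array([0.3, 0.7])
lhs = mvn(a, A).pdf(x) * mvn(b, B).pdf(x)
rhs = Z * mvn(c, C).pdf(x)
assert np.isclose(lhs, rhs)                              # P1 * P2 = Z * N(c, C)
```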
Slide 40: CASPER Detection
- Detection-strength component: D_I(l_t) = P(l_t | I[ρ_t])
- Occurrence component: P(l) = Π_t P(l_t), with l_t ~ Multinomial
- CASPER component: P(ρ, d | l) ∝ POE_d