Context Aware Spatial Priors using Entity Relations (CASPER)
Geremy Heitz, Jonathan Laserson, Daphne Koller
December 10th, 2007, DAGS
Outline Goal – Scene Understanding Existing Methods CASPER Preliminary Experiments Future Direction – Going Discriminative
[Motivating scene image, labeled: Building, Tree, Car]
Representation
[Scene image, labeled: Building, Tree, Car, Building, Car]
l = bag of object categories
ρ = locations of the object centroids
I = the image
We model P(ρ, l). Why? Because we use a generative model: P(ρ, l | I) ∝ P(ρ, l) P(I | ρ, l)
[Two candidate labelings of the same scene: Building, Tree, Car, Building, Car vs. Building, Tree, Car, Building, Car, Tree, Car]
Which one makes more sense? Does context matter?
Can it help Object Recognition? LOOPS
Outline Goal – Scene Understanding Existing Methods CASPER Preliminary Experiments Future Direction – Going Discriminative
Fixed Order Model
Each image has the same bag of objects (example: 1 car, 2 buildings, 1 tree)
Object centroids are drawn jointly: P(ρ, l) = 1{l = l_fixed_order} P(ρ | l)
Similar to constellation models (Fergus)
Problem: we don't always know the exact set of objects
TDP (Sudderth, 2005)
Each image has a different bag of objects
Object centroids are drawn independently: P(ρ, l) = P(l) ∏_i P(ρ_i | l_i)
Problem: this doesn't take pairwise constraints into account; we have lost context
Outline Goal – Scene Understanding Existing Methods CASPER Preliminary Experiments Future Direction – Going Discriminative
CASPER
Each image has a different bag of objects
Object centroids are drawn jointly given l: P(ρ, l) = P(l) P(ρ | l)
Questions: How do we represent P(l)? How do we represent P(ρ | l)? How do we learn? How do we infer?
P(l)
Dirichlet Process (we don't want to get into that now)
Other options: Multinomial, Uniform
P(ρ | l) – Desiderata
Correlations between the ρ's
Sharing of parameters between l's
Intuitive parameterization
Continuous multivariate distribution
Easy to learn parameters, evaluate likelihood, and condition
Gaussian?
Multivariate Gaussian – Options
Option 1: learn a different Gaussian for every l. Problems: can't share parameters; large number (∞) of l's.
Option 2: Gaussian Process, ρ(x) ~ GP(μ(x), K(x, x'))
Every finite set of x's produces a Gaussian: [ρ(x_1), ρ(x_2), …, ρ(x_k)] ~ Gaussian
x_t is a hidden function of the class l_t
μ(x_t) = A x_t, K(x_t, x_t') = c exp(-||B(x_t - x_t')||²)
Two objects of the same class -> same x? Is correlation the natural space?
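The GP option can be made concrete in a few lines of numpy. This is a minimal sketch of the squared-exponential kernel and linear mean above; the values for A, c, and the x_t inputs are toy choices, not parameters from the talk.

```python
import numpy as np

def sq_exp_kernel(X, c=1.0, B=None):
    """Gram matrix for K(x_t, x_t') = c * exp(-||B (x_t - x_t')||^2).

    X: (n, d) array of hidden inputs, one row per object."""
    d = X.shape[1]
    B = np.eye(d) if B is None else B
    BX = X @ B.T                                   # apply the linear map B to each x_t
    diff = BX[:, None, :] - BX[None, :, :]         # pairwise differences
    return c * np.exp(-(diff ** 2).sum(axis=-1))   # squared norms -> kernel values

# Every finite set of inputs then gives a Gaussian over centroids:
A = np.array([[1.0, 0.0], [0.0, 1.0]])             # toy mean map mu(x_t) = A x_t
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]) # hypothetical x_t's for three objects
mu, K = X @ A.T, sq_exp_kernel(X)                  # mean vectors and 3x3 covariance
```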
Car Spatial Distribution – Options
"Singleton Expert" P(ρ_i | l_i): Gaussian over absolute object location
"Pairwise Expert" P(ρ_i - ρ_j | l_i, l_j): Gaussian over the offset between two objects
An expert can be one of K mixture components
[Illustration: Tree/Car offsets, components k = 1, 2]
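A minimal sketch of the two expert types, each as a 2-D Gaussian; the means and covariances below are illustrative stand-ins, not learned experts.

```python
import numpy as np
from scipy.stats import multivariate_normal

# "Singleton expert": Gaussian over the absolute location of one object.
car_singleton = multivariate_normal(mean=[0.5, 0.8], cov=np.diag([0.10, 0.02]))

# "Pairwise expert": Gaussian over the offset rho_i - rho_j between two objects
# (one of the K mixture components for the Tree-Car edge).
tree_minus_car = multivariate_normal(mean=[0.0, -0.3], cov=np.diag([0.05, 0.02]))

rho_car = np.array([0.40, 0.85])
rho_tree = np.array([0.45, 0.50])
log_score = (car_singleton.logpdf(rho_car)
             + tree_minus_car.logpdf(rho_tree - rho_car))   # one mixture component k
```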
CASPER P(ρ | l)
How to use the experts? Introduce an auxiliary variable d: P(ρ | d, l)
d tells us which experts are 'on'
[Scene graph: Building, Tree, Car, Building, Car]
For each edge e = (l_i, l_j), d_e indexes all possible experts for this edge; the default is a uniform expert
P(ρ | d, l) ∝ POE_d, where POE_d = ∏ P(ρ_i | l_i) ∏ P(ρ_i - ρ_j | d_ij, l_i, l_j)
A product of Gaussians is a Gaussian
CASPER P(ρ | d, l)
POE_d = Z_d N(ρ; μ_d, Σ_d), so P(ρ | d, l) = N(ρ; μ_d, Σ_d) = (1/Z_d) POE_d
P(d | l) ∝ Z_d (Multinomial), so P(ρ, d | l) ∝ POE_d
Example (three cars): P(ρ, d | l) ∝ P(ρ_2 - ρ_1 | d_12) P(ρ_3 - ρ_2 | d_32)
[Figure: two expert assignments d1, d2 for the three cars]
P(ρ | d1, l) = P(ρ | d2, l), but Z_d2 > Z_d1, hence POE_d2 > POE_d1
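A minimal numpy sketch of the last two slides: multiply Gaussian experts in information form (their precisions add), read off μ_d and Σ_d, and recover Z_d as the ratio POE_d / N(·; μ_d, Σ_d). All expert parameters are toy values, and the weak singleton anchor on car 1 is added only so the toy precision is invertible.

```python
import numpy as np
from scipy.stats import multivariate_normal

def product_of_experts(factors, dim):
    """factors: list of (A, m, S); each expert is a Gaussian N(A @ rho; m, S),
    e.g. A @ rho = rho_i - rho_j for a pairwise expert on edge (i, j)."""
    J, h = np.zeros((dim, dim)), np.zeros(dim)
    for A, m, S in factors:
        P = np.linalg.inv(S)
        J += A.T @ P @ A                  # precisions of the experts add
        h += A.T @ (P @ m)
    Sigma_d = np.linalg.inv(J)
    mu_d = Sigma_d @ h
    # POE_d(rho) = Z_d * N(rho; mu_d, Sigma_d), so Z_d is the ratio at any rho.
    log_poe_at_mu = sum(multivariate_normal(mean=m, cov=S).logpdf(A @ mu_d)
                        for A, m, S in factors)
    log_Zd = log_poe_at_mu - multivariate_normal(mean=mu_d, cov=Sigma_d).logpdf(mu_d)
    return mu_d, Sigma_d, log_Zd

# The three-car example: two pairwise experts chain car1 -> car2 -> car3.
I2, O2 = np.eye(2), np.zeros((2, 2))
factors = [
    (np.hstack([I2, O2, O2]), np.zeros(2), 10.0 * I2),             # weak anchor on rho_1
    (np.hstack([-I2, I2, O2]), np.array([0.3, 0.0]), 0.05 * I2),   # expert for rho_2 - rho_1
    (np.hstack([O2, -I2, I2]), np.array([0.3, 0.0]), 0.05 * I2),   # expert for rho_3 - rho_2
]
mu_d, Sigma_d, log_Zd = product_of_experts(factors, dim=6)
```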
Learning the Experts
Training set with supervised (ρ, l) pairs (one pair per image)
Gibbs over the hidden variables d_e: loop over the edges, updating the expert sufficient statistics with each update
Does it converge? Not as much as we want it to; work in progress
[Scene graph: Building, Tree, Car, Building, Car]
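A minimal sketch of one Gibbs sweep over the d_e variables. Because P(ρ, d | l) ∝ POE_d (previous slide), only the factor for edge e depends on d_e, so its conditional reduces to the expert densities evaluated at the observed offset; the sampled assignments would then be folded back into the experts' sufficient statistics. The data structures here are placeholders, not the talk's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gibbs_sweep_edges(rho, labels, edges, experts, rng):
    """One sweep over the edges.

    rho: dict instance -> 2-D centroid; labels: dict instance -> class;
    edges: list of (i, j) instance pairs;
    experts[(li, lj)]: list of (mean, cov) mixture components for that class pair."""
    d = {}
    for i, j in edges:
        offset = rho[i] - rho[j]
        components = experts[(labels[i], labels[j])]
        logp = np.array([multivariate_normal(mean=m, cov=S).logpdf(offset)
                         for m, S in components])
        p = np.exp(logp - logp.max())                 # stabilize before normalizing
        d[(i, j)] = rng.choice(len(components), p=p / p.sum())
    return d
```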
Outline Goal – Scene Understanding Existing Methods CASPER Preliminary Experiments Future Direction – Going Discriminative
Preliminary Experiments LabelMe Datasets STREETS BEDROOMS
FEATURES (y_i, w_i, t_i): Harris interest operator -> y_i, SIFT descriptor -> w_i, instance membership -> t_i
INSTANCES (ρ_t, l_t): centroid -> ρ_t, class label -> l_t
The image gives the observed features: P(I | ρ, l) = P(y, w | ρ, l)
[Illustration: interest points grouped into a Car instance with centroid ρ_t]
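A minimal sketch of how these variables relate, with made-up feature locations and memberships; in the model ρ_t is a latent variable, but the mean of the feature locations currently assigned to an instance is a natural summary or initialization for it.

```python
import numpy as np

# Hypothetical feature locations y_i and instance memberships t_i for one image.
y = np.array([[10.0, 12.0], [14.0, 11.0], [80.0, 40.0], [82.0, 44.0]])  # Harris point locations
t = np.array([0, 0, 1, 1])                                              # instance membership t_i
l = {0: "car", 1: "tree"}                                               # instance class labels l_t

# Summarize (or initialize) each instance centroid rho_t as the mean location
# of the features assigned to it.
rho = {k: y[t == k].mean(axis=0) for k in np.unique(t)}
```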
What do the true ρ's look like?
[Scatter plots of pairwise offsets: Car -> Car, Lamp -> Lamp, Bed -> Lamp]
Learning/Inference in the Full Model
TDP – three-stage Gibbs:
1. Assign features to instances (sample t_i for every feature)
2. Assign expert components (sample d_e for every edge)
3. Assign instances to classes (sample l_t, ρ_t for every instance)
Training: supervise the (t, l) variables, Gibbs over d and ρ
Testing: introduce new images, Gibbs over (t, l, d, ρ) of the new images
Independent-TDP: the ρ's are independent; CASPER-TDP: the ρ's are distributed according to CASPER
Learned Experts
[Illustration: detected interest-point features (y_i, w_i, t_i)]
[Qualitative results: IMAGE | GROUND TRUTH | IND – N = 0.1 | IND – N = 0.5]
Evaluation – Generative Model
[Detection results per class (Bed, Lamp, Painting, Window, Table) for N = 0.1, 0.3, 0.5]
"Synthetic appearance": visual words give a strong indicator of the class
Evaluated on detection performance: precision/recall and F1 score for centroid and class identification
Results here are with Independent TDP. Can we hope to do this well?
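The F1 criterion above (centroid and class identification) could look like the following minimal sketch; the matching rule and distance threshold are assumptions of the sketch, not necessarily the ones used in these experiments.

```python
import numpy as np

def detection_f1(pred, gt, dist_thresh=0.1):
    """pred, gt: lists of (class_label, centroid) pairs; a prediction counts as a
    true positive if its class matches an unmatched ground-truth instance whose
    centroid lies within dist_thresh (the threshold is a choice of this sketch)."""
    matched = [False] * len(gt)
    tp = 0
    for cls, c in pred:
        for k, (gcls, gc) in enumerate(gt):
            if (not matched[k] and cls == gcls
                    and np.linalg.norm(np.asarray(c) - np.asarray(gc)) < dist_thresh):
                matched[k] = True
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
```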
Evaluation – Context
[Per-class results (Bed, Lamp, Painting, Window, Table): Independent-TDP vs. CASPER-TDP, N = 0.5]
Why isn't context helping here?
Problems with this Setup (Bad Feelings)
Supervised setting (detection): our model is not trained to maximize detection ability; we will lose to many/most discriminative approaches; context is NOT the main reason why TDP fails
Unsupervised setting: likelihood? Does anyone care? Object discovery? Context is a lower-order consideration
How would we show that CASPER > Independent?
Outline Goal – Scene Understanding Existing Methods CASPER Preliminary Experiments Future Direction – Going Discriminative
Going Discriminative
Up to now we have been generative: P(I, ρ, l) = P(I | ρ, l) P(ρ, l)
How do we make this discriminative? Include the CASPER distribution over (ρ, l), include a term with boosted object detectors, and slap on a partition function:
P(ρ, l | I) = (1/Z) × CASPER × DETECTORS
Discriminative Framework
Boosted detectors "over-detect"
Each "candidate" has a location ρ_t, a class variable l_t, and a detection score D_I(l_t)
P(ρ, l | I) ∝ P(ρ, l) ∏_t D_I(l_t)
Goal: reassign detection candidates to classes in a way that respects both the "detection strength" and the context between objects
[Example detections: D_I(face) = 0.09 vs. D_I(face) = 0.92]
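A minimal sketch of this scoring rule: an assignment of classes to the over-detected candidates is scored by the detector log-scores plus the context terms. The context term here is a single-component pairwise Gaussian per class pair, a simplification of the CASPER mixture; all interfaces are stand-ins.

```python
import numpy as np
from scipy.stats import multivariate_normal

def assignment_log_score(labels, rho, det_scores, pairwise_experts):
    """Unnormalized log P(rho, l | I): detector log-scores plus context terms.

    labels: list of class names, one per candidate;
    det_scores[t][c]: boosted detector score D_I(c) for candidate t;
    pairwise_experts[(ci, cj)]: (mean, cov) of the offset expert for that class pair."""
    score = sum(np.log(det_scores[t][labels[t]]) for t in range(len(labels)))
    for i in range(len(labels)):
        for j in range(len(labels)):
            key = (labels[i], labels[j])
            if i != j and key in pairwise_experts:
                m, S = pairwise_experts[key]
                score += multivariate_normal(mean=m, cov=S).logpdf(rho[i] - rho[j])
    return score   # the partition function Z cancels when comparing assignments
```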
Similarities to Steve’s work “Over detection” using boosted detectors But some detections don’t make sense in context 3D information allows him to “sort out” which detections are correct
CASPER Learning/Inference
Gibbs inference: loop over images; loop over detection candidates t and sample (l_t | everything else); loop over pairs of candidates and sample (d_e | everything else)
Training: l_t is known, Gibbs over d_e
Evaluation: precision/recall for detections
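A minimal sketch of the inner sampling step, sampling (l_t | everything else): the conditional combines the detector score with whichever CASPER factors involve candidate t. The interfaces here are placeholders, and the context term is supplied by the caller as a stand-in for the CASPER factors.

```python
import numpy as np

def resample_label(t, labels, classes, det_scores, context_logp, rng):
    """One Gibbs step for candidate t: resample l_t proportional to the detector
    score D_I(c) times the context factors that touch t.

    context_logp(t, c, labels): caller-supplied log of the context factors when l_t = c."""
    logp = np.array([np.log(det_scores[t][c]) + context_logp(t, c, labels)
                     for c in classes])
    p = np.exp(logp - logp.max())
    labels[t] = classes[rng.choice(len(classes), p=p / p.sum())]
    return labels
```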
Possible Datasets
Short Term Plan Learn the boosted detectors Determine our baseline performance Add Gibbs inference Submit to a conference that is far far away… ICML = Helsinki, Finland
Alternate Names Spatial Priors for Arbitrary Groups of Objects
Product of Experts – Precision Space View
P_1(x) = N(x; a, A), P_2(x) = N(x; b, B)
P_1(x) P_2(x) = Z N(x; c, C), with Z = N(a; b, A + B), C⁻¹ = A⁻¹ + B⁻¹, c = C(A⁻¹a + B⁻¹b)
What does this mean? The precision matrices of the experts ADD; even if each expert's precision is singular, the sum is still PSD
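A minimal numerical check of these identities, with toy means and covariances.

```python
import numpy as np
from scipy.stats import multivariate_normal

a, A = np.array([0.0, 0.0]), np.diag([1.0, 2.0])
b, B = np.array([1.0, 1.0]), np.diag([0.5, 0.5])

C = np.linalg.inv(np.linalg.inv(A) + np.linalg.inv(B))          # C^-1 = A^-1 + B^-1
c = C @ (np.linalg.inv(A) @ a + np.linalg.inv(B) @ b)           # c = C (A^-1 a + B^-1 b)
log_Z = multivariate_normal(mean=b, cov=A + B).logpdf(a)        # Z = N(a; b, A + B)

# Check the identity P1(x) P2(x) = Z N(x; c, C) at an arbitrary point.
x = np.array([0.3, -0.2])
lhs = (multivariate_normal(mean=a, cov=A).logpdf(x)
       + multivariate_normal(mean=b, cov=B).logpdf(x))
rhs = log_Z + multivariate_normal(mean=c, cov=C).logpdf(x)
assert np.isclose(lhs, rhs)
```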
CASPER Detection
Detection strength component: D_I(l_t) = P(l_t | I[ρ_t])
Occurrence component: P(l) = ∏_t P(l_t), l_t ~ Multinomial
CASPER component: P(ρ, d | l) ∝ POE_d