Learning Inhomogeneous Gibbs Models Ce Liu

Presentation transcript:

Learning Inhomogeneous Gibbs Models Ce Liu

How to Describe the Virtual World

Histogram  Histogram: marginal distribution of image variances  Non Gaussian distributed

Texture Synthesis (Heeger et al., 95)
- Image decomposition by steerable filters
- Histogram matching

FRAME (Zhu et al., 97)
- Homogeneous Markov random field (MRF)
- Minimax entropy principle to learn a homogeneous Gibbs distribution
- Gibbs sampling and feature selection

Our Problem
- To learn the distribution of structural signals
- Challenges:
  - How to learn non-Gaussian distributions in high dimensions from a small number of observations?
  - How to capture the sophisticated properties of the distribution?
  - How to optimize the parameters with global convergence?

Inhomogeneous Gibbs Models (IGM)
A framework to learn arbitrary high-dimensional distributions:
- 1D histograms on linear features to describe the high-dimensional distribution
- Maximum entropy principle: Gibbs distribution
- Minimum entropy principle: feature pursuit
- Markov chain Monte Carlo for parameter optimization
- Kullback-Leibler Feature (KLF)

1D Observation: Histograms
- Feature φ(x): R^d → R
  - Linear feature: φ(x) = ω^T x
  - Kernel distance: φ(x) = ||x - ω||
- Marginal distribution
- Histogram
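
As a concrete illustration (not from the slides), the 1D marginal histogram of a linear feature can be computed as in the sketch below; the row-wise data layout and the bin count are assumptions of this sketch.

```python
import numpy as np

def feature_histogram(X, w, n_bins=32, bounds=None):
    """Histogram of the linear feature phi(x) = w^T x over a sample set.

    X : (n_samples, d) array of observations
    w : (d,) projection direction
    """
    z = X @ w                                  # 1D feature responses
    if bounds is None:
        bounds = (z.min(), z.max())
    counts, edges = np.histogram(z, bins=n_bins, range=bounds)
    hist = counts / counts.sum()               # normalize to a discrete marginal
    return hist, edges
```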

Intuition

Learning Descriptive Models

- Sufficient features can make the learnt model f(x) converge to the underlying distribution p(x)
- Linear features and histograms are robust compared with other high-order statistics
- Descriptive models

Maximum Entropy Principle
- Maximum entropy model:
  - To generalize the statistical properties of the observed data
  - To make the learnt model carry no more information than what is available
- Mathematical formulation
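
The slide's mathematical formulation is not reproduced in the transcript; a standard way to write the constrained maximum-entropy problem, with the constraints taken over the chosen feature statistics (in IGM, the 1D histograms H_i of the feature responses), is:

```latex
\begin{aligned}
f^{*} = \arg\max_{f}\; & -\int f(\mathbf{x})\,\log f(\mathbf{x})\,d\mathbf{x} \\
\text{s.t.}\quad & \mathbb{E}_{f}\!\left[H_i\!\big(\phi_i(\mathbf{x})\big)\right]
  = \mathbb{E}_{\mathrm{obs}}\!\left[H_i\!\big(\phi_i(\mathbf{x})\big)\right],
  \qquad i = 1,\dots,K, \\
& \int f(\mathbf{x})\,d\mathbf{x} = 1 .
\end{aligned}
```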

Intuition of Maximum Entropy Principle

Inhomogeneous Gibbs Distribution
- Solution form of the maximum entropy model
- Parameter: Gibbs potential
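
The transcript omits the slide's equation; the Gibbs-form solution of the maximum-entropy problem above, written with 1D potentials applied to the feature responses, is usually stated as:

```latex
f(\mathbf{x};\Lambda) = \frac{1}{Z(\Lambda)}
  \exp\!\Big(-\sum_{i=1}^{K} \lambda_i\big(\phi_i(\mathbf{x})\big)\Big),
\qquad
Z(\Lambda) = \int \exp\!\Big(-\sum_{i=1}^{K} \lambda_i\big(\phi_i(\mathbf{x})\big)\Big)\,d\mathbf{x},
```

where Λ = {λ_1, …, λ_K} is the set of 1D Gibbs potential functions, one per feature.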

Estimating Potential Function
- Distribution form
- Normalization
- Maximum Likelihood Estimation (MLE)
- 1st- and 2nd-order derivatives
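
A sketch of the quantities the slide lists, under the convention of the previous equation and assuming each potential λ_i is discretized into a vector of bin weights, with H_i(x) the one-hot bin indicator of the response φ_i(x): the per-sample log-likelihood and its derivatives are

```latex
\ell(\Lambda) = -\sum_{i}\big\langle \lambda_i,\, H_i^{\mathrm{obs}} \big\rangle - \log Z(\Lambda),
\qquad
\frac{\partial \ell}{\partial \lambda_i}
  = \mathbb{E}_{f}\!\left[H_i(\mathbf{x})\right] - H_i^{\mathrm{obs}},
\qquad
\frac{\partial^2 \ell}{\partial \lambda_i \,\partial \lambda_j}
  = -\,\mathrm{Cov}_{f}\!\left(H_i, H_j\right).
```

Since the Hessian is a negated covariance matrix, the log-likelihood is concave in Λ, which is what makes globally convergent parameter optimization possible.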

Parameter Learning
- Monte Carlo integration
- Algorithm
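
A minimal sketch of the learning loop the slide outlines: the model expectation in the gradient is replaced by a Monte Carlo estimate computed from MCMC samples of the current model. `sample_model` and `features` are hypothetical helpers (an MCMC sampler such as the Gibbs sweep on the next slide, and per-feature binning functions), not code from the talk.

```python
import numpy as np

def learn_potentials(obs_hists, features, sample_model,
                     n_iters=100, step=0.1, n_sweeps=20):
    """Gradient ascent on the Gibbs potentials lambda_i.

    obs_hists    : list of (n_bins,) observed feature histograms
    features     : list of callables mapping a sample batch to bin indices
    sample_model : callable(lambdas, n_sweeps) -> batch of MCMC samples
                   drawn from the current model f(x; lambda)
    """
    lambdas = [np.zeros_like(h) for h in obs_hists]
    for _ in range(n_iters):
        X_syn = sample_model(lambdas, n_sweeps)
        for i, (h_obs, feat) in enumerate(zip(obs_hists, features)):
            bins = feat(X_syn)                              # bin index of phi_i(x) per sample
            h_syn = np.bincount(bins, minlength=len(h_obs)) / len(bins)
            # Monte Carlo estimate of the gradient: E_f[H_i] - H_i^obs
            lambdas[i] += step * (h_syn - h_obs)
    return lambdas
```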

Gibbs Sampling
[figure: 2D illustration of Gibbs sampling over coordinates x and y]
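
A generic sketch of one Gibbs-sampling sweep (not the talk's implementation): each coordinate is resampled from its conditional, approximated on a 1D grid of candidate values; `energy` and `grids` are assumptions of this sketch.

```python
import numpy as np

def gibbs_sweep(x, energy, grids, rng=None):
    """One Gibbs-sampling sweep over all coordinates of x.

    x      : (d,) current state
    energy : callable(x) -> scalar Gibbs energy, so p(x) ~ exp(-energy(x))
    grids  : list of d 1D arrays of candidate values per coordinate
    """
    rng = np.random.default_rng() if rng is None else rng
    x = x.copy()
    for j, grid in enumerate(grids):
        e = np.empty(len(grid))
        for k, v in enumerate(grid):       # conditional energy along coordinate j
            x[j] = v
            e[k] = energy(x)
        p = np.exp(-(e - e.min()))         # stabilized conditional probabilities
        p /= p.sum()
        x[j] = rng.choice(grid, p=p)       # draw the new value of coordinate j
    return x
```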

Minimum Entropy Principle
- Minimum entropy principle: to make the learnt distribution close to the observed one
- Feature selection
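
The link between "close to the observed" and "minimum entropy" can be made explicit with a standard minimax-entropy identity (not shown in the transcript): for a maximum-entropy model f that matches the selected observed statistics,

```latex
KL(p \,\|\, f) = \mathbb{E}_{p}[\log p] - \mathbb{E}_{p}[\log f] = -H(p) + H(f),
```

because log f is linear in the matched statistics, so E_p[log f] = E_f[log f] = -H(f). Choosing features to bring f close to p is therefore the same as minimizing the entropy H(f) of the learnt model.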

Feature Pursuit
- A greedy procedure to learn the feature set
- Reference model
- Approximate information gain

Proposition
The approximate information gain for a new feature is …, and the optimal energy function for this feature is …
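
The expressions themselves did not survive the transcript; in the minimax-entropy framework the proposition is usually stated as follows (a reconstruction, not the slide's own notation), with h^φ_obs and h^φ_f the observed and current-model marginal histograms of a candidate feature φ:

```latex
d(\phi) \;\approx\; KL\!\left(h^{\phi}_{\mathrm{obs}} \,\big\|\, h^{\phi}_{f}\right),
\qquad
\lambda_{\phi}(z) \;=\; \log \frac{h^{\phi}_{f}(z)}{h^{\phi}_{\mathrm{obs}}(z)},
```

so that adding the feature with this potential makes the updated model reproduce the observed marginal of φ.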

Kullback-Leibler Feature
- Kullback-Leibler Feature
- Pursue features by:
  - Hybrid Monte Carlo
  - Sequential 1D optimization
  - Feature selection
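
A minimal sketch of the selection step only, assuming the KL feature is the candidate direction whose observed and synthesized 1D marginals differ most; the hybrid Monte Carlo and sequential 1D optimization used to generate candidate directions are not sketched here.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL divergence between two histograms, smoothed to avoid log(0)."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def select_kl_feature(X_obs, X_syn, candidates, n_bins=32):
    """Pick the candidate direction whose 1D marginal differs most
    (in KL divergence) between observed and synthesized samples."""
    best_w, best_kl = None, -np.inf
    for w in candidates:                        # each w: (d,) unit direction
        z_obs, z_syn = X_obs @ w, X_syn @ w
        lo = min(z_obs.min(), z_syn.min())
        hi = max(z_obs.max(), z_syn.max())
        h_obs, _ = np.histogram(z_obs, bins=n_bins, range=(lo, hi))
        h_syn, _ = np.histogram(z_syn, bins=n_bins, range=(lo, hi))
        kl = kl_divergence(h_obs.astype(float), h_syn.astype(float))
        if kl > best_kl:
            best_w, best_kl = w, kl
    return best_w, best_kl
```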

Acceleration by Importance Sampling
- Gibbs sampling is too slow…
- Importance sampling by the reference model
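
A sketch of the reweighting idea, assuming unnormalized log-densities are available for both the current model and the reference model: samples drawn once from the reference model are reused, with self-normalized importance weights, instead of running a fresh Gibbs sampler at every update.

```python
import numpy as np

def importance_expectation(samples_ref, log_f, log_g, statistic):
    """Self-normalized importance-sampling estimate of E_f[statistic(x)].

    samples_ref  : (n, d) draws from the reference model g
    log_f, log_g : callables giving unnormalized log densities of the
                   target model f and the reference model g
    statistic    : callable(x) -> 1D array (e.g. a histogram indicator)
    """
    log_w = np.array([log_f(x) - log_g(x) for x in samples_ref])
    w = np.exp(log_w - log_w.max())            # stabilized importance weights
    w /= w.sum()
    stats = np.array([statistic(x) for x in samples_ref])
    return np.average(stats, axis=0, weights=w)
```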

Flowchart of IGM
[flowchart: IGM, Obs Samples, Obs Histograms, Feature Pursuit, KL Feature, MCMC, Syn Samples; decision KL < ε (Y/N); Output]

Toy Problems (1)
[figures for a mixture of two Gaussians and a circle: synthesized samples, Gibbs potential, observed histograms, synthesized histograms, feature pursuit]

Toy Problems (2) Swiss Roll

Applied to High Dimensions
- In high-dimensional space:
  - Too many features are needed to constrain every dimension
  - MCMC sampling is extremely slow
- Solution: dimension reduction by PCA
- Application: learning a face prior model
  - 83 landmarks defined to represent a face (166d)
  - 524 samples
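
A minimal PCA sketch of the kind of dimension reduction described, assuming the 524 landmark vectors are stacked as rows of a 524 x 166 matrix; the Gibbs model is then learnt in the reduced coordinates and samples are mapped back for display.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project data onto its top principal components before IGM learning.

    X : (n_samples, d) data matrix, e.g. 166-d face landmark vectors
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data gives the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components].T                 # (d, n_components) basis
    Y = Xc @ W                              # low-dimensional coordinates
    return Y, W, mean

def pca_reconstruct(Y, W, mean):
    """Map low-dimensional coordinates back to the original landmark space."""
    return Y @ W.T + mean
```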

Face Prior Learning (1)
[figures: observed face examples; synthesized face samples without any features]

Face Prior Learning (2)
[figures: synthesized with 10 features; synthesized with 20 features]

Face Prior Learning (3)
[figures: synthesized with 30 features; synthesized with 50 features]

Observed Histograms

Synthesized Histograms

Gibbs Potential Functions

Learning Caricature Exaggeration

Synthesis Results

Learning 2D Gibbs Process

Thank you! CSAIL