Computer vision: models, learning and inference. Chapter 8: Regression. ©2011 Simon J.D. Prince

Structure
– Linear regression
– Bayesian solution
– Non-linear regression
– Kernelization and Gaussian processes
– Sparse linear regression
– Dual linear regression
– Relevance vector regression
– Applications

Models for machine vision

Body Pose Regression
Encode the silhouette as a 100×1 vector and the body pose as a 55×1 vector, then learn the relationship between them.

Type 1: Model Pr(w|x) – Discriminative
How to model Pr(w|x)?
– Choose an appropriate form for Pr(w)
– Make the parameters a function of x
– The function takes parameters θ that define its shape
Learning algorithm: learn the parameters θ from training data x, w
Inference algorithm: just evaluate Pr(w|x)

Linear Regression
For simplicity we assume that each dimension of the world state is predicted separately, so we concentrate on predicting a univariate world state w.
– Choose a normal distribution over the world state w
– Make the mean a linear function of the data x
– Keep the variance constant
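As a concrete sketch (my own illustration, not code from the book), the following evaluates Pr(w|x) = Norm_w[φ0 + φ^T x, σ²] for made-up parameter values:

import numpy as np

# Hypothetical parameters chosen for illustration: offset phi0, gradient phi, variance sig2.
phi0, phi, sig2 = 0.5, np.array([2.0, -1.0]), 0.25

def pr_w_given_x(w, x):
    """Evaluate Pr(w|x) = Norm_w[phi0 + phi^T x, sig2] for the linear regression model."""
    mu = phi0 + phi @ x                      # mean is a linear function of the data x
    return np.exp(-0.5 * (w - mu) ** 2 / sig2) / np.sqrt(2 * np.pi * sig2)

print(pr_w_given_x(1.2, np.array([0.3, 0.7])))   # density of world state w = 1.2 for one datum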

Linear Regression

Neater Notation
To make the notation easier to handle, we
– attach a 1 to the start of every data vector x, and
– attach the offset to the start of the gradient vector φ.
New model: Pr(w_i|x_i) = Norm_{w_i}[φ^T x_i, σ²]

Combining Equations
We have one equation for each (x, w) pair: Pr(w_i|x_i) = Norm_{w_i}[φ^T x_i, σ²].
The likelihood of the whole dataset is the product of these individual distributions and can be written as Pr(w|X) = Norm_w[X^T φ, σ² I], where X = [x_1, x_2, …, x_I] contains the data vectors as columns and w = [w_1, …, w_I]^T stacks the world states.

Learning
Maximum likelihood: substitute the normal likelihood into the log likelihood, take the derivative with respect to the parameters, set the result to zero and re-arrange. This gives
φ = (X X^T)^{-1} X w
σ² = (w − X^T φ)^T (w − X^T φ) / I
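A minimal numpy sketch of these maximum-likelihood estimates (assuming, as on the slides, that X stores one augmented data vector per column and w stacks the world states; the synthetic data below are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data: I examples of original dimension D.
I, D = 50, 3
X0 = rng.normal(size=(D, I))                     # raw data, one column per example
X = np.vstack([np.ones((1, I)), X0])             # attach a 1 to the start of every data vector
phi_true = np.array([0.5, 2.0, -1.0, 0.3])
w = X.T @ phi_true + 0.1 * rng.normal(size=I)    # world states with additive Gaussian noise

# Maximum-likelihood solution: phi = (X X^T)^{-1} X w,  sig2 = ||w - X^T phi||^2 / I
phi = np.linalg.solve(X @ X.T, X @ w)
sig2 = np.sum((w - X.T @ phi) ** 2) / I
print(phi, sig2)                                 # close to phi_true and to 0.1**2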

Regression Models

Structure (recap – next: Bayesian solution)

Bayesian Regression
Likelihood: Pr(w|X, φ) = Norm_w[X^T φ, σ² I]
Prior: Pr(φ) = Norm_φ[0, σ_p² I]
(We concentrate on φ – we come back to σ² later!)
Bayes' rule: Pr(φ|X, w) = Pr(w|X, φ) Pr(φ) / Pr(w|X)

Posterior Distribution over Parameters
Pr(φ|X, w) = Norm_φ[(1/σ²) A^{-1} X w, A^{-1}], where A = (1/σ²) X X^T + (1/σ_p²) I.
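A small self-contained sketch of this posterior computation in numpy (the data and the values of σ² and the prior variance σ_p² below are made up for illustration):

import numpy as np

def posterior_over_parameters(X, w, sig2, sig_p2):
    """Posterior Pr(phi | X, w) = Norm_phi[(1/sig2) A^{-1} X w, A^{-1}],
    with A = (1/sig2) X X^T + (1/sig_p2) I."""
    D = X.shape[0]
    A = X @ X.T / sig2 + np.eye(D) / sig_p2
    A_inv = np.linalg.inv(A)
    mu = A_inv @ X @ w / sig2
    return mu, A_inv

# Tiny usage example with made-up data (3 augmented dimensions, 5 examples):
X = np.array([[1., 1., 1., 1., 1.],
              [0., 1., 2., 3., 4.],
              [1., 0., 1., 0., 1.]])
w = np.array([0.1, 1.2, 1.9, 3.1, 4.0])
mu, cov = posterior_over_parameters(X, w, sig2=0.25, sig_p2=10.0)
print(mu)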

Inference
For a new observation x*, the predictive distribution averages the predictions over the posterior on φ:
Pr(w*|x*, X, w) = ∫ Pr(w*|x*, φ) Pr(φ|X, w) dφ = Norm_{w*}[(1/σ²) x*^T A^{-1} X w, x*^T A^{-1} x* + σ²]

Practical Issue
Problem: in high dimensions, the D×D matrix A may be too big to invert.
Solution: re-express A^{-1} using the matrix inversion lemma. In the final expression the inverses are (I × I), not (D × D).
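A quick numerical sanity check of this re-expression (a sketch with sizes I chose; the identity used is the standard Woodbury/matrix-inversion lemma): inverting an I×I matrix gives the same A^{-1} as inverting the D×D matrix directly.

import numpy as np

rng = np.random.default_rng(1)
D, I = 200, 20                         # many dimensions, few examples
X = rng.normal(size=(D, I))
sig2, sig_p2 = 0.5, 2.0

# Direct route: invert the D x D matrix A = (1/sig2) X X^T + (1/sig_p2) I_D.
A_inv_direct = np.linalg.inv(X @ X.T / sig2 + np.eye(D) / sig_p2)

# Matrix inversion lemma route: only an I x I inverse is needed.
inner = np.linalg.inv(X.T @ X * sig_p2 + sig2 * np.eye(I))
A_inv_lemma = sig_p2 * np.eye(D) - sig_p2 ** 2 * X @ inner @ X.T

print(np.allclose(A_inv_direct, A_inv_lemma))   # True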

Fitting Variance
We fit the variance σ² with maximum likelihood: we optimize the marginal likelihood (the likelihood after the gradient parameters φ have been integrated out).

Structure (recap – next: Non-linear regression)

Regression Models

Non-Linear Regression
GOAL: keep the math of linear regression, but extend it to more general functions.
KEY IDEA: you can build a non-linear function from a linear weighted sum of non-linear basis functions.

Non-linear regression
Linear regression: Pr(w_i|x_i) = Norm_{w_i}[φ^T x_i, σ²]
Non-linear regression: Pr(w_i|x_i) = Norm_{w_i}[φ^T z_i, σ²], where z_i = f[x_i].
In other words, create z by evaluating x against the basis functions, then linearly regress against z.
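A sketch of this recipe with radial basis functions (the centres, width, and 1-D data below are my own choices for illustration, not the book's example):

import numpy as np

rng = np.random.default_rng(2)

# 1-D training data from a non-linear function.
x = np.linspace(-3, 3, 40)
w = np.sin(x) + 0.1 * rng.normal(size=x.size)

# Basis expansion z = f[x]: a constant plus radial basis functions at chosen centres.
centres, lam = np.linspace(-3, 3, 7), 1.0
def basis(x):
    rbf = np.exp(-(x[None, :] - centres[:, None]) ** 2 / (2 * lam ** 2))
    return np.vstack([np.ones((1, x.size)), rbf])          # shape (1 + n_centres, I)

Z = basis(x)

# Then linearly regress against z, exactly as before: phi = (Z Z^T)^{-1} Z w.
phi = np.linalg.solve(Z @ Z.T, Z @ w)
pred = basis(np.array([0.5])).T @ phi
print(pred)                                                 # prediction near sin(0.5)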

Example: polynomial regression
A special case of the model above in which the basis functions are powers of x, e.g. z_i = [1, x_i, x_i², x_i³]^T.

Radial basis functions

Arc tan functions

Maximum Likelihood
Same as linear regression, but substituting Z for X:
φ = (Z Z^T)^{-1} Z w,  σ² = (w − Z^T φ)^T (w − Z^T φ) / I

Structure (recap – next: Kernelization and Gaussian processes)

Regression Models

Bayesian Approach
Learn σ² from the marginal likelihood as before.
Final predictive distribution:
Pr(w*|x*, X, w) = Norm_{w*}[(1/σ²) z*^T A^{-1} Z w, z*^T A^{-1} z* + σ²], where A = (1/σ²) Z Z^T + (1/σ_p²) I.

The Kernel Trick
Notice that the final equation doesn't need the data itself, but only dot products between data items of the form z_i^T z_j. So we take data points x_i and x_j, pass them through the non-linear function to create z_i and z_j, and then take dot products z_i^T z_j.

The Kernel Trick
Key idea: define a "kernel" function k[x_i, x_j] that does all of this in one step:
– it takes the data points x_i and x_j, and
– returns the value of the dot product z_i^T z_j.
If we choose this function carefully, it corresponds to some underlying transformation z = f[x], but we never compute z explicitly – it can be very high or even infinite dimensional.
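A small sketch of the idea with a kernel I chose for illustration (not necessarily the book's example): the second-order polynomial kernel k[x_i, x_j] = (1 + x_i^T x_j)² is exactly the dot product z_i^T z_j of an explicit quadratic feature map, so the kernel can be evaluated without ever forming z.

import numpy as np

def kernel(xi, xj):
    """Second-order polynomial kernel: returns z_i^T z_j without computing z."""
    return (1.0 + xi @ xj) ** 2

def explicit_z(x):
    """The underlying (here, finite-dimensional) feature map z = f[x] for 2-D inputs."""
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

xi, xj = np.array([0.3, -1.2]), np.array([2.0, 0.5])
print(kernel(xi, xj), explicit_z(xi) @ explicit_z(xj))    # identical values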

Gaussian Process Regression
Before: the predictive distribution is written using the transformed data Z explicitly.
After: it can be rewritten so that the transformed data appear only through inner products, which are then replaced by kernel evaluations k[x_i, x_j]. The kernelized Bayesian model is known as Gaussian process regression.

Example Kernels
For instance, the RBF (Gaussian) kernel is equivalent to having an infinite number of radial basis functions, one at every position in space. Wow!
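A compact sketch of Gaussian process regression with an RBF kernel. This uses the common textbook parameterization in which the noise variance is added to the kernel matrix, which may differ slightly from the book's exact expressions; data and hyperparameters are made up for illustration.

import numpy as np

def rbf_kernel(A, B, lam=1.0):
    """RBF kernel matrix between 1-D input arrays A and B."""
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * lam ** 2))

rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 30)
w = np.sin(x) + 0.1 * rng.normal(size=x.size)
sig2 = 0.01                                        # observation noise variance (assumed known here)

K = rbf_kernel(x, x)
x_star = np.array([0.0, 1.5])
k_star = rbf_kernel(x_star, x)

# Predictive mean and variance for the new points x_star.
alpha = np.linalg.solve(K + sig2 * np.eye(x.size), w)
mean = k_star @ alpha
var = rbf_kernel(x_star, x_star).diagonal() + sig2 \
      - np.einsum('ij,ji->i', k_star, np.linalg.solve(K + sig2 * np.eye(x.size), k_star.T))
print(mean, var)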

RBF Kernel Fits

Fitting Variance
We fit the variance σ² with maximum likelihood, optimizing the marginal likelihood (the likelihood after the gradients have been integrated out). Here we have to use non-linear optimization.

Structure (recap – next: Sparse linear regression)

Regression Models

Sparse Linear Regression
Perhaps not every dimension of the data x is informative. A sparse solution forces some of the coefficients in φ to be zero.
Method:
– apply a different prior on φ that encourages sparsity
– a product of one-dimensional t-distributions

Sparse Linear Regression
Apply a product of t-distributions as the prior on the parameter vector φ:
Pr(φ) = ∏_d Stud_{φ_d}[0, 1, ν]
As before, we use the normal likelihood. Now the prior is not conjugate to the normal likelihood, so we cannot compute the posterior in closed form.

Sparse Linear Regression
To make progress, write each t-distribution as the marginal of a joint distribution with a hidden variable h_d:
Pr(φ_d) = ∫ Norm_{φ_d}[0, 1/h_d] Gam_{h_d}[ν/2, ν/2] dh_d
so that Pr(φ) = ∫ Norm_φ[0, H^{-1}] ∏_d Gam_{h_d}[ν/2, ν/2] dh, where H is a diagonal matrix with the hidden variables {h_d} on its diagonal.
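A quick Monte Carlo check of this scale-mixture representation (my own sketch; ν and the sample size are arbitrary): sampling h from a Gamma distribution and then φ from Norm[0, 1/h] reproduces a Student-t distribution.

import numpy as np

rng = np.random.default_rng(4)
nu, n = 5.0, 200_000

# Scale mixture: h ~ Gam[nu/2, nu/2] (shape nu/2, rate nu/2), then phi | h ~ Norm[0, 1/h].
h = rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)
phi_mixture = rng.normal(0.0, 1.0 / np.sqrt(h))

# Direct samples from a Student-t with nu degrees of freedom.
phi_t = rng.standard_t(nu, size=n)

# The two sample sets have matching quantiles (up to Monte Carlo error).
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(phi_mixture, qs))
print(np.quantile(phi_t, qs))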

Sparse Linear Regression
Substituting in this prior, we still cannot compute the posterior exactly, but we can approximate it.

Sparse Linear Regression
To fit the model, we alternate between two updates: one chooses the hidden variables {h_d} and the other chooses the variance σ².

Sparse Linear Regression
After fitting, some of the hidden variables become very large. This implies that the corresponding prior is tightly fitted around zero, so that dimension can be eliminated from the model.

Sparse Linear Regression
This doesn't work for the non-linear case: we need one hidden variable per dimension, which becomes intractable with a high-dimensional transformation. To solve this problem, we move to the dual model.

Structure (recap – next: Dual linear regression)

Dual Linear Regression
KEY IDEA: the gradient φ is just a vector in the data space, so it can be represented as a weighted sum of the data points: φ = Xψ.
Now solve for ψ instead: one parameter per training example.

Dual Linear Regression
Original linear regression: Pr(w|X) = Norm_w[X^T φ, σ² I]
Dual variables: φ = Xψ
Dual linear regression: Pr(w|X) = Norm_w[X^T X ψ, σ² I]

Maximum Likelihood
Maximum likelihood solution for the dual variables: ψ = (X^T X)^{-1} w.
Substituting back through φ = Xψ gives the same result as before.
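A sketch in numpy (my own illustration, reusing the column-wise data convention from the earlier examples) showing that the dual solution reproduces the primal maximum-likelihood gradient when both are defined:

import numpy as np

rng = np.random.default_rng(5)
D, I = 4, 4          # X^T X and X X^T must both be invertible for this comparison, so D = I here
X = rng.normal(size=(D, I))
w = rng.normal(size=I)

phi_primal = np.linalg.solve(X @ X.T, X @ w)          # phi = (X X^T)^{-1} X w
psi = np.linalg.solve(X.T @ X, w)                     # dual variables: psi = (X^T X)^{-1} w
phi_dual = X @ psi                                    # phi = X psi

print(np.allclose(phi_primal, phi_dual))              # True: same gradient vector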

Bayesian case
Compute the posterior distribution over the dual parameters ψ using Bayes' rule with a normal prior, exactly as in the primal case; the result is again a normal distribution, whose mean and covariance depend on the data only through X^T X.

Bayesian case
The predictive distribution follows as before. Notice that both the maximum likelihood and the Bayesian solutions depend on the data only through dot products X^T X, so they can be kernelized!

Structure (recap – next: Relevance vector regression)

Regression Models

Relevance Vector Machine
Combines two ideas:
– dual regression (one parameter per training example)
– sparsity (most of the parameters are zero)
i.e., a model that depends only sparsely on the training data.

Relevance Vector Machine
Using the same approximations as for the sparse model, we get an analogous optimization problem. To solve it, update the variance σ² and the hidden variables {h_d} alternately. Notice that this only depends on dot products, so it can be kernelized.

Structure (recap – next: Applications)

Body Pose Regression (Agarwal and Triggs, 2006)
Encode the silhouette as a 100×1 vector and the body pose as a 55×1 vector, then learn the relationship between them.

Shape Context
Returns a 60×1 descriptor vector for each of 400 points sampled around the silhouette.

Dimensionality Reduction
– Cluster the 60-D descriptor space (based on all training data) into 100 cluster centres.
– Assign each 60×1 descriptor to its closest cluster (a Voronoi partition).
– The final data vector is a 100×1 histogram over the distribution of assignments.
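A minimal sketch of this vector-quantization step (a hand-rolled k-means on random stand-in descriptors, not the authors' pipeline; only the sizes match the slide):

import numpy as np

rng = np.random.default_rng(6)

descriptors = rng.normal(size=(400, 60))        # 400 shape-context descriptors, 60-D each
K = 100                                         # number of clusters / histogram bins

# Simple k-means on the descriptors (in practice the centres come from all training data).
centres = descriptors[rng.choice(400, K, replace=False)]
for _ in range(10):
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)                  # closest cluster = Voronoi partition
    for k in range(K):
        if np.any(assign == k):
            centres[k] = descriptors[assign == k].mean(axis=0)

# Final 100x1 data vector: histogram of assignments, normalized.
hist = np.bincount(assign, minlength=K) / assign.size
print(hist.shape, hist.sum())                   # (100,) 1.0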

Results
With 2636 training examples, the solution depends on only 6% of them, and achieves an average error of 6 degrees.

Displacement experts

Regression
Regression is not actually used that much in vision, but the main ideas all apply to classification:
– non-linear transformations
– kernelization
– dual parameters
– sparse priors