Cognitive Computer Vision Kingsley Sage and Hilary Buxton Prepared under ECVision Specific Action 8-3
Lecture 4
A family of graphical models
We will see some example models:
– Mixture models
– Factor analysis
– Hidden Markov Models
– Dynamic Bayesian Networks
– Coupled Hidden Markov Models
Inference and Learning
So why are graphical models relevant to Cognitive CV? Precisely because they allow us to see different methods as instances of a broader probabilistic framework. These methods are the basis for our model of perception guided by expectation. We can put our model of expectation on a solid theoretical foundation. We can develop well-founded methods of learning rather than just being stuck with hand-coded models.
Reminder from previous lecture … A probabilistic graphical model is a type of probabilistic network that has roots in AI, statistics and neural networks. It provides a clean mathematical formalism that makes it possible to understand the relationships between a wide variety of network-based approaches to computation. It allows us to see different methods as instances of a broader probabilistic framework.
A taxonomy of graphical models © Julie Vogel
Notation - reminder
– Squares denote discrete nodes
– Circles denote continuous-valued nodes
– Clear denotes a hidden node
– Shaded denotes an observed node
[Figure: example graph with nodes A, B and C]
Mixture model as a graphical model
Each data point is drawn from one of a fixed set of classes, but the class label for each data point is ‘missing’
X = class label x (hidden)
Y = observed data point
P(X=x,Y=y) = P(X=x).P(Y=y|X=x)
Learning problem is to find P(Y|X)
Inference problem is P(X=x|Y=y)
Let’s try to put this in a vision context …
[Figure: graphical model X → Y, with hidden node X and observed node Y]
Mixture model as a graphical model
Let’s say our class label X has 4 possible values (wall, bush, sign, road)
Each pixel in the image has a grey level y
But we don’t know which class each pixel belongs to
P(X=x,Y=y) = P(X=x).P(Y=y|X=x)
Learning problem is to find P(Y|X)
Inference problem is P(X=x|Y=y)
We haven’t said how to learn P(Y|X) yet
Mixture model as a graphical model
But we might have a specification for P(Y|X)
[Figure: p(y|x) plotted against grey level y, one curve per class: x=sign, x=bush, x=road, x=wall]
Mixture model as a graphical model
How is this a generative model? From lecture 2:
– Estimate P(X) somehow
– Calculate P(X,Y) for the training data (the set of pixels in our image) using P(X,Y) = P(Y|X).P(X) from the graphical model
– P(Y|X) is given on the previous slide; we didn’t say how we learned it
– Calculate P(X|Y) using Bayes Rule. This would assign each pixel to a class based on its grey level (classifier mode)
To use the model generatively, sample a class from P(X) and then a grey level from P(Y|X), i.e. sample from P(X,Y), although this would produce a set of pixels without any spatial structure!
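The classifier and generative modes above can be sketched in a few lines. This is only an illustration: it assumes each class-conditional density P(Y|X) is a 1-D Gaussian over grey level, and the priors, means and standard deviations are made-up values, not figures from the lecture.

```python
import math
import random

# Hypothetical parameters: illustrative values, not taken from the lecture.
PRIOR = {"wall": 0.3, "bush": 0.2, "sign": 0.1, "road": 0.4}   # P(X)
LIKELIHOOD = {                                                 # (mean, std) of P(Y|X)
    "wall": (200.0, 15.0),
    "bush": (80.0, 20.0),
    "sign": (240.0, 5.0),
    "road": (120.0, 10.0),
}


def gaussian_pdf(y, mean, std):
    """Density of a 1-D Gaussian at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(y):
    """P(X=x | Y=y) for every class x, via Bayes Rule."""
    joint = {x: PRIOR[x] * gaussian_pdf(y, *LIKELIHOOD[x]) for x in PRIOR}
    evidence = sum(joint.values())  # P(Y=y), the normalising constant
    return {x: p / evidence for x, p in joint.items()}


def classify(y):
    """Classifier mode: most probable class for grey level y."""
    post = posterior(y)
    return max(post, key=post.get)


def sample_pixel(rng):
    """Generative mode: draw x from P(X), then y from P(Y|X=x)."""
    x = rng.choices(list(PRIOR), weights=list(PRIOR.values()))[0]
    mean, std = LIKELIHOOD[x]
    return x, rng.gauss(mean, std)
```

Calling `classify(y)` on each pixel independently gives the classifier mode; calling `sample_pixel(random.Random(0))` repeatedly gives grey levels with the right statistics but, as noted above, no spatial structure.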
Factor analysis as graphical model
In fact, a model of this form in which the underlying variable X is continuous (e.g. with a Gaussian or normal distribution) is actually factor analysis
Factor analysis as graphical model
Factor analysis represents a high-dimensional vector Y as a linear combination of low-dimensional features X
[Figure: p(y|x) plotted against y, with weights labelled w]
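A minimal generative sketch of this idea, assuming the standard factor analysis form y = W x + mu + noise with a standard-normal latent x and independent Gaussian noise per observed dimension. The loading matrix `W`, mean `MU` and noise levels `NOISE_STD` are hypothetical values chosen only for illustration.

```python
import random

# Hypothetical model: 3 observed dimensions, 1 latent factor.
# All values are illustrative assumptions, not from the lecture.
W = [[2.0], [-1.0], [0.5]]      # loading matrix (observed dims x latent dims)
MU = [10.0, 0.0, 5.0]           # mean of the observed vector
NOISE_STD = [0.1, 0.1, 0.1]     # independent noise std per observed dim


def sample_factor_analysis(rng):
    """Draw latent x ~ N(0, I), then y = W x + mu + independent Gaussian noise."""
    k = len(W[0])
    x = [rng.gauss(0.0, 1.0) for _ in range(k)]
    y = [
        MU[i] + sum(W[i][j] * x[j] for j in range(k)) + rng.gauss(0.0, NOISE_STD[i])
        for i in range(len(W))
    ]
    return x, y
```

Each observed component of y is a weighted sum of the low-dimensional x plus noise, which is the "linear combination" referred to above.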
State space models
State space models “roll out” their structure over time
The graphical model shows which variable sets dictate how the model parameters will change over time
– Hidden Markov Models
– Dynamic Bayesian Networks
– Kalman Filter Models
– Coupled Hidden Markov Models
Hidden Markov Models
[Figure: chain of hidden states X1, X2, X3, …, XT, each emitting an observation Y1, Y2, Y3, …, YT]
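The chain structure of an HMM corresponds to the factorisation P(X1..XT, Y1..YT) = P(X1).P(Y1|X1) multiplied by P(Xt|Xt-1).P(Yt|Xt) for t = 2..T. The sketch below evaluates that product for a hypothetical two-state model; the state names, observation symbols and all probabilities are assumptions made up for illustration.

```python
# Hypothetical two-state HMM over discrete observations.
# All numbers are illustrative assumptions, not from the lecture.
INITIAL = {"still": 0.6, "moving": 0.4}                    # P(X_1)
TRANSITION = {                                             # P(X_t | X_{t-1})
    "still": {"still": 0.7, "moving": 0.3},
    "moving": {"still": 0.2, "moving": 0.8},
}
EMISSION = {                                               # P(Y_t | X_t)
    "still": {"low": 0.9, "high": 0.1},
    "moving": {"low": 0.3, "high": 0.7},
}


def joint_probability(states, observations):
    """P(X_1..T = states, Y_1..T = observations) from the HMM factorisation."""
    if len(states) != len(observations) or not states:
        raise ValueError("need equal-length, non-empty sequences")
    # First time step: initial state probability times its emission.
    prob = INITIAL[states[0]] * EMISSION[states[0]][observations[0]]
    # Later time steps: one transition and one emission per step.
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        prob *= TRANSITION[prev][cur] * EMISSION[cur][obs]
    return prob
```

For example, `joint_probability(["still", "moving"], ["low", "high"])` multiplies 0.6, 0.9, 0.3 and 0.7.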
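Dynamic Bayesian Networks
DBNs have more than one (here up to N) underlying model X of behaviour and can switch from model to model over time
[Figure: observations Y1, Y2, Y3, …, YT, each attached to a hidden node X_t^n, where t is the time step (1..T) and n indexes the underlying model (1..N)]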
Coupled Hidden Markov Models
[Figure: coupled hidden chains for observation variable 1 and observation variable 2] (Oliver, Rosario & Pentland, 1999)
Coupled Hidden Markov Models
For a coupled model, the timescales associated with each underlying hidden model do not need to be the same
Useful for coupling multi-modal signals such as audio and video
Inference and Learning
Inference
– Calculating a probability over a set of nodes given the values of other nodes
– Estimating the value of hidden nodes given the values of the observed nodes
– Computing statistical and information theoretic quantities like fit between model and observed data (likelihood), mutual information …
Exact and approximate methods
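As one concrete instance of exact inference, the forward algorithm for an HMM returns both the likelihood of an observation sequence and the posterior over the final hidden state. The sketch below uses a hypothetical two-state model; every name and number in it is an assumption chosen for illustration.

```python
# Hypothetical two-state HMM; all numbers are illustrative assumptions.
INITIAL = {"still": 0.6, "moving": 0.4}                    # P(X_1)
TRANSITION = {                                             # P(X_t | X_{t-1})
    "still": {"still": 0.7, "moving": 0.3},
    "moving": {"still": 0.2, "moving": 0.8},
}
EMISSION = {                                               # P(Y_t | X_t)
    "still": {"low": 0.9, "high": 0.1},
    "moving": {"low": 0.3, "high": 0.7},
}


def forward(observations):
    """Return (P(Y_1..T), P(X_T | Y_1..T)) using the forward recursion."""
    if not observations:
        raise ValueError("need at least one observation")
    # alpha[s] = P(Y_1..t, X_t = s), initialised at t = 1.
    alpha = {s: INITIAL[s] * EMISSION[s][observations[0]] for s in INITIAL}
    for obs in observations[1:]:
        # Sum over the previous state, then weight by the new emission.
        alpha = {
            cur: EMISSION[cur][obs]
            * sum(alpha[prev] * TRANSITION[prev][cur] for prev in alpha)
            for cur in INITIAL
        }
    likelihood = sum(alpha.values())
    posterior = {s: a / likelihood for s, a in alpha.items()}
    return likelihood, posterior
```

This version works with raw probabilities, so it will underflow on long sequences; a practical implementation would rescale `alpha` at each step or work in log space.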
Inference and Learning
Learning
– Learn parameters and/or structure from data
– Maximise fit between model and observed data
– Fixed model structure - discover best parameter set
– Learn structure using information criteria (“scoring”)
Structure   Full observability   Partial observability
Known       Closed form          Expectation Maximisation (EM)
Unknown     Local search         Structural EM
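The "known structure, partial observability" cell of the table can be illustrated with EM for a 1-D Gaussian mixture, where the class label of each data point is hidden. This is a minimal sketch under stated assumptions: Gaussian components, user-supplied starting parameters, a fixed iteration count and a floor `min_std` on the standard deviation are all choices made here, not details from the lecture.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a 1-D Gaussian at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def em_gmm_1d(data, means, stds, weights, iterations=50, min_std=1e-3):
    """Fit a 1-D Gaussian mixture by EM, starting from the given parameters."""
    means, stds, weights = list(means), list(stds), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibilities resp[n][j] = P(X=j | Y=y_n) under current parameters.
        resp = []
        for y in data:
            joint = [weights[j] * gaussian_pdf(y, means[j], stds[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate each component from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * y for r, y in zip(resp, data)) / nj
            var = sum(r[j] * (y - means[j]) ** 2 for r, y in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), min_std)  # floor avoids a collapsed component
    return means, stds, weights
```

A call such as `em_gmm_1d(grey_levels, [40.0, 160.0], [20.0, 20.0], [0.5, 0.5])` would return updated means, standard deviations and mixing weights, where `grey_levels` is a hypothetical list of pixel values. The sketch does not guard against a component that receives no responsibility at all, and EM in general only finds a local optimum that depends on the starting point.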
Summary
Graphical models can be seen as a family of models that allow different vision problems to be solved
For example, a mixture model is a directed Bayes net which could be used to classify pixels
State space models, for example the HMM, show how variable sets determine model parameters over time
Next time … Bayes Rule and Bayesian Networks A lot of excellent reference material on graphical models can be found at: