1
A Unifying Review of Linear Gaussian Models
Summary Presentation 2/15/10 – Dae Il Kim Department of Computer Science Graduate Student Advisor: Erik Sudderth Ph.D.
2
Overview
Introduce the Basic Model
Discrete Time Linear Dynamical System (Kalman Filter)
Some nice properties of Gaussian distributions
Graphical Model: Static Model (Factor Analysis, PCA, SPCA)
Learning & Inference: Static Model
Graphical Model: Gaussian Mixture & Vector Quantization
Learning & Inference: GMMs & Quantization
Graphical Model: Discrete-State Dynamic Model (HMMs)
Independent Component Analysis
Conclusion
3
The Basic Model
Basic Model: Discrete Time Linear Dynamical System (Kalman Filter)
Generative Model with Additive Gaussian Noise:
x_{t+1} = A x_t + w., w. ~ N(0, Q)
y_t = C x_t + v., v. ~ N(0, R)
A = k x k state transition matrix
C = p x k observation / generative matrix
Variations of this model produce:
Factor Analysis
Principal Component Analysis
Mixtures of Gaussians
Vector Quantization
Independent Component Analysis
Hidden Markov Models
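The generative process of the basic model can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the standard form x_{t+1} = A x_t + w., y_t = C x_t + v. with zero-mean Gaussian noise of covariance Q and R; the function name `simulate_lds` and all numeric values are hypothetical choices, not taken from the slides.

```python
import numpy as np

def simulate_lds(A, C, Q, R, x0, T, seed=0):
    """Sample T steps of x_{t+1} = A x_t + w, y_t = C x_t + v (assumed form)."""
    rng = np.random.default_rng(seed)
    k, p = A.shape[0], C.shape[0]
    xs = np.zeros((T, k))  # hidden states
    ys = np.zeros((T, p))  # observations
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        xs[t] = x
        # Observation: project the state through C and add noise v ~ N(0, R).
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)
        # Dynamics: push the state through A and add noise w ~ N(0, Q).
        x = A @ x + rng.multivariate_normal(np.zeros(k), Q)
    return xs, ys

# Hypothetical example with k = 2 hidden dimensions and p = 3 observed dimensions.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Q = 0.1 * np.eye(2)
R = 0.05 * np.eye(3)
xs, ys = simulate_lds(A, C, Q, R, x0=np.zeros(2), T=50)
```

Setting A to zero removes the time dependence, which is how the static models on the later slides arise from the same code path.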
4
Nice Properties of Gaussians
Conditional Independence
Markov Property
Inference in these models
Learning via Expectation Maximization (EM)
5
Graphical Model for Static Models
Generative Model with Additive Gaussian Noise
Intuition: White noise generates a spherical ball (Q = I) of density in the k-dimensional latent space. The matrix C stretches and rotates this ball into the p-dimensional observation space, where it looks like a k-dimensional pancake. The pancake is then convolved with the density of the observation noise v., whose covariance is R, to give the final model for y.
Factor Analysis: Q = I and R is diagonal
SPCA: Q = I and R = αI
PCA: Q = I and R = lim_{ε→0} εI
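The "pancake plus noise" picture implies that y has covariance C C^T + R when Q = I. The sketch below checks this numerically for a diagonal R (factor analysis) and an isotropic R (SPCA). The helper `sample_static`, the loading matrix, and the noise levels are hypothetical values chosen only for illustration.

```python
import numpy as np

def sample_static(C, R, n, seed=0):
    """Draw n samples of y = C x + v with x ~ N(0, I) and v ~ N(0, R)."""
    rng = np.random.default_rng(seed)
    p, k = C.shape
    x = rng.standard_normal((n, k))                      # spherical latent ball, Q = I
    v = rng.multivariate_normal(np.zeros(p), R, size=n)  # observation noise
    return x @ C.T + v

C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # hypothetical 3 x 2 loading matrix
noise = {
    "factor_analysis": np.diag([0.1, 0.2, 0.3]),  # R diagonal
    "spca": 0.2 * np.eye(3),                      # R = alpha * I
}
max_errors = {}
for name, R in noise.items():
    y = sample_static(C, R, n=200_000)
    model_cov = C @ C.T + R               # covariance implied by the model
    sample_cov = np.cov(y, rowvar=False)  # covariance estimated from samples
    max_errors[name] = np.abs(sample_cov - model_cov).max()
```

PCA is the limit in which the isotropic noise level goes to zero, so it cannot be sampled with a strictly positive R; a very small alpha gives an approximation.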
6
Example of the generative process for PCA
Figure (Bishop, 2006): a 1-dimensional latent space (Z = latent variable) is mapped into a 2-dimensional observation space (X = observed variable), shown together with the marginal distribution p(x).
7
Learning & Inference: Static Models
Analytically integrating over the joint, we obtain the marginal distribution of y: y. ~ N(0, C Q C^T + R).
We can calculate our posterior using Bayes' rule. The posterior is another Gaussian: P(x.|y.) = N(β y., Q − β C Q), where β = Q C^T (C Q C^T + R)^{-1}.
Note: Filtering and smoothing reduce to the same problem in the static model since the time dependence is gone. We want to find P(x.|y.) over a single hidden state given the single observation. Inference can be performed simply by linear matrix projection, and the result is also Gaussian.
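Inference by linear projection can be sketched as follows, assuming the posterior takes the standard form P(x.|y.) = N(β y., Q − β C Q) with β = Q C^T (C Q C^T + R)^{-1}. The function name `static_posterior` and the example matrices are hypothetical.

```python
import numpy as np

def static_posterior(y, C, R, Q=None):
    """Posterior mean and covariance of x given y in the static model."""
    k = C.shape[1]
    Q = np.eye(k) if Q is None else Q
    S = C @ Q @ C.T + R                 # marginal covariance of y
    # beta = Q C^T S^{-1}; solve a linear system instead of forming S^{-1}.
    # The transpose is valid because S and Q are symmetric.
    beta = np.linalg.solve(S, C @ Q).T
    mean = beta @ y                     # linear projection of the observation
    cov = Q - beta @ C @ Q              # does not depend on y
    return mean, cov

# Hypothetical factor-analysis example: k = 2, p = 3, diagonal R.
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
R = np.diag([0.1, 0.2, 0.3])
y = np.array([0.5, -1.0, 0.2])
mean, cov = static_posterior(y, C, R)
```

The posterior covariance is the same for every observation, so in an EM loop it only needs to be computed once per iteration.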
8
Graphical Model: Gaussian Mixture Models & Vector Quantization
Generative Model with Additive Gaussian Noise:
x. = WTA[w.], w. ~ N(μ, Q)
y. = C x. + v., v. ~ N(0, R)
Winner Takes All: WTA[x] is a new vector with unity in the position of the largest coordinate of the input and zeros in all other positions, for example [0 0 1].
Note: Each state x. is generated independently according to a fixed discrete probability histogram controlled by the mean and covariance of w.
This model becomes a Vector Quantization model when the observation noise vanishes: R = lim_{ε→0} εI.
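The winner-takes-all nonlinearity and the resulting mixture sampler can be sketched as below. This assumes the generative form x. = WTA[w.] with w. ~ N(μ, Q) and y. = C x. + v., so that each column of C acts as a cluster mean; the names `wta` and `sample_mixture` and all numeric values are hypothetical.

```python
import numpy as np

def wta(w):
    """Return a one-hot vector marking the largest coordinate of w."""
    w = np.asarray(w, dtype=float)
    x = np.zeros_like(w)
    x[np.argmax(w)] = 1.0
    return x

def sample_mixture(mu, Q, C, R, n, seed=0):
    """Sample n points from the WTA mixture model (assumed form)."""
    rng = np.random.default_rng(seed)
    p = C.shape[0]
    ws = rng.multivariate_normal(mu, Q, size=n)           # pre-WTA Gaussian draws
    xs = np.array([wta(w) for w in ws])                   # one-hot cluster indicators
    vs = rng.multivariate_normal(np.zeros(p), R, size=n)  # observation noise
    ys = xs @ C.T + vs                                    # selects one column of C per sample
    return xs, ys

one_hot = wta([0.2, -1.0, 3.5])  # third coordinate is largest

mu = np.zeros(3)
Q = np.eye(3)
C = np.array([[0.0, 5.0, -5.0], [0.0, 5.0, 5.0]])  # hypothetical cluster means as columns
R = 0.1 * np.eye(2)
xs, ys = sample_mixture(mu, Q, C, R, n=500)
```

Shrinking R toward zero in this sampler makes every observation sit on its cluster mean, which is the hard-assignment picture behind vector quantization.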
9
Learning & Inference: GMMs & Quantization
Computing the likelihood of the data is straightforward. π_j is the probability assigned by the Gaussian N(μ, Q) to the region of k-space in which the jth coordinate is larger than all the others. Calculating the posterior responsibility of each cluster for an observation is analogous to the E-step in this model.
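A responsibility computation for one observation might look like the sketch below. It assumes the cluster means are the columns of C, that all clusters share the noise covariance R, and that the mixing probabilities π are already known; the function name `responsibilities` and the example values are hypothetical.

```python
import numpy as np

def responsibilities(y, C, R, pi):
    """Posterior probability of each cluster given one observation y."""
    diffs = y[:, None] - C                 # shape (p, k): column j is y - C_j
    solved = np.linalg.solve(R, diffs)     # R^{-1} (y - C_j) for every j
    maha = np.sum(diffs * solved, axis=0)  # squared Mahalanobis distances
    # The Gaussian normalizer is shared across clusters, so it cancels.
    log_post = np.log(pi) - 0.5 * maha
    log_post -= log_post.max()             # stabilize before exponentiating
    r = np.exp(log_post)
    return r / r.sum()

C = np.array([[0.0, 4.0], [0.0, 0.0]])  # two hypothetical cluster means
R = np.eye(2)
pi = np.array([0.5, 0.5])
r_mid = responsibilities(np.array([2.0, 0.0]), C, R, pi)   # equidistant point
r_near = responsibilities(np.array([0.1, 0.0]), C, R, pi)  # close to cluster 0
```

An observation halfway between two equally weighted means is shared equally, while a point near one mean is assigned almost entirely to that cluster.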
10
Gaussian Mixture Models
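Joint distribution: P(x. = e_j, y.) = π_j N(y.; C_j, R)
Marginal distribution: P(y.) = Σ_j π_j N(y.; C_j, R)
Here e_j is the unit vector with a one in position j, C_j is the jth column of C, and π_j is the probability assigned by the Gaussian N(μ, Q) to the region of k-space in which the jth coordinate is larger than all the others.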
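Because π_j is a Gaussian probability mass over a region, it generally has no simple closed form, but it can be estimated by sampling. The sketch below is a plain Monte Carlo estimate; the function name `estimate_pi`, the sample size, and the example parameters are hypothetical.

```python
import numpy as np

def estimate_pi(mu, Q, n=100_000, seed=0):
    """Monte Carlo estimate of pi_j = P(coordinate j of w is the largest), w ~ N(mu, Q)."""
    rng = np.random.default_rng(seed)
    w = rng.multivariate_normal(mu, Q, size=n)
    winners = np.argmax(w, axis=1)                    # index of the largest coordinate
    counts = np.bincount(winners, minlength=len(mu))  # how often each index wins
    return counts / n

pi_uniform = estimate_pi(np.zeros(3), np.eye(3))                # symmetric case
pi_skewed = estimate_pi(np.array([2.0, 0.0, 0.0]), np.eye(3))   # first coordinate favored
```

With zero mean and identity covariance the three coordinates are exchangeable, so each estimate should be close to one third; raising one component of the mean shifts mass toward that cluster.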
11
Graphical Model: Discrete-State Dynamic Models
Generative Model with Additive Gaussian Noise:
x_{t+1} = WTA[A x_t + w.], w. ~ N(μ, Q)
y_t = C x_t + v., v. ~ N(0, R)
Intuition: As before, any point in the state-space is surrounded by a ball (or ovoid) of density defined by Q, which is stretched by C into a pancake in observation space and then convolved with the observation noise covariance R. However, unlike the static case, where the ball was centered at the origin of state-space, the center of the ball shifts from time step to time step. This flow is governed by the eigenvalues and eigenvectors of the matrix A. Once we move to a new point, we center our ball on that point, pick a new state, and then flow to that new point and apply noise.
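A sampler for the discrete-state dynamic model can be sketched by adding the winner-takes-all step to the linear dynamics. This assumes the form x_{t+1} = WTA[A x_t + w.], y_t = C x_t + v. with zero-mean noise for simplicity; the names `wta` and `sample_discrete_chain` and the matrices are hypothetical.

```python
import numpy as np

def wta(w):
    """Return a one-hot vector marking the largest coordinate of w."""
    x = np.zeros(len(w))
    x[np.argmax(w)] = 1.0
    return x

def sample_discrete_chain(A, C, Q, R, x0, T, seed=0):
    """Sample T steps of x_{t+1} = WTA[A x_t + w], y_t = C x_t + v (assumed form)."""
    rng = np.random.default_rng(seed)
    k, p = A.shape[0], C.shape[0]
    xs = np.zeros((T, k))
    ys = np.zeros((T, p))
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        xs[t] = x
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)
        # With x one-hot, A @ x picks one column of A; the noisy winner is the next state.
        x = wta(A @ x + rng.multivariate_normal(np.zeros(k), Q))
    return xs, ys

# Hypothetical 3-state chain: a large diagonal in A makes states sticky.
A = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]])
C = np.array([[0.0, 5.0, -5.0], [0.0, 5.0, 5.0]])  # output means as columns
Q = np.eye(3)
R = 0.1 * np.eye(2)
xs, ys = sample_discrete_chain(A, C, Q, R, x0=[1.0, 0.0, 0.0], T=100)
```

Note that A here is not itself a matrix of transition probabilities: the probability of moving from state i to state j is the mass that a Gaussian centered on column i of A with covariance Q assigns to the region where coordinate j is largest.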
12
Independent Component Analysis
ICA can be seen as a linear generative model with non-Gaussian priors for the hidden variables, or as a nonlinear generative model with Gaussian priors for the hidden variables.
Generative Model: x. = g(w.), w. ~ N(0, Q); y. = C x., where g(.) is a general nonlinearity that is invertible and differentiable, C is square, and there is no observation noise.
The posterior density p(x.|y.) is a delta function at x. = C^{-1} y.
The ICA algorithm can be defined by learning the unmixing or recognition weights W rather than the generative mixing weights C. Note that any generative nonlinearity g(.) results in a non-Gaussian prior p(x), which in turn results in a nonlinear f(x) in the maximum likelihood rule.
The gradient learning rule to increase the likelihood: ΔW ∝ W^{-T} + f(W y.) y.^T, where f(x) = d log p(x) / dx.
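One step of gradient ascent on the unmixing weights W can be sketched as follows. The sketch assumes the rule ΔW ∝ W^{-T} + f(W y) y^T with f(x) = d log p(x) / dx, and picks the prior p(x) ∝ 1/cosh(x) purely as an illustrative assumption, which gives f(x) = −tanh(x). The function names, step size, mixing matrix, and Laplace-distributed sources are all hypothetical.

```python
import numpy as np

def log_lik(W, Y):
    """Average log-likelihood (up to a constant) under the assumed prior p(x) ∝ 1/cosh(x)."""
    X = Y @ W.T                         # recovered sources, one row per sample
    _, logdet = np.linalg.slogdet(W)    # log |det W|
    return logdet - np.mean(np.sum(np.log(np.cosh(X)), axis=1))

def ica_step(W, Y, lr=0.01):
    """One full-batch gradient ascent step on the unmixing matrix W."""
    X = Y @ W.T
    f = -np.tanh(X)                     # f(x) = d log p(x) / dx for the assumed prior
    grad = np.linalg.inv(W).T + (f.T @ Y) / len(Y)  # W^{-T} + mean of f(Wy) y^T
    return W + lr * grad

rng = np.random.default_rng(0)
S = rng.laplace(size=(5000, 2))             # hypothetical non-Gaussian sources
C_mix = np.array([[1.0, 0.5], [0.3, 1.0]])  # hypothetical square mixing matrix
Y = S @ C_mix.T                             # noise-free observations y = C x

W0 = np.eye(2)
ll_before = log_lik(W0, Y)
W1 = ica_step(W0, Y)
ll_after = log_lik(W1, Y)
```

Repeating the step many times moves W toward an unmixing matrix; in practice the natural-gradient variant is often preferred because it avoids the matrix inverse.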
13
Conclusion Many more potential models!