Learning Multiplicative Interactions
(many slides from Hinton)
Two different meanings of “multiplicative”
If we take two density models and multiply together their probability distributions at each point in data-space, we get a “product of experts”.
– The product of two Gaussian experts is a Gaussian.
If we take two variables and multiply them together to provide input to a third variable, we get a “multiplicative interaction”.
– The distribution of the product of two Gaussian-distributed variables is NOT Gaussian. It is a heavy-tailed distribution: one Gaussian determines the standard deviation of the other Gaussian.
– Heavy-tailed distributions are the signatures of multiplicative interactions between latent variables.
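Both meanings can be checked numerically: multiplying two Gaussian *densities* (a product of experts) gives another Gaussian, while multiplying two Gaussian-distributed *samples* gives a heavy-tailed variable. A minimal NumPy sketch (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Meaning 1: the product of two Gaussian densities N(m1,v1)*N(m2,v2) is
# (after renormalizing) a Gaussian whose precision is the sum of precisions.
m1, v1, m2, v2 = 0.0, 1.0, 2.0, 4.0
prec = 1.0 / v1 + 1.0 / v2
mean = (m1 / v1 + m2 / v2) / prec
print("product-of-experts Gaussian: mean=%.3f, var=%.3f" % (mean, 1.0 / prec))

# Meaning 2: the product of two Gaussian-distributed samples is heavy-tailed.
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)
z = x * y
# Excess kurtosis is 0 for a Gaussian; for z = x*y it is 6 in theory.
kurtosis = np.mean(z**4) / np.mean(z**2) ** 2 - 3.0
print("excess kurtosis of x*y: %.2f" % kurtosis)
```

The large positive excess kurtosis is exactly the heavy-tailed signature the slide refers to.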
Learning multiplicative interactions
It is fairly easy to learn multiplicative interactions if all of the variables are observed.
– This is possible if we control the variables used to create a training set (e.g. pose, lighting, identity, …).
It is also easy to learn energy-based models in which all but one of the terms in each multiplicative interaction are observed.
– Inference is still easy.
If more than one of the terms in each multiplicative interaction are unobserved, the interactions between hidden variables make inference difficult.
– Alternating Gibbs sampling can be used if the latent variables form a bipartite graph.
Higher-order Boltzmann machines (Sejnowski, ~1986)
The usual energy function is quadratic in the states: $E = -\sum_{i,j} s_i s_j w_{ij}$
But we could use higher-order interactions: $E = -\sum_{i,j,h} s_i s_j s_h w_{ijh}$
Hidden unit h acts as a switch. When h is on, it switches in the pairwise interaction between unit i and unit j.
– Units i and j can also be viewed as switches that control the pairwise interactions between j and h or between i and h.
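The switching view can be made concrete: under the three-way energy, the effective pairwise weight between units i and j is $\sum_h s_h w_{ijh}$, so turning a hidden unit off removes the interactions it gates. A small sketch with random weights (the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
s = rng.integers(0, 2, size=n).astype(float)   # binary unit states
W2 = rng.standard_normal((n, n))               # pairwise weights w_ij
W3 = rng.standard_normal((n, n, n))            # three-way weights w_ijh

# Usual quadratic energy: E = -sum_{i,j} s_i s_j w_ij
E2 = -np.einsum("i,j,ij->", s, s, W2)

# Higher-order energy: E = -sum_{i,j,h} s_i s_j s_h w_ijh
E3 = -np.einsum("i,j,h,ijh->", s, s, s, W3)

# Unit h acts as a switch: the effective pairwise weight between i and j
# is sum_h s_h * w_ijh, so the hiddens gate the pairwise interactions.
effective_pairwise = np.einsum("h,ijh->ij", s, W3)
assert np.isclose(E3, -np.einsum("i,j,ij->", s, s, effective_pairwise))
```

By symmetry of the energy, the same rewrite works with i or j playing the role of the switch, which is the point made on the slide.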
Using higher-order Boltzmann machines to model image transformations (Memisevic and Hinton, 2007)
A global transformation specifies which pixel goes to which other pixel.
Conversely, each pair of similar-intensity pixels, one in each image, votes for a particular global transformation.
[Figure: hidden transformation units gating connections between image(t) and image(t+1)]
Using higher-order Boltzmann machines to model image transformations
Writing x for image(t), y for image(t+1), and h for the hidden units:
$E(x, y, h) = -\sum_{i,j,k} w_{ijk}\, x_i\, y_j\, h_k \quad (1)$
$p(y, h \mid x) \propto \exp(-E(x, y, h)) \quad (2)$
Making the reconstruction easier
Condition on the first image so that only one visible group needs to be reconstructed.
– Given the hidden states and the previous image, the pixels in the second image are conditionally independent.
The main problem with 3-way interactions
The number of parameters grows cubically: with N units per group there are on the order of N³ three-way weights, which is far too many to learn for realistic image sizes.
Factoring three-way interactions
We use factors that correspond to 3-way outer-products.
Unfactored: $E = -\sum_{i,j,h} s_i s_j s_h\, w_{ijh}$
Factored: $E = -\sum_f \sum_{i,j,h} s_i s_j s_h\, w_{if}\, w_{jf}\, w_{hf}$
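The factored energy can be checked against the unfactored one: the F factors implicitly define a rank-F weight tensor $w_{ijh} = \sum_f w_{if} w_{jf} w_{hf}$ (a sum of 3-way outer products), cutting the parameter count from N³ to 3NF. A sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, F = 5, 3

# One n x F factor weight matrix per group of units.
Wi = rng.standard_normal((n, F))
Wj = rng.standard_normal((n, F))
Wh = rng.standard_normal((n, F))

# The factors implicitly define a rank-F three-way weight tensor:
# w_ijh = sum_f w_if * w_jf * w_hf
W = np.einsum("if,jf,hf->ijh", Wi, Wj, Wh)

si = rng.standard_normal(n)
sj = rng.standard_normal(n)
sh = rng.standard_normal(n)

E_unfactored = -np.einsum("i,j,h,ijh->", si, sj, sh, W)
# Factored energy: project each group onto the factors, multiply, sum over f.
E_factored = -np.sum((si @ Wi) * (sj @ Wj) * (sh @ Wh))
assert np.isclose(E_unfactored, E_factored)

# Parameter count drops from n**3 to 3*n*F.
print(n**3, "unfactored weights vs", 3 * n * F, "factored weights")
```

The factored form also never materializes the N³ tensor: each energy evaluation costs O(NF) instead of O(N³).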
Factored 3-Way Restricted Boltzmann Machines for Modeling Natural Images (Ranzato, Krizhevsky and Hinton, 2010)
A joint 3-way model of the covariance structure of natural images.
The visible units are two identical copies of the image.
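When the two visible groups are tied to the same image, the factored three-way energy reduces to hidden-gated *squared* filter responses, which depend on the image only through pairwise pixel products — which is why the model captures covariance structure. A sketch of this reduction (shapes and weights are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(3)
n, F, K = 8, 5, 5               # pixels, factors, hidden units (assumed sizes)
C = rng.standard_normal((n, F))  # filters applied to BOTH visible copies
Wh = rng.standard_normal((K, F))  # hidden-to-factor weights

v = rng.standard_normal(n)                    # one image; both copies equal v
h = rng.integers(0, 2, size=K).astype(float)  # binary hidden states

# With the two visible groups tied (s_i = s_j = v), the factored energy
# becomes a sum of squared filter responses gated by the hidden units:
# E = -sum_f (v . c_f)^2 * (h @ Wh)_f
E = -np.sum((v @ C) ** 2 * (h @ Wh))

# Check against the generic factored three-way energy with tied copies:
# squared filter outputs depend on v only through products v_i * v_j.
E_check = -np.einsum("i,j,if,jf,f->", v, v, C, C, h @ Wh)
assert np.isclose(E, E_check)
```

Because the hiddens multiply squared filter outputs, each hidden configuration specifies a different Gaussian covariance over images rather than a different mean.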
A powerful module for deep learning
Producing reconstructions using hybrid Monte Carlo
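Hybrid (Hamiltonian) Monte Carlo produces samples by simulating Hamiltonian dynamics with a leapfrog integrator and correcting the discretization error with a Metropolis accept/reject step. A toy sketch on a quadratic energy; a real model would plug in the RBM free energy and its gradient, and the step sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def energy(x):   # toy energy 0.5*|x|^2, i.e. a standard Gaussian target
    return 0.5 * np.sum(x**2)

def grad(x):
    return x

def hmc_step(x, step=0.1, n_leapfrog=20):
    """One hybrid Monte Carlo step: leapfrog dynamics + accept/reject."""
    p = rng.standard_normal(x.shape)            # sample a fresh momentum
    x_new = x.copy()
    p_new = p - 0.5 * step * grad(x_new)        # initial half step
    for i in range(n_leapfrog):
        x_new = x_new + step * p_new
        if i < n_leapfrog - 1:
            p_new = p_new - step * grad(x_new)
    p_new = p_new - 0.5 * step * grad(x_new)    # final half step
    # Accept with the Metropolis rule to make the chain exact.
    H_old = energy(x) + 0.5 * np.sum(p**2)
    H_new = energy(x_new) + 0.5 * np.sum(p_new**2)
    return x_new if rng.random() < np.exp(H_old - H_new) else x

x = np.zeros(3)
samples = []
for _ in range(2000):
    x = hmc_step(x)
    samples.append(x.copy())
samples = np.asarray(samples)[500:]   # discard burn-in
print("sample std:", samples.std())   # target N(0, I) has std 1
```

Because the leapfrog integrator is volume-preserving and reversible, the accept/reject step only has to correct the small energy error, so long trajectories can be accepted with high probability.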
Modeling the joint density of two images under a variety of transformations
Hinton et al. (2011) describe a generative model of the relationship between two images.
The model is defined as a factored three-way Boltzmann machine, in which hidden variables collaborate to define the joint correlation matrix for image pairs.
Model
Three-way contrastive divergence
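A hedged sketch of how a CD-1 update for the factored three-way model might look, conditioning on the first image x and reconstructing the second image y. The layer sizes, learning rate, mean-field hidden probabilities in the statistics, and the simple linear reconstruction of real-valued visibles are all assumptions for illustration, not the exact published recipe:

```python
import numpy as np

rng = np.random.default_rng(5)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, F, K = 10, 6, 4                        # pixels, factors, hiddens (assumed)
Wx = 0.1 * rng.standard_normal((n, F))    # image(t)   -> factor weights
Wy = 0.1 * rng.standard_normal((n, F))    # image(t+1) -> factor weights
Wh = 0.1 * rng.standard_normal((K, F))    # hidden     -> factor weights

def cd1_update(x, y, lr=0.01):
    """One three-way CD-1 step: positive phase on the data pair,
    negative phase on a one-step reconstruction of y."""
    # Positive phase: infer hiddens from the data pair (x, y).
    fx, fy = x @ Wx, y @ Wy
    h_prob = sigmoid(Wh @ (fx * fy))
    h = (rng.random(K) < h_prob).astype(float)

    # Negative phase: reconstruct y given x and h, then re-infer hiddens.
    y_rec = Wy @ (fx * (h @ Wh))
    fy_rec = y_rec @ Wy
    h_rec = sigmoid(Wh @ (fx * fy_rec))

    # CD-1 gradients: data statistics minus reconstruction statistics.
    dWy = np.outer(y, fx * (h_prob @ Wh)) - np.outer(y_rec, fx * (h_rec @ Wh))
    dWh = np.outer(h_prob, fx * fy) - np.outer(h_rec, fx * fy_rec)
    dWx = np.outer(x, fy * (h_prob @ Wh)) - np.outer(x, fy_rec * (h_rec @ Wh))
    return lr * dWx, lr * dWy, lr * dWh

x = rng.standard_normal(n)
y = rng.standard_normal(n)
dWx, dWy, dWh = cd1_update(x, y)
```

Each statistic is the gradient of the negative factored energy with respect to one factor matrix, so the update pulls the energy of data pairs down and the energy of reconstructions up, exactly as in ordinary CD.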
Thank you