Aapo Hyvärinen and Ella Bingham

Connection between Multilayer Perceptrons and Regression Using Independent Component Analysis
Aapo Hyvärinen and Ella Bingham
Preliminary version appeared in Proc. ICANN'99
Summarized by Seong-woo Chung

(C) 2001, SNU CSE Biointelligence Lab
Introduction
Express the observed random variables x_1, x_2, …, x_q as linear combinations of unknown component variables s_1, s_2, …, s_n (n >= q for a nonsingular joint density).
The variables in x are divided into two parts, observed and missing.
The first k variables form the vector of observed variables x_o = (x_1, …, x_k)^T, and the remaining variables form the vector of missing variables x_m = (x_{k+1}, …, x_q)^T.
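The setup above can be sketched in NumPy. This is only an illustration under stated assumptions: the mixing matrix is taken to be square (n = q), the components are Laplace-distributed, and the function names `mix_sources` and `split_observed_missing` as well as all sizes are hypothetical, not taken from the slides.

```python
import numpy as np

def mix_sources(n_samples, n_components, rng):
    """Draw independent unit-variance Laplace sources and mix them linearly."""
    # A Laplace variable with scale b has variance 2*b**2, so b = 1/sqrt(2)
    # gives unit variance.
    s = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0),
                    size=(n_components, n_samples))
    # Assumption for this sketch: square mixing matrix (n = q).
    A = rng.standard_normal((n_components, n_components))
    x = A @ s  # each column is one observation x = A s
    return x, A, s

def split_observed_missing(x, k):
    """The first k variables are observed (x_o); the rest are missing (x_m)."""
    return x[:k], x[k:]

rng = np.random.default_rng(0)
x, A, s = mix_sources(n_samples=1000, n_components=5, rng=rng)
x_o, x_m = split_observed_missing(x, k=4)
```

Each column of `x` is one sample, so `x_o` holds the first four mixtures and `x_m` the last one.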

Introduction (Continued)
The problem is to predict x_m for a given observation of x_o.
Regression is conventionally defined as the conditional expectation E{x_m | x_o}.
Model the joint density of x by ICA; then, for a given sample of incomplete data, predict the missing values in x_m using the conditional expectation, which is well defined once the ICA model has been estimated.
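Written out for the special case of a square, invertible mixing matrix A (an assumption made here for simplicity; the slides allow n >= q), with W = A^{-1} having rows w_i^T and p_i denoting the density of s_i, the joint density and the conditional expectation take the following form:

```latex
p(x) = \lvert \det W \rvert \prod_{i=1}^{n} p_i\!\left(w_i^{T} x\right),
\qquad
E\{x_m \mid x_o\}
  = \frac{\int x_m \, p(x_o, x_m) \, dx_m}{\int p(x_o, x_m) \, dx_m}.
```

The constant factor |det W| cancels in the ratio, so only the product of the component densities matters for the prediction.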

Regression by ICA and by an MLP: The connection
Denote the probability densities of the s_i by p_i, and let g_i(u) = p_i'(u) / p_i(u) + cu.
The regression function for data modeled by ICA is approximated by the output of an MLP with one hidden layer.
The weight vectors of the MLP are simple functions of the mixing matrix, and the nonlinear activation functions of the MLP are functions of the probability densities of the s_i.
Here A_o and A_m denote the rows of the mixing matrix A that correspond to x_o and x_m.
The vector A_o^T x_o can be interpreted as an initial linear estimate of s.
The nonlinear aspect of g() consists largely of thresholding the linear estimates of s, to obtain ŝ = g(A_o^T x_o).
The final linear layer is basically a linear reconstruction of the form x̂_m = A_m ŝ.
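The three layers described above can be sketched as a short forward pass. This is a hedged illustration, not the authors' code: it assumes unit-variance Laplace components, for which p'(u)/p(u) = -sqrt(2) sign(u), and the value of the constant c is a placeholder because the slides do not specify it. The names `laplace_score` and `mlp_regression` are hypothetical.

```python
import numpy as np

def laplace_score(u):
    """p'(u)/p(u) for the unit-variance Laplace density p(u) ~ exp(-sqrt(2)|u|)."""
    return -np.sqrt(2.0) * np.sign(u)

def mlp_regression(x_o, A_o, A_m, c):
    """One-hidden-layer form: x_m_hat = A_m g(A_o^T x_o), g(u) = p'(u)/p(u) + c*u."""
    u = A_o.T @ x_o                    # initial linear estimate of s
    s_hat = laplace_score(u) + c * u   # hidden-layer nonlinearity g
    return A_m @ s_hat                 # final linear reconstruction layer

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))        # illustrative mixing matrix
A_o, A_m = A[:4], A[4:]                # rows for observed / missing variables
x_o = rng.standard_normal(4)           # one observed sample
x_m_hat = mlp_regression(x_o, A_o, A_m, c=1.0)  # c=1.0 is a placeholder value
```

The hidden layer has one unit per independent component, which matches the interpretation of each hidden-unit output as an estimate of one s_i.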

Simulation
The simulation data are 100-dimensional and there are … data samples.
The independent components, generated according to some probability density, are mixed using a randomly generated n×n mixing matrix.
The mixtures x are divided into observed (x_o) and missing (x_m).
The dimensionality of x_o is 99 and the dimensionality of x_m is 1.
The variables in x_o are uncorrelated and their variance is set to one.
There is a training data set of size … and a test data set of size 1000.
ICA estimation on the training data set gives the estimated values of the source signals s and the mixing matrix A.
The value of the missing variable x_m is predicted either by numerical integration or by the approximation method.
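For a one-dimensional x_m, the numerical-integration route can be sketched as a weighted average over a grid of candidate values. This is only an illustration under assumptions not stated in the slides: the unmixing matrix W = A^{-1} is treated as known, the components are unit-variance Laplace, and a plain uniform-grid sum stands in for whatever integration scheme was actually used. The names `log_joint_density` and `predict_missing` are hypothetical.

```python
import numpy as np

def log_joint_density(x, W):
    """log p(x) up to a constant for unit-variance Laplace sources, s = W x."""
    s = W @ x
    return -np.sqrt(2.0) * np.sum(np.abs(s), axis=0)

def predict_missing(x_o, W, grid):
    """Approximate E{x_m | x_o} for scalar x_m by summing over a uniform grid."""
    # Stack the fixed observed part with every candidate value of x_m.
    X = np.vstack([np.tile(x_o[:, None], (1, grid.size)), grid[None, :]])
    log_p = log_joint_density(X, W)
    w = np.exp(log_p - log_p.max())      # unnormalised p(x_o, x_m) on the grid
    return np.sum(grid * w) / np.sum(w)  # grid spacing cancels in the ratio

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # small illustrative mixing matrix
W = np.linalg.inv(A)
s = rng.laplace(scale=1.0 / np.sqrt(2.0), size=3)
x = A @ s
grid = np.linspace(-20.0, 20.0, 4001)
x_m_hat = predict_missing(x[:2], W, grid)
```

Subtracting the maximum log-density before exponentiating avoids underflow and does not change the ratio. The cost grows with the grid size for every test sample, which is the computational burden the MLP approximation is meant to avoid.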

Simulation – Strongly Supergaussian Data

Simulation – Laplace Distributed Data

Simulation – Very Weakly Supergaussian Data

Conclusion
Approximation: if the distributions of the independent components are close to Gaussian, the approximation gives excellent results. If they are strongly supergaussian, it is less accurate but still quite reasonable in the range we experimented with.
Regression: the stronger the supergaussianity, the better the quality of the regression. In contrast, for weakly supergaussian components, ICA regression does not really explain the data.

Discussion
Regression by ICA is computationally demanding because of the integration.
The integration may be approximated by the computationally simple procedure of computing the outputs of an MLP.
The output of each hidden-layer neuron corresponds to an estimate of one of the independent components.
The choice of the nonlinearity is a problem of estimating the probability densities of the independent components.
