Connection between Multilayer Perceptrons and Regression Using Independent Component Analysis
Aapo Hyvärinen and Ella Bingham
A preliminary version appeared in Proc. ICANN'99
Summarized by Seong-woo Chung, 2001.9.14

Introduction
Express the observed random variables x1, x2, …, xq as linear combinations of unknown independent component variables s1, s2, …, sn (n >= q is required for a nonsingular joint density).
The variables in x are divided into two parts, observed and missing: the first k variables form the vector of observed variables xo = (x1, …, xk)T, and the remaining variables form the vector of missing variables xm = (xk+1, …, xq)T.
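A minimal NumPy sketch of this data model and the observed/missing split (my own illustration with small, arbitrary dimensions, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5          # number of independent components s_1, ..., s_n
q = 5          # number of mixture variables x_1, ..., x_q (n >= q)
k = 4          # number of variables that are actually observed

s = rng.laplace(size=n)          # unknown independent components
A = rng.standard_normal((q, n))  # mixing matrix
x = A @ s                        # observed random vector x = A s

x_o, x_m = x[:k], x[k:]          # x_o = (x_1, ..., x_k)^T, x_m = (x_{k+1}, ..., x_q)^T
A_o, A_m = A[:k, :], A[k:, :]    # corresponding row blocks of the mixing matrix
```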

Introduction (continued)
The problem is to predict xm for a given observation of xo.
The regression is conventionally defined as the conditional expectation E[xm | xo].
The approach: model the joint density of x by ICA, and then, for a given sample of incomplete data, predict the missing values in xm using the conditional expectation, which is well defined once the ICA model has been estimated.
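Written out (standard definition, stated here for reference rather than quoted from the slides), the regression function is:

```latex
\[
  \hat{x}_m \;=\; \mathbb{E}\{\, x_m \mid x_o \,\}
           \;=\; \int x_m \, p(x_m \mid x_o)\, \mathrm{d}x_m ,
\]
% where p(x_m | x_o) is obtained from the joint density of x = A s
% fixed by the estimated ICA model.
```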

Regression by ICA and by an MLP: the connection
Denote the probability density of si by pi, and define gi(u) = p'i(u) / pi(u) + c u.
The regression function for data modeled by ICA is given (approximately) by the output of an MLP with one hidden layer.
The weight vectors of the MLP are simple functions of the mixing matrix, and the nonlinear activation functions of the MLP are functions of the probability densities of the si.
The vector AoT xo can be interpreted as an initial linear estimate of s.
The nonlinear aspect of g() consists largely of thresholding these linear estimates, giving ŝ = g(AoT xo).
The final linear layer is basically a linear reconstruction of the form x̂m = Am ŝ.
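A sketch of this one-hidden-layer MLP form (my own NumPy code, not the authors'; the Laplace-based choice of g() below is only an illustrative assumption about the source densities):

```python
import numpy as np

def g(u, c=1.0):
    # Example nonlinearity for Laplace-distributed sources p_i(u) ~ exp(-sqrt(2)|u|):
    # g_i(u) = p_i'(u)/p_i(u) + c*u = -sqrt(2)*sign(u) + c*u.
    return -np.sqrt(2.0) * np.sign(u) + c * u

def predict_missing(x_o, A_o, A_m):
    """Predict x_m from x_o given the (estimated) mixing-matrix blocks."""
    s_hat = g(A_o.T @ x_o)   # hidden layer: initial linear estimate of s, then thresholding
    return A_m @ s_hat       # output layer: linear reconstruction x_m ~ A_m s_hat
```

With A_o, A_m, and x_o as in the earlier sketch, predict_missing(x_o, A_o, A_m) returns the MLP-style estimate of x_m.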

Simulation
The simulation data are 100-dimensional and there are 101,000 data samples in total.
The independent components, generated according to some probability density, are mixed using a randomly generated n×n mixing matrix.
The mixtures x are divided into observed (xo) and missing (xm) parts; the dimensionality of xo is 99 and the dimensionality of xm is 1.
The variables in xo are uncorrelated and their variance is set to one.
The data are split into a training set of 100,000 samples and a test set of 1,000 samples.
ICA estimation on the training set gives the estimated values of the source signals s and the mixing matrix A.
The value of the missing variable xm is then predicted either by numerical integration or by the MLP approximation.
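A hedged sketch of this setup (dimensions shrunk for readability; scikit-learn's FastICA is used here as a stand-in for whatever ICA estimator the authors used, and the simple linear read-out omits the nonlinearity g() from the previous sketch):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
n = 10                               # the paper uses n = q = 100
n_train, n_test = 5000, 500          # the paper uses 100,000 / 1,000

S = rng.laplace(size=(n_train + n_test, n))   # supergaussian independent components
A_true = rng.standard_normal((n, n))          # random n x n mixing matrix
X = S @ A_true.T                              # mixtures, one sample per row

X_train, X_test = X[:n_train], X[n_train:]

ica = FastICA(n_components=n, random_state=0)
ica.fit(X_train)                     # estimate sources and mixing matrix from training data
A_hat = ica.mixing_                  # estimated mixing matrix, shape (n, n)

# Treat the last coordinate as missing in the test set and predict it.
A_o, A_m = A_hat[:-1, :], A_hat[-1:, :]
x_o = X_test[:, :-1]
s_init = x_o @ A_o                   # initial linear estimate A_o^T x_o (the paper whitens x_o first)
x_m_pred = s_init @ A_m.T            # predicted missing values, shape (n_test, 1)
x_m_true = X_test[:, -1:]
print("RMSE:", float(np.sqrt(np.mean((x_m_pred - x_m_true) ** 2))))
```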

Simulation – Strongly Supergaussian Data (results figure)

Simulation – Laplace Distributed Data (results figure)

Simulation – Very Weakly Supergaussian Data (results figure)

Conclusion
Approximation: if the distributions of the independent components are close to Gaussian, the MLP approximation gives excellent results; if they are strongly supergaussian, the approximation is less accurate but still quite reasonable in the range experimented with.
Regression: the stronger the supergaussianity, the better the quality of the regression; in contrast, for weakly supergaussian components, ICA regression does not really explain the data.

Discussion
Regression by ICA is computationally demanding because of the required integration.
The integration may be approximated by the computationally simple procedure of computing the outputs of an MLP.
The output of each hidden-layer neuron corresponds to the estimation of one of the independent components.
The choice of the nonlinearity reduces to the problem of estimating the probability densities of the independent components.