An Information-Maximization Approach to Blind Separation and Blind Deconvolution
A.J. Bell and T.J. Sejnowski
Computational Modeling of Intelligence, 11.03.11 (Fri)


An Information-Maximization Approach to Blind Separation and Blind Deconvolution. A.J. Bell and T.J. Sejnowski. Computational Modeling of Intelligence, 11.03.11 (Fri). Summarized by Joon Shik Kim

Abstract We derive a self-organizing learning algorithm that maximizes the information transferred in a network of nonlinear units. The nonlinearities are able to pick up higher-order moments of the input distribution and perform true redundancy reduction between units in the output representation. We apply the network to the source separation (or cocktail party) problem, successfully separating unknown mixtures of up to 10 speakers.

Cocktail Party Problem

Introduction (1/2) This paper brings together two lines of research: the development of information-theoretic unsupervised learning rules for neural networks, and the use, in signal processing, of higher-order statistics for separating out mixtures of independent sources (blind separation) or reversing the effect of an unknown filter (blind deconvolution).

Introduction (2/2) The approach we take to these problems is a generalization of Linsker's infomax principle to nonlinear units with arbitrarily distributed inputs. When inputs are passed through a sigmoid function, maximum information transmission is achieved when the sloping part of the sigmoid is optimally lined up with the high-density parts of the input distribution. Generalization of this rule to multiple units leads to a system that, in maximizing information transfer, also reduces the redundancy between the units in the output layer.

Information Maximization The basic problem is how to maximize the mutual information that the output Y of a neural network processor contains about its input X: I(Y, X) = H(Y) - H(Y|X).

Information Maximization In the noise-free case, H(Y|X) tends to minus infinity as the noise variance goes to zero, but it does not depend on the network weights. The mutual information between inputs and outputs can therefore be maximized by maximizing the entropy of the outputs alone.
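Spelled out, the argument on this slide is (a brief restatement of the paper's equations):

```latex
I(Y, X) = H(Y) - H(Y \mid X)
% In the deterministic, noise-free case, H(Y|X) does not depend on the
% weight matrix W, so the gradient of the mutual information with respect
% to the weights reduces to the gradient of the output entropy:
\frac{\partial}{\partial W} I(Y, X) = \frac{\partial}{\partial W} H(Y)
```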

For One Input and One Output When we pass a single input x through a transforming function g(x) to give an output variable y, both I(y, x) and H(y) are maximized when we align the high-density parts of the probability density function (pdf) of x with the highly sloping parts of the function g(x).
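For the single-unit case, the paper's derivation can be sketched as follows (logistic sigmoid assumed):

```latex
% For a monotonic transform y = g(x), the output density is
p_y(y) = \frac{p_x(x)}{\left| \partial y / \partial x \right|}
% so the output entropy splits into a weight-dependent and a weight-independent term:
H(y) = -\mathbb{E}\left[\ln p_y(y)\right]
     = \mathbb{E}\left[\ln \left| \frac{\partial y}{\partial x} \right|\right]
     - \mathbb{E}\left[\ln p_x(x)\right]
% With the logistic sigmoid y = 1/(1 + e^{-u}), u = wx + w_0, gradient ascent
% on H(y) gives the stochastic learning rule
\Delta w \propto \frac{1}{w} + x(1 - 2y), \qquad \Delta w_0 \propto 1 - 2y
```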


For an N→N Network
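For an N-input, N-output network the same argument goes through with the Jacobian determinant in place of the scalar slope (again restating the paper's equations):

```latex
% For an invertible map from inputs x to outputs y = g(Wx + w_0),
p_{\mathbf{y}}(\mathbf{y}) = \frac{p_{\mathbf{x}}(\mathbf{x})}{\lvert \det J \rvert},
\qquad J_{ij} = \frac{\partial y_i}{\partial x_j}
% Maximizing H(y) therefore amounts to maximizing E[ln |det J|]; with a
% logistic sigmoid this yields the learning rule
\Delta W \propto \left[ W^{T} \right]^{-1} + (\mathbf{1} - 2\mathbf{y})\,\mathbf{x}^{T},
\qquad \Delta \mathbf{w}_0 \propto \mathbf{1} - 2\mathbf{y}
```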

Inverse of a Matrix
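This slide presumably recalls the cofactor (adjugate) form of the matrix inverse, which is what brings the cofactors into the learning rule above:

```latex
W^{-1} = \frac{\operatorname{adj}(W)}{\det W},
\qquad \operatorname{adj}(W) = C^{T}, \quad C_{ij} = \operatorname{cof}(w_{ij})
% Hence the (i,j) entry of [W^T]^{-1} in the learning rule is cof(w_{ij}) / det W:
% the update of each weight involves the cofactor of that weight.
```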

Blind Separation and Blind Deconvolution
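As a concrete illustration of the blind-separation setup (sources s mixed by an unknown matrix A into x = As, recovered as u = Wx by a learned unmixing matrix W), here is a minimal NumPy sketch of the logistic-infomax update. The toy Laplace sources, the mixing matrix, the learning rate, and the function name are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def infomax_separate(x, lr=0.01, n_iter=2000, batch=100, seed=0):
    """Sketch of the Bell-Sejnowski infomax rule for blind separation.

    x : (n_sources, n_samples) array of linearly mixed signals.
    Returns an unmixing matrix W (assumes a square, noise-free mixture of
    super-Gaussian sources such as speech).
    """
    rng = np.random.default_rng(seed)
    n, T = x.shape
    W = np.eye(n)          # start from the identity unmixing matrix
    w0 = np.zeros((n, 1))  # bias vector
    for _ in range(n_iter):
        idx = rng.integers(0, T, size=batch)
        xb = x[:, idx]                              # random mini-batch of mixed samples
        y = 1.0 / (1.0 + np.exp(-(W @ xb + w0)))    # logistic outputs
        # Infomax updates: dW ∝ [W^T]^{-1} + (1 - 2y) x^T, averaged over the batch
        dW = np.linalg.inv(W.T) + (1.0 - 2.0 * y) @ xb.T / batch
        dw0 = np.mean(1.0 - 2.0 * y, axis=1, keepdims=True)
        W += lr * dW
        w0 += lr * dw0
    return W

# Toy usage: mix two super-Gaussian sources and try to unmix them.
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 5000))          # stand-in for speech-like sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # "unknown" mixing matrix
x = A @ s
W = infomax_separate(x)
u = W @ x   # recovered sources, up to scaling and permutation
```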

Blind Separation Results

Different Aspects from Previous Work There is no noise, or rather, there is no noise model, in this system. There is no assumption that inputs or outputs have Gaussian statistics. The transfer function is in general nonlinear.

Conclusion The learning rule is decidedly nonlocal: each "neuron" must know the cofactors either of all the weights entering it or of all those leaving it, so the rule as it stands remains biologically implausible. We believe that the information-maximization approach presented here could serve as a unifying framework that brings together several lines of research, and as a guiding principle for further advances.