Adaptive Algorithms for PCA PART – II

Oja’s rule is the basic learning rule for PCA and extracts the first principal component. A deflation procedure can then be used to estimate the minor eigencomponents. Sanger’s rule performs an on-line deflation and uses Oja’s rule to estimate the eigencomponents. Problems with Sanger’s rule: strictly speaking, Sanger’s rule is non-local, which makes it somewhat harder to implement in VLSI. Non-local rules are also termed biologically implausible (as engineers, we don’t care very much about this). Sanger’s rule converges slowly; we will see later that many algorithms for PCA converge slowly.
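As a concrete illustration of the update just described, here is a minimal NumPy sketch of one step of Sanger’s rule (the generalized Hebbian algorithm). The function name, the step size eta, and the convention that W stores one eigenvector estimate per row are assumptions made for this sketch, not part of the original slides.

import numpy as np

def sanger_update(W, x, eta=0.01):
    # W: (m, d) array, one eigenvector estimate per row; x: (d,) zero-mean sample.
    y = W @ x                                        # outputs y_i = w_i . x
    # The lower-triangular term performs the on-line deflation: output i only
    # sees the input with the contributions of outputs 1..i already removed.
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

Iterated over samples from a zero-mean input stream, the rows of W tend toward the leading eigenvectors of the input covariance, though, as noted above, convergence is slow.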

Other Adaptive Structures for PCA. The first step is to change the architecture of the network so that the update rules become local. [Figure: single-layer network with input x(n), feedforward weights W, and lateral weights C.]

This is the Rubner-Tavan model. The output vector y is given by y = Wx + Cy, which can be computed one output at a time because C is a lower-triangular matrix with zero diagonal; C is usually called the lateral weight matrix or the lateral inhibitor matrix. [Figure: two-unit example with inputs x1, x2, feedforward weights w1, w2, lateral weight c, and outputs y1, y2.] The feedforward weights W are trained using Oja’s rule, and the lateral weights are trained using the anti-Hebbian rule.
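A minimal sketch of one Rubner-Tavan style update, assuming the additive lateral-connection convention y = Wx + Cy used above; the step sizes and variable names are illustrative.

import numpy as np

def rubner_tavan_update(W, C, x, eta_w=0.01, eta_c=0.01):
    # W: (m, d) feedforward weights; C: (m, m) strictly lower-triangular
    # lateral weights; x: (d,) zero-mean input sample.
    m = W.shape[0]
    y = np.zeros(m)
    for i in range(m):                       # outputs computed sequentially:
        y[i] = W[i] @ x + C[i, :i] @ y[:i]   # each unit hears only earlier units
    # Feedforward weights: Oja's rule, applied per output unit (local).
    W += eta_w * (np.outer(y, x) - (y ** 2)[:, None] * W)
    # Lateral weights: plain anti-Hebbian rule, kept strictly lower triangular.
    C -= eta_c * np.tril(np.outer(y, y), k=-1)
    return W, C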

Why does this work? We know that the eigenvectors are orthonormal, so at the PCA solution the outputs of the network are uncorrelated. Since anti-Hebbian learning decorrelates signals, it can be used to train the lateral network. The most important contributions of the Rubner-Tavan model are its local update rules (and hence biological plausibility) and the introduction of the lateral network for estimating the minor components instead of using deflation.

However, the Rubner-Tavan model is slow to converge. The APEX (Adaptive Principal Component Extraction) network slightly improves the convergence speed of the Rubner-Tavan method. APEX uses exactly the same network architecture as Rubner-Tavan: the feedforward weights are trained using Oja’s rule as before, but the lateral weights are trained using a normalized anti-Hebbian rule instead of the plain anti-Hebbian rule. This is very similar to the normalization we applied to Hebbian learning; you can think of it as Oja’s rule for anti-Hebbian learning.
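A sketch of the corresponding APEX-style update, reusing the architecture of the Rubner-Tavan sketch above. Only the lateral rule changes; the exact sign and normalization conventions vary across the literature, so treat this as illustrative rather than definitive.

import numpy as np

def apex_update(W, C, x, eta=0.01):
    # Same shapes as in the Rubner-Tavan sketch: W is (m, d), C is (m, m)
    # strictly lower triangular, x is a (d,) zero-mean sample.
    m = W.shape[0]
    y = np.zeros(m)
    for i in range(m):                       # same sequential lateral network
        y[i] = W[i] @ x + C[i, :i] @ y[:i]
    # Feedforward weights: Oja's rule, as before.
    W += eta * (np.outer(y, x) - (y ** 2)[:, None] * W)
    # Lateral weights: normalized ("Oja-style") anti-Hebbian rule. The extra
    # y_i^2 * c_ij term acts as the normalization and decays the lateral
    # weights, which vanish once the outputs are decorrelated.
    C -= eta * (np.tril(np.outer(y, y), k=-1) + (y ** 2)[:, None] * C)
    return W, C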

APEX is faster because the normalized anti-Hebbian rule is used to train the lateral net. Note that when convergence is reached, all the lateral weights must go to zero: at convergence the outputs are uncorrelated, so there should be no connections between them. Faster methods for PCA: all the adaptive models discussed so far are based on gradient formulations. Simple gradient methods are usually slow, and their convergence depends heavily on selecting the right step sizes; that selection is directly dependent on the eigenvalues of the input data.

Researchers have used different optimization criteria instead of simple steepest descent. These methods do increase the speed of convergence, but they increase the computational cost as well; there is always a trade-off between convergence speed and complexity. Subspace techniques such as the natural-power method and Projection Approximation Subspace Tracking (PAST) are faster than the traditional Sanger or APEX rules, but they are computationally more intensive, since most of them involve direct matrix multiplications.
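For orientation, here is a sketch of the standard PAST recursion as it appears in the literature; beta is a forgetting factor close to 1 (the value here is an assumption), W holds the subspace estimate column-wise, and P tracks the inverse correlation matrix of the projected data. Note that PAST tracks the principal subspace as a whole; its deflated variant (PASTd) is the one that recovers individual eigencomponents.

import numpy as np

def past_update(W, P, x, beta=0.99):
    # W: (d, m) subspace estimate (one basis vector per column);
    # P: (m, m) inverse correlation matrix of the projected data; x: (d,) sample.
    y = W.T @ x                        # project the sample onto the subspace
    h = P @ y
    g = h / (beta + y @ h)             # RLS-style gain vector
    P = (P - np.outer(g, h)) / beta    # update the inverse correlation matrix
    P = (P + P.T) / 2                  # symmetrize for numerical stability
    e = x - W @ y                      # projection-approximation error
    W = W + np.outer(e, g)             # rank-one subspace update
    return W, P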

CNEL rule for PCA: any value of T can be chosen, as long as T < 1. There is no step size in the algorithm. The algorithm is O(N) and on-line; most importantly, it is faster than many other PCA algorithms.

[Figure: performance with the violin time series (CNEL rule); eigenspread = 802.]

[Figure: performance with the violin time series (Sanger’s rule), for comparison.]