Deep Learning – Fall 2013. Instructor: Bhiksha Raj. Paper: T. D. Sanger, "Optimal Unsupervised Learning in a Single-Layer Linear Feedforward Neural Network".


Slide 1: Title
Deep Learning – Fall 2013. Instructor: Bhiksha Raj.
Paper: T. D. Sanger, "Optimal Unsupervised Learning in a Single-Layer Linear Feedforward Neural Network", Neural Networks, vol. 2, 1989.
Presenter: Khoa Luu

Slide 2: Table of Contents
– Introduction
– Eigenproblem-related Applications
– Hebbian Algorithms (Oja, 1982)
– Generalized Hebbian Algorithm (GHA)
– Proof of the GHA Theorem
– GHA Local Implementation
– Demo

Slide 3: Introduction
What are the aims of this paper?
– GHA yields the optimal solution: its weight vectors span the space defined by the first few eigenvectors of the correlation matrix of the input.
– The method converges with probability one.
Why is the Generalized Hebbian Algorithm (GHA) important?
– It is guaranteed to find the eigenvectors directly from the data, without computing the correlation matrix, which usually takes a lot of memory.
– Example: if the network has 4000 inputs x, the correlation matrix xxᵀ has 16 million elements. If the network has 16 outputs, the outer product yxᵀ has only 64,000 elements and yyᵀ only 256 elements.
How does this method work?
– Next sections.

Slide 4: Additional Comments
This method was proposed in 1989, when there were not as many efficient algorithms for Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) as there are nowadays.
GHA was significant at the time because it is computationally efficient and can be parallelized.
Some SVD algorithms have been proposed more recently.

Slide 5: Eigenproblem-related Applications
– Image encoding: 40 principal components (6.3:1 compression), 6 principal components (39.4:1 compression)
– Eigenfaces – face recognition
– Object reconstruction
– Keypoint detection
– and more …
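
As a rough illustration of the image-encoding application above, here is a minimal NumPy sketch of block-wise PCA compression. The 8×8 block size, the random stand-in image, and all variable names are assumptions made for this example; they are not taken from the paper or the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))   # random stand-in for a real grayscale image

# Collect 8x8 blocks as 64-dimensional vectors.
blocks = np.array([image[i:i + 8, j:j + 8].ravel()
                   for i in range(0, 64, 8)
                   for j in range(0, 64, 8)])

mean = blocks.mean(axis=0)
centered = blocks - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)   # rows of Vt = principal directions

k = 6                                    # keep 6 principal components (cf. 39.4:1 on the slide)
codes = centered @ Vt[:k].T              # encode: project each block onto the top-k components
reconstructed = codes @ Vt[:k] + mean    # decode: back-project and re-add the mean
print("reconstruction MSE:", np.mean((blocks - reconstructed) ** 2))
```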

Slide 6: Unsupervised Feedforward Neural Network
[Diagrams comparing a "two-layer" neural network (input layer, hidden layer, output layer) with a "single-layer" neural network (an input layer of source nodes connected directly to an output layer of neurons).]

Slide 7: Unsupervised Single-Layer Linear Feedforward Neural Network
[Diagram: "single-layer" network, an input layer of source nodes fully connected to an output layer of neurons.]
– C: m×n weight matrix
– x: n×1 column input vector
– y: m×1 column output vector
Some assumptions:
– The network is linear.
– The number of outputs is smaller than the number of inputs (m < n).
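
A minimal NumPy sketch of this network, just to fix the shapes used in the later sketches (the particular dimensions are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3                       # n inputs, m outputs, with m < n
C = rng.standard_normal((m, n))   # m x n weight matrix
x = rng.standard_normal((n, 1))   # n x 1 column input vector
y = C @ x                         # linear network: y = C x, an m x 1 column output vector
print(y.shape)                    # (3, 1)
```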

Slide 8: Hebbian Algorithms
Proposed updating rule:
C(t+1) = C(t) + γ (y(t) xᵀ(t))
– x: n×1 column input vector
– C: m×n weight matrix
– y = Cx: m×1 column output vector
– γ: the rate of change of the weights
If all diagonal elements of CCᵀ are equal to 1, then a Hebbian learning rule will cause the rows of C to converge to the principal eigenvector of Q = E[xxᵀ].
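
A minimal sketch of one step of this plain Hebbian rule under the notation above (the learning rate and random data are assumptions for illustration):

```python
import numpy as np

def hebbian_step(C, x, gamma=0.01):
    """One plain Hebbian update: C <- C + gamma * y x^T, with y = C x."""
    y = C @ x
    return C + gamma * (y @ x.T)

rng = np.random.default_rng(0)
n, m = 8, 3
C = rng.standard_normal((m, n))
x = rng.standard_normal((n, 1))
C = hebbian_step(C, x)
print(C.shape)  # still (3, 8)
```

Without a constraint that keeps the diagonal of CCᵀ at 1, repeated updates of this form let the weights grow without bound, which is what motivates Oja's modified rule on the next slide.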

Slide 9: Hebbian Algorithms (cont.)
Proposed network learning algorithm (Oja, 1982):
Δcᵢ = γ (yᵢ x – yᵢ² cᵢ)
The approximating differential equation (why?):
ċᵢ(t) = Q cᵢ(t) – (cᵢ(t)ᵀ Q cᵢ(t)) cᵢ(t)
Oja (1982) proved that, for any choice of initial weights, cᵢ converges to the principal eigenvector e₁ (or its negative).
Hebbian learning rules tend to maximize the variance of the output units:
E[y₁²] = E[(e₁ᵀ x)²] = e₁ᵀ Q e₁
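
A minimal sketch of Oja's rule on synthetic zero-mean data (the data-generating covariance, learning rate, and iteration count are assumptions); after enough samples the weight vector should align with the leading eigenvector of Q = E[xxᵀ]:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Q = (A @ A.T) / n                       # correlation matrix Q = E[x x^T] (assumed for the demo)
Lchol = np.linalg.cholesky(Q)

c = rng.standard_normal(n)
c /= np.linalg.norm(c)
gamma = 0.02

for _ in range(20_000):
    x = Lchol @ rng.standard_normal(n)  # sample x with the chosen correlation structure
    y = c @ x
    c += gamma * (y * x - y**2 * c)     # Oja's rule: delta_c = gamma * (y x - y^2 c)

e1 = np.linalg.eigh(Q)[1][:, -1]        # eigenvector of Q with the largest eigenvalue
print("|c . e1| =", abs(c @ e1))        # close to 1 if c converged to +/- e1
```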

Slide 10: Hebbian Algorithms (cont.)
[Equation slide: a derivation labeled Eqn. (1) and an expansion of the first term of its right-hand side; the equations were not recovered in the transcript.]

Slide 11: Generalized Hebbian Algorithm (GHA)
The Oja algorithm only finds the first eigenvector e₁. If the network has more than one output, we need to extend it to the GHA.
Again, from Oja's update rule:
Δcᵢ = γ (yᵢ x – yᵢ² cᵢ)
Gram-Schmidt process in matrix form:
ΔC(t) = −LT(C(t) C(t)ᵀ) C(t)
This equation orthogonalizes C in (m−1) steps if the row norms are kept at 1.
LT[A]: operator that sets all entries above the diagonal of A to zero.
If the Oja algorithm were applied to each row of C independently, all rows would converge to the same principal eigenvector; combining it with the Gram-Schmidt orthogonalization keeps the rows orthogonal.
Then GHA is the combination (Eqn. (2) on the next slide):
GHA = Oja algorithm + Gram-Schmidt orthogonalization process
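
A minimal NumPy sketch of the combined GHA update, using the same kind of synthetic data as the Oja sketch above (again, the data, learning rate, and iteration count are assumptions). If Theorem 1 on the next slide holds, the rows of C should approach the first m eigenvectors of Q in order of decreasing eigenvalue:

```python
import numpy as np

def gha_step(C, x, gamma):
    """One GHA update: C <- C + gamma * (y x^T - LT[y y^T] C), LT[.] = lower-triangular part."""
    y = C @ x
    return C + gamma * (np.outer(y, x) - np.tril(np.outer(y, y)) @ C)

rng = np.random.default_rng(1)
n, m = 6, 3
A = rng.standard_normal((n, n))
Q = (A @ A.T) / n                       # input correlation matrix (assumed for the demo)
Lchol = np.linalg.cholesky(Q)

C = 0.1 * rng.standard_normal((m, n))
for _ in range(50_000):
    x = Lchol @ rng.standard_normal(n)  # sample inputs with E[x x^T] = Q
    C = gha_step(C, x, gamma=0.01)

eigvals, eigvecs = np.linalg.eigh(Q)
E = eigvecs[:, ::-1][:, :m].T           # first m eigenvectors, by decreasing eigenvalue
print(np.round(np.abs(C @ E.T), 2))     # close to the identity if rows of C match ±e_k
```

The final print compares the learned rows with the eigenvectors obtained by an explicit eigendecomposition; a matrix close to the identity (up to sign) indicates convergence.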

Slide 12: GHA Theorem – Convergence
Generalized Hebbian Algorithm:
C(t+1) = C(t) + γ(t) h(C(t), x(t))     (2)
where h(C(t), x(t)) = y(t) xᵀ(t) – LT[y(t) yᵀ(t)] C(t)
Theorem 1: If C is assigned random weights at t = 0, then with probability 1, Eqn. (2) will converge, and C will approach the matrix whose rows are the first m eigenvectors {e_k} of the input correlation matrix Q = E[xxᵀ], ordered by decreasing eigenvalue {λ_k}.
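
The proof slides that follow refer to an averaged differential equation, Eqn. (3), whose content did not survive in this transcript. As a reconstruction (an inference from the update rule in Eqn. (2), not text copied from the paper), taking expectations over x with y = Cx and treating C as slowly varying gives:

```latex
% Averaged form of the GHA update (a reconstruction; Eqn. (3) in the original slides
% is assumed to have this form). With y = Cx and Q = E[x x^T]:
E\big[h(C,x)\big]
  = E\big[y\,x^{\mathsf T}\big] - \mathrm{LT}\!\big[E[y\,y^{\mathsf T}]\big]\,C
  = C\,Q - \mathrm{LT}\!\big[C\,Q\,C^{\mathsf T}\big]\,C,
\qquad
\dot{C}(t) = C(t)\,Q - \mathrm{LT}\!\big[C(t)\,Q\,C(t)^{\mathsf T}\big]\,C(t).
```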

Slide 13: GHA Theorem – Proof
[Equation slide; the equations were not recovered in the transcript.]

Slide 14: GHA Theorem – Proof
[Equation slide introducing Eqn. (3); the equations were not recovered in the transcript. A reconstruction of the averaged equation appears after slide 12 above.]

Slide 15: GHA Theorem – Proof (cont.)
Induction: if the first (i−1) rows converge to the first (i−1) eigenvectors, then the i-th row will converge to the i-th eigenvector.
Base case, k = 1, with the first row of Eqn. (3): Oja (1982) showed that this equation forces c₁ to converge with probability 1 to ±e₁, the normalized principal eigenvector of Q (slide 9).
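
For reference, the first row of the reconstructed averaged equation reduces to the Oja equation quoted on slide 9, which is what the base case relies on:

```latex
% Base case: first row of the averaged equation. Because LT[.] zeroes everything above
% the diagonal, the first row only involves its own output, and (with Q symmetric)
% the equation reduces to the Oja equation from slide 9:
\dot{c}_1(t) = Q\,c_1(t) - \big(c_1(t)^{\mathsf T} Q\,c_1(t)\big)\,c_1(t).
```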

Slides 16–19: GHA Theorem – Proof (cont.)
[Equation slides, including Eqns. (4) and (5); the equations were not recovered in the transcript.]

Slide 20: GHA – Local Implementation
[Slide content not recovered in the transcript.]
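
The content of this slide was not recovered, but the matrix update in Eqn. (2) can be rewritten so that each weight c_ij is updated using only locally available quantities. The sketch below is a reconstruction of that local form from the matrix equation, not a transcription of the slide; the formula used is Δc_ij = γ (y_i x_j − y_i Σ_{k≤i} c_kj y_k):

```python
import numpy as np

def gha_local_step(C, x, gamma):
    """Local GHA update (reconstruction of the local form):
    delta c[i, j] = gamma * (y[i] * x[j] - y[i] * sum_{k <= i} C[k, j] * y[k]),
    so each weight needs only its own input x[j], its own output y[i],
    and a partial sum passed down column j."""
    m, n = C.shape
    y = C @ x
    new_C = C.copy()
    for j in range(n):
        partial = 0.0                     # running sum of C[k, j] * y[k] for k <= i
        for i in range(m):
            partial += C[i, j] * y[i]
            new_C[i, j] += gamma * (y[i] * x[j] - y[i] * partial)
    return new_C

# Sanity check against the matrix form used after slide 11 (same update, same result).
rng = np.random.default_rng(2)
C = rng.standard_normal((3, 6))
x = rng.standard_normal(6)
y = C @ x
matrix_form = C + 0.01 * (np.outer(y, x) - np.tril(np.outer(y, y)) @ C)
print(np.allclose(gha_local_step(C, x, 0.01), matrix_form))  # True
```

Fed the same input stream, this produces exactly the same updates as the matrix-form sketch; the only information each weight needs beyond its own input and output is the partial sum passed down its column, which is what makes the algorithm easy to parallelize.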

Slide 21: Demo

Slide 22: Thank you!