Dimensionality Reduction and Principal Components Analysis


Outline
- What is dimensionality reduction?
- Principal Components Analysis (PCA)
- Example (Bishop, ch. 12)
- PCA vs linear regression
- PCA as a mixture model variant
- Implementing PCA
- Other matrix factorization methods
- Applications (collaborative filtering)
- MF with SGD
- MF vs clustering vs LDA vs …

A Dimensionality Reduction Method You Already Know… http://ufldl.stanford.edu/wiki/

Outline
- What's new in ANNs in the last 5-10 years?
  - Deeper networks, more data, and faster training
  - Scalability and use of GPUs ✔
  - Symbolic differentiation ✔
  - Some subtle changes to cost functions, architectures, optimization methods ✔
- What types of ANNs are most successful and why?
  - Convolutional networks (CNNs) ✔
  - Long short-term memory networks (LSTMs) ✔
  - Word2vec and embeddings ✔
  - Autoencoders
- What are the hot research topics for deep learning?

Neural network auto-encoding

Assume we would like to learn the following (trivial?) output function, which maps each input to itself:

Input → Output: 00000001 → 00000001, 00000010 → 00000010, …, 10000000 → 10000000

One approach is to use this network.
http://ufldl.stanford.edu/wiki/

Neural network auto-encoding

Input → Output: 00000001 → 00000001, 00000010 → 00000010, …, 10000000 → 10000000

Maybe learning something like this:
http://ufldl.stanford.edu/wiki/

Neural network autoencoding
- The hidden layer is a compressed version of the data.
- If training error is zero, the training data is perfectly reconstructed (it probably won't be, unless you overfit).
- Other, similar examples will be imperfectly reconstructed.
- High reconstruction error usually means the example is different from the training data.
- Reconstruction error on a vector x is related to P(x), the probability of x under the distribution the autoencoder was trained on.
- Autoencoders are therefore related to generative models of the data.
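To make this concrete, here is a minimal numpy sketch (not from the slides) of the identity-learning example above: eight one-hot inputs squeezed through a small sigmoid hidden layer. The 3-unit width and the training hyperparameters are illustrative assumptions and may need tuning.

    import numpy as np

    # The eight one-hot training vectors from the slide: each input should be
    # reproduced at the output after passing through a narrow hidden layer.
    X = np.eye(8)

    rng = np.random.default_rng(0)
    n_hidden = 3                                  # assumed width of the compressed layer
    W1 = rng.normal(0.0, 0.1, (8, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, 8)); b2 = np.zeros(8)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for step in range(20000):                     # plain batch gradient descent
        H = sigmoid(X @ W1 + b1)                  # hidden layer = compressed code
        Y = sigmoid(H @ W2 + b2)                  # reconstruction of the input
        dY = (Y - X) * Y * (1 - Y)                # backprop of squared error
        dH = (dY @ W2.T) * H * (1 - H)
        W2 -= lr * (H.T @ dY); b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X.T @ dH); b1 -= lr * dH.sum(axis=0)

    print(np.round(H, 2))                         # hidden codes: roughly a 3-bit pattern per input
    print(((X - Y) ** 2).sum(axis=1))             # per-example reconstruction error, near zero

After training, the hidden activations act roughly like a 3-bit code for the eight inputs, and the per-example reconstruction error is the quantity the slide relates to P(x).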

(Deep) Stacked Autoencoders: train an autoencoder on the inputs X, then apply its encoder to produce a new dataset H of hidden-unit activations for X.
http://ufldl.stanford.edu/wiki/

(Deep) Stacked Autoencoders http://ufldl.stanford.edu/wiki/
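A sketch of this greedy, layer-wise recipe, assuming PyTorch (the layer sizes and the random stand-in data are placeholders): each autoencoder is trained to reconstruct its own input, and its hidden activations become the training set for the next layer.

    import torch
    import torch.nn as nn

    def train_autoencoder(X, n_hidden, steps=2000, lr=1e-2):
        """Train one sigmoid autoencoder on X; return its encoder and the codes H."""
        n_in = X.shape[1]
        enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        dec = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())
        opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = nn.functional.mse_loss(dec(enc(X)), X)   # reconstruct X from its code
            loss.backward()
            opt.step()
        with torch.no_grad():
            H = enc(X)                    # new dataset of hidden-unit activations for X
        return enc, H

    X = torch.rand(500, 100)              # stand-in for real data, values in [0, 1]
    enc1, H1 = train_autoencoder(X, 50)   # layer 1: 100 -> 50
    enc2, H2 = train_autoencoder(H1, 20)  # layer 2 trained on the layer-1 codes: 50 -> 20
    # H2 is a 20-dimensional representation of each original 100-dimensional example.

The stacked encoders enc1, enc2 can later be fine-tuned end to end, for example with a supervised output layer on top.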

Modeling documents using top 2000 words.

Applications of dimensionality reduction
- Visualization: examining data in 2 or 3 dimensions
- Semi-supervised learning: supervised learning in the reduced space
- Anomaly detection and data completion (coming soon)
- Convenience in data processing:
  - go from words (~10^6 dimensions) to word vectors (hundreds of dimensions)
  - use dense, GPU-friendly matrix operations instead of sparse ones

PCA

PCA is basically this picture, but with linear units (not sigmoid units), trained to minimize squared error, with the constraint that the "hidden units" are orthogonal.

Motivating Example of PCA

A Motivating Example

A Motivating Example

A Motivating Example

PCA as matrices: V[i,j] = pixel j in image i. The 1000 × 10,000 matrix V (1000 images, 10,000 pixels each, so 10,000,000 entries) is approximated by the product of a 1000 × 2 matrix of per-image coefficients [(x1, y1), …, (xn, yn)] and a 2 × 10,000 matrix whose two rows are the prototypes PC1 = (a1, …, am) and PC2 = (b1, …, bm).
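A numpy sketch of exactly these shapes, using random data in place of the images and an SVD (one way to fit the prototypes) to compute the best rank-2 approximation:

    import numpy as np

    # Stand-in for the slide's data: 1000 images x 10,000 pixels, V[i, j] = pixel j of image i.
    n_images, n_pixels, k = 1000, 10_000, 2
    V = np.random.rand(n_images, n_pixels)

    V0 = V - V.mean(axis=0)                       # center each pixel across images
    U, s, Wt = np.linalg.svd(V0, full_matrices=False)
    W = Wt[:k]                                    # 2 x 10,000: the prototypes PC1, PC2
    Z = U[:, :k] * s[:k]                          # 1000 x 2: per-image coefficients (x_i, y_i)

    V_hat = V.mean(axis=0) + Z @ W                # best rank-2 approximation of V
    print(Z.shape, W.shape, V_hat.shape)          # (1000, 2) (2, 10000) (1000, 10000)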

PCA as matrices: the optimization problem

PCA as vectors: the optimization problem
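In symbols, the problem on these two slides is the standard least-squares formulation of PCA (cf. Bishop, ch. 12): choose k orthonormal direction vectors and per-example coefficients that minimize squared reconstruction error.

    \min_{Z \in \mathbb{R}^{n \times k},\; W \in \mathbb{R}^{k \times d}}
        \bigl\| V - \mathbf{1}\bar{v}^{\top} - Z W \bigr\|_F^2
        \quad \text{s.t. } W W^{\top} = I_k
    % equivalently, one example (row) v_i at a time:
    \min_{\{z_i\},\,\{w_j\}} \sum_{i=1}^{n}
        \Bigl\| v_i - \bar{v} - \sum_{j=1}^{k} z_{ij}\, w_j \Bigr\|^2
        \quad \text{s.t. } w_j^{\top} w_{j'} = \delta_{jj'}

Here V is the n × d data matrix with mean row v̄, the rows of W are the principal components, and row i of Z gives example i's coordinates in that basis.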

A cartoon of PCA

A cartoon of PCA

A 3D cartoon of PCA

More cartoons

PCA as matrices, again: the 1000 × 10,000 image matrix V (V[i,j] = pixel j in image i) ≈ a 1000 × 2 matrix of per-image coefficients times a 2 × 10,000 matrix whose rows are the prototypes PC1 and PC2.

PCA vs linear regression

PCA vs linear regression

PCA vs mixtures of Gaussians

PCA vs mixtures of Gaussians

PCA vs mixtures of Gaussians

PCA vs mixtures of Gaussians

PCA: Implementation

Finding principal components of a matrix
There are two algorithms I like:
- EM (Roweis, NIPS 1997)
- eigenvectors of the covariance matrix (next)
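A minimal numpy sketch of the EM route (the zero-noise variant of Roweis's algorithm; the random data and iteration count are placeholders): alternate least-squares solves for the codes Z and the directions W until W spans the top-k principal subspace.

    import numpy as np

    def em_pca(X, k, n_iter=100, seed=0):
        """EM-style PCA (after Roweis): alternate least-squares solves for Z and W."""
        X = X - X.mean(axis=0)                      # center the data (rows = examples)
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(X.shape[1], k))        # d x k, random starting directions
        for _ in range(n_iter):
            Z = X @ W @ np.linalg.inv(W.T @ W)      # "E-step": best codes given W
            W = X.T @ Z @ np.linalg.inv(Z.T @ Z)    # "M-step": best directions given Z
        Q, _ = np.linalg.qr(W)                      # orthonormal basis for the learned subspace
        return Q                                    # d x k; spans the top-k principal subspace

    X = np.random.randn(200, 10)                    # placeholder data
    Q = em_pca(X, k=2)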

PCA as vectors: the optimization problem

PCA as matrices: the optimization problem

Implementing PCA

Implementing PCA: why does this work?
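A sketch of the eigenvector route in numpy (random data is a stand-in): center the data, form the covariance matrix, take its top-k eigenvectors as the principal components, and project.

    import numpy as np

    def pca_eig(X, k):
        """PCA via eigenvectors of the covariance matrix (rows of X = examples)."""
        mu = X.mean(axis=0)
        Xc = X - mu                                  # center the data
        C = (Xc.T @ Xc) / (len(X) - 1)               # d x d covariance matrix
        eigvals, eigvecs = np.linalg.eigh(C)         # eigh: C is symmetric
        top = np.argsort(eigvals)[::-1][:k]          # indices of the k largest eigenvalues
        W = eigvecs[:, top]                          # d x k: the principal components
        Z = Xc @ W                                   # n x k: each example's coordinates
        return W, Z, mu

    X = np.random.randn(500, 20)                     # placeholder data
    W, Z, mu = pca_eig(X, k=3)
    X_hat = mu + Z @ W.T                             # reconstruction from 3 components

This works because the direction that maximizes the variance of the projected data (equivalently, minimizes squared reconstruction error) is the covariance matrix's leading eigenvector, and each subsequent component is the leading eigenvector of what remains after removing the earlier directions.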

Some more examples of PCA: Eigenfaces

Some more examples of PCA: Eigenfaces

Some more examples of PCA: Eigenfaces

Some more examples of PCA: Eigenfaces

Some more examples of PCA: Eigenfaces

Image denoising

PCA for image denoising

PCA for Modeling Text (SVD = Singular Value Decomposition)

A Scalability Problem with PCA
- The covariance matrix is large in high dimensions: with d features it is d × d.
- SVD is a closely related method that can be implemented more efficiently in high dimensions.
- Don't explicitly compute the covariance matrix; instead decompose X as X = U S Vᵀ:
  - S is k × k, where k << d
  - S is diagonal, and S[i,i] = sqrt(λi), the singular value paired with column V[:,i]
  - columns of V ~= principal components
  - rows of U S ~= embeddings for the examples
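A numpy sketch of this (the 200 × 5,000 random matrix is just a stand-in): the SVD of the centered data gives the same components as the covariance route without ever forming the d × d covariance matrix.

    import numpy as np

    def pca_svd(X, k):
        """PCA via SVD of the centered data: the d x d covariance matrix is never formed."""
        mu = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
        V = Vt[:k].T                   # columns of V ~= principal components
        S = np.diag(s[:k])             # diagonal; s[i] = sqrt of eigenvalue i of (X - mu)ᵀ(X - mu)
        Z = U[:, :k] @ S               # rows of U S ~= embeddings for the examples
        return V, Z, mu

    X = np.random.randn(200, 5000)     # 200 examples, 5,000 features (placeholder data)
    V, Z, mu = pca_svd(X, k=10)

For matrices too large even for a dense SVD, truncated or randomized solvers (e.g. scikit-learn's TruncatedSVD) compute only the top k factors.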

SVD example

- The Neatest Little Guide to Stock Market Investing
- Investing For Dummies, 4th Edition
- The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
- The Little Book of Value Investing
- Value Investing: From Graham to Buffett and Beyond
- Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
- Investing in Real Estate, 5th Edition
- Stock Investing For Dummies
- Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

TF-IDF scores would be better than raw counts.
https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

Recovering latent factors in a matrix: an n × m document–term matrix V, with V[i,j] = TF-IDF score of term j in doc i (n documents, m terms), is approximated by the product of an n × 2 matrix of per-document factors [(x1, y1), …, (xn, yn)] and a 2 × m matrix of per-term factors whose rows are (a1, …, am) and (b1, …, bm).
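A sketch of this factorization in Python, assuming scikit-learn's TfidfVectorizer is available; the three abbreviated strings stand in for the nine titles listed earlier.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = [                                   # abbreviated stand-ins for the nine titles above
        "stock market investing guide",
        "value investing graham buffett",
        "real estate investing secrets",
    ]

    A = TfidfVectorizer().fit_transform(docs).toarray()   # n docs x m terms, TF-IDF scores
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2
    doc_factors = U[:, :k] * s[:k]     # one k-dimensional vector per document
    term_factors = Vt[:k]              # one k-dimensional vector per term

Documents (and terms) that end up close together in the 2-dimensional factor space are the ones the latent factors treat as similar.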


One latent factor groups the real-estate titles: Investing in Real Estate; Rich Dad’s Advisors: The ABC’s of Real Estate Investing; …

The other factor groups the stock-market titles: The Little Book of Common Sense Investing: …; The Neatest Little Guide to Stock Market Investing.