INTRODUCTION TO Machine Learning


Lecture Slides for INTRODUCTION TO Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 6: Dimensionality Reduction

Why Reduce Dimensionality?
- Reduces time complexity: less computation.
- Reduces space complexity: fewer parameters.
- Saves the cost of observing the feature.
- Simpler models are more robust on small datasets.
- More interpretable; simpler explanation.
- Data visualization (structure, groups, outliers, etc.) if plotted in 2 or 3 dimensions.

Feature Selection vs Extraction
- Feature selection: choose k < d important features and ignore the remaining d − k. Example: subset selection algorithms.
- Feature extraction: project the original dimensions xi, i = 1,...,d, onto new k < d dimensions zj, j = 1,...,k. Examples: principal components analysis (PCA), linear discriminant analysis (LDA), factor analysis (FA).

Subset Selection
There are 2^d possible subsets of d features.
Forward search: add the best feature at each step (a sketch follows below).
- The set of selected features F is initially Ø.
- At each iteration, find the best new feature: j = argmin_i E(F ∪ xi).
- Add xj to F if E(F ∪ xj) < E(F).
- This is a hill-climbing, O(d^2) algorithm.
Backward search: start with all features and remove one at a time, if possible.
Floating search: add k, remove l.
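
Below is a minimal sketch of the forward search in Python, assuming a generic, hypothetical error function `error(X, y, F)` (for example, the validation error of a model trained only on the columns listed in F); the greedy add-one-feature loop mirrors the slide.

```python
def forward_selection(X, y, error, k):
    """Greedy forward search: starting from the empty set, repeatedly add the
    single feature x_j that minimizes E(F ∪ {x_j})."""
    d = X.shape[1]
    F = []                                  # indices of selected features
    best_err = error(X, y, F)               # baseline error with no features
    while len(F) < k:
        candidates = [i for i in range(d) if i not in F]
        errs = {i: error(X, y, F + [i]) for i in candidates}   # E(F ∪ {x_i})
        j = min(errs, key=errs.get)
        if errs[j] >= best_err:             # no candidate improves E(F): stop
            break
        F.append(j)
        best_err = errs[j]
    return F, best_err
```

Backward search would be the mirror image of this loop: start from all d features and, as long as the error does not increase, remove the feature whose removal helps most.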

Principal Components Analysis (PCA)
Find a low-dimensional space such that when x is projected there, information loss is minimized.
The projection of x on the direction of w is z = w^T x.
Find w such that Var(z) is maximized:
Var(z) = Var(w^T x) = E[(w^T x − w^T μ)^2]
       = E[(w^T x − w^T μ)(w^T x − w^T μ)]
       = E[w^T (x − μ)(x − μ)^T w]
       = w^T E[(x − μ)(x − μ)^T] w
       = w^T Σ w
where Var(x) = E[(x − μ)(x − μ)^T] = Σ.
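
As a quick numerical check of the identity Var(w^T x) = w^T Σ w, the sketch below (with synthetic data and a random direction, assumed purely for illustration) compares the sample variance of the projections with the quadratic form.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic 3-D data with a non-trivial covariance structure
A = rng.normal(size=(3, 3))
X = rng.normal(size=(10_000, 3)) @ A.T        # rows are samples x^t

w = rng.normal(size=3)
w /= np.linalg.norm(w)                        # unit-length direction

z = X @ w                                     # projections z = w^T x
Sigma = np.cov(X, rowvar=False)               # sample covariance Σ

print(z.var(ddof=1))          # Var(z) estimated from the projections
print(w @ Sigma @ w)          # w^T Σ w — should agree closely
```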

Maximize Var(z) = w1^T Σ w1 subject to ||w1|| = 1. Setting the derivative of the Lagrangian w1^T Σ w1 − α(w1^T w1 − 1) to zero gives
Σ w1 = α w1,
that is, w1 is an eigenvector of Σ. Choose the eigenvector with the largest eigenvalue for Var(z) to be maximum.
Second principal component: maximize Var(z2) subject to ||w2|| = 1 and w2 orthogonal to w1. This gives
Σ w2 = α w2,
that is, w2 is another eigenvector of Σ, and so on.
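
Continuing in the same vein, the sketch below takes the principal directions as eigenvectors of the sample covariance and checks that the leading eigenvector w1 attains variance λ1 and is not beaten by random unit directions (synthetic data, for illustration only).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5_000, 3)) @ rng.normal(size=(3, 3)).T

Sigma = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma)      # eigh: Σ is symmetric
order = np.argsort(eigvals)[::-1]             # sort by descending eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

w1 = eigvecs[:, 0]                            # first principal direction
print(eigvals[0])                             # λ1
print((X @ w1).var(ddof=1))                   # Var(z1) ≈ λ1

# no random unit direction should beat w1
for _ in range(5):
    w = rng.normal(size=3)
    w /= np.linalg.norm(w)
    assert (X @ w).var(ddof=1) <= eigvals[0] + 1e-6
```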

What PCA does
z = W^T (x − m), where the columns of W are the eigenvectors of Σ and m is the sample mean.
This centers the data at the origin and rotates the axes.
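
A minimal end-to-end sketch of the transform z = W^T (x − m) on synthetic data (assumed for illustration): after centering and rotating, the transformed data has mean zero and a diagonal covariance with the eigenvalues on the diagonal, which is the "center and rotate" effect described above.

```python
import numpy as np

def pca_transform(X):
    """Center X at the sample mean m and rotate onto the eigenvectors of Σ."""
    m = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    eigvals, W = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]          # descending eigenvalues
    W = W[:, order]
    Z = (X - m) @ W                            # z^t = W^T (x^t - m)
    return Z, W, m, eigvals[order]

rng = np.random.default_rng(2)
X = rng.normal(size=(1_000, 4)) @ rng.normal(size=(4, 4)).T + 5.0
Z, W, m, lam = pca_transform(X)

print(np.round(Z.mean(axis=0), 6))             # ~0: data centered at origin
print(np.round(np.cov(Z, rowvar=False), 3))    # ~diag(λ1,...,λd): axes rotated
```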

How to choose k?
Proportion of Variance (PoV) explained by the first k components, with the λi sorted in descending order:
PoV(k) = (λ1 + λ2 + ... + λk) / (λ1 + λ2 + ... + λd)
Typically, stop at PoV > 0.9.
Scree graph: plot PoV vs k and stop at the “elbow”.
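
A small helper for the PoV rule, using a hypothetical eigenvalue spectrum: it returns the smallest k whose cumulative proportion of variance exceeds the threshold, and the cumulative curve it computes is what a PoV/scree plot would display.

```python
import numpy as np

def choose_k(eigvals, threshold=0.9):
    """Smallest k with PoV(k) = (λ1+...+λk)/(λ1+...+λd) > threshold."""
    lam = np.sort(eigvals)[::-1]               # λi in descending order
    pov = np.cumsum(lam) / lam.sum()
    return int(np.argmax(pov > threshold)) + 1, pov

# hypothetical eigenvalue spectrum for illustration
lam = np.array([4.2, 2.1, 0.6, 0.3, 0.1, 0.05])
k, pov = choose_k(lam)
print(k)                      # chosen number of components (3 here)
print(np.round(pov, 3))       # values you would read off a PoV plot
```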

Factor Analysis
Find a small number of factors z which, when combined, generate x:
xi − μi = vi1 z1 + vi2 z2 + ... + vik zk + εi
where the zj, j = 1,...,k, are the latent factors with E[zj] = 0, Var(zj) = 1, Cov(zi, zj) = 0 for i ≠ j;
the εi are the noise sources with Var(εi) = ψi, Cov(εi, εj) = 0 for i ≠ j, Cov(εi, zj) = 0;
and the vij are the factor loadings.
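
The sketch below simulates this generative model with made-up loadings V, means μ and noise variances ψ (all values hypothetical): sampling z ~ N(0, I) and ε ~ N(0, Ψ) and forming x = μ + V z + ε yields data whose covariance is approximately V V^T + Ψ, which is the structure factor analysis fits.

```python
import numpy as np

rng = np.random.default_rng(3)

d, k, n = 5, 2, 50_000
mu  = np.array([1.0, -2.0, 0.5, 3.0, 0.0])      # hypothetical means μ_i
V   = rng.normal(size=(d, k))                   # hypothetical factor loadings v_ij
psi = np.array([0.3, 0.1, 0.2, 0.4, 0.25])      # noise variances ψ_i (Ψ diagonal)

Z   = rng.normal(size=(n, k))                   # latent factors: E[z]=0, Var(z)=1
eps = rng.normal(size=(n, d)) * np.sqrt(psi)    # independent noise sources
X   = mu + Z @ V.T + eps                        # x - μ = V z + ε

print(np.round(np.cov(X, rowvar=False), 2))     # ≈ V V^T + diag(ψ)
print(np.round(V @ V.T + np.diag(psi), 2))
```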

PCA vs FA
PCA goes from x to z: z = W^T (x − μ).
FA goes from z to x: x − μ = V z + ε.

Factor Analysis
In FA, the factors zj are stretched, rotated and translated to generate x.

Multidimensional Scaling
Given the pairwise distances between N points, dij, i, j = 1,...,N, place the points on a low-dimensional map such that the distances are preserved.
With a parametric mapping z = g(x | θ), find the θ that minimizes the Sammon stress
E(θ | X) = Σ over pairs (r,s) of (||z_r − z_s|| − ||x_r − x_s||)^2 / ||x_r − x_s||^2
         = Σ over pairs (r,s) of (||g(x_r | θ) − g(x_s | θ)|| − ||x_r − x_s||)^2 / ||x_r − x_s||^2
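
Below is a small helper that evaluates this stress for a candidate low-dimensional placement Z (following the normalized form written above, which is an assumption about the exact definition); as an illustration it compares a random placement against a PCA projection, which should typically score noticeably lower.

```python
import numpy as np

def pairwise_dists(A):
    """Euclidean distance matrix between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(X, Z):
    """Sum over pairs r<s of (||z_r - z_s|| - ||x_r - x_s||)^2 / ||x_r - x_s||^2."""
    dx = pairwise_dists(X)
    dz = pairwise_dists(Z)
    iu = np.triu_indices(len(X), k=1)          # each pair counted once
    return (((dz[iu] - dx[iu]) ** 2) / dx[iu] ** 2).sum()

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))
Z_random = rng.normal(size=(30, 2))            # a poor, random placement
Z_pca = (X - X.mean(0)) @ np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -2:]
print(sammon_stress(X, Z_random), sammon_stress(X, Z_pca))
```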

Map of Europe by MDS
Map from CIA – The World Factbook: http://www.cia.gov/

Linear Discriminant Analysis
Find a low-dimensional space such that when x is projected onto it, the classes are well separated.
For two classes, find the w that maximizes the Fisher criterion
J(w) = (w^T m1 − w^T m2)^2 / (s1^2 + s2^2)
where mi is the mean of class i and si^2 is the scatter of class i after projection onto w.

Between-class scatter: (w^T m1 − w^T m2)^2 = w^T SB w, where SB = (m1 − m2)(m1 − m2)^T.
Within-class scatter: s1^2 + s2^2 = w^T SW w, where SW = S1 + S2 and Si = Σ_t r_i^t (x^t − mi)(x^t − mi)^T is the scatter of class i (r_i^t = 1 if x^t belongs to class i, and 0 otherwise). A sketch that computes SB and SW from labeled data follows below.
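
A small sketch computing SB and SW from two-class labeled data (the data and labels are synthetic, for illustration only), following the definitions above.

```python
import numpy as np

def scatter_matrices(X, y):
    """Two-class within-class scatter SW = S1 + S2 and
    between-class scatter SB = (m1 - m2)(m1 - m2)^T."""
    X1, X2 = X[y == 0], X[y == 1]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)               # class-1 scatter
    S2 = (X2 - m2).T @ (X2 - m2)               # class-2 scatter
    return S1 + S2, np.outer(m1 - m2, m1 - m2), m1, m2

rng = np.random.default_rng(5)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([3, 1], 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)
SW, SB, m1, m2 = scatter_matrices(X, y)
print(np.round(SB, 2))        # rank-1 matrix built from m1 - m2
print(np.round(SW, 1))        # pooled within-class scatter
```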

Fisher’s Linear Discriminant
Find w that maximizes J(w) = (w^T SB w) / (w^T SW w).
LDA solution: w = c · SW^{-1} (m1 − m2), for any constant c.
Parametric solution, when the classes are Gaussian with a shared covariance, p(x | Ci) ~ N(μi, Σ): w = Σ^{-1} (μ1 − μ2).
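
A sketch of the two-class solution w ∝ SW^{-1} (m1 − m2) on synthetic Gaussians with a shared covariance (assumed for illustration); after projecting onto w, the two classes separate along a single axis.

```python
import numpy as np

rng = np.random.default_rng(6)
# two synthetic Gaussian classes sharing roughly the same covariance
X1 = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=200)
X2 = rng.multivariate_normal([2, 1], [[1.0, 0.6], [0.6, 1.0]], size=200)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

w = np.linalg.solve(SW, m1 - m2)               # w ∝ SW^{-1} (m1 - m2)
w /= np.linalg.norm(w)

z1, z2 = X1 @ w, X2 @ w                        # 1-D projections
print(z1.mean(), z2.mean())                    # well-separated projected means
print(z1.std(), z2.std())                      # relative to small projected spread
```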

K > 2 Classes
Within-class scatter: SW = Σ_{i=1..K} Si, with Si = Σ_t r_i^t (x^t − mi)(x^t − mi)^T.
Between-class scatter: SB = Σ_{i=1..K} Ni (mi − m)(mi − m)^T, where m is the overall mean and Ni is the number of instances in class i.
Find the W that maximizes J(W) = |W^T SB W| / |W^T SW W|.
The solution is given by the largest eigenvectors of SW^{-1} SB; since SB has rank at most K − 1, at most K − 1 discriminant directions are obtained (a sketch follows below).
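
A minimal multi-class sketch on synthetic data (assumed for illustration): build SW and SB as defined above, take the leading eigenvectors of SW^{-1} SB, and keep at most K − 1 of them.

```python
import numpy as np

def lda_directions(X, y, k=None):
    """Columns of W = leading eigenvectors of SW^{-1} SB (at most K-1 useful)."""
    classes = np.unique(y)
    K, d = len(classes), X.shape[1]
    m = X.mean(axis=0)                          # overall mean
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        SW += (Xc - mc).T @ (Xc - mc)             # within-class scatter
        SB += len(Xc) * np.outer(mc - m, mc - m)  # between-class scatter
    evals, evecs = np.linalg.eig(np.linalg.solve(SW, SB))
    order = np.argsort(evals.real)[::-1]
    k = K - 1 if k is None else k               # SB has rank at most K-1
    return evecs.real[:, order[:k]]

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(c, 1.0, size=(50, 3))
               for c in ([0, 0, 0], [4, 0, 0], [0, 4, 0])])
y = np.repeat([0, 1, 2], 50)
W = lda_directions(X, y)                        # 3 classes -> at most 2 directions
Z = X @ W                                       # projected data
print(W.shape, Z.shape)
```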
