
3 CHAPTER 5: Multivariate Methods

4 Multivariate Distribution
Assume that all members of a class come from a joint distribution. We can learn the distribution P(x|C) from data and assign a new instance to the most probable class P(C|x) using Bayes' rule. An instance is described by a vector of correlated parameters, which takes us into the realm of multivariate distributions, in particular the multivariate normal. (Slides based on E. Alpaydın, Introduction to Machine Learning, © The MIT Press.)

5 Multivariate Data
Multiple measurements (sensors): d inputs/features/attributes (d-variate), N instances/observations/examples. The data matrix is

$$X = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$

6 Multivariate Parameters
Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j)$

Correlation: $\mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

$$\Sigma \equiv \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$

7 Parameter Estimation

Sample mean $m$: $m_i = \dfrac{\sum_{t=1}^{N} x_i^t}{N}, \quad i = 1, \ldots, d$

Covariance matrix $S$: $s_{ij} = \dfrac{\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)}{N}$

Correlation matrix $R$: $r_{ij} = \dfrac{s_{ij}}{s_i s_j}$
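As a concrete illustration, a minimal numpy sketch of these estimators (dividing by N as above; the data values are made up):

```python
import numpy as np

# X has shape (N, d): N instances, d features (the data matrix from slide 5).
X = np.array([[1.0, 2.0], [2.0, 3.5], [3.0, 5.1], [4.0, 6.8]])
N, d = X.shape

m = X.mean(axis=0)            # sample mean, m_i = sum_t x_i^t / N
Xc = X - m                    # centered data
S = (Xc.T @ Xc) / N           # covariance matrix, s_ij (divided by N as on the slide)
s = np.sqrt(np.diag(S))       # per-feature standard deviations
R = S / np.outer(s, s)        # correlation matrix, r_ij = s_ij / (s_i s_j)

print(m, S, R, sep="\n")
```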

8 Estimation of Missing Values
What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small.
Use 'missing' as an attribute value: it may carry information.
Imputation: fill in the missing value.
Mean imputation: use the most likely value (e.g., the mean).
Imputation by regression: predict the missing value from the other attributes.
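A small sketch of mean imputation in numpy, assuming missing entries are encoded as NaN (values are illustrative; regression imputation would instead fit a predictor of the missing attribute from the observed ones):

```python
import numpy as np

# Toy data with missing values encoded as NaN.
X = np.array([[25.0, 38000.0],
              [32.0,   np.nan],
              [np.nan, 52000.0],
              [41.0, 61000.0]])

# Mean imputation: replace each missing entry by the column mean of the observed values.
col_means = np.nanmean(X, axis=0)
missing = np.isnan(X)
X_imputed = np.where(missing, col_means, X)

print(X_imputed)
```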

9 Multivariate Normal
Have d attributes. Often we can assume each one is distributed normally, but the attributes may be dependent/correlated. What is the joint distribution of several correlated variables, P(X1=x1, X2=x2, ..., Xd=xd)? Each Xi is normally distributed with mean μi and variance σi².

10 Multivariate Normal
$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \Sigma), \qquad p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$$

Mahalanobis distance: $(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})$. When two variables are strongly correlated, their covariance is large; dividing by it (through $\Sigma^{-1}$) makes them contribute less to the Mahalanobis distance and therefore more to the probability.
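A short numpy sketch of the squared Mahalanobis distance and the density, cross-checked against scipy's multivariate normal (μ, Σ, and x are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])
x = np.array([1.0, 1.5])

d = len(mu)
diff = x - mu
Sigma_inv = np.linalg.inv(Sigma)

maha2 = diff @ Sigma_inv @ diff                     # squared Mahalanobis distance
p = np.exp(-0.5 * maha2) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

# The hand-computed density should match scipy's.
print(maha2, p, multivariate_normal.pdf(x, mean=mu, cov=Sigma))
```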

11 Bivariate Normal (figure)

12 Multivariate Normal Distribution
Mahalanobis distance $(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\Sigma$ (it normalizes for differences in variances and for correlations).

Bivariate case, d = 2:

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right], \qquad z_i = \frac{x_i - \mu_i}{\sigma_i}$$

13 Bivariate Normal (figure)

14 Bivariate Normal (figure)

15 Independent Inputs: Naive Bayes
If the $x_i$ are independent, the off-diagonal entries of $\Sigma$ are 0 and the Mahalanobis distance reduces to a Euclidean distance weighted by $1/\sigma_i$. If the variances are also equal, it reduces to the ordinary Euclidean distance.

$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d}\sigma_i} \exp\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i-\mu_i}{\sigma_i}\right)^2\right]$$
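A quick numeric check that, for independent inputs (diagonal Σ), the product of the univariate normal densities equals the multivariate normal density (values are illustrative):

```python
import numpy as np

mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 1.5, 2.0])          # per-feature standard deviations
x = np.array([1.2, -1.0, 3.0])
d = len(mu)

# Product of d univariate normal densities (the naive Bayes factorization).
p_naive = np.prod(np.exp(-0.5 * ((x - mu) / sigma) ** 2) /
                  (np.sqrt(2 * np.pi) * sigma))

# Multivariate normal density with diagonal covariance Sigma = diag(sigma^2).
Sigma = np.diag(sigma ** 2)
diff = x - mu
p_mvn = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / \
        np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

print(p_naive, p_mvn)   # identical up to floating-point error
```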

16 Projection Distribution
Example: a vector of 3 features with a multivariate normal distribution. Its projection onto a 2-dimensional space (e.g., the XY plane) gives vectors of 2 features, and the projection is also multivariate normal. In general, the projection of a d-dimensional normal onto a k-dimensional space is a k-dimensional normal.

17 1D projection (figure)

18 Multivariate Classification
Assume the members of a class come from a single multivariate distribution. The multivariate normal is a good choice: it is easy to analyze, it models many natural phenomena, and it models a class as a single prototype source (the mean) perturbed by random variation.

19 Example: Matching cars to customers
Each car defines a class of matching customers. Customers are described by (age, income), and there is a correlation between age and income. Assume each class is multivariate normal. We need to learn P(x|C) from the data and use Bayes' rule to compute P(C|x).

20 Parametric Classification
If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \Sigma_i)$,

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)\right]$$

the discriminant functions are $g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i)$. We need the mean and covariance matrix of each class to compute its discriminant function, and we can ignore $p(\mathbf{x})$ since it is the same for all classes.

21 Estimation of Parameters
$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad \mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}, \qquad S_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|S_i| - \frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T S_i^{-1} (\mathbf{x}-\mathbf{m}_i) + \log \hat{P}(C_i)$$
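A sketch of these estimates on a toy labeled sample (the class labels y play the role of the indicators $r_i^t$; the data and names are illustrative):

```python
import numpy as np

# Toy labeled data: X is (N, d), y holds class indices 0..K-1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, size=(50, 2)),
               rng.normal([3, 3], 1.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
K, N = 2, len(y)

priors, means, covs = [], [], []
for i in range(K):
    Xi = X[y == i]                          # instances with r_i^t = 1
    priors.append(len(Xi) / N)              # estimate of P(C_i)
    means.append(Xi.mean(axis=0))           # m_i
    Xc = Xi - means[-1]
    covs.append((Xc.T @ Xc) / len(Xi))      # S_i

print(priors, means[0], covs[0], sep="\n")
```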

22 Covariance Matrix per Class
Quadratic discriminant; requires estimating $K \cdot d(d+1)/2$ parameters for the covariance matrices.

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|S_i| - \frac{1}{2}\left(\mathbf{x}^T S_i^{-1}\mathbf{x} - 2\mathbf{x}^T S_i^{-1}\mathbf{m}_i + \mathbf{m}_i^T S_i^{-1}\mathbf{m}_i\right) + \log \hat{P}(C_i) = \mathbf{x}^T W_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$W_i = -\frac{1}{2} S_i^{-1}, \qquad \mathbf{w}_i = S_i^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T S_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|S_i| + \log \hat{P}(C_i)$$
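A sketch of the quadratic discriminant in the $W_i$, $\mathbf{w}_i$, $w_{i0}$ form; the class parameters below are made-up stand-ins for the estimates $\mathbf{m}_i$, $S_i$, $\hat{P}(C_i)$:

```python
import numpy as np

def quadratic_discriminant(x, m, S, prior):
    """g_i(x) = x^T W_i x + w_i^T x + w_i0 for one class."""
    S_inv = np.linalg.inv(S)
    W = -0.5 * S_inv                                                # W_i
    w = S_inv @ m                                                   # w_i
    w0 = -0.5 * m @ S_inv @ m - 0.5 * np.log(np.linalg.det(S)) + np.log(prior)  # w_i0
    return x @ W @ x + w @ x + w0

# Toy per-class estimates (illustrative values).
means  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs   = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
priors = [0.5, 0.5]

x = np.array([2.0, 2.5])
scores = [quadratic_discriminant(x, m, S, p) for m, S, p in zip(means, covs, priors)]
print(int(np.argmax(scores)))   # index of the most likely class
```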

23 Likelihoods, posterior for C1, and the discriminant P(C1|x) = 0.5 (figure)

24 Common Covariance Matrix S
If there is not enough data, we can assume that all classes share a common sample covariance matrix S:

$$S = \sum_i \hat{P}(C_i)\, S_i$$

The discriminant then reduces to a linear discriminant, since the quadratic term $\mathbf{x}^T S^{-1}\mathbf{x}$ is common to all classes and can be dropped:

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T S^{-1} (\mathbf{x}-\mathbf{m}_i) + \log \hat{P}(C_i)$$
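A sketch of the shared-covariance case: pool the class covariances weighted by the priors, then classify with the linear discriminant $g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}$, $\mathbf{w}_i = S^{-1}\mathbf{m}_i$, $w_{i0} = -\tfrac{1}{2}\mathbf{m}_i^T S^{-1}\mathbf{m}_i + \log \hat{P}(C_i)$, which follows from dropping the common $\mathbf{x}^T S^{-1}\mathbf{x}$ term above (toy values):

```python
import numpy as np

# Toy per-class estimates (standing in for m_i, S_i, P(C_i)).
means  = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs   = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
priors = [0.6, 0.4]

# Shared covariance: S = sum_i P(C_i) S_i
S = sum(p * Si for p, Si in zip(priors, covs))
S_inv = np.linalg.inv(S)

def linear_discriminant(x, m, prior):
    """g_i(x) = w_i^T x + w_i0 (quadratic term dropped: it is the same for all classes)."""
    w = S_inv @ m
    w0 = -0.5 * m @ S_inv @ m + np.log(prior)
    return w @ x + w0

x = np.array([1.0, 2.0])
print(int(np.argmax([linear_discriminant(x, m, p) for m, p in zip(means, priors)])))
```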

25 Common Covariance Matrix S
(figure)

26 Diagonal S
When the $x_j$, $j = 1, \ldots, d$, are independent, $\Sigma$ is diagonal and $p(\mathbf{x}|C_i) = \prod_j p(x_j|C_i)$ (the naive Bayes assumption). Classify based on the weighted Euclidean distance (in $s_j$ units) to the nearest mean:

$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$

27 Diagonal S, variances may be different
(figure)

28 Diagonal S, equal variances
Nearest mean classifier: classify based on the Euclidean distance to the nearest mean. Each mean can be considered a prototype or template, so this is template matching.

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x}-\mathbf{m}_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^{d}\left(x_j^t - m_{ij}\right)^2 + \log \hat{P}(C_i)$$
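A minimal nearest-mean sketch: with equal priors and equal spherical variances, the discriminant reduces to picking the closest class mean (the means are made up):

```python
import numpy as np

means = np.array([[0.0, 0.0],
                  [3.0, 3.0],
                  [0.0, 4.0]])            # one prototype (mean) per class

def nearest_mean(x):
    # With equal priors and equal spherical variances, pick the closest mean.
    dists = np.linalg.norm(means - x, axis=1)
    return int(np.argmin(dists))

print(nearest_mean(np.array([2.5, 2.0])))   # -> 1 (closest to [3, 3])
```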

29 Diagonal S, equal variances
(figure)

30 Model Selection

Assumption                     Covariance matrix        No. of parameters
Shared, hyperspheric           S_i = S = s^2 I          1
Shared, axis-aligned           S_i = S, with s_ij = 0   d
Shared, hyperellipsoidal       S_i = S                  d(d+1)/2
Different, hyperellipsoidal    S_i                      K · d(d+1)/2

As we increase complexity (a less restricted S), bias decreases and variance increases. Assume simple models (allow some bias) to control variance (regularization).

31 Model Selection
A different covariance matrix for each class means many parameters to estimate: small bias but large variance. Common covariance matrices, diagonal covariances, etc. reduce the number of parameters: they increase bias but control variance. Are there in-between states?

32 Regularized Discriminant Analysis (RDA)
a = b = 0: quadratic classifier. a = 0, b = 1: shared covariance, linear classifier. a = 1, b = 0: diagonal covariance. Choose the best a, b by cross-validation.
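A sketch of the regularized covariance. It assumes the form $S_i' = a\,\sigma^2 I + b\,S + (1-a-b)\,S_i$, which matches the special cases listed above (a = b = 0 leaves $S_i$; a = 0, b = 1 gives the shared S; a = 1, b = 0 gives a diagonal covariance); here $\sigma^2$ is taken as the average variance, and the cross-validation over (a, b) is only indicated:

```python
import numpy as np

def regularized_cov(S_i, S_shared, a, b):
    """Assumed RDA form: S_i' = a*sigma^2*I + b*S + (1 - a - b)*S_i."""
    d = S_shared.shape[0]
    sigma2 = np.trace(S_shared) / d          # average variance (assumption)
    return a * sigma2 * np.eye(d) + b * S_shared + (1.0 - a - b) * S_i

# Toy covariances (illustrative).
S_i = np.array([[2.0, 0.7], [0.7, 1.0]])
S_shared = np.array([[1.5, 0.3], [0.3, 1.2]])

for a, b in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.2, 0.5)]:
    print((a, b), regularized_cov(S_i, S_shared, a, b), sep="\n")
# In practice, a and b would be chosen by measuring validation error over a grid (cross-validation).
```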

33 Model Selection: Example
(figure)

34 Model Selection (figure)

36 Multivariate Regression
Multivariate linear model: $r^t = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t + \varepsilon$.

37 Multivariate Regression
(figure)

38 CHAPTER 6: Dimensionality Reduction

39 Dimensionality of input
The dimensionality of the input is the number of observables (e.g., age and income). If the number of observables is increased:
More time to compute.
More memory to store inputs and intermediate results.
More complicated explanations (knowledge extracted by learning): regression with 100 parameters vs. 2.
No simple visualization: a 2D vs. a 10D graph.
Much more data needed (the curse of dimensionality): 1M one-dimensional inputs are not equivalent to one input of dimension 1M.

40 Dimensionality reduction
Some features (dimensions) bear little or no useful information (e.g., hair color when selecting a car), so we can drop some features; which features can be dropped has to be estimated from the data. Several features can be combined without loss, or even with a gain, of information (e.g., the incomes of all family members for a loan application), so some features can be combined; which features to combine also has to be estimated from the data.

41 Feature Selection vs Extraction
Feature selection: choose k < d important features, ignoring the remaining d − k (subset selection algorithms).
Feature extraction: project the original dimensions $x_i$, $i = 1, \ldots, d$, to new k < d dimensions $z_j$, $j = 1, \ldots, k$: Principal Components Analysis (PCA), Linear Discriminant Analysis (LDA), Factor Analysis (FA).

42 Usage
Have data of dimension d. Reduce the dimensionality to k < d by discarding unimportant features or combining several features into one. Use the resulting k-dimensional data set for learning a classification problem (e.g., the parameters of the probabilities P(x|C)) or a regression problem (e.g., the parameters of a model y = g(x|θ)).

43 Subset selection
Have an initial set of features of size d; there are 2^d possible subsets. We need a criterion to decide which subset is best and a way to search over the possible subsets. We can't go over all 2^d possibilities, so we need some heuristics.

44 โ€œGoodnessโ€ of feature set
Supervised: train using the selected subset and estimate the error on a validation data set. Unsupervised: look at the inputs only (e.g., age, income, and savings) and select the subset of 2 that bears most of the information about the person.

45 Mutual Information

Have 3 random variables (features) X, Y, Z and have to select the 2 that give the most information. If X and Y are "correlated", then much of the information about Y is already in X, so it makes sense to select features that are "uncorrelated". Mutual information (a Kullback–Leibler divergence) is a more general measure of this kind of dependence, and it can be extended to n variables (the information that variables x1, ..., xn have about a variable xn+1).

46 Subset selection
Forward search: start from an empty set of features; try each of the remaining features, estimating the classification/regression error of adding that specific feature; select the feature that gives the largest improvement in validation error; stop when there is no significant improvement. Backward search: start with the original set of size d and drop the features with the smallest impact on the error.

47 Subset Selection There are 2^d subsets of d features
There are 2^d subsets of d features. Forward search: add the best feature at each step. The set of features F is initially empty; at each iteration, find the best new feature, $j = \arg\min_i E(F \cup x_i)$, and add $x_j$ to F if $E(F \cup x_j) < E(F)$. This is a hill-climbing $O(d^2)$ algorithm. Backward search: start with all features and remove one at a time, if possible. Floating search: add k, remove l.
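A sketch of greedy forward selection. The error function E is a placeholder: here it is, hypothetically, the validation error of a nearest-mean classifier restricted to the candidate feature set; any estimator could be plugged in. The toy data put class information in only the first two of five features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: only features 0 and 1 carry class information.
N, d = 200, 5
y = rng.integers(0, 2, size=N)
X = rng.normal(size=(N, d))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y

train, val = np.arange(0, 150), np.arange(150, N)

def E(F):
    """Validation error of a nearest-mean classifier using only the features in F."""
    F = list(F)
    means = np.array([X[train][y[train] == c][:, F].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X[val][:, None, F] - means[None, :, :], axis=2)
    return np.mean(np.argmin(dists, axis=1) != y[val])

selected, remaining = [], set(range(d))
best_err = np.inf
while remaining:
    errs = {j: E(selected + [j]) for j in remaining}
    j_best = min(errs, key=errs.get)
    if errs[j_best] >= best_err:          # stop when adding a feature no longer helps
        break
    best_err = errs[j_best]
    selected.append(j_best)
    remaining.remove(j_best)

print(selected, best_err)
```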

48 Floating Search
Forward and backward search are "greedy" algorithms: they select the best option at each single step and do not always achieve the optimal set. Floating search uses two types of steps, add k and remove l, at the cost of more computation.

49 Feature Extraction
Face recognition problem: the training data are pairs of image + label (name); the classifier input is an image and its output is a label (name). An image is a matrix of 256×256 = 65,536 values. Each pixel bears little information on its own, so we can't just select the best pixels, but the average of the pixels around specific positions may, for example, give an indication of eye color.

50 Projection

Find a projection matrix W from d-dimensional to k-dimensional vectors that keeps the error low.

51 PCA: Motivation

Assume that the d observables can be summarized by k < d linear combinations, each new variable being of the form $z_i = w_{i1} x_1 + \cdots + w_{id} x_d$. We would like to work with this basis since it has lower dimension yet retains (almost) all of the required information. What do we expect from such a basis? Its components should be uncorrelated (otherwise they could be reduced further) and should have large variance (otherwise they bear no information).

52 PCA: Motivation (figure)

53 PCA: Motivation

Choose directions such that the total variance of the data is maximal (maximize total variance), and choose directions that are orthogonal (minimize correlation). In other words, choose k < d orthogonal directions that maximize the total variance.

54 PCA
Choosing the first direction: maximize the variance $\mathrm{Var}(z_1) = \mathbf{w}_1^T \Sigma \mathbf{w}_1$ subject to the constraint $\|\mathbf{w}_1\| = 1$ using a Lagrange multiplier. Taking derivatives gives $\Sigma \mathbf{w}_1 = \alpha \mathbf{w}_1$, i.e., $\mathbf{w}_1$ is an eigenvector of $\Sigma$. Since we want to maximize the variance, we choose the eigenvector with the largest eigenvalue.

55 PCA
In the d-dimensional feature space, the d×d symmetric covariance matrix is estimated from the samples. Select the k largest eigenvalues of the covariance matrix and the associated k eigenvectors. The first eigenvector is the direction with the largest variance.

56 What PCA does
$\mathbf{z} = W^T(\mathbf{x} - \mathbf{m})$, where the columns of W are the eigenvectors of $\Sigma$ and $\mathbf{m}$ is the sample mean. This centers the data at the origin and rotates the axes.
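A sketch of PCA via the eigendecomposition of the sample covariance (numpy only; the data and the choice k = 2 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5],
                            cov=[[3.0, 1.2, 0.4],
                                 [1.2, 2.0, 0.3],
                                 [0.4, 0.3, 0.5]],
                            size=500)          # (N, d) data matrix

m = X.mean(axis=0)                             # sample mean
Xc = X - m
S = (Xc.T @ Xc) / len(X)                       # d x d sample covariance

eigvals, eigvecs = np.linalg.eigh(S)           # eigh: S is symmetric
order = np.argsort(eigvals)[::-1]              # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
W = eigvecs[:, :k]                             # columns: top-k eigenvectors of S
Z = Xc @ W                                     # z = W^T (x - m) for every instance

print(eigvals)
print(np.cov(Z.T))                             # approximately diagonal: components are uncorrelated
```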

57 How to choose k?
Proportion of variance (PoV) explained, with the $\lambda_i$ sorted in descending order:

$$\mathrm{PoV}(k) = \frac{\lambda_1 + \lambda_2 + \cdots + \lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_k + \cdots + \lambda_d}$$

Typically, stop at PoV > 0.9. A scree graph plots PoV vs. k; stop at the "elbow".
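A small sketch of the PoV rule on sorted eigenvalues (the 0.9 threshold follows the slide; the eigenvalue values are made up):

```python
import numpy as np

eigvals = np.array([4.2, 2.1, 0.9, 0.4, 0.25, 0.15])   # illustrative, sorted in descending order

pov = np.cumsum(eigvals) / eigvals.sum()                # PoV for k = 1, 2, ..., d
k = int(np.argmax(pov > 0.9)) + 1                       # smallest k whose PoV exceeds 0.9

print(pov)
print(k)                                                # number of components to keep
```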

58 (figure)

59 PCA
PCA is unsupervised: it does not take class information into account. Class information can be incorporated: in the Karhunen–Loève expansion, the covariance is estimated per class and averaged with the priors as weights; in common principal components, all classes are assumed to have the same eigenvectors (directions) but different variances.

60 PCA
PCA does not try to explain noise: large noise can become a new dimension, or even the largest principal component. We are interested in the resulting uncorrelated variables that explain a large portion of the total sample variance; sometimes we are instead interested in explaining the shared variance (common factors) that affects the data.

