Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN © The MIT Press, 2004
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml
CHAPTER 5: Multivariate Methods
Multivariate Data
Multiple measurements (sensors).
d inputs/features/attributes: d-variate data.
N instances/observations/examples.
Multivariate Parameters
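In the textbook's notation, the multivariate parameters referred to here are the mean vector, the covariance matrix, and the correlation matrix:

```latex
\[
\boldsymbol{\mu} \equiv E[\boldsymbol{x}] = [\mu_1, \ldots, \mu_d]^T
\]
\[
\sigma_{ij} \equiv \operatorname{Cov}(x_i, x_j) = E[(x_i - \mu_i)(x_j - \mu_j)],
\qquad
\boldsymbol{\Sigma} \equiv \operatorname{Cov}(\boldsymbol{x}) = E\!\left[(\boldsymbol{x}-\boldsymbol{\mu})(\boldsymbol{x}-\boldsymbol{\mu})^T\right]
\]
\[
\rho_{ij} \equiv \operatorname{Corr}(x_i, x_j) = \frac{\sigma_{ij}}{\sigma_i \sigma_j}
\]
```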
Parameter Estimation
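A minimal NumPy sketch of the standard sample estimators discussed here (sample mean m, sample covariance S, sample correlation R); variable names are illustrative:

```python
import numpy as np

def estimate_parameters(X):
    """Sample mean, covariance, and correlation for an N x d data matrix X."""
    N = X.shape[0]
    m = X.mean(axis=0)              # sample mean vector, m_j = sum_t x_j^t / N
    D = X - m                       # centered data
    S = D.T @ D / N                 # sample covariance (MLE form, divides by N)
    s = np.sqrt(np.diag(S))         # per-attribute standard deviations
    R = S / np.outer(s, s)          # sample correlation, r_ij = s_ij / (s_i s_j)
    return m, S, R
```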
Estimation of Missing Values
What to do if certain instances have missing attributes?
Ignore those instances: not a good idea if the sample is small.
Use 'missing' as an attribute value: it may convey information.
Imputation: fill in the missing value.
Mean imputation: substitute the most likely value (e.g., the mean).
Imputation by regression: predict the missing value from the other attributes.
(A sketch of both imputation strategies follows below.)
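A minimal sketch of the two imputation strategies above, assuming NumPy arrays with NaN marking missing entries; the function names are illustrative, not from the slides:

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of its column, computed from observed values."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    idx = np.where(np.isnan(X))
    X[idx] = col_means[idx[1]]
    return X

def regression_impute(X, target_col):
    """Predict missing values in one column from the other columns by least squares,
    fitting on the rows where the target is observed (assumes the other attributes
    are observed in those rows)."""
    X = X.copy()
    others = [j for j in range(X.shape[1]) if j != target_col]
    observed = ~np.isnan(X[:, target_col])
    A = np.c_[np.ones(observed.sum()), X[observed][:, others]]     # add intercept
    w, *_ = np.linalg.lstsq(A, X[observed, target_col], rcond=None)
    missing = ~observed
    A_miss = np.c_[np.ones(missing.sum()), X[missing][:, others]]
    X[missing, target_col] = A_miss @ w
    return X
```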
Multivariate Normal Distribution
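The standard d-variate Gaussian density referred to here is:

```latex
\[
\boldsymbol{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}):\qquad
p(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left[-\tfrac{1}{2}\,(\boldsymbol{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{x}-\boldsymbol{\mu})\right]
\]
```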
Multivariate Normal Distribution
Mahalanobis distance: (x − μ)^T Σ^{−1} (x − μ) measures the distance from x to μ in terms of Σ (it normalizes for differences in variances and for correlations).
Bivariate case: d = 2.
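A minimal NumPy sketch of the squared Mahalanobis distance defined above; names are illustrative:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return float(d @ np.linalg.solve(Sigma, d))   # solve avoids forming an explicit inverse

# Example: with Sigma = I this reduces to the squared Euclidean distance.
x = np.array([1.0, 2.0])
mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
print(mahalanobis_sq(x, mu, Sigma))   # 5.0
```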
Bivariate Normal
Independent Inputs: Naive Bayes
If the x_i are independent, the off-diagonal elements of Σ are 0, and the Mahalanobis distance reduces to a Euclidean distance weighted by 1/σ_i (see the factored form below).
If the variances are also equal, it reduces further to the ordinary Euclidean distance.
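Written out: with a diagonal Σ the density factorizes, and the quadratic form becomes a sum of squared standardized distances:

```latex
\[
p(\boldsymbol{x}) = \prod_{i=1}^{d} p_i(x_i)
= \frac{1}{(2\pi)^{d/2} \prod_{i} \sigma_i}
\exp\!\left[-\frac{1}{2} \sum_{i=1}^{d} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^{2}\right]
\]
```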
Parametric Classification
If p(x | C_i) ~ N(μ_i, Σ_i), the discriminant functions follow from the log posterior, as given below.
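In the standard form, the discriminants are the log posteriors up to a term common to all classes:

```latex
\[
g_i(\boldsymbol{x}) = \log p(\boldsymbol{x} \mid C_i) + \log P(C_i)
= -\frac{d}{2}\log 2\pi - \frac{1}{2}\log |\boldsymbol{\Sigma}_i|
- \frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\boldsymbol{x}-\boldsymbol{\mu}_i)
+ \log P(C_i)
\]
```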
Estimation of Parameters
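A minimal NumPy sketch of the standard maximum-likelihood estimates used here (class priors, class means m_i, class covariances S_i); names are illustrative:

```python
import numpy as np

def estimate_class_parameters(X, r):
    """X: N x d data matrix; r: length-N array of class labels 0..K-1.
    Returns priors P(C_i), class means m_i, and class covariances S_i."""
    K = int(r.max()) + 1
    priors, means, covs = [], [], []
    for i in range(K):
        Xi = X[r == i]
        mi = Xi.mean(axis=0)
        Di = Xi - mi
        priors.append(len(Xi) / len(X))     # \hat{P}(C_i)
        means.append(mi)                    # m_i
        covs.append(Di.T @ Di / len(Xi))    # S_i (MLE, divides by N_i)
    return np.array(priors), np.array(means), np.array(covs)
```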
Different S_i
When each class has its own covariance matrix S_i, the discriminant is quadratic in x (a quadratic discriminant); see the expanded form below.
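Expanded, the quadratic discriminant takes the standard form:

```latex
\[
g_i(\boldsymbol{x}) = \boldsymbol{x}^T \mathbf{W}_i \boldsymbol{x} + \boldsymbol{w}_i^T \boldsymbol{x} + w_{i0}
\]
\[
\mathbf{W}_i = -\tfrac{1}{2}\mathbf{S}_i^{-1},\qquad
\boldsymbol{w}_i = \mathbf{S}_i^{-1}\boldsymbol{m}_i,\qquad
w_{i0} = -\tfrac{1}{2}\boldsymbol{m}_i^T \mathbf{S}_i^{-1}\boldsymbol{m}_i
- \tfrac{1}{2}\log|\mathbf{S}_i| + \log \hat{P}(C_i)
\]
```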
[Figure: class likelihoods, the posterior for C_1, and the discriminant where P(C_1 | x) = 0.5]
Common Covariance Matrix S
With a shared common sample covariance S, the discriminant reduces to a linear discriminant, as shown below.
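With a shared S, dropping the quadratic term common to all classes gives the standard linear form:

```latex
\[
g_i(\boldsymbol{x}) = -\tfrac{1}{2}(\boldsymbol{x}-\boldsymbol{m}_i)^T \mathbf{S}^{-1} (\boldsymbol{x}-\boldsymbol{m}_i) + \log \hat{P}(C_i)
= \boldsymbol{w}_i^T \boldsymbol{x} + w_{i0}
\]
\[
\boldsymbol{w}_i = \mathbf{S}^{-1}\boldsymbol{m}_i,\qquad
w_{i0} = -\tfrac{1}{2}\boldsymbol{m}_i^T \mathbf{S}^{-1}\boldsymbol{m}_i + \log \hat{P}(C_i)
\]
```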
Common Covariance Matrix S
Diagonal S
When the x_j, j = 1, ..., d, are independent, Σ is diagonal:
p(x | C_i) = ∏_j p(x_j | C_i)  (the Naive Bayes' assumption)
Classify based on the weighted Euclidean distance (in s_j units) to the nearest mean.
Diagonal S (variances may be different)
Diagonal S, equal variances
Nearest mean classifier: classify based on the Euclidean distance to the nearest mean (a sketch follows below).
Each mean can be considered a prototype or template, so this is template matching.
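A minimal NumPy sketch of the nearest mean (template matching) rule; names and example values are illustrative:

```python
import numpy as np

def nearest_mean_classify(x, means):
    """Assign x to the class whose mean (prototype/template) is closest in Euclidean distance."""
    dists = np.linalg.norm(means - x, axis=1)   # distance to each class mean
    return int(np.argmin(dists))

# Example with two hypothetical class prototypes in 2-D:
means = np.array([[0.0, 0.0],
                  [3.0, 3.0]])
print(nearest_mean_classify(np.array([2.5, 2.0]), means))   # -> 1
```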
Diagonal S, equal variances
Model Selection
As we increase complexity (a less restricted S), bias decreases and variance increases.
Assume simple models (allow some bias) to control variance (regularization).

Assumption                    Covariance matrix         No. of parameters
Shared, hyperspheric          S_i = S = s^2 I           1
Shared, axis-aligned          S_i = S, with s_ij = 0    d
Shared, hyperellipsoidal      S_i = S                   d(d+1)/2
Different, hyperellipsoidal   S_i                       K d(d+1)/2
Discrete Features
Binary features: if the x_j are independent (Naive Bayes'), the discriminant is linear.
Estimated parameters: the standard forms are given below.
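For binary x_j with p_ij ≡ p(x_j = 1 | C_i) and class indicators r_i^t (r_i^t = 1 if x^t belongs to C_i), the standard linear discriminant and maximum-likelihood estimates are:

```latex
\[
g_i(\boldsymbol{x}) = \sum_j \left[ x_j \log p_{ij} + (1 - x_j)\log (1 - p_{ij}) \right] + \log P(C_i),
\qquad
\hat{p}_{ij} = \frac{\sum_t x_j^t \, r_i^t}{\sum_t r_i^t}
\]
```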
Discrete Features
Multinomial (1-of-n_j) features: x_j ∈ {v_1, v_2, ..., v_{n_j}}.
If the x_j are independent, the discriminant again takes the linear form given below.
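With indicator variables z_jk = 1 if x_j = v_k and p_ijk ≡ p(x_j = v_k | C_i), the standard forms are:

```latex
\[
g_i(\boldsymbol{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i),
\qquad
\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t \, r_i^t}{\sum_t r_i^t}
\]
```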
Multivariate Regression
Multivariate linear model (a sketch follows below).
Multivariate polynomial model: define new higher-order variables z_1 = x_1, z_2 = x_2, z_3 = x_1^2, z_4 = x_2^2, z_5 = x_1 x_2, and use the linear model in this new z space (basis functions, kernel trick, SVM: Chapter 10).
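A minimal NumPy sketch of the idea above: expand two inputs into the quadratic basis z = (x_1, x_2, x_1^2, x_2^2, x_1·x_2) and fit ordinary least squares in this z space; function and variable names are illustrative:

```python
import numpy as np

def quadratic_basis(X):
    """Map [x1, x2] to z = [x1, x2, x1^2, x2^2, x1*x2] for each row of X (N x 2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

def fit_linear(Z, r):
    """Least-squares fit of r ~ w0 + w^T z (linear model in the expanded z space)."""
    A = np.c_[np.ones(len(Z)), Z]                 # prepend an intercept column
    w, *_ = np.linalg.lstsq(A, r, rcond=None)
    return w

# Example: a quadratic function of the inputs is recovered by the linear model in z space.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
r = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]
w = fit_linear(quadratic_basis(X), r)
print(np.round(w, 2))   # approx [1.0, 2.0, 0.0, 0.0, -1.0, 0.5]
```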