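CH 5: Multivariate Methods

5.1 Multivariate Data
Data vector: $\mathbf{x} = [x_1, x_2, \ldots, x_d]^T$ is d-variate data, where $x_1, \ldots, x_d$ are the features (attributes).
A sample of $N$ instances may be represented as an $N \times d$ data matrix, with one instance per row and one feature per column:
$$\mathbf{X} = \begin{bmatrix} x_1^1 & x_2^1 & \cdots & x_d^1 \\ x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & \vdots & \ddots & \vdots \\ x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix}$$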
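5.2 Parameter Estimation
Mean vector: $\boldsymbol{\mu} = E[\mathbf{x}] = [\mu_1, \ldots, \mu_d]^T$
Covariance matrix: $\boldsymbol{\Sigma} = \mathrm{Cov}(\mathbf{x}) = E[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T$, where
$$\sigma_{ij} = \mathrm{Cov}(x_i, x_j) = E[(x_i-\mu_i)(x_j-\mu_j)] = E[x_i x_j] - \mu_i\mu_j, \qquad \sigma_{ii} = \sigma_i^2$$
Correlation: $\mathrm{Corr}(x_i, x_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i\sigma_j}$, where $\sigma_i$ is the standard deviation of $x_i$.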
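If two random variables $x_i$ and $x_j$ are independent, then $\sigma_{ij} = \mathrm{Cov}(x_i, x_j) = 0$, and hence $\rho_{ij} = 0$.
Parameter Estimation: Given a sample $\mathcal{X} = \{\mathbf{x}^t\}_{t=1}^N$,
Sample mean: $m_i = \dfrac{\sum_{t=1}^N x_i^t}{N}, \quad i = 1, \ldots, d$
Sample covariance: $s_{ij} = \dfrac{\sum_{t=1}^N (x_i^t - m_i)(x_j^t - m_j)}{N}$
Sample correlation: $r_{ij} = \dfrac{s_{ij}}{s_i s_j}$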
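A minimal pure-Python sketch of these estimators, assuming the sample is a list of N rows with d numbers each and that no feature has zero variance; `sample_stats` is a hypothetical helper name. As in the formulas above, it divides by N rather than N − 1.

```python
from math import sqrt


def sample_stats(X):
    """Return the sample mean m, covariance S (divisor N) and correlation R of X.

    X is a list of N instances, each a list of d feature values.
    """
    N, d = len(X), len(X[0])
    m = [sum(x[i] for x in X) / N for i in range(d)]
    S = [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in X) / N for j in range(d)]
         for i in range(d)]
    # r_ij = s_ij / (s_i s_j), where s_i = sqrt(s_ii)
    R = [[S[i][j] / sqrt(S[i][i] * S[j][j]) for j in range(d)] for i in range(d)]
    return m, S, R


# x2 = 2 * x1 exactly, so the sample correlation is 1
m, S, R = sample_stats([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(m)  # [2.0, 4.0]
```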
5.3 Estimation of Missing Values
Certain instances have missing attribute values.
Ignore those instances: not a good idea if the sample is small.
Imputation: fill in the missing values.
Mean imputation: substitute the mean of the available data of the missing attribute.
Imputation by regression: predict the missing value from the other attributes (by regression or classification methods).
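Mean imputation can be sketched in a few lines. This assumes missing entries are encoded as `None` and that every attribute has at least one observed value; `mean_impute` is a hypothetical name.

```python
def mean_impute(X):
    """Replace None entries with the mean of the observed values of that attribute.

    Assumes every column has at least one observed (non-None) value.
    """
    d = len(X[0])
    means = []
    for j in range(d):
        observed = [x[j] for x in X if x[j] is not None]
        means.append(sum(observed) / len(observed))
    # Build a new matrix; the input rows are left unchanged.
    return [[means[j] if x[j] is None else x[j] for j in range(d)] for x in X]


X = [[1.0, None], [3.0, 4.0], [None, 8.0]]
print(mean_impute(X))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```

Imputation by regression would replace `means[j]` with a prediction from the other attributes of the same instance.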
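5.4 Multivariate Normal Distribution
$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$:
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$$
Mahalanobis distance: $(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\boldsymbol{\Sigma}$, which normalizes the variances of the different dimensions.
e.g., for $d = 1$, $(x-\mu)\,\sigma^{-2}(x-\mu) = \left(\dfrac{x-\mu}{\sigma}\right)^2$ is the squared distance from $x$ to $\mu$ in standard deviation units.
$(\mathbf{x}-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) = c^2$ is a hyperellipsoid centered at $\boldsymbol{\mu}$. Its shape and orientation are governed by $\boldsymbol{\Sigma}$, whose inverse normalizes all variables to unit variance.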
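For $d = 2$ the Mahalanobis distance and the density can be computed with an explicit 2 × 2 inverse. This sketch (helper names are hypothetical; restricting to $d = 2$ is only to keep it short) shows two points at different Euclidean distances from $\boldsymbol{\mu}$ that lie on the same ellipse $c^2 = 1$.

```python
from math import exp, pi, sqrt


def inv2(S):
    """Inverse and determinant of a 2x2 matrix [[a, b], [c, e]]."""
    (a, b), (c, e) = S
    det = a * e - b * c
    return [[e / det, -b / det], [-c / det, a / det]], det


def mahalanobis_sq(x, mu, S):
    """(x - mu)^T S^{-1} (x - mu) for d = 2."""
    Sinv, _ = inv2(S)
    u = [x[0] - mu[0], x[1] - mu[1]]
    return sum(u[i] * Sinv[i][j] * u[j] for i in range(2) for j in range(2))


def normal_pdf2(x, mu, S):
    """Bivariate normal density: exp(-D/2) / ((2 pi)^(d/2) |S|^(1/2)) with d = 2."""
    _, det = inv2(S)
    return exp(-0.5 * mahalanobis_sq(x, mu, S)) / (2 * pi * sqrt(det))


mu = [0.0, 0.0]
S = [[4.0, 0.0], [0.0, 1.0]]  # x1 has standard deviation 2, x2 has standard deviation 1
# Euclidean distances to mu are 2 and 1, but both points are one standard deviation away
print(mahalanobis_sq([2.0, 0.0], mu, S), mahalanobis_sq([0.0, 1.0], mu, S))  # 1.0 1.0
```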
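e.g., for $d = 2$ (bivariate normal):
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$
$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right], \qquad z_i = \frac{x_i-\mu_i}{\sigma_i}$$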
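As a numerical check, the closed-form bivariate density can be compared with the general formula using $|\boldsymbol{\Sigma}| = \sigma_1^2\sigma_2^2(1-\rho^2)$ and the explicit 2 × 2 inverse. Both function names are hypothetical.

```python
from math import exp, pi, sqrt


def bivariate_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Closed form using z_i = (x_i - mu_i) / sigma_i."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)
    return exp(-0.5 * q) / (2 * pi * s1 * s2 * sqrt(1 - rho ** 2))


def matrix_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """General formula with Sigma = [[s1^2, rho s1 s2], [rho s1 s2, s2^2]]."""
    s11, s12, s22 = s1 ** 2, rho * s1 * s2, s2 ** 2
    det = s11 * s22 - s12 ** 2  # |Sigma| = s1^2 s2^2 (1 - rho^2)
    u1, u2 = x1 - mu1, x2 - mu2
    # (x - mu)^T Sigma^{-1} (x - mu) via the explicit 2x2 inverse
    q = (s22 * u1 * u1 - 2 * s12 * u1 * u2 + s11 * u2 * u2) / det
    return exp(-0.5 * q) / (2 * pi * sqrt(det))


args = (1.0, -0.5, 0.0, 0.0, 2.0, 1.0, 0.6)
print(abs(bivariate_pdf(*args) - matrix_pdf(*args)) < 1e-12)  # True
```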
The standardized variables $z_i = (x_i - \mu_i)/\sigma_i$ are unit normal, i.e., they have zero mean and unit variance; this is called z-normalization.
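In practice $\mu_i$ and $\sigma_i$ are replaced by the sample estimates $m_i$ and $s_i$. A minimal sketch for one feature column, assuming a nonzero standard deviation; `z_normalize` is a hypothetical name.

```python
from math import sqrt


def z_normalize(column):
    """Map each value v to (v - m) / s, with sample mean m and standard deviation s (divisor N)."""
    N = len(column)
    m = sum(column) / N
    s = sqrt(sum((v - m) ** 2 for v in column) / N)
    return [(v - m) / s for v in column]


z = z_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # m = 5, s = 2
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```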
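Let $\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{w} \in \mathbb{R}^d$. Then
$$\mathbf{w}^T\mathbf{x} = w_1x_1 + w_2x_2 + \cdots + w_dx_d \sim \mathcal{N}(\mathbf{w}^T\boldsymbol{\mu}, \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}),$$
i.e., the projection of a d-D normal on a vector $\mathbf{w}$ is univariate normal. Let $\mathbf{W}$ be a $d \times k$ matrix.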
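Then $\mathbf{W}^T\mathbf{x} \sim \mathcal{N}_k(\mathbf{W}^T\boldsymbol{\mu}, \mathbf{W}^T\boldsymbol{\Sigma}\mathbf{W})$, i.e., a d-D normal projected to a k-D space yields a k-D normal.

5.5 Multivariate Classification
Define the discriminant function for class $C_i$ as $g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i)$ and assume $p(\mathbf{x} \mid C_i) \sim \mathcal{N}_d(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$. Then
$$g_i(\mathbf{x}) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) + \log P(C_i) \qquad \text{(A)}$$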
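Estimation of Parameters
Given a sample $\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^N$, where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and $0$ otherwise:
$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad \mathbf{m}_i = \frac{\sum_t r_i^t\,\mathbf{x}^t}{\sum_t r_i^t}, \qquad \mathbf{S}_i = \frac{\sum_t r_i^t(\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$
Substitute these into (A) and ignore the constant $-\frac{d}{2}\log 2\pi$, which is common to all classes:
$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T\mathbf{S}_i^{-1}(\mathbf{x}-\mathbf{m}_i) + \log\hat{P}(C_i) \qquad \text{(B)}$$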
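The per-class estimates can be sketched as follows, assuming class labels are encoded as integers 0..K−1 (so $r_i^t = 1$ exactly when the label of $\mathbf{x}^t$ is $i$) and that every class occurs in the sample. `estimate_class_params` is a hypothetical name.

```python
def estimate_class_params(X, labels, K):
    """Estimate P(C_i), m_i and S_i (divisor N_i) for classes 0..K-1."""
    N, d = len(X), len(X[0])
    priors, means, covs = [], [], []
    for i in range(K):
        Xi = [x for x, y in zip(X, labels) if y == i]  # instances with r_i^t = 1
        Ni = len(Xi)                                   # sum_t r_i^t
        m = [sum(x[j] for x in Xi) / Ni for j in range(d)]
        S = [[sum((x[a] - m[a]) * (x[b] - m[b]) for x in Xi) / Ni for b in range(d)]
             for a in range(d)]
        priors.append(Ni / N)
        means.append(m)
        covs.append(S)
    return priors, means, covs


X = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [5.0, 5.0], [7.0, 5.0]]
labels = [0, 0, 0, 0, 1, 1]
priors, means, covs = estimate_class_params(X, labels, K=2)
print(means)  # [[1.0, 1.0], [6.0, 5.0]]
```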
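i) Quadratic discriminant: expanding (B),
$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}\left(\mathbf{x}^T\mathbf{S}_i^{-1}\mathbf{x} - 2\mathbf{x}^T\mathbf{S}_i^{-1}\mathbf{m}_i + \mathbf{m}_i^T\mathbf{S}_i^{-1}\mathbf{m}_i\right) + \log\hat{P}(C_i) = \mathbf{x}^T\mathbf{W}_i\mathbf{x} + \mathbf{w}_i^T\mathbf{x} + w_{i0}$$
where $\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}$, $\mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i$, and $w_{i0} = -\frac{1}{2}\mathbf{m}_i^T\mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log\hat{P}(C_i)$.
The number of parameters to be estimated is $K \cdot d$ for the means and $K \cdot d(d+1)/2$ for the covariance matrices, where $K$ is the number of classes.
Share a common sample covariance $\mathbf{S} = \sum_i \hat{P}(C_i)\mathbf{S}_i$ and ignore $\log|\mathbf{S}|$, which is then the same for all classes. (B) reduces to
$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T\mathbf{S}^{-1}(\mathbf{x}-\mathbf{m}_i) + \log\hat{P}(C_i) \qquad \text{(C)}$$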
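A sketch of the quadratic discriminant for $d = 2$ (an assumption that keeps the matrix algebra explicit). It evaluates $g_i$ in the expanded form $\mathbf{x}^T\mathbf{W}_i\mathbf{x} + \mathbf{w}_i^T\mathbf{x} + w_{i0}$, keeps a direct evaluation of (B) for comparison, and picks the class with the largest score. All names and the two example classes are hypothetical.

```python
from math import log


def inv_det2(S):
    """Inverse and determinant of a 2x2 matrix."""
    (a, b), (c, e) = S
    det = a * e - b * c
    return [[e / det, -b / det], [-c / det, a / det]], det


def quad_form(u, A, v):
    """u^T A v for 2-vectors."""
    return sum(u[i] * A[i][j] * v[j] for i in range(2) for j in range(2))


def g_quadratic(x, m, S, prior):
    """x^T W x + w^T x + w0 with W = -S^{-1}/2, w = S^{-1} m."""
    Sinv, det = inv_det2(S)
    W = [[-0.5 * Sinv[i][j] for j in range(2)] for i in range(2)]
    w = [Sinv[i][0] * m[0] + Sinv[i][1] * m[1] for i in range(2)]
    w0 = -0.5 * quad_form(m, Sinv, m) - 0.5 * log(det) + log(prior)
    return quad_form(x, W, x) + w[0] * x[0] + w[1] * x[1] + w0


def g_B(x, m, S, prior):
    """(B) evaluated directly, for comparison."""
    Sinv, det = inv_det2(S)
    u = [x[0] - m[0], x[1] - m[1]]
    return -0.5 * log(det) - 0.5 * quad_form(u, Sinv, u) + log(prior)


def predict(x, classes):
    """Index of the largest g_i(x); classes holds (m_i, S_i, P(C_i)) triples."""
    scores = [g_quadratic(x, m, S, p) for m, S, p in classes]
    return max(range(len(scores)), key=scores.__getitem__)


# Class 1 is much wider than class 0, so the boundary between them is curved
classes = [([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
           ([3.0, 0.0], [[4.0, 0.0], [0.0, 4.0]], 0.5)]
print([predict(x, classes) for x in ([1.0, 0.0], [5.0, 0.0], [0.0, 5.0])])  # [0, 1, 1]
```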
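The number of parameters: $K \cdot d$ for the means and $d(d+1)/2$ for the shared covariance matrix. Ignoring the quadratic term $\mathbf{x}^T\mathbf{S}^{-1}\mathbf{x}$, which is common to all classes, (C) reduces to
ii) Linear discriminant:
$$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0} \qquad \text{(D)}$$
where $\mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i$ and $w_{i0} = -\frac{1}{2}\mathbf{m}_i^T\mathbf{S}^{-1}\mathbf{m}_i + \log\hat{P}(C_i)$.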
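A sketch of the linear discriminant for $d = 2$: pool the class covariances as $\mathbf{S} = \sum_i \hat{P}(C_i)\mathbf{S}_i$, then compute $\mathbf{w}_i$ and $w_{i0}$. The helper names and the example (equal priors, $\mathbf{S}_i = \mathbf{I}$) are hypothetical.

```python
from math import log


def inv2(S):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, e) = S
    det = a * e - b * c
    return [[e / det, -b / det], [-c / det, a / det]]


def pooled_cov(priors, covs):
    """Shared covariance S = sum_i P(C_i) S_i."""
    return [[sum(p * S[a][b] for p, S in zip(priors, covs)) for b in range(2)]
            for a in range(2)]


def linear_discriminants(priors, means, covs):
    """(w_i, w_i0) per class: w_i = S^{-1} m_i, w_i0 = -m_i^T S^{-1} m_i / 2 + log P(C_i)."""
    Sinv = inv2(pooled_cov(priors, covs))
    params = []
    for p, m in zip(priors, means):
        w = [Sinv[a][0] * m[0] + Sinv[a][1] * m[1] for a in range(2)]
        w0 = -0.5 * (m[0] * w[0] + m[1] * w[1]) + log(p)  # m^T S^{-1} m = m^T w
        params.append((w, w0))
    return params


def g_linear(x, w, w0):
    return w[0] * x[0] + w[1] * x[1] + w0


I = [[1.0, 0.0], [0.0, 1.0]]
params = linear_discriminants([0.5, 0.5], [[0.0, 0.0], [2.0, 2.0]], [I, I])
# With equal priors and S = I, the midpoint between the means gets equal scores
print([g_linear([1.0, 1.0], w, w0) for w, w0 in params])
```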
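Assuming the off-diagonals of $\mathbf{S}$ to be 0, i.e., $\mathbf{S} = \mathrm{diag}(s_1^2, \ldots, s_d^2)$, substitute into (C):
$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^d \left(\frac{x_j - m_{ij}}{s_j}\right)^2 + \log\hat{P}(C_i) \qquad \text{(E)}$$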
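iii) Naive Bayes’ classifier: (E) treats the features as independent given the class. The number of parameters to be estimated is $K \cdot d$ for the means and $d$ for the variances.
Assuming all variances to be equal, $s_j^2 = s^2$, plug into (E):
$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x}-\mathbf{m}_i\|^2}{2s^2} + \log\hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^d (x_j - m_{ij})^2 + \log\hat{P}(C_i) \qquad \text{(F)}$$
The number of parameters to be estimated is $K \cdot d$ for the means and 1 for $s^2$.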
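A sketch comparing (E) and (F) on one point, assuming the variances are already estimated and shared by all classes. The example is hypothetical: feature 2 is much noisier than feature 1, so (E) discounts the difference along feature 2 while (F) does not, and the two rules pick different classes.

```python
from math import log


def g_diagonal(x, m, s2, prior):
    """(E): -1/2 sum_j ((x_j - m_ij) / s_j)^2 + log P(C_i); s2[j] = s_j^2 is shared by all classes."""
    return -0.5 * sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, m, s2)) + log(prior)


def g_equal_var(x, m, s2, prior):
    """(F): -||x - m_i||^2 / (2 s^2) + log P(C_i) with one variance s2 for every feature."""
    return -sum((xj - mj) ** 2 for xj, mj in zip(x, m)) / (2 * s2) + log(prior)


x, means = [0.0, 3.0], [[0.0, 0.0], [2.0, 2.0]]
# Variances 0.25 (feature 1) and 9 (feature 2): class 0 scores higher under (E)
print([g_diagonal(x, m, [0.25, 9.0], 0.5) for m in means])
# One common variance: class 1 (smaller Euclidean distance) scores higher under (F)
print([g_equal_var(x, m, 1.0, 0.5) for m in means])
```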
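Assuming equal priors and ignoring the common factor $1/(2s^2)$, (F) becomes
iv) Nearest mean classifier:
$$g_i(\mathbf{x}) = -\|\mathbf{x}-\mathbf{m}_i\|^2$$
i.e., $\mathbf{x}$ is assigned to the class with the nearest mean. Expanding, $-\|\mathbf{x}-\mathbf{m}_i\|^2 = -\mathbf{x}^T\mathbf{x} + 2\mathbf{m}_i^T\mathbf{x} - \mathbf{m}_i^T\mathbf{m}_i$; ignoring the common term $\mathbf{x}^T\mathbf{x}$ and halving,
$$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \mathbf{m}_i, \quad w_{i0} = -\frac{1}{2}\|\mathbf{m}_i\|^2$$
Assuming equal $\|\mathbf{m}_i\|$ for all classes,
v) Inner product classifier:
$$g_i(\mathbf{x}) = \mathbf{m}_i^T\mathbf{x}$$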
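The nearest mean and inner product rules can be sketched side by side (function names are hypothetical). The second example shows why the inner product classifier needs the equal-$\|\mathbf{m}_i\|$ assumption: when the norms differ, it can disagree with the nearest mean.

```python
def nearest_mean(x, means):
    """iv) Assign x to the class whose mean is closest in Euclidean distance."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, m)) for m in means]
    return min(range(len(means)), key=d2.__getitem__)


def nearest_mean_linear(x, means):
    """The same rule through g_i(x) = m_i^T x - ||m_i||^2 / 2 (common x^T x dropped)."""
    g = [sum(a * b for a, b in zip(m, x)) - 0.5 * sum(b * b for b in m) for m in means]
    return max(range(len(means)), key=g.__getitem__)


def inner_product(x, means):
    """v) g_i(x) = m_i^T x; agrees with the rules above only when all ||m_i|| are equal."""
    g = [sum(a * b for a, b in zip(m, x)) for m in means]
    return max(range(len(means)), key=g.__getitem__)


unit_means = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # all ||m_i|| = 1
print(nearest_mean([0.8, 0.3], unit_means), inner_product([0.8, 0.3], unit_means))  # 0 0
unequal_means = [[1.0, 0.0], [10.0, 0.0]]           # ||m_i|| differ
print(nearest_mean([2.0, 0.0], unequal_means), inner_product([2.0, 0.0], unequal_means))  # 0 1
```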
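5.6 Tuning Complexity
As we increase complexity (e.g., going from the nearest mean classifier to the quadratic discriminant, which estimates more parameters), bias decreases (a better fit to the data) but variance increases (the fit varies more with the data): the bias-variance dilemma.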
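The growth in complexity can be made concrete by counting the mean and covariance parameters of the cases in 5.5 (priors are not counted, matching the counts given there). `n_params` and the labels are hypothetical.

```python
def n_params(K, d):
    """Mean and (co)variance parameters for the cases in 5.5 (priors not counted)."""
    means = K * d
    return {
        "quadratic: one S_i per class": means + K * d * (d + 1) // 2,
        "linear: shared S": means + d * (d + 1) // 2,
        "naive Bayes: shared diagonal S": means + d,
        "equal variances: s^2 I": means + 1,
    }


for name, count in n_params(K=2, d=10).items():
    print(f"{name:32s} {count}")
```

With $K = 2$ and $d = 10$ this gives 130, 75, 30 and 21 parameters, so the quadratic model needs much more data for reliable estimates.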
Different covariance matrices fitted to the same data lead to different class shapes and boundaries.
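5.7 Discrete Features
Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$, where the $x_j$ are binary. If the $x_j$'s are independent,
$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^d p_{ij}^{x_j}(1-p_{ij})^{1-x_j}$$
The discriminant function:
$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_{j=1}^d \left[x_j\log p_{ij} + (1-x_j)\log(1-p_{ij})\right] + \log P(C_i)$$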
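A sketch of the binary-feature discriminant. It assumes $p_{ij}$ is estimated as the fraction of class-$i$ instances with $x_j = 1$ and $P(C_i)$ as the class frequency, with labels 0..K−1, every class present, and no estimated $p_{ij}$ exactly 0 or 1 (no smoothing is applied). All names and data are hypothetical.

```python
from math import log


def fit_bernoulli_nb(X, labels, K):
    """p[i][j] = fraction of class-i instances with x_j = 1; priors[i] = N_i / N."""
    N, d = len(X), len(X[0])
    priors, p = [], []
    for i in range(K):
        Xi = [x for x, y in zip(X, labels) if y == i]
        priors.append(len(Xi) / N)
        p.append([sum(x[j] for x in Xi) / len(Xi) for j in range(d)])
    return priors, p


def g_binary(x, p_i, prior):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)."""
    return sum(xj * log(pj) + (1 - xj) * log(1 - pj) for xj, pj in zip(x, p_i)) + log(prior)


def predict_binary(x, priors, p):
    scores = [g_binary(x, p_i, P) for p_i, P in zip(p, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


X = [[1, 0], [1, 1], [0, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
priors, p = fit_bernoulli_nb(X, labels, K=2)
print(p)  # [[0.75, 0.25], [0.25, 0.75]]
```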
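Multinomial features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$. Define $z_{jk} = 1$ if $x_j = v_k$ and $0$ otherwise.
Let $p_{ijk}$ be the probability that $x_j$ takes value $v_k$ given class $C_i$, i.e., $p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$. If the $x_j$'s are independent,
$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^d \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$
The discriminant function:
$$g_i(\mathbf{x}) = \sum_{j=1}^d \sum_{k=1}^{n_j} z_{jk}\log p_{ijk} + \log P(C_i)$$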
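A sketch of the multinomial case, assuming the values $v_1, \ldots, v_{n_j}$ of feature $j$ are encoded as 0..n_j−1 and $p_{ijk}$ is estimated as a class-conditional frequency. Since exactly one $z_{jk}$ is 1 for each $j$, the inner sum reduces to $\log p_{ij x_j}$. Zero counts are not smoothed, so this assumes every observed value has nonzero probability in every class. Names and data are hypothetical.

```python
from math import log


def fit_multinomial_nb(X, labels, K, n_values):
    """p[i][j][k] = fraction of class-i instances with x_j = k; priors[i] = N_i / N."""
    N, d = len(X), len(X[0])
    priors, p = [], []
    for i in range(K):
        Xi = [x for x, y in zip(X, labels) if y == i]
        priors.append(len(Xi) / N)
        p.append([[sum(1 for x in Xi if x[j] == k) / len(Xi) for k in range(n_values[j])]
                  for j in range(d)])
    return priors, p


def g_multinomial(x, p_i, prior):
    """sum_j sum_k z_jk log p_ijk + log P(C_i); only k = x_j has z_jk = 1."""
    return sum(log(p_i[j][x[j]]) for j in range(len(x))) + log(prior)


def predict_multinomial(x, priors, p):
    scores = [g_multinomial(x, p_i, P) for p_i, P in zip(p, priors)]
    return max(range(len(scores)), key=scores.__getitem__)


# Feature 1 takes 3 values, feature 2 takes 2 values
X = [[0, 0], [0, 1], [1, 0], [2, 0], [2, 1], [2, 1], [1, 1], [0, 0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
priors, p = fit_multinomial_nb(X, labels, K=2, n_values=[3, 2])
print(predict_multinomial([2, 1], priors, p), predict_multinomial([0, 0], priors, p))  # 1 0
```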
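5.8 Multivariate Regression
Multivariate linear model: $r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \epsilon$, where
$$g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$
Assume $\epsilon \sim \mathcal{N}(0, \sigma^2)$. Then maximizing the likelihood is equivalent to minimizing the sum of squared errors
$$E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2}\sum_t \left(r^t - w_0 - w_1 x_1^t - w_2 x_2^t - \cdots - w_d x_d^t\right)^2$$
Taking derivatives with respect to $w_0, w_1, \ldots, w_d$, respectively, and setting them to 0:
$$\begin{aligned}
\sum_t r^t &= N w_0 + w_1 \sum_t x_1^t + w_2 \sum_t x_2^t + \cdots + w_d \sum_t x_d^t \\
\sum_t x_1^t r^t &= w_0 \sum_t x_1^t + w_1 \sum_t (x_1^t)^2 + w_2 \sum_t x_1^t x_2^t + \cdots + w_d \sum_t x_1^t x_d^t \\
&\;\;\vdots \\
\sum_t x_d^t r^t &= w_0 \sum_t x_d^t + w_1 \sum_t x_d^t x_1^t + w_2 \sum_t x_d^t x_2^t + \cdots + w_d \sum_t (x_d^t)^2
\end{aligned}$$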
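In vector-matrix form: $\mathbf{X}^T\mathbf{X}\mathbf{w} = \mathbf{X}^T\mathbf{r}$, where
$$\mathbf{X} = \begin{bmatrix} 1 & x_1^1 & x_2^1 & \cdots & x_d^1 \\ 1 & x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} r^1 \\ r^2 \\ \vdots \\ r^N \end{bmatrix}$$
Solution: $\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{r}$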
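A pure-Python sketch of the solution: build $\mathbf{X}$ with a leading column of 1s, form $\mathbf{X}^T\mathbf{X}$ and $\mathbf{X}^T\mathbf{r}$, and solve the normal equations by Gaussian elimination with partial pivoting (the elimination routine is a generic stand-in for the matrix inverse; names and data are hypothetical).

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def fit_linear(X, r):
    """Least squares via X^T X w = X^T r, with a leading 1 in each row for w_0."""
    D = [[1.0] + list(x) for x in X]
    p = len(D[0])
    XtX = [[sum(row[a] * row[b] for row in D) for b in range(p)] for a in range(p)]
    Xtr = [sum(row[a] * rt for row, rt in zip(D, r)) for a in range(p)]
    return solve(XtX, Xtr)


# Noise-free data from r = 1 + 2 x1 - 3 x2, so w should be close to [1, 2, -3]
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
r = [1.0, 3.0, -2.0, 0.0, 2.0]
w = fit_linear(X, r)
```

Solving the normal equations directly is fine for a small, well-conditioned example like this; for larger or ill-conditioned problems, a QR- or SVD-based least-squares routine is numerically safer.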
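Generalizing the linear model: a polynomial (nonlinear) model in $\mathbf{x}$ can be written as a multivariate linear model in new variables, e.g., $z_1 = x_1$, $z_2 = x_2$, $z_3 = x_1^2$, $z_4 = x_2^2$, $z_5 = x_1x_2$, and then fitted with the same linear-model solution.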
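A sketch of this idea, assuming two inputs and the hypothetical basis $[1, x_1, x_2, x_1^2, x_2^2, x_1x_2]$: each instance is mapped to $\mathbf{z}$, and the ordinary linear least-squares solution is applied in $\mathbf{z}$-space. The solver repeats the Gaussian elimination used for the linear model so the block stands alone.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def expand(x1, x2):
    """Hypothetical basis z = [1, x1, x2, x1^2, x2^2, x1*x2]; the 1 pairs with w_0."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def fit_in_z(points, r):
    """Linear least squares in z-space via the normal equations Z^T Z w = Z^T r."""
    Z = [expand(*pt) for pt in points]
    p = len(Z[0])
    ZtZ = [[sum(row[a] * row[b] for row in Z) for b in range(p)] for a in range(p)]
    Ztr = [sum(row[a] * rt for row, rt in zip(Z, r)) for a in range(p)]
    return solve(ZtZ, Ztr)


# Nonlinear in x: r = 1 + x1 - 2 x2^2 + 3 x1 x2, sampled on a 3 x 3 grid
grid = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
r = [1 + a - 2 * b * b + 3 * a * b for a, b in grid]
w = fit_in_z(grid, r)  # close to [1, 1, 0, 0, -2, 3]
```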