CH 5: Multivariate Methods


5.1 Multivariate Data

Data vector: $\mathbf{x} = [x_1, x_2, \ldots, x_d]^T$, a d-variate observation where the $x_j$, $j = 1, \ldots, d$, are the features/attributes. A sample of $N$ such instances may be represented as an $N \times d$ data matrix

$\mathbf{X} = \begin{bmatrix} x_1^1 & x_2^1 & \cdots & x_d^1 \\ x_1^2 & x_2^2 & \cdots & x_d^2 \\ \vdots & & & \vdots \\ x_1^N & x_2^N & \cdots & x_d^N \end{bmatrix}$

5.2 Parameter Estimation

Mean vector: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$

Covariance matrix: $\Sigma \equiv \mathrm{Cov}(\mathbf{x}) = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$, where $\sigma_{ij} \equiv \mathrm{Cov}(x_i, x_j)$ and the diagonal entries are $\sigma_i^2 \equiv \mathrm{Var}(x_i)$

Correlation: $\rho_{ij} \equiv \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$, where $\sigma_i$ is the standard deviation of $x_i$

If two random variables are independent, then $\sigma_{ij} = \rho_{ij} = 0$ (the converse does not hold in general).

Parameter estimation: given a sample $\mathcal{X} = \{\mathbf{x}^t\}_{t=1}^N$,

sample mean: $\mathbf{m} = \dfrac{1}{N}\sum_{t=1}^{N} \mathbf{x}^t$

sample covariance: $s_{ij} = \dfrac{1}{N}\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)$

sample correlation: $r_{ij} = \dfrac{s_{ij}}{s_i s_j}$
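
A minimal NumPy sketch of these estimators (variable names are illustrative; X stands for an N × d sample):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # N = 100 instances, d = 3 attributes

m = X.mean(axis=0)              # sample mean vector m
D = X - m                       # centered data
S = (D.T @ D) / len(X)          # sample covariance (ML estimate, divides by N)
s = np.sqrt(np.diag(S))         # sample standard deviations
R = S / np.outer(s, s)          # sample correlation matrix
```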

5.3 Estimation of Missing Values

Certain instances may have missing attributes. Ignoring those instances is not a good idea if the sample is small. Imputation: fill in the missing values.

Mean imputation: substitute the mean of the available data for the missing attribute.

Imputation by regression: predict the missing value from the other attributes (by regression, or by classification methods if the attribute is discrete).
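
The two imputation schemes might be sketched as follows, assuming missing entries are coded as NaN (the function names and the crude mean-fill of the predictors are my own choices, not the text's):

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN by the mean of the observed values in its column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def regression_impute(X, j):
    """Predict column j from the other columns by least squares,
    fitting only on the rows where column j is observed."""
    X = X.copy()
    miss = np.isnan(X[:, j])
    others = np.delete(mean_impute(X), j, axis=1)   # crude fill for predictors
    A = np.column_stack([np.ones(len(X)), others])  # design matrix with bias
    w, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
    X[miss, j] = A[miss] @ w
    return X
```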

5.4 Multivariate Normal Distribution

$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \Sigma)$:

$p(\mathbf{x}) = \dfrac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$

The Mahalanobis distance $(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})$ measures the distance from $\mathbf{x}$ to $\boldsymbol{\mu}$ in terms of $\Sigma$, which normalizes the variances of the different dimensions.

For $d = 1$, $p(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[-\dfrac{(x-\mu)^2}{2\sigma^2}\right]$, and $\left(\dfrac{x-\mu}{\sigma}\right)^2$ is the squared distance from $x$ to $\mu$ in standard-deviation units.

$(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu}) = c^2$ is the equation of a hyperellipsoid centered at $\boldsymbol{\mu}$. Its shape and orientation are governed by $\Sigma$, which normalizes all variables to unit variance.
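
A sketch evaluating the density and the Mahalanobis distance for given mu and Sigma (helper names are my own):

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))  # solve avoids an explicit inverse

def mvn_pdf(x, mu, Sigma):
    """Density of N_d(mu, Sigma) at x."""
    d = len(mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * mahalanobis_sq(x, mu, Sigma)) / norm
```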

For $d = 2$, the bivariate normal, with $z_j \equiv (x_j - \mu_j)/\sigma_j$ and correlation $\rho = \sigma_{12}/(\sigma_1\sigma_2)$:

$p(x_1, x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left[-\dfrac{z_1^2 - 2\rho z_1 z_2 + z_2^2}{2(1-\rho^2)}\right]$


The variables $z_j = \dfrac{x_j - \mu_j}{\sigma_j}$ are unit normal, i.e., zero mean and unit variance; this transformation is called z-normalization.
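
In code, z-normalization is one line per column (a sketch; the default ddof = 0 matches the ML variance estimate used above):

```python
import numpy as np

def z_normalize(X):
    """Shift each column to zero mean and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```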



If $\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \Sigma)$, then for any vector $\mathbf{w}$,

$\mathbf{w}^T\mathbf{x} \sim \mathcal{N}(\mathbf{w}^T\boldsymbol{\mu},\; \mathbf{w}^T\Sigma\,\mathbf{w})$

i.e., the projection of a d-D normal on a vector $\mathbf{w}$ is univariate normal. Let $\mathbf{W}$ be a $d \times k$ matrix with rank $k < d$. Then

$\mathbf{W}^T\mathbf{x} \sim \mathcal{N}_k(\mathbf{W}^T\boldsymbol{\mu},\; \mathbf{W}^T\Sigma\,\mathbf{W})$
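
A quick empirical check of the projection property, under illustrative choices of mu, Sigma, and W:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 0.5]])
W = rng.normal(size=(3, 2))                # d = 3, k = 2

X = rng.multivariate_normal(mu, Sigma, size=100_000)
Y = X @ W                                  # each row is W^T x

print(Y.mean(axis=0), W.T @ mu)            # sample mean vs. W^T mu: close
print(np.cov(Y, rowvar=False))             # sample covariance of the projection...
print(W.T @ Sigma @ W)                     # ...matches W^T Sigma W
```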

i.e., a d-D normal projected to a k-D space is a k-D normal.

5.5 Multivariate Classification

Define the discriminant function for class $C_i$ as $g_i(\mathbf{x}) = \log p(\mathbf{x}|C_i) + \log P(C_i)$ and assume $p(\mathbf{x}|C_i) \sim \mathcal{N}_d(\boldsymbol{\mu}_i, \Sigma_i)$. Then

$g_i(\mathbf{x}) = -\dfrac{d}{2}\log 2\pi - \dfrac{1}{2}\log|\Sigma_i| - \dfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) + \log P(C_i)$  … (A)

Estimation of parameters: given a sample $\mathcal{X} = \{(\mathbf{x}^t, \mathbf{r}^t)\}_{t=1}^N$, where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise:

$\hat{P}(C_i) = \dfrac{\sum_t r_i^t}{N}, \qquad \mathbf{m}_i = \dfrac{\sum_t r_i^t\,\mathbf{x}^t}{\sum_t r_i^t}, \qquad \mathbf{S}_i = \dfrac{\sum_t r_i^t\,(\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$

Substitute these into (A) and ignore the constant $-\frac{d}{2}\log 2\pi$:

$g_i(\mathbf{x}) = -\dfrac{1}{2}\log|\mathbf{S}_i| - \dfrac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T\mathbf{S}_i^{-1}(\mathbf{x}-\mathbf{m}_i) + \log \hat{P}(C_i)$  … (B)
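
A sketch of these plug-in estimates for a labeled sample, with labels y coded 0..K-1 (names are illustrative):

```python
import numpy as np

def estimate_parameters(X, y, K):
    """Return priors P̂(C_i), class means m_i, and class covariances S_i."""
    priors, means, covs = [], [], []
    for i in range(K):
        Xi = X[y == i]
        priors.append(len(Xi) / len(X))   # P̂(C_i)
        means.append(Xi.mean(axis=0))     # m_i
        D = Xi - means[-1]
        covs.append((D.T @ D) / len(Xi))  # S_i (ML estimate)
    return np.array(priors), np.array(means), np.array(covs)
```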

i) Quadratic discriminant: expanding (B),

$g_i(\mathbf{x}) = \mathbf{x}^T\mathbf{W}_i\mathbf{x} + \mathbf{w}_i^T\mathbf{x} + w_{i0}$

where $\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}$, $\mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i$, and $w_{i0} = -\frac{1}{2}\mathbf{m}_i^T\mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log\hat{P}(C_i)$. The number of parameters to be estimated is $K \cdot d$ for the means and $K \cdot d(d+1)/2$ for the covariance matrices.

Sharing a common sample covariance $\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i$ and ignoring $-\frac{1}{2}\log|\mathbf{S}|$, which is now common to all classes, (B) reduces to

$g_i(\mathbf{x}) = -\dfrac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T\mathbf{S}^{-1}(\mathbf{x}-\mathbf{m}_i) + \log\hat{P}(C_i)$  … (C)
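
Equation (B) might be evaluated as below (a sketch reusing the estimates above; solve and slogdet avoid explicit matrix inversion):

```python
import numpy as np

def quadratic_g(x, prior, m, S):
    """Discriminant (B) for one class with parameters (prior, m, S)."""
    d = x - m
    return (-0.5 * np.linalg.slogdet(S)[1]      # -1/2 log|S_i|
            - 0.5 * d @ np.linalg.solve(S, d)   # -1/2 Mahalanobis distance
            + np.log(prior))

def classify(x, priors, means, covs):
    """Choose the class with the largest discriminant value."""
    scores = [quadratic_g(x, p, m, S) for p, m, S in zip(priors, means, covs)]
    return int(np.argmax(scores))
```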

ii) Linear discriminant: the number of parameters is $K \cdot d$ for the means and $d(d+1)/2$ for the shared covariance matrix. Expanding (C) and ignoring the quadratic term $\mathbf{x}^T\mathbf{S}^{-1}\mathbf{x}$, which is common to all classes, (C) reduces to the linear discriminant

$g_i(\mathbf{x}) = \mathbf{w}_i^T\mathbf{x} + w_{i0}$, where $\mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i$ and $w_{i0} = -\dfrac{1}{2}\mathbf{m}_i^T\mathbf{S}^{-1}\mathbf{m}_i + \log\hat{P}(C_i)$
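
A sketch of the shared-covariance linear discriminant in the $\mathbf{w}_i$, $w_{i0}$ form above:

```python
import numpy as np

def linear_weights(priors, means, S):
    """Weights of the linear discriminant for a shared covariance S.
    S can be pooled as: sum(p * Si for p, Si in zip(priors, covs))."""
    Sinv = np.linalg.inv(S)
    ws = np.array([Sinv @ m for m in means])              # w_i = S^{-1} m_i
    w0s = np.array([-0.5 * m @ Sinv @ m + np.log(p)       # w_i0
                    for m, p in zip(means, priors)])
    return ws, w0s

def classify_linear(x, ws, w0s):
    return int(np.argmax(ws @ x + w0s))
```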

Assuming the off-diagonals of $\mathbf{S}$ to be 0 (i.e., independent attributes) and substituting into (C):

$g_i(\mathbf{x}) = -\dfrac{1}{2}\sum_{j=1}^{d}\left(\dfrac{x_j - m_{ij}}{s_j}\right)^2 + \log\hat{P}(C_i)$  … (E)

iii) Naive Bayes' classifier: equation (E). The number of parameters to be estimated is $K \cdot d$ for the means and $d$ for the variances. Assuming further that all variances are equal, $s_j = s$, and plugging into (E):

$g_i(\mathbf{x}) = -\dfrac{\|\mathbf{x}-\mathbf{m}_i\|^2}{2s^2} + \log\hat{P}(C_i)$

The number of parameters to be estimated is then $K \cdot d$ for the means and 1 for $s^2$.
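
A sketch of the naive Bayes discriminant (E), keeping only the pooled per-dimension standard deviations $s_j$:

```python
import numpy as np

def naive_bayes_g(x, prior, m, s):
    """Discriminant (E); s is the vector of pooled per-dimension std devs.
    With equal variances, pass s = s0 * np.ones(d) for a scalar s0."""
    return -0.5 * np.sum(((x - m) / s) ** 2) + np.log(prior)
```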

iv) Nearest mean classifier: assuming equal priors and ignoring $s^2$,

$g_i(\mathbf{x}) = -\|\mathbf{x}-\mathbf{m}_i\|^2 = -(\mathbf{x}-\mathbf{m}_i)^T(\mathbf{x}-\mathbf{m}_i)$

i.e., assign $\mathbf{x}$ to the class with the nearest mean. Ignoring the common term $\mathbf{x}^T\mathbf{x}$ and assuming equal $\|\mathbf{m}_i\|$ gives

v) Inner product classifier: $g_i(\mathbf{x}) = \mathbf{m}_i^T\mathbf{x}$
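
Sketches of these last two simplifications:

```python
import numpy as np

def nearest_mean(x, means):
    """Assign x to the class whose mean is closest in Euclidean distance."""
    return int(np.argmin([np.sum((x - m) ** 2) for m in means]))

def inner_product(x, means):
    """Valid when all ||m_i|| are (approximately) equal."""
    return int(np.argmax([m @ x for m in means]))
```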

5.6 Tuning Complexity

The covariance assumptions above span a range of model complexities, from a single shared variance to a full covariance matrix per class. As we increase complexity, bias decreases but variance increases (the bias-variance dilemma).

Different covariance matrices fitted to the same data lead to different class shapes and boundaries. 21

5.7 Discrete Features

Binary attributes: $x_j \in \{0, 1\}$, where $p_{ij} \equiv P(x_j = 1 \mid C_i)$. If the $x_j$ are independent,

$p(\mathbf{x}|C_i) = \prod_{j=1}^{d} p_{ij}^{x_j}\,(1-p_{ij})^{1-x_j}$

The discriminant function:

$g_i(\mathbf{x}) = \sum_j \left[x_j \log p_{ij} + (1-x_j)\log(1-p_{ij})\right] + \log P(C_i)$

which is linear in $x_j$, with ML estimate $\hat{p}_{ij} = \dfrac{\sum_t x_j^t\,r_i^t}{\sum_t r_i^t}$.
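
A Bernoulli naive Bayes sketch of this discriminant (the probabilities are clipped away from 0 and 1 so the logs stay finite, a practical choice of mine, not the text's):

```python
import numpy as np

def estimate_p(X, y, i, eps=1e-9):
    """ML estimate of P(x_j = 1 | C_i) from binary data X and labels y."""
    return np.clip(X[y == i].mean(axis=0), eps, 1 - eps)

def bernoulli_g(x, prior, p):
    """x is a binary vector; p the per-attribute probabilities for one class."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) + np.log(prior)
```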

Multinomial attributes: $x_j \in \{v_1, \ldots, v_{n_j}\}$. Define the indicators $z_{jk} = 1$ if $x_j = v_k$ and 0 otherwise. Let $p_{ijk}$ be the probability that $x_j$ takes value $v_k$ in class $C_i$, i.e., $p_{ijk} \equiv P(z_{jk} = 1 \mid C_i) = P(x_j = v_k \mid C_i)$. If the $x_j$ are independent,

$p(\mathbf{x}|C_i) = \prod_j \prod_k p_{ijk}^{z_{jk}}$

The discriminant function: $g_i(\mathbf{x}) = \sum_j \sum_k z_{jk}\log p_{ijk} + \log P(C_i)$, with ML estimate $\hat{p}_{ijk} = \dfrac{\sum_t z_{jk}^t\,r_i^t}{\sum_t r_i^t}$.
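
The multinomial case is analogous once each attribute is one-hot encoded (a sketch; the array shapes are illustrative):

```python
import numpy as np

def multinomial_g(z, prior, p):
    """z and p are n_attributes x n_values arrays for one class:
    z one-hot encodes the observed values, p holds the p_ijk estimates."""
    return np.sum(z * np.log(p)) + np.log(prior)
```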

5.8 Multivariate Regression

Multivariate linear model: $r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \epsilon = w_0 + w_1 x_1^t + \cdots + w_d x_d^t + \epsilon$

Assume $\epsilon \sim \mathcal{N}(0, \sigma^2)$. Then maximizing the likelihood is equivalent to minimizing the sum of squared errors

$E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \dfrac{1}{2}\sum_t \left(r^t - w_0 - w_1 x_1^t - \cdots - w_d x_d^t\right)^2$

Taking derivatives with respect to $w_0, w_1, \ldots, w_d$, respectively, and setting them to 0 gives $d+1$ normal equations.

In vector-matrix form: $\mathbf{X}^T\mathbf{X}\,\mathbf{w} = \mathbf{X}^T\mathbf{r}$, where $\mathbf{X}$ is the $N \times (d+1)$ data matrix whose first column is all 1s, $\mathbf{w} = [w_0, w_1, \ldots, w_d]^T$, and $\mathbf{r} = [r^1, \ldots, r^N]^T$.

Solution: $\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{r}$
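
A sketch of the normal-equation solution on synthetic data (in practice np.linalg.lstsq is numerically preferable to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 3))])  # bias column + d inputs
true_w = np.array([1.0, 2.0, -1.0, 0.5])
r = X @ true_w + 0.1 * rng.normal(size=50)                    # noisy targets

w = np.linalg.solve(X.T @ X, X.T @ r)   # w = (X^T X)^{-1} X^T r
# equivalently: w, *_ = np.linalg.lstsq(X, r, rcond=None)
```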

Generalizing the linear model: by defining new variables that are nonlinear functions of the inputs (e.g., $z_1 = x_1$, $z_2 = x_2$, $z_3 = x_1^2$, $z_4 = x_2^2$, $z_5 = x_1 x_2$), a polynomial model in the original inputs becomes a multivariate linear model in the new variables. That is, a nonlinear model can be reduced to a multivariate linear model and solved with the same machinery.
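
For example, a quadratic model in two inputs reduces to a linear fit in expanded features (a sketch; the feature map below is one illustrative choice):

```python
import numpy as np

def quadratic_features(X):
    """Map (x1, x2) to z = [1, x1, x2, x1^2, x2^2, x1*x2]."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

# The same normal-equation solution then fits the polynomial:
# Z = quadratic_features(X); w = np.linalg.solve(Z.T @ Z, Z.T @ r)
```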