Multivariate Methods Berlin Chen


Multivariate Methods
Berlin Chen
Graduate Institute of Computer Science & Information Engineering, National Taiwan Normal University
References: 1. E. Alpaydın, Introduction to Machine Learning, Chapter 5

Multivariate Methods
Input: a data sample with multiple features (variables/inputs)
Output (of prediction):
- Classification: a class code (discrete variable), e.g., A, B, C, …
- Regression: a real number (continuous variable), e.g., 0.3, 1.5, 2.3, …
Supervised learning: the model/function is trained with labeled training samples

Multivariate Data (1/2)
Each data sample is represented by an observation vector of d dimensions. Each dimension of the vector is termed an input/feature/attribute, and different dimensions may have different types and value domains.
The whole data set, of size N, can be viewed as an N×d data matrix whose rows are the samples/instances/items and whose columns are the inputs/features/attributes:

$X = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ \vdots & & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$

Features are usually assumed correlated! Otherwise there is no need for a multivariate analysis, and the problem is reduced to a naïve Bayes' classifier.
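
To make the layout concrete, here is a tiny illustrative data matrix in NumPy (the values are invented): rows index the N samples and columns index the d features.

```python
import numpy as np

# N = 4 samples (rows), d = 3 features (columns); values are made up.
X = np.array([[170.0, 65.0, 30.0],
              [160.0, 55.0, 25.0],
              [180.0, 80.0, 45.0],
              [175.0, 72.0, 38.0]])

N, d = X.shape
sample_0 = X[0]        # one observation vector (d-dimensional)
feature_2 = X[:, 2]    # all N values of the third feature
print(N, d, sample_0, feature_2)
```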

Multivariate Data (2/2)
Motivations for multivariate data analysis:
- Simplification: assume that the large body of data can be well summarized by means of relatively few parameters
- Exploration: predict the value of one variable from the values of other variables
  - Multivariate classification (discrete output)
  - Multivariate regression (numeric output)

Parameter Estimation (1/4)
Mean of the data samples (each component $\mu_i$ is the mean of column $i$ of the data matrix $X$):

$E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$

Covariance matrix of the data samples:

$\Sigma \equiv \mathrm{Cov}(\mathbf{x}) = E\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\right]$

Parameter Estimation (2/4)
The covariance matrix is symmetric:
- Diagonal terms: variances, $\sigma_i^2 = \mathrm{Var}(X_i)$
- Off-diagonal terms: covariances, $\sigma_{ij} = \mathrm{Cov}(X_i, X_j)$
Correlation between two variables $X_i$ and $X_j$:

$\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j}$

If two variables are independent, then covariance = correlation = 0. But covariance = correlation = 0 does not imply that the two variables are independent (there may be a nonlinear dependence). That is, independent → uncorrelated, but not conversely.

Parameter Estimation (3/4)
If $X_i$ and $X_j$ are linearly dependent, say $X_j = aX_i + b$, the correlation takes its extreme value: $\rho_{ij} = +1$ if $a > 0$ and $\rho_{ij} = -1$ if $a < 0$.

Parameter Estimation (4/4)
Maximum likelihood estimators:
Sample mean as an estimator for the mean:

$\mathbf{m} = \frac{1}{N} \sum_{t=1}^{N} \mathbf{x}^t$

Sample covariance matrix as an estimator for the covariance matrix:

$s_{ij} = \frac{1}{N} \sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)$

Sample correlation coefficients:

$r_{ij} = \frac{s_{ij}}{s_i s_j}$
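
The ML estimators above are straightforward to compute with NumPy. A minimal sketch on synthetic data (the particular μ and Σ are made-up illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic N x d data: d = 2 correlated Gaussian features (illustrative values).
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.8],
                       [0.8, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=1000)  # shape (N, d)

N = X.shape[0]
m = X.mean(axis=0)            # sample mean, one entry per column
Xc = X - m                    # centered data
S = (Xc.T @ Xc) / N           # ML sample covariance (divides by N, not N-1)
s = np.sqrt(np.diag(S))       # per-feature standard deviations
R = S / np.outer(s, s)        # sample correlations, r_ij = s_ij / (s_i s_j)

print(m, S, R, sep="\n")
```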

Multivariate Normal Distribution (1/4)
A random variable $\mathbf{x}$ that is d-dimensional and normally distributed is denoted as $\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \Sigma)$, with density

$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$

where $\boldsymbol{\mu}$ is the mean vector and $\Sigma$ is the covariance matrix. The quadratic form $(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$ is the squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$.
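
As a sanity check on the density formula, the sketch below evaluates it directly and compares against SciPy's implementation (the μ, Σ, and x values are made up for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density, written out from the formula above."""
    d = len(mu)
    diff = x - mu
    maha_sq = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha_sq) / norm_const

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x = np.array([0.5, -1.5])

print(mvn_pdf(x, mu, Sigma))                        # manual evaluation
print(multivariate_normal(mu, Sigma).pdf(x))        # should match SciPy
```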

Multivariate Normal Distribution (2/4)
The shape of the equal-density contours depends on the covariance matrix:
- If each dimension is independent of the others and has the same variance value ($\Sigma = s^2 I$): contours are circles (hyperspheres)
- If each dimension is independent of the others but with different variance values ($\Sigma$ diagonal): contours are axis-aligned ellipses
- If the dimensions are dependent on one another ($\Sigma$ with nonzero off-diagonal covariances): contours are rotated ellipses

Multivariate Normal Distribution (3/4)
If the components of the random variable $\mathbf{x}$ are independent, $\Sigma$ is diagonal and the joint density factorizes into a product of univariate normal densities:

$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[ -\frac{1}{2} \sum_{i=1}^{d} \left( \frac{x_i - \mu_i}{\sigma_i} \right)^2 \right]$
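
The factorization can be verified numerically; this sketch (with invented parameters) checks that the joint density with a diagonal covariance equals the product of the per-component univariate densities:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, -2.0, 0.5])
sigmas = np.array([1.5, 0.7, 2.0])          # independent components
Sigma = np.diag(sigmas ** 2)                # diagonal covariance matrix

x = np.array([0.3, -1.0, 1.2])

joint = multivariate_normal(mu, Sigma).pdf(x)
product = np.prod(norm(mu, sigmas).pdf(x))  # product of univariate densities

print(joint, product)                       # the two values agree
```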

Multivariate Normal Distribution (4/4)
Recall: the projection of a d-dimensional normal distribution onto a vector $\mathbf{w}$ is univariate normal. Suppose that $\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \Sigma)$; then

$\mathbf{w}^T \mathbf{x} \sim \mathcal{N}(\mathbf{w}^T \boldsymbol{\mu}, \mathbf{w}^T \Sigma \mathbf{w})$

where the mean $\mathbf{w}^T \boldsymbol{\mu}$ and the variance $\mathbf{w}^T \Sigma \mathbf{w}$ are both scalars. More generally, the projection of a d-dimensional normal distribution onto a k-dimensional space via a $d \times k$ matrix $W$ is k-variate normal:

$W^T \mathbf{x} \sim \mathcal{N}_k(W^T \boldsymbol{\mu}, W^T \Sigma W)$

with a k-dimensional mean vector $W^T \boldsymbol{\mu}$ and a $k \times k$ covariance matrix $W^T \Sigma W$.
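
The projection property is easy to check empirically by sampling; in this sketch (parameters invented) the sample mean and variance of the projections should be close to $\mathbf{w}^T \boldsymbol{\mu}$ and $\mathbf{w}^T \Sigma \mathbf{w}$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.8, 0.1],
                  [0.8, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])
w = np.array([0.5, -1.0, 2.0])

# Project 100k samples of N_3(mu, Sigma) onto w and compare the empirical
# mean/variance of the projections with w^T mu and w^T Sigma w.
X = rng.multivariate_normal(mu, Sigma, size=100_000)
proj = X @ w

print(proj.mean(), w @ mu)           # empirical vs. theoretical mean
print(proj.var(), w @ Sigma @ w)     # empirical vs. theoretical variance
```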

Multivariate Classification (1/2)
Use the normal density as the class-conditional probability of the random variable $\mathbf{x}$ for class $C_i$:

$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right]$

Define the discriminant function as

$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2} \log 2\pi - \frac{1}{2} \log |\Sigma_i| - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \log P(C_i)$

Multivariate Classification (2/2)
Maximum likelihood (ML) training of the classifier: given a set of labeled samples $\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}$, where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise, the ML estimates are

$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \quad \mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}, \quad \mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$
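
A minimal sketch of this ML training step (the class parameters used to generate the toy data are made up):

```python
import numpy as np

def fit_gaussian_classifier(X, y, n_classes):
    """ML estimates of priors, per-class means, and per-class covariances."""
    priors, means, covs = [], [], []
    for i in range(n_classes):
        Xi = X[y == i]
        Ni = Xi.shape[0]
        m_i = Xi.mean(axis=0)
        Xc = Xi - m_i
        priors.append(Ni / X.shape[0])
        means.append(m_i)
        covs.append((Xc.T @ Xc) / Ni)    # ML covariance (divide by N_i)
    return np.array(priors), np.array(means), np.array(covs)

# Toy two-class data (made-up parameters, purely illustrative).
rng = np.random.default_rng(2)
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=200)
X1 = rng.multivariate_normal([3, 2], [[1.5, -0.4], [-0.4, 0.8]], size=300)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 300)

priors, means, covs = fit_gaussian_classifier(X, y, n_classes=2)
print(priors)   # approximately [0.4, 0.6]
print(means)    # close to [0, 0] and [3, 2]
```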

Quadratic Discriminant (1/2)
Expanding the discriminant function with normal class-conditional densities gives a quadratic discriminant:

$g_i(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$

where

$\mathbf{W}_i = -\frac{1}{2} \mathbf{S}_i^{-1}, \quad \mathbf{w}_i = \mathbf{S}_i^{-1} \mathbf{m}_i, \quad w_{i0} = -\frac{1}{2} \mathbf{m}_i^T \mathbf{S}_i^{-1} \mathbf{m}_i - \frac{1}{2} \log |\mathbf{S}_i| + \log \hat{P}(C_i)$
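
A minimal sketch of evaluating the quadratic discriminant for one class and picking the arg-max over classes (all parameter values are invented):

```python
import numpy as np

def quadratic_discriminant(x, m_i, S_i, prior_i):
    """g_i(x) = x^T W_i x + w_i^T x + w_i0, in the log domain."""
    S_inv = np.linalg.inv(S_i)
    W_i = -0.5 * S_inv
    w_i = S_inv @ m_i
    w_i0 = (-0.5 * m_i @ S_inv @ m_i
            - 0.5 * np.log(np.linalg.det(S_i))
            + np.log(prior_i))
    return x @ W_i @ x + w_i @ x + w_i0

# Made-up class parameters for a 2-class, 2-feature problem.
means = [np.array([0.0, 0.0]), np.array([3.0, 2.0])]
covs = [np.array([[1.0, 0.3], [0.3, 1.0]]),
        np.array([[1.5, -0.4], [-0.4, 0.8]])]
priors = [0.4, 0.6]

x = np.array([1.5, 1.0])
scores = [quadratic_discriminant(x, m, S, p)
          for m, S, p in zip(means, covs, priors)]
print(int(np.argmax(scores)))   # predicted class label
```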

Quadratic Discriminant (2/2)
[Figure: class likelihoods $p(\mathbf{x} \mid C_i)$ and the resulting posteriors $P(C_i \mid \mathbf{x})$; with class-specific covariance matrices, the decision boundary between classes is quadratic.]

Linear Discriminant (1/2)
If all classes share the same covariance matrix ($\Sigma_i = \Sigma$ for all $i$, estimated by the pooled $\mathbf{S} = \sum_i \hat{P}(C_i) \mathbf{S}_i$), the quadratic term cancels in comparisons and the discriminant function reduces to a linear discriminant:

$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}$

where

$\mathbf{w}_i = \mathbf{S}^{-1} \mathbf{m}_i, \quad w_{i0} = -\frac{1}{2} \mathbf{m}_i^T \mathbf{S}^{-1} \mathbf{m}_i + \log \hat{P}(C_i)$
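
A minimal sketch of the shared-covariance case: pool the per-class covariances with the priors as weights, then form the linear weights per class (parameter values again invented):

```python
import numpy as np

def linear_discriminant_params(means, covs, priors):
    """Pool the per-class covariances, then form w_i and w_i0 per class."""
    S = sum(p * S_i for p, S_i in zip(priors, covs))   # shared covariance
    S_inv = np.linalg.inv(S)
    ws = [S_inv @ m for m in means]
    w0s = [-0.5 * m @ S_inv @ m + np.log(p)
           for m, p in zip(means, priors)]
    return ws, w0s

means = [np.array([0.0, 0.0]), np.array([3.0, 2.0])]
covs = [np.array([[1.0, 0.3], [0.3, 1.0]]),
        np.array([[1.5, -0.4], [-0.4, 0.8]])]
priors = [0.4, 0.6]

ws, w0s = linear_discriminant_params(means, covs, priors)
x = np.array([1.5, 1.0])
scores = [w @ x + w0 for w, w0 in zip(ws, w0s)]
print(int(np.argmax(scores)))   # predicted class label
```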

Linear Discriminant (2/2)
[Figure: likelihood contours for classes sharing one covariance matrix whose covariances (off-diagonal terms of the covariance matrix) are not equal to zero; the resulting decision boundaries are linear.]

Naïve Bayes' Classifier (1/2)
If the shared covariance matrix is additionally diagonal (the features are independent, and the variances, i.e. the diagonal terms of the covariance matrix, are not necessarily equal), the discriminant function becomes a naïve Bayes' classifier:

$g_i(\mathbf{x}) = -\frac{1}{2} \sum_{j=1}^{d} \left( \frac{x_j - m_{ij}}{s_j} \right)^2 + \log \hat{P}(C_i)$

Naïve Bayes' Classifier (2/2)
If the variances (diagonal terms) of the naïve Bayes' classifier are further set equal for all dimensions ($\Sigma = s^2 I$), the Mahalanobis distance reduces to the Euclidean distance (all variables/features have the same variance and are independent):

$g_i(\mathbf{x}) = -\frac{\lVert \mathbf{x} - \mathbf{m}_i \rVert^2}{2 s^2} + \log \hat{P}(C_i)$

Nearest Mean Classifier
If the priors are further set equal, the rule assigns the data sample to the class of the nearest mean:

choose $C_i$ that minimizes $\lVert \mathbf{x} - \mathbf{m}_i \rVert^2 = \sum_{j=1}^{d} (x_j - m_{ij})^2$
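
The nearest-mean rule fits in a few lines; a minimal sketch with invented class means and samples:

```python
import numpy as np

def nearest_mean_predict(X, means):
    """Assign each row of X to the class whose mean is closest (Euclidean)."""
    # dists[t, i] = squared distance from sample t to the mean of class i
    dists = ((X[:, None, :] - np.asarray(means)[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

means = [np.array([0.0, 0.0]), np.array([3.0, 2.0])]
X = np.array([[0.5, 0.2],
              [2.8, 1.9],
              [2.0, 1.5]])
print(nearest_mean_predict(X, means))   # [0 1 1]
```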

Tuning Complexity
There is a tradeoff between the bias and the variance of an estimator:
- Simplifying the covariance matrix → fewer parameters to estimate → introduces estimation bias
- An arbitrary (full, class-specific) covariance matrix → much more data is needed → introduces estimation variance
Regularized Discriminant Analysis (RDA, Friedman 1989) uses a weighted average of three special cases of the covariance matrix (see the sketch below):
- a shared diagonal covariance matrix (with equal variances) for all classes
- a shared covariance matrix for all classes
- a specific covariance matrix for each class
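
A minimal sketch of one way to form the weighted average, assuming the blend $\mathbf{S}'_i = \alpha\, s^2 I + \beta\, \mathbf{S} + (1 - \alpha - \beta)\, \mathbf{S}_i$ with mixing weights α, β (typically chosen by cross-validation); the matrices here are made up:

```python
import numpy as np

def rda_covariance(S_i, S_shared, alpha, beta):
    """Blend the three special cases: spherical, shared, class-specific.

    S'_i = alpha * s^2 * I + beta * S_shared + (1 - alpha - beta) * S_i,
    where s^2 is the average variance of the shared covariance.
    """
    d = S_shared.shape[0]
    s2 = np.trace(S_shared) / d          # equal-variance (spherical) case
    return (alpha * s2 * np.eye(d)
            + beta * S_shared
            + (1.0 - alpha - beta) * S_i)

S_i = np.array([[1.5, -0.4], [-0.4, 0.8]])      # class-specific estimate
S_shared = np.array([[1.2, 0.1], [0.1, 0.9]])   # pooled over all classes
print(rda_covariance(S_i, S_shared, alpha=0.3, beta=0.4))
```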

Discrete Features (1/4)
Features may take one of n different discrete values. E.g., if each feature is a binary random variable $x_j \in \{0, 1\}$ with a Bernoulli distribution, let

$p_{ij} \equiv p(x_j = 1 \mid C_i)$

If the features are further assumed to be independent (the assumption of the naïve Bayes' classifier),

$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j} (1 - p_{ij})^{1 - x_j}$

Discriminant function (is it linear? see the next slide):

$g_i(\mathbf{x}) = \sum_j \left[ x_j \log p_{ij} + (1 - x_j) \log (1 - p_{ij}) \right] + \log P(C_i)$
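
A minimal sketch of this Bernoulli naïve Bayes discriminant (the per-class probabilities $p_{ij}$ and priors are invented for illustration):

```python
import numpy as np

def bernoulli_nb_score(x, p_i, prior_i, eps=1e-12):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)."""
    p = np.clip(p_i, eps, 1.0 - eps)    # avoid log(0)
    return (x * np.log(p) + (1 - x) * np.log(1 - p)).sum() + np.log(prior_i)

# Made-up Bernoulli parameters for two classes over 4 binary features.
p = np.array([[0.9, 0.2, 0.1, 0.6],    # p_1j = P(x_j = 1 | C_1)
              [0.3, 0.7, 0.8, 0.5]])   # p_2j = P(x_j = 1 | C_2)
priors = np.array([0.5, 0.5])

x = np.array([1, 0, 0, 1])
scores = [bernoulli_nb_score(x, p[i], priors[i]) for i in range(2)]
print(int(np.argmax(scores)))          # prints 0: the first class fits x better
```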

Discrete Features (2/4)
Rearranging the terms (the full derivation is given in Appendix A) shows that the discriminant is linear in the features:

$g_i(\mathbf{x}) = \sum_j x_j \log \frac{p_{ij}}{1 - p_{ij}} + \sum_j \log (1 - p_{ij}) + \log P(C_i)$

So the answer to the question on the previous slide is: yes, it is linear.

Discrete Features (3/4)
Maximum likelihood estimation for $p_{ij}$ (for a binary variable):

$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$

Extension: the features are independent multinomial random variables. Let $x_j \in \{v_1, \ldots, v_{n_j}\}$ and define $p_{ijk}$ as the probability that $x_j$ of an instance belonging to $C_i$ takes the value $v_k$:

$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i), \quad \text{where } z_{jk} = 1 \text{ if } x_j = v_k \text{ and } 0 \text{ otherwise}$
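
The ML estimate for the binary case is just a per-class average of each feature; a minimal sketch on a toy labeled binary data set (values invented):

```python
import numpy as np

# ML estimate p_hat_ij = sum_t x_j^t r_i^t / sum_t r_i^t on toy binary data.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0],
              [1, 1, 0]])            # N = 5 samples, d = 3 binary features
y = np.array([0, 0, 0, 1, 1])        # class labels

n_classes = 2
R = np.eye(n_classes)[y]             # one-hot labels r_i^t, shape (N, n_classes)
p_hat = (R.T @ X) / R.sum(axis=0)[:, None]   # shape (n_classes, d)
print(p_hat)
# [[0.667 0.333 1.000]
#  [0.500 1.000 0.000]]
```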

Discrete Features (4/4)
With the independence assumption of the naïve Bayes' classifier,

$p(\mathbf{x} \mid C_i) = \prod_j \prod_k p_{ijk}^{z_{jk}}$

Discriminant function (again linear, now in the indicator variables $z_{jk}$):

$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)$

Maximum likelihood estimation for $p_{ijk}$:

$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$