Presentation transcript:

Bayesian Discriminant Analysis
This supervised learning technique uses Bayes' rule, but is different in philosophy from the well-known work of Aitken, Taroni, et al. Bayes' rule:
$$ \Pr(G_i \mid \mathbf{x}) \;=\; \frac{\Pr(\mathbf{x} \mid G_i)\,\Pr(G_i)}{\Pr(\mathbf{x})} $$
where Pr denotes probability and $\Pr(G_i)$ is the prior probability (this can be a problem!). The equation answers: "How does the probability of an item being a member of group $G_i$ change, given evidence $\mathbf{x}$?"
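A minimal numeric sketch of the rule above (not from the slides; the two 1-D normal group densities and the 0.5/0.5 priors are assumed purely for illustration), showing how the posterior $\Pr(G_i \mid x)$ is computed in R:

```r
# Bayes' rule for two groups: posterior = likelihood * prior / evidence.
# The 1-D normal likelihoods and the equal priors are illustrative assumptions.
x     <- 2.0                                   # the observed evidence
prior <- c(G1 = 0.5, G2 = 0.5)                 # prior probabilities Pr(G_i)
lik   <- c(G1 = dnorm(x, mean = 0, sd = 1),    # Pr(x | G1)
           G2 = dnorm(x, mean = 3, sd = 1))    # Pr(x | G2)
post  <- lik * prior / sum(lik * prior)        # Pr(G_i | x)
post
```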

Bayesian Discriminant Analysis
Bayes' rule can be turned into a classification rule: if $\Pr(G_1 \mid \mathbf{x}) > \Pr(G_2 \mid \mathbf{x})$, choose group 1. *If the priors are both 0.5, the decision boundaries are where the class density curves cross.
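A small sketch of how that rule plays out in 1-D, under assumed means and standard deviations (none of these numbers come from the slides): the group with the larger posterior is chosen, and with equal priors the boundary sits exactly where the two density curves cross.

```r
# Classify by the larger posterior; with equal priors this reduces to
# comparing the class densities. Means/sds are illustrative assumptions.
classify <- function(x, prior = c(0.5, 0.5)) {
  post <- c(dnorm(x, 0, 1), dnorm(x, 3, 1)) * prior
  which.max(post)                       # index of the chosen group
}
classify(0.5)    # -> 1
classify(2.5)    # -> 2

# With equal priors the decision boundary is where the curves cross:
uniroot(function(x) dnorm(x, 0, 1) - dnorm(x, 3, 1), interval = c(0, 3))$root
```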

Bayes-Gaussian Discriminant Analysis
If the data is multivariate normal and the groups share the same covariance structure, the decision rule becomes: assign $\mathbf{x}$ to the group with the largest "distance" (discriminant score), defined as
$$ d_i(\mathbf{x}) \;=\; \mathbf{x}^{\mathsf T}\boldsymbol\Sigma^{-1}\boldsymbol\mu_i \;-\; \tfrac{1}{2}\,\boldsymbol\mu_i^{\mathsf T}\boldsymbol\Sigma^{-1}\boldsymbol\mu_i \;+\; \ln \Pr(G_i), $$
where $\boldsymbol\mu_i$ is the mean of group $i$ and $\boldsymbol\Sigma$ is the pooled covariance matrix (like an "average" covariance matrix of the groups). Note that if the data is just 1-D this is just an equation for a line: the first term gives the slope and the remaining terms give the intercept.
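A minimal sketch of that score on simulated data (the group means, the covariance, the priors, and the query point are all assumed for illustration): estimate the pooled covariance, evaluate $d_i(\mathbf{x})$ for each group, and take the largest.

```r
# Linear (equal-covariance) discriminant score on toy 2-D data.
set.seed(1)
n  <- 50
X1 <- MASS::mvrnorm(n, mu = c(0, 0), Sigma = diag(2))   # group 1 sample
X2 <- MASS::mvrnorm(n, mu = c(2, 2), Sigma = diag(2))   # group 2 sample

mu1 <- colMeans(X1); mu2 <- colMeans(X2)
Sp  <- ((n - 1) * cov(X1) + (n - 1) * cov(X2)) / (2 * n - 2)  # pooled ("average") covariance
Spi <- solve(Sp)
prior <- c(0.5, 0.5)

# d_i(x) = x' Sp^{-1} mu_i - 0.5 mu_i' Sp^{-1} mu_i + log Pr(G_i)
d <- function(x, mu, p) drop(x %*% Spi %*% mu) - 0.5 * drop(mu %*% Spi %*% mu) + log(p)

x_new  <- c(1.5, 0.5)
scores <- c(G1 = d(x_new, mu1, prior[1]), G2 = d(x_new, mu2, prior[2]))
names(which.max(scores))    # the group with the largest score wins
```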

Bayes-Gaussian Discriminant Analysis
If the data is multivariate normal but the groups have different covariance structures, the decision rule is the same but the "decision distance" becomes
$$ d_i(\mathbf{x}) \;=\; -\tfrac{1}{2}\,\mathbf{x}^{\mathsf T}\boldsymbol\Sigma_i^{-1}\mathbf{x} \;+\; \mathbf{x}^{\mathsf T}\boldsymbol\Sigma_i^{-1}\boldsymbol\mu_i \;-\; \tfrac{1}{2}\,\boldsymbol\mu_i^{\mathsf T}\boldsymbol\Sigma_i^{-1}\boldsymbol\mu_i \;-\; \tfrac{1}{2}\ln\lvert\boldsymbol\Sigma_i\rvert \;+\; \ln \Pr(G_i), $$
with each group now keeping its own covariance matrix $\boldsymbol\Sigma_i$. Note that if the data is just 1-D this is an equation for a parabola ($a x^2 + b x + c$): the new quadratic term $a$ comes from $-\tfrac{1}{2}\,\mathbf{x}^{\mathsf T}\boldsymbol\Sigma_i^{-1}\mathbf{x}$.
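A corresponding sketch of the quadratic score, with the group means and covariance matrices simply assumed for illustration (in practice they would be estimated from training data):

```r
# Quadratic discriminant score: each group keeps its own covariance matrix.
qd <- function(x, mu, S, p) {
  -0.5 * log(det(S)) -
   0.5 * drop(t(x - mu) %*% solve(S) %*% (x - mu)) +
   log(p)
}

# Toy illustration with assumed group parameters.
mu1 <- c(0, 0); S1 <- diag(2)
mu2 <- c(2, 2); S2 <- matrix(c(2, 0.5, 0.5, 1), 2)
x_new  <- c(1.5, 0.5)
scores <- c(G1 = qd(x_new, mu1, S1, 0.5), G2 = qd(x_new, mu2, S2, 0.5))
names(which.max(scores))    # pick the group with the largest quadratic score
```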

Bayes-Gaussian Discriminant Analysis
The "quadratic" version is always called quadratic discriminant analysis (QDA). The "linear" version is called by a number of names: linear discriminant analysis (LDA), or some combination of the above with the words "Gaussian" or "classification". A number of different techniques use the name LDA, so it is important to check the equations actually used to tell them apart!

Groups have similar covariance structure: the linear discriminant rule should work well. Groups have different covariance structure: the quadratic discriminant rule may work better.
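A quick way to try both rules is MASS::lda and MASS::qda; the iris data below is just a stand-in example, not a data set from the slides, and the resubstitution accuracies are only illustrative.

```r
# Compare the linear and quadratic discriminant rules on a built-in data set.
library(MASS)
fit_lda <- lda(Species ~ ., data = iris)   # common (pooled) covariance
fit_qda <- qda(Species ~ ., data = iris)   # group-specific covariances
mean(predict(fit_lda)$class == iris$Species)   # LDA resubstitution accuracy
mean(predict(fit_qda)$class == iris$Species)   # QDA resubstitution accuracy
```

Resubstitution accuracy is optimistic; leave-one-out cross-validation (e.g. lda(..., CV = TRUE)) gives a fairer comparison between the two rules.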

Canonical Variate Analysis
This supervised technique is called linear discriminant analysis (LDA) in R, and is also called Fisher linear discriminant analysis. CVA is closely related to linear Bayes-Gaussian discriminant analysis. It works on a principle similar to PCA, looking for "interesting directions in data space": CVA finds the directions in space which best separate the groups. Technically, it finds the directions which maximize the ratio of between-group to within-group variation.
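In R this corresponds to MASS::lda; a small sketch (using iris as a stand-in data set) showing where the canonical variates and their loadings live in the fitted object:

```r
library(MASS)
fit <- lda(Species ~ ., data = iris)
fit$scaling              # CVA / LD loadings (the "interesting directions")
cv <- predict(fit)$x     # canonical variate (LD) scores for each observation
ncol(cv)                 # = min(#groups - 1, p); for iris, min(2, 4) = 2
```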

[Figure: projecting the data onto PC1 does not necessarily give good group separation; projecting onto CV1 does.] Note: there are #groups − 1 or p CVs, whichever is smaller.

Canonical Variate Analysis
Use the between-group to within-group covariance matrix, $\mathbf{W}^{-1}\mathbf{B}$, to find the directions of best group separation (the CVA loadings, $\mathbf{A}_{cv}$). CVA can be used for dimension reduction. Caution! These "dimensions" are not at right angles (i.e. not orthogonal), so CVA plots can be distorted from reality; always check the loading angles! Caution! CVA will not work well with very correlated data.
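A minimal "by hand" sketch of those loadings (iris again as a stand-in): build the within-group scatter W and between-group scatter B, take eigenvectors of $\mathbf{W}^{-1}\mathbf{B}$, and check the angle between the first two loading vectors, which is generally not 90 degrees.

```r
X <- as.matrix(iris[, 1:4]); g <- iris$Species
xbar <- colMeans(X)

# within-group (W) and between-group (B) scatter matrices
W <- Reduce(`+`, lapply(levels(g), function(k) {
  Xk <- scale(X[g == k, ], center = TRUE, scale = FALSE)
  crossprod(Xk)
}))
B <- Reduce(`+`, lapply(levels(g), function(k) {
  dk <- colMeans(X[g == k, ]) - xbar
  sum(g == k) * tcrossprod(dk)
}))

# CVA loadings = eigenvectors of W^{-1} B; Re() guards against tiny
# imaginary parts from the non-symmetric eigendecomposition
A_cv <- Re(eigen(solve(W) %*% B)$vectors[, 1:2])

# the loading directions are generally NOT orthogonal -- check the angle
cosang <- sum(A_cv[, 1] * A_cv[, 2]) /
          (sqrt(sum(A_cv[, 1]^2)) * sqrt(sum(A_cv[, 2]^2)))
acos(cosang) * 180 / pi     # angle between the CV1 and CV2 loadings, in degrees
```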

[Figure: 2D PCA of the gasoline data set vs. 2D CVA of the gasoline data set.]

Canonical Variate Analysis
A distance metric is used in CVA to assign the group identity of an unknown data point: the point is assigned to the group whose mean is closest in that metric. If the data is Gaussian and the group covariance structures are the same, then CVA classification is the same as Bayes-Gaussian classification.
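A sketch of one common assignment rule of this kind, Mahalanobis distance to each group mean under a pooled covariance (iris and the query point are assumed stand-ins, not the slide's data):

```r
X <- as.matrix(iris[, 1:4]); g <- iris$Species

# pooled covariance matrix across the groups
Sp <- Reduce(`+`, lapply(levels(g), function(k) {
  (sum(g == k) - 1) * cov(X[g == k, ])
})) / (nrow(X) - nlevels(g))

x_new <- colMeans(X)     # an assumed query point (here just the grand mean)
d2 <- sapply(levels(g), function(k)
  mahalanobis(x_new, center = colMeans(X[g == k, ]), cov = Sp))
names(which.min(d2))     # assign to the group with the smallest distance
```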

Partial Least Squares Discriminant Analysis
PLS-DA is a supervised discrimination technique that is very popular in chemometrics. It works well with highly correlated variables (as in spectroscopy); lots of correlation causes CVA to fail! The group labels are coded into a "response matrix" Y, and PLS searches for the directions of maximum covariance in X and Y. The loadings for X can be used like PCA loadings, for dimension reduction and loading plots.
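A tiny sketch of the label-coding step (the labels here are made up for illustration): each group becomes one indicator column of Y.

```r
g <- factor(c("A", "A", "B", "C", "B"))   # assumed example group labels
Y <- model.matrix(~ g - 1)                # one 0/1 column per group
colnames(Y) <- levels(g)
Y
```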

Partial Least Squares Discriminant Analysis
PLS-DA theory: find an (approximate) linear relationship between the experimental (explanatory) variables and the group labels (response variables):
$$ \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E} $$
where X holds the experimental variables, Y holds the group labels, and E is an "error" or "residuals" matrix. Each block is decomposed into PLS-scores and PLS-loadings:
$$ \mathbf{X} = \mathbf{T}\mathbf{P}^{\mathsf T} + \mathbf{E}_X, \qquad \mathbf{Y} = \mathbf{U}\mathbf{Q}^{\mathsf T} + \mathbf{E}_Y. $$
So, substituting: $\mathbf{U}\mathbf{Q}^{\mathsf T} = \mathbf{T}\mathbf{P}^{\mathsf T}\mathbf{B} + \mathbf{E}$. *Use these "Y-scores" with a "soft-max" or "Bayes" rule to pick the most likely group label.
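A sketch of that decomposition with the pls package (iris as a stand-in data set; the choice of two components is an assumption):

```r
library(pls)
X <- as.matrix(iris[, 1:4])
Y <- model.matrix(~ Species - 1, data = iris)   # indicator response matrix

fit <- plsr(Y ~ X, ncomp = 2)
T_scores <- scores(fit)        # X-scores  T
P_load   <- loadings(fit)      # X-loadings P
U_scores <- Yscores(fit)       # Y-scores  U
Q_load   <- Yloadings(fit)     # Y-loadings Q
Y_hat    <- predict(fit, ncomp = 2)[, , 1]   # fitted group indicators
head(Y_hat)
```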

Partial Least Squares Discriminant Analysis
How do we solve this for T, P and U? Objective: maximize the covariance between the X- and Y-scores, T and U. There are various procedures to do this: kernel-PLS, SIMPLS, and NIPALS. They give close, but slightly different, numerical results. In R, the functions are: plsr (pls package), spls (spls package), and, easiest of all, plsda (caret package).
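A usage sketch of the "easiest" route, caret::plsda, assuming the caret and pls packages are installed (iris again as a stand-in):

```r
library(caret)
X <- as.matrix(iris[, 1:4])
y <- iris$Species

fit  <- plsda(X, y, ncomp = 2)   # PLS-DA with 2 components
pred <- predict(fit, X)          # predicted group labels (factor)
mean(pred == y)                  # resubstitution accuracy (optimistic)
```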

Partial Least Squares Discriminant Analysis
[Figure: 2D PCA of the gasoline data set vs. 2D PLS of the gasoline data set.]

Partial Least Squares Discriminant Analysis
Group assignments of observation vectors are made by interpreting the Y-scores; typically a "soft-max" function is used.
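A minimal sketch of that last step (the fitted Y values for the one observation shown are made-up numbers): apply a soft-max to the row of fitted Y values and take the largest entry as the assigned group.

```r
softmax <- function(z) exp(z) / sum(exp(z))
y_row <- c(G1 = 0.8, G2 = 0.3, G3 = -0.1)   # assumed fitted Y values for one observation
p <- softmax(y_row)
p                          # pseudo-probabilities for each group
names(which.max(p))        # assigned group label
```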