Linear Methods for Classification: Presentation for the MA seminar in statistics, Eli Dahan
Outline
- Introduction: problem and solution
- LDA: Linear Discriminant Analysis
- LR: Logistic Regression (and Linear Regression)
- LDA vs. LR
- In a word – Separating Hyperplanes
Introduction - the problem
Given an observation X, does it belong to group k or to group l? We can think of G as the "group label". The posterior probability is $p_j(x) = P(G = j \mid X = x)$.
Introduction - the solution
Linear decision boundary: the set $\{x : p_k(x) = p_l(x)\}$. If $p_k(x) > p_l(x)$ choose class k; if $p_l(x) > p_k(x)$ choose class l.
Linear Discriminant Analysis
Let $P(G = k) = \pi_k$ and $P(X = x \mid G = k) = f_k(x)$. Then by Bayes' rule:
$$P(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l}$$
The decision boundary between classes k and l is the set where $P(G = k \mid X = x) = P(G = l \mid X = x)$.
Linear Discriminant Analysis
Assuming $f_k(x) \sim N(\mu_k, \Sigma_k)$ with a common covariance matrix $\Sigma_1 = \Sigma_2 = \dots = \Sigma_K = \Sigma$, we get a decision boundary that is linear in x. For covariances that are not common we get QDA (and RDA as a regularized compromise).
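The slide's equation was not preserved; a sketch of the standard derivation showing why the boundary is linear, via the log posterior ratio of two classes:
```latex
\log\frac{P(G=k \mid X=x)}{P(G=l \mid X=x)}
  = \log\frac{f_k(x)}{f_l(x)} + \log\frac{\pi_k}{\pi_l}
  = \log\frac{\pi_k}{\pi_l}
    - \tfrac{1}{2}(\mu_k + \mu_l)^T \Sigma^{-1} (\mu_k - \mu_l)
    + x^T \Sigma^{-1} (\mu_k - \mu_l)
```
The quadratic terms $x^T \Sigma^{-1} x$ cancel because the covariance is common to all classes, leaving an expression linear in x; setting it to zero gives the linear decision boundary.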
Linear Discriminant Analysis
In practice the parameters are replaced by their empirical estimates: $\hat\pi_k = N_k/N$, $\hat\mu_k = \sum_{g_i = k} x_i / N_k$, and the pooled covariance $\hat\Sigma = \sum_{k=1}^{K}\sum_{g_i = k}(x_i - \hat\mu_k)(x_i - \hat\mu_k)^T / (N - K)$.
LDA was among the top classifiers in the STATLOG study (Michie et al., 1994), presumably because the data supports linear boundaries and the estimates are stable.
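A minimal sketch of LDA with these empirical estimates (NumPy only; the function names fit_lda and lda_scores are illustrative, not from the slides):
```python
# LDA: estimate priors, class means, pooled covariance, then classify
# by the largest linear discriminant function delta_k(x).
import numpy as np

def fit_lda(X, g, K):
    """Empirical estimates pi_hat, mu_hat, and the pooled covariance."""
    N, p = X.shape
    pi_hat = np.zeros(K)
    mu_hat = np.zeros((K, p))
    Sigma_hat = np.zeros((p, p))
    for k in range(K):
        Xk = X[g == k]
        Nk = Xk.shape[0]
        pi_hat[k] = Nk / N
        mu_hat[k] = Xk.mean(axis=0)
        Sigma_hat += (Xk - mu_hat[k]).T @ (Xk - mu_hat[k])
    Sigma_hat /= (N - K)                      # pooled covariance estimate
    return pi_hat, mu_hat, Sigma_hat

def lda_scores(X, pi_hat, mu_hat, Sigma_hat):
    """delta_k(x) = x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log pi_k."""
    Sigma_inv = np.linalg.inv(Sigma_hat)
    return (X @ Sigma_inv @ mu_hat.T
            - 0.5 * np.sum(mu_hat @ Sigma_inv * mu_hat, axis=1)
            + np.log(pi_hat))                  # shape (N, K)

# Usage: g_hat = lda_scores(X, *fit_lda(X, g, K)).argmax(axis=1)
```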
Logistic Regression
Models the posterior probabilities of the K classes so that they sum to one and remain in [0, 1]:
$$\log\frac{P(G = k \mid X = x)}{P(G = K \mid X = x)} = \beta_{k0} + \beta_k^T x, \qquad k = 1, \dots, K-1$$
The decision boundaries, the sets where the log-odds between two classes equal zero, are linear in x.
Logistic Regression
Model fit: the parameters are estimated by maximizing the conditional log-likelihood; the Newton-Raphson algorithm (equivalently, iteratively reweighted least squares) is used to solve the score equations.
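A minimal sketch of the Newton-Raphson updates for the two-class case (NumPy only; the name fit_logistic is illustrative, and X is assumed to already include an intercept column):
```python
# Newton-Raphson / IRLS for two-class logistic regression, y in {0, 1}.
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Maximize the conditional log-likelihood by Newton-Raphson."""
    N, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))        # fitted probabilities P(G=1|x)
        W = mu * (1.0 - mu)                     # diagonal of the weight matrix
        grad = X.T @ (y - mu)                   # score vector
        hess = X.T @ (X * W[:, None])           # X^T W X
        step = np.linalg.solve(hess, grad)      # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```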
Linear Regression
Recall the usual assumptions of multivariate linear regression (e.g., no severe multicollinearity). Here, with N instances collected in an N x p observation matrix X, Y is an N x K indicator response matrix: row i has a 1 in the column of the class of observation i and 0 elsewhere.
Linear Regression
The K columns of Y are regressed on X simultaneously, giving fitted values $\hat Y = X(X^T X)^{-1} X^T Y$; a new observation x is assigned to the class with the largest fitted value.
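A minimal sketch of classification by linear regression of the indicator matrix under the setup above (NumPy only; the function names are illustrative):
```python
# Classification via least-squares regression of an indicator response matrix.
# X is N x p (without intercept), g holds integer class labels 0..K-1.
import numpy as np

def indicator_regression(X, g, K):
    """Fit the (p+1) x K coefficient matrix B_hat by least squares."""
    N = X.shape[0]
    X1 = np.hstack([np.ones((N, 1)), X])       # add intercept column
    Y = np.zeros((N, K))
    Y[np.arange(N), g] = 1.0                   # indicator responses
    B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return B_hat

def predict(X, B_hat):
    """Assign each observation to the class with the largest fitted value."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return (X1 @ B_hat).argmax(axis=1)
```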
LDA vs. LR
Similar results, with LDA slightly better (56% vs. 67% error rate for LR). At first glance the two methods may seem identical, since both lead to linear decision boundaries (compare the forms shown earlier), but they differ in how the parameters are fit.
LDA vs. LR
LDA: parameters are fit by maximizing the full log-likelihood based on the joint density $P(X, G = k) = \phi(x; \mu_k, \Sigma)\,\pi_k$, which assumes a Gaussian density for X (Efron 1975: ignoring the Gaussian information, as LR does, costs about 30% efficiency in the worst case). Linearity is derived.
LR: the marginal density P(X) is left arbitrary (an advantage for model selection and for the ability to absorb extreme X values); the parameters of P(G|X) are fit by maximizing the conditional likelihood. Linearity is assumed.
In a word – separating hyperplanes