1
Linear Methods for Classification
Based on Chapter 4 of Hastie, Tibshirani, and Friedman
David Madigan
2
Predictive Modeling
Goal: learn a mapping y = f(x; θ)
Need: 1. A model structure 2. A score function 3. An optimization strategy
Categorical y ∈ {c_1, …, c_m}: classification
Real-valued y: regression
Note: usually assume {c_1, …, c_m} are mutually exclusive and exhaustive
3
Probabilistic Classification
Let p(c_k) = probability that a randomly chosen object comes from class c_k
Objects from c_k have class-conditional density p(x | c_k, θ_k) (e.g., MVN)
Then, by Bayes' rule: p(c_k | x) ∝ p(x | c_k, θ_k) p(c_k)
Bayes error rate: the error rate of the rule that predicts the most probable class; it is a lower bound on the error rate of any classifier
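To make Bayes' rule concrete, here is a minimal sketch for two univariate Gaussian classes; the priors, means, and standard deviations are illustrative assumptions, not values from the slides.

```python
import numpy as np
from scipy.stats import norm

# Illustrative (assumed) class priors p(c_k) and class-conditional
# Gaussian densities p(x | c_k, theta_k).
priors = np.array([0.5, 0.5])
means = np.array([0.0, 2.0])
sds = np.array([1.0, 1.0])

def posterior(x):
    """Bayes' rule: p(c_k | x) is proportional to p(x | c_k) * p(c_k)."""
    unnorm = norm.pdf(x, loc=means, scale=sds) * priors
    return unnorm / unnorm.sum()

print(posterior(1.2))            # posterior class probabilities at x = 1.2
print(posterior(1.2).argmax())   # Bayes-optimal class prediction
```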
4
[Figure: class-conditional densities with a Bayes error rate of about 6%]
5
Classifier Types
Discriminative: model p(c_k | x) directly, e.g. logistic regression, CART
Generative: model p(x | c_k, θ_k), e.g. “Bayesian classifiers”, LDA
6
Regression for Binary Classification
Can fit a linear regression model to a 0/1 response (example data: zeroOneR.txt)
Predicted values are not necessarily between zero and one
With p > 1 predictors, the decision boundary is linear, e.g. 0.5 = b_0 + b_1 x_1 + b_2 x_2
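A minimal sketch of this idea on synthetic data (the zeroOneR.txt data from the slide is not reproduced here): fit least squares to 0/1 labels and observe fitted values outside [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = (x + rng.normal(scale=0.5, size=x.size) > 0).astype(float)  # 0/1 labels

X = np.column_stack([np.ones_like(x), x])     # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients

yhat = X @ beta
print(yhat.min(), yhat.max())  # typically below 0 and above 1 at the extremes
# Decision rule: predict class 1 where the fitted value exceeds 0.5; the
# boundary 0.5 = b_0 + b_1 x is linear in x.
```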
8
Linear Discriminant Analysis
K classes, X an n × p data matrix. p(c_k | x) ∝ p(x | c_k, θ_k) p(c_k)
Could model each class density as multivariate normal:
p(x | c_k) = (2π)^{-p/2} |Σ_k|^{-1/2} exp( -½ (x - μ_k)ᵀ Σ_k⁻¹ (x - μ_k) )
LDA assumes Σ_k = Σ for all k. Then, writing π_k = p(c_k):
log [ p(c_k | x) / p(c_l | x) ] = log (π_k / π_l) - ½ (μ_k + μ_l)ᵀ Σ⁻¹ (μ_k - μ_l) + xᵀ Σ⁻¹ (μ_k - μ_l)
This is linear in x.
9
Linear Discriminant Analysis (cont.)
It follows that the classifier should predict the class maximizing the “linear discriminant function”:
δ_k(x) = xᵀ Σ⁻¹ μ_k - ½ μ_kᵀ Σ⁻¹ μ_k + log π_k
If we don’t assume the Σ_k’s are identical, we get Quadratic DA:
δ_k(x) = -½ log |Σ_k| - ½ (x - μ_k)ᵀ Σ_k⁻¹ (x - μ_k) + log π_k
10
Linear Discriminant Analysis (cont.)
Can estimate the LDA parameters via maximum likelihood:
π̂_k = N_k / N (N_k = number of class-k observations)
μ̂_k = Σ_{g_i = k} x_i / N_k
Σ̂ = Σ_{k=1}^{K} Σ_{g_i = k} (x_i - μ̂_k)(x_i - μ̂_k)ᵀ / (N - K)
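Putting the estimates and the discriminant function together, here is a minimal from-scratch LDA sketch; the function names and the use of NumPy are my own choices, not from the slides.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate priors, class means, and the pooled covariance."""
    classes = np.unique(y)
    N, p = X.shape
    priors, means = [], []
    Sigma = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        priors.append(len(Xk) / N)           # pi_hat_k = N_k / N
        means.append(Xk.mean(axis=0))        # mu_hat_k
        centered = Xk - Xk.mean(axis=0)
        Sigma += centered.T @ centered
    Sigma /= N - len(classes)                # pooled covariance estimate
    return classes, np.array(priors), np.array(means), Sigma

def predict_lda(x, classes, priors, means, Sigma):
    """Pick the class with the largest linear discriminant delta_k(x)."""
    Sinv = np.linalg.inv(Sigma)
    deltas = [x @ Sinv @ m - 0.5 * m @ Sinv @ m + np.log(pi)
              for m, pi in zip(means, priors)]
    return classes[int(np.argmax(deltas))]
```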
12
[Figure: decision boundaries estimated by LDA (left) and QDA (right)]
14
LDA (cont.)
Fisher’s rule is optimal if the classes are MVN with a common covariance matrix
Computational complexity: O(m p² n)
15
Logistic Regression
Note that LDA is linear in x:
log [ p(c_k | x) / p(c_K | x) ] = α_{k0} + α_kᵀ x
Linear logistic regression looks the same:
log [ p(c_k | x) / p(c_K | x) ] = β_{k0} + β_kᵀ x
But the estimation procedure for the coefficients is different: LDA maximizes the joint likelihood [y, X]; logistic regression maximizes the conditional likelihood [y | X]. The two usually give similar predictions.
16
Logistic Regression MLE
For the two-class case, the log-likelihood is:
ℓ(β) = Σ_{i=1}^{n} [ y_i βᵀ x_i - log(1 + exp(βᵀ x_i)) ]
To maximize it, we need to solve the (non-linear) score equations:
∂ℓ/∂β = Σ_{i=1}^{n} x_i (y_i - p(x_i; β)) = 0
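The score equations are typically solved by Newton-Raphson (equivalently, iteratively reweighted least squares). A minimal sketch, assuming X already includes an intercept column; the function name and iteration cap are my own assumptions.

```python
import numpy as np

def logistic_mle(X, y, n_iter=25):
    """X: n x p design matrix (include a column of ones for the intercept);
    y: 0/1 responses. Returns the MLE of beta via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        score = X.T @ (y - p)                    # gradient of log-likelihood
        W = p * (1.0 - p)                        # IRLS weights
        hessian = X.T @ (X * W[:, None])         # negative Hessian
        beta += np.linalg.solve(hessian, score)  # Newton step
    return beta
```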
17
Logistic Regression Modeling
South African Heart Disease Example (y = MI); the Z score is the Wald statistic.

            Coef.    S.E.    Z score
Intercept   -4.130   0.964   -4.285
sbp          0.006   0.006    1.023
tobacco      0.080   0.026    3.034
ldl          0.185   0.057    3.219
famhist      0.939   0.225    4.178
obesity     -0.035   0.029   -1.187
alcohol      0.001   0.004    0.136
age          0.043   0.010    4.184
18
Regularized Logistic Regression
Ridge/LASSO logistic regression: maximize ℓ(β) minus a penalty, λ‖β‖₂² (ridge) or λ‖β‖₁ (LASSO)
Successful implementations exist with over 100,000 predictor variables
Can also regularize discriminant analysis
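A hedged sketch of L1-regularized (LASSO) logistic regression using scikit-learn; the synthetic dataset, penalty strength C, and solver are illustrative assumptions, not choices from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic problem with many predictors, most of them uninformative.
X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# L1 penalty drives many coefficients exactly to zero (variable selection).
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
lasso_lr.fit(X, y)
print((lasso_lr.coef_ != 0).sum(), "of 50 coefficients survive the L1 penalty")
```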
19
Simple Two-Class Perceptron
Define h(x) = wᵀ x (a linear score with weight vector w)
Classify as class 1 if h(x) > 0, class 2 otherwise
Score function: # misclassification errors on the training data
For training, replace each class-2 x_j by -x_j; now we need h(x_j) > 0 for every training point
Initialize the weight vector w
Repeat one or more times:
  For each training data point x_i:
    If the point is correctly classified, do nothing
    Else set w ← w + x_i
Guaranteed to converge to a separating hyperplane (if one exists)
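A minimal sketch of this training loop; instead of flipping the sign of class-2 points, it uses labels y_i ∈ {+1, -1} and the equivalent update w ← w + y_i x_i. The function name and epoch cap are my own assumptions.

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """X: n x p (append a column of ones for a bias term); y: labels in {+1, -1}."""
    w = np.zeros(X.shape[1])                  # initialize weight vector
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:            # misclassified (or on boundary)
                w += yi * xi                  # the perceptron update
                errors += 1
        if errors == 0:                       # all points correctly classified
            break
    return w
```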
21
“Optimal” Hyperplane
The “optimal” hyperplane separates the two classes and maximizes the distance to the closest point from either class. Finding this hyperplane is a convex optimization problem. This notion plays an important role in support vector machines.
23
[Figure: maximal-margin separating hyperplane wᵀx + b = 0]
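One way to compute this hyperplane in practice is a hard-margin linear SVM, approximated here with scikit-learn's SVC and a very large C; the synthetic data and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian clouds in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

svm = SVC(kernel="linear", C=1e6)   # very large C approximates a hard margin
svm.fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]
margin = 2.0 / np.linalg.norm(w)    # width of the maximal margin
print(w, b, margin)                 # hyperplane w'x + b = 0 and its margin
```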