CS489/698: Intro to ML
Lecture 04: Logistic Regression
9/21/17, Yao-Liang Yu (with guest slides by Francois Chaubard and Agastya Kalra)
Outline
- Announcements
- Baseline
- Learning: the "Machine Learning" Pyramid
- Regression or Classification (that's it!)
- History of Classification
- History of Solvers (analytical, to convex, to "non-convex but smooth")
- Convexity
- SGD
- Perceptron Review
- Bernoulli model / Logistic Regression
- Tensorflow Playground / Demo code
- Multiclass
Announcements
Assignment 1 is due next Tuesday.
Baseline
Assuming linear algebra basics.
ML Pyramid
From the top of the pyramid down:
- Deep Learning
- Machine Learning (anything fancier than simple sklearn fit/predict calls)
- Software Engineering
- Convex Optimization
- Information Theory
- Linear Algebra
- Probability and Statistics
Regression or Classification
Classification
Higher-"complexity" datasets need higher-"capacity" models (formally, higher VC dimension): data a Perceptron cannot separate calls for Logistic Regression; data Logistic Regression cannot handle calls for an MLP, feature engineering, and so on.
History of Solvers
- Closed form (<1950s): least squares, etc.
- Iterative methods + convex: interior point methods, log barrier, etc.
- Iterative methods + smooth non-convex (2012+): deep learning, etc.
Convexity
Jensen's inequality.
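For reference, a statement of Jensen's inequality (standard material; the slide's own formula is not preserved in this transcript): a function f is convex if and only if

\[
f(\lambda x + (1-\lambda) y) \le \lambda f(x) + (1-\lambda) f(y)
\qquad \text{for all } x, y \text{ and } \lambda \in [0,1],
\]

or, in probabilistic form, f(E[X]) ≤ E[f(X)] for any random variable X.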
SGD
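The slide's body is not preserved; as a minimal reminder of the method, stochastic gradient descent replaces the full gradient with the gradient on a single example (or mini-batch):

\[
w_{t+1} = w_t - \eta_t \nabla \ell_{i_t}(w_t),
\]

where i_t is sampled from the training set and η_t is the step size.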
Perceptron
Issues: the perceptron converges only when the data are linearly separable, and it outputs a bare label with no confidence estimate.
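As a reminder of the algorithm itself (standard form, not text from the slide): with labels y_i in {−1, +1}, the perceptron visits examples and updates only on mistakes:

\[
\text{if } y_i \, w^\top x_i \le 0: \quad w \leftarrow w + y_i x_i .
\]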
Bernoulli model
Let P(Y=1 | X=x) = p(x; w), parameterized by w. The conditional likelihood on {(x1, y1), ..., (xn, yn)} simplifies if independence holds; assume throughout that yi is {0,1}-valued.
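Written out (a standard expansion of the slide's statement), under independence the conditional likelihood factors as

\[
L(w) = \prod_{i=1}^{n} P(Y = y_i \mid X = x_i)
     = \prod_{i=1}^{n} p(x_i; w)^{y_i} \bigl(1 - p(x_i; w)\bigr)^{1 - y_i},
\]

where the second equality uses y_i ∈ {0, 1}.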
Naïve solution
Find w to maximize the conditional likelihood. What is the solution if p(x; w) does not depend on x? What if p(x; w) does not depend on w?
Generalized linear models (GLM)
- y ~ Bernoulli(p), p = p(x; w): logistic regression
- y ~ Normal(μ, σ²), μ = μ(x; w): (weighted) least-squares regression
- In general, GLM: y ~ exp( θᵀφ(y) − A(θ) ), with natural parameter θ, sufficient statistics φ(y), and log-partition function A(θ).
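As a worked instance (a standard derivation, not spelled out on the slide), the Bernoulli distribution in exponential-family form:

\[
P(Y = y) = p^{y} (1-p)^{1-y}
         = \exp\Bigl( y \log\tfrac{p}{1-p} + \log(1-p) \Bigr),
\]

so φ(y) = y, the natural parameter is θ = log(p/(1−p)) (exactly the logit of the next slide), and A(θ) = −log(1−p) = log(1 + e^θ).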
Logit transform
Try p(x; w) = wᵀx? But p ≥ 0 is not guaranteed. Try log p(x; w) = wᵀx? Better, but still a mismatch: the LHS is negative while the RHS is real-valued. The logit transform matches both sides:

log[ p / (1 − p) ] = wᵀx,

or equivalently the odds ratio p / (1 − p) equals exp(wᵀx).
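Solving the logit equation for p gives the familiar sigmoid form (standard algebra, implicit on the slide):

\[
p(x; w) = \frac{1}{1 + \exp(-w^\top x)} = \sigma(w^\top x),
\]

which lies in (0, 1) for every real value of wᵀx.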
Prediction with confidence
Predict ŷ = 1 if p = P(Y=1 | X=x) > ½, which holds iff wᵀx > 0. The decision boundary is wᵀx = 0, so ŷ = sign(wᵀx) as before, but now with confidence p(x; w).
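The "iff" is immediate from monotonicity (one line, added for completeness): σ is strictly increasing with σ(0) = ½, hence σ(wᵀx) > ½ exactly when wᵀx > 0.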
Not just a classification algorithm
Logistic regression does more than classification: it estimates conditional probabilities, under the logit-transform assumption. Having confidence in a prediction is nice; the price is an assumption that may or may not hold. If classification is the sole goal, logistic regression is doing extra work; as we shall see, SVM estimates only the decision boundary.
More than logistic regression
In general, pick a transform F that maps p from [0,1] to R, then equate F(p) to a linear function wᵀx. There are many choices of F besides the logit: precisely, the inverse of any distribution function works.
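For instance (a standard example, not on the slide), taking F to be the inverse of the standard normal CDF, F(p) = Φ⁻¹(p), yields probit regression: P(Y=1 | X=x) = Φ(wᵀx).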
Logistic distribution
Cumulative distribution function: F(x; μ, s) = 1 / (1 + exp(−(x − μ)/s)), with mean μ and variance s²π²/3. (The sigmoid is exactly the standard logistic CDF, with μ = 0 and s = 1.)
Playground
Tensorflow coding example
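The demo code itself is not preserved in this transcript. As a stand-in, here is a minimal NumPy sketch of logistic regression trained with SGD; all names and hyperparameters are ours, not from the original demo, which used TensorFlow:

    import numpy as np

    def sigmoid(z):
        # Plain sigmoid; adequate for this toy demo. See the implementation
        # slide in the backup section for the general overflow/underflow trick.
        return 1.0 / (1.0 + np.exp(-z))

    def train_logreg_sgd(X, y, lr=0.1, epochs=100, seed=0):
        """Logistic regression via SGD. X: (n, d) features, y: (n,) in {0, 1}."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            for i in rng.permutation(n):
                p = sigmoid(X[i] @ w)          # predicted P(Y=1 | x_i)
                w -= lr * (p - y[i]) * X[i]    # per-example gradient of the NLL
        return w

    # Toy usage: two Gaussian blobs, labels 0/1, constant feature for the bias.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
    X = np.hstack([X, np.ones((100, 1))])
    y = np.array([0] * 50 + [1] * 50)
    w = train_logreg_sgd(X, y)
    acc = np.mean((sigmoid(X @ w) > 0.5) == y)
    print(f"train accuracy: {acc:.2f}")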
More than 2 classes
Softmax: again the outputs are nonnegative and sum to 1. Train by minimizing the negative log-likelihood, where y is one-hot.
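Written out (standard formulas matching the slide's description), with one weight vector w_k per class:

\[
P(Y = k \mid X = x) = \frac{\exp(w_k^\top x)}{\sum_{j=1}^{K} \exp(w_j^\top x)},
\]

and for a one-hot label y the negative log-likelihood of one example is the cross-entropy

\[
-\sum_{k=1}^{K} y_k \log P(Y = k \mid X = x).
\]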
Questions?
Backup
Classification revisited
ŷ = sign(xᵀw + b). How confident are we about ŷ? |xᵀw + b| seems a good indicator, but it is real-valued and hard to interpret; there are ways to transform it into [0,1]. A better(?) idea: learn the confidence directly.
Conditional probability
P(Y=1 | X=x): conditional on seeing x, what is the chance of this instance being positive, i.e., Y=1? Obviously the value lies in [0,1]. With two classes, P(Y=0 | X=x) = 1 − P(Y=1 | X=x); more generally, the class probabilities sum to 1.
Notation (simplex): Δ_{k−1} := { p in R^k : p ≥ 0, Σ_i p_i = 1 }.
Reduction to a harder problem
P(Y=1 | X=x) = E(1_{Y=1} | X=x), so with Z = 1_{Y=1} the conditional probability is the regression function for (X, Z). Use linear regression for binary Z? Exploit structure instead: conditional probabilities live in a simplex. Never reduce to an unnecessarily harder problem.
Maximum likelihood
Maximize the conditional likelihood, i.e., minimize the negative log-likelihood.
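Concretely (a standard expansion of the slide's statement), with p_i = σ(wᵀx_i) and y_i ∈ {0, 1} the negative log-likelihood is

\[
\ell(w) = -\sum_{i=1}^{n} \Bigl[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \Bigr]
        = \sum_{i=1}^{n} \log\bigl(1 + \exp(-\tilde y_i \, w^\top x_i)\bigr),
\]

where the second form uses signed labels ỹ_i = 2y_i − 1 ∈ {−1, +1}. This objective is convex in w.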
Newton's algorithm
The Hessian of the negative log-likelihood is PSD, so Newton's method applies; uncertain predictions get bigger weight, and with step size η = 1 the iteration is iterative weighted least-squares.
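In formulas (standard for logistic regression, matching the slide's description): with p_i = σ(wᵀx_i),

\[
\nabla \ell(w) = X^\top (p - y), \qquad
\nabla^2 \ell(w) = X^\top S X, \quad S = \mathrm{diag}\bigl(p_i (1 - p_i)\bigr) \succeq 0,
\]

and the Newton step is

\[
w \leftarrow w - \eta \,(X^\top S X)^{-1} X^\top (p - y).
\]

Note the weights p_i(1 − p_i) peak at p_i = ½: the most uncertain predictions get the biggest weight.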
A word about implementation
Numerically computing the exponential can be tricky: it easily underflows or overflows. The usual trick: estimate the range of the exponents, then shift their mean to 0.
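A minimal sketch of the shifting trick (our own illustration of the slide's advice; we shift by the max rather than the mean, a common variant, since subtracting any constant leaves the result unchanged):

    import numpy as np

    def softmax_stable(z):
        """Softmax with shifted exponents.

        exp(z - c) / sum(exp(z - c)) == exp(z) / sum(exp(z)) for any constant c.
        Using c = max(z) makes every exponent <= 0, so exp never overflows.
        """
        z = np.asarray(z, dtype=float)
        e = np.exp(z - z.max())
        return e / e.sum()

    print(softmax_stable([1000.0, 1001.0, 1002.0]))  # naive exp(1000) overflows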
Robustness
The logistic loss has a bounded derivative, so no single example exerts unbounded influence; via a variational representation in terms of the exponential, examples with larger exponential loss receive smaller weights.
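For the first point (standard calculus, not shown on the slide): with ℓ(t) = log(1 + e^{−t}),

\[
\ell'(t) = \frac{-1}{1 + e^{t}} \in (-1, 0),
\]

so the gradient contribution of any single example is bounded, in contrast to the exponential loss e^{−t}, whose derivative is unbounded.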