CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University
Course website:
Lecture 6: Logistic Regression
The slides in this presentation are adapted from Andrew Ng's ML course.
Machine learning problems
Supervised learning – Classification – Regression
Unsupervised learning
Others: Reinforcement learning, recommender systems
We will also discuss practical advice for applying learning algorithms.
Classification
Spam / Not Spam?
Online transactions: Fraudulent (Yes / No)?
Tumor: Malignant / Benign?
y ∈ {0, 1}
0: "Negative Class" (e.g., benign tumor)
1: "Positive Class" (e.g., malignant tumor)
Threshold classifier output h_θ(x) at 0.5:
If h_θ(x) ≥ 0.5, predict "y = 1"
If h_θ(x) < 0.5, predict "y = 0"
[Figure: tumor size (x-axis) vs. Malignant? (y-axis), 0 (No) / 1 (Yes)]
Classification: y = 0 or 1
Linear regression: h_θ(x) can be > 1 or < 0
Logistic regression: 0 ≤ h_θ(x) ≤ 1
Logistic Regression Model
Old regression model: h_θ(x) = θ^T x
Want 0 ≤ h_θ(x) ≤ 1
New model: use the Sigmoid function (or Logistic function):
h_θ(x) = g(θ^T x), where g(z) = 1 / (1 + e^(−z))
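The sigmoid hypothesis can be sketched in a few lines of NumPy; the function names `sigmoid` and `hypothesis` are illustrative, not from the slides:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """Logistic regression hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(theta @ x)
```

Note that g(0) = 0.5, g(z) → 1 as z → ∞, and g(z) → 0 as z → −∞, so the output always lies in (0, 1).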
Interpretation of Hypothesis Output
h_θ(x) = estimated probability that y = 1 on input x
Example: if h_θ(x) = 0.7, tell the patient that there is a 70% chance of the tumor being malignant.
h_θ(x) = P(y = 1 | x; θ), "probability that y = 1, given x, parameterized by θ"
Logistic regression
Suppose we predict "y = 1" if h_θ(x) ≥ 0.5, and predict "y = 0" if h_θ(x) < 0.5.
Since g(z) ≥ 0.5 exactly when z ≥ 0, this means: predict "y = 1" whenever θ^T x ≥ 0, and "y = 0" whenever θ^T x < 0.
Decision Boundary
With h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2), predict "y = 1" if θ_0 + θ_1 x_1 + θ_2 x_2 ≥ 0.
The line θ^T x = 0 separates the region where the hypothesis predicts y = 1 from the region where it predicts y = 0: this is the decision boundary.
[Figure: training data in the (x_1, x_2) plane with a linear decision boundary]
Non-linear decision boundaries
With higher-order features, e.g. h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1^2 + θ_4 x_2^2), the boundary need not be a line. For example, with θ = [−1; 0; 0; 1; 1], predict "y = 1" if x_1^2 + x_2^2 ≥ 1 (the unit circle).
[Figure: data in the (x_1, x_2) plane with a circular decision boundary of radius 1]
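A minimal sketch of such a non-linear boundary, assuming the hypothetical parameter vector θ = [−1, 0, 0, 1, 1] over the features [1, x_1, x_2, x_1^2, x_2^2]:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for features [1, x1, x2, x1^2, x2^2];
# with this theta, the decision boundary is the unit circle x1^2 + x2^2 = 1.
theta = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])

def predict(x1, x2):
    """Predict 1 if h_theta(x) >= 0.5, i.e. if theta^T x >= 0."""
    features = np.array([1.0, x1, x2, x1**2, x2**2])
    return 1 if sigmoid(theta @ features) >= 0.5 else 0
```

Points outside the circle (e.g. (2, 0)) are classified 1, points inside (e.g. (0.5, 0)) are classified 0.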
Training set: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))} — m examples
x_0 = 1, y ∈ {0, 1}
h_θ(x) = 1 / (1 + e^(−θ^T x))
How to choose parameters θ?
Cost function
Linear regression: J(θ) = (1/m) Σ_i (1/2) (h_θ(x^(i)) − y^(i))^2
With the sigmoid hypothesis, this squared-error cost is non-convex in θ, so logistic regression uses a different cost.
Logistic regression cost function
If y = 1: Cost(h_θ(x), y) = −log(h_θ(x))
Cost = 0 if h_θ(x) = 1, but Cost → ∞ as h_θ(x) → 0: a confident wrong prediction is penalized heavily.
[Plot: −log(h) for h ∈ (0, 1]]
Logistic regression cost function
If y = 0: Cost(h_θ(x), y) = −log(1 − h_θ(x))
Cost = 0 if h_θ(x) = 0, but Cost → ∞ as h_θ(x) → 1.
[Plot: −log(1 − h) for h ∈ [0, 1)]
Logistic regression cost function
J(θ) = (1/m) Σ_i Cost(h_θ(x^(i)), y^(i))
Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1; −log(1 − h_θ(x)) if y = 0
Logistic regression cost function – more compact expression
Since y is always 0 or 1, the two cases combine into one formula:
Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x))
Logistic regression cost function – more compact expression
J(θ) = −(1/m) Σ_i [ y^(i) log h_θ(x^(i)) + (1 − y^(i)) log(1 − h_θ(x^(i))) ]
To fit parameters θ: min_θ J(θ)
To make a prediction given new x: output h_θ(x) = 1 / (1 + e^(−θ^T x))
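The compact cost can be sketched directly in NumPy; the function name `cost` and the matrix/vector shapes (X holds one example per row, with x_0 = 1) are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ],
    where h = g(X @ theta) and X has one example per row (x_0 = 1)."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))
```

Sanity check: with θ = 0, h_θ(x) = 0.5 for every example, so J(θ) = −log(0.5) = log 2 ≈ 0.693 regardless of the labels.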
Gradient Descent
Want min_θ J(θ):
Repeat { θ_j := θ_j − α ∂J(θ)/∂θ_j } (simultaneously update all θ_j)
Gradient Descent
Want min_θ J(θ):
Repeat { θ_j := θ_j − α (1/m) Σ_i (h_θ(x^(i)) − y^(i)) x_j^(i) } (simultaneously update all θ_j)
Algorithm looks identical to linear regression! The difference is the definition of the hypothesis h_θ(x).
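A minimal NumPy sketch of this update rule; the function name, learning rate, iteration count, and toy dataset below are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.5, iters=5000):
    """Batch gradient descent for logistic regression.
    X: (m, n) matrix, one example per row with x_0 = 1; y: (m,) labels in {0,1}."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        grad = (1.0 / m) * (X.T @ (h - y))  # same form as linear regression
        theta -= alpha * grad               # simultaneous update of all theta_j
    return theta
```

On a tiny 1-D example (labels 0 for small x, 1 for large x), the fitted θ places the decision boundary between the two groups.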
Optimization algorithms
Given θ, we have code that can compute J(θ) and ∂J(θ)/∂θ_j (for j = 0, 1, …, n).
Optimization algorithms:
– Gradient descent
– Conjugate gradient
– BFGS (Broyden–Fletcher–Goldfarb–Shanno)
– L-BFGS
Advantages of the last three:
– No need to manually pick α
– Often faster than gradient descent
Disadvantage:
– More complex
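In practice one hands the cost and gradient to an off-the-shelf optimizer rather than implementing BFGS by hand. A sketch using SciPy's `scipy.optimize.minimize` with `method="BFGS"`; the toy dataset (deliberately not perfectly separable, so the optimum is finite) is an illustrative assumption:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    """Return J(theta) and its gradient; jac=True tells SciPy both come back together."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))
    grad = (1.0 / m) * (X.T @ (h - y))
    return J, grad

# Toy data: mostly 0-labels for small x, 1-labels for large x, with some overlap.
X = np.array([[1.0, 0.0], [1.0, 0.5], [1.0, 1.0],
              [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

res = minimize(cost_and_grad, np.zeros(2), args=(X, y),
               jac=True, method="BFGS")
theta = res.x  # fitted parameters
```

No learning rate α has to be chosen: BFGS picks its own step sizes via line search.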
Multiclass classification
Email foldering/tagging: Work, Friends, Family, Hobby
Medical diagnosis: Not ill, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow
Binary classification vs. multi-class classification
[Figures: two-class data in the (x_1, x_2) plane vs. three-class data in the (x_1, x_2) plane]
One-vs-all (one-vs-rest)
Turn the multi-class problem into separate binary problems:
Class 1: h_θ^(1)(x)
Class 2: h_θ^(2)(x)
Class 3: h_θ^(3)(x)
[Figures: for each class, that class's examples vs. all other examples, each with its own decision boundary]
One-vs-all
Train a logistic regression classifier h_θ^(i)(x) for each class i to predict the probability that y = i.
On a new input x, to make a prediction, pick the class i that maximizes h_θ^(i)(x).
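The one-vs-all scheme can be sketched by reusing the binary gradient-descent fit from earlier slides; the function names, hyperparameters, and three-class toy data are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, alpha=0.5, iters=3000):
    """Fit one logistic regression classifier by gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= alpha * (X.T @ (h - y)) / len(y)
    return theta

def one_vs_all(X, y, num_classes):
    """Train one classifier per class i, relabeling as (y == i) vs. the rest."""
    return [train_binary(X, (y == i).astype(float)) for i in range(num_classes)]

def predict(thetas, x):
    """Pick the class whose classifier reports the highest probability."""
    return int(np.argmax([sigmoid(t @ x) for t in thetas]))
```

With three classes, this trains three binary classifiers; at prediction time each one scores the input and the argmax wins.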