Logistic Regression: Classification
Classification: $y \in \{0, 1\}$. If we apply linear regression to such a problem, $h_\theta(x)$ can be $> 1$ or $< 0$, which makes little sense for a 0/1 label. Logistic regression instead guarantees $0 \le h_\theta(x) \le 1$.
Logistic Regression Model. Want $0 \le h_\theta(x) \le 1$, so we set $h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid (logistic) function. [Figure: sigmoid curve rising from 0 to 1, crossing 0.5 at $z = 0$.]
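As a concrete illustration (not code from the original slides), a minimal Octave sketch of the sigmoid and the resulting hypothesis; the names sigmoid and hypothesis are just illustrative:

    function g = sigmoid(z)
      % Sigmoid (logistic) function, applied element-wise.
      g = 1 ./ (1 + exp(-z));
    end

    function h = hypothesis(theta, X)
      % h_theta(x) = g(theta' * x) for every row of X
      % (X is m x (n+1) and includes the x0 = 1 column).
      h = sigmoid(X * theta);
    end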
Logistic Regression: Cost function
Training set: $m$ examples $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$, with $x \in \mathbb{R}^{n+1}$, $x_0 = 1$, $y \in \{0, 1\}$, and $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$. How do we choose the parameters $\theta$?
Logistic regression cost function:
$\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$ if $y = 1$, and $-\log(1 - h_\theta(x))$ if $y = 0$.
If $y = 1$: the cost is 0 when $h_\theta(x) = 1$ and grows without bound as $h_\theta(x) \to 0$, heavily penalizing a confident wrong prediction.
If $y = 0$: symmetrically, the cost is 0 when $h_\theta(x) = 0$ and grows without bound as $h_\theta(x) \to 1$.
Since $y$ is always 0 or 1, both cases can be written in one line, $\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$, giving
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$.
To fit the parameters $\theta$: minimize $J(\theta)$. To make a prediction given a new $x$: output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, interpreted as the estimated probability that $y = 1$.
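A hedged sketch of computing $J(\theta)$ in vectorized Octave, assuming X is the $m \times (n+1)$ design matrix (first column all ones), y is the $m \times 1$ label vector, and sigmoid is the helper sketched earlier:

    function J = logisticCost(theta, X, y)
      % Unregularized logistic regression cost:
      % J = -(1/m) * sum( y .* log(h) + (1 - y) .* log(1 - h) )
      m = length(y);
      h = sigmoid(X * theta);
      J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
    end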
Gradient Descent. Want $\min_\theta J(\theta)$. Repeat {
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}$
} (simultaneously update all $\theta_j$).
The update rule looks identical to the one for linear regression, but it is not the same algorithm: here $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ rather than $\theta^T x$.
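A sketch of that loop in Octave (not code from the slides), reusing the sigmoid helper from the earlier sketch; alpha and num_iters are illustrative hyperparameters:

    function theta = gradientDescentLogistic(X, y, theta, alpha, num_iters)
      % Batch gradient descent for logistic regression.
      m = length(y);
      for iter = 1:num_iters
        h = sigmoid(X * theta);              % m x 1 vector of predictions
        grad = (1 / m) * (X' * (h - y));     % gradient of J(theta)
        theta = theta - alpha * grad;        % simultaneous update of all theta_j
      end
    end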
Example: minimize $J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$ with Octave's fminunc:

    function [jVal, gradient] = costFunction(theta)
      jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
      gradient = zeros(2, 1);
      gradient(1) = 2 * (theta(1) - 5);
      gradient(2) = 2 * (theta(2) - 5);
    end

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(2, 1);
    [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
General template, for $\theta = [\theta_0; \theta_1; \dots; \theta_n]$ (an $(n+1)$-dimensional vector):

    function [jVal, gradient] = costFunction(theta)
      jVal = [ code to compute J(theta) ];
      gradient(1)   = [ code to compute dJ/dtheta_0 ];
      gradient(2)   = [ code to compute dJ/dtheta_1 ];
      ...
      gradient(n+1) = [ code to compute dJ/dtheta_n ];
    end
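One way the template might be filled in for (unregularized) logistic regression and passed to fminunc through an anonymous function; this is a sketch assuming X and y are already in the workspace:

    function [jVal, gradient] = logisticCostFunction(theta, X, y)
      % jVal     = J(theta)
      % gradient = vector of partial derivatives dJ/dtheta_j
      m = length(y);
      h = 1 ./ (1 + exp(-(X * theta)));
      jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));
      gradient = (1 / m) * (X' * (h - y));
    end

    options = optimset('GradObj', 'on', 'MaxIter', 400);
    initialTheta = zeros(size(X, 2), 1);
    [optTheta, functionVal] = fminunc(@(t) logisticCostFunction(t, X, y), initialTheta, options);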
Regularization: The problem of overfitting
Example: linear regression (housing prices). [Figure: three fits of price vs. size — a straight line that underfits, a quadratic that fits well, and a high-order polynomial that passes through every training point.] Overfitting: if we have too many features, the learned hypothesis may fit the training set very well ($J(\theta) \approx 0$) but fail to generalize to new examples (e.g., fail to predict prices for houses it has not seen).
Example: logistic regression ($g$ = sigmoid function). [Figure: the same progression for a two-feature classification problem — a linear decision boundary that underfits, a moderately nonlinear boundary that fits well, and a highly contorted boundary that separates every training point but overfits.]
Addressing overfitting. With one or two features we can plot the hypothesis (e.g., price vs. size) and spot overfitting directly; with many features (size of house, no. of bedrooms, no. of floors, age of house, average income in neighborhood, kitchen size, ...) and relatively little data, plotting is no longer a practical way to detect it.
Addressing overfitting, options:
1. Reduce the number of features.
   - Manually select which features to keep.
   - Model selection algorithm (later in course).
2. Regularization.
   - Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
   - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.
Regularization: Cost function
Intuition. [Figure: two fits of price vs. size of house — a quadratic $\theta_0 + \theta_1 x + \theta_2 x^2$, and a quartic $\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$ that overfits.] Suppose we penalize $\theta_3$ and $\theta_4$ and make them really small, e.g. minimize $\frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$. The only way to keep this small is to make $\theta_3 \approx 0$ and $\theta_4 \approx 0$, so the quartic ends up behaving essentially like the quadratic.
Regularization. Small values for the parameters $\theta_0, \theta_1, \dots, \theta_n$ give a "simpler" hypothesis that is less prone to overfitting. Housing example: features $x_1, \dots, x_{100}$ and parameters $\theta_0, \theta_1, \dots, \theta_{100}$; since we don't know in advance which parameters to shrink, we penalize all of them by minimizing $J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n}\theta_j^2\right]$, where $\lambda$ is the regularization parameter.
Regularization. [Figure: price vs. size of house — the regularized fit is a smoother curve than the unregularized, overfit one.]
In regularized linear regression, we choose $\theta$ to minimize $J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n}\theta_j^2\right]$. What if $\lambda$ is set to an extremely large value (perhaps far too large for our problem, say $\lambda = 10^{10}$)? Then $\theta_1, \dots, \theta_n$ are all driven toward 0, $h_\theta(x) \approx \theta_0$, and the hypothesis underfits. [Figure: price vs. size of house with a flat, horizontal-line fit.]
Regularization: Regularized linear regression
Regularized linear regression: fit $\theta$ by minimizing $J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n}\theta_j^2\right]$.
Gradient descent. Repeat {
$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x_0^{(i)}$
$\theta_j := \theta_j - \alpha \left[\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$  for $j = 1, \dots, n$
} (simultaneously update all $\theta_j$; by convention $\theta_0$ is not regularized).
The $\theta_j$ update can also be written $\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$: each step first shrinks $\theta_j$ slightly toward zero, then takes the usual gradient step.
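A sketch of that update in Octave, leaving $\theta_0$ unpenalized; alpha, lambda, and num_iters are illustrative:

    function theta = gradientDescentRegLinear(X, y, theta, alpha, lambda, num_iters)
      % Gradient descent for regularized linear regression.
      m = length(y);
      for iter = 1:num_iters
        h = X * theta;                                              % linear hypothesis
        grad = (1 / m) * (X' * (h - y));                            % unregularized gradient
        grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);    % do not penalize theta_0
        theta = theta - alpha * grad;                               % simultaneous update
      end
    end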
Regularization: Regularized logistic regression
Regularized logistic regression. [Figure: an overfit, highly contorted decision boundary in the $x_1$–$x_2$ plane produced by a high-order polynomial hypothesis.] Cost function: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$.
Gradient descent. Repeat {
$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x_0^{(i)}$
$\theta_j := \theta_j - \alpha \left[\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$  for $j = 1, \dots, n$
} (simultaneously update all $\theta_j$). As before, the update looks identical to regularized linear regression, but here $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.
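A regularized counterpart of the earlier cost-function sketch, again leaving $\theta_0$ unpenalized and suitable for fminunc via an anonymous function:

    function [jVal, gradient] = regLogisticCostFunction(theta, X, y, lambda)
      m = length(y);
      h = 1 ./ (1 + exp(-(X * theta)));
      reg = (lambda / (2 * m)) * sum(theta(2:end) .^ 2);             % skip theta_0
      jVal = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) + reg;
      gradient = (1 / m) * (X' * (h - y));
      gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
    end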
Advice for applying machine learning: Evaluating a hypothesis
Evaluating your hypothesis. A hypothesis can have very low training error and still fail to generalize to new examples not in the training set (overfitting). [Figure: overfit curve of price vs. size.] With many features (size of house, no. of bedrooms, no. of floors, age of house, average income in neighborhood, kitchen size, ...), we cannot simply plot $h_\theta(x)$ to check it, so we need another way to evaluate the hypothesis.
Evaluating your hypothesis. Dataset:

    Size   Price
    2104   400
    1600   330
    2400   369
    1416   232
    3000   540
    1985   300
    1534   315
    1427   199
    1380   212
    1494   243

Split the data into a training set (typically about 70% of the examples, here the first 7 rows) and a test set (the remaining 30%, here the last 3 rows); it is best to shuffle the rows first if they are not already in random order. Learn $\theta$ from the training set, then measure the error on the test set.
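A sketch of the split and of the linear-regression test error $J_{test}(\theta)$ in Octave, assuming X and y hold the (already shuffled) dataset:

    m = size(X, 1);
    mTrain = round(0.7 * m);
    Xtrain = X(1:mTrain, :);       ytrain = y(1:mTrain);
    Xtest  = X(mTrain+1:end, :);   ytest  = y(mTrain+1:end);

    % ... fit theta on (Xtrain, ytrain), then:
    mTest = length(ytest);
    Jtest = (1 / (2 * mTest)) * sum((Xtest * theta - ytest) .^ 2);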
Advice for applying machine learning: Model selection and training/validation/test sets
Overfitting example. [Figure: an overfit, high-order polynomial curve of price vs. size.] Once the parameters have been fit to some set of data (the training set), the error of those parameters as measured on that same data (the training error $J_{train}(\theta)$) is likely to be lower than the actual generalization error.
Model selection. Candidate models of increasing polynomial degree $d$:
1. $h_\theta(x) = \theta_0 + \theta_1 x$
2. $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
3. $h_\theta(x) = \theta_0 + \theta_1 x + \dots + \theta_3 x^3$
...
10. $h_\theta(x) = \theta_0 + \theta_1 x + \dots + \theta_{10} x^{10}$
Fit each model to obtain parameters $\theta^{(1)}, \dots, \theta^{(10)}$, choose the degree whose parameters give the lowest test set error, and report that test set error as the answer to "how well does the model generalize?". Problem: that error is likely to be an optimistic estimate of the generalization error, i.e. our extra parameter $d$ (the degree of the polynomial) has itself been fit to the test set.
Evaluating your hypothesis. Dataset: the same ten (size, price) examples as before, now split three ways: a training set (about 60% of the examples), a cross validation (CV) set (about 20%), and a test set (about 20%).
Train/validation/test error.
Training error: $J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$
Cross validation error: $J_{cv}(\theta) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}(h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)})^2$
Test error: $J_{test}(\theta) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}(h_\theta(x_{test}^{(i)}) - y_{test}^{(i)})^2$
Model selection, revisited: fit $\theta^{(d)}$ on the training set for each degree $d$, pick the degree with the lowest $J_{cv}(\theta^{(d)})$, and only then report $J_{test}$ for that model as the estimate of generalization error, so the test set is never used to choose $d$.
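A sketch of that selection loop in Octave, assuming xTrain/yTrain and xCv/yCv hold the training and CV examples; polyFeatures (mapping a column x to [x, x.^2, ..., x.^d]) and trainLinearReg (minimizing the training cost) are hypothetical helpers, not functions from the slides:

    bestD = 1;  bestJcv = Inf;
    for d = 1:10
      Xp   = [ones(length(xTrain), 1), polyFeatures(xTrain, d)];
      Xcvp = [ones(length(xCv), 1),    polyFeatures(xCv, d)];
      theta = trainLinearReg(Xp, yTrain);                               % fit on the training set
      Jcv = (1 / (2 * length(yCv))) * sum((Xcvp * theta - yCv) .^ 2);   % CV error
      if Jcv < bestJcv
        bestJcv = Jcv;
        bestD = d;
      end
    end
    % Report J_test for the degree-bestD model as the generalization estimate.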
Advice for applying machine learning: Diagnosing bias vs. variance
Bias/variance. [Figure: three fits of price vs. size — a straight line (high bias, underfit), a quadratic ("just right"), and a high-order polynomial (high variance, overfit).]
Bias/variance as a function of the degree of polynomial $d$. The training error $J_{train}(\theta)$ keeps decreasing as $d$ grows, while the cross validation error $J_{cv}(\theta)$ is high for small $d$ (underfitting), reaches a minimum at some intermediate $d$, and rises again for large $d$ (overfitting). [Figure: error vs. degree of polynomial $d$, with a monotonically decreasing $J_{train}$ curve and a U-shaped $J_{cv}$ curve.]
Diagnosing bias vs. variance. Suppose your learning algorithm is performing less well than you were hoping ($J_{cv}(\theta)$ or $J_{test}(\theta)$ is high). Is it a bias problem or a variance problem?
Bias (underfit): $J_{train}(\theta)$ is high, and $J_{cv}(\theta) \approx J_{train}(\theta)$.
Variance (overfit): $J_{train}(\theta)$ is low, but $J_{cv}(\theta) \gg J_{train}(\theta)$.
[Figure: error vs. degree of polynomial $d$, with the high-bias regime on the left (small $d$) and the high-variance regime on the right (large $d$).]
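A small sketch of that rule of thumb in Octave, assuming Jtrain and Jcv were computed as above; targetError and smallGap are illustrative thresholds, not values from the course:

    if Jtrain > targetError && (Jcv - Jtrain) < smallGap
      printf('Likely high bias (underfit): both errors are high and close together.\n');
    elseif Jtrain <= targetError && (Jcv - Jtrain) >= smallGap
      printf('Likely high variance (overfit): low training error, much higher CV error.\n');
    end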
Advice for applying machine learning: Regularization and bias/variance
Linear regression with regularization. Model: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$, with $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$. [Figure: three fits of price vs. size.] Large $\lambda$: high bias (underfit, $h_\theta(x) \approx \theta_0$). Intermediate $\lambda$: "just right". Small $\lambda$ (approaching 0): high variance (overfit).
Choosing the regularization parameter $\lambda$. Model: $h_\theta(x) = \theta_0 + \theta_1 x + \dots + \theta_4 x^4$. We fit $\theta$ by minimizing the regularized $J(\theta)$, but $J_{train}(\theta)$, $J_{cv}(\theta)$, and $J_{test}(\theta)$ are defined as plain average squared errors, without the regularization term.
Try a range of values such as $\lambda = 0, 0.01, 0.02, 0.04, 0.08, \dots, 10$ (roughly doubling each time, about 12 candidates). For each candidate, minimize $J(\theta)$ to obtain $\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(12)}$; evaluate each on the cross validation set and pick the $\lambda$ whose parameters give the lowest $J_{cv}$ (say $\theta^{(5)}$). Finally report the test error $J_{test}(\theta^{(5)})$ as the estimate of generalization error.
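A sketch of the $\lambda$-selection loop in Octave, assuming Xtrain/ytrain and Xcv/ycv are the (already feature-mapped) training and CV sets; trainLinearReg(X, y, lambda), which minimizes the regularized cost, is again a hypothetical helper:

    lambdas = [0 0.01 0.02 0.04 0.08 0.16 0.32 0.64 1.28 2.56 5.12 10.24];
    Jcv = zeros(size(lambdas));
    for k = 1:length(lambdas)
      theta = trainLinearReg(Xtrain, ytrain, lambdas(k));    % fit with this lambda
      % The CV error is computed WITHOUT the regularization term.
      Jcv(k) = (1 / (2 * length(ycv))) * sum((Xcv * theta - ycv) .^ 2);
    end
    [~, k] = min(Jcv);
    bestLambda = lambdas(k);
    % Report J_test for theta trained with bestLambda.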
Bias/variance as a function of the regularization parameter $\lambda$. [Figure: error vs. $\lambda$ — $J_{train}(\theta)$ increases with $\lambda$ (a large $\lambda$ underfits even the training set), while $J_{cv}(\theta)$ is high for very small $\lambda$ (high variance) and for very large $\lambda$ (high bias), with its minimum at an intermediate $\lambda$.]
Advice for applying machine learning: Learning curves
Learning curves plot $J_{train}(\theta)$ and $J_{cv}(\theta)$ against the training set size $m$, fitting $\theta$ on only the first $m$ examples. As $m$ grows, $J_{train}(\theta)$ increases (it gets harder to fit every example exactly) while $J_{cv}(\theta)$ decreases (more data generalizes better). [Figure: error vs. training set size $m$, with the $J_{train}$ curve rising and the $J_{cv}$ curve falling toward each other.]
High bias. [Figures: a straight-line fit of price vs. size that remains a poor fit no matter how many examples are added; error vs. training set size, with $J_{train}$ and $J_{cv}$ quickly converging to a similar, high plateau.] If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.
High variance (and small $\lambda$). [Figures: a high-order polynomial fit of price vs. size that overfits; error vs. training set size, with $J_{train}$ low and $J_{cv}$ much higher, leaving a large gap that slowly narrows as more data is added.] If a learning algorithm is suffering from high variance, getting more training data is likely to help.
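A sketch of computing learning curves in Octave: train on the first i examples, measure training error on those i examples and CV error on the full CV set; trainLinearReg is the same hypothetical helper as above, and Xtrain/ytrain, Xcv/ycv, and lambda are assumed to be defined:

    m = size(Xtrain, 1);
    Jtrain = zeros(m, 1);
    Jcv    = zeros(m, 1);
    for i = 1:m
      theta = trainLinearReg(Xtrain(1:i, :), ytrain(1:i), lambda);
      Jtrain(i) = (1 / (2 * i)) * sum((Xtrain(1:i, :) * theta - ytrain(1:i)) .^ 2);
      Jcv(i)    = (1 / (2 * length(ycv))) * sum((Xcv * theta - ycv) .^ 2);
    end
    plot(1:m, Jtrain, 1:m, Jcv);
    xlabel('training set size'); ylabel('error');
    legend('J_{train}', 'J_{cv}');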