Logistic Regression: Classification (Machine Learning)

Classification: $y \in \{0, 1\}$. A linear-regression hypothesis $h_\theta(x)$ can output values $> 1$ or $< 0$, which makes no sense for a binary label. Logistic regression constrains the output: $0 \le h_\theta(x) \le 1$.

Logistic Regression Model. Want $0 \le h_\theta(x) \le 1$. Set $h_\theta(x) = g(\theta^T x)$, where $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function (also called the logistic function). [Plot: the sigmoid curve rises from 0 to 1, crossing 0.5 at $z = 0$.]
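
A minimal sketch of these two formulas in Octave (the language of the fminunc example later in this deck); the names g, h, X, and theta are illustrative assumptions:

% Sigmoid (logistic) function, applied element-wise.
g = @(z) 1 ./ (1 + exp(-z));

% Hypothesis h_theta(x) = g(theta' * x) for a whole design matrix X
% (m-by-(n+1), first column all ones) and parameter vector theta.
h = @(theta, X) g(X * theta);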

Logistic Regression: Cost Function (Machine Learning)

Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$ of $m$ examples, with $x \in \mathbb{R}^{n+1}$, $x_0 = 1$, and $y \in \{0, 1\}$. Hypothesis: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$. How do we choose the parameters $\theta$?

Logistic regression cost function, case $y = 1$: $\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$. [Plot: cost vs. $h_\theta(x)$ on $(0, 1]$.] The cost is 0 when $h_\theta(x) = 1$ and grows without bound as $h_\theta(x) \to 0$: the model is penalized heavily for confidently predicting $y = 0$ when in fact $y = 1$.

Logistic regression cost function, case $y = 0$: $\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$. [Plot: cost vs. $h_\theta(x)$ on $[0, 1)$.] Symmetrically, the cost is 0 when $h_\theta(x) = 0$ and grows without bound as $h_\theta(x) \to 1$.

Logistic regression cost function (combined form): $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)})$, where $\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$. Because $y \in \{0, 1\}$, exactly one of the two terms is active for each example, so this single expression reproduces both cases above.
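
A vectorized Octave sketch of this cost (X, y, and the function name logisticCost are assumptions, not from the slides):

% J(theta) for logistic regression; X is m-by-(n+1), y is m-by-1 in {0,1}.
function jVal = logisticCost(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                           % predictions h_theta(x)
  jVal = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));
end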

Logistic regression cost function. To fit the parameters $\theta$: solve $\min_\theta J(\theta)$. To make a prediction given a new $x$: output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$, interpreted as the estimated probability that $y = 1$.

Gradient Descent. Want $\min_\theta J(\theta)$. Repeat { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ } (simultaneously update all $\theta_j$).

Working out the derivative gives: Repeat { $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}$ } (simultaneously update all $\theta_j$). The algorithm looks identical to linear regression; the difference is hidden in the hypothesis, since here $h_\theta(x) = 1/(1 + e^{-\theta^T x})$ rather than $\theta^T x$.
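
A minimal sketch of this loop in Octave; alpha (learning rate) and numIters are assumed hyperparameters, and the vectorized update changes all theta_j simultaneously:

% Batch gradient descent for logistic regression.
function theta = logisticGradientDescent(X, y, theta, alpha, numIters)
  m = length(y);
  for iter = 1:numIters
    h = 1 ./ (1 + exp(-X * theta));   % current predictions
    grad = (1/m) * (X' * (h - y));    % partial derivatives for all j
    theta = theta - alpha * grad;     % simultaneous update
  end
end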

Example: minimize $J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$ with fminunc.

function [jVal, gradient] = costFunction(theta)
  jVal = (theta(1)-5)^2 + (theta(2)-5)^2;   % cost J(theta)
  gradient = zeros(2,1);
  gradient(1) = 2*(theta(1)-5);             % dJ/dtheta_1
  gradient(2) = 2*(theta(2)-5);             % dJ/dtheta_2
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2,1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);

General pattern: with theta = [theta_0; theta_1; ...; theta_n],

function [jVal, gradient] = costFunction(theta)
  jVal = [code to compute J(theta)];
  gradient(1) = [code to compute dJ(theta)/dtheta_0];
  gradient(2) = [code to compute dJ(theta)/dtheta_1];
  ...
  gradient(n+1) = [code to compute dJ(theta)/dtheta_n];
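
One way to fill in this template for logistic regression (a sketch; the helper name and the anonymous-function wrapper binding X and y are assumptions):

% Cost and gradient for logistic regression, in the form fminunc expects.
function [jVal, gradient] = logisticCostFunction(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));
  jVal = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));
  gradient = (1/m) * (X' * (h - y));       % (n+1)-by-1, matches theta
end

% Usage: bind the data with an anonymous function, then optimize.
% options = optimset('GradObj', 'on', 'MaxIter', 100);
% optTheta = fminunc(@(t) logisticCostFunction(t, X, y), initialTheta, options);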

Regularization: The Problem of Overfitting (Machine Learning)

Example: linear regression (housing prices). [Three plots of Price vs. Size: an underfit straight line, a well-fitting quadratic, and an overfit high-order polynomial.] Overfitting: if we have too many features, the learned hypothesis may fit the training set very well ($J(\theta) \approx 0$) but fail to generalize to new examples (e.g., fail to predict prices for new houses).

Example: logistic regression ($g$ = sigmoid function). [Three plots of decision boundaries in the $(x_1, x_2)$ plane: an underfit linear boundary, a "just right" boundary, and an overfit, highly contorted boundary.]

Addressing overfitting. With many features and few examples, plotting the hypothesis (as with Price vs. Size above) is no longer possible. Housing example features: $x_1$ = size of house, $x_2$ = no. of bedrooms, $x_3$ = no. of floors, $x_4$ = age of house, $x_5$ = average income in neighborhood, $x_6$ = kitchen size, ..., $x_{100}$.

Addressing overfitting. Options:
1. Reduce the number of features.
   - Manually select which features to keep.
   - Model selection algorithm (later in the course).
2. Regularization.
   - Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
   - Works well when we have a lot of features, each of which contributes a bit to predicting $y$.

Regularization: Cost Function (Machine Learning)

Intuition. [Two plots of Price vs. Size: a quadratic fit, and an overfit quartic $\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$.] Suppose we penalize $\theta_3$ and $\theta_4$ and make them really small, e.g. minimize $\frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$. The penalty forces $\theta_3 \approx 0$ and $\theta_4 \approx 0$, so the quartic behaves essentially like the quadratic.

Regularization. Small values for the parameters $\theta_0, \theta_1, \dots, \theta_n$ give a "simpler" hypothesis that is less prone to overfitting. Housing example: features $x_1, \dots, x_{100}$; parameters $\theta_0, \theta_1, \dots, \theta_{100}$. Since we do not know in advance which parameters to shrink, we penalize all of them: $J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$.

Regularization. [Plot of Price vs. Size: the regularized fit is smoother than the unregularized, overfit curve.] The regularization parameter $\lambda$ controls the trade-off between fitting the training data well and keeping the parameters small.

In regularized linear regression, we choose $\theta$ to minimize $J(\theta)$ above. What if $\lambda$ is set to an extremely large value (perhaps far too large for our problem, say $\lambda = 10^{10}$)? All of $\theta_1, \dots, \theta_n$ are driven to approximately 0, leaving $h_\theta(x) \approx \theta_0$: a flat line that underfits. [Plot of Price vs. Size: a horizontal line.]

Regularization: Regularized Linear Regression (Machine Learning)

Regularized linear regression: minimize $J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} \theta_j^2\right]$ over $\theta$. Note that by convention $\theta_0$ is not regularized.

Gradient descent. Repeat {
$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\, x_0^{(i)}$
$\theta_j := \theta_j - \alpha \left[\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$ for $j = 1, \dots, n$
}. The $\theta_j$ update can be rewritten as $\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}$: each step first shrinks $\theta_j$ slightly (since $1 - \alpha\lambda/m < 1$), then applies the usual unregularized update.
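
A sketch of one such update in Octave (function and variable names are assumptions); note theta(1) holds theta_0 and is left out of the penalty:

% One regularized gradient-descent step for linear regression.
function theta = regularizedLinearStep(X, y, theta, alpha, lambda)
  m = length(y);
  grad = (1/m) * (X' * (X * theta - y));   % unregularized gradient
  reg = (lambda/m) * theta;
  reg(1) = 0;                              % do not shrink theta_0
  theta = theta - alpha * (grad + reg);
end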

Regularization: Regularized Logistic Regression (Machine Learning)

Regularized logistic regression. [Plot in the $(x_1, x_2)$ plane: an overfit decision boundary from a high-order polynomial hypothesis.] Cost function: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$.

Gradient descent. Repeat {
$\theta_0 := \theta_0 - \alpha \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\, x_0^{(i)}$
$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$ for $j = 1, \dots, n$
}. Superficially the same as regularized linear regression, but again $h_\theta(x) = 1/(1 + e^{-\theta^T x})$.
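
As with the earlier template, the regularized cost and gradient can be packaged for fminunc; a sketch under the same naming assumptions:

% Regularized logistic regression cost and gradient.
function [jVal, gradient] = regLogisticCost(theta, X, y, lambda)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));
  penalty = (lambda/(2*m)) * sum(theta(2:end).^2);      % theta_0 excluded
  jVal = (1/m) * sum(-y .* log(h) - (1 - y) .* log(1 - h)) + penalty;
  gradient = (1/m) * (X' * (h - y));
  gradient(2:end) = gradient(2:end) + (lambda/m) * theta(2:end);
end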

Advice for Applying Machine Learning: Evaluating a Hypothesis

Evaluating your hypothesis. An overfit hypothesis fails to generalize to new examples not in the training set. [Plot: price vs. size of house with an overfit curve.] With many features ($x_1$ = size of house, $x_2$ = no. of bedrooms, $x_3$ = no. of floors, $x_4$ = age of house, $x_5$ = average income in neighborhood, $x_6$ = kitchen size, ...), we cannot simply plot the hypothesis to spot overfitting, so we need a numerical evaluation procedure.

Evaluating your hypothesis. Split the dataset into a training set (e.g., 70%) and a test set (e.g., 30%):

Size   Price
2104   400     (training)
1600   330     (training)
2400   369     (training)
1416   232     (training)
3000   540     (training)
1985   300     (training)
1534   315     (training)
1427   199     (test)
1380   212     (test)
1494   243     (test)

Advice for Applying Machine Learning: Model Selection and Training/Validation/Test Sets

Overfitting example. Once parameters were fit to some set of data (the training set), the error of the parameters as measured on that same data (the training error $J_{train}(\theta)$) is likely to be lower than the actual generalization error. [Plot: price vs. size with an overfit curve.]

Model selection. Consider hypotheses of increasing polynomial degree $d$:
1. $h_\theta(x) = \theta_0 + \theta_1 x$
2. $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
3. $h_\theta(x) = \theta_0 + \theta_1 x + \dots + \theta_3 x^3$
...
10. $h_\theta(x) = \theta_0 + \theta_1 x + \dots + \theta_{10} x^{10}$
Fit each model to get parameters $\theta^{(d)}$, choose the degree whose fitted model has the lowest test set error, and report that test set error $J_{test}(\theta^{(d)})$ as the measure of how well the model generalizes. Problem: $J_{test}(\theta^{(d)})$ is likely to be an optimistic estimate of generalization error, i.e. our extra parameter ($d$ = degree of polynomial) is fit to the test set.

Evaluating your hypothesis. For model selection, split three ways: a training set (e.g., 60%), a cross validation (CV) set (e.g., 20%), and a test set (e.g., 20%):

Size   Price
2104   400     (training)
1600   330     (training)
2400   369     (training)
1416   232     (training)
3000   540     (training)
1985   300     (training)
1534   315     (cross validation)
1427   199     (cross validation)
1380   212     (test)
1494   243     (test)
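
A sketch of a random 60/20/20 split in Octave (X, y and the shuffled-index approach are assumptions; shuffling matters if the rows are sorted):

% Shuffle, then split into training / cross validation / test sets.
m = size(X, 1);
idx = randperm(m);                  % random permutation of row indices
nTrain = round(0.6 * m);
nCv = round(0.2 * m);
Xtrain = X(idx(1:nTrain), :);              ytrain = y(idx(1:nTrain));
Xcv = X(idx(nTrain+1:nTrain+nCv), :);      ycv = y(idx(nTrain+1:nTrain+nCv));
Xtest = X(idx(nTrain+nCv+1:end), :);       ytest = y(idx(nTrain+nCv+1:end));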

Train/validation/test error.
Training error: $J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$
Cross validation error: $J_{cv}(\theta) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}(h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)})^2$
Test error: $J_{test}(\theta) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}(h_\theta(x_{test}^{(i)}) - y_{test}^{(i)})^2$
With three sets, the degree $d$ is chosen to minimize $J_{cv}$, and $J_{test}$ remains an unbiased estimate of generalization error.
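
A sketch of model selection done this way (trainPoly and polyFeatures are hypothetical helpers that fit and build a degree-d polynomial design matrix; they are not from the slides):

% Average squared error, used for all three sets (no regularization term).
err = @(theta, X, y) (1/(2*length(y))) * sum((X * theta - y).^2);

Jcv = zeros(10, 1);
for d = 1:10
  theta{d} = trainPoly(Xtrain, ytrain, d);            % hypothetical: fit degree d
  Jcv(d) = err(theta{d}, polyFeatures(Xcv, d), ycv);  % hypothetical feature map
end
[~, bestD] = min(Jcv);                                % degree with lowest CV error
Jtest = err(theta{bestD}, polyFeatures(Xtest, bestD), ytest);  % report this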

Advice for Applying Machine Learning: Diagnosing Bias vs. Variance

Bias/variance. [Three plots of Price vs. Size:] high bias (underfit, e.g. $d = 1$), "just right" (e.g. $d = 2$), high variance (overfit, e.g. $d = 4$).

Bias/variance as a function of the polynomial degree.
Training error: $J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$
Cross validation error: $J_{cv}(\theta) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}(h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)})^2$
[Plot: error vs. degree of polynomial $d$. $J_{train}$ decreases monotonically as $d$ grows; $J_{cv}$ is U-shaped, high for small $d$ (underfit) and for large $d$ (overfit).]

Diagnosing bias vs. variance. Suppose your learning algorithm is performing less well than you were hoping ($J_{cv}(\theta)$ or $J_{test}(\theta)$ is high). Is it a bias problem or a variance problem?
Bias (underfit): $J_{train}(\theta)$ is high, and $J_{cv}(\theta) \approx J_{train}(\theta)$.
Variance (overfit): $J_{train}(\theta)$ is low, and $J_{cv}(\theta) \gg J_{train}(\theta)$.
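
The rule of thumb above can be written down directly; a sketch that assumes performance is already known to be poor (the gap factor is an illustrative assumption, not a standard value):

% Crude bias/variance diagnostic from the two errors.
function label = diagnoseBiasVariance(Jtrain, Jcv, gapFactor)
  % gapFactor: how much larger Jcv must be than Jtrain to call it variance.
  if Jcv > gapFactor * Jtrain
    label = 'high variance (overfit): Jcv >> Jtrain';
  else
    label = 'high bias (underfit): Jtrain is also high, Jcv ~ Jtrain';
  end
end

% e.g. diagnoseBiasVariance(0.50, 0.52, 2) reports bias;
%      diagnoseBiasVariance(0.05, 0.60, 2) reports variance.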

Advice for Applying Machine Learning: Regularization and Bias/Variance

Linear regression with regularization. Model: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$, with cost $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$. [Three plots of Price vs. Size:] large $\lambda$: high bias (underfit); intermediate $\lambda$: "just right"; small $\lambda$: high variance (overfit).

Choosing the regularization parameter $\lambda$. For evaluation we use the errors without the regularization term: $J_{train}(\theta)$, $J_{cv}(\theta)$, and $J_{test}(\theta)$ are the plain average squared errors on the training, cross validation, and test sets, as defined earlier.

Choosing the regularization parameter $\lambda$. Model: $h_\theta(x) = \theta_0 + \theta_1 x + \dots + \theta_4 x^4$, with the regularized cost $J(\theta)$ above. Try a range of values, e.g. $\lambda = 0, 0.01, 0.02, 0.04, 0.08, \dots, 10.24$ (roughly doubling each time). For each candidate, minimize $J(\theta)$ to obtain $\theta^{(\lambda)}$ and evaluate $J_{cv}(\theta^{(\lambda)})$. Pick the value with the lowest cross validation error, say the fifth, $\theta^{(5)}$. Report the test error: $J_{test}(\theta^{(5)})$.
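
A sketch of this sweep (trainRegularized is a hypothetical helper that minimizes the regularized J(theta) on the training set; the CV and test errors deliberately omit the penalty term):

lambdas = [0 0.01 0.02 0.04 0.08 0.16 0.32 0.64 1.28 2.56 5.12 10.24];
err = @(theta, X, y) (1/(2*length(y))) * sum((X * theta - y).^2);

bestJcv = Inf;
for k = 1:length(lambdas)
  theta = trainRegularized(Xtrain, ytrain, lambdas(k));  % hypothetical trainer
  Jcv = err(theta, Xcv, ycv);                            % unregularized CV error
  if Jcv < bestJcv
    bestJcv = Jcv;  bestTheta = theta;  bestLambda = lambdas(k);
  end
end
Jtest = err(bestTheta, Xtest, ytest);   % reported generalization estimate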

Bias/variance as a function of the regularization parameter $\lambda$. [Plot: error vs. $\lambda$. $J_{train}$ increases with $\lambda$, since a larger penalty fits the training data less tightly; $J_{cv}$ is U-shaped: high variance for small $\lambda$, high bias for large $\lambda$, with a sweet spot in between.]

Advice for Applying Machine Learning: Learning Curves

Learning curves. Plot $J_{train}(\theta)$ and $J_{cv}(\theta)$ as a function of $m$ (training set size). With very few examples the model fits them almost perfectly, so $J_{train}$ starts near 0 and grows with $m$, while $J_{cv}$ starts high and decreases as more data becomes available.
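
A sketch of computing a learning curve (trainLinear is a hypothetical fitting helper; each model is trained on only the first i examples but always evaluated on the full CV set):

mTrain = size(Xtrain, 1);
err = @(theta, X, y) (1/(2*length(y))) * sum((X * theta - y).^2);

Jtrain = zeros(mTrain, 1);
Jcv = zeros(mTrain, 1);
for i = 1:mTrain
  theta = trainLinear(Xtrain(1:i, :), ytrain(1:i));  % hypothetical trainer
  Jtrain(i) = err(theta, Xtrain(1:i, :), ytrain(1:i));
  Jcv(i) = err(theta, Xcv, ycv);
end
plot(1:mTrain, Jtrain, 1:mTrain, Jcv);
legend('J_{train}', 'J_{cv}');  xlabel('training set size');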

High bias. [Plots: an underfit straight-line fit of price vs. size; error vs. $m$ (training set size): $J_{train}$ and $J_{cv}$ quickly converge to a similar, high plateau.] If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much.

High variance (and small $\lambda$). [Plots: an overfit high-order fit of price vs. size; error vs. $m$: $J_{train}$ stays low and rises slowly, $J_{cv}$ stays high, with a large gap between them that narrows as $m$ grows.] If a learning algorithm is suffering from high variance, getting more training data is likely to help.