Logistic Regression: Classification (Machine Learning)

Classification
Email: spam / not spam?
Online transactions: fraudulent (yes / no)?
Tumor: malignant / benign?
0: "Negative Class" (e.g., benign tumor)
1: "Positive Class" (e.g., malignant tumor)

Threshold classifier output h_θ(x) at 0.5:
If h_θ(x) ≥ 0.5, predict "y = 1"
If h_θ(x) < 0.5, predict "y = 0"
(Plot: malignant? (1 = yes, 0 = no) against tumor size, with a fitted line and the 0.5 threshold marked.)

Classification: y = 0 or 1.
Linear regression: h_θ(x) can be > 1 or < 0.
Logistic regression: 0 ≤ h_θ(x) ≤ 1.

Logistic Regression: Hypothesis Representation

Logistic Regression Model
Want 0 ≤ h_θ(x) ≤ 1.
h_θ(x) = g(θᵀx), where g(z) = 1 / (1 + e^(−z)) is the sigmoid function (also called the logistic function).
(Plot: g(z) rises smoothly from 0 to 1, crossing 0.5 at z = 0.)
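
As a quick illustration (not part of the original slides), a minimal Octave/MATLAB sketch of this hypothesis; the helper name sigmoid is my own:

function g = sigmoid(z)
  % Logistic (sigmoid) function; works elementwise on scalars, vectors, and matrices
  g = 1 ./ (1 + exp(-z));
end

For example, sigmoid(0) returns 0.5, matching the plot above, and h_θ(x) is then sigmoid(theta' * x).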

Interpretation of Hypothesis Output
h_θ(x) = estimated probability that y = 1 on input x.
Example: if x = [x₀; x₁] = [1; tumorSize] and h_θ(x) = 0.7, tell the patient there is a 70% chance of the tumor being malignant.
h_θ(x) = P(y = 1 | x; θ), the "probability that y = 1, given x, parameterized by θ".
Since y must be 0 or 1: P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ).

Logistic Regression: Decision Boundary

Logistic regression: h_θ(x) = g(θᵀx), with g(z) = 1 / (1 + e^(−z)).
Suppose we predict "y = 1" if h_θ(x) ≥ 0.5; since g(z) ≥ 0.5 exactly when z ≥ 0, this is the same as predicting "y = 1" whenever θᵀx ≥ 0.
Likewise, predict "y = 0" if h_θ(x) < 0.5, i.e. whenever θᵀx < 0.

Decision Boundary
Example: h_θ(x) = g(θ₀ + θ₁x₁ + θ₂x₂) with parameters θ = [−3; 1; 1].
Predict "y = 1" if −3 + x₁ + x₂ ≥ 0, i.e. x₁ + x₂ ≥ 3.
The line x₁ + x₂ = 3 is the decision boundary: it separates the region where the hypothesis predicts y = 1 from the region where it predicts y = 0.
(Plot: training points in the x₁-x₂ plane with the boundary line drawn through them.)

Non-linear decision boundaries
Example: h_θ(x) = g(θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₁² + θ₄x₂²) with θ = [−1; 0; 0; 1; 1].
Predict "y = 1" if −1 + x₁² + x₂² ≥ 0, i.e. x₁² + x₂² ≥ 1.
The decision boundary is the unit circle x₁² + x₂² = 1; higher-order polynomial features give even more complex boundaries.
(Plot: points inside and outside the unit circle in the x₁-x₂ plane.)
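
As an illustrative check (not from the original slides), assuming the example parameters above, a small Octave/MATLAB snippet evaluating this non-linear boundary at one point:

theta = [-1; 0; 0; 1; 1];                   % [theta0; theta1; theta2; theta3; theta4]
x = [0.9; 0.8];                             % a point with x1^2 + x2^2 = 1.45, outside the unit circle
features = [1; x(1); x(2); x(1)^2; x(2)^2]; % feature vector [1, x1, x2, x1^2, x2^2]
h = 1 / (1 + exp(-theta' * features));      % h_theta(x), about 0.61 here
prediction = (h >= 0.5);                    % predicts y = 1, consistent with x1^2 + x2^2 >= 1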

Logistic Regression: Cost Function

Training set: {(x⁽¹⁾, y⁽¹⁾), (x⁽²⁾, y⁽²⁾), …, (x⁽ᵐ⁾, y⁽ᵐ⁾)}, m examples, with x₀ = 1 and y ∈ {0, 1}.
Hypothesis: h_θ(x) = 1 / (1 + e^(−θᵀx)).
How do we choose the parameters θ?

Cost function
Linear regression used J(θ) = (1/m) Σᵢ ½ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)².
With the sigmoid hypothesis h_θ(x) = 1 / (1 + e^(−θᵀx)), this squared-error cost is "non-convex" (many local optima), so we want a different, "convex" cost that gradient descent can reliably minimize.

Logistic regression cost function
Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1, and −log(1 − h_θ(x)) if y = 0.
If y = 1: the cost is 0 when h_θ(x) = 1, but as h_θ(x) → 0 the cost grows to ∞, heavily penalizing a confident wrong prediction.

Logistic regression cost function
If y = 0: the cost −log(1 − h_θ(x)) is 0 when h_θ(x) = 0, but grows to ∞ as h_θ(x) → 1.
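
A minimal Octave/MATLAB sketch of this per-example cost (the helper name exampleCost is my own, not from the slides); h is the predicted probability h_θ(x):

function c = exampleCost(h, y)
  % Cost(h_theta(x), y) for a single training example with label y in {0, 1}
  if y == 1
    c = -log(h);
  else
    c = -log(1 - h);
  end
end

For instance, exampleCost(0.99, 1) is close to 0, while exampleCost(0.01, 1) is very large.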

Logistic Regression: Simplified Cost Function and Gradient Descent

Logistic regression cost function
J(θ) = (1/m) Σᵢ Cost(h_θ(x⁽ⁱ⁾), y⁽ⁱ⁾)
Because y is always either 0 or 1, the two cases can be written in a single expression:
Cost(h_θ(x), y) = −y log(h_θ(x)) − (1 − y) log(1 − h_θ(x))

Logistic regression cost function
J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(h_θ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − h_θ(x⁽ⁱ⁾)) ]
To fit parameters θ: solve min_θ J(θ).
To make a prediction on a new x: output h_θ(x) = 1 / (1 + e^(−θᵀx)), the estimated probability that y = 1.
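
A vectorized Octave/MATLAB sketch of this cost (not from the original slides; assumes X is the m x (n+1) design matrix with a leading column of ones and y is the m x 1 label vector):

function J = logisticCost(theta, X, y)
  % J(theta) = -(1/m) * sum( y.*log(h) + (1-y).*log(1-h) )
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));   % m x 1 vector of predictions h_theta(x^(i))
  J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));
end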

Gradient Descent
Want min_θ J(θ). Repeat {
  θⱼ := θⱼ − α ∂J(θ)/∂θⱼ
} (simultaneously update all θⱼ).

Gradient Descent
Plugging in the derivative of J(θ), the update becomes: repeat {
  θⱼ := θⱼ − α (1/m) Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾
} (simultaneously update all θⱼ).
The algorithm looks identical to linear regression; the difference is that h_θ(x) is now the sigmoid of θᵀx rather than θᵀx itself.
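
A minimal Octave/MATLAB sketch of this loop (the wrapper name, alpha, and num_iters are my own; X and y are as in the cost sketch above):

function theta = gradientDescentLogistic(theta, X, y, alpha, num_iters)
  m = length(y);
  for iter = 1:num_iters
    h = 1 ./ (1 + exp(-X * theta));   % current predictions, m x 1
    grad = (1/m) * (X' * (h - y));    % (n+1) x 1 gradient of J(theta)
    theta = theta - alpha * grad;     % simultaneous update of all theta_j
  end
end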

Logistic Regression: Advanced Optimization

Optimization algorithm
Cost function J(θ). Want min_θ J(θ).
Given θ, we have code that can compute J(θ) and ∂J(θ)/∂θⱼ (for j = 0, 1, …, n).
Gradient descent: repeat { θⱼ := θⱼ − α ∂J(θ)/∂θⱼ }.

Optimization algorithm
Given θ, we have code that can compute J(θ) and ∂J(θ)/∂θⱼ (for j = 0, 1, …, n).
Optimization algorithms: gradient descent, conjugate gradient, BFGS, L-BFGS.
Advantages of the last three: no need to manually pick the learning rate α, and they are often faster than gradient descent. Disadvantage: they are more complex.

Example:
function [jVal, gradient] = costFunction(theta)
  % J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2, minimized at theta = [5; 5]
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;
  gradient = zeros(2, 1);
  gradient(1) = 2 * (theta(1) - 5);   % dJ/dtheta1
  gradient(2) = 2 * (theta(2) - 5);   % dJ/dtheta2
end

options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);

In general, store θ as theta = [theta(1); theta(2); …; theta(n+1)] (θ₀ through θₙ; Octave indexing starts at 1) and supply a cost function of the form:

function [jVal, gradient] = costFunction(theta)
  jVal = [code to compute J(θ)];
  gradient(1) = [code to compute ∂J(θ)/∂θ₀];
  gradient(2) = [code to compute ∂J(θ)/∂θ₁];
  ...
  gradient(n+1) = [code to compute ∂J(θ)/∂θₙ];
end
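
A sketch of how that template could be filled in for logistic regression and handed to fminunc (not from the original slides; assumes X, y, and initialTheta already exist in the workspace):

function [jVal, gradient] = logisticCostFunction(theta, X, y)
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                          % predictions h_theta(x^(i))
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));   % J(theta)
  gradient = (1/m) * (X' * (h - y));                       % all partial derivatives at once
end

% Fix X and y with an anonymous function so fminunc only sees theta:
options = optimset('GradObj', 'on', 'MaxIter', 400);
[optTheta, cost] = fminunc(@(t) logisticCostFunction(t, X, y), initialTheta, options);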

Logistic Regression: Multi-class Classification (One-vs-all)

Multiclass classification
Email foldering/tagging: Work, Friends, Family, Hobby
Medical diagnosis: Not ill, Cold, Flu
Weather: Sunny, Cloudy, Rain, Snow

Binary classification vs. multi-class classification
(Plots: in the binary case the x₁-x₂ plane contains two classes of points; in the multi-class case it contains three or more.)

One-vs-all (one-vs-rest)
Turn the multi-class problem into several separate binary problems. For each class i, fit a classifier h_θ⁽ⁱ⁾(x) that treats class i as the positive class and all other classes as the negative class:
Class 1: h_θ⁽¹⁾(x); Class 2: h_θ⁽²⁾(x); Class 3: h_θ⁽³⁾(x).
(Plots: the same x₁-x₂ training set redrawn once per class, each time with a different class singled out.)

One-vs-all
Train a logistic regression classifier h_θ⁽ⁱ⁾(x) for each class i to predict the probability that y = i.
On a new input x, to make a prediction, pick the class i that maximizes h_θ⁽ⁱ⁾(x).
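
A minimal Octave/MATLAB sketch of that prediction step (the layout of Theta, one column of parameters per class, is my own convention, not from the slides):

% Theta: (n+1) x K matrix, column i holds the parameters of classifier h_theta^(i)
% X:     m x (n+1) design matrix with a leading column of ones
probs = 1 ./ (1 + exp(-X * Theta));    % m x K matrix, probs(:, i) = h_theta^(i)(x)
[~, predictions] = max(probs, [], 2);  % for each example, pick the class with the highest probability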