Logistic Regression Chapter 7

Reference
http://www.cnblogs.com/BYRans/category/719322.html
http://www.cs.toronto.edu/~hinton/
https://en.wikipedia.org/wiki/Multinomial_logistic_regression
https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
http://www.tuicool.com/articles/faMVfq

Outline: Logistic regression, Softmax regression

Logistic Regression. The name is somewhat misleading: logistic regression is really a technique for classification, not regression. The 'regression' comes from the fact that we fit a linear model to the feature space. It is equivalent to a single-layer perceptron, or single-layer artificial neural network (we will see this in the ANN chapter).

Different ways of expressing probability. Consider a two-outcome probability space, where p(O1) = p and p(O2) = 1 - p = q. The probability of O1 can then be expressed in several equivalent ways:
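The table on the original slide is not reproduced in the transcript; the three standard equivalent forms are:

probability: p, in the range 0 to 1
odds: p/q = p/(1 - p), in the range 0 to +∞
log odds: ln(p/(1 - p)), in the range -∞ to +∞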

Log odds. The numeric treatment of outcomes O1 and O2 is symmetric: if neither outcome is favored over the other, then log odds = 0; if one outcome is favored with log odds = x, then the other outcome is disfavored with log odds = -x.

Probability to log odds (and back again)
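The conversion formulas, presumably shown on the original slide, are:

log odds from probability: z = ln(p/(1 - p))
probability from log odds: p = e^z / (1 + e^z) = 1/(1 + e^(-z))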

Logistic function
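For reference, the logistic function (the inverse of the log-odds transform) is

σ(z) = 1/(1 + e^(-z))

It maps any real z to the interval (0, 1), with σ(0) = 0.5.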

Usage scenario. A multidimensional feature space (features can be categorical or continuous). The outcome is discrete, not continuous; we focus on the case of two classes here. It seems plausible that a linear decision boundary (hyperplane) will give good predictive accuracy.

Using a logistic regression model. The model consists of a vector β in d-dimensional feature space. For a point x in feature space, project it onto β to convert it into a real number z in the range -∞ to +∞. Then map z to the range 0 to 1 using the logistic function. Overall, logistic regression maps a point x in d-dimensional feature space to a value in the range 0 to 1.

Using a logistic regression model. The prediction from a logistic regression model can be interpreted as: a probability of class membership, or a class assignment, obtained by applying a threshold to the probability. The threshold represents the decision boundary in feature space.
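A minimal NumPy sketch of prediction with a trained model (the names beta, X, predict_proba, and the 0.5 default threshold are illustrative, not from the original slides):

import numpy as np

def predict_proba(beta, X):
    # Project each point x onto beta: a real number z in (-inf, +inf).
    z = X @ beta
    # Map z to (0, 1) with the logistic function.
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(beta, X, threshold=0.5):
    # Class assignment: threshold the probability; the threshold
    # corresponds to the decision boundary in feature space.
    return (predict_proba(beta, X) >= threshold).astype(int)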

Training a logistic regression model. We need to optimize β so that the model gives the best possible reproduction of the training-set labels. This is usually done by numerical approximation of the maximum likelihood. On very large datasets, stochastic gradient descent may be used instead.
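A batch gradient-descent sketch of the maximum-likelihood fit (the learning rate and iteration count are illustrative; a stochastic version would update on one example or a small batch at a time):

import numpy as np

def fit(X, y, lr=0.1, n_iters=1000):
    # Minimize the negative log-likelihood (cross-entropy) by gradient descent.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # predicted probabilities
        grad = X.T @ (p - y) / len(y)          # average gradient of the cost
        beta -= lr * grad
    return beta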

Examples (several example slides follow; the figures are not reproduced in the transcript)

Heart disease dataset used for testing (figure not reproduced in the transcript)

Advantages and disadvantages
Advantages:
- Makes no assumptions about the distributions of the classes in feature space
- Easily extended to multiple classes (see softmax regression, later)
- Natural probabilistic view of class predictions
- Quick to train
- Fast at classifying unknown records
- Good accuracy for many simple data sets
- Resistant to overfitting
- Model coefficients can be interpreted as indicators of feature importance
Disadvantages:
- Linear decision boundary

Softmax Regression. Main reference: http://deeplearning.stanford.edu/wiki/index.php/Softmax_Regression

Logistic Regression recap. We have a training set of N labeled examples {(x(1), y(1)), (x(2), y(2)), …, (x(N), y(N))}, where the feature vectors x are d + 1 dimensional, with x0 = 1 corresponding to the intercept term. In logistic regression we are in the binary classification setting, so the labels are y ∈ {0, 1}. Our hypothesis takes the form below, where σ(βᵀx) is the probability of x belonging to the positive class y = 1.
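The hypothesis, reconstructed from the surrounding text:

h_β(x) = σ(βᵀx) = 1/(1 + e^(-βᵀx))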

Logistic Regression recap. Suppose y is a Bernoulli random variable taking values in {0, 1}; then the two class probabilities can be written separately, and further rewritten more succinctly as a single expression. The parameter β is then obtained by minimizing the resulting negative log-likelihood cost function.
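The formulas on this slide are missing from the transcript; reconstructed from the standard derivation:

P(y = 1 | x) = σ(βᵀx)
P(y = 0 | x) = 1 - σ(βᵀx)

which combine into the single expression

P(y | x) = σ(βᵀx)^y (1 - σ(βᵀx))^(1 - y)

Taking the negative log of the likelihood over the N training examples gives the cross-entropy cost

E(β) = -Σ_i [ y(i) ln σ(βᵀx(i)) + (1 - y(i)) ln(1 - σ(βᵀx(i))) ]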

Using the chain rule to get the error derivatives. A gradient descent method can then be used to estimate β as follows:
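Reconstructing the missing derivation: with z = βᵀx and σ'(z) = σ(z)(1 - σ(z)), the chain rule ∂E/∂β = (∂E/∂σ)(∂σ/∂z)(∂z/∂β) collapses to

∂E/∂β = Σ_i (σ(βᵀx(i)) - y(i)) x(i)

and the gradient descent update, for a learning rate η, is

β ← β - η ∂E/∂β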

Softmax Regression. With softmax regression we are in the multiclass classification setting, so the labels are y ∈ {1, …, K}. Our hypothesis takes the form below, giving the probabilities of input x belonging to each of the classes 1 to K.
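The softmax hypothesis, as given in the referenced Stanford wiki: with one parameter vector β_k per class,

P(y = k | x) = exp(β_kᵀx) / Σ_{j=1..K} exp(β_jᵀx),   k = 1, …, K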

Softmax Regression. A weight-decay term is added to make the cost function E() strictly convex; it penalizes large values of the parameters. A gradient descent method can then be used to estimate all the β_k. (A note on the original slide asks: how does this step follow?)
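The regularized cost and its gradient, reconstructed in the standard (unnormalized) form following the referenced Stanford wiki, with 1{·} the indicator function and λ the weight-decay strength:

E(β) = -Σ_i Σ_k 1{y(i) = k} ln P(y(i) = k | x(i)) + (λ/2) Σ_k ‖β_k‖²

∇_{β_k} E = -Σ_i x(i) ( 1{y(i) = k} - P(y(i) = k | x(i)) ) + λ β_k

With λ > 0 the added term makes E strictly convex, so gradient descent converges to the unique global minimum.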

Softmax vs. K binary classifiers. Consider a computer vision example where you are trying to classify images into three different classes. (i) Suppose your classes are indoor_scene, outdoor_urban_scene, and outdoor_wilderness_scene. Would you use softmax regression or three logistic regression classifiers? (ii) Now suppose your classes are indoor_scene, black_and_white_image, and image_has_people. Would you use softmax regression or multiple logistic regression classifiers? In the first case the classes are mutually exclusive, so a softmax regression classifier is appropriate. In the second case an image can belong to several classes at once, so it is more appropriate to build three separate binary logistic regression classifiers.
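A small sketch of the distinction (illustrative code, not from the original slides): softmax couples the class scores into one distribution, while independent sigmoids score each label on its own.

import numpy as np

def softmax(z):
    # Mutually exclusive classes: probabilities sum to 1 across classes.
    e = np.exp(z - z.max())
    return e / e.sum()

def independent_sigmoids(z):
    # Non-exclusive labels: one independent probability per label;
    # the values need not sum to 1.
    return 1.0 / (1.0 + np.exp(-z))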