Machine Learning Logistic Regression


Logistic regression
- The name is somewhat misleading: it is really a technique for classification, not regression.
- "Regression" comes from the fact that we fit a linear model to the feature space.
- It involves a more probabilistic view of classification.

Different ways of expressing probability
Consider a two-outcome probability space, where p( O1 ) = p and p( O2 ) = 1 - p = q. The probability of O1 can be expressed in any of these equivalent notations:

notation               expression      range         neutral value
standard probability   p               0 to 1        0.5
odds                   p / q           0 to +∞       1
log odds (logit)       log( p / q )    -∞ to +∞      0
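For example, p = 0.8 gives q = 0.2, so the odds are p / q = 4 and the log odds are log( 4 ) ≈ 1.386; the neutral case p = 0.5 gives odds 1 and log odds 0.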

Log odds
- Numeric treatment of outcomes O1 and O2 is equivalent.
- If neither outcome is favored over the other, then log odds = 0.
- If one outcome is favored with log odds = x, then the other outcome is disfavored with log odds = -x.
- Especially useful in domains where relative probabilities can be minuscule.
- Example: multiple sequence alignment in computational biology.

From probability to log odds (and back again)
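In standard form, the two conversions are:

    z = logit( p ) = log( p / ( 1 - p ) )
    p = logistic( z ) = 1 / ( 1 + exp( -z ) )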

Standard logistic function
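The standard logistic function is f( z ) = 1 / ( 1 + exp( -z ) ). A minimal MATLAB sketch of its S-shaped curve:

    z = linspace( -8, 8, 200 );
    f = 1 ./ ( 1 + exp( -z ) );    % standard logistic function f(z)
    plot( z, f ); xlabel( 'z' ); ylabel( 'f(z)' );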

Logistic regression
Scenario:
- A multidimensional feature space (features can be categorical or continuous).
- Outcome is discrete, not continuous. We'll focus on the case of two classes.
- It seems plausible that a linear decision boundary (hyperplane) will give good predictive accuracy.

Using a logistic regression model
- The model consists of a vector β in d-dimensional feature space, plus an intercept α.
- For a point x in feature space, project it onto β to convert it into a real number z in the range -∞ to +∞.
- Map z to the range 0 to 1 using the logistic function.
- Overall, logistic regression maps a point x in d-dimensional feature space to a value in the range 0 to 1.
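In symbols, the two steps are:

    z = α + β · x
    p( x ) = 1 / ( 1 + exp( -z ) )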

Using a logistic regression model
The prediction from a logistic regression model can be interpreted as:
- a probability of class membership, or
- a class assignment, obtained by applying a threshold to the probability; the threshold represents the decision boundary in feature space.
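A minimal MATLAB sketch of both interpretations, reusing the coefficient values from the iris example later in the deck; the sample point here is made up for illustration:

    alpha = 13.0460;                 % intercept, B(1) from the iris example below
    beta  = [ -1.9024; -0.4047 ];    % coefficients, B(2:3)
    x = [ 5.9 3.0 ];                 % one made-up sample: SL, SW
    z = alpha + x * beta;            % project onto the model vector
    p = 1 / ( 1 + exp( -z ) );       % probability of class membership
    yhat = ( p >= 0.5 );             % class assignment at the 0.5 threshold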

Training a logistic regression model
- Need to optimize α and β so the model gives the best possible reproduction of training-set labels.
- Usually done by numerical approximation of maximum likelihood.
- On really large datasets, stochastic gradient descent may be used instead.
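A minimal sketch of maximum-likelihood training in MATLAB, assuming the Statistics Toolbox function glmfit is available; the toy data here are made up:

    X = randn( 100, 2 );                                  % toy features
    y = double( X(:,1) + X(:,2) + randn( 100, 1 ) > 0 );  % toy 0/1 labels
    B = glmfit( X, y, 'binomial' );                       % maximum likelihood, logit link
    phat = glmval( B, X, 'logit' );                       % fitted probabilities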

Logistic regression in one dimension [figure slides]

Logistic regression in one dimension
Parameters control the shape and location of the sigmoid curve:
- α controls the location of the midpoint (at x = -α / β)
- β controls the slope of the rise
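A small MATLAB sketch of that effect; the parameter values are arbitrary:

    x = linspace( -10, 10, 200 );
    p1 = 1 ./ ( 1 + exp( -( 0 + 1*x ) ) );   % alpha = 0, beta = 1: midpoint at x = 0
    p2 = 1 ./ ( 1 + exp( -( 0 + 3*x ) ) );   % larger beta: steeper rise
    p3 = 1 ./ ( 1 + exp( -( 4 + 1*x ) ) );   % alpha = 4: midpoint shifts to x = -4
    plot( x, p1, x, p2, x, p3 );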

Logistic regression in one dimension [figure slides]

Logistic regression in two dimensions
- Subset of the Fisher iris dataset
- Two classes
- First two columns (SL, SW) as features
- [figure: scatter plot with fitted decision boundary]
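A sketch of how such a model could be fit in MATLAB with the built-in fisheriris data; which two of the three iris classes the slide used is not stated, so versicolor vs. virginica is an assumption here:

    load fisheriris                                        % built-in: meas (150x4), species
    idx = strcmp( species, 'versicolor' ) | strcmp( species, 'virginica' );
    X = meas( idx, 1:2 );                                  % first two columns: SL, SW
    y = double( strcmp( species( idx ), 'virginica' ) );   % 0/1 labels
    B = glmfit( X, y, 'binomial' );                        % [alpha; beta1; beta2]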

Logistic regression in two dimensions
Interpreting the model vector of coefficients. From MATLAB: B = [ 13.0460 -1.9024 -0.4047 ]
- α = B( 1 ), β = [ β1 β2 ] = B( 2 : 3 )
- α and β define the location and orientation of the decision boundary.
- -α / ||β|| is the signed distance of the decision boundary from the origin.
- The decision boundary is perpendicular to β.
- The magnitude of β defines the gradient of probabilities between 0 and 1.
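As a quick check of the geometry with the B above: ||β|| = sqrt( 1.9024^2 + 0.4047^2 ) ≈ 1.945, so the decision boundary lies at distance 13.0460 / 1.945 ≈ 6.71 from the origin, measured along the direction of β.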

Heart disease dataset
- 13 attributes (see heart.docx for details): 2 demographic (age, gender) and 11 clinical measures of cardiovascular status and performance
- 2 classes: absence ( 1 ) or presence ( 2 ) of heart disease
- 270 samples
- Dataset taken from the UC Irvine Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Statlog+(Heart)
- Preformatted for MATLAB as heart.mat

MATLAB interlude: matlab_demo_05.m

Logistic regression
Advantages:
- Makes no assumptions about the distributions of classes in feature space
- Easily extended to multiple classes (multinomial regression)
- Natural probabilistic view of class predictions
- Quick to train
- Very fast at classifying unknown records
- Good accuracy for many simple data sets
- Resistant to overfitting
- Model coefficients can be interpreted as indicators of feature importance
Disadvantages:
- Linear decision boundary