Logistic Regression

Linear regression fits a line to a set of points. Given x, you can use the line to predict y.

Logistic regression fits a logistic function (sigmoid) to a set of points and binary labels. Given a new point, the sigmoid gives the predicted probability that the class is positive. https://en.wikipedia.org/wiki/Logistic_regression

Logistic Regression For ease of notation, let x = (x0, x1, …, xn), where x0 = 1. Let w = (w0, w1, …, wn), where w0 is the bias weight. Class y ∈ {0, 1}

Learning: Use training data to determine weights. To classify a new x, assign class y that maximizes P(y | x)
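To make the prediction rule concrete, here is a minimal NumPy sketch (illustrative only, not code from the slides; sigmoid, predict_proba, and classify are hypothetical names) of computing P(y = 1 | x, w) and assigning the more probable class:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    """P(y = 1 | x, w) for a single example x, where x[0] = 1 is the bias term."""
    return sigmoid(np.dot(w, x))

def classify(w, x):
    """Assign the class y in {0, 1} that maximizes P(y | x, w)."""
    return 1 if predict_proba(w, x) >= 0.5 else 0
```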

Logistic Regression: Learning Weights The goal is to learn the weights w. Let (x^j, y^j) be the jth training example and its label. We want:

w = argmax_w Π_j P(y^j | x^j, w)

This is equivalent to:

w = argmax_w Σ_j ln P(y^j | x^j, w)

The quantity Σ_j ln P(y^j | x^j, w) is called the "log of conditional likelihood".

We can write the log conditional likelihood this way:

l(w) = Σ_j [ y^j ln P(y^j = 1 | x^j, w) + (1 - y^j) ln P(y^j = 0 | x^j, w) ]

Since y^j is either 0 or 1, only one of the two terms is nonzero for each example. This is what we want to maximize.
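As an illustration only (continuing the NumPy sketch above, so it reuses sigmoid; X and y are assumed to be an array of examples and a vector of 0/1 labels), the log conditional likelihood could be computed as:

```python
def log_conditional_likelihood(w, X, y):
    """l(w) = sum_j [ y^j ln P(y^j=1 | x^j, w) + (1 - y^j) ln P(y^j=0 | x^j, w) ].
    X: one row per example (each row includes the bias term x0 = 1); y: 0/1 labels."""
    p1 = sigmoid(X @ w)  # P(y^j = 1 | x^j, w) for every example
    return np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))
```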

Use gradient ascent to maximize l(w). This is called "maximum likelihood estimation" (MLE). Recall:

P(y = 1 | x, w) = 1 / (1 + e^(-w·x)) and P(y = 0 | x, w) = 1 - P(y = 1 | x, w)

We have:

l(w) = Σ_j [ y^j ln P(y^j = 1 | x^j, w) + (1 - y^j) ln P(y^j = 0 | x^j, w) ]

Let's find the gradient with respect to w_i:

Using the chain rule and some algebra, the partial derivative with respect to each weight simplifies to:

∂l(w)/∂w_i = Σ_j x_i^j ( y^j - P(y^j = 1 | x^j, w) )
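In the same hedged sketch (reusing sigmoid from above; X rows are examples with the bias term, y holds the 0/1 labels), this gradient can be computed for all weights at once:

```python
def gradient(w, X, y):
    """dl/dw_i = sum_j x_i^j (y^j - P(y^j = 1 | x^j, w)), one entry per weight w_i."""
    errors = y - sigmoid(X @ w)  # y^j - P(y^j = 1 | x^j, w) for each example
    return X.T @ errors          # sums x_i^j * error_j over the examples j
```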

Stochastic Gradient Ascent for Logistic Regression

Start with small random initial weights, both positive and negative: w = (w0, w1, …, wn).

Repeat until convergence, or for some maximum number of epochs:
For each training example (x^j, y^j), update every weight:
w_i ← w_i + η x_i^j ( y^j - P(y^j = 1 | x^j, w) )
where η is the learning rate.

Note again that w includes the bias weight w0, and x includes the bias term x0 = 1.
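Putting the pieces together, a self-contained sketch of this procedure might look as follows (assumed names; eta is the learning rate, which the slides do not fix to a particular value):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_sga(X, y, eta=0.1, epochs=100, seed=0):
    """Stochastic gradient ascent for logistic regression.
    X: (m, n+1) array, one example per row, with bias term x0 = 1 in column 0.
    y: (m,) array of 0/1 class labels."""
    rng = np.random.default_rng(seed)
    # Small random initial weights, both positive and negative
    w = rng.uniform(-0.05, 0.05, size=X.shape[1])
    for _ in range(epochs):
        for xj, yj in zip(X, y):
            p1 = sigmoid(np.dot(w, xj))   # P(y = 1 | x^j, w)
            w = w + eta * (yj - p1) * xj  # ascent step on every weight w_i
    return w
```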

Homework 4, Part 2