Regress-itation Feb. 5, 2015

Outline
– Linear regression. Regression: predicting a continuous value
– Logistic regression. Classification: predicting a discrete value
– Gradient descent. A very general optimization technique

Regression wants to predict a continuous-valued output for an input.
Data: pairs $\{(x_i, y_i)\}_{i=1}^{n}$
Goal: predict the output $y$ for a new input $x$

Linear Regression

Linear regression assumes a linear relationship between inputs and outputs.
Data: pairs $\{(x_i, y_i)\}_{i=1}^{n}$
Goal: learn a linear function $f(x) = w_0 + w_1 x$ that predicts $y$ from $x$

You collected data about commute times.

Now, you want to predict the commute time for a new person who lives 1.1 miles from campus.

Reading off the fitted line at 1.1 miles gives a predicted commute time of roughly 23 minutes.

How can we find this line?

Define
– $x_i$: input, distance from campus
– $y_i$: output, commute time
We want to predict $y$ for an unknown $x$.
Assume
– In general, $y = f(x) + \varepsilon$
– For 1-D linear regression, $f(x) = w_0 + w_1 x$
We want to learn the parameters $w$.

We can learn $w$ from the observed data by maximizing the conditional likelihood.
Recall: with Gaussian noise $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, the model implies $p(y \mid x, w) = \mathcal{N}(y;\ w_0 + w_1 x,\ \sigma^2)$.
Introducing some new notation, the estimate is $\hat{w} = \arg\max_w \prod_{i=1}^{n} p(y_i \mid x_i, w)$.

We can learn $w$ from the observed data by maximizing the conditional likelihood, which is equivalent to minimizing least-squares error:

$$\hat{w} = \arg\max_w \prod_{i=1}^{n} \mathcal{N}\big(y_i;\ w_0 + w_1 x_i,\ \sigma^2\big) = \arg\min_w \sum_{i=1}^{n} \big(y_i - (w_0 + w_1 x_i)\big)^2$$

For the 1-D case… Two values define this line
– $w_0$: intercept
– $w_1$: slope
– $f(x) = w_0 + w_1 x$
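Since the scatter-plot data are not reproduced in this transcript, here is a minimal NumPy sketch of the closed-form least-squares fit described above; the commute numbers are hypothetical stand-ins chosen to mimic the example, not values from the slides.

```python
import numpy as np

# Hypothetical commute data: distance from campus (miles) -> commute time (minutes).
# These values are made up to stand in for the scatter plot, which is not reproduced here.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([12.0, 21.0, 30.0, 43.0, 50.0, 62.0])

# Closed-form least-squares fit of f(x) = w0 + w1*x:
# stack a column of ones so the intercept w0 is learned alongside the slope w1.
X = np.column_stack([np.ones_like(x), x])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
w0, w1 = w

print(f"intercept w0 = {w0:.2f}, slope w1 = {w1:.2f}")

# Predict the commute time for someone living 1.1 miles from campus.
print(f"predicted commute at 1.1 miles: {w0 + w1 * 1.1:.1f} minutes")
```

On this toy data the prediction at 1.1 miles comes out close to the slide's read-off value of roughly 23 minutes.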

Logistic Regression

Logistic regression is a discriminative approach to classification.
Classification: predicts a discrete-valued output
– E.g., is an email spam or not?

Logistic regression is a discriminative approach to classification.
Discriminative: directly estimates $P(Y \mid X)$
– Only concerned with discriminating (differentiating) between classes $Y$
– In contrast, naïve Bayes is a generative classifier: it estimates $P(Y)$ and $P(X \mid Y)$ and uses Bayes' rule to calculate $P(Y \mid X)$, explaining how data are generated given the class label $Y$
Both logistic regression and naïve Bayes use their estimates of $P(Y \mid X)$ to assign a class to an input $X$; the difference is in how they arrive at these estimates.
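To make the contrast concrete, the two routes to $P(Y \mid X)$ can be written side by side (a standard textbook formulation, added here for reference rather than taken from the slides):

$$\text{Generative (naïve Bayes):}\quad P(Y = y \mid X = x) = \frac{P(Y = y)\,P(X = x \mid Y = y)}{\sum_{y'} P(Y = y')\,P(X = x \mid Y = y')}$$

$$\text{Discriminative (logistic regression):}\quad P(Y = y \mid X = x) \text{ is modeled directly; } P(X) \text{ is never modeled.}$$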

The assumptions of logistic regression
Given: training data $\{(x_i, y_i)\}$ with binary labels $y \in \{0, 1\}$
Want to learn: $p(Y = 1 \mid X = x)$

The logistic function is appropriate for making probability estimates: $\mathrm{logistic}(z) = \frac{1}{1 + e^{-z}}$ maps any real number to the interval $(0, 1)$.

Logistic regression models probabilities with the logistic function:
$$P(Y = 1 \mid X = x) = \frac{1}{1 + \exp\!\big(-(w_0 + \sum_j w_j x_j)\big)}$$
Want to predict $Y = 1$ for $X$ when $P(Y = 1 \mid X) \ge 0.5$.

Therefore, logistic regression is a linear classifier.
Use the logistic function to estimate the probability of $Y$ given $X$.
Decision boundary: $P(Y = 1 \mid X = x) = 0.5$ exactly when $w_0 + \sum_j w_j x_j = 0$, a hyperplane in the input space.
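A minimal sketch of this decision rule in NumPy; the weights here are arbitrary placeholders chosen for illustration, not learned values from the slides.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w0, w):
    """Estimate P(Y=1 | X=x) with the logistic regression model."""
    return logistic(w0 + np.dot(w, x))

# Hypothetical 2-D weights, purely for illustration.
w0, w = -1.0, np.array([2.0, -0.5])

x = np.array([0.8, 0.3])
p = predict_proba(x, w0, w)
label = int(p >= 0.5)  # predict Y=1 when P(Y=1|x) >= 0.5

# The boundary between the two predictions is the set where w0 + w.x = 0,
# which is why logistic regression is a linear classifier.
print(f"P(Y=1|x) = {p:.3f} -> predicted label {label}")
```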

Maximize the conditional likelihood to find the weights $w = [w_0, w_1, \ldots, w_d]$:
$$\hat{w} = \arg\max_w \prod_{i=1}^{n} P(y_i \mid x_i, w)$$

How can we optimize this function?
– The log conditional likelihood is concave [check its Hessian]
– No closed-form solution for $w$, unfortunately
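For reference, the conditional log-likelihood being maximized and its gradient have standard closed forms (a textbook derivation, sketched here with the convention $x_{i0} = 1$ so the same formula covers $w_0$):

$$\ell(w) = \sum_{i=1}^{n} \Big[\, y_i \, w^\top x_i - \ln\!\big(1 + \exp(w^\top x_i)\big) \Big]$$

$$\frac{\partial \ell}{\partial w_j} = \sum_{i=1}^{n} x_{ij}\,\big(y_i - P(Y = 1 \mid x_i, w)\big)$$

Because $\ell$ is concave, following this gradient uphill converges to the global optimum, which is exactly where the next section's technique comes in.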

Gradient Descent

Gradient descent can optimize differentiable functions:
$$x^{(t+1)} = x^{(t)} - \eta \,\nabla f\big(x^{(t)}\big)$$
– $x^{(t+1)}$: updated value for the optimum
– $x^{(t)}$: previous value for the optimum
– $\eta$: step size
– $\nabla f(x^{(t)})$: gradient of $f$, evaluated at the current $x$

Here is the trajectory of gradient descent on a quadratic function.

How does step size affect the result?
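The step-size plot is not reproduced in this transcript, but the effect is easy to see numerically. Below is a minimal sketch running the update above on the quadratic $f(x) = x^2$ with two step sizes; the specific values 0.1 and 1.05 are illustrative choices, not from the slides.

```python
# Gradient descent on f(x) = x^2, whose gradient is f'(x) = 2x.
def gradient_descent(x0, step_size, num_steps=20):
    x = x0
    for _ in range(num_steps):
        x = x - step_size * (2 * x)  # x <- x - eta * grad f(x)
    return x

# A small step size shrinks x by a factor of 0.8 per step, converging to the minimum at 0.
print(gradient_descent(x0=5.0, step_size=0.1))   # ~0.06 after 20 steps

# A step size that is too large flips x past the minimum and grows it by 1.1x per step: divergence.
print(gradient_descent(x0=5.0, step_size=1.05))  # magnitude increases every step
```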
