Logistic Regression L1, L2 Norm Summary and addition to Andrew Ng’s lectures on machine learning.


Logistic Regression L1, L2 Norm Summary and addition to Andrew Ng’s lectures on machine learning

Linear Regression
- Predicts continuous values as outcomes.
- A categorical outcome violates the linearity assumption.
[Figure: price ($1000) vs. area (m^2), with fitted line y = x + 5]

What is Logistic Regression?
- A "classification algorithm". At its core it is a transformed linear regression: the output is squashed into the range 0 to 1, and the continuous result is then projected onto a discrete category.
- It predicts a class label based on features, e.g. predicting whether a patient has a given disease based on their age, gender, body mass index, blood tests, etc.

Two Types of Logistic Regression
- Binomial: two possible outcomes, e.g. spam or not spam.
- Multinomial: three or more possible outcomes, e.g. disease A, B, C, ...

Logistic Function
F(t) = 1 / (1 + e^(-t))

Logistic Function Summary
- F(t): the probability that the dependent variable belongs to a certain category.
- The link between linear regression and probability: the input t ranges over (-infinity, +infinity), while the output F(t) stays within (0, 1).
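As a minimal sketch (not from the original slides), the logistic function in NumPy:

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# Large negative inputs approach 0; large positive inputs approach 1.
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~[4.54e-05, 0.5, 0.99995]
```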

Decision Boundary
- The hypothesis h_θ(x) is continuous, but the prediction is discrete:
- Predict y = 1 if h_θ(x) ≥ 0.5, equivalently θᵀx ≥ 0.
- Predict y = 0 if h_θ(x) < 0.5, equivalently θᵀx < 0.
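A small sketch of this threshold rule (variable names are illustrative):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def predict(theta, x):
    # h_theta(x) >= 0.5 holds exactly when theta @ x >= 0,
    # because the sigmoid is monotone and sigmoid(0) = 0.5.
    return 1 if sigmoid(theta @ x) >= 0.5 else 0

print(predict(np.array([-1.0, 2.0]), np.array([1.0, 1.0])))  # theta^T x = 1 >= 0, so prints 1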

Linear Boundary Example
- The decision boundary is the line θᵀx = 0 (the specific equation appeared in the slide figure).
- Points with θᵀx ≥ 0 fall on the upper side of the line: predict y = 1.
- Points with θᵀx < 0 fall on the lower side of the line: predict y = 0.

Nonlinear Boundary Example
- With polynomial features, the boundary θᵀx = 0 can be a circle (the specific equation appeared in the slide figure).
- Points with θᵀx ≥ 0 fall outside the circle: predict y = 1.
- Points with θᵀx < 0 fall inside the circle: predict y = 0.

Cost Function for Linear Regression
J(θ) = (1 / 2m) · Σ_i (h_θ(x^(i)) − y^(i))²
- This cost cannot be applied to logistic regression: with the sigmoid hypothesis it becomes non-convex and is therefore hard to minimize.

Cost Function for Logistic Regression
Cost(h_θ(x), y) = −log(h_θ(x))       if y = 1
Cost(h_θ(x), y) = −log(1 − h_θ(x))   if y = 0
Combined over m training examples:
J(θ) = −(1/m) · Σ_i [ y^(i) · log(h_θ(x^(i))) + (1 − y^(i)) · log(1 − h_θ(x^(i))) ]
- This choice makes J(θ) convex, so it can be minimized reliably.
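A minimal NumPy sketch of this cost (the epsilon guard is an implementation detail added here, not part of the slides):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) for an m x n feature matrix X and a length-m vector y of 0/1 labels."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    eps = 1e-12  # guards against log(0) at extreme predictions
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```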

Estimate Parameters
- Minimize the total cost J(θ) to find the best set of parameters θ.
- Methods: gradient descent, or the "mean square" (normal equation) method.

Gradient Descent to Find Parameters
- Step 1: randomly assign initial values to θ.
- Step 2: repeatedly update θ by the rule below until J(θ) reaches its minimum:
  θ_j := θ_j − α · (1/m) · Σ_i (h_θ(x^(i)) − y^(i)) · x_j^(i)   (update all j simultaneously)
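A sketch of batch gradient descent under these update rules (the learning rate and iteration count are illustrative defaults, not from the slides):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for logistic regression.
    X is an m x n feature matrix (include a column of ones for the
    intercept); y is a length-m vector of 0/1 labels."""
    m, n = X.shape
    theta = np.zeros(n)  # Step 1: initialize theta
    for _ in range(iters):
        h = 1.0 / (1.0 + np.exp(-X @ theta))   # current predictions
        theta -= alpha * (X.T @ (h - y)) / m   # Step 2: simultaneous update
    return theta
```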

Mean Square (Normal Equation) Method
- Let X be the feature matrix and Y the vector of class labels; then the parameters can be computed directly as
  θ = (XᵀX)⁻¹ Xᵀ Y
- Note this is the normal equation from linear regression; logistic regression itself has no closed-form solution, so gradient-based methods are used in practice.
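A sketch of the normal-equation computation (using a linear solver rather than an explicit matrix inverse, which is a numerical-stability choice made here, not stated on the slide):

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^(-1) X^T y, computed via np.linalg.solve,
    which is more stable than forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```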

Multinomial Classification
- Email categories: spam, work, personal → y = 1, 2, 3
- Weather: sunny, rain, snow, windy → y = 1, 2, 3, 4

One-vs-all Regression

One-vs-all Summary
- Train one logistic regression classifier per class i to predict the probability that y = i.
- For k classes, we need to train k classifiers.
- Given a new input x, feed x into each of the classifiers and pick the class i that maximizes the predicted probability (see the sketch below).
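A self-contained sketch of one-vs-all under these rules (function names, learning rate, and iteration count are illustrative):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_one_vs_all(X, y, classes, alpha=0.1, iters=1000):
    """One binary logistic-regression classifier per class:
    class c's classifier is trained on the indicator labels (y == c)."""
    thetas = {}
    for c in classes:
        target = (y == c).astype(float)
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            h = sigmoid(X @ theta)
            theta -= alpha * (X.T @ (h - target)) / X.shape[0]
        thetas[c] = theta
    return thetas

def predict_one_vs_all(thetas, x):
    """Pick the class whose classifier reports the highest probability."""
    return max(thetas, key=lambda c: sigmoid(x @ thetas[c]))
```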

Norm
- A norm measures the size or length of vectors in a vector space (or of matrices).
- Given a vector space V, a norm on V is a function p: V → R with the following properties:
  - p(a·v) = |a|·p(v)  (absolute homogeneity)
  - p(u + v) ≤ p(u) + p(v)  (triangle inequality)
  - If p(v) = 0, then v is the zero vector

Ln Norm
- Let x be a vector (or a matrix); the ln norm of x is defined as
  ‖x‖_n = (Σ_i |x_i|ⁿ)^(1/n)
- The mathematical properties of the different norms differ considerably.

L1 Norm (Manhattan Norm)
- Defined as ‖x‖₁ = Σ_i |x_i|
- The Manhattan distance is defined as d₁(x, y) = ‖x − y‖₁ = Σ_i |x_i − y_i|

L2 Norm (Euclidean Norm)
- Defined as ‖x‖₂ = √(Σ_i x_i²)
- The Euclidean distance is defined as d₂(x, y) = ‖x − y‖₂ = √(Σ_i (x_i − y_i)²)

Example: v = (1, 2, 3)
- L1 norm: ‖v‖₁ = |1| + |2| + |3| = 6
- L2 norm: ‖v‖₂ = √(1 + 4 + 9) = √14 ≈ 3.742
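These values can be checked directly with NumPy:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
print(np.linalg.norm(v, 1))  # L1 norm: 6.0
print(np.linalg.norm(v, 2))  # L2 norm: sqrt(14) = 3.7416...
```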

L1 & L2 Regularization
- Goal: minimize the cost J(θ) while keeping the parameter values small, to avoid overfitting.
- L1 regularization adds λ · Σ_j |θ_j| to the cost; L2 regularization adds λ · Σ_j θ_j².
- L1 tends to drive some parameters exactly to zero (feature selection), while L2 shrinks all of them smoothly.
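A sketch of the regularized logistic cost under these definitions (leaving the intercept θ₀ unpenalized is a common convention assumed here, not stated on the slide):

```python
import numpy as np

def regularized_cost(theta, X, y, lam, penalty="l2"):
    """Logistic cost plus an L1 or L2 penalty; the intercept theta[0]
    is conventionally left unpenalized."""
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    eps = 1e-12
    data_term = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    if penalty == "l1":
        reg = lam * np.sum(np.abs(theta[1:]))   # lasso: encourages sparsity
    else:
        reg = lam * np.sum(theta[1:] ** 2)      # ridge: shrinks all weights
    return data_term + reg
```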

Sources
- "Machine Learning" online course, Andrew Ng, http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
- Afshin Rostami and Andrew Ng, "L1 vs. L2 Regularization and Feature Selection," 2004.
- JerryLead, 2011.
- rorasa's blog, "l0-Norm, l1-Norm, l2-Norm, ..., l-infinity Norm," https://rorasa.wordpress.com/2012/05/13/l0-norm-l1-norm-l2-norm-l-infinity-norm/