Model representation Linear regression with one variable

Slides:

Advertisements

Similar presentations

CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:

Advertisements

Indian Statistical Institute Kolkata

Logistic Regression Classification Machine Learning.

The loss function, the normal equation,

Classification and Prediction: Regression Via Gradient Descent Optimization Bamshad Mobasher DePaul University.

Linear Learning Machines  Simplest case: the decision function is a hyperplane in input space.  The Perceptron Algorithm: Rosenblatt, 1956  An on-line.

Linear Learning Machines  Simplest case: the decision function is a hyperplane in input space.  The Perceptron Algorithm: Rosenblatt, 1956  An on-line.

Kernel Methods and SVM’s. Predictive Modeling Goal: learn a mapping: y = f(x;  ) Need: 1. A model structure 2. A score function 3. An optimization strategy.

Artificial Neural Networks

CS 4700: Foundations of Artificial Intelligence

Collaborative Filtering Matrix Factorization Approach

More Machine Learning Linear Regression Squared Error L1 and L2 Regularization Gradient Descent.

Learning with large datasets Machine Learning Large scale machine learning.

Artificial Neural Networks

Outline Classification Linear classifiers Perceptron Multi-class classification Generative approach Naïve Bayes classifier 2.

Logistic Regression L1, L2 Norm Summary and addition to Andrew Ng’s lectures on machine learning.

1 Artificial Neural Networks Sanun Srisuk EECP0720 Expert Systems – Artificial Neural Networks.

Machine Learning Chapter 4. Artificial Neural Networks

CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:

CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:

Machine Learning Introduction Study on the Coursera All Right Reserved : Andrew Ng Lecturer:Much Database Lab of Xiamen University Aug 12,2014.

Andrew Ng Linear regression with one variable Model representation Machine Learning.

CS 478 – Tools for Machine Learning and Data Mining Backpropagation.

Linear regression By gradient descent (with thanks to Prof. Ng’s machine learning course)

CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:

ISCG8025 Machine Learning for Intelligent Data and Information Processing Week 3 Practical Notes Regularisation *Courtesy of Associate Professor Andrew.

M Machine Learning F# and Accord.net. Alena Dzenisenka Software architect at Luxoft Poland Member of F# Software Foundation Board of Trustees Researcher.

Logistic Regression Week 3 – Soft Computing By Yosi Kristian.

Week 1 - An Introduction to Machine Learning & Soft Computing

The problem of overfitting

Regularization (Additional)

M Machine Learning F# and Accord.net.

Insight: Steal from Existing Supervised Learning Methods! Training = {X,Y} Error = target output – actual output.

Chapter 2 Single Layer Feedforward Networks

Logistic Regression William Cohen.

Chapter 2-OPTIMIZATION

Support Vector Machines Optimization objective Machine Learning.

Neural Networks The Elements of Statistical Learning, Chapter 12 Presented by Nick Rizzolo.

WEEK 2 SOFT COMPUTING & MACHINE LEARNING YOSI KRISTIAN Gradient Descent for Linear Regression.

Machine Learning & Deep Learning

Machine Learning – Regression David Fenyő

Lecture 3: Linear Regression (with One Variable)

Machine Learning I & II.

10701 / Machine Learning.

A Simple Artificial Neuron

Machine Learning – Regression David Fenyő

Probabilistic Models for Linear Regression

Logistic Regression Classification Machine Learning.

Logistic Regression Classification Machine Learning.

Collaborative Filtering Matrix Factorization Approach

Logistic Regression Classification Machine Learning.

Introduction to Machine learning

The loss function, the normal equation,

How do we find the best linear regression line?

Mathematical Foundations of BME Reza Shadmehr

Softmax Classifier.

Machine Learning Algorithms – An Overview

The Math of Machine Learning

Backpropagation David Kauchak CS159 – Fall 2019.

What is machine learning

Logistic Regression Classification Machine Learning.

Shih-Yang Su Virginia Tech

Multiple features Linear Regression with multiple variables

Multiple features Linear Regression with multiple variables

Logistic Regression Classification Machine Learning.

Linear regression with one variable

Logistic Regression Classification Machine Learning.

Logistic Regression Geoff Hulten.

Presentation transcript:

Model representation Linear regression with one variable Machine Learning

Housing Prices (Portland, OR) (in 1000s of dollars) Size (feet2) Supervised Learning Given the “right answer” for each example in the data. Regression Problem Predict real-valued output

Training set of housing prices (Portland, OR) Size in feet2 (x) Price ($) in 1000's (y) 2104 460 1416 232 1534 315 852 178 … Notation: m = Number of training examples x’s = “input” variable / features y’s = “output” variable / “target” variable

Training Set How do we represent h ? Learning Algorithm Size of house Estimated price Linear regression with one variable. Univariate linear regression.

Linear regression with one variable Cost function Machine Learning

Training Set Hypothesis: ‘s: Parameters How to choose ‘s ? Size in feet2 (x) Price ($) in 1000's (y) 2104 460 1416 232 1534 315 852 178 … Hypothesis: ‘s: Parameters How to choose ‘s ?

Idea: Choose so that is close to for our training examples y x Idea: Choose so that is close to for our training examples

Cost function intuition I Linear regression with one variable Cost function intuition I Machine Learning

Simplified Hypothesis: Parameters: Cost Function: Goal:

(for fixed , this is a function of x) (function of the parameter ) y x

(function of the parameter ) (for fixed , this is a function of x) y x

(function of the parameter ) (for fixed , this is a function of x) y x

Cost function intuition II Linear regression with one variable Cost function intuition II Machine Learning

Hypothesis: Parameters: Cost Function: Goal:

(for fixed , this is a function of x) (function of the parameters ) Price ($) in 1000’s Size in feet2 (x)

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

Linear regression with one variable Gradient descent Machine Learning

Have some function Want Outline: Start with some Keep changing to reduce until we hopefully end up at a minimum

J(0,1) 1 0

J(0,1) 1 0

Gradient descent algorithm Correct: Simultaneous update Incorrect:

Gradient descent intuition Linear regression with one variable Gradient descent intuition Machine Learning

Gradient descent algorithm

If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

at local optima Current value of

Gradient descent can converge to a local minimum, even with the learning rate α fixed. As we approach a local minimum, gradient descent will automatically take smaller steps. So, no need to decrease α over time.

Gradient descent for linear regression Linear regression with one variable Gradient descent for linear regression Machine Learning

Gradient descent algorithm Linear Regression Model

Gradient descent algorithm update and simultaneously

J(0,1) 1 0

J(0,1) 1 0

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

(for fixed , this is a function of x) (function of the parameters )

“Batch”: Each step of gradient descent uses all the training examples. “Batch” Gradient Descent “Batch”: Each step of gradient descent uses all the training examples.