WEEK 2 SOFT COMPUTING & MACHINE LEARNING YOSI KRISTIAN Gradient Descent for Linear Regression

Single Variable Linear Regression Gradient Descent

Have some function J(θ0, θ1). Want: min over θ0, θ1 of J(θ0, θ1). Outline: start with some θ0, θ1; keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum.

Illustration: surface plot of J(θ0, θ1) over the parameter space; from an initial point, each gradient descent step moves downhill.

Illustration: the same surface, started from a different initial point; gradient descent can end up at a different local minimum.

The Algorithm. Gradient descent: repeat until convergence { θj := θj − α · ∂/∂θj J(θ0, θ1) (for j = 0 and j = 1) }. Correct (simultaneous update): temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1); temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1); θ0 := temp0; θ1 := temp1. Incorrect: temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1); θ0 := temp0; temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1); θ1 := temp1 (the second derivative is evaluated with the already-updated θ0).

Algorithm Explained.. In the update θj := θj − α · ∂/∂θj J(θ0, θ1): α is the learning rate, which controls how big a step each update takes; ∂/∂θj J(θ0, θ1) is the derivative term, which points uphill, so subtracting it moves the parameters downhill.
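As a minimal sketch, one correct simultaneous-update step can be written as below; the derivative functions are passed in as arguments, and their names are placeholders, not from the slides:

```python
# One gradient descent step with a correct simultaneous update:
# both temporaries are computed from the old parameter values
# before either parameter is overwritten.
def gradient_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
    return temp0, temp1  # assign back only after both are computed
```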

α effects.. If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.

Fixed α.. Gradient descent can converge to a local minimum even with the learning rate α fixed: as we approach a local minimum, the derivative shrinks, so gradient descent automatically takes smaller steps. There is no need to decrease α over time.

Applying Gradient Descent to Linear Regression. Gradient descent algorithm: repeat until convergence { θj := θj − α · ∂/∂θj J(θ0, θ1) (for j = 0 and j = 1) }. Linear regression model: hθ(x) = θ0 + θ1x, with cost function J(θ0, θ1) = 1/(2m) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)².

Gradient Descent Function.. Working out the derivatives of the cost function: ∂/∂θ0 J(θ0, θ1) = 1/m · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) and ∂/∂θ1 J(θ0, θ1) = 1/m · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾.

Algorithm.. Gradient descent for linear regression: repeat until convergence { θ0 := θ0 − α · 1/m · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾); θ1 := θ1 − α · 1/m · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾ } (update θ0 and θ1 simultaneously).
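A minimal NumPy sketch of this loop; the function name, default values, and iteration count are illustrative, not from the slides:

```python
import numpy as np

def gradient_descent_1var(x, y, alpha=0.01, iters=1000, theta0=0.0, theta1=0.0):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(y)
    for _ in range(iters):
        error = (theta0 + theta1 * x) - y   # h(x^(i)) - y^(i) for all m examples
        grad0 = error.sum() / m
        grad1 = (error * x).sum() / m
        # simultaneous update: both gradients use the old theta values
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1
```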

Remember the Local Minimum Problem. Illustration: surface plot of J(θ0, θ1) with multiple local minima, where gradient descent can get stuck depending on the starting point.

It Won't Happen Here.. The squared-error cost function of linear regression is convex (a bowl-shaped surface), so it has a single global minimum and gradient descent always converges to it.

“Batch” Gradient Descent “Batch”: Each step of gradient descent uses all the training examples.

Visualization: a sequence of paired plots traces gradient descent. Left panel: hθ(x) plotted against the data (for fixed θ0, θ1, this is a function of x). Right panel: the contour plot of J(θ0, θ1) (a function of the parameters θ0, θ1). Step by step, the point on the contour plot descends toward the minimum of J, and the corresponding line on the left fits the data better and better.

Homework
- Create a program demonstrating gradient descent on a one-variable linear regression problem.
- Use the Diamond data.
- Input: 1 variable. Output: 1 variable.
- Visualize your program (MSE, regression line).
- Allow θ0 and θ1 to be initialized manually.
A starter sketch follows below.
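A possible starting point, assuming the diamond data is a two-column CSV; the file name and column meanings here are hypothetical, so adjust them to the actual data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical loading step: adjust the file name and columns to the real data.
data = np.loadtxt("diamonds.csv", delimiter=",", skiprows=1)
x, y = data[:, 0], data[:, 1]

theta0, theta1 = 0.0, 0.0          # manual initialization of the parameters
alpha, iters = 0.01, 500
mse_history = []

for _ in range(iters):
    error = (theta0 + theta1 * x) - y
    mse_history.append((error ** 2).mean())          # track MSE per iteration
    grad0, grad1 = error.mean(), (error * x).mean()
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(mse_history)                                # MSE curve
ax1.set(xlabel="iteration", ylabel="MSE")
ax2.scatter(x, y, s=5)                               # data + fitted regression line
ax2.plot(x, theta0 + theta1 * x, color="red")
plt.show()
```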

Multiple features Linear Regression with multiple variables

Previously: linear regression with a single feature, e.g., predicting house price from size alone, hθ(x) = θ0 + θ1x.

Multiple Features: the training data can contain several input variables, e.g., size (feet²), number of bedrooms, number of floors, and age of home, all used to predict price.

Multiple features (variables). Notation: n = number of features; x⁽ⁱ⁾ = input (features) of the iᵗʰ training example; xⱼ⁽ⁱ⁾ = value of feature j in the iᵗʰ training example.

Hypothesis: hθ(x) = θ0 + θ1x1 + θ2x2 + … + θnxn. Previously: hθ(x) = θ0 + θ1x.

Still the Hypothesis… For convenience of notation, define x0 = 1. Then x = [x0, x1, …, xn]ᵀ and θ = [θ0, θ1, …, θn]ᵀ, so the hypothesis becomes hθ(x) = θᵀx. This is multivariate linear regression.

Gradient descent for multiple-variable linear regression Linear Regression with multiple variables

Hypothesis: hθ(x) = θᵀx = θ0x0 + θ1x1 + … + θnxn (with x0 = 1). Parameters: θ = (θ0, θ1, …, θn). Simplified cost function: J(θ) = 1/(2m) · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)². Gradient descent: repeat { θj := θj − α · ∂/∂θj J(θ) } (simultaneously update for every j = 0, …, n).

Gradient Descent. Previously (n = 1): repeat { θ0 := θ0 − α · 1/m · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾); θ1 := θ1 − α · 1/m · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾ } (simultaneously update θ0, θ1). New algorithm (n ≥ 1): repeat { θj := θj − α · 1/m · Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾ } (simultaneously update θj for j = 0, …, n).
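A vectorized sketch of the new algorithm; it assumes the design matrix X already includes the x0 = 1 column, and names and defaults are illustrative:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    """Batch gradient descent for h(x) = X @ theta, X of shape (m, n+1)."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iters):
        error = X @ theta - y             # h(x^(i)) - y^(i), shape (m,)
        gradient = (X.T @ error) / m      # all n+1 partial derivatives at once
        theta -= alpha * gradient         # one simultaneous update of every theta_j
    return theta
```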

Gradient descent in practice I: Feature Scaling Linear Regression with multiple variables

Feature Scaling. Idea: make sure features are on a similar scale. E.g., x1 = size (feet², roughly 0–2000) and x2 = number of bedrooms (1–5). On the raw scales the contours of J(θ) are very elongated and gradient descent zig-zags slowly; after scaling (e.g., x1 = size/2000, x2 = bedrooms/5) the contours are rounder and gradient descent converges faster.

Feature Scaling. Get every feature into approximately a −1 ≤ xⱼ ≤ 1 range.

Mean normalization. Replace xⱼ with xⱼ − μⱼ to make features have approximately zero mean (do not apply to x0 = 1). E.g., x1 = (size − 1000)/2000, x2 = (#bedrooms − 2)/5. In general, xⱼ := (xⱼ − μⱼ)/sⱼ, where μⱼ is the mean of feature j and sⱼ is its range (max − min) or standard deviation.
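A sketch of mean normalization for a feature matrix, here using the standard deviation as sⱼ; the max − min range would work the same way:

```python
import numpy as np

def mean_normalize(X):
    """Scale each column of X to approximately zero mean and unit spread.
    Do not apply this to the x0 = 1 column; add it after normalizing."""
    mu = X.mean(axis=0)
    s = X.std(axis=0)            # alternative: X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s, mu, s   # keep mu, s to normalize future inputs identically
```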

Choosing Learning Rate Linear Regression with multiple variables

Making sure gradient descent is working correctly. Plot J(θ) against the number of iterations: J(θ) should decrease on every iteration and flatten out as gradient descent converges. Example automatic convergence test: declare convergence if J(θ) decreases by less than 10⁻³ in one iteration.

Making sure gradient descent is working correctly. If the plot of J(θ) vs. the number of iterations is increasing (or repeatedly overshooting), gradient descent is not working: use a smaller α. For sufficiently small α, J(θ) should decrease on every iteration. But if α is too small, gradient descent can be slow to converge.

Summary: If α is too small: slow convergence. If α is too large: J(θ) may not decrease on every iteration; it may not converge. To choose α, try a range of values spaced roughly 3× apart, e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
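A sketch of that α sweep on synthetic data, plotting J(θ) against iterations for each candidate; the data generation and iteration count are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

def cost_history(X, y, alpha, iters=100):
    """Run batch gradient descent, recording J(theta) at every iteration."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iters):
        error = X @ theta - y
        history.append((error ** 2).sum() / (2 * m))
        theta -= alpha * (X.T @ error) / m
    return history

# Synthetic one-feature data set with an x0 = 1 column, for illustration only.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 2, 50)
X = np.column_stack([np.ones(50), x1])
y = 1.0 + 3.0 * x1 + rng.normal(0, 0.1, 50)

# Large alphas will visibly diverge; small ones decrease J slowly.
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    plt.plot(cost_history(X, y, alpha), label=f"alpha={alpha}")
plt.xlabel("No. of iterations")
plt.ylabel("J(theta)")
plt.legend()
plt.show()
```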

Homework
- Create a program demonstrating gradient descent on a multiple-variable linear regression problem.
- Use the Housing data.
- Input: 2 variables. Output: 1 variable.
- Allow θ0 and θ1 to be initialized manually.
- Make α customizable.
- Do the feature scaling.

Features and polynomial regression Linear Regression with multiple variables

Housing price prediction. With, e.g., frontage and depth as raw inputs, hθ(x) = θ0 + θ1 · frontage + θ2 · depth; but we are free to define new features, e.g., area = frontage × depth, and use hθ(x) = θ0 + θ1 · area instead.

Polynomial regression. Price (y) vs. size (x): a straight line may underfit, so fit e.g. a quadratic hθ(x) = θ0 + θ1x + θ2x² or a cubic hθ(x) = θ0 + θ1x + θ2x² + θ3x³ by defining features x1 = size, x2 = size², x3 = size³ and running ordinary multivariate linear regression. With such features, feature scaling becomes very important (if size ranges over 1–1000, then size² ranges over 1–10⁶ and size³ over 1–10⁹).
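A sketch of the cubic feature construction, with normalization of the powered columns as just discussed; the data values are made up for illustration:

```python
import numpy as np

def polynomial_features(x, degree=3):
    """Build [x, x^2, ..., x^degree] and mean-normalize each column,
    since size, size^2, and size^3 live on wildly different scales."""
    X = np.column_stack([x ** d for d in range(1, degree + 1)])
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.column_stack([np.ones(len(x)), X])   # prepend the x0 = 1 column

# Example: cubic features for house sizes; the result feeds straight into
# the same gradient descent routine used for multivariate linear regression.
sizes = np.array([50.0, 80.0, 120.0, 200.0, 350.0, 500.0])
X_poly = polynomial_features(sizes, degree=3)
```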

Finally … Fin…