Multiple features. Linear Regression with multiple variables.


Multiple features. Linear Regression with multiple variables. Machine Learning.

Multiple features (variables).

Size (feet²)   Price ($1000)
2104           460
1416           232
1534           315
 852           178
…

Multiple features (variables).

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
 852           2                    1                  36                    178
…

Notation:
n = number of features
x^(i) = the input (features) of the i-th training example
x_j^(i) = the value of feature j in the i-th training example

Hypothesis.
Previously (one variable): h_θ(x) = θ_0 + θ_1 x
With multiple variables: h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n

For convenience of notation, define x_0 = 1, so that x = [x_0, x_1, …, x_n] and θ = [θ_0, θ_1, …, θ_n]. The hypothesis becomes h_θ(x) = θᵀx = θ_0 x_0 + θ_1 x_1 + … + θ_n x_n. This is multivariate linear regression.
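As a quick illustration, here is a minimal Octave sketch of the vectorized hypothesis, with an x_0 = 1 column prepended to the data; the variable names (data, X, theta, h) are illustrative, not from the slides.

```octave
% Minimal sketch: vectorized hypothesis h_theta(x) = theta' * x for all examples at once.
data  = [2104 5 1 45; 1416 3 2 40];   % two examples from the table above, n = 4 features
m     = size(data, 1);                % number of training examples
X     = [ones(m, 1) data];            % prepend x0 = 1 to every example
theta = zeros(size(X, 2), 1);         % theta = [theta_0; theta_1; ...; theta_n]
h     = X * theta;                    % h(i) is the prediction for the i-th example
```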

Gradient descent for multiple variables. Linear Regression with multiple variables. Machine Learning.

Hypothesis: h_θ(x) = θᵀx = θ_0 x_0 + θ_1 x_1 + … + θ_n x_n (with x_0 = 1)
Parameters: θ = [θ_0, θ_1, …, θ_n]
Cost function: J(θ) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²
Gradient descent:
Repeat {
  θ_j := θ_j − α ∂J(θ)/∂θ_j
} (simultaneously update θ_j for every j = 0, …, n)
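A minimal Octave sketch of this cost function, assuming X already includes the x_0 = 1 column; the function name computeCost is illustrative.

```octave
% Sketch: squared-error cost J(theta) over all m training examples.
function J = computeCost(X, y, theta)
  m = length(y);                            % number of training examples
  J = (1/(2*m)) * sum((X*theta - y).^2);    % average of squared prediction errors
end
```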

Gradient descent.
Previously (n = 1):
Repeat {
  θ_0 := θ_0 − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
  θ_1 := θ_1 − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x^(i)
} (simultaneously update θ_0, θ_1)

New algorithm (n ≥ 1):
Repeat {
  θ_j := θ_j − α (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) x_j^(i)
} (simultaneously update θ_j for every j = 0, …, n)
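The same update written as a vectorized Octave sketch; it reuses the computeCost sketch above, and the names gradientDescent, alpha, and num_iters are illustrative.

```octave
% Sketch: batch gradient descent with a simultaneous update of every theta_j.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    grad  = (1/m) * X' * (X*theta - y);           % (1/m) * sum_i (h(x^(i)) - y^(i)) * x^(i)
    theta = theta - alpha * grad;                 % updates all components of theta at once
    J_history(iter) = computeCost(X, y, theta);   % track J(theta) for debugging
  end
end
```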

Gradient descent in practice I: Feature Scaling. Linear Regression with multiple variables. Machine Learning.

Feature Scaling.
Idea: make sure features are on a similar scale, so that gradient descent converges faster.
E.g. x_1 = size (0–2000 feet²), x_2 = number of bedrooms (1–5).
[Figure: contours of J(θ) before and after scaling; axis labels: size (feet²), number of bedrooms.]

Feature Scaling. Get every feature into approximately a −1 ≤ x_j ≤ 1 range.

Mean normalization.
Replace x_j with x_j − μ_j to make features have approximately zero mean (do not apply to x_0 = 1).
More generally, x_j := (x_j − μ_j) / s_j, where μ_j is the mean of feature j and s_j is its range (max − min) or standard deviation.
E.g. x_1 = (size − 1000) / 2000, x_2 = (#bedrooms − 2) / 5.
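A minimal Octave sketch of this normalization, applied before the x_0 = 1 column is added; featureNormalize is an illustrative name, and it uses the standard deviation as s_j.

```octave
% Sketch: scale every feature to roughly zero mean and unit spread.
function [X_norm, mu, sigma] = featureNormalize(X)
  mu     = mean(X);               % per-feature mean mu_j
  sigma  = std(X);                % per-feature standard deviation s_j (a range also works)
  X_norm = (X - mu) ./ sigma;     % broadcast: (x_j - mu_j) / s_j for every example
end
```

Note that the same mu and sigma must be reused to normalize any new example before making a prediction.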

Gradient descent in practice II: Learning rate. Linear Regression with multiple variables. Machine Learning.

Gradient descent "debugging": how to make sure gradient descent is working correctly, and how to choose the learning rate α.

Making sure gradient descent is working correctly.
Plot J(θ) against the number of iterations; J(θ) should decrease after every iteration.
Example automatic convergence test: declare convergence if J(θ) decreases by less than 10⁻³ in one iteration.
[Figure: J(θ) versus no. of iterations.]

Making sure gradient descent is working correctly.
If J(θ) is increasing, or going down and up, as the iterations proceed, gradient descent is not working: use a smaller α.
[Figures: J(θ) versus no. of iterations for several failure cases.]
For sufficiently small α, J(θ) should decrease on every iteration. But if α is too small, gradient descent can be slow to converge.
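A minimal Octave sketch of this check, assuming X (with the x_0 column) and y are already loaded and using the gradientDescent sketch above, which returns the per-iteration costs in J_history.

```octave
% Sketch: run gradient descent, plot J(theta) per iteration, and apply the convergence test.
theta0 = zeros(size(X, 2), 1);
[theta, J_history] = gradientDescent(X, y, theta0, 0.01, 400);
plot(1:numel(J_history), J_history);
xlabel('No. of iterations'); ylabel('J(theta)');
if (J_history(end-1) - J_history(end)) < 1e-3
  disp('Converged: J(theta) decreased by less than 1e-3 in the last iteration.');
end
```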

Summary:
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration; may not converge.
To choose α, try a range of values roughly 3× apart, e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …
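One way to carry out that search in Octave, again reusing the gradientDescent sketch; the exact grid of values and the number of iterations are only illustrative.

```octave
% Sketch: compare J(theta) curves for several learning rates and keep the largest
% alpha for which the cost still decreases smoothly.
alphas = [0.001 0.003 0.01 0.03 0.1 0.3 1];
hold on;
for k = 1:numel(alphas)
  [~, J_history] = gradientDescent(X, y, zeros(size(X, 2), 1), alphas(k), 100);
  plot(J_history);
end
xlabel('No. of iterations'); ylabel('J(theta)');
legend(arrayfun(@(a) sprintf('alpha = %g', a), alphas, 'UniformOutput', false));
```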

Features and polynomial regression. Linear Regression with multiple variables. Machine Learning.

Housing prices prediction.
h_θ(x) = θ_0 + θ_1 × frontage + θ_2 × depth
You do not have to use the features as given: you can define a new feature, area = frontage × depth, and fit h_θ(x) = θ_0 + θ_1 × area instead.

Polynomial regression.
[Figure: housing data, Price (y) versus Size (x).]
Fit a cubic model by treating powers of the size as separate features:
h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3, with x_1 = (size), x_2 = (size)², x_3 = (size)³.
Feature scaling becomes especially important here, since these features span very different ranges.

Choice of features.
[Figure: housing data, Price (y) versus Size (x).]
Other choices are possible, e.g. a quadratic model θ_0 + θ_1 (size) + θ_2 (size)², which eventually curves back down, or θ_0 + θ_1 (size) + θ_2 √(size), which keeps increasing but flattens out. A sketch of building such features is shown below.
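A minimal Octave sketch of constructing polynomial features before fitting; it reuses the featureNormalize sketch, and the assumption that the first column of data holds the size is illustrative.

```octave
% Sketch: polynomial regression is still linear regression over hand-built features.
sizes  = data(:, 1);                     % assuming the first column of data is the size
X_poly = [sizes, sizes.^2, sizes.^3];    % x1 = size, x2 = size^2, x3 = size^3
[X_poly, mu, sigma] = featureNormalize(X_poly);   % scaling matters: size^3 is huge
X_poly = [ones(size(X_poly, 1), 1) X_poly];       % add the x0 = 1 column after scaling
```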

Normal equation. Linear Regression with multiple variables. Machine Learning.

Gradient descent solves for θ iteratively. The normal equation is a method to solve for θ analytically, in one step.

Intuition: if θ were a single real number (1D), J(θ) would be a quadratic function of θ; to minimize it, set the derivative dJ(θ)/dθ to zero and solve for θ.
For θ ∈ ℝ^(n+1), set every partial derivative ∂J(θ)/∂θ_j to zero (for every j) and solve for θ_0, θ_1, …, θ_n.
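A sketch of where the closed-form solution comes from, using the vectorized cost defined earlier; this derivation is not spelled out on the slides.

```latex
J(\theta) = \frac{1}{2m}\,(X\theta - y)^\top (X\theta - y),
\qquad
\nabla_\theta J(\theta) = \frac{1}{m}\,X^\top (X\theta - y) = 0
\;\Longrightarrow\;
X^\top X\,\theta = X^\top y
\;\Longrightarrow\;
\theta = (X^\top X)^{-1} X^\top y .
```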

Examples (m = 4), with the x_0 = 1 column added:

x_0   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1     2104           5                    1                  45                    460
1     1416           3                    2                  40                    232
1     1534           3                    2                  30                    315
1      852           2                    1                  36                    178

X is the m × (n+1) matrix of the feature columns (including x_0); y is the m-vector of prices. Then θ = (XᵀX)⁻¹ Xᵀ y.

m examples (x^(1), y^(1)), …, (x^(m), y^(m)); n features.
Each x^(i) = [x_0^(i); x_1^(i); …; x_n^(i)] ∈ ℝ^(n+1).
Design matrix X (m × (n+1)): the i-th row is (x^(i))ᵀ; y = [y^(1); y^(2); …; y^(m)].
E.g. if h_θ(x) = θ_0 + θ_1 x_1 (one feature), then X = [1 x_1^(1); 1 x_1^(2); …; 1 x_1^(m)] and y = [y^(1); …; y^(m)].

θ = (XᵀX)⁻¹ Xᵀ y, where (XᵀX)⁻¹ is the inverse of the matrix XᵀX.
Octave: pinv(X'*X)*X'*y
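As a concrete Octave sketch, fitting price against size for the four examples in the table above; no feature scaling or learning rate is needed.

```octave
% Sketch: closed-form fit of price ($1000) against size (feet^2) with an intercept term.
sizes  = [2104; 1416; 1534; 852];
prices = [460; 232; 315; 178];
X = [ones(4, 1) sizes];            % x0 = 1, x1 = size
y = prices;
theta = pinv(X' * X) * X' * y;     % theta(1) is the intercept, theta(2) the slope
```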

m training examples, n features.

Gradient descent                     Normal equation
Need to choose α.                    No need to choose α.
Needs many iterations.               Don't need to iterate.
Works well even when n is large.     Need to compute (XᵀX)⁻¹, roughly O(n³); slow if n is very large.

Normal equation and non-invertibility (optional). Linear Regression with multiple variables. Machine Learning.

Normal equation: what if XᵀX is non-invertible (singular / degenerate)?
Octave: pinv(X'*X)*X'*y still computes a usable θ, because pinv uses the pseudo-inverse.

What if XᵀX is non-invertible? Common causes:
Redundant features (linearly dependent), e.g. x_1 = size in feet² and x_2 = size in m², so x_1 = (3.28)² x_2.
Too many features (e.g. m ≤ n): delete some features, or use regularization.
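A small Octave sketch of the redundant-feature case; inv(X'*X) would be unreliable here, while pinv still returns a minimum-norm solution. The 3.28 ft-per-metre conversion matches the slide's example; the data values are the ones from the earlier table.

```octave
% Sketch: a feature that is a constant multiple of another makes X'*X singular.
sizes_ft = [2104; 1416; 1534; 852];
sizes_m2 = sizes_ft / 3.28^2;           % size in m^2, linearly dependent on sizes_ft
X = [ones(4, 1) sizes_ft sizes_m2];
y = [460; 232; 315; 178];
theta = pinv(X' * X) * X' * y;          % still works; inv(X'*X) would not be reliable
```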