CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University
Course website:
4: Regression (continued)
CSC M.A. Papalaskari - Villanova University
The slides in this presentation are adapted from: The Stanford online ML course
Last time
Introduction to linear regression
Intuition: least squares approximation
Intuition: gradient descent algorithm
Hands on: simple example using Excel
Today
How to apply gradient descent to minimize the cost function for regression
Linear algebra refresher
Reminder: sample problem
[Figure: Housing Prices (Portland, OR). Price (in 1000s of dollars) plotted against Size (feet²).]
Reminder: Notation
Training set of housing prices (Portland, OR), with columns Size in feet² (x) and Price ($) in 1000's (y)
Notation:
m = number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
Reminder: Learning algorithm for hypothesis function h
Training Set → Learning Algorithm → h
h takes the size of a house and returns an estimated price.
Linear hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$ (univariate linear regression)
Gradient descent algorithm
Linear regression model
Today
How to apply gradient descent to minimize the cost function for regression
1. a closer look at the cost function
2. applying gradient descent to find the minimum of the cost function
Linear algebra refresher
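Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$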
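The hypothesis and cost function named on this slide can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual squared-error cost with the 1/(2m) factor; the function names and the tiny training set are hypothetical and are not the Portland data.

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost: 1/(2m) times the sum of squared prediction errors."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Hypothetical training set: four points lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect_fit_cost = cost(1.0, 2.0, xs, ys)  # parameters that match the data
poor_fit_cost = cost(0.0, 0.0, xs, ys)     # the all-zero hypothesis
print(perfect_fit_cost, poor_fit_cost)
```

The parameters that reproduce the data give a cost of zero, and any other choice gives a larger cost, which is why the goal is stated as a minimization.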
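Simplified ($\theta_0 = 0$):
Hypothesis: $h_\theta(x) = \theta_1 x$
Parameter: $\theta_1$
Cost function: $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_1} J(\theta_1)$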
[Figure, $\theta_0 = 0$, $h_\theta(x) = x$: left panel plots $h_\theta(x)$, y against x (for fixed $\theta_1$ this is a function of x); right panel plots the cost (a function of the parameter $\theta_1$).]
[Figure, $\theta_0 = 0$, $h_\theta(x) = 0.5x$: same two panels.]
[Figure, $\theta_0 = 0$, $h_\theta(x) = 0$: same two panels.]
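To make the three cases concrete, the sketch below evaluates the simplified cost (with the intercept fixed at 0) for the slopes 1, 0.5 and 0. The three-point training set is an assumption chosen for illustration; the data plotted on the slides did not survive in this transcript.

```python
def cost_simplified(theta1, xs, ys):
    """Cost J(theta1) for the hypothesis h(x) = theta1 * x (intercept fixed at 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Assumed training set: three points on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# One cost value per slope; each slope corresponds to one of the three figures.
costs = {theta1: cost_simplified(theta1, xs, ys) for theta1 in (1.0, 0.5, 0.0)}
for theta1, j in costs.items():
    print(f"theta1 = {theta1}: J = {j:.3f}")
```

On this data the cost is zero at slope 1 and grows as the slope moves away from 1, so plotting J against the slope traces out a bowl with its minimum at the best-fitting line.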
What if $\theta_0 \neq 0$?
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost function: $J(\theta_0, \theta_1)$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
[Figure: left panel plots $h_\theta(x) = x$ over the housing data, Price ($) in 1000's against Size in feet² (x) (for fixed $\theta_0, \theta_1$, this is a function of x); right panel plots the cost (a function of the parameters $\theta_0, \theta_1$).]
[Figures: a sequence of paired plots. Left panel: $h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of x). Right panel: the cost (a function of the parameters $\theta_0, \theta_1$).]
Have some function $J(\theta_0, \theta_1)$
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Gradient descent algorithm outline:
Start with some $\theta_0, \theta_1$
Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum
Have some function $J(\theta_0, \theta_1)$
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Gradient descent algorithm:
repeat until convergence { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ } (for j = 0 and j = 1)
$\alpha$ is the learning rate.
If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
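The effect of the learning rate is easy to see on a one-parameter toy problem. The sketch below is an illustration only: the cost J(θ) = θ², the starting point, the step count and the three values of α are all assumptions chosen to show the three behaviours.

```python
def gradient_descent_1d(alpha, theta=1.0, steps=20):
    """Minimize the toy cost J(theta) = theta**2 with a fixed learning rate."""
    for _ in range(steps):
        gradient = 2 * theta              # dJ/dtheta for J(theta) = theta**2
        theta = theta - alpha * gradient  # gradient descent update
    return theta


too_small = gradient_descent_1d(alpha=0.01)   # still far from the minimum at 0
reasonable = gradient_descent_1d(alpha=0.4)   # very close to 0 after 20 steps
too_large = gradient_descent_1d(alpha=1.1)    # overshoots further on every step
print(too_small, reasonable, too_large)
```

With α = 1.1 each update lands on the other side of the minimum and further away than before, so the iterates grow in magnitude instead of settling.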
[Figure: the current value of $\theta_1$ at a local minimum.]
Gradient descent can converge to a local minimum, even with the learning rate α fixed: the derivative shrinks as we approach the minimum, so the steps automatically get smaller.
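Gradient descent algorithm: repeat until convergence { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ } (for j = 0 and j = 1)
Linear regression model: $h_\theta(x) = \theta_0 + \theta_1 x$, with $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$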
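Gradient descent algorithm for linear regression:
repeat until convergence {
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
Update $\theta_0$ and $\theta_1$ simultaneously.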
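A minimal Python sketch of gradient descent for univariate linear regression follows, assuming the squared-error cost with the 1/(2m) factor. The function name, learning rate, iteration count and training set are hypothetical. The point to notice is the simultaneous update: both gradients are computed from the old parameters before either parameter changes.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction errors under the current (old) parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                               # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m    # dJ/dtheta1
        # Simultaneous update: neither new value is used to compute the other.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Hypothetical training set: points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

Updating one parameter first and then using its new value to compute the other gradient would be a different algorithm, which is why the tuple assignment matters.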
[Figures: plots of the cost function $J(\theta_0, \theta_1)$.]
[Figures: a sequence of nine paired plots. Left panel: $h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of x). Right panel: the cost (a function of the parameters $\theta_0, \theta_1$).]
“Batch” Gradient Descent
“Batch”: Each step of gradient descent uses all the training examples.
Alternative: process part of the dataset for each step of the algorithm.
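The difference between the two variants comes down to which examples feed each update. The sketch below is one possible arrangement, not the course's implementation: the function names, the shuffling scheme, the batch size of 2 and the training set are all assumptions.

```python
import random


def gradient(theta0, theta1, examples):
    """Average squared-error gradient over the given (x, y) examples."""
    n = len(examples)
    errors = [(theta0 + theta1 * x - y, x) for x, y in examples]
    grad0 = sum(e for e, _ in errors) / n
    grad1 = sum(e * x for e, x in errors) / n
    return grad0, grad1


def fit(examples, batch_size, alpha=0.1, epochs=2000, seed=0):
    """Gradient descent where each step uses batch_size examples."""
    rng = random.Random(seed)
    data = list(examples)
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new order each pass
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g0, g1 = gradient(theta0, theta1, batch)
            theta0, theta1 = theta0 - alpha * g0, theta1 - alpha * g1
    return theta0, theta1


# Hypothetical training set: points lying exactly on y = 1 + 2x.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
batch_fit = fit(examples, batch_size=len(examples))  # "batch": all examples per step
mini_batch_fit = fit(examples, batch_size=2)         # alternative: part of the data per step
print(batch_fit, mini_batch_fit)
```

On this noise-free data both variants reach the same line. On real data the partial-data variant takes noisier but cheaper steps.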
What’s next? We are not in univariate regression anymore:
[Table: training set with columns Size (feet²), Number of bedrooms, Number of floors, Age of home (years), Price ($1000).]
Linear Algebra Review
Matrix: rectangular array of numbers
Dimension of matrix: number of rows x number of columns, e.g. 4 x 2
Matrix elements (entries of matrix): $A_{ij}$ = “i, j entry”, in the i-th row, j-th column
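In code, a matrix can be held as a list of rows. The sketch below uses plain Python lists and a hypothetical 4 x 2 matrix; the helper `entry` is an assumption introduced to bridge the 1-indexed notation on the slide and Python's 0-indexed lists.

```python
# A hypothetical 4 x 2 matrix: 4 rows, 2 columns.
A = [
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
]

rows = len(A)      # number of rows
cols = len(A[0])   # number of columns


def entry(matrix, i, j):
    """Return the i, j entry using 1-indexed row i and column j."""
    return matrix[i - 1][j - 1]  # Python lists are 0-indexed


a_32 = entry(A, 3, 2)  # third row, second column
print(f"{rows} x {cols}", a_32)
```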
Another example: representing communication links in a network
[Figures: two networks on the nodes a, b, c, d, e, each shown with its adjacency matrix (rows and columns labeled a to e).]
Vector: an n x 1 matrix (an n-dimensional vector); $y_i$ = i-th element
1-indexed vs 0-indexed: the elements can be numbered $y_1, \dots, y_n$ or $y_0, \dots, y_{n-1}$
Matrix Addition
Scalar Multiplication
Combination of Operands
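The worked examples for these three slides did not survive, so here is a small stand-in. It is a sketch with hypothetical matrices and helper names: addition works entry by entry and requires equal dimensions, scalar multiplication scales every entry, and the two can be combined in one expression.

```python
def add(A, B):
    """Entry-by-entry sum of two matrices with the same dimensions."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrix addition needs matching dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


# Hypothetical 3 x 2 matrices.
A = [[1, 0], [2, 5], [3, 1]]
B = [[4, 1], [2, 5], [0, 1]]

total = add(A, B)               # matrix addition
tripled = scale(3, A)           # scalar multiplication
combined = add(scale(3, A), B)  # combination: 3 * A + B
print(total, tripled, combined)
```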
Matrix-vector multiplication
Details: A (m x n matrix: m rows, n columns) × x (n x 1 matrix: n-dimensional vector) = y (m-dimensional vector)
To get $y_i$, multiply A’s i-th row with the elements of vector x, and add them up.
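The row-by-row rule translates directly into code. This is a minimal sketch with plain Python lists; the function name and the example matrix and vector are hypothetical.

```python
def mat_vec(A, x):
    """Multiply an m x n matrix A by an n-dimensional vector x."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    # y_i: multiply row i of A with the elements of x and add them up.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


A = [[1, 3], [4, 0], [2, 1]]  # 3 x 2 matrix
x = [1, 5]                    # 2-dimensional vector
y = mat_vec(A, x)             # 3-dimensional vector
print(y)
```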
Example
House sizes:
Example matrix-matrix multiplication
Details: A (m x k matrix: m rows, k columns) × B (k x n matrix: k rows, n columns) = C (m x n matrix)
The i-th column of the matrix C is obtained by multiplying A with the i-th column of B (for i = 1, 2, …, n).
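The column-by-column description can be followed literally: take each column of B, multiply A by it as a matrix-vector product, and collect the results as the columns of C. A minimal sketch, with hypothetical function names and example matrices:

```python
def mat_vec(A, x):
    """Multiply matrix A by vector x, one row of A at a time."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Multiply an m x k matrix A by a k x n matrix B, column by column."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("columns of A must equal rows of B")
    n = len(B[0])
    # Column j of C is A times column j of B.
    columns = [mat_vec(A, [row[j] for row in B]) for j in range(n)]
    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*columns)]


A = [[1, 3, 2], [4, 0, 1]]    # 2 x 3
B = [[1, 3], [0, 1], [5, 2]]  # 3 x 2
C = mat_mul(A, B)             # 2 x 2
print(C)
```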
Example: Matrix-matrix multiplication
House sizes, with 3 competing hypotheses, combined in one matrix-matrix product.
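One way to read this example: put the house sizes in a data matrix with a leading column of ones, put each hypothesis in its own column of a parameter matrix, and a single matrix-matrix product then evaluates every hypothesis on every house. The sizes and the three parameter pairs below are hypothetical stand-ins, since the slide's values did not survive.

```python
def mat_mul(A, B):
    """Plain matrix-matrix product of A (m x k) and B (k x n)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Hypothetical house sizes in square feet.
sizes = [1000, 1500, 2000]
# Data matrix: one row per house, [1, size], so theta0 multiplies the 1.
X = [[1, size] for size in sizes]

# Three hypothetical hypotheses, one per column: (theta0, theta1).
# h1(x) = -40 + 0.25x, h2(x) = 200 + 0.125x, h3(x) = -150 + 0.5x
Theta = [
    [-40, 200, -150],    # theta0 for h1, h2, h3
    [0.25, 0.125, 0.5],  # theta1 for h1, h2, h3
]

# predictions[i][j] is hypothesis j applied to house i.
predictions = mat_mul(X, Theta)
for size, row in zip(sizes, predictions):
    print(size, row)
```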
Let A and B be matrices. Then in general, $A \times B \neq B \times A$ (not commutative).
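A quick check with two small matrices shows the two orders giving different products. The matrices and the helper name are hypothetical choices for illustration.

```python
def mat_mul(A, B):
    """Plain matrix-matrix product of A (m x k) and B (k x n)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[1, 1], [0, 0]]
B = [[0, 0], [2, 0]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB, BA)  # the two products differ
```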
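Identity Matrix
Denoted I (or $I_{n \times n}$ or $I_n$).
For any matrix A, $A \cdot I = I \cdot A = A$.
Examples of identity matrices:
2 x 2: $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
3 x 3: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
4 x 4: $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$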
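The sketch below builds identity matrices and checks the defining property on a non-square example. Note that for an m x n matrix the identity on the left is m x m and the one on the right is n x n. The helper names and the example matrix are hypothetical.

```python
def identity(n):
    """Return the n x n identity matrix: ones on the diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Plain matrix-matrix product of A (m x k) and B (k x n)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[1, 2, 3], [4, 5, 6]]      # 2 x 3 matrix
left = mat_mul(identity(2), A)  # I (2 x 2) times A
right = mat_mul(A, identity(3)) # A times I (3 x 3)
print(left == A, right == A)
```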
Matrix inverse: $A^{-1}$
If A is an m x m matrix, and if it has an inverse, then $A A^{-1} = A^{-1} A = I$.
Matrices that don’t have an inverse are “singular” or “degenerate”.
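For a 2 x 2 matrix the inverse has a closed form, which makes a compact illustration. This is a sketch limited to the 2 x 2 case with integer entries; the function names and example matrices are hypothetical, and exact fractions are used so the product comes out as the identity without rounding.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of a 2 x 2 integer matrix, or None if the matrix is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None  # singular (degenerate): no inverse exists
    return [
        [Fraction(d, det), Fraction(-b, det)],
        [Fraction(-c, det), Fraction(a, det)],
    ]


def mat_mul(A, B):
    """Plain matrix-matrix product of A (m x k) and B (k x n)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[4, 7], [2, 6]]
A_inv = inverse_2x2(A)
product = mat_mul(A, A_inv)                  # should be the 2 x 2 identity
no_inverse = inverse_2x2([[1, 2], [2, 4]])   # second row is twice the first
print(product, no_inverse)
```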
Matrix Transpose
Let A be an m x n matrix, and let $B = A^T$. Then B is an n x m matrix, and $B_{ij} = A_{ji}$.
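Since the slide's example matrix is missing, here is a small stand-in. The sketch swaps rows and columns of a hypothetical 2 x 3 matrix and checks the index rule entry by entry.

```python
def transpose(A):
    """Return the transpose of A: rows become columns."""
    return [list(column) for column in zip(*A)]


A = [[1, 2, 0], [3, 5, 9]]  # 2 x 3 matrix
B = transpose(A)            # 3 x 2 matrix

# Check the rule B_ij = A_ji for every entry (0-indexed here).
rule_holds = all(
    B[i][j] == A[j][i] for i in range(len(B)) for j in range(len(B[0]))
)
print(B, rule_holds)
```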
What’s next? We are not in univariate regression anymore:
[Table: training set with columns Size (feet²), Number of bedrooms, Number of floors, Age of home (years), Price ($1000).]