CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari
Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/

4: Regression (continued)

The slides in this presentation are adapted from the Stanford online ML course: http://www.ml-class.org/
Last time
- Introduction to linear regression
- Intuition: least squares approximation
- Intuition: gradient descent algorithm
- Hands-on: simple example using Excel
Today
- How to apply gradient descent to minimize the cost function for regression
- Linear algebra refresher
Reminder: sample problem
Housing prices (Portland, OR): plot of price (in 1000s of dollars) against size (feet²).
Reminder: Notation
m = number of training examples
x = "input" variable / features
y = "output" variable / "target" variable

Training set of housing prices (Portland, OR):
Size in feet² (x) | Price ($) in 1000s (y)
2104 | 460
1416 | 232
1534 | 315
852 | 178
...
Reminder: learning algorithm for hypothesis function h
Training set → learning algorithm → h
h maps the size of a house to an estimated price.
Linear hypothesis: hθ(x) = θ0 + θ1x (univariate linear regression)
Gradient descent algorithm:
repeat until convergence {
  θj := θj − α · ∂J(θ0, θ1)/∂θj   (simultaneously for j = 0 and j = 1)
}

Linear regression model:
hθ(x) = θ0 + θ1x
J(θ0, θ1) = (1/2m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Today
- How to apply gradient descent to minimize the cost function for regression
  1. A closer look at the cost function
  2. Applying gradient descent to find the minimum of the cost function
- Linear algebra refresher
Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Goal: minimize J(θ0, θ1) over θ0, θ1
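The cost function above can be sketched in a few lines of NumPy (function and variable names are illustrative; the sample data is the housing training set from the earlier slide):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)
    predictions = theta0 + theta1 * x      # h_theta(x) for every training example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Training set from the housing example (size in feet^2, price in $1000s)
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

print(compute_cost(x, y, 0.0, 0.0))   # cost of the all-zero hypothesis
```

A perfect hypothesis gives J = 0; worse fits give larger values, which is what gradient descent will minimize.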
Simplified version (θ0 = 0):
Hypothesis: hθ(x) = θ1x
Parameter: θ1
Cost function: J(θ1) = (1/2m) Σᵢ (θ1x⁽ⁱ⁾ − y⁽ⁱ⁾)²
Goal: minimize J(θ1)
With θ0 = 0, each choice of θ1 gives one line through the origin and one cost value. [Plots for hθ(x) = x, hθ(x) = 0.5x, and hθ(x) = 0: on the left, hθ(x) as a function of x for fixed θ1; on the right, J as a function of the parameter θ1.]
What if θ0 ≠ 0? Back to the full problem:
Hypothesis: hθ(x) = θ0 + θ1x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
Goal: minimize J(θ0, θ1)
Example with both parameters: hθ(x) = 10 + 0.1x, price ($ in 1000s) against size (feet²). [Series of plots: on the left, hθ(x) as a function of x for fixed θ0, θ1; on the right, J(θ0, θ1) as a function of the parameters θ0, θ1, shown as a surface and as contour plots for several (θ0, θ1) settings.]
Today
- How to apply gradient descent to minimize the cost function for regression
  1. A closer look at the cost function
  2. Applying gradient descent to find the minimum of the cost function
- Linear algebra refresher
Have some function J(θ0, θ1); want the values of θ0, θ1 that minimize J(θ0, θ1).
Gradient descent algorithm outline:
- Start with some θ0, θ1
- Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum
Gradient descent algorithm:
repeat until convergence {
  θj := θj − α · ∂J(θ0, θ1)/∂θj   (for j = 0 and j = 1)
}
where α is the learning rate, which controls the size of each step.
If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
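The effect of α can be seen on a toy one-parameter cost J(θ) = θ², whose gradient is 2θ (this function and the α values are illustrative assumptions, not from the slides):

```python
def descend(theta, alpha, steps):
    """Run gradient descent on J(theta) = theta^2, whose gradient is 2*theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta   # theta := theta - alpha * dJ/dtheta
    return theta

print(descend(1.0, 0.1, 50))   # small alpha: theta shrinks toward the minimum at 0
print(descend(1.0, 1.1, 50))   # too-large alpha: each step overshoots, |theta| grows
```

With α = 0.1 each step multiplies θ by 0.8, so it converges; with α = 1.1 each step multiplies θ by −1.2, so it diverges.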
At a local minimum the derivative is zero, so the update leaves the current value of θ1 unchanged: gradient descent stops moving, as it should.
Gradient descent can converge to a local minimum even with the learning rate α fixed: as we approach a minimum, the derivative term shrinks, so the steps automatically get smaller.
Gradient descent for the linear regression model. Working out the partial derivatives of J gives:
repeat until convergence {
  θ0 := θ0 − α · (1/m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)
  θ1 := θ1 − α · (1/m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
}
Update θ0 and θ1 simultaneously.
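The two update rules, with the simultaneous update, can be sketched as follows (function names and the toy data are illustrative; the data is generated from y = 1 + 2x so the correct answer is known):

```python
import numpy as np

def gradient_descent(x, y, alpha, iters):
    """Batch gradient descent for h(x) = theta0 + theta1*x, updating both parameters simultaneously."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        error = theta0 + theta1 * x - y              # h_theta(x^(i)) - y^(i) for all i
        new_theta0 = theta0 - alpha * error.sum() / m
        new_theta1 = theta1 - alpha * (error * x).sum() / m
        theta0, theta1 = new_theta0, new_theta1      # simultaneous update
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])     # exactly y = 1 + 2x
t0, t1 = gradient_descent(x, y, alpha=0.1, iters=5000)
print(t0, t1)                      # approaches theta0 = 1, theta1 = 2
```

Note the temporaries: computing new_theta0 and new_theta1 before assigning either is what makes the update simultaneous.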
[Surface and contour plots of the cost function J(θ0, θ1).]
[Series of plots showing gradient descent in action: on the left, the current hypothesis hθ(x) as a function of x for fixed θ0, θ1; on the right, a contour plot of J as a function of the parameters θ0, θ1, with successive iterates descending toward the minimum.]
"Batch" gradient descent: each step of gradient descent uses all the training examples. Alternative: process part of the dataset for each step of the algorithm (as in stochastic or mini-batch gradient descent).
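The "process part of the dataset" alternative can be sketched as mini-batch gradient descent. This is a hypothetical implementation, not from the slides; the batch size, learning rate, and toy data (generated from y = 1 + 2x) are all assumptions:

```python
import numpy as np

def minibatch_step(x, y, theta0, theta1, alpha, batch):
    """One gradient step using only the examples indexed by `batch`."""
    xb, yb = x[batch], y[batch]
    m = len(batch)
    error = theta0 + theta1 * xb - yb
    return (theta0 - alpha * error.sum() / m,
            theta1 - alpha * (error * xb).sum() / m)

rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1                     # exactly y = 1 + 2x
theta0, theta1 = 0.0, 0.0
for _ in range(5000):
    batch = rng.choice(len(x), size=2, replace=False)   # 2 of the 4 examples per step
    theta0, theta1 = minibatch_step(x, y, theta0, theta1, 0.05, batch)
print(theta0, theta1)             # approaches theta0 = 1, theta1 = 2
```

Each step is cheaper than a batch step because it touches only part of the data, at the cost of noisier updates.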
What's next? We are not in univariate regression anymore:
Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178
Today
- How to apply gradient descent to minimize the cost function for regression
  1. A closer look at the cost function
  2. Applying gradient descent to find the minimum of the cost function
- Linear algebra refresher
Linear Algebra Review
Matrix: rectangular array of numbers.
Dimension of matrix: number of rows × number of columns, e.g., 4 × 2.
Matrix elements (entries of matrix): the "i, j entry" is in the i-th row, j-th column.
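In NumPy this looks as follows (the entries are illustrative; note that NumPy indexes from 0, while the "i, j entry" convention above indexes from 1):

```python
import numpy as np

# A 4 x 2 matrix: 4 rows, 2 columns
A = np.array([[1402, 191],
              [1371, 821],
              [949, 1437],
              [147, 1448]])

print(A.shape)    # (4, 2): dimension = rows x columns
print(A[0, 1])    # the "1, 2 entry" in math notation: row 1, column 2 -> 191
```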
Another example: representing communication links in a network of nodes a–e.
Adjacency matrices (rows and columns in order a, b, c, d, e):

Network 1:        Network 2:
  a b c d e         a b c d e
a 0 1 2 0 3       a 0 1 0 0 2
b 1 0 0 0 0       b 0 1 0 0 0
c 2 0 0 1 1       c 1 0 0 1 0
d 0 0 1 0 1       d 0 0 1 0 1
e 3 0 1 1 0       e 0 0 0 0 0
Vector: an n × 1 matrix, also called an n-dimensional vector; yᵢ denotes the i-th element. Vectors can be 1-indexed (elements y₁ … yₙ, the usual convention in math) or 0-indexed (y₀ … yₙ₋₁, the usual convention in programming).
Matrix addition: matrices of the same dimension are added element-wise.
Scalar multiplication: a scalar times a matrix multiplies every entry by that scalar.
Combination of operands: these operations can be combined, e.g., 3A + B − C/2.
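The three operations above, in NumPy (the matrices are illustrative):

```python
import numpy as np

A = np.array([[1.0, 0.0], [2.0, 5.0], [3.0, 1.0]])
B = np.array([[4.0, 0.5], [2.0, 5.0], [0.0, 1.0]])

print(A + B)              # element-wise addition (same dimensions required)
print(3 * A)              # scalar multiplication: every entry times 3
print(3 * A + B - A / 2)  # a combination of operands
```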
Matrix-vector multiplication. Details: an m × n matrix A (m rows, n columns) times an n × 1 matrix x (an n-dimensional vector) gives an m-dimensional vector y. To get yᵢ, multiply A's i-th row with the elements of vector x, and add them up: yᵢ = Σⱼ Aᵢⱼ xⱼ.
Example:
[[1, 3], [4, 0], [2, 1]] × [1, 5]ᵀ = [1·1 + 3·5, 4·1 + 0·5, 2·1 + 1·5]ᵀ = [16, 4, 7]ᵀ
House sizes: with hypothesis hθ(x) = θ0 + θ1x, the predicted prices for all houses can be computed with a single matrix-vector multiplication: put each house as a row [1, sizeᵢ] in a data matrix and multiply by the parameter vector [θ0, θ1]ᵀ.
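The same trick in NumPy, using the house sizes from the training set (the parameter values θ0 = −40, θ1 = 0.25 are an illustrative assumption):

```python
import numpy as np

sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
X = np.column_stack([np.ones(len(sizes)), sizes])   # each row is [1, size_i]

theta = np.array([-40.0, 0.25])   # hypothetical parameters: h(x) = -40 + 0.25x
predictions = X @ theta           # all predictions in one matrix-vector product
print(predictions)                # 486, 314, 343.5, 173 (in $1000s)
```

This is usually faster and cleaner than looping over the examples one at a time.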
Matrix-matrix multiplication. Details: an m × k matrix (m rows, k columns) times a k × n matrix (k rows, n columns) gives an m × n matrix. The i-th column of the matrix C = AB is obtained by multiplying A with the i-th column of B (for i = 1, 2, …, n). [Worked example.]
House sizes again: suppose we have 3 competing hypotheses (three different (θ0, θ1) settings). Putting each hypothesis's parameters in one column of a matrix lets a single matrix-matrix multiplication compute the predictions of every hypothesis for every house at once.
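A sketch of this in NumPy (the three (θ0, θ1) settings are made-up values for illustration):

```python
import numpy as np

sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
X = np.column_stack([np.ones(len(sizes)), sizes])   # 4 x 2 data matrix

# Each column holds the (theta0, theta1) of one hypothetical hypothesis
Theta = np.array([[-40.0, 200.0, -150.0],
                  [0.25, 0.1, 0.4]])                # 2 x 3

predictions = X @ Theta   # 4 x 3: column i = predictions of hypothesis i
print(predictions.shape)  # (4, 3): 4 houses x 3 hypotheses
```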
Let A and B be matrices. Then in general, AB ≠ BA: matrix multiplication is not commutative.
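A quick check of non-commutativity (the matrices are illustrative):

```python
import numpy as np

A = np.array([[1, 1], [0, 0]])
B = np.array([[0, 0], [2, 0]])

print(A @ B)   # [[2, 0], [0, 0]]
print(B @ A)   # [[0, 0], [2, 2]]  -- a different matrix
```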
Exercise: let A and B be the matrices above; compute AB and BA and verify that they differ.
Identity matrix: denoted I (or Iₙₓₙ or Iₙ); ones on the diagonal, zeros elsewhere. For any matrix A, A·I = I·A = A. Examples: the 2 × 2, 3 × 3, and 4 × 4 identity matrices.
Matrix inverse: if A is an m × m matrix and it has an inverse A⁻¹, then A·A⁻¹ = A⁻¹·A = I. Matrices that don't have an inverse are "singular" or "degenerate".
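Identity and inverse in NumPy (the invertible matrix is illustrative):

```python
import numpy as np

A = np.array([[3.0, 4.0], [2.0, 16.0]])
A_inv = np.linalg.inv(A)

print(np.allclose(A @ A_inv, np.eye(2)))   # True: A times its inverse is I
print(np.allclose(A_inv @ A, np.eye(2)))   # True

# A singular ("degenerate") matrix has no inverse; for example,
# np.linalg.inv(np.array([[1.0, 2.0], [2.0, 4.0]])) raises LinAlgError.
```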
Matrix transpose: let A be an m × n matrix, and let B = Aᵀ. Then B is an n × m matrix, and Bᵢⱼ = Aⱼᵢ. Example: the rows of A become the columns of Aᵀ.
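Transpose in NumPy (the matrix is illustrative):

```python
import numpy as np

A = np.array([[1, 2, 0],
              [3, 5, 9]])   # 2 x 3

B = A.T                     # transpose: 3 x 2, with B[i, j] == A[j, i]
print(B.shape)              # (3, 2)
print(B)                    # rows of A have become columns of B
```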
What's next? We are not in univariate regression anymore:
Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
2104 | 5 | 1 | 45 | 460
1416 | 3 | 2 | 40 | 232
1534 | 3 | 2 | 30 | 315
852 | 2 | 1 | 36 | 178