
1 CSC 4510 – Machine Learning, 4: Regression (continued). Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University. Course website: www.csc.villanova.edu/~map/4510/ The slides in this presentation are adapted from the Stanford online ML course, http://www.ml-class.org/

2 Last time: Introduction to linear regression. Intuition: least squares approximation. Intuition: gradient descent algorithm. Hands on: simple example using Excel.

3 Today: How to apply gradient descent to minimize the cost function for regression. Linear algebra refresher.

4 Reminder: sample problem. Housing prices (Portland, OR): scatter plot of price (in 1000s of dollars) vs. size (feet²).

5 Reminder: notation. m = number of training examples; x = “input” variable / features; y = “output” variable / “target” variable.

Training set of housing prices (Portland, OR):

Size in feet² (x)   Price ($) in 1000s (y)
2104                460
1416                232
1534                315
852                 178
...                 ...

6 Reminder: learning algorithm for hypothesis function h. Training set → learning algorithm → h; h maps the size of a house to an estimated price. Linear hypothesis: h_θ(x) = θ₀ + θ₁x (univariate linear regression).

8 Gradient descent algorithm, applied to the linear regression model.

Linear regression model: h_θ(x) = θ₀ + θ₁x, with cost function J(θ₀, θ₁) = (1/2m) Σᵢ (h_θ(x^(i)) − y^(i))².

Gradient descent: repeat until convergence { θⱼ := θⱼ − α · ∂/∂θⱼ J(θ₀, θ₁), for j = 0 and j = 1 }.

9 Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function. Linear algebra refresher.

10 Hypothesis: h_θ(x) = θ₀ + θ₁x. Parameters: θ₀, θ₁. Cost function: J(θ₀, θ₁) = (1/2m) Σᵢ₌₁…m (h_θ(x^(i)) − y^(i))². Goal: minimize J(θ₀, θ₁) over θ₀, θ₁.
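The cost function on this slide can be sketched in plain Python, using the four-example training set from slide 5:

```python
# Squared-error cost J(theta0, theta1) for univariate linear regression,
# on the training set from slide 5 (size in ft^2, price in $1000s).
xs = [2104, 1416, 1534, 852]
ys = [460, 232, 315, 178]

def h(theta0, theta1, x):
    # Hypothesis: h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    # J(theta0, theta1) = (1 / 2m) * sum_i (h(x_i) - y_i)^2
    m = len(xs)
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(0, 0, xs, ys))  # 49541.625 -- the all-zero hypothesis is a poor fit
```

Any parameters that track the data at all (e.g. θ₁ = 0.2) give a much smaller J, which is what gradient descent will exploit.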

11 Simplified (set θ₀ = 0): Hypothesis: h_θ(x) = θ₁x. Parameter: θ₁. Cost function: J(θ₁) = (1/2m) Σᵢ (θ₁x^(i) − y^(i))². Goal: minimize J(θ₁) over θ₁.

12 Plots (θ₀ = 0, θ₁ = 1): left, h_θ(x) = x over the data (for fixed θ₁ this is a function of x); right, the corresponding value of J(θ₁) (a function of the parameter θ₁).

13 Plots (θ₀ = 0, θ₁ = 0.5): left, h_θ(x) = 0.5x over the data; right, J(0.5) marked on the J(θ₁) curve.

14 Plots (θ₀ = 0, θ₁ = 0): left, h_θ(x) = 0; right, J(0) marked on the J(θ₁) curve.
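The three J(θ₁) values on slides 12–14 can be reproduced numerically. The dataset (1, 1), (2, 2), (3, 3) below is an assumption (the plot data itself is not recoverable from the transcript), chosen so that θ₁ = 1 fits exactly:

```python
# J(theta1) for the simplified hypothesis h(x) = theta1 * x (theta0 = 0),
# evaluated at the three slopes shown on slides 12-14.
# The dataset (1,1), (2,2), (3,3) is an assumed stand-in for the plot data.
xs = [1, 2, 3]
ys = [1, 2, 3]

def cost_simplified(theta1):
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

for theta1 in (1.0, 0.5, 0.0):
    print(theta1, cost_simplified(theta1))
# theta1 = 1.0 fits this data exactly, so J(1.0) = 0;
# J(0.5) and J(0.0) are progressively worse.
```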

15 What if θ₀ ≠ 0? Back to the full problem. Hypothesis: h_θ(x) = θ₀ + θ₁x. Parameters: θ₀, θ₁. Cost function: J(θ₀, θ₁) = (1/2m) Σᵢ (h_θ(x^(i)) − y^(i))². Goal: minimize J(θ₀, θ₁) over θ₀, θ₁.

16 Plot: h_θ(x) = 10 + 0.1x over the housing data, price ($) in 1000s vs. size in feet² (for fixed θ₀, θ₁ this is a function of x); alongside, J(θ₀, θ₁) (a function of the parameters θ₀, θ₁).

17 Plot: the cost J(θ₀, θ₁) as a 3-D surface over the (θ₀, θ₁) plane.

18–21 Plots: contour plot of J(θ₀, θ₁) (a function of the parameters θ₀, θ₁) next to the hypothesis line for several choices of (θ₀, θ₁) (for fixed θ₀, θ₁ this is a function of x); points closer to the contour minimum correspond to lines that fit the data better.

22 Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function. Linear algebra refresher.

23 Have some function J(θ₀, θ₁); want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm outline: start with some (θ₀, θ₁) (e.g. θ₀ = 0, θ₁ = 0); keep changing θ₀ and θ₁ to reduce J(θ₀, θ₁) until we hopefully end up at a minimum.

24 Have some function J(θ₀, θ₁); want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm: repeat until convergence { θⱼ := θⱼ − α · ∂/∂θⱼ J(θ₀, θ₁), simultaneously for j = 0 and j = 1 }.

25 In the update rule, α is the learning rate (it controls the step size) and ∂/∂θⱼ J(θ₀, θ₁) is the derivative term (it gives the step direction).

26 If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum: it may fail to converge, or even diverge.
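Both failure modes are easy to see on a one-dimensional example. This sketch (not from the slides) runs gradient descent on J(θ) = θ², whose derivative is 2θ and whose minimum is at 0:

```python
# Effect of the learning rate alpha on gradient descent for J(theta) = theta^2,
# starting from theta = 1. Update: theta := theta - alpha * J'(theta).
def descend(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(descend(0.01))  # too small: after 20 steps, still far from the minimum at 0
print(descend(0.4))   # reasonable: essentially at 0
print(descend(1.1))   # too large: |theta| grows each step -- divergence
```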

27 At a local minimum the derivative term is zero, so the update θ₁ := θ₁ − α · 0 leaves the current value of θ₁ unchanged: gradient descent stops moving.

28 Gradient descent can converge to a local minimum even with the learning rate α fixed: as we approach a minimum, the derivative term shrinks, so the steps automatically become smaller.

29 Applying gradient descent to the linear regression model: computing the partial derivatives of the squared-error cost gives ∂/∂θ₀ J(θ₀, θ₁) = (1/m) Σᵢ (h_θ(x^(i)) − y^(i)) and ∂/∂θ₁ J(θ₀, θ₁) = (1/m) Σᵢ (h_θ(x^(i)) − y^(i)) · x^(i).

30 Gradient descent algorithm for linear regression: repeat { θ₀ := θ₀ − α (1/m) Σᵢ (h_θ(x^(i)) − y^(i)); θ₁ := θ₁ − α (1/m) Σᵢ (h_θ(x^(i)) − y^(i)) · x^(i) }, updating θ₀ and θ₁ simultaneously.
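The update rules above can be sketched directly in Python, on the four-example training set from slide 5. The learning rate and iteration count are illustrative choices, not values from the course:

```python
# Batch gradient descent for univariate linear regression, with the
# simultaneous update of theta0 and theta1 from slide 30.
xs = [2104, 1416, 1534, 852]
ys = [460, 232, 315, 178]

def gradient_step(theta0, theta1, alpha):
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m                            # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
    # Simultaneous update: both gradients use the OLD theta values.
    return theta0 - alpha * grad0, theta1 - alpha * grad1

theta0, theta1 = 0.0, 0.0
for _ in range(1000):
    theta0, theta1 = gradient_step(theta0, theta1, alpha=1e-7)
print(theta0, theta1)  # theta1 settles near the ~0.2 slope of this dataset
```

The tiny α is needed because the unscaled sizes (hundreds to thousands of ft²) make the θ₁ gradient very large; feature scaling would allow a larger learning rate.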

31–33 Plots: the surface and contour plots of J(θ₀, θ₁), with θ₀ and θ₁ on the horizontal axes, showing the path gradient descent takes toward the minimum.

34–42 Plots: a sequence of gradient descent iterations. Each frame pairs the current hypothesis line over the housing data (for fixed θ₀, θ₁, a function of x) with the current position on the contour plot of J(θ₀, θ₁) (a function of the parameters θ₀, θ₁); the line fits the data better as the trajectory approaches the minimum.

43 “Batch” gradient descent. “Batch”: each step of gradient descent uses all the training examples. Alternative: process part of the dataset for each step of the algorithm. The slides in this presentation are adapted from the Stanford online ML course, http://www.ml-class.org/
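The “alternative” the slide mentions can be sketched as a mini-batch variant: each step estimates the gradient from a random subset of the examples instead of all of them. This is an illustrative sketch, not an implementation from the course:

```python
import random

# Mini-batch gradient descent: each step uses only `batch_size` randomly
# chosen training examples, so a step is cheaper than a full batch step.
xs = [2104, 1416, 1534, 852]
ys = [460, 232, 315, 178]

def minibatch_step(theta0, theta1, alpha, batch_size=2, rng=random):
    batch = rng.sample(list(zip(xs, ys)), batch_size)
    b = len(batch)
    errors = [(theta0 + theta1 * x - y, x) for x, y in batch]
    grad0 = sum(e for e, _ in errors) / b
    grad1 = sum(e * x for e, x in errors) / b
    return theta0 - alpha * grad0, theta1 - alpha * grad1

random.seed(0)  # deterministic batches for reproducibility
theta0, theta1 = 0.0, 0.0
for _ in range(2000):
    theta0, theta1 = minibatch_step(theta0, theta1, alpha=1e-7)
print(theta0, theta1)  # noisier than batch descent, but ends near the same slope
```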

44 What’s next? We are not in univariate regression anymore:

Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                    45                    460
1416           3                    2                    40                    232
1534           3                    2                    30                    315
852            2                    1                    36                    178

46 Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function. Linear algebra refresher.

47 Linear Algebra Review

48 Matrix: a rectangular array of numbers. Dimension of a matrix: number of rows × number of columns, e.g. 4 × 2. Matrix elements (entries): the “i, j entry” is the one in the i-th row and j-th column.

49 Another example: representing communication links in a network of nodes a–e. Adjacency matrix: entry (i, j) gives the weight of the link between node i and node j (0 = no link). Two example networks with their adjacency matrices (the first matrix is symmetric; the second is not):

   a b c d e        a b c d e
a  0 1 2 0 3     a  0 1 0 0 2
b  1 0 0 0 0     b  0 1 0 0 0
c  2 0 0 1 1     c  1 0 0 1 0
d  0 0 1 0 1     d  0 0 1 0 1
e  3 0 1 1 0     e  0 0 0 0 0
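The first adjacency matrix from this slide translates directly to nested lists; the `neighbors` helper below is an illustrative addition, not from the slides:

```python
# The symmetric adjacency matrix from slide 49 as nested lists:
# adj[i][j] is the weight of the link between node i and node j (0 = no link).
nodes = ["a", "b", "c", "d", "e"]
adj = [
    [0, 1, 2, 0, 3],  # a
    [1, 0, 0, 0, 0],  # b
    [2, 0, 0, 1, 1],  # c
    [0, 0, 1, 0, 1],  # d
    [3, 0, 1, 1, 0],  # e
]

def neighbors(node):
    # Nodes with a nonzero entry in this node's row are directly linked to it.
    i = nodes.index(node)
    return [nodes[j] for j, w in enumerate(adj[i]) if w != 0]

print(neighbors("a"))  # ['b', 'c', 'e']
```

Symmetry (adj[i][j] == adj[j][i]) is what makes this matrix describe undirected links.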

50 Vector: an n × 1 matrix, also called an n-dimensional vector. The i-th element of a vector y is written yᵢ.

51 Vectors may be 1-indexed (elements y₁ … yₙ, the usual math convention) or 0-indexed (elements y₀ … yₙ₋₁, the usual programming convention).

52 Matrix addition: add the corresponding entries; the two matrices must have the same dimensions.

53 Scalar multiplication: multiply every entry of the matrix by the scalar.

54 Combination of operands: the two operations can be mixed, e.g. 3A + B.
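Slides 52–54 can be sketched with plain nested lists; the example matrices are illustrative, not the slides’ own:

```python
# Matrix addition and scalar multiplication on nested lists.
def mat_add(A, B):
    # Entrywise sum; A and B must have the same dimensions.
    assert len(A) == len(B) and len(A[0]) == len(B[0])
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scalar_mul(c, A):
    # Multiply every entry by the scalar c.
    return [[c * a for a in row] for row in A]

A = [[1, 0], [2, 5], [3, 1]]
B = [[4, 0.5], [2, 5], [0, 1]]
print(mat_add(A, B))       # [[5, 0.5], [4, 10], [3, 2]]
print(scalar_mul(3, A))    # [[3, 0], [6, 15], [9, 3]]
# A combination of operands, e.g. 3A + B:
print(mat_add(scalar_mul(3, A), B))
```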

55 Matrix-vector multiplication.

56 Details: an m × n matrix A (m rows, n columns) times an n × 1 matrix x (an n-dimensional vector) gives an m-dimensional vector y. To get yᵢ, multiply A’s i-th row element-wise with the vector x and add the products up.
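The row-by-row rule on slide 56 translates directly to code; the 3 × 2 example matrix is illustrative:

```python
# Matrix-vector product: an m x n matrix A times an n-vector x gives an
# m-vector y, where y_i is the dot product of A's i-th row with x.
def mat_vec(A, x):
    assert all(len(row) == len(x) for row in A)
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 3], [4, 0], [2, 1]]  # 3 x 2 matrix
x = [1, 5]                    # 2-dimensional vector
print(mat_vec(A, x))          # [16, 4, 7]
```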

57 Example: a worked matrix-vector product.

58 House sizes: to evaluate h_θ(x) = θ₀ + θ₁x on every house at once, form the m × 2 matrix whose rows are [1, x^(i)] and multiply it by the vector [θ₀; θ₁]; the result is the m-vector of predicted prices.
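A sketch of slide 58’s trick; the parameter values below are illustrative (the slide’s own θ values are not preserved in the transcript):

```python
# Predicting all house prices with one matrix-vector product: prepend a 1 to
# each size to form the design matrix, then multiply by [theta0, theta1].
sizes = [2104, 1416, 1534, 852]
design = [[1, s] for s in sizes]   # m x 2 matrix, rows [1, x_i]
theta = [-40, 0.25]                # hypothetical [theta0, theta1]

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

predictions = mat_vec(design, theta)
print(predictions)  # h(x) = -40 + 0.25 * size for each house
```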

59 Example: matrix-matrix multiplication.

60 Details: an m × k matrix A (m rows, k columns) times a k × n matrix B (k rows, n columns) gives an m × n matrix C. The i-th column of C is obtained by multiplying A with the i-th column of B (for i = 1, 2, …, n).

61 Example: a worked matrix-matrix product.

62 House sizes: given three competing hypotheses h_θ(x) = θ₀ + θ₁x (three different parameter vectors), evaluate all of them on every house at once by multiplying the m × 2 data matrix by the 2 × 3 matrix whose columns are the three parameter vectors.
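A sketch of slide 62’s construction; all three sets of θ values are illustrative, not the slide’s:

```python
# Evaluating three competing hypotheses on all houses at once: multiply the
# m x 2 design matrix by a 2 x 3 matrix whose columns are the parameter vectors.
def mat_mul(A, B):
    # C[i][j] = dot product of A's i-th row and B's j-th column.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

sizes = [2104, 1416, 1534, 852]
design = [[1, s] for s in sizes]       # m x 2
thetas = [[-40, 200, -150],            # row of hypothetical theta0 values
          [0.25, 0.1, 0.4]]            # row of hypothetical theta1 values

predictions = mat_mul(design, thetas)  # m x 3: one column per hypothesis
print(predictions[0])  # the first house's price under each hypothesis
```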

63 Let A and B be matrices. Then in general A × B ≠ B × A (matrix multiplication is not commutative).
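A quick check with two illustrative 2 × 2 matrices:

```python
# Matrix multiplication is not commutative: AB != BA in general.
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 1], [0, 0]]
B = [[0, 0], [2, 0]]
print(mat_mul(A, B))  # [[2, 0], [0, 0]]
print(mat_mul(B, A))  # [[0, 0], [2, 2]]
```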

64 Example: let A and B be specific matrices; compute A × B and B × A to verify that they differ. (Matrix multiplication is associative, however: (A × B) × C = A × (B × C).)

65 Identity matrix, denoted I (or Iₙₓₙ or Iₙ): the square matrix with 1s on the diagonal and 0s elsewhere. For any matrix A, A · I = I · A = A. Examples of identity matrices: 2 × 2, 3 × 3, 4 × 4.

66 Matrix inverse, A⁻¹: if A is an m × m (square) matrix and it has an inverse, then A · A⁻¹ = A⁻¹ · A = I. Matrices that don’t have an inverse are called “singular” or “degenerate”.

67 Matrix transpose: let A be an m × n matrix, and let B = Aᵀ. Then B is an n × m matrix with Bᵢⱼ = Aⱼᵢ; the rows of A become the columns of Aᵀ.
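Slides 65–67 can be checked together in code. The example matrices are illustrative; the 2 × 2 inverse uses the standard determinant formula, which is an addition beyond what the slides show:

```python
# Identity, inverse, and transpose for small matrices as nested lists.
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    # B = A^T: B[i][j] = A[j][i]; an m x n matrix becomes n x m.
    return [list(col) for col in zip(*A)]

def inverse_2x2(A):
    # Standard 2x2 formula: the inverse exists only when det = ad - bc != 0;
    # otherwise A is "singular" ("degenerate").
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "singular matrix"
    return [[d / det, -b / det], [-c / det, a / det]]

I2 = [[1, 0], [0, 1]]
A = [[3, 4], [2, 16]]
print(mat_mul(A, inverse_2x2(A)))         # the 2 x 2 identity (up to rounding)
print(transpose([[1, 2, 0], [3, 5, 9]]))  # [[1, 3], [2, 5], [0, 9]]
```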

68 What’s next? We are not in univariate regression anymore: the housing data has multiple features (size, number of bedrooms, number of floors, age of home) predicting price.

