CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/

Presentation transcript:

CSC 4510 – Machine Learning
Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University
Course website: www.csc.villanova.edu/~map/4510/
4: Regression (continued)
The slides in this presentation are adapted from the Stanford online ML course.
CSC 4510 - M.A. Papalaskari - Villanova University

Last time
- Introduction to linear regression
- Intuition: least squares approximation
- Intuition: gradient descent algorithm
- Hands on: simple example using Excel

Today
- How to apply gradient descent to minimize the cost function for regression
- Linear algebra refresher

Reminder: sample problem
[Figure: Housing Prices (Portland, OR); price (in 1000s of dollars) plotted against size (feet²).]

Reminder: notation
Training set of housing prices (Portland, OR): a table with the columns Size in feet² (x) and Price ($) in 1000's (y), with rows …
Notation:
- m = number of training examples
- x's = “input” variable / features
- y's = “output” variable / “target” variable

Reminder: learning algorithm for hypothesis function h
Training set → learning algorithm → h. The hypothesis h takes the size of a house and returns an estimated price.
Linear hypothesis (univariate linear regression): $h_\theta(x) = \theta_0 + \theta_1 x$

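Gradient descent algorithm: repeat until convergence { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (for j = 0 and j = 1) }
Linear regression model: $h_\theta(x) = \theta_0 + \theta_1 x$, with cost $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$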

Today
- How to apply gradient descent to minimize the cost function for regression
  1. a closer look at the cost function
  2. applying gradient descent to find the minimum of the cost function
- Linear algebra refresher

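Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$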
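
The hypothesis and cost function can be sketched in a few lines of Python. This is only an illustration: it assumes the usual univariate linear hypothesis and the squared-error cost with a 1/(2m) factor, and the training set is made up (it is not the Portland housing data).

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum of squared errors."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Made-up training set lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect = cost(1.0, 2.0, xs, ys)  # the line passes through every point
worse = cost(0.0, 2.0, xs, ys)    # every prediction is off by exactly 1
print(perfect, worse)
```

The parameters that fit the data exactly give a cost of zero; any other choice gives a positive cost, which is what the minimization goal exploits.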

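Simplified (set $\theta_0 = 0$):
Hypothesis: $h_\theta(x) = \theta_1 x$
Parameters: $\theta_1$
Cost function: $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_1} J(\theta_1)$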

[Figure, two panels. Left: $h_\theta(x)$ with axes x and y (for fixed $\theta_1$ this is a function of x). Right: $J(\theta_1)$ (a function of the parameter $\theta_1$).] Case $\theta_0 = 0$, $h_\theta(x) = x$, i.e. $\theta_1 = 1$.

[Same two panels.] Case $\theta_0 = 0$, $h_\theta(x) = 0.5x$, i.e. $\theta_1 = 0.5$.

[Same two panels.] Case $\theta_0 = 0$, $h_\theta(x) = 0$, i.e. $\theta_1 = 0$.
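
To make the three cases concrete, the sketch below evaluates the simplified cost for $\theta_1$ = 1, 0.5 and 0. The data points are an assumption chosen for illustration (three points on the line y = x); the values plotted on the original slides are not available in this transcript.

```python
def cost_simplified(theta1, xs, ys):
    """J(theta1) for the simplified hypothesis h(x) = theta1 * x (theta0 = 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Assumed toy data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

costs = {theta1: cost_simplified(theta1, xs, ys) for theta1 in (1.0, 0.5, 0.0)}
for theta1, j in costs.items():
    print(f"theta1 = {theta1}: J = {j:.4f}")
```

On this data the cost is 0 at $\theta_1 = 1$ and grows as $\theta_1$ moves away from 1, tracing out the bowl shape of $J(\theta_1)$.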

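What if $\theta_0 \neq 0$?
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$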

[Figures: two panels, repeated for several choices of $\theta_0, \theta_1$. Left: $h_\theta(x)$ drawn over the housing data, price ($) in 1000's against size in feet² (x); for fixed $\theta_0, \theta_1$, this is a function of x. Right: $J(\theta_0, \theta_1)$, a function of the parameters $\theta_0, \theta_1$.]

Gradient descent
Have some function $J(\theta_0, \theta_1)$. Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.
Outline:
- Start with some $\theta_0, \theta_1$.
- Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum.

Gradient descent algorithm: repeat until convergence { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ (for j = 0 and j = 1) }, where $\alpha$ is the learning rate.

If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
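
The effect of the learning rate can be tried on a one-parameter example. The function $J(\theta) = \theta^2$, the starting point, and the three values of α below are assumptions picked for illustration; they do not come from the slides.

```python
def gradient_descent_1d(alpha, theta=1.0, steps=20):
    """Run gradient descent on J(theta) = theta**2, whose derivative is 2*theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta


slow = gradient_descent_1d(alpha=0.01)     # tiny steps: still far from the minimum at 0
good = gradient_descent_1d(alpha=0.4)      # reaches the minimum quickly
diverged = gradient_descent_1d(alpha=1.1)  # each step overshoots further than the last
print(slow, good, diverged)
```

After the same number of steps, the small rate has barely moved, the moderate rate is essentially at the minimum, and the large rate has moved away from it.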

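At a local minimum the derivative is zero, so the update leaves the current value of $\theta_1$ unchanged: $\theta_1 := \theta_1 - \alpha \cdot 0$.

Gradient descent can converge to a local minimum, even with the learning rate α fixed: as we approach a local minimum the derivative shrinks, so gradient descent automatically takes smaller steps.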

Gradient descent algorithm for linear regression: repeat until convergence {
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
Update $\theta_0$ and $\theta_1$ simultaneously.

[Figures: plots of $J(\theta_0, \theta_1)$ over the axes $\theta_0$ and $\theta_1$, followed by a sequence of two-panel plots. Left: $h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of x). Right: $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0, \theta_1$).]
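
Gradient descent for univariate linear regression can be sketched as follows. The sketch assumes the squared-error cost with a 1/(2m) factor, whose partial derivatives are the averages computed below; the data, learning rate, and step count are made up for illustration.

```python
def gradient_descent(xs, ys, alpha=0.1, steps=2000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Simultaneous update: both gradients were computed from the old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Made-up training set lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))
```

Computing both gradients before assigning either parameter is what "simultaneously" means here; updating $\theta_0$ first and then using the new value to compute the gradient for $\theta_1$ would be a different algorithm.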

“Batch” gradient descent
“Batch”: each step of gradient descent uses all the training examples.
Alternative: process part of the dataset for each step of the algorithm.
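
The alternative mentioned above is often called mini-batch gradient descent. The sketch below is one possible version under stated assumptions: a univariate linear hypothesis, a squared-error cost, a fixed batch size of 2, and made-up data. Examples are visited in order to keep the run deterministic, although in practice they are usually shuffled.

```python
def minibatch_gradient_descent(xs, ys, batch_size=2, alpha=0.1, epochs=500):
    """Gradient descent where each step uses only batch_size training examples."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            errors = [theta0 + theta1 * x - y for x, y in zip(bx, by)]
            grad0 = sum(errors) / len(bx)
            grad1 = sum(e * x for e, x in zip(errors, bx)) / len(bx)
            theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


# Made-up training set lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = minibatch_gradient_descent(xs, ys)
print(round(theta0, 3), round(theta1, 3))
```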

What's next? We are not in univariate regression anymore:
[Table of housing data with the columns Size (feet²), Number of bedrooms, Number of floors, Age of home (years), Price ($1000).]

Linear Algebra Review

Matrix: rectangular array of numbers.
Dimension of matrix: number of rows x number of columns, e.g. 4 x 2.
Matrix elements (entries of matrix): $A_{ij}$ = “i, j entry”, the entry in the i-th row, j-th column.
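
A small sketch of these definitions, storing a matrix as a list of rows. The 4 x 2 matrix and the helper `entry` are hypothetical, introduced only for illustration.

```python
# A hypothetical 4 x 2 matrix stored as a list of rows.
A = [
    [1402, 191],
    [1371, 821],
    [949, 1437],
    [147, 1448],
]

rows = len(A)     # number of rows
cols = len(A[0])  # number of columns


def entry(matrix, i, j):
    """Return the (i, j) entry using the 1-indexed convention: row i, column j."""
    return matrix[i - 1][j - 1]


print(f"{rows} x {cols}", entry(A, 3, 2))
```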

Another example: representing communication links in a network
[Figure: two networks on the nodes a, b, c, d, e, each shown with its adjacency matrix, whose rows and columns are labeled a, b, c, d, e.]
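
As an illustration of the idea, the sketch below builds an adjacency matrix from a list of links. The links are hypothetical (the edges drawn on the original figure are not reproduced in this transcript), and they are assumed to be two-way.

```python
nodes = ["a", "b", "c", "d", "e"]
# Hypothetical links between nodes.
links = [("a", "b"), ("a", "c"), ("b", "d"), ("d", "e")]

index = {name: k for k, name in enumerate(nodes)}
adjacency = [[0] * len(nodes) for _ in nodes]
for u, v in links:
    adjacency[index[u]][index[v]] = 1
    adjacency[index[v]][index[u]] = 1  # links are assumed to be two-way

for name, row in zip(nodes, adjacency):
    print(name, row)
```

Entry (i, j) is 1 when node i is linked to node j and 0 otherwise, so two-way links always give a symmetric matrix.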

Vector: an n x 1 matrix, also called an n-dimensional vector. $y_i$ = i-th element.
1-indexed vs 0-indexed: under 1-indexing the elements are $y_1, \ldots, y_n$; under 0-indexing they are $y_0, \ldots, y_{n-1}$.
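
The two indexing conventions matter in code, because Python lists are 0-indexed while the mathematical notation is usually 1-indexed. The vector and the helper `element` below are hypothetical.

```python
y = [460, 232, 315, 178]  # hypothetical 4-dimensional vector

first_zero_indexed = y[0]  # Python lists are 0-indexed, so this is y_0


def element(vector, i):
    """Return y_i under the 1-indexed convention (i = 1, ..., n)."""
    return vector[i - 1]


print(first_zero_indexed, element(y, 1), element(y, 4))
```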

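Matrix addition: add corresponding entries; both matrices must have the same dimension.

Scalar multiplication: multiply every entry of the matrix by the scalar.

Combination of operands: sums, differences, and scalar multiples can be combined in one expression, evaluated with the usual order of operations.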
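
These three operations can be sketched with plain lists of rows. The matrices and the helper names `add` and `scale` are made up for illustration.

```python
def add(A, B):
    """Entrywise sum of two matrices with the same dimensions."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def scale(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


A = [[1, 0], [2, 5], [3, 1]]
B = [[4, 0.5], [2, 5], [0, 1]]

total = add(A, B)                          # A + B
tripled = scale(3, A)                      # 3A
combined = add(scale(3, A), scale(-1, B))  # 3A - B
print(total, tripled, combined)
```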

Matrix-vector multiplication

Details: $A x = y$, where A is an m x n matrix (m rows, n columns), x is an n x 1 matrix (n-dimensional vector), and y is an m-dimensional vector. To get $y_i$, multiply the elements of A's i-th row with the corresponding elements of vector x, and add them up.
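
The row-by-row rule translates directly into code. The helper `matvec` and the numbers are illustrative assumptions.

```python
def matvec(A, x):
    """y_i = sum over j of A[i][j] * x[j]; A is m x n and x has n elements."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have as many entries as x")
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


A = [[1, 3], [4, 0], [2, 1]]  # 3 x 2 matrix
x = [1, 5]                    # 2-dimensional vector
y = matvec(A, x)              # 3-dimensional vector
print(y)
```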

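Example: house sizes. To apply a hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ to a whole list of house sizes, form a matrix with a column of ones and a column of sizes, and multiply it by the vector of parameters $(\theta_0, \theta_1)$. The result is the vector of predicted prices.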
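
A common use of matrix-vector multiplication is to evaluate a linear hypothesis on every training example at once. In the sketch below the house sizes and the parameters are hypothetical values, not the ones shown on the slide.

```python
sizes = [2104, 1416, 1534, 852]  # hypothetical house sizes in square feet
theta = [-40.0, 0.25]            # hypothetical parameters (theta0, theta1)

# Data matrix: a column of ones (multiplying theta0) next to the column of sizes.
X = [[1, size] for size in sizes]

# One matrix-vector product yields h(x) = theta0 + theta1 * x for every house.
predictions = [sum(a * t for a, t in zip(row, theta)) for row in X]
print(predictions)
```

The column of ones is what lets the intercept $\theta_0$ take part in the same product as the slope $\theta_1$.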

Example matrix-matrix multiplication

Details: $A \times B = C$, where A is an m x k matrix (m rows, k columns), B is a k x n matrix (k rows, n columns), and C is an m x n matrix. The i-th column of the matrix C is obtained by multiplying A with the i-th column of B (for i = 1, 2, …, n).
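
The column-by-column description can be followed literally in code. The helpers `matvec` and `matmul` and the matrices are illustrative assumptions.

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matmul(A, B):
    """C = A x B, built one column at a time: column i of C is A times column i of B."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    columns_of_c = [matvec(A, list(col)) for col in zip(*B)]
    return [list(row) for row in zip(*columns_of_c)]  # turn columns back into rows


A = [[1, 3, 2], [4, 0, 1]]    # 2 x 3
B = [[1, 3], [0, 1], [5, 2]]  # 3 x 2
C = matmul(A, B)              # 2 x 2
print(C)
```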

Example: Matrix-matrix multiplication

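House sizes: have 3 competing hypotheses. Multiplying the data matrix (a column of ones and a column of house sizes) by a matrix whose columns hold the parameters of the three hypotheses gives a matrix whose k-th column contains the predictions of the k-th hypothesis.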
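
Several hypotheses can be compared with a single matrix-matrix product. In the sketch below the house sizes and the three parameter pairs are hypothetical values chosen for illustration.

```python
sizes = [2104, 1416, 1534, 852]    # hypothetical house sizes in square feet
X = [[1, size] for size in sizes]  # 4 x 2 data matrix

# Hypothetical parameters: column k holds (theta0, theta1) of hypothesis k.
Theta = [
    [-40.0, 200.0, -150.0],
    [0.25, 0.125, 0.5],
]  # 2 x 3

# Entry (i, k) of the product is the prediction of hypothesis k for house i.
predictions = [
    [sum(x * t for x, t in zip(row, col)) for col in zip(*Theta)]
    for row in X
]
for row in predictions:
    print(row)
```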

Let A and B be matrices. Then in general, $A \times B \neq B \times A$ (matrix multiplication is not commutative).
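
A pair of small matrices is enough to see that the order of multiplication matters. The matrices and the helper `matmul` are chosen for illustration.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 1], [0, 0]]
B = [[0, 0], [2, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB, BA)
```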

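Identity matrix: denoted I (or $I_{n \times n}$ or $I_n$). For any matrix A, $A \cdot I = I \cdot A = A$.
Examples of identity matrices (ones on the diagonal, zeros elsewhere): 2 x 2: $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$; 3 x 3: $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$; 4 x 4: likewise.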
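
The defining property of the identity matrix can be checked numerically. The helpers and the matrix A below are illustrative; note that for a non-square A the two identity matrices have different sizes.

```python
def identity(n):
    """n x n identity matrix: ones on the diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 3], [4, 5, 6]]      # 2 x 3

left = matmul(identity(2), A)   # I (2 x 2) times A
right = matmul(A, identity(3))  # A times I (3 x 3)
print(left == A, right == A)
```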

Matrix inverse: $A^{-1}$. If A is an m x m matrix, and if it has an inverse, then $A A^{-1} = A^{-1} A = I$. Matrices that don't have an inverse are “singular” or “degenerate”.
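
For a 2 x 2 matrix the inverse has a closed form, which makes a compact illustration. The sketch below uses the standard adjugate formula; it is not a method taken from the slides, and the matrices are made up. A matrix with zero determinant is singular, and the hypothetical helper returns `None` for it.

```python
def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via the adjugate formula; None if A is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


A = [[4.0, 7.0], [2.0, 6.0]]  # determinant 10, so A is invertible
A_inv = inverse_2x2(A)
product = matmul(A, A_inv)    # numerically close to the identity matrix

singular = inverse_2x2([[1.0, 2.0], [2.0, 4.0]])  # determinant 0
print(A_inv, product, singular)
```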

Matrix transpose: let A be an m x n matrix, and let $B = A^T$. Then B is an n x m matrix, and $B_{ij} = A_{ji}$.
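
The transpose swaps rows and columns, which takes one line with `zip`. The matrix and the helper `transpose` are illustrative.

```python
def transpose(A):
    """Return the transpose: entry (i, j) of the result is entry (j, i) of A."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 0], [3, 5, 9]]  # 2 x 3
B = transpose(A)            # 3 x 2
print(B)
```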

What's next? We are not in univariate regression anymore:
[Table of housing data with the columns Size (feet²), Number of bedrooms, Number of floors, Age of home (years), Price ($1000).]