The Math of Machine Learning


Neil Thistlethwaite

Linear Regression

- θ is our parameter vector; for standard linear regression with a single feature it is of size 2x1.
- h(x) is our hypothesis; with one feature it has the form h(x) = θ₀ + θ₁x.
- X is the feature data, usually represented as an m x n matrix, with m being the number of data points and n the number of features.
- y is the vector of known or "actual" values we are trying to fit, paired with the supplied x-values.
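As a concrete illustration, here is a minimal NumPy sketch of this setup; the data values and names (x, y, X, theta) are invented here, and only the shapes matter.

```python
# A minimal sketch of the slide's setup in NumPy; the data is made up.
import numpy as np

m = 5                                    # number of data points
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0]) # the single feature
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8]) # known "actual" values

# Prepend a column of ones so theta_0 acts as the intercept:
# X is then m x 2 and theta is 2 x 1, matching the slide.
X = np.column_stack([np.ones(m), x])
theta = np.zeros(2)

def h(X, theta):
    """Hypothesis h(x) = theta_0 + theta_1 * x, evaluated at every point."""
    return X @ theta
```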

Linear Regression Cost Function

The first thing we have to do to find an optimal "fit" for our line is come up with a cost function that measures how good a fit any given hypothesis is. Here we use the sum of squared residuals, J(θ) = Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)². We choose this cost because it is convex: it has exactly one local minimum, which is also the global minimum, and at that point we have our line of best fit.
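A sketch of this cost in code, reusing the X, y, and theta from above; the 1/(2m) scaling is a common convention I have added here, and it does not move the location of the minimum.

```python
def J(theta, X, y):
    """Sum of squared residuals, scaled by 1/(2m) by convention."""
    m = len(y)
    residuals = X @ theta - y          # h(x_i) - y_i for every point
    return (residuals @ residuals) / (2 * m)
```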

Linear Regression Gradient Descent

Gradient descent is a technique for finding the global minimum of our convex cost: we calculate the gradient at the current point, then step to a new set of parameters in the direction opposite the gradient. The gradient, an operator from multivariable calculus, points in the direction of greatest increase of a function; by subtracting it from our current estimate, we move in the direction of greatest decrease, down the "hill".
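A sketch of the update loop, again reusing the objects above; the learning rate alpha and the iteration count are hyperparameters I have picked for illustration, not values from the slides.

```python
def gradient_descent(X, y, theta, alpha=0.1, iters=1000):
    """Repeatedly step opposite the gradient of J to walk downhill."""
    m = len(y)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # gradient of J at theta
        theta = theta - alpha * grad       # subtract: greatest decrease
    return theta

theta_fit = gradient_descent(X, y, np.zeros(2))
```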

Gradient and Partial Derivatives

A derivative represents the instantaneous rate of change, or slope, of a function at a given point. A partial derivative is much the same, except in multiple dimensions: to take a partial derivative with respect to one variable, you differentiate the exact same way while treating all the other variables as constants. The gradient is simply the vector of all the partial derivatives.
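One way to see "hold the other variables constant" in code: approximate each partial of J numerically by nudging a single parameter, and compare against the analytic gradient used above. This checker function is hypothetical, not from the slides.

```python
def numeric_partial(f, theta, j, eps=1e-6):
    """Approximate df/dtheta_j: vary only theta_j, hold the rest constant."""
    up, down = theta.copy(), theta.copy()
    up[j] += eps
    down[j] -= eps
    return (f(up) - f(down)) / (2 * eps)

theta0 = np.array([0.5, 0.5])
analytic = X.T @ (X @ theta0 - y) / len(y)        # gradient of J at theta0
numeric = [numeric_partial(lambda t: J(t, X, y), theta0, j)
           for j in range(len(theta0))]           # should closely agree
```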

Acknowledgements

6th period Parallel, so Neil could actually write this lecture. Thanks to kcheng.