Lecture 3: Math Primer II
Machine Learning Andrew Rosenberg
Today
Wrap-up of probability. Vectors and matrices. Calculus.
Differentiation with respect to a vector.
Properties of probability density functions
Sum rule and product rule.
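The rules themselves appeared as equations on the slide; the standard forms are:
Sum rule: p(x) = \sum_y p(x, y) (or \int p(x, y)\,dy for continuous variables).
Product rule: p(x, y) = p(y \mid x)\, p(x).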
Expected Values
Given a random variable X with a distribution p(X), what is the expected value of X?
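The defining equation (shown graphically on the slide) is the standard one:
E[X] = \sum_x x\, p(x) for a discrete variable, or E[X] = \int x\, p(x)\, dx for a continuous one; more generally, E[f(X)] = \int f(x)\, p(x)\, dx.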
Multinomial Distribution
If a variable x can take one of K states, we represent the distribution of this variable as a multinomial distribution. The probability of x being in state k is \mu_k.
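With the 1-of-K (one-hot) encoding of x, this distribution is usually written
p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k}, \quad \text{with } \mu_k \ge 0 \text{ and } \sum_{k=1}^{K} \mu_k = 1.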
Expected Value of a Multinomial
The expected value is the vector of means.
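In symbols: E[\mathbf{x} \mid \boldsymbol{\mu}] = (\mu_1, \ldots, \mu_K)^T = \boldsymbol{\mu}, since E[x_k] = \mu_k.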
Gaussian Distribution
One dimension and D dimensions.
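The densities referred to here are the standard Gaussian forms:
One dimension: \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right).
D dimensions: \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right).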
Gaussians
How machine learning uses statistical modeling
Expectation: the expected value of a function is the hypothesis.
Variance: the variance is the confidence in that hypothesis.
Variance
The variance of a random variable describes how much variability there is around the expected value. It is calculated as the expected squared deviation from the mean.
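In symbols: \mathrm{Var}[X] = E\big[(X - E[X])^2\big] = E[X^2] - E[X]^2.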
Covariance
The covariance of two random variables expresses how they vary together. If two variables are independent, their covariance equals zero.
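In symbols: \mathrm{Cov}[X, Y] = E\big[(X - E[X])(Y - E[Y])\big] = E[XY] - E[X]\,E[Y].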
Linear Algebra
Vectors: a one-dimensional array. If not specified, assume x is a column vector.
Matrices: a higher-dimensional array with n rows and m columns, typically denoted with capital letters.
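As a notational sketch (this convention is assumed, not quoted from the slide): a column vector is \mathbf{x} = (x_1, \ldots, x_n)^T \in \mathbb{R}^n, and an n-by-m matrix is A \in \mathbb{R}^{n \times m} with entry A_{ij} in row i, column j.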
Transposition
Transposing a matrix swaps columns and rows.
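In index notation, (A^T)_{ij} = A_{ji}; for example,
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.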
Addition
Two matrices can be added together iff they have the same dimensions: A and B are both n-by-m matrices.
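Addition is element-wise: (A + B)_{ij} = A_{ij} + B_{ij}.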
Multiplication
To multiply two matrices, the inner dimensions must agree: an n-by-m matrix can be multiplied by an m-by-k matrix, yielding an n-by-k matrix.
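The product has entries (AB)_{ij} = \sum_{l=1}^{m} A_{il} B_{lj}.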
Inversion
The inverse of a square (n-by-n) matrix A is denoted A^{-1} and has the following property, where I is the identity matrix: an n-by-n matrix with ones along the diagonal, I_{ij} = 1 if i = j and 0 otherwise.
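The property in question is A A^{-1} = A^{-1} A = I.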
Identity Matrix
Matrices are invariant under multiplication by the identity matrix.
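That is, A I = A and I A = A (with identity matrices of the appropriate sizes).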
Helpful matrix inversion properties
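The properties were shown as equations; the standard identities of this kind are
(AB)^{-1} = B^{-1} A^{-1}, \quad (A^T)^{-1} = (A^{-1})^T, \quad (A^{-1})^{-1} = A.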
Norm
The norm of a vector x represents the Euclidean length of the vector.
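In symbols: \|\mathbf{x}\| = \sqrt{\mathbf{x}^T \mathbf{x}} = \sqrt{\sum_i x_i^2}.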
Positive Definiteness
Quadratic form: a scalar formed from a vector and a matrix M, which may be positive definite or positive semi-definite.
Show that the quadratic form evaluates to a scalar. Why should we care about positive definiteness and semi-definiteness?
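In symbols: the quadratic form is \mathbf{x}^T M \mathbf{x}, which is a scalar because it is the product of a 1-by-n, an n-by-n, and an n-by-1 matrix. M is positive definite if \mathbf{x}^T M \mathbf{x} > 0 for every \mathbf{x} \ne \mathbf{0}, and positive semi-definite if \mathbf{x}^T M \mathbf{x} \ge 0 for every \mathbf{x}.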
Calculus
Derivatives and integrals; optimization.
Derivatives
The derivative of a function gives the slope of the function at a point x.
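Formally, f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.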
Derivative Example
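The slide's worked example is not preserved in this text; a typical example of this kind: f(x) = x^2 gives f'(x) = 2x, so the slope at x = 3 is 6.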
Integrals
Integration is the inverse operation of differentiation (up to an additive constant). Graphically, an integral can be interpreted as the area under the curve defined by f(x).
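In symbols: \int f'(x)\, dx = f(x) + C, and \int_a^b f(x)\, dx is the (signed) area under f(x) between a and b.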
Integration Example
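The slide's example is likewise not preserved; a typical one: \int 2x\, dx = x^2 + C, and \int_0^1 2x\, dx = 1.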
Vector Calculus
Differentiation with respect to a matrix or vector; the gradient.
Change of variables with a vector.
Derivative w.r.t. a vector
Given a vector x and a function f(x), how can we find f'(x)?
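The answer is the vector of partial derivatives, the gradient:
\nabla_{\mathbf{x}} f(\mathbf{x}) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^T.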
Example: differentiation with respect to a vector
The resulting derivative is also referred to as the gradient of the function.
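The slide's specific example is not preserved; a standard example of this kind: for f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x},
\nabla_{\mathbf{x}} f(\mathbf{x}) = (A + A^T)\,\mathbf{x}, which equals 2 A \mathbf{x} when A is symmetric.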
Useful Vector Calculus identities
Scalar multiplication; product rule.
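The identities were shown as equations; the usual forms are:
Scalar multiplication: \nabla_{\mathbf{x}} (\mathbf{a}^T \mathbf{x}) = \mathbf{a}, and \nabla_{\mathbf{x}} (c\, f(\mathbf{x})) = c\, \nabla_{\mathbf{x}} f(\mathbf{x}) for a constant scalar c.
Product rule: \nabla_{\mathbf{x}} \big( u(\mathbf{x})^T v(\mathbf{x}) \big) = (\nabla_{\mathbf{x}} u)\, v + (\nabla_{\mathbf{x}} v)\, u, where \nabla_{\mathbf{x}} u is the matrix with entries \partial u_j / \partial x_i.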
Useful Vector Calculus identities
Derivative of an inverse; change of variable.
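Standard statements of these identities (the slide's equations are not preserved here):
Derivative of an inverse: \frac{\partial A^{-1}}{\partial t} = -A^{-1} \frac{\partial A}{\partial t} A^{-1}.
Change of variables: for an invertible map \mathbf{x} = g(\mathbf{u}), \int f(\mathbf{x})\, d\mathbf{x} = \int f(g(\mathbf{u}))\, |\det J_g(\mathbf{u})|\, d\mathbf{u}, where J_g is the Jacobian of g.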
Optimization
We have an objective function f(x) that we would like to maximize or minimize. Set the first derivative to zero.
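That is, solve f'(x^*) = 0 for the stationary point x^*; the second derivative distinguishes a maximum (f''(x^*) < 0) from a minimum (f''(x^*) > 0).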
Optimization with constraints
What if we want to constrain the parameters of the model, e.g., require that the mean is less than 10? We want to find the best likelihood subject to a constraint. This involves two functions: an objective function to maximize, and an inequality that must be satisfied.
Lagrange Multipliers Find maxima of f(x,y) subject to a constraint.
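The condition behind the method: at a constrained optimum the gradients of the objective and the constraint are parallel, \nabla f = \lambda \nabla g.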
General form
Maximizing an objective, subject to a constraint.
Introduce a new variable and find a maximum.
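The general form shown on the slide is presumably the standard one: maximize f(\mathbf{x}) subject to g(\mathbf{x}) = 0 by forming the Lagrangian
L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda\, g(\mathbf{x})
and solving \nabla_{\mathbf{x}} L = 0 together with \partial L / \partial \lambda = 0 (which recovers the constraint).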
Example
A concrete objective and constraint: introduce a new variable and find a maximum.
Example
We now have three equations in three unknowns.
Example
Eliminate \lambda, then substitute and solve.
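The slide's specific functions are not preserved; as an illustration of the same steps, maximize f(x, y) = x y subject to g(x, y) = x + y - 1 = 0. The Lagrangian L = x y + \lambda (x + y - 1) gives the three equations
\partial L / \partial x = y + \lambda = 0, \quad \partial L / \partial y = x + \lambda = 0, \quad \partial L / \partial \lambda = x + y - 1 = 0.
Eliminating \lambda from the first two gives x = y; substituting into the constraint and solving gives x = y = 1/2.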
Why does Machine Learning need these tools?
Calculus: we need to identify the maximum likelihood or the minimum risk (optimization), and integration allows the marginalization of continuous probability density functions.
Linear algebra: many features lead to high-dimensional spaces, and vectors and matrices allow us to compactly describe and manipulate high-dimensional feature spaces.
Why does Machine Learning need these tools?
Vector calculus: all of the optimization needs to be performed in high-dimensional spaces, optimizing multiple variables simultaneously (gradient descent), and we want to take marginals over high-dimensional distributions such as Gaussians.
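Gradient descent is only named here; as a minimal sketch (the function below and the quadratic objective are illustrative assumptions, not taken from the lecture), the update repeatedly steps against the gradient:

    import numpy as np

    def gradient_descent(grad, x0, lr=0.1, steps=100):
        # Repeatedly step in the direction of steepest descent, -grad(x).
        x = x0
        for _ in range(steps):
            x = x - lr * grad(x)
        return x

    # Illustrative objective f(x) = 0.5 * x^T A x - b^T x, whose gradient is A x - b.
    A = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric, positive definite
    b = np.array([1.0, 1.0])
    x_star = gradient_descent(lambda x: A @ x - b, x0=np.zeros(2))
    print(x_star)                  # close to the exact minimizer
    print(np.linalg.solve(A, b))   # exact minimizer of the quadratic, for comparison

With a positive definite A, this converges to the solution of A x = b, which is the same point found by setting the gradient to zero.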
Next Time
Linear regression, then regularization.