1
The Elements of Linear Algebra
Alexander G. Ororbia II The Pennsylvania State University IST 597: Foundations of Deep Learning
2
About this chapter: not a comprehensive survey of all of linear algebra.
Focused on the subset most relevant to deep learning. For broader coverage, see, e.g., Linear Algebra by Georgi E. Shilov.
3
Scalars: A scalar is a single number.
Integers, real numbers, rational numbers, etc. Denoted with an italic, lowercase letter, e.g., s ∈ ℝ or n ∈ ℕ.
4
Vectors: A vector is a 1-D array of numbers.
Can be real, binary, integer, etc. Example notation for type and size: x ∈ ℝⁿ.
5
Matrices: A matrix is a 2-D array of numbers.
Example notation for type and shape: A ∈ ℝ^(m×n).
6
Tensors: A tensor is an array of numbers that may have:
zero dimensions, and be a scalar
one dimension, and be a vector
two dimensions, and be a matrix
or more dimensions.
7
Tensor = Multidimensional Array
8
Matrix Transpose: (Aᵀ)(i, j) = A(j, i), i.e., the matrix is flipped across its main diagonal.
9
Matrix (Dot) Product: C = AB, where A is m x n, B is n x p, and C is m x p. The inner dimensions (n) must match.
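A minimal NumPy sketch of the shape rule above (the matrix sizes are arbitrary examples):

```python
import numpy as np

A = np.random.randn(3, 4)   # m x n
B = np.random.randn(4, 2)   # n x p; the inner dimension (4) must match A's columns
C = A @ B                   # result is m x p
print(C.shape)              # (3, 2)
```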
10
Matrix Addition/Subtraction
Assume column-major matrices (for efficiency). The add/subtract operators follow the basic properties of ordinary addition/subtraction. Matrix A + Matrix B is computed element-wise:
[ 0.50  -0.70 ]   [ 0.50  -0.70 ]   [ 1.00  -1.40 ]
[-0.69   1.80 ] + [-0.69   1.80 ] = [-1.38   3.60 ]
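The same element-wise addition in NumPy (a small sketch reproducing the numbers above):

```python
import numpy as np

A = np.array([[0.5, -0.7],
              [-0.69, 1.8]])
B = A.copy()    # the example adds the matrix to itself
print(A + B)    # [[ 1.0  -1.4 ]
                #  [-1.38  3.6 ]]
```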
11
Matrix-Matrix Multiply
Matrix-matrix multiply: each entry of the result is a vector-vector (dot) product of a row of the first matrix with a column of the second. This is the usual workhorse of learning algorithms; it vectorizes sums of products.
[ 0.50  -0.70 ]   [ 0.50  -0.70 ]   [ (0.5)(0.5) + (-0.7)(-0.69)    (0.5)(-0.7) + (-0.7)(1.8) ]
[-0.69   1.80 ] * [-0.69   1.80 ] = [ (-0.69)(0.5) + (1.8)(-0.69)   (-0.69)(-0.7) + (1.8)(1.8) ]
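A NumPy sketch of the same product, checking one entry by hand:

```python
import numpy as np

A = np.array([[0.5, -0.7],
              [-0.69, 1.8]])
print(A @ A)              # matrix-matrix product (sums of products)
# Entry (0, 0) is the dot product of row 0 with column 0:
print(A[0, :] @ A[:, 0])  # (0.5)(0.5) + (-0.7)(-0.69) = 0.733
```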
12
Hadamard Product: multiply each A(i, j) by the corresponding B(i, j).
Element-wise multiplication:
[ 0.50  -0.70 ]   [ 0.50  -0.70 ]   [ 0.25    0.49 ]
[-0.69   1.80 ] ⊙ [-0.69   1.80 ] = [ 0.4761  3.24 ]
(e.g., 0.5 * 0.5 = 0.25, -0.7 * -0.7 = 0.49, -0.69 * -0.69 = 0.4761, 1.8 * 1.8 = 3.24)
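In NumPy, the Hadamard product is the ordinary `*` operator on arrays (a small sketch):

```python
import numpy as np

A = np.array([[0.5, -0.7],
              [-0.69, 1.8]])
print(A * A)   # element-wise: [[0.25, 0.49], [0.4761, 3.24]]
print(A @ A)   # contrast: matrix multiplication gives different values
```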
13
Elementwise Functions
Applied to each element (i, j) of the matrix argument. Could be cos(·), sin(·), tanh(·), etc.
Identity: φ(v) = v
Logistic sigmoid: φ(v) = σ(v) = 1 / (1 + e^(-v))
Linear rectifier: φ(v) = max(0, v)
Softmax: φ(v)_c = e^(v_c) / Σ_{c'=1..C} e^(v_{c'})
Example (linear rectifier applied element-wise): φ(1.0) = 1.0, φ(-1.4) = 0, φ(-1.38) = 0, φ(1.8) = 1.8
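A small NumPy sketch of these element-wise functions (the test matrix reuses the example values above):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))      # logistic sigmoid

def relu(v):
    return np.maximum(0.0, v)            # linear rectifier

def softmax(v):
    e = np.exp(v - np.max(v))            # shift by the max for numerical stability
    return e / e.sum()

M = np.array([[1.0, -1.4],
              [-1.38, 1.8]])
print(relu(M))                             # [[1.0, 0.0], [0.0, 1.8]]
print(sigmoid(M))                          # applied to every element
print(softmax(np.array([1.0, 2.0, 3.0])))  # entries are positive and sum to 1
```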
14
Why do we care? Computation Graphs
Linear algebra operators arranged in a directed graph!
15
[Diagram: inputs x0, x1, x2 connect through weights w0, w1, w2 to a node φ, which produces the output H.]
16
[Diagram: each edge computes a product: w0 x0 = z0, w1 x1 = z1, w2 x2 = z2.]
17
[Diagram: the products are summed: Z = z0 + z1 + z2.]
18
[Diagram: the elementwise function is applied to the sum: H = φ(Z).]
19
Vector Form (One Unit): this calculates the activation value of a single hidden unit connected to 3 sensors.
h0: [ w0  w1  w2 ] * [ x0 ; x1 ; x2 ] = φ(w0*x0 + w1*x1 + w2*x2)
20
Vector Form (Two Units)
This vectorization easily generalizes to multiple sensors feeding into multiple units. Known as vectorization!
h0:  [ w0  w1  w2 ]   [ x0 ]   [ φ(w0*x0 + w1*x1 + w2*x2) ]
h1:  [ w3  w4  w5 ] * [ x1 ] = [ φ(w3*x0 + w4*x1 + w5*x2) ]
                      [ x2 ]
21
Now Let Us Fully Vectorize This!
This vectorization is also important for formulating mini-batches. (Good for GPU-based processing.)
h0:  [ w0  w1  w2 ]   [ x0  x3 ]   [ φ(w0*x0 + …)  φ(w0*x3 + …) ]
h1:  [ w3  w4  w5 ] * [ x1  x4 ] = [ φ(w3*x0 + …)  φ(w3*x3 + …) ]
                      [ x2  x5 ]
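A minimal NumPy sketch of the fully vectorized layer (the weights and inputs are random placeholders, and φ is taken to be the logistic sigmoid purely as an example):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

W = np.random.randn(2, 3)   # 2 hidden units, each connected to 3 sensors
X = np.random.randn(3, 2)   # mini-batch of 2 input vectors stored as columns
H = sigmoid(W @ X)          # one matrix product computes every unit for every input
print(H.shape)              # (2, 2): rows index units, columns index batch items
```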
22
Identity Matrix: I has ones on the main diagonal and zeros elsewhere, so Ix = x for any vector x.
23
Systems of Equations: the matrix equation Ax = b expands to one scalar equation per row of A, i.e., A(i,1)x1 + A(i,2)x2 + … + A(i,n)xn = b(i) for i = 1, …, m.
24
Solving Systems of Equations
A linear system of equations can have:
No solution
Many solutions
Exactly one solution: this means multiplication by the matrix is an invertible function
25
Matrix Inversion. Matrix inverse: A⁻¹A = I. Solving a system using an inverse: Ax = b gives x = A⁻¹b.
Numerically unstable, but useful for abstract analysis
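A small NumPy sketch of the point above: in practice, solving the system directly is preferred over forming the inverse explicitly (the system here is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x_inv = np.linalg.inv(A) @ b     # via the explicit inverse (can be numerically unstable)
x_solve = np.linalg.solve(A, b)  # solves Ax = b directly; preferred in practice
print(x_inv, x_solve)            # both give [2. 3.]
```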
26
Invertibility: a matrix can't be inverted if it has…
More rows than columns
More columns than rows
Redundant rows/columns ("linearly dependent", "low rank")
27
Norms Functions that measure how “large” a vector is
Similar to a distance between zero and the point represented by the vector
28
Norms: the Lp norm is ||x||_p = (Σ_i |x_i|^p)^(1/p)
Most popular norm: the L2 norm, p = 2 (Euclidean): ||x||_2 = sqrt(Σ_i x_i^2)
L1 norm, p = 1 (Manhattan): ||x||_1 = Σ_i |x_i|
Max norm, infinite p: ||x||_∞ = max_i |x_i|
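The same norms in NumPy (a quick sketch):

```python
import numpy as np

x = np.array([3.0, -4.0])
print(np.linalg.norm(x, 2))       # L2 (Euclidean): 5.0
print(np.linalg.norm(x, 1))       # L1 (Manhattan): 7.0
print(np.linalg.norm(x, np.inf))  # max norm: 4.0
```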
29
Special Matrices and Vectors
Unit vector: ||x||_2 = 1. Symmetric matrix: A = Aᵀ. Orthogonal matrix: AᵀA = AAᵀ = I, so A⁻¹ = Aᵀ.
30
Eigendecomposition
Eigenvector and eigenvalue: Av = λv
Eigendecomposition of a diagonalizable matrix: A = V diag(λ) V⁻¹
Every real symmetric matrix has a real, orthogonal eigendecomposition: A = QΛQᵀ
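A NumPy sketch of the real symmetric case (the matrix is an arbitrary symmetric example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # real symmetric matrix
eigvals, Q = np.linalg.eigh(A)        # eigh is specialized for symmetric matrices
Lam = np.diag(eigvals)
print(np.allclose(Q @ Lam @ Q.T, A))  # True: A = Q Lambda Q^T with orthogonal Q
```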
31
Effect of Eigenvalues
32
Singular Value Decomposition
Similar to the eigendecomposition, but more general: the matrix need not be square. A = UDVᵀ, where U and V are orthogonal and D is diagonal (holding the singular values). See also: Stanford lecture on the SVD.
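A quick NumPy sketch of the decomposition (the matrix is a random non-square example):

```python
import numpy as np

A = np.random.randn(3, 2)                   # need not be square
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True: A = U D V^T
```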
33
Moore-Penrose Pseudoinverse
If the equation Ax = b has:
Exactly one solution: this is the same as the inverse.
No solution: this gives us the solution with the smallest error ||Ax - b||.
Many solutions: this gives us the solution with the smallest norm of x.
34
Computing the Pseudoinverse
The SVD allows the computation of the pseudoinverse: A⁺ = VD⁺Uᵀ, where D⁺ is obtained by taking the reciprocal of the non-zero singular values (and transposing the result).
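A NumPy sketch building the pseudoinverse from the SVD (assuming all singular values are non-zero, as they will be for this random example):

```python
import numpy as np

A = np.random.randn(3, 2)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T         # reciprocal of the non-zero singular values
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # matches NumPy's built-in pseudoinverse
```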
35
Trace: Tr(A) = Σ_i A(i, i), the sum of the diagonal entries of A.
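A one-line NumPy check of the definition:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.trace(A))          # 1.0 + 4.0 = 5.0
print(A.diagonal().sum())   # same thing, written out
```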
37
Learning Linear Algebra
Do a lot of practice problems (e.g., Linear Algebra Done Right, Linear Algebra for Dummies).
Start out with lots of summation signs and indexing into individual entries.
Code up a few basic matrix operations and compare to worked-out solutions.
Eventually you will be able to mostly use matrix and vector product notation quickly and easily.
38
References: This presentation is a variation of Ian Goodfellow's slides for Chapter 2 of Deep Learning. Mention cosine similarity!