1
The Elements of Linear Algebra
Alexander G. Ororbia II The Pennsylvania State University IST 597: Foundations of Deep Learning
2
About this chapter: not a comprehensive survey of all of linear algebra.
Focused on the subset most relevant to deep learning. For broader coverage, see, e.g., Linear Algebra by Georgi E. Shilov.
3
Scalars: A scalar is a single number.
Integers, real numbers, rational numbers, etc. Denoted with an italic, lowercase letter, e.g., s ∈ ℝ or n ∈ ℕ.
4
Vectors: A vector is a 1-D array of numbers.
Can be real, binary, integer, etc. Example notation for type and size: x ∈ ℝⁿ.
5
Matrices: A matrix is a 2-D array of numbers.
Example notation for type and shape: A ∈ ℝ^(m×n).
6
Tensors: A tensor is an array of numbers that may have:
zero dimensions, and be a scalar
one dimension, and be a vector
two dimensions, and be a matrix
or more dimensions.
7
Tensor = Multidimensional Array
8
Matrix Transpose: (Aᵀ)(i, j) = A(j, i), i.e., the matrix is flipped across its main diagonal.
9
Matrix (Dot) Product: C = AB, where A is m x n, B is n x p, and C is m x p. The inner dimensions (n) must match.
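A minimal NumPy sketch of the shape rule above (the matrix sizes are arbitrary examples):

```python
import numpy as np

A = np.random.randn(3, 4)   # m x n
B = np.random.randn(4, 2)   # n x p; the inner dimension (4) must match A's columns
C = A @ B                   # result is m x p
print(C.shape)              # (3, 2)
```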
10
Matrix Addition/Subtraction
Assume column-major matrices (for efficiency). The add/subtract operators follow the basic properties of ordinary addition/subtraction. Matrix A + Matrix B is computed element-wise:
[ 0.50  -0.70 ]   [ 0.50  -0.70 ]   [ 1.00  -1.40 ]
[-0.69   1.80 ] + [-0.69   1.80 ] = [-1.38   3.60 ]
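The same element-wise addition in NumPy (a small sketch reproducing the numbers above):

```python
import numpy as np

A = np.array([[0.5, -0.7],
              [-0.69, 1.8]])
B = A.copy()    # the example adds the matrix to itself
print(A + B)    # [[ 1.0  -1.4 ]
                #  [-1.38  3.6 ]]
```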
11
Matrix-Matrix Multiply
Matrix-matrix multiply: each entry of the result is a vector-vector (dot) product of a row of the first matrix with a column of the second. This is the usual workhorse of learning algorithms; it vectorizes sums of products.
[ 0.50  -0.70 ]   [ 0.50  -0.70 ]   [ (0.5)(0.5) + (-0.7)(-0.69)    (0.5)(-0.7) + (-0.7)(1.8) ]
[-0.69   1.80 ] * [-0.69   1.80 ] = [ (-0.69)(0.5) + (1.8)(-0.69)   (-0.69)(-0.7) + (1.8)(1.8) ]
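A NumPy sketch of the same product, checking one entry by hand:

```python
import numpy as np

A = np.array([[0.5, -0.7],
              [-0.69, 1.8]])
print(A @ A)              # matrix-matrix product (sums of products)
# Entry (0, 0) is the dot product of row 0 with column 0:
print(A[0, :] @ A[:, 0])  # (0.5)(0.5) + (-0.7)(-0.69) = 0.733
```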
12
Hadamard Product: multiply each A(i, j) by the corresponding B(i, j).
Element-wise multiplication:
[ 0.50  -0.70 ]   [ 0.50  -0.70 ]   [ 0.25    0.49 ]
[-0.69   1.80 ] ⊙ [-0.69   1.80 ] = [ 0.4761  3.24 ]
(e.g., 0.5 * 0.5 = 0.25, -0.7 * -0.7 = 0.49, -0.69 * -0.69 = 0.4761, 1.8 * 1.8 = 3.24)
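In NumPy, the Hadamard product is the ordinary `*` operator on arrays (a small sketch):

```python
import numpy as np

A = np.array([[0.5, -0.7],
              [-0.69, 1.8]])
print(A * A)   # element-wise: [[0.25, 0.49], [0.4761, 3.24]]
print(A @ A)   # contrast: matrix multiplication gives different values
```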
13
Elementwise Functions
Applied to each element (i, j) of the matrix argument. Could be cos(·), sin(·), tanh(·), etc.
Identity: φ(v) = v
Logistic sigmoid: φ(v) = σ(v) = 1 / (1 + e^(-v))
Linear rectifier: φ(v) = max(0, v)
Softmax: φ(v)_c = e^(v_c) / Σ_{c'=1..C} e^(v_{c'})
Example (linear rectifier applied element-wise): φ(1.0) = 1.0, φ(-1.4) = 0, φ(-1.38) = 0, φ(1.8) = 1.8
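A small NumPy sketch of these element-wise functions (the test matrix reuses the example values above):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))      # logistic sigmoid

def relu(v):
    return np.maximum(0.0, v)            # linear rectifier

def softmax(v):
    e = np.exp(v - np.max(v))            # shift by the max for numerical stability
    return e / e.sum()

M = np.array([[1.0, -1.4],
              [-1.38, 1.8]])
print(relu(M))                             # [[1.0, 0.0], [0.0, 1.8]]
print(sigmoid(M))                          # applied to every element
print(softmax(np.array([1.0, 2.0, 3.0])))  # entries are positive and sum to 1
```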
14
Why do we care? Computation Graphs
Linear algebra operators arranged in a directed graph!
15
[Diagram: inputs x0, x1, x2 connect through weights w0, w1, w2 to a node φ, which produces the output H.]
16
[Diagram: each edge computes a product: w0 x0 = z0, w1 x1 = z1, w2 x2 = z2.]
17
[Diagram: the products are summed: Z = z0 + z1 + z2.]
18
[Diagram: the elementwise function is applied to the sum: H = φ(Z).]
19
Vector Form (One Unit): this calculates the activation value of a single hidden unit connected to 3 sensors.
h0: [ w0  w1  w2 ] * [ x0 ; x1 ; x2 ] = φ(w0*x0 + w1*x1 + w2*x2)
20
Vector Form (Two Units)
This vectorization easily generalizes to multiple sensors feeding into multiple units. Known as vectorization!
h0:  [ w0  w1  w2 ]   [ x0 ]   [ φ(w0*x0 + w1*x1 + w2*x2) ]
h1:  [ w3  w4  w5 ] * [ x1 ] = [ φ(w3*x0 + w4*x1 + w5*x2) ]
                      [ x2 ]
21
Now Let Us Fully Vectorize This!
This vectorization is also important for formulating mini-batches. (Good for GPU-based processing.)
h0:  [ w0  w1  w2 ]   [ x0  x3 ]   [ φ(w0*x0 + …)  φ(w0*x3 + …) ]
h1:  [ w3  w4  w5 ] * [ x1  x4 ] = [ φ(w3*x0 + …)  φ(w3*x3 + …) ]
                      [ x2  x5 ]
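A minimal NumPy sketch of the fully vectorized layer (the weights and inputs are random placeholders, and φ is taken to be the logistic sigmoid purely as an example):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

W = np.random.randn(2, 3)   # 2 hidden units, each connected to 3 sensors
X = np.random.randn(3, 2)   # mini-batch of 2 input vectors stored as columns
H = sigmoid(W @ X)          # one matrix product computes every unit for every input
print(H.shape)              # (2, 2): rows index units, columns index batch items
```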
22
Identity Matrix: I has ones on the main diagonal and zeros elsewhere, so Ix = x for any vector x.
23
Systems of Equations: the matrix equation Ax = b expands to one scalar equation per row of A, i.e., A(i,1)x1 + A(i,2)x2 + … + A(i,n)xn = b(i) for i = 1, …, m.
24
Solving Systems of Equations
A linear system of equations can have:
No solution
Many solutions
Exactly one solution: this means multiplication by the matrix is an invertible function
25
Matrix Inversion. Matrix inverse: A⁻¹A = I. Solving a system using an inverse: Ax = b gives x = A⁻¹b.
Numerically unstable, but useful for abstract analysis
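A small NumPy sketch of the point above: in practice, solving the system directly is preferred over forming the inverse explicitly (the system here is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x_inv = np.linalg.inv(A) @ b     # via the explicit inverse (can be numerically unstable)
x_solve = np.linalg.solve(A, b)  # solves Ax = b directly; preferred in practice
print(x_inv, x_solve)            # both give [2. 3.]
```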
26
Invertibility: a matrix can't be inverted if it has…
More rows than columns
More columns than rows
Redundant rows/columns ("linearly dependent", "low rank")
27
Norms Functions that measure how “large” a vector is
Similar to a distance between zero and the point represented by the vector
28
Norms: the Lp norm is ||x||_p = (Σ_i |x_i|^p)^(1/p)
Most popular norm: the L2 norm, p = 2 (Euclidean): ||x||_2 = sqrt(Σ_i x_i^2)
L1 norm, p = 1 (Manhattan): ||x||_1 = Σ_i |x_i|
Max norm, infinite p: ||x||_∞ = max_i |x_i|
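The same norms in NumPy (a quick sketch):

```python
import numpy as np

x = np.array([3.0, -4.0])
print(np.linalg.norm(x, 2))       # L2 (Euclidean): 5.0
print(np.linalg.norm(x, 1))       # L1 (Manhattan): 7.0
print(np.linalg.norm(x, np.inf))  # max norm: 4.0
```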
29
Special Matrices and Vectors
Unit vector: ||x||_2 = 1. Symmetric matrix: A = Aᵀ. Orthogonal matrix: AᵀA = AAᵀ = I, so A⁻¹ = Aᵀ.
30
Eigendecomposition
Eigenvector and eigenvalue: Av = λv
Eigendecomposition of a diagonalizable matrix: A = V diag(λ) V⁻¹
Every real symmetric matrix has a real, orthogonal eigendecomposition: A = QΛQᵀ
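A NumPy sketch of the real symmetric case (the matrix is an arbitrary symmetric example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # real symmetric matrix
eigvals, Q = np.linalg.eigh(A)        # eigh is specialized for symmetric matrices
Lam = np.diag(eigvals)
print(np.allclose(Q @ Lam @ Q.T, A))  # True: A = Q Lambda Q^T with orthogonal Q
```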
31
Effect of Eigenvalues
32
Singular Value Decomposition
Similar to the eigendecomposition, but more general: the matrix need not be square. A = UDVᵀ, where U and V are orthogonal and D is diagonal (holding the singular values). See also: Stanford lecture on the SVD.
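A quick NumPy sketch of the decomposition (the matrix is a random non-square example):

```python
import numpy as np

A = np.random.randn(3, 2)                   # need not be square
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True: A = U D V^T
```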
33
Moore-Penrose Pseudoinverse
If the equation Ax = b has:
Exactly one solution: this is the same as the inverse.
No solution: this gives us the solution with the smallest error ||Ax - b||.
Many solutions: this gives us the solution with the smallest norm of x.
34
Computing the Pseudoinverse
The SVD allows the computation of the pseudoinverse: A⁺ = VD⁺Uᵀ, where D⁺ is obtained by taking the reciprocal of the non-zero singular values (and transposing the result).
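A NumPy sketch building the pseudoinverse from the SVD (assuming all singular values are non-zero, as they will be for this random example):

```python
import numpy as np

A = np.random.randn(3, 2)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T         # reciprocal of the non-zero singular values
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # matches NumPy's built-in pseudoinverse
```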
35
Trace: Tr(A) = Σ_i A(i, i), the sum of the diagonal entries of A.
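A one-line NumPy check of the definition:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.trace(A))          # 1.0 + 4.0 = 5.0
print(A.diagonal().sum())   # same thing, written out
```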
37
Learning Linear Algebra
Do a lot of practice problems (e.g., Linear Algebra Done Right, Linear Algebra for Dummies).
Start out with lots of summation signs and indexing into individual entries.
Code up a few basic matrix operations and compare to worked-out solutions.
Eventually you will be able to mostly use matrix and vector product notation quickly and easily.
38
References: This presentation is a variation of Ian Goodfellow's slides for Chapter 2 of Deep Learning. Mention cosine similarity!