1
Formal Computational Skills: Matrices 1
2
Overview
Motivation: many mathematical uses, eg:
- Writing network operations
- Solving linear equations
- Calculating transformations
- Changing coordinate systems
By the end you should:
- Be able to add/subtract/multiply 2 matrices
- Be able to add/subtract/multiply 2 vectors
- Use matrix inverses to solve linear equations
- Write network operations with matrices
Advanced topics - will also discuss:
- Matrices as transformations
- Eigenvectors and eigenvalues of a matrix
3
Today's Topics: Matrix/Vector Basics
- Matrix definitions (square matrix, identity etc)
- Matrix addition/subtraction/multiplication
- Matrix inverse
- Vector definitions
- Vectors as geometric objects
Tomorrow's Topics: Uses of Matrices
- Matrices as sets of linear equations
- Networks as matrices
- Solving sets of linear equations
- (Briefly) Matrix operations as transformations
- (Briefly) Eigenvectors and eigenvalues of a matrix
4
Matrices
A matrix W is a 2d array of m x n numbers, often denoted by a capital (and sometimes bold) letter, eg
$W = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix}$ is a 2 x 3 matrix
W has m rows and n columns and we say W has dimensions m x n. Each number or element in W is usually represented by a small $w_{ij}$, indexed with the row and column it is in, ie $w_{ij}$ is in the i'th row and j'th column. Eg in the example matrix P (shown as an image on the original slide), $p_{12} = 2$: the element in row 1, column 2.
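For concreteness, a minimal numpy sketch of these definitions (not from the original slides; the example values are my own, and note that numpy indexes from 0 while the notation here indexes from 1):

    import numpy as np

    # A 2 x 3 matrix: m = 2 rows, n = 3 columns
    W = np.array([[2.0, 3.0, 4.0],
                  [4.0, 3.0, 8.0]])

    print(W.shape)   # (2, 3), ie dimensions m x n
    # The mathematical element w_12 (row 1, column 2) is W[0, 1] in numpy,
    # because numpy counts rows and columns from 0
    print(W[0, 1])   # 3.0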
5
Quick questions (for the example matrix L shown on the original slide):
What are the dimensions of L? 3 x 4 or 4 x 3? Answer: 4 x 3.
What is $w_{32} + w_{21}$? Is it a) 4, b) 5 or c) 6? Since $w_{32} = 3$ and $w_{21} = 2$, the answer is b) 5.
Note in both (all) cases elements are indexed as row then column.
6
Square and Diagonal Matrices
If m = n, we say W is a square matrix.
A diagonal matrix D is a square matrix where all off-diagonal elements (ie elements where i is not equal to j) equal zero. Mathematically we say: $d_{ij} = 0 \;\forall\; i \neq j$ (where the upside-down A, $\forall$, means "for all"), eg
$D = \begin{pmatrix} d_{11} & 0 & 0 \\ 0 & d_{22} & 0 \\ 0 & 0 & d_{33} \end{pmatrix}$
7
Identity Matrix
The identity matrix I is a diagonal matrix where all the diagonal elements are 1. It is ALWAYS represented by a capital I and it plays the role of the number 1 in multiplication and 'division'. If you multiply a matrix by I you get that matrix: $IA = AI = A$, just as if you multiply a number by 1: 1 x 1.6 = 1.6 x 1 = 1.6.
MATHS FACT: mathematical 'objects' that can be multiplied and divided generally have to have a 1 (known as the identity) which a) leaves things unchanged when you multiply by it, and b) is what you get when you divide something by itself.
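A quick numpy check of this (example matrix is my own):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])   # an arbitrary example matrix
    I = np.eye(2)                # the 2 x 2 identity matrix

    # Multiplying by I leaves A unchanged, just as multiplying a number by 1 does
    print(np.allclose(I @ A, A))   # True
    print(np.allclose(A @ I, A))   # True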
8
Matrix Transpose
The transpose of a matrix P is written $P^T$ and is P with the rows and columns swapped round, ie $[P^T]_{ij} = [P]_{ji} = p_{ji}$.
Qu: if P has dimensions m x n, what dimensions does $P^T$ have? Answer: n x m.
One useful equation: $(AB)^T = B^T A^T$.
9
Addition/Subtraction
Matrix plus/minus a scalar k (ie a constant): A = W + k = k + W means W with k added to each element, ie $a_{ij} = w_{ij} + k$.
Matrix plus/minus a matrix: both must be of the same size, and we add each pair of corresponding elements together (a point-by-point or point-wise operation), ie if C = A + B then $c_{ij} = a_{ij} + b_{ij}$.
10
Matrix-Constant Multiplication
If A = Bk = kB, where k is a constant: multiply all elements in B by k, ie $a_{ij} = k b_{ij}$.
11
Matrix-Matrix Multiplication
If A = BC: to get the value in the i'th row and j'th column of A, take the i'th row of B, multiply each element by the corresponding element in the j'th column of C, and add them together.
The inner dimensions must match: if B is m x n (m rows, n columns) and C is n x p (n rows, p columns), then A is m x p, ie (m x n)(n x p) gives m x p.
Formula: $a_{ij} = \sum_{k=1}^{n} b_{ik} c_{kj}$
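To make the row-times-column recipe concrete, here is a small numpy sketch spelling out the formula as explicit loops and checking it against numpy's built-in product (example matrices are my own):

    import numpy as np

    B = np.array([[1.0, 2.0],        # m x n = 2 x 2
                  [3.0, 4.0]])
    C = np.array([[5.0, 6.0, 7.0],   # n x p = 2 x 3
                  [8.0, 9.0, 0.0]])

    m, n = B.shape
    n2, p = C.shape
    assert n == n2, "inner dimensions must match"

    # a_ij = sum_k b_ik * c_kj, spelled out element by element
    A = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                A[i, j] += B[i, k] * C[k, j]

    print(np.allclose(A, B @ C))   # True: same as numpy's matrix product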
12
Matrix Inverses
Given the complications of matrix multiplication, what about division?!?! Use a matrix inverse. The inverse of A is written $A^{-1}$.
Analogy to numbers: the inverse of 2 is $2^{-1} = \frac{1}{2}$. Just as $2 \times 2^{-1} = 2^{-1} \times 2 = 1$, so $A A^{-1} = A^{-1} A = I$ (the identity matrix).
This defines the inverse, ie: if AB = I and BA = I, then $B = A^{-1}$.
13
It is used for division since dividing by 2 is like multiplying by $2^{-1}$, ie if y = 2z, then $2^{-1} y = 2^{-1} 2 z = 1z = z$. Similarly, if Y = WX, then $W^{-1} Y = W^{-1} W X = IX = X$ (more later...).
Calculating $A^{-1}$ is a pain: use a computer. Note however, that to do it, you need to know the determinant of A, written |A|: analogous to the 'size' of a matrix.
Now try questions 1-4 off the work sheet.
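As suggested, in practice we let the computer do it; a minimal numpy sketch (example matrix is my own, chosen so the determinant is 1):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [5.0, 3.0]])

    print(np.linalg.det(A))        # determinant |A| = 1.0, so A is invertible
    A_inv = np.linalg.inv(A)

    I = np.eye(2)
    print(np.allclose(A @ A_inv, I))   # True: A A^-1 = I
    print(np.allclose(A_inv @ A, I))   # True: A^-1 A = I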
14
Powers and Things
Finally, it is often said that matrix multiplication is associative but not commutative. What does this mean? A(BC) = (AB)C (associative) but, in general, $AB \neq BA$ (not commutative).
Powers: $A^2 = AA$, but how about $A^{1/2}$?? There are ways to define it, but we shouldn't worry, apart from the special case of diagonal matrices, which are easy.
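A quick numpy check of both claims (example matrices are my own):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [0.0, 1.0]])
    B = np.array([[1.0, 0.0],
                  [3.0, 1.0]])
    C = np.array([[2.0, 0.0],
                  [0.0, 2.0]])

    # Associative: A(BC) = (AB)C
    print(np.allclose(A @ (B @ C), (A @ B) @ C))   # True

    # But not commutative: in general AB != BA
    print(np.allclose(A @ B, B @ A))               # False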
15
Vectors
A vector is a special matrix where m or n is 1. Column vector if n is 1, eg $v = (1, 2, 3)^T$; row vector if m is 1, eg v = (1, 2, 3).
Usually denoted by a small letter: sometimes bold v, sometimes underlined, sometimes bold and underlined, sometimes with an arrow over it, and sometimes just with a square-ish font. I'm going to (pretty much) stick with an underlined letter.
Strictly (and I'm pretty strict) all vectors are column vectors, so unless told otherwise assume a column vector; if you see $v^T$ then it is a row vector.
16
A vector with n elements, $v = (v_1, v_2, \ldots, v_n)^T$, is said to be n-dimensional and is a point or position, ie an object in nD space.
[Figure: a 2D plane with the points $v_1 = (1, 6)^T$ and $v_2 = (3, 2)^T$ marked.]
Here we have 2D vectors and the elements specify the coordinates. By convention, in 2D the 1st element refers to the horizontal (x-) axis and the 2nd to the vertical (y-) axis. A vector is often visualised as a line/arrow joining the origin and that point. Vectors have both direction and length and so are often used for eg speed (think of the arrows representing wind on weather maps).
17
Vector Addition/Subtraction
As they are matrices, vectors follow all the rules of matrices, so are eg added and subtracted in a point-wise way:
$v_3 = v_1 + v_2 = (1, 6)^T + (3, 2)^T = (1+3, 6+2)^T = (4, 8)^T$
However it can also be viewed geometrically: [figure: $v_2$ redrawn starting from the tip of $v_1$] if you were at $v_1$ and then added $v_2$ on to the end, you would be at $v_3 = (4, 8)^T$.
18
Can also use this to get the vector between 2 vectors, eg what is the vector u from $v_1$ to $v_3$?
Geometrically, subtracting a vector is the equivalent of going backwards along its arrow. So: get to $v_3$ from $v_1$ by going backwards along $v_1$ (ie $-v_1$), then forwards along $v_3$ (ie $+v_3$). So $u = -v_1 + v_3$, and here $v_3 - v_1 = v_2$.
19
Organisms (ants, us etc) use this for path integration to get back home. Imagine a path of (random) vectors out to some goal. At any point in the path, the sum of the vectors points to the current position, so minus the sum points back to the start. Rather than having to 'remember' all the vectors of the path, just keep adding the last bit walked to a 'home' vector. Then return home directly by following minus the home vector.
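A toy numpy sketch of this path-integration bookkeeping (the step vectors are made up for illustration):

    import numpy as np

    # Each row is one step of the outward path (2D displacement vectors)
    steps = np.array([[1.0, 6.0],
                      [3.0, 2.0],
                      [-1.0, 4.0]])

    home = np.zeros(2)
    for step in steps:
        home += step          # keep adding the last bit walked

    # 'home' now points from the start to the current position,
    # so -home is the direct vector back to the start
    print(-home)              # [-3. -12.]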
20
Length of Vectors
The length of v is written |v| (or ||v||). From Pythagoras, for $v = (3, 4)^T$: $|v| = \sqrt{3^2 + 4^2} = 5$. Similarly in 3D $|v| = \sqrt{v_1^2 + v_2^2 + v_3^2}$, and in general: $|v| = \sqrt{\sum_i v_i^2}$.
Note if |v| = 1, v is known as a unit vector.
As the vector between u and v is v - u, the distance between 2 points is: $|v - u| = \sqrt{\sum_i (v_i - u_i)^2}$
21
Vector Multiplication
Like matrices, but can be viewed geometrically. Eg multiplying by a constant k makes v k times longer: $3v = 3(1, 2)^T = (3, 6)^T$.
Standard vector-vector multiplication: the vectors must be the same length, the 1st one a row vector, the 2nd a column vector, and in general: $u^T v = \sum_i u_i v_i$
Vector-vector multiplication is also known as the dot product (u.v) and the inner product. The result is a single number.
Note $v^T v = |v|^2$ since $v^T v = \sum_i v_i^2 = |v|^2$.
22
What does the inner product mean?? Question: what is an angle in 10 dimensions? In n dimensions we use the dot product to define the angle between 2 vectors: $u^T v = |u| |v| \cos(t)$, where t is the angle between the 2 vectors.
So $u^T v = 0$ if the vectors are orthogonal (perpendicular: t = 90, cos(t) = 0), and it is maximised if the vectors are parallel (t = 0, so cos(t) = 1).
It can also be interpreted as the projection of one vector onto another (usually unit) vector (cf principal component analysis): if |u| = 1, then $u^T v = |v| \cos(t)$, the length of v projected onto the direction of u.
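Putting length, distance, angle and projection together in a short numpy sketch (example vectors are my own):

    import numpy as np

    u = np.array([1.0, 0.0])            # a unit vector, |u| = 1
    v = np.array([3.0, 4.0])

    print(np.linalg.norm(v))            # |v| = 5.0, from Pythagoras
    print(np.linalg.norm(v - u))        # distance between the two points

    # Angle between u and v via u.v = |u||v|cos(t)
    cos_t = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    print(np.degrees(np.arccos(cos_t))) # ~53.13 degrees

    # Projection of v onto the unit vector u is u.v = |v|cos(t)
    print(u @ v)                        # 3.0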
23
Summary of Main Points
- How to multiply 2 matrices
- What a matrix inverse is
- How adding vectors can be interpreted geometrically
- How to calculate vector length |v| and the distance between 2 vectors; $v^T v = |v|^2$
- $u^T v = |u||v|\cos(\text{angle between vectors})$
- To project u onto v, if |v| = 1, do $u^T v$
24
Formal Computational Skills: Matrices 2
25
Yesterday's Topics: Matrix/Vector Basics
- Matrix definitions (square matrix, identity etc)
- Matrix addition/subtraction/multiplication
- Matrix inverse
- Vector definitions
- Vectors as geometric objects
Today's Topics: Uses of Matrices
- Matrices as sets of equations
- Networks written as matrices
- Solving sets of linear equations
- (Briefly) Matrix operations as transformations
- (Briefly) Eigenvectors and eigenvalues of a matrix
26
Equations as Matrices
Suppose we have the following: $2x_1 + 3x_2 + 4x_3 = y_1$. We can write this as a matrix operation:
$\begin{pmatrix} 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = y_1$
Similarly, we can write $w_1 x_1 + w_2 x_2 + w_3 x_3 = y_1$ as
$\begin{pmatrix} w_1 & w_2 & w_3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = y_1$
Or, using vectors: with $w^T = (w_1, w_2, w_3)$ and $x = (x_1, x_2, x_3)^T$, we get $w^T x = y_1$. A bit more concise, but not great.
27
Sets of Equations as Matrices
However, suppose we have several equations involving x, eg:
$2x_1 + 3x_2 + 4x_3 = y_1$
$4x_1 + 3x_2 + 8x_3 = y_2$
This becomes:
$\begin{pmatrix} 2 & 3 & 4 \\ 4 & 3 & 8 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = y$ where $y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$
Similarly:
$w_{11} x_1 + w_{12} x_2 + w_{13} x_3 = y_1$
$w_{21} x_1 + w_{22} x_2 + w_{23} x_3 = y_2$
becomes Wx = y where: $W = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix}$
28
Matrices as Neural Networks
You will encounter this notation when dealing with Artificial Neural Networks (ANNs).
[Figure: inputs $x_1, x_2, x_3$ connected by weights $w_1, w_2, w_3$ to a single output $y = w_1 x_1 + w_2 x_2 + w_3 x_3$.]
It comes from the connectionist picture of the brain: electrical impulses travelling along axons, modulated by synaptic strengths and summed at synapses. We can think of networks as functions which transform input vectors (x) into output vectors (y) via a set (matrix) of numbers known as weights (W), associated with the connections from inputs to outputs.
29
The above ANN takes a 3D input vector $x^T = (x_1, x_2, x_3)$. It therefore has 3 connections from input to output, each with an associated weight; thus the weight 'matrix' is just a 3d vector, $w^T = (w_1, w_2, w_3)$. Inputs travel along the connections, are multiplied by the weights and summed to give the (1D) output y (we say y is a weighted sum of the inputs). This sum is the same as if we multiplied w and x, so we can write the output as $y = w^T x$.
30
If we have more than one output, we need a matrix of weights W.
[Figure: 3 inputs $x_1, x_2, x_3$ fully connected to 2 outputs $y_1, y_2$ via weights $w_{11}, \ldots, w_{23}$.]
Effectively we have one weight vector for each output. Represent all the weights by a matrix W where each weight vector is a row of W and the ij'th element of W is the weight $w_{ij}$.
31
Each output is a weighted sum of the input and the corresponding weight vector, ie a row of W multiplied by x:
$y_1 = w_{11} x_1 + w_{12} x_2 + w_{13} x_3$
$y_2 = w_{21} x_1 + w_{22} x_2 + w_{23} x_3$
Note the weights are indexed (oddly) as $w_{to,\,from}$ so that the matrix multiplication works. Thus, writing the output as a vector y, the network operation is: Wx = y. So for n-dimensional input data, the i'th output is: $y_i = \sum_{j=1}^{n} w_{ij} x_j$
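A minimal numpy sketch of this network operation (the weight values are made up for illustration):

    import numpy as np

    # Weight matrix: row i holds the weights into output i (w_{to,from})
    W = np.array([[0.5, -1.0, 2.0],    # weights into y_1
                  [1.0,  0.0, 0.5]])   # weights into y_2

    x = np.array([1.0, 2.0, 3.0])      # a 3D input vector

    y = W @ x                          # each y_i is a weighted sum of the inputs
    print(y)                           # [4.5 2.5]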
32
Finally, suppose we have many input vectors $x_i = (x_{1i}, x_{2i}, x_{3i})^T$. Make a matrix X where the i'th column of X is the i'th input vector. Since we know that a single output is y = Wx, we have WX = Y. Each input generates a different output vector, so Y is a matrix where the i'th column is the output due to the application of the i'th input vector to the network. Might not seem like a great leap, but it is very convenient for mathematical manipulation (and matlab programming), eg ...
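A sketch of this batch version, continuing the example above (random inputs just for illustration):

    import numpy as np

    W = np.array([[0.5, -1.0, 2.0],
                  [1.0,  0.0, 0.5]])

    # Each COLUMN of X is one input vector: here 4 inputs of dimension 3
    X = np.random.rand(3, 4)

    Y = W @ X            # column i of Y is the output for column i of X
    print(Y.shape)       # (2, 4): one 2D output per input vector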
33
Solving Linear Equations
Suppose we have to solve:
$x_1 + 2x_2 = 0$
$3x_1 + 7x_2 = 1$
After some (tedious) calculations we solve to get: $x_1 = -2$, $x_2 = 1$. Instead, write as Wx = y where:
$W = \begin{pmatrix} 1 & 2 \\ 3 & 7 \end{pmatrix}$, $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ and $y = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$
then solve the equations by multiplying both sides by $W^{-1}$, since: $W^{-1} W x = I x = x = W^{-1} y$.
So if Wx = y, solve via $x = W^{-1} y$ using a computer... But it is not always so simple...
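The same example in numpy (np.linalg.solve is generally preferred to forming the inverse explicitly, though both give the same answer here):

    import numpy as np

    W = np.array([[1.0, 2.0],
                  [3.0, 7.0]])
    y = np.array([0.0, 1.0])

    x = np.linalg.solve(W, y)
    print(x)                                      # [-2.  1.]
    print(np.allclose(np.linalg.inv(W) @ y, x))   # True: same as x = W^-1 y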
34
Sometimes $W^{-1}$ does not exist, ie W is singular (or near-singular). Why?
The problem could be underdetermined, eg a single equation in two unknowns such as $2x_1 + x_2 = 8$: (usually) no unique solution [eg $x_1 = 1, x_2 = 6$ or $x_1 = 2, x_2 = 4$ etc].
Or one row is a linear combination of (ie made up out of) the others, eg Row 2 = 2 x Row 1, or Row 3 = Row 1 + Row 2. The same problem occurs if one equation is written twice.
The problem is that we need more data (number of unknowns = number of equations/bits of info needed).
35
Or the problem could be overdetermined, eg more equations than unknowns (more outputs than inputs): contradictory solutions (x = 2 and x = 1).
For $W^{-1}$ to exist, W must be square and its rows must not duplicate info, ie they must be linearly independent. This is often not the case, so to avoid problems we use the pseudoinverse of W. If the problem is underdetermined, it finds the solution with the smallest sum of squares of the elements of x; if overdetermined, it finds an approximate solution (what sort? could be investigated...).
In networks, this is used to find weights for Radial Basis Function networks and single layer networks (eg Bishop p.92-95).
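A numpy sketch of the overdetermined case (example system is my own; for an overdetermined system the pseudoinverse gives the least-squares solution):

    import numpy as np

    # Overdetermined: 3 equations, 2 unknowns, so no exact solution in general
    W = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
    y = np.array([1.0, 2.0, 2.0])

    x = np.linalg.pinv(W) @ y     # pseudoinverse: approximate (least-squares) solution
    print(x)

    # Equivalent, and usually preferred numerically:
    x2, *_ = np.linalg.lstsq(W, y, rcond=None)
    print(np.allclose(x, x2))     # True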
36
Matrix Vector Multiplication: Matrices as Transformations
If the dimensionality is right, as above, we can view a matrix-vector multiplication as a transformation of one vector into another.
If U is diagonal, we get an expansion/contraction along the axes: eg [figure] a diagonal U with diagonal elements 5 and 2 sends a vector $v_1$ along the first axis to $5v_1$ and a vector $v_2$ along the second axis to $2v_2$, etc.
37
We get a rotation anticlockwise through t by:
$R = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}$
So we see that $(1, 0)^T$ goes to $(\cos t, \sin t)^T$ and $(0, 1)^T$ goes to $(-\sin t, \cos t)^T$.
In general, transformations produced by matrix multiplication can be broken down into a rotation, followed by an expansion/contraction, followed by another rotation.
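A short numpy sketch of the rotation matrix acting on the two axis vectors:

    import numpy as np

    def rotation(t):
        """Anticlockwise rotation through angle t (radians)."""
        return np.array([[np.cos(t), -np.sin(t)],
                         [np.sin(t),  np.cos(t)]])

    R = rotation(np.pi / 2)          # 90 degree rotation
    print(R @ np.array([1.0, 0.0]))  # ~[0. 1.]: (1,0)^T goes to (cos t, sin t)^T
    print(R @ np.array([0.0, 1.0]))  # ~[-1. 0.]: (0,1)^T goes to (-sin t, cos t)^T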
38
Eigenvectors and Eigenvalues
If $Ax = \lambda x$ for some scalar $\lambda \neq 0$, then we say that x is an eigenvector of A with eigenvalue $\lambda$. Turns out eigenvectors are VERY useful.
Clearly, x is not unique since if $Ax = \lambda x$, then $A(2x) = 2Ax = 2\lambda x = \lambda(2x)$, so it is usual to scale x so that it has length |x| = 1.
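In numpy (example matrix is my own; np.linalg.eig returns unit-length eigenvectors as the columns of its second output):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])     # symmetric example matrix

    vals, vecs = np.linalg.eig(A)  # eigenvalues, and eigenvectors as columns
    print(vals)                    # [3. 1.]

    x = vecs[:, 0]                 # first eigenvector (unit length)
    lam = vals[0]                  # its eigenvalue
    print(np.allclose(A @ x, lam * x))   # True: Ax = lambda x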
39
Intuition: the direction of x is unchanged by being transformed by A, so it in some sense reflects the principal direction (or axis) of the transformation.
Repeatedly transform v by A: start at v, then Av, then AAv = $A^2 v$, etc. Most starting points result in curved trajectories. Starting from an eigenvector x, however, we get:
$Ax = \lambda x$, $A^2 x = \lambda^2 x$, $A^3 x = \lambda^3 x$, $A^4 x = \lambda^4 x$, ...
so the trajectory is a straight line. Note if $|\lambda| > 1$, x expands; if not, it will contract.
40
Eigenvector Facts
If the data is d-dimensional, there will be d eigenvectors.
If A is symmetric, ie $a_{ij} = a_{ji}$ (true if A is a covariance matrix, and many other important matrices), the (unit length) eigenvectors will be orthogonal (we say they are orthonormal): $x_i^T x_j = 1$ if i = j, $x_i^T x_j = 0$ otherwise.
This means that the eigenvectors form a set of basis vectors and any vector can be expressed as a linear sum of the eigenvectors, ie they form a new, rotated co-ordinate system.
41
Summary of Main Points
- When dealing with networks, Wx = y means "the outputs y are weighted sums of the input x; that is, y is the network output given input x"
- Similarly, WX = Y means "each column of Y is the network output for the corresponding column of X"
- The matrix inverse can be used to solve linear equations, but the pseudoinverse is more robust
- In networks, use the pseudoinverse to calculate the 'best' weights that will transform training input vectors into known target vectors
- Matrix-vector multiplication can be seen as a transformation (eg rotations and expansions)
- If $Ax = \lambda x$, x is an eigenvector with eigenvalue $\lambda$
- Eigenvectors and eigenvalues tell us about the main axes and behaviour of matrix transformations