Multivariate Data and Matrix Algebra Review

BMTRY 726 5/15/2018

Syllabus
Instructor and contact information: Bethany Wolf, 135 Cannon Place, Office 302B. Office hours: Monday 2-3 or by appointment.
Grading: Grades will be based on assigned problem sets, a mid-term exam, class participation, and a final project. Problem sets will require active manipulation of datasets provided by the instructor using standard statistical packages (e.g. R and SAS). Class participation will include presenting journal articles.
The breakdown of contribution to the final course grade is as follows:
Homework assignments: 50%
Mid-term exam: 20%
Final Project: 20%
Class Participation: 10%

A few things before we begin
May 24-25: I will be teaching a Summer Institute on Survival Analysis, which we will have in lieu of class on the 24th.
May 29 and May 31: we are in EL115.
Class participation discussion

What is ‘Multivariate’ Data?
Data in which each sampling unit contributes to more than one outcome. Examples by sampling unit:
Cancer patients: Serum concentrations on a panel of protein markers are collected in chemotherapy patients.
Smoking cessation participants: Background information and smoking behavior are collected at multiple visits.
Post-operative patient outcome: Multiple measures of how a patient is doing post-operatively: patient self-reported pain, opioid consumption, ICU/hospital length of stay.
Diabetics: Each subject is assigned to a different glucose control option (medication, diet, diet and medication). Fasting blood glucose is monitored at 0, 3, 6, 9, 12, and 15 months.

Goals of Multivariate Analysis
Data reduction and structural simplification: Say we collect p biological markers to examine patient response to chemotherapy. Ideally we might like to summarize patient response as some simple combination of the markers. How can variation in the p markers be summarized?

Sorting and grouping data: Participants are enrolled in a smoking cessation program for several years. Information about the background of each subject and smoking behavior is collected at multiple visits. Some patients quit while others do not. Can we use the background and smoking behavior information to classify those that quit and those that do not in order to screen future participants?

Investigating dependence among variables: Subjects take a standardized test with different categories of questions: sentence completion, number sequences, orientation of patterns, arithmetic (etc.). Can correlation among scores be attributed to variation in one or more unobserved factors, such as intelligence, cognitive ability, or critical thinking?

Prediction based on relationship between variables: We conduct a microarray experiment to compare tumor and healthy tissue. We want to develop a reliable classification tool based on the gene expression information from our experiment.

Hypothesis testing: Participants in a diabetes study are placed into one of three treatment groups. Fasting blood glucose is evaluated at 0, 3, 6, 9, 12, and 15 months. We want to test the hypothesis that treatment groups are different.

Multivariate Data Properties
What properties of multivariate data make commonly used statistical approaches inappropriate?

Notation & Data Organization
Consider an example where we have 15 tumor markers collected on 30 tissue samples. The 15 markers are variables and our samples represent the subjects in the data. These data can most easily be expressed as a 30 by 15 array/matrix.

Notation & Data Organization
More generally, let i = 1, 2,…, n represent the unique samples And let j = 1, 2,…, p represent a set of variables collected in a study
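As a rough illustration of this layout, the sketch below builds an n x p matrix in R with one row per sample and one column per variable. The values are simulated and the names `X`, `sample1`, `marker1`, and so on are made up for the example; only the 30 by 15 shape comes from the text.

```r
# Simulated stand-in for the tumor marker data: the values are invented,
# only the 30 x 15 layout follows the example in the text.
set.seed(726)
n <- 30   # tissue samples (rows, indexed by i)
p <- 15   # tumor markers (columns, indexed by j)
X <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(paste0("sample", 1:n), paste0("marker", 1:p)))

dim(X)     # number of rows and columns
X[2, 5]    # element ij with i = 2, j = 5: marker 5 measured on sample 2
X[2, ]     # all p measurements on sample 2 (one row)
X[, 5]     # marker 5 across all n samples (one column)
```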

Random Vectors Each experimental unit has multiple outcome measures, thus we can arrange the ith subject’s j = 1, 2,…, p outcomes as a vector, $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{ip})'$. $Y_i$ is a random vector, and its individual elements are random variables. p denotes the number of outcomes for subject i, and i = 1, 2,…, n indexes the subjects, where n is the number of subjects.

Descriptive Statistics
We can calculate familiar descriptive statistics for this array: the mean, the variance, and the covariance (correlation).

Arranged as Arrays Means Covariance
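For reference, the usual sample versions of these statistics can be written as follows. This is a sketch in assumed notation: $x_{ij}$ for the jth variable on the ith sample, and an $n-1$ divisor, which matches what R's `var()` uses. The original slides may have used a different divisor or different symbols.

```latex
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
s_{jj} = s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2, \qquad
s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), \qquad
r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}}

\bar{x} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix}, \qquad
S = \begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{pmatrix}
```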

Quick Example Find the mean and variance of

Easier in R We can calculate these values in R
> A<-matrix(c(1,2,3,3,4,5,2,3,7), nrow=3, ncol=3, byrow=T)
> A
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    3    4    5
[3,]    2    3    7
> colMeans(A)
[1] 2 3 5
> var(A)
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    4
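To see what `colMeans()` and `var()` are doing, the same quantities can be computed by hand from the centered data. This is a minimal sketch; the names `xbar`, `centered`, `S`, and `R` are chosen for illustration, and the divisor n - 1 is the one `var()` uses.

```r
A <- matrix(c(1, 2, 3, 3, 4, 5, 2, 3, 7), nrow = 3, ncol = 3, byrow = TRUE)
n <- nrow(A)

xbar <- colSums(A) / n               # column means
centered <- sweep(A, 2, xbar)        # subtract each column's mean from that column
S <- crossprod(centered) / (n - 1)   # t(centered) %*% centered / (n - 1)
R <- cov2cor(S)                      # rescale covariances to correlations

all.equal(xbar, colMeans(A))         # agrees with the built-in column means
all.equal(S, var(A))                 # agrees with the built-in covariance matrix
all.equal(R, cor(A))                 # agrees with the built-in correlation matrix
```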

Distance Many multivariate statistics are based on the idea of distance For example, if we are comparing two groups we might look at the difference in their means Euclidean distance

Concept of Euclidean Distance
Start with distance from the origin for 2-dimensions What about a p dimensional point? What about between two p dimensional points?
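The three questions have the standard answers below, written in assumed notation: $O$ is the origin, $P = (x_1, \ldots, x_p)$ and $Q = (y_1, \ldots, y_p)$ are points.

```latex
% Distance from the origin in 2 dimensions
d(O, P) = \sqrt{x_1^2 + x_2^2}

% Distance from the origin for a p-dimensional point
d(O, P) = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}

% Distance between two p-dimensional points
d(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2}
```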

Distance But why is Euclidean distance inappropriate in statistics?
This leads us to the idea of statistical distance Consider a case where we have two measures

Statistical Distance Consider a case where we have two measures

Statistical Distance Our expression of statistical distance can be generalized to p variables to any fixed set of points
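One common way to make this concrete is to weight each coordinate by its variability, which in the general case gives a Mahalanobis-type distance. The sketch below assumes that form; the point `x`, the variances, and the covariance matrix `S` are hypothetical numbers, and note that R's `mahalanobis()` returns the squared distance.

```r
x <- c(2, 2)            # hypothetical point, measured from the origin
s11 <- 4; s22 <- 1      # hypothetical variances of the two measures

# Euclidean distance treats both coordinates equally
d_euclid <- sqrt(sum(x^2))

# Statistical distance for two uncorrelated measures:
# each squared coordinate is divided by its variance
d_stat <- sqrt(x[1]^2 / s11 + x[2]^2 / s22)

# With correlated measures, use the full covariance matrix
S <- matrix(c(4, 1,
              1, 1), nrow = 2, byrow = TRUE)
d_general <- sqrt(mahalanobis(x, center = c(0, 0), cov = S))
d_manual  <- sqrt(drop(t(x) %*% solve(S) %*% x))   # same quantity by hand

c(d_euclid, d_stat, d_general, d_manual)
```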

Now onto some linear algebra basics…

Basic Matrix Operations
Can I add A2x3 and B3x3? What is the product of matrix A and scalar c? When can I multiply the two matrices A and B?
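These questions can be checked directly in R. The matrices below are arbitrary examples with the dimensions named in the questions (A is 2 x 3, B is 3 x 3); the variable names are illustrative only.

```r
A <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3
B <- matrix(1:9, nrow = 3, ncol = 3)   # 3 x 3
k <- 2                                 # a scalar

# Addition requires identical dimensions, so A + B fails here
add_try <- try(A + B, silent = TRUE)
inherits(add_try, "try-error")

# Scalar multiplication multiplies every element by the scalar
scaled <- k * A

# A %*% B is defined when ncol(A) equals nrow(B); the result is nrow(A) x ncol(B)
ncol(A) == nrow(B)
AB <- A %*% B
dim(AB)

# B %*% A is not defined here because ncol(B) differs from nrow(A)
ncol(B) == nrow(A)
```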

Matrix Transposes The transpose of an n x m matrix A, denoted as $A'$, is an m x n matrix whose ijth element is the jith element of A. Properties of a transpose: $(A')' = A$, $(A + B)' = A' + B'$, $(AB)' = B'A'$, and $(cA)' = cA'$ for a scalar c.

Quick Examples: Matrix Transposes
Consider the two matrices

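Types of Matrices
Square matrix: the number of rows equals the number of columns.
Idempotent: a square matrix A for which $AA = A$.
Symmetric: a square matrix A for which $A' = A$.
A square matrix is diagonal if all of its off-diagonal elements are zero.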
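A small check of these matrix types in R, using the standard definitions (symmetric: A' = A; idempotent: AA = A; diagonal: zero off-diagonal elements). The centering matrix used here is an illustrative choice, not an example from the slides.

```r
n <- 4
# Centering matrix I - (1/n) 11': subtracts the mean from a vector
C <- diag(n) - matrix(1 / n, nrow = n, ncol = n)

nrow(C) == ncol(C)               # square
isSymmetric(C)                   # symmetric: t(C) equals C
isTRUE(all.equal(C %*% C, C))    # idempotent: applying it twice changes nothing

D <- diag(c(1, 2, 3))            # a diagonal matrix
all(D[row(D) != col(D)] == 0)    # every off-diagonal element is zero
```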

More Definitions An n x n matrix A is nonsingular if there exists an n x n matrix B such that $AB = BA = I$. B is then the multiplicative inverse of A and can be written as $A^{-1}$. A square matrix with no multiplicative inverse is said to be…. We can calculate the inverse of a matrix assuming one exists but it is tedious (let the computer do it).

Finding an Inverse in R We can find inverses by hand, but in most cases it is tedious (I won’t ask you to). Instead use R (base package):
> A<-matrix(c(1,2,-3,-1,1,-1,0,-2,3), nrow=3, ncol=3, byrow=T)
> B<-solve(A); B
# B is the inverse of A, with rows (1, 0, 1), (3, 3, 4), and (2, 2, 3)
> A%*%B # Just a check to show this is the inverse
# The product is the 3 x 3 identity matrix, up to rounding error on the order of 1e-16

Matrix Determinant The determinant of a square matrix A is a scalar given by What is the determinant of

Matrix Determinant What about the determinant of the 3x3 matrix?

Matrix Determinant Using this result what is the determinant of

Easier in R… We can calculate the determinant of a matrix in R using functions in the base package
> A<-matrix(c(1,4,0,2,2,1,-1,3,0), nrow=3, ncol=3, byrow=T)
> det(A)
[1] -7
Note, R will also give you an error if you try to calculate it for a non-square matrix
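For comparison with the hand calculation, the sketch below expands the determinant of the same 3 x 3 matrix along its first row using 2 x 2 minors. Cofactor expansion is a standard method, but it is an assumption that this is the formula the slides used; `det2` is a helper defined only for this example.

```r
A <- matrix(c(1, 4, 0, 2, 2, 1, -1, 3, 0), nrow = 3, ncol = 3, byrow = TRUE)

# Determinant of a 2 x 2 matrix: a11 * a22 - a12 * a21
det2 <- function(M) M[1, 1] * M[2, 2] - M[1, 2] * M[2, 1]

# Expand along row 1: sum over j of (-1)^(1 + j) * a_1j * det(minor_1j),
# where minor_1j is A with row 1 and column j removed
terms <- sapply(1:3, function(j) (-1)^(1 + j) * A[1, j] * det2(A[-1, -j]))
terms        # contribution of each first-row element
sum(terms)   # matches det(A)
```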

A Little on Vectors The inner product of two vectors, $x'y = \sum_{i} x_i y_i$, is useful in statistics. Think about this in terms of linear regression…
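One way to connect inner products to regression: every entry of X'X and X'y in the least-squares normal equations is an inner product of data columns. The sketch below uses small made-up data (`x`, `y`, `X`, `yy` are hypothetical) and compares the normal-equations solution with `lm()`.

```r
# Inner product of two vectors: sum of elementwise products
x <- c(1, 2, 3)
y <- c(2, 0, 1)
sum(x * y)
drop(crossprod(x, y))   # same value via t(x) %*% y

# Hypothetical regression data: intercept column plus one predictor
X  <- cbind(1, c(1, 2, 3, 4, 5))
yy <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Normal equations (X'X) b = X'y; crossprod() forms the inner products
beta_hat <- solve(crossprod(X), crossprod(X, yy))
as.numeric(beta_hat)

# lm() fits the same model, so the coefficients should agree
fit <- lm(yy ~ X[, 2])
as.numeric(coef(fit))
```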

Orthogonal and Orthonormal Vectors
A collection of m-dimensional vectors, x1, x2,…, xp are orthogonal if… The collection of vectors is said to be orthonormal if what 2 conditions are met?

Linear Dependence The p m-dimensional vectors, x1, x2,…, xp, are linearly dependent if there is a set of constants, c1, c2,…, cp, not all zero, for which $c_1 x_1 + c_2 x_2 + \cdots + c_p x_p = 0$.

Linear Dependence Conversely, if no such set of constants (not all zero) exists, the vectors are linearly independent.

Rank of a Matrix Row rank is the number of linearly independent rows.
Column rank is the number of linearly independent columns. Find the column rank of

Rank of a Matrix How are row and column rank related?
If a matrix is not square, what is the maximum rank a matrix can have? If the rank of Amxn is min(m, n), then A is said to be full rank What does rank tell us about linear dependence of the vectors that make up the matrix?
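A numerical illustration of how rank reflects linear dependence, using a made-up matrix whose third column is the sum of the first two. `qr()` is one of several ways to get a numerical rank in base R; the matrices here are not the ones from the slides.

```r
x1 <- c(1, 2, 3)
x2 <- c(0, 1, 1)
x3 <- x1 + x2                 # deliberately dependent on x1 and x2
A  <- cbind(x1, x2, x3)

qr(A)$rank                    # 2: only two linearly independent columns
qr(t(A))$rank                 # row rank equals column rank

# Constants (1, 1, -1), not all zero, give 1*x1 + 1*x2 - 1*x3 = 0
A %*% c(1, 1, -1)

# A 2 x 3 matrix can have rank at most min(2, 3) = 2
B <- matrix(c(1, 0, 0, 1, 2, 5), nrow = 2)
qr(B)$rank
```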

Orthogonal Matrices A square matrix A (n x n) is said to be orthogonal if its columns form an orthonormal set. This can easily be determined by showing that $A'A = AA' = I$.
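A 2 x 2 rotation matrix is a convenient example of an orthogonal matrix. The sketch below checks that its columns have unit length and zero inner product, and that Q'Q and QQ' both equal the identity; the angle `theta` is arbitrary.

```r
theta <- pi / 6
Q <- matrix(c(cos(theta), sin(theta),
              -sin(theta), cos(theta)), nrow = 2)   # filled column by column

sum(Q[, 1] * Q[, 2])   # inner product of the two columns: 0 (orthogonal)
colSums(Q^2)           # squared length of each column: 1 (unit length)

isTRUE(all.equal(crossprod(Q), diag(2)))    # Q'Q = I
isTRUE(all.equal(tcrossprod(Q), diag(2)))   # QQ' = I
isTRUE(all.equal(solve(Q), t(Q)))           # so the inverse is the transpose
```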

Eigenvalues and Eigenvectors
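The eigenvalues of an n x n matrix A are the scalar values $\lambda_1, \ldots, \lambda_n$ that are the solutions to $A e_i = \lambda_i e_i$ for a set of nonzero eigenvectors, $e_1, \ldots, e_n$. We typically normalize the eigenvectors so that $e_i' e_i = 1$.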
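In R, `eigen()` returns both pieces. The sketch below uses a small symmetric matrix chosen for illustration (not the one from the slides) and checks the defining relation, A times an eigenvector equals the eigenvalue times that eigenvector, along with the unit-length normalization.

```r
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
e <- eigen(A)

e$values    # eigenvalues, largest first
e$vectors   # columns are the corresponding eigenvectors

# Check A e_1 = lambda_1 e_1 for the first eigenpair
v1 <- e$vectors[, 1]
all.equal(drop(A %*% v1), e$values[1] * v1)

# Each eigenvector is scaled to unit length
colSums(e$vectors^2)

# Eigenvalues also solve det(A - lambda I) = 0
det(A - e$values[1] * diag(2))
```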

Example: Eigen Values Find the eigenvalues for

Example: Eigen Vectors
Find the first eigenvector for

Quadratic Forms Given a symmetric matrix A (n x n) and an n-dimensional vector x, the scalar quantity $x'Ax = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j$ is referred to as a quadratic form. For example, the expression is a quadratic form for some matrix A where x is a vector

Positive Definite Matrices
A symmetric matrix A is said to be positive definite if this implies

Spectral Decomposition
We can use eigenvalues and eigenvectors to yield the spectral decomposition of a symmetric matrix A: $A = \sum_{i=1}^{n} \lambda_i e_i e_i'$. Using the spectral decomposition and quadratic forms, we can then show that a symmetric matrix is positive definite.
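The sketch below illustrates the idea numerically, under the standard definitions: the spectral decomposition writes a symmetric A as a sum of eigenvalue times eigenvector outer products, so the quadratic form x'Ax becomes a sum of eigenvalue times (e_i'x)^2, which is positive for every nonzero x when all eigenvalues are positive. The matrix `A` and vector `x` are illustrative choices.

```r
A <- matrix(c(2, 1,
              1, 2), nrow = 2, byrow = TRUE)
e <- eigen(A, symmetric = TRUE)

# Rebuild A from its eigenvalues and eigenvectors: V diag(lambda) V'
A_rebuilt <- e$vectors %*% diag(e$values) %*% t(e$vectors)
all.equal(A_rebuilt, A)

# Quadratic form computed directly and through the decomposition
x <- c(1, -3)                                   # any nonzero vector
qf_direct   <- drop(t(x) %*% A %*% x)
qf_spectral <- sum(e$values * drop(crossprod(e$vectors, x))^2)
c(qf_direct, qf_spectral)

# All eigenvalues positive, so x'Ax > 0 for every nonzero x
is_pd <- all(e$values > 0)
is_pd
```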

Quadratic Forms, Spectral Decomposition, and Positive Definite Matrices
Given the quadratic form, show A is positive definite

Positive Definite Matrices
A real symmetric matrix is:

Trace Let A be an n x n matrix. The trace of A is given by $tr(A) = \sum_{i=1}^{n} a_{ii}$, the sum of its diagonal elements.
Properties of the trace:
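Base R has no built-in trace function, so the sketch below defines a small helper `tr()` and checks a few standard trace identities numerically. These are well-known properties, though not necessarily the exact list from the slides; the matrices are arbitrary.

```r
tr <- function(M) sum(diag(M))   # helper defined for this example

A <- matrix(c(2, 1, 1, 2), nrow = 2)
B <- matrix(c(1, 0, 3, 4), nrow = 2)
k <- 3

tr(A)
tr(k * A) == k * tr(A)                        # tr(cA) = c tr(A)
tr(A + B) == tr(A) + tr(B)                    # tr(A + B) = tr(A) + tr(B)
all.equal(tr(A %*% B), tr(B %*% A))           # tr(AB) = tr(BA)
all.equal(tr(A), sum(eigen(A)$values))        # trace equals the sum of the eigenvalues
```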

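Back to Random Vectors Define Y as a random vector, $Y = (Y_1, Y_2, \ldots, Y_p)'$.
Then the population mean vector is: $E(Y) = \mu = (\mu_1, \mu_2, \ldots, \mu_p)'$, where $\mu_j = E(Y_j)$.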

Random Vectors Cont’d So Yj is a random variable whose mean and variance can be expressed by: $\mu_j = E(Y_j)$ and $\sigma_{jj} = \sigma_j^2 = E[(Y_j - \mu_j)^2]$.

Covariance of Random Vectors
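We then define the covariance between the jth and kth trait in Y as $\sigma_{jk} = E[(Y_j - \mu_j)(Y_k - \mu_k)]$, yielding the p x p covariance matrix $\Sigma = E[(Y - \mu)(Y - \mu)'] = [\sigma_{jk}]$.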

Correlation Matrix of Y
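The correlation matrix for Y is the p x p matrix whose jkth element is $\rho_{jk} = \sigma_{jk} / \sqrt{\sigma_{jj}\,\sigma_{kk}}$, so every diagonal element equals 1.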

Properties of a Covariance Matrix
$\Sigma$ is symmetric (i.e. $\sigma_{ij} = \sigma_{ji}$ for all i, j). $\Sigma$ is positive semi-definite: $a'\Sigma a \geq 0$ for any vector of constants a.
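The same properties hold for a sample covariance matrix, which makes them easy to check numerically. The data below are simulated purely for illustration, and the vector `a` is an arbitrary choice.

```r
set.seed(726)
Y <- matrix(rnorm(40 * 3), nrow = 40, ncol = 3)   # simulated: 40 subjects, 3 outcomes
S <- var(Y)                                       # sample covariance matrix

isSymmetric(S)                        # s_jk equals s_kj
eigen(S, symmetric = TRUE)$values     # no negative eigenvalues

a <- c(1, -1, 2)                      # any vector of constants
drop(t(a) %*% S %*% a)                # quadratic form is never negative

# Rescaling the covariance matrix gives the correlation matrix
R <- cov2cor(S)
all.equal(R, cor(Y))
diag(R)                               # ones on the diagonal
```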

Linear Combinations Consider linear combinations of the elements of Y
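If Y has mean $\mu$ and covariance $\Sigma$, then the linear combination $a'Y = a_1 Y_1 + \cdots + a_p Y_p$ has mean $E(a'Y) = a'\mu$ and variance $Var(a'Y) = a'\Sigma a$.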
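The standard results for a linear combination a'Y (mean a'mu, variance a' Sigma a) have exact sample analogues, which the sketch below verifies on simulated data. The data `Y` and the weights `a` are made up for the example.

```r
set.seed(1)
Y <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)   # simulated: 50 subjects, 3 outcomes
a <- c(1, -2, 0.5)                                # hypothetical weights

z <- drop(Y %*% a)   # the linear combination a'Y for each subject

# Mean of the combination equals a' times the mean vector
all.equal(mean(z), sum(a * colMeans(Y)))

# Variance of the combination equals a' S a
all.equal(var(z), drop(t(a) %*% var(Y) %*% a))
```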

Linear Combinations Cont’d
If $\Sigma$ is not positive definite, then $a'\Sigma a = 0$ for at least one $a \neq 0$.

Next Time We will start discussing properties of the multivariate normal distribution…
