Multivariate Data and Matrix Algebra Review

Multivariate Data and Matrix Algebra Review BMTRY 726 5/15/2018

Syllabus
Instructor and Contact Information: Bethany Wolf, 135 Cannon Place, Office 302B, 876-1940, wolfb@musc.edu
Office hours: Monday 2-3 or by appointment
Grading: Grades will be based on assigned problem sets, a mid-term exam, class participation, and a final project. Problem sets will require active manipulation of datasets provided by the instructor using standard statistical packages (e.g., R and SAS). Class participation will include presenting journal articles. The breakdown of the final course grade is as follows:
Homework assignments: 50%
Mid-term exam: 20%
Final project: 20%
Class participation: 10%

A few things before we begin
May 24-25th I will be teaching a Summer Institute on Survival Analysis, which we will have in lieu of class on the 24th.
May 29th and May 31st we will be in EL115.
Class participation discussion.

What is ‘Multivariate’ Data?
Data in which each sampling unit contributes to more than one outcome. For example:
Cancer patients: serum concentrations on a panel of protein markers are collected in chemotherapy patients.
Smoking cessation participants: background information and smoking behavior are collected at multiple visits.
Post-operative patients: multiple measures of how a patient is doing post-operatively, e.g., patient self-reported pain, opioid consumption, and ICU/hospital length of stay.
Diabetics: each subject is assigned to a different glucose control option (medication, diet, or diet and medication), and fasting blood glucose is monitored at 0, 3, 6, 9, 12, and 15 months.

Goals of Multivariate Analysis
Data reduction and structural simplification
Say we collect p biological markers to examine patient response to chemotherapy. Ideally we might like to summarize patient response as some simple combination of the markers. How can variation in the p markers be summarized?

Goals of Multivariate Analysis
Sorting and grouping data
Participants are enrolled in a smoking cessation program for several years. Information about the background of each subject and smoking behavior is collected at multiple visits. Some patients quit while others do not. Can we use the background and smoking behavior information to classify those who quit and those who do not in order to screen future participants?

Goals of Multivariate Analysis
Investigating dependence among variables
Subjects take a standardized test with different categories of questions:
Sentence completion
Number sequences
Orientation of patterns
Arithmetic (etc.)
Can correlation among scores be attributed to variation in one or more unobserved factors?
Intelligence
Cognitive ability
Critical thinking

Goals of Multivariate Analysis
Prediction based on relationships between variables
We conduct a microarray experiment to compare tumor and healthy tissue. We want to develop a reliable classification tool based on the gene expression information from our experiment.

Goals of Multivariate Analysis
Hypothesis testing
Participants in a diabetes study are placed into one of three treatment groups. Fasting blood glucose is evaluated at 0, 3, 6, 9, 12, and 15 months. We want to test the hypothesis that the treatment groups differ.

Multivariate Data Properties
What property/ies of multivariate data make commonly used statistical approaches inappropriate?

Notation & Data Organization
Consider an example where we have 15 tumor markers collected on 30 tissue samples. The 15 markers are variables and our samples represent the subjects in the data. These data can most easily be expressed as a 30 by 15 array/matrix.
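As a small illustration of this layout (using simulated values, not the course data), such an array can be built in R with samples as rows and markers as columns:
> set.seed(726)                                      # hypothetical seed, for reproducibility
> X <- matrix(rnorm(30 * 15), nrow = 30, ncol = 15)  # 30 samples (rows) x 15 markers (columns), simulated
> rownames(X) <- paste0("sample", 1:30)
> colnames(X) <- paste0("marker", 1:15)
> dim(X)
[1] 30 15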

Notation & Data Organization
More generally, let i = 1, 2,…, n index the unique samples, and let j = 1, 2,…, p index the set of variables collected in a study.

Random Vectors
Each experimental unit has multiple outcome measures, thus we can arrange the ith subject's j = 1, 2,…, p outcomes as a vector Yi = (Yi1, Yi2,…, Yip)'.
Yi is a random vector, as are its individual elements.
p denotes the number of outcomes for subject i.
i = 1, 2,…, n indexes the subjects, so n is the number of subjects.

Descriptive Statistics
We can calculate familiar descriptive statistics for this array:
Mean
Variance
Covariance (Correlation)

Arranged as Arrays Means Covariance
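The arrays themselves were shown as images on this slide; the standard sample versions, which the R example on the next slide computes, are:
\bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)', \qquad \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},
S = \{ s_{jk} \}, \qquad s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k).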

Quick Example
Find the mean and variance of the 3 x 3 data matrix A used in the R example on the next slide.

Easier in R
We can calculate these values in R:
> A <- matrix(c(1,2,3,3,4,5,2,3,7), nrow=3, ncol=3, byrow=T)
> A
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    3    4    5
[3,]    2    3    7
> colMeans(A)
[1] 2 3 5
> var(A)
     [,1] [,2] [,3]
[1,]    1    1    1
[2,]    1    1    1
[3,]    1    1    4

Distance
Many multivariate statistics are based on the idea of distance. For example, if we are comparing two groups we might look at the difference in their means.
Euclidean distance

Concept of Euclidean Distance
Start with the distance from the origin in 2 dimensions. What about a p-dimensional point? What about the distance between two p-dimensional points?
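The formulas referenced here appeared as images in the original slides; the standard definitions are:
d(O, x) = \sqrt{x_1^2 + x_2^2} \quad \text{(2-dimensional point x and origin O)},
d(O, x) = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2} \quad \text{(p-dimensional point)},
d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2} \quad \text{(two p-dimensional points)}.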

Distance
But why is Euclidean distance often inappropriate in statistics? It weights every coordinate equally, ignoring differences in variability (and correlation) among the measurements. This leads us to the idea of statistical distance.
Consider a case where we have two measures.

Statistical Distance Consider a case where we have two measures

Statistical Distance Consider a case where we have two measures

Statistical Distance
Our expression of statistical distance can be generalized to p variables and to the distance between any two fixed points.
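The generalized expression was shown as an image; a standard version (a reconstruction, with the weight matrix A standing in for whatever notation the slide used) is
d(x, y) = \sqrt{(x - y)' A (x - y)},
where A is a symmetric positive definite p x p matrix of weights. Taking A = S^{-1}, the inverse sample covariance matrix, gives the usual statistical (Mahalanobis) distance, while A = I recovers Euclidean distance.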

Now onto some linear algebra basics…

Basic Matrix Operations
Can I add A (2 x 3) and B (3 x 3)?
What is the product of matrix A and a scalar c?
When can I multiply the two matrices A and B?
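A small R sketch (with made-up matrices, not the slide's examples) illustrates the answers: addition requires identical dimensions, scalar multiplication is element-wise, and A %*% B requires ncol(A) == nrow(B).
> A <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3, made-up values
> B <- matrix(1:9, nrow = 3, ncol = 3)   # 3 x 3, made-up values
> # A + B                                # error: matrices must have the same dimensions
> 2 * A                                  # scalar multiplication: every element doubled
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    4    8   12
> A %*% B                                # valid: (2 x 3) %*% (3 x 3) gives a 2 x 3 matrix
     [,1] [,2] [,3]
[1,]   22   49   76
[2,]   28   64  100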

Matrix Transposes
The transpose of an n x m matrix A, denoted A', is the m x n matrix whose ijth element is the jith element of A.
Properties of a transpose:
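The property list itself was an image on the slide; the standard transpose properties it refers to are:
(A')' = A, \qquad (cA)' = cA', \qquad (A + B)' = A' + B', \qquad (AB)' = B'A'.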

Quick Examples: Matrix Transposes Consider the two matrices

Types of Matrices
Square matrix: the number of rows equals the number of columns (n x n).
Idempotent: AA = A.
Symmetric: A = A'.
Diagonal: a square matrix whose off-diagonal elements are all zero.

More Definitions
An n x n matrix A is nonsingular if there exists an n x n matrix B such that AB = BA = I. B is the multiplicative inverse of A and is written as A^-1.
A square matrix with no multiplicative inverse is said to be….
We can calculate the inverse of a matrix by hand, assuming one exists, but it is tedious (let the computer do it).

Finding an Inverse in R
We can find inverses by hand, but in most cases it is tedious (I won't ask you to). Instead use R (base package):
> A <- matrix(c(1,2,-3,-1,1,-1,0,-2,3), nrow=3, ncol=3, byrow=T)
> B <- solve(A); B
     [,1] [,2] [,3]
[1,]    1    0    1
[2,]    3    3    4
[3,]    2    2    3
> A %*% B    # Just a check to show this is the inverse
              [,1] [,2]          [,3]
[1,]  1.000000e+00    0  0.000000e+00
[2,] -4.440892e-16    1 -4.440892e-16
[3,]  0.000000e+00    0  1.000000e+00

Matrix Determinant The determinant of a square matrix A is a scalar given by What is the determinant of

Matrix Determinant What about the determinant of the 3x3 matrix?
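The determinant formulas on these slides were images; the standard definitions are, for a 2 x 2 matrix and, by cofactor expansion along the first row, for a 3 x 3 matrix:
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21},
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
+ a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.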

Matrix Determinant
Using this result, what is the determinant of the 3 x 3 matrix A used in the R example on the next slide?

Easier in R…
We can calculate the determinant of a matrix in R using functions in the base package:
> A <- matrix(c(1,4,0,2,2,1,-1,3,0), nrow=3, ncol=3, byrow=T)
> det(A)
[1] -7
Note, R will also give you an error if you try to calculate it for a non-square matrix.

A Little on Vectors
The inner product of two vectors is useful in statistics. Think about this in terms of linear regression…
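As a sketch of the connection being hinted at (standard results, not taken from the slide images): for m-dimensional vectors x and y the inner product is
x'y = \sum_{i=1}^{m} x_i y_i,
and in linear regression the least squares estimator \hat{\beta} = (X'X)^{-1}X'y is built entirely from inner products among the columns of the design matrix X and the response y.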

Orthogonal and Orthonormal Vectors
A collection of m-dimensional vectors, x1, x2,…, xp, is orthogonal if…
The collection of vectors is said to be orthonormal if what 2 conditions are met?
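For reference, the conditions (shown as images on the slide) are the standard ones:
\text{orthogonal: } x_j'x_k = 0 \text{ for all } j \neq k; \qquad \text{orthonormal: additionally } x_j'x_j = 1 \text{ for all } j.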

Linear Dependence
A set of p m-dimensional vectors, x1, x2,…, xp, is linearly dependent if there is a set of constants c1, c2,…, cp, not all zero, for which c1x1 + c2x2 + … + cpxp = 0.

Linear Dependence
Conversely, if no such set of constants (not all zero) exists, the vectors are linearly independent.

Rank of a Matrix
Row rank is the number of linearly independent rows.
Column rank is the number of linearly independent columns.
Find the column rank of

Rank of a Matrix
How are row and column rank related?
If a matrix is not square, what is the maximum rank it can have?
If the rank of A (m x n) is min(m, n), then A is said to be full rank.
What does rank tell us about linear dependence of the vectors that make up the matrix?
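In R, rank can be obtained from a QR decomposition; a small sketch with a made-up matrix (not one from the slides) whose third column is the sum of the first two:
> A <- matrix(c(1, 0, 1,
+               0, 1, 1,
+               2, 3, 5), nrow = 3, byrow = TRUE)   # column 3 = column 1 + column 2
> qr(A)$rank                                        # rank 2 < 3, so the columns are linearly dependent
[1] 2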

Orthogonal Matrices
A square matrix A (n x n) is said to be orthogonal if its columns form an orthonormal set. This can easily be determined by showing that A'A = AA' = I.

Eigenvalues and Eigenvectors
The eigenvalues of an n x n matrix A are the scalar values λ that are solutions to Ae = λe (equivalently, to det(A − λI) = 0) for a corresponding set of eigenvectors e. We typically normalize e so that e'e = 1.

Example: Eigenvalues
Find the eigenvalues for

Example: Eigenvectors
Find the first eigenvector for
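The matrices in these two examples were images in the original, so the sketch below uses a made-up symmetric 2 x 2 matrix just to show the R workflow with eigen(); the eigenvectors returned are already normalized to unit length.
> A <- matrix(c(2, 1,
+               1, 2), nrow = 2, byrow = TRUE)   # made-up symmetric matrix
> e <- eigen(A)
> e$values                                       # eigenvalues, largest first
[1] 3 1
> e$vectors                                      # columns are unit-length eigenvectors (signs are arbitrary)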

Quadratic Forms
Given a symmetric n x n matrix A and an n-dimensional vector x, the scalar quantity x'Ax is referred to as a quadratic form.
For example, the expression is a quadratic form for some matrix A, where x is a vector.

Positive Definite Matrices
A symmetric matrix A is said to be positive definite if x'Ax > 0 for all x ≠ 0. This implies that all of the eigenvalues of A are positive.

Spectral Decomposition
We can use eigenvalues and eigenvectors to obtain the spectral decomposition of a symmetric matrix A. Using the spectral decomposition and quadratic forms, we can then show whether a symmetric matrix is positive definite.
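The decomposition itself was an image on the slide; for a symmetric A with eigenvalue-eigenvector pairs (λi, ei), the standard form and its link to positive definiteness are:
A = \sum_{i=1}^{n} \lambda_i e_i e_i' = P \Lambda P', \qquad x'Ax = \sum_{i=1}^{n} \lambda_i (e_i'x)^2,
so A is positive definite exactly when every eigenvalue λi is strictly positive.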

Quadratic Forms, Spectral Decomposition, and Positive Definite Matrices Given the quadratic form, show A is positive definite

Positive Definite Matrices
A real symmetric matrix is:
positive definite if all of its eigenvalues are positive,
positive semi-definite if all of its eigenvalues are non-negative,
negative definite if all of its eigenvalues are negative,
and indefinite if it has eigenvalues of both signs.

Trace
Let A be an n x n matrix; the trace of A is the sum of its diagonal elements, tr(A) = a11 + a22 + … + ann.
Properties of the trace include: tr(A + B) = tr(A) + tr(B), tr(cA) = c·tr(A), tr(AB) = tr(BA), and tr(A) equals the sum of the eigenvalues of A.

Back to Random Vectors
Define Y = (Y1, Y2,…, Yp)' as a random vector. Then the population mean vector is μ = E(Y) = (μ1, μ2,…, μp)'.

Random Vectors Cont'd
So Yj is a random variable whose mean and variance can be expressed as μj = E(Yj) and σjj = Var(Yj) = E[(Yj − μj)²].

Covariance of Random Vectors
We then define the covariance between the jth and kth trait in Y as σjk = Cov(Yj, Yk) = E[(Yj − μj)(Yk − μk)], yielding the covariance matrix Σ = Var(Y) = E[(Y − μ)(Y − μ)'] = {σjk}.

Correlation Matrix of Y
The correlation matrix for Y has elements ρjk = σjk / √(σjj σkk), with ρjj = 1 on the diagonal.

Properties of a Covariance Matrix
Σ is symmetric (i.e., σjk = σkj for all j, k).
Σ is positive semi-definite, i.e., a'Σa ≥ 0 for any vector of constants a.

Linear Combinations
Consider a linear combination of the elements of Y, Z = a'Y = a1Y1 + a2Y2 + … + apYp. If Y has mean μ and covariance Σ, then E(Z) = a'μ and Var(Z) = a'Σa.
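A quick numerical check in R, using made-up values for a, μ, and Σ (none of these come from the slides):
> a     <- c(1, -1, 2)                            # hypothetical weights
> mu    <- c(0, 1, 2)                             # hypothetical mean vector
> Sigma <- matrix(c(2, 1, 0,
+                   1, 3, 1,
+                   0, 1, 4), nrow = 3)           # hypothetical covariance matrix (symmetric, positive definite)
> t(a) %*% mu                                     # E(a'Y) = a'mu
     [,1]
[1,]    3
> t(a) %*% Sigma %*% a                            # Var(a'Y) = a' Sigma a
     [,1]
[1,]   15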

Linear Combinations Cont'd
If Σ is positive semi-definite but not positive definite, then a'Σa = 0 for at least one a ≠ 0; that is, some linear combination of the elements of Y has zero variance.

Next Time We will start discussing properties of the multivariate normal distribution…