Correlation

The sample covariance matrix:
$$S = [s_{ij}], \quad \text{where } s_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}(x_{ki}-\bar{x}_i)(x_{kj}-\bar{x}_j) \quad \text{and} \quad \bar{x}_i = \frac{1}{n}\sum_{k=1}^{n}x_{ki}.$$

The sample correlation matrix:
$$R = [r_{ij}], \quad \text{where } r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\,s_{jj}}}.$$

Note: $R = D^{-1/2} S D^{-1/2}$, where $D^{-1/2} = \operatorname{diag}\!\left(1/\sqrt{s_{11}},\ldots,1/\sqrt{s_{pp}}\right)$.
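As a quick numerical illustration, both matrices can be computed directly from a data matrix. A minimal sketch in Python, assuming an $n \times p$ data matrix X (the data here are simulated purely for illustration):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(50, 3))  # simulated n x p data matrix
n = X.shape[0]

xbar = X.mean(axis=0)                      # vector of sample means
S = (X - xbar).T @ (X - xbar) / (n - 1)    # sample covariance matrix
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv_sqrt @ S @ D_inv_sqrt            # R = D^{-1/2} S D^{-1/2}
```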

Tests for Independence and Non-zero correlation

Tests for Independence The test statistic If independence is true then the test statistic t will have a t - distributions with = n –2 degrees of freedom. The test is to reject independence if: Test for zero correlation (Independence between a two variables)

Test for non-zero correlation ($H_0\!: \rho = \rho_0$)

The test statistic:
$$z = \sqrt{n-3}\,\left(z_r - z_{\rho_0}\right), \quad \text{where } z_r = \tfrac{1}{2}\ln\frac{1+r}{1-r} \text{ and } z_{\rho_0} = \tfrac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}$$

If $H_0$ is true, the test statistic $z$ will have approximately a standard Normal distribution. We then reject $H_0$ if $|z| > z_{\alpha/2}$.
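A sketch of Fisher's $z$ test under the same illustrative assumptions (note that `np.arctanh` is exactly the Fisher transform above):

```python
import numpy as np
from scipy import stats

r, rho0, n, alpha = 0.42, 0.20, 30, 0.05   # illustrative values
z_r = np.arctanh(r)                        # Fisher transform of r
z_0 = np.arctanh(rho0)                     # Fisher transform of rho0
z = np.sqrt(n - 3) * (z_r - z_0)           # approximately N(0, 1) under H0
reject = abs(z) > stats.norm.ppf(1 - alpha / 2)
```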

Partial Correlation and Conditional Independence

Recall: partition $\mathbf{x} = \begin{pmatrix}\mathbf{x}^{(1)}\\ \mathbf{x}^{(2)}\end{pmatrix}$ with $\mathbf{x}^{(1)} = (x_1,\ldots,x_q)'$, where $\mathbf{x}$ has a $p$-variate Normal distribution with mean vector $\boldsymbol{\mu} = \begin{pmatrix}\boldsymbol{\mu}^{(1)}\\ \boldsymbol{\mu}^{(2)}\end{pmatrix}$ and covariance matrix $\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}$. Then the conditional distribution of $\mathbf{x}^{(2)}$ given $\mathbf{x}^{(1)}$ is $(p-q)$-variate Normal with mean vector $\boldsymbol{\mu}^{(2)} + \Sigma_{21}\Sigma_{11}^{-1}\left(\mathbf{x}^{(1)} - \boldsymbol{\mu}^{(1)}\right)$ and covariance matrix $\Sigma_{22\cdot 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.

$\Sigma_{22\cdot 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = [\sigma_{ij\cdot 1\ldots q}]$ is called the matrix of partial variances and covariances. $\sigma_{ij\cdot 1\ldots q}$ is called the partial covariance (variance if $i = j$) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$, and
$$\rho_{ij\cdot 1\ldots q} = \frac{\sigma_{ij\cdot 1\ldots q}}{\sqrt{\sigma_{ii\cdot 1\ldots q}\,\sigma_{jj\cdot 1\ldots q}}}$$
is called the partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.

Let $S = \begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}$ denote the sample covariance matrix, partitioned as above. Let $S_{22\cdot 1} = S_{22} - S_{21}S_{11}^{-1}S_{12} = [s_{ij\cdot 1\ldots q}]$. Then $s_{ij\cdot 1\ldots q}$ is called the sample partial covariance (variance if $i = j$) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.

Also
$$r_{ij\cdot 1\ldots q} = \frac{s_{ij\cdot 1\ldots q}}{\sqrt{s_{ii\cdot 1\ldots q}\,s_{jj\cdot 1\ldots q}}}$$
is called the sample partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.
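A sketch of these sample partial quantities, assuming an illustrative partition after the first $q$ variables of simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))               # simulated data, p = 5
S = np.cov(X, rowvar=False)                # sample covariance matrix

q = 2                                       # condition on the first q variables
S11, S12 = S[:q, :q], S[:q, q:]
S21, S22 = S[q:, :q], S[q:, q:]
S22_1 = S22 - S21 @ np.linalg.inv(S11) @ S12   # sample partial covariances
d = np.diag(1.0 / np.sqrt(np.diag(S22_1)))
R22_1 = d @ S22_1 @ d                          # sample partial correlations
```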

Test for zero partial correlation (conditional independence between two variables given a set of p independent variables)

Let $r_{ij\cdot 1\ldots p}$ denote the sample partial correlation between $y_i$ and $y_j$ given $x_1, \ldots, x_p$. The test statistic:
$$t = \frac{r_{ij\cdot 1\ldots p}\sqrt{n-p-2}}{\sqrt{1-r_{ij\cdot 1\ldots p}^2}}$$

If conditional independence is true, then the test statistic $t$ will have a $t$-distribution with $\nu = n - p - 2$ degrees of freedom. The test is to reject conditional independence if $|t| > t_{\alpha/2}(n-p-2)$.

Test for non-zero partial correlation ($H_0\!: \rho_{ij\cdot 1\ldots p} = \rho_0$)

The test statistic:
$$z = \sqrt{n-p-3}\,\left(z_r - z_{\rho_0}\right), \quad \text{where } z_r = \tfrac{1}{2}\ln\frac{1+r_{ij\cdot 1\ldots p}}{1-r_{ij\cdot 1\ldots p}}$$

If $H_0$ is true, the test statistic $z$ will have approximately a standard Normal distribution. We then reject $H_0$ if $|z| > z_{\alpha/2}$.
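Both partial-correlation tests in one illustrative sketch; the values of $r_{ij\cdot 1\ldots p}$, $n$, $p$, $\rho_0$, and $\alpha$ are assumptions:

```python
import numpy as np
from scipy import stats

r_p, n, p, alpha = 0.35, 40, 3, 0.05   # partial correlation given p variables

# Test of zero partial correlation (conditional independence)
t = r_p * np.sqrt(n - p - 2) / np.sqrt(1 - r_p**2)
reject_t = abs(t) > stats.t.ppf(1 - alpha / 2, df=n - p - 2)

# Test of H0: partial rho = rho0 via Fisher's transform
rho0 = 0.10
z = np.sqrt(n - p - 3) * (np.arctanh(r_p) - np.arctanh(rho0))
reject_z = abs(z) > stats.norm.ppf(1 - alpha / 2)
```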

The Multiple Correlation Coefficient
Testing independence between a single variable and a group of variables

Suppose $\begin{pmatrix} y \\ \mathbf{x} \end{pmatrix}$ has a $(p+1)$-variate Normal distribution with mean vector $\begin{pmatrix}\mu_y \\ \boldsymbol{\mu}_x\end{pmatrix}$ and covariance matrix $\begin{pmatrix}\sigma_{yy} & \boldsymbol{\sigma}_{xy}' \\ \boldsymbol{\sigma}_{xy} & \Sigma_{xx}\end{pmatrix}$. We are interested in whether the variable $y$ is independent of the vector $\mathbf{x}$.

Definition: The multiple correlation coefficient $\rho_{y\cdot x}$ is the maximum correlation between $y$ and a linear combination $\mathbf{a}'\mathbf{x}$ of the components of $\mathbf{x}$.

Derivation: Consider $\begin{pmatrix} y \\ \mathbf{a}'\mathbf{x} \end{pmatrix}$. This vector has a bivariate Normal distribution with mean vector $\begin{pmatrix}\mu_y \\ \mathbf{a}'\boldsymbol{\mu}_x\end{pmatrix}$ and covariance matrix $\begin{pmatrix}\sigma_{yy} & \mathbf{a}'\boldsymbol{\sigma}_{xy} \\ \mathbf{a}'\boldsymbol{\sigma}_{xy} & \mathbf{a}'\Sigma_{xx}\mathbf{a}\end{pmatrix}$.

The correlation between $y$ and $\mathbf{a}'\mathbf{x}$ is
$$\rho = \frac{\mathbf{a}'\boldsymbol{\sigma}_{xy}}{\sqrt{\sigma_{yy}\,\mathbf{a}'\Sigma_{xx}\mathbf{a}}}.$$
Thus we want to choose $\mathbf{a}$ to maximize $\rho$, or equivalently to maximize
$$\frac{(\mathbf{a}'\boldsymbol{\sigma}_{xy})^2}{\mathbf{a}'\Sigma_{xx}\mathbf{a}}.$$

Note: the maximum is attained at $\mathbf{a} = k\,\Sigma_{xx}^{-1}\boldsymbol{\sigma}_{xy}$ for any constant $k \neq 0$, giving
$$\rho_{y\cdot x} = \sqrt{\frac{\boldsymbol{\sigma}_{xy}'\Sigma_{xx}^{-1}\boldsymbol{\sigma}_{xy}}{\sigma_{yy}}}.$$

The multiple correlation coefficient is independent of the value of k.

The sample multiple correlation coefficient

We are interested in whether the variable $y$ is independent of the vector $\mathbf{x}$. Partition the sample covariance matrix as $\begin{pmatrix}s_{yy} & \mathbf{s}_{xy}' \\ \mathbf{s}_{xy} & S_{xx}\end{pmatrix}$. Then the sample multiple correlation coefficient is
$$R_{y\cdot x} = \sqrt{\frac{\mathbf{s}_{xy}'S_{xx}^{-1}\mathbf{s}_{xy}}{s_{yy}}}.$$

Testing for independence between $y$ and $\mathbf{x}$

The test statistic:
$$F = \frac{R_{y\cdot x}^2 / p}{(1 - R_{y\cdot x}^2)/(n - p - 1)}$$

If independence is true, then the test statistic $F$ will have an $F$-distribution with $\nu_1 = p$ degrees of freedom in the numerator and $\nu_2 = n - p - 1$ degrees of freedom in the denominator. The test is to reject independence if $F > F_{\alpha}(p,\; n - p - 1)$.
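A sketch computing $R_{y\cdot x}$ and the $F$ statistic on simulated data; the data-generating coefficients are arbitrary assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, alpha = 50, 3, 0.05
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)  # simulated response

S_full = np.cov(np.column_stack([y, X]), rowvar=False)    # (p+1) x (p+1)
s_yy, s_xy, S_xx = S_full[0, 0], S_full[0, 1:], S_full[1:, 1:]

R2 = s_xy @ np.linalg.solve(S_xx, s_xy) / s_yy            # squared multiple corr.
F = (R2 / p) / ((1 - R2) / (n - p - 1))                   # test statistic
reject = F > stats.f.ppf(1 - alpha, dfn=p, dfd=n - p - 1)
```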

Canonical Correlation Analysis

The problem: Quite often one has collected data on several variables. The variables are grouped into two (or more) sets, and the researcher is interested in whether one set of variables is independent of the other set. In addition, if the two sets of variates are found to be dependent, it is important to describe and understand the nature of this dependence. The appropriate statistical procedure in this case is called Canonical Correlation Analysis.

Canonical Correlation: An Example

In the following study, the researcher was interested in whether specific instructions on how to relax when taking tests, and on how to increase motivation, would affect performance on the standardized achievement tests Reading, Language and Mathematics.

A group of 65 third- and fourth-grade students were rated, after the instruction and immediately prior to taking the Scholastic Achievement tests, on how relaxed they were ($X_1$) and how motivated they were ($X_2$). In addition, data were collected on the three achievement tests: Reading ($Y_1$), Language ($Y_2$) and Mathematics ($Y_3$).

Definition (canonical variates and canonical correlations): Let $\mathbf{x} = \begin{pmatrix}\mathbf{x}^{(1)}\\ \mathbf{x}^{(2)}\end{pmatrix}$ have a $p$-variate Normal distribution with mean vector $\boldsymbol{\mu} = \begin{pmatrix}\boldsymbol{\mu}^{(1)}\\ \boldsymbol{\mu}^{(2)}\end{pmatrix}$ and covariance matrix $\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}$. Let $U_1 = \mathbf{a}_1'\mathbf{x}^{(1)}$ and $V_1 = \mathbf{b}_1'\mathbf{x}^{(2)}$ be such that $U_1$ and $V_1$ achieve the maximum correlation $\rho_1$. Then $U_1$ and $V_1$ are called the first pair of canonical variates, and $\rho_1$ is called the first canonical correlation coefficient.

Derivation (1st pair of canonical variates and canonical correlation):

$\begin{pmatrix} U_1 \\ V_1 \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1'\mathbf{x}^{(1)} \\ \mathbf{b}_1'\mathbf{x}^{(2)} \end{pmatrix}$ has covariance matrix
$$\begin{pmatrix}\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_1 \\ \mathbf{b}_1'\Sigma_{21}\mathbf{a}_1 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1\end{pmatrix}.$$
Hence the correlation between $U_1$ and $V_1$ is
$$\rho = \frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\sqrt{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)}}.$$

Thus we want to choose $\mathbf{a}_1$ and $\mathbf{b}_1$ so that $\rho$ is at a maximum, or equivalently so that
$$D = \frac{(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)^2}{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)}$$
is at a maximum.

Computing derivatives and setting $\partial D/\partial \mathbf{a}_1 = \mathbf{0}$ and $\partial D/\partial \mathbf{b}_1 = \mathbf{0}$ gives
$$\Sigma_{12}\mathbf{b}_1 = \frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1}\,\Sigma_{11}\mathbf{a}_1 \quad \text{and} \quad \Sigma_{21}\mathbf{a}_1 = \frac{\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1}{\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1}\,\Sigma_{22}\mathbf{b}_1.$$

Thus
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\,\mathbf{a}_1 = k\,\mathbf{a}_1, \quad \text{where } k = \frac{(\mathbf{a}_1'\Sigma_{12}\mathbf{b}_1)^2}{(\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1)(\mathbf{b}_1'\Sigma_{22}\mathbf{b}_1)} = D.$$
This shows that $\mathbf{a}_1$ is an eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$; since $k = D$ is to be maximized, $k$ is the largest eigenvalue and $\mathbf{a}_1$ is the eigenvector associated with the largest eigenvalue.

Also
$$\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\,\mathbf{b}_1 = k\,\mathbf{b}_1,$$
so $\mathbf{b}_1$ is the eigenvector of $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ associated with its largest eigenvalue, and $\mathbf{b}_1 \propto \Sigma_{22}^{-1}\Sigma_{21}\mathbf{a}_1$.

Summary: The first pair of canonical variates, $U_1 = \mathbf{a}_1'\mathbf{x}^{(1)}$ and $V_1 = \mathbf{b}_1'\mathbf{x}^{(2)}$, are found by taking $\mathbf{a}_1$ and $\mathbf{b}_1$ to be eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, respectively, associated with the largest eigenvalue (the same for both matrices). The largest eigenvalue of the two matrices is the square of the first canonical correlation coefficient $\rho_1$.
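A sketch of this eigen-computation on simulated data; the partition point q1 is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))              # simulated data
S = np.cov(X, rowvar=False)
q1 = 2                                      # x1 = first q1 variables
S11, S12 = S[:q1, :q1], S[:q1, q1:]
S21, S22 = S[q1:, :q1], S[q1:, q1:]

M1 = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)  # S11^-1 S12 S22^-1 S21
eigvals, eigvecs = np.linalg.eig(M1)
k = np.argmax(eigvals.real)
rho1 = np.sqrt(eigvals.real[k])            # first canonical correlation
a1 = eigvecs[:, k].real                    # coefficients of U1 = a1' x1
b1 = np.linalg.solve(S22, S21) @ a1        # coefficients of V1 (up to scale)
```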

Note: $A = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $B = \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ have exactly the same nonzero eigenvalues.

Proof: Suppose $A\mathbf{a} = \lambda\mathbf{a}$ with $\lambda \neq 0$, and set $\mathbf{b} = \Sigma_{22}^{-1}\Sigma_{21}\mathbf{a}$. Then
$$B\mathbf{b} = \Sigma_{22}^{-1}\Sigma_{21}\left(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)\mathbf{a} = \Sigma_{22}^{-1}\Sigma_{21}A\mathbf{a} = \lambda\,\mathbf{b},$$
so every nonzero eigenvalue of $A$ is an eigenvalue of $B$, and the symmetric argument gives the converse.

The second pair of canonical variates

The remaining canonical variates and canonical correlation coefficients are found sequentially. $U_2 = \mathbf{a}_2'\mathbf{x}^{(1)}$ and $V_2 = \mathbf{b}_2'\mathbf{x}^{(2)}$ are found by choosing $\mathbf{a}_2$ and $\mathbf{b}_2$ so that:
1. $(U_2, V_2)$ are independent of $(U_1, V_1)$;
2. the correlation between $U_2$ and $V_2$ is maximized.

The correlation, $\rho_2$, between $U_2$ and $V_2$ is called the second canonical correlation coefficient.

The i-th pair of canonical variates

$U_i = \mathbf{a}_i'\mathbf{x}^{(1)}$ and $V_i = \mathbf{b}_i'\mathbf{x}^{(2)}$ are found by choosing $\mathbf{a}_i$ and $\mathbf{b}_i$ so that:
1. $(U_i, V_i)$ are independent of $(U_1, V_1), \ldots, (U_{i-1}, V_{i-1})$;
2. the correlation between $U_i$ and $V_i$ is maximized.

The correlation, $\rho_i$, between $U_i$ and $V_i$ is called the i-th canonical correlation coefficient.

Derivation (2nd pair of canonical variates and canonical correlation):

$\begin{pmatrix} U_2 \\ V_2 \end{pmatrix} = \begin{pmatrix} \mathbf{a}_2'\mathbf{x}^{(1)} \\ \mathbf{b}_2'\mathbf{x}^{(2)} \end{pmatrix}$ has covariance matrix
$$\begin{pmatrix}\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 & \mathbf{a}_2'\Sigma_{12}\mathbf{b}_2 \\ \mathbf{b}_2'\Sigma_{21}\mathbf{a}_2 & \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2\end{pmatrix}.$$
Now independence of $(U_2, V_2)$ from $(U_1, V_1)$ requires
$$\mathbf{a}_2'\Sigma_{11}\mathbf{a}_1 = 0, \quad \mathbf{b}_2'\Sigma_{22}\mathbf{b}_1 = 0, \quad \mathbf{a}_2'\Sigma_{12}\mathbf{b}_1 = 0, \quad \mathbf{b}_2'\Sigma_{21}\mathbf{a}_1 = 0.$$

Maximizing the correlation between $U_2$ and $V_2$ is equivalent to maximizing $\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2$ subject to
$$\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 = \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2 = 1$$
and the independence restrictions above, using the Lagrange multiplier technique.

Now setting the derivatives of the Lagrangian with respect to $\mathbf{a}_2$ and $\mathbf{b}_2$ equal to zero, and also imposing the restrictions, gives a system of equations in $\mathbf{a}_2$, $\mathbf{b}_2$ and the multipliers.

These equations can be used to show that $\mathbf{a}_2$ and $\mathbf{b}_2$ are eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, respectively, associated with the 2nd largest eigenvalue (the same for both matrices). The 2nd largest eigenvalue of the two matrices is the square of the 2nd canonical correlation coefficient $\rho_2$.

Continuing, the coefficients $\mathbf{a}_i$ and $\mathbf{b}_i$ for the i-th pair of canonical variates are eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, respectively, associated with the i-th largest eigenvalue (the same for both matrices). The i-th largest eigenvalue of the two matrices is the square of the i-th canonical correlation coefficient $\rho_i$.
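Putting the pieces together, here is a small helper that returns all the canonical correlations at once, as the square roots of the shared eigenvalues; it is a sketch under the same illustrative setup as the previous snippet:

```python
import numpy as np

def canonical_correlations(S, q1):
    """All canonical correlations from a covariance matrix partitioned after q1."""
    S11, S12 = S[:q1, :q1], S[:q1, q1:]
    S21, S22 = S[q1:, :q1], S[q1:, q1:]
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    lam = np.sort(np.linalg.eigvals(M).real)[::-1]   # eigenvalues, descending
    lam = np.clip(lam, 0.0, 1.0)                     # guard against numerical noise
    return np.sqrt(lam)                              # rho_1 >= rho_2 >= ...
```

Calling `canonical_correlations(S, q1=2)` on the simulated covariance matrix from the earlier sketch reproduces `rho1` as its first entry.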

Example

Variables: relaxation score ($X_1$) and motivation score ($X_2$); Reading ($Y_1$), Language ($Y_2$) and Mathematics ($Y_3$).

Summary Statistics

Canonical Correlation Statistics

continued

Summary

$U_1 = a_{11}\,\text{Relax} + a_{12}\,\text{Mot}$, $\quad V_1 = b_{11}\,\text{Read} + b_{12}\,\text{Lang} + b_{13}\,\text{Math}$, $\quad \rho_1 = .592$

$U_2 = a_{21}\,\text{Relax} + a_{22}\,\text{Mot}$, $\quad V_2 = b_{21}\,\text{Math} + b_{22}\,\text{Read} + b_{23}\,\text{Lang}$, $\quad \rho_2 = .159$

(the coefficients $a_{ij}$, $b_{ij}$ are the estimated canonical coefficients computed from the summary statistics)