Description of Multivariate Data

Multivariate Analysis: The analysis of many variables

Multivariate Analysis: The analysis of many variables. More precisely, and also more traditionally, this term stands for the study of a random sample of n objects (units or cases) such that on each object we measure p variables or characteristics. So for each object i there is a vector, each with p components: x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})'.

The variables will be correlated, as they are measured on the same object. A common practice is to treat each variable separately by applying methods of univariate analysis, but ignoring the correlations in this way may lead to incorrect and inadequate analyses.

The challenge of multivariate analysis is to untangle the overlapping information provided by a set of correlated variables and to reveal the underlying structure.

This is done by a variety of methods, some of which are generalizations of univariate methods and some of which are genuinely multivariate, without univariate counterparts.

The purpose of this course is to describe and perhaps justify these methods, and also provide some guidance about how to select an appropriate method for a given multivariate data set.

Example: Randomly select n = 5 students as objects and for each student measure:
x_1 = age (in years) at entry to university,
x_2 = mark out of 100 in an exam at the end of the first year,
x_3 = sex (0 = female, 1 = male)

The result may look something like this: a 5 × 3 table listing, for each of the five objects, the values of x_1, x_2 and x_3 (the numeric entries were shown on the original slide). It is of interest to note that the variables in the example are not of the same type:
– x_1 is a continuous variable,
– x_2 is a discrete variable and
– x_3 is a binary variable
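To make the example concrete, here is a minimal numpy sketch of such a data matrix; the numeric values below are hypothetical stand-ins invented for illustration, not the slide's actual entries:

import numpy as np

# Hypothetical 5 x 3 data matrix: rows = the n = 5 students (objects),
# columns = the p = 3 variables (x1 = age, x2 = exam mark, x3 = sex).
X = np.array([
    [18.0, 65.0, 0.0],
    [19.0, 72.0, 1.0],
    [21.0, 58.0, 0.0],
    [18.0, 81.0, 1.0],
    [20.0, 70.0, 0.0],
])
n, p = X.shape  # n = 5 objects, p = 3 variables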

The Data Matrix

X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}

an n × p matrix whose n rows correspond to the objects and whose p columns correspond to the variables.

We can write X = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix}, where x_i' = the i th row of X.

We can also write X = (x_{(1)}, x_{(2)}, \ldots, x_{(p)}), where x_{(j)} = the j th column of X.

In this notation, x_1 is the p-vector denoting the p observations on the first object, while x_{(1)} is the n-vector denoting the n observations on the first variable. The rows form a random sample while the columns do not (this is emphasized in the notation by the use of parentheses).
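In numpy terms, rows and columns of the data matrix are just slices; a small sketch (reusing the hypothetical X from above):

import numpy as np

X = np.array([[18.0, 65.0, 0.0],
              [19.0, 72.0, 1.0],
              [21.0, 58.0, 0.0],
              [18.0, 81.0, 1.0],
              [20.0, 70.0, 0.0]])

x_1 = X[0, :]      # x_1': the p observations on the first object (a row)
x_col_1 = X[:, 0]  # x_(1): the n observations on the first variable (a column)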

At times, the objective of multivariate analysis will be an attempt to find some feature of the variables (i.e. the columns of the data matrix). At other times, the objective will be an attempt to find some feature of the individuals (i.e. the rows of the data matrix). The feature that we often look for is a grouping of the individuals or of the variables. We will give a classification of multivariate methods later.

Summarization of the data

Even when n and p are only moderately large, the amount of information (np elements of the data matrix) can be overwhelming, and it is necessary to find ways of summarizing the data. Later on we will discuss ways of graphical representation of the data.

Definitions:
1. The sample mean for the i th variable: \bar{x}_i = \frac{1}{n}\sum_{r=1}^{n} x_{ri}
2. The sample variance for the i th variable: s_{ii} = \frac{1}{n}\sum_{r=1}^{n} (x_{ri} - \bar{x}_i)^2
3. The sample covariance between the i th variable and the j th variable: s_{ij} = \frac{1}{n}\sum_{r=1}^{n} (x_{ri} - \bar{x}_i)(x_{rj} - \bar{x}_j)
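These three definitions translate directly into numpy; a sketch using the divisor n, as in the formulas above (a divisor of n - 1 would differ only by rescaling), with the hypothetical data matrix from the example:

import numpy as np

X = np.array([[18.0, 65.0, 0.0],
              [19.0, 72.0, 1.0],
              [21.0, 58.0, 0.0],
              [18.0, 81.0, 1.0],
              [20.0, 70.0, 0.0]])  # hypothetical data
n, p = X.shape

xbar = X.mean(axis=0)                       # sample mean of each variable
s_diag = ((X - xbar) ** 2).sum(axis=0) / n  # sample variance of each variable

def s(i, j):
    # Sample covariance between variables i and j (divisor n).
    return ((X[:, i] - xbar[i]) * (X[:, j] - xbar[j])).sum() / n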

Putting the definitions together, we are led to the following definitions:

Defn: The sample mean vector \bar{x} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)'.

Defn: The sample covariance matrix S = (s_{ij}), the p × p symmetric matrix whose (i, j) entry is s_{ij}.

Expressing the sample mean vector and the sample covariance matrix in terms of the data matrix

The sample mean vector

Note \bar{x} = \frac{1}{n} X' 1_n, where 1_n is the n-vector whose components are all equal to 1.
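A quick numerical check of this identity (a sketch; any n × p matrix X will do):

import numpy as np

X = np.array([[18.0, 65.0, 0.0],
              [19.0, 72.0, 1.0],
              [21.0, 58.0, 0.0]])
n = X.shape[0]
ones = np.ones(n)      # the n-vector 1_n

xbar = X.T @ ones / n  # (1/n) X' 1_n
assert np.allclose(xbar, X.mean(axis=0))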

The sample covariance matrix

We can write

S = \frac{1}{n}\sum_{r=1}^{n} (x_r - \bar{x})(x_r - \bar{x})' = \frac{1}{n} (X - 1_n\bar{x}')'(X - 1_n\bar{x}')

then

X - 1_n\bar{x}' = X - \frac{1}{n} 1_n 1_n' X = HX, where H = I_n - \frac{1}{n} 1_n 1_n',

because \bar{x}' = \frac{1}{n} 1_n' X.

The final step is to realize that H (the centering matrix) is symmetric and idempotent. It is easy to check that H' = H and HH = H.

So that

S = \frac{1}{n} (HX)'(HX) = \frac{1}{n} X'H'HX = \frac{1}{n} X'HX.

In the textbook the divisor n - 1 is sometimes used in place of n, and then the (unbiased) sample covariance matrix is S_u = \frac{1}{n-1} X'HX = \frac{n}{n-1} S.
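The centering-matrix identity is easy to verify numerically; a sketch assuming the divisor-n definition of S, with the divisor-(n - 1) version obtained by rescaling:

import numpy as np

X = np.array([[18.0, 65.0, 0.0],
              [19.0, 72.0, 1.0],
              [21.0, 58.0, 0.0],
              [18.0, 81.0, 1.0],
              [20.0, 70.0, 0.0]])
n = X.shape[0]

H = np.eye(n) - np.ones((n, n)) / n  # centering matrix H = I - (1/n) 1 1'
assert np.allclose(H, H.T)           # H is symmetric
assert np.allclose(H @ H, H)         # H is idempotent

S = X.T @ H @ X / n                  # S = (1/n) X' H X
S_u = n / (n - 1) * S                # unbiased version, divisor n - 1
assert np.allclose(S_u, np.cov(X, rowvar=False))  # numpy's cov uses n - 1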

Another Expression for S

Note: H = I_n - \frac{1}{n} 1_n 1_n' and \frac{1}{n} X' 1_n = \bar{x}.

Thus \frac{1}{n} X'HX = \frac{1}{n} X'X - \frac{1}{n^2} X' 1_n 1_n' X. Hence S = \frac{1}{n} X'X - \bar{x}\bar{x}'.
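Again, a quick check of the alternative expression (a sketch):

import numpy as np

X = np.array([[18.0, 65.0, 0.0],
              [19.0, 72.0, 1.0],
              [21.0, 58.0, 0.0],
              [18.0, 81.0, 1.0],
              [20.0, 70.0, 0.0]])
n = X.shape[0]
xbar = X.mean(axis=0)

S_centered = (X - xbar).T @ (X - xbar) / n  # definition via centered data
S_raw = X.T @ X / n - np.outer(xbar, xbar)  # S = (1/n) X'X - xbar xbar'
assert np.allclose(S_centered, S_raw)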

Data are frequently scaled as well as centered. The scaling is done by introducing:

Defn: the sample correlation coefficient between the i th and the j th variables, r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii} s_{jj}}}

Defn: the sample correlation matrix R = (r_{ij})

Obviously r_{ii} = 1, and using the Cauchy-Schwarz inequality, -1 \le r_{ij} \le 1. If R = I then we say the variables are uncorrelated.

Note: if we denote D^{1/2} = diag(\sqrt{s_{11}}, \ldots, \sqrt{s_{pp}}), then it can be checked that R = D^{-1/2} S D^{-1/2}.
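This relation can also be checked directly (a sketch; note the correlation matrix is the same whether S uses divisor n or n - 1, since the scaling cancels):

import numpy as np

X = np.array([[18.0, 65.0, 0.0],
              [19.0, 72.0, 1.0],
              [21.0, 58.0, 0.0],
              [18.0, 81.0, 1.0],
              [20.0, 70.0, 0.0]])
n = X.shape[0]
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n

D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))  # D^{-1/2}
R = D_inv_sqrt @ S @ D_inv_sqrt                  # R = D^{-1/2} S D^{-1/2}
assert np.allclose(np.diag(R), 1.0)              # r_ii = 1
assert np.allclose(R, np.corrcoef(X, rowvar=False))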

Measures of Multivariate Scatter

The sample variance-covariance matrix S is an obvious generalization of the univariate concept of variance, which measures scatter about the mean. Sometimes it is convenient to have a single number to measure the overall multivariate scatter.

There are two common measures of this type:

Defn: The generalized sample variance |S|, the determinant of S.

Defn: The total sample variance tr(S) = s_{11} + s_{22} + \cdots + s_{pp}.

In both cases, large values indicate a high degree of scatter about the centroid \bar{x}, and low values indicate concentration about the centroid. Using the eigenvalues \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p of the matrix S, it can be shown that |S| = \lambda_1 \lambda_2 \cdots \lambda_p and tr(S) = \lambda_1 + \lambda_2 + \cdots + \lambda_p.

If \lambda_p = 0 then |S| = 0. This says that there is a linear dependence amongst the variables. Normally, S is positive definite and all the eigenvalues are positive.
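Both eigenvalue identities are easy to confirm numerically (a sketch):

import numpy as np

X = np.array([[18.0, 65.0, 0.0],
              [19.0, 72.0, 1.0],
              [21.0, 58.0, 0.0],
              [18.0, 81.0, 1.0],
              [20.0, 70.0, 0.0]])
n = X.shape[0]
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n

lam = np.linalg.eigvalsh(S)  # eigenvalues of the symmetric matrix S
gen_var = np.linalg.det(S)   # generalized sample variance |S|
tot_var = np.trace(S)        # total sample variance tr(S)
assert np.allclose(gen_var, np.prod(lam))
assert np.allclose(tot_var, np.sum(lam))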

Linear combinations

Taking linear combinations of variables is one of the most important tools of multivariate analysis. This is for basically two reasons:
1. A few appropriately chosen combinations may provide most of the information carried by a larger number of the original variables (this is called dimension reduction).
2. Linear combinations can simplify the structure of the variance-covariance matrix, which can help in the interpretation of the data.

For a given vector of constants a = (a_1, a_2, \ldots, a_p)', we consider the linear combination y_i = a'x_i = a_1 x_{i1} + \cdots + a_p x_{ip} for i = 1, 2, …, n. Then the mean of the y's is \bar{y} = a'\bar{x}.

And the variance of the y's is s_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = a'Sa.
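A final sketch verifying both identities, \bar{y} = a'\bar{x} and s_y^2 = a'Sa, for an arbitrary vector of constants a:

import numpy as np

X = np.array([[18.0, 65.0, 0.0],
              [19.0, 72.0, 1.0],
              [21.0, 58.0, 0.0],
              [18.0, 81.0, 1.0],
              [20.0, 70.0, 0.0]])
n = X.shape[0]
xbar = X.mean(axis=0)
Xc = X - xbar
S = Xc.T @ Xc / n

a = np.array([1.0, 0.5, -2.0])  # any vector of constants
y = X @ a                       # y_i = a' x_i, i = 1, ..., n

assert np.allclose(y.mean(), a @ xbar)                       # ybar = a' xbar
assert np.allclose(((y - y.mean()) ** 2).mean(), a @ S @ a)  # s_y^2 = a' S a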