Methods of multivariate analysis Ing. Jozef Palkovič, PhD.
Multivariate analysis consists of a collection of methods that can be used when several measurements are made on each individual or object in one or more samples. We will refer to the measurements as variables and to the objects or individuals as units. Using multivariate analysis, the variables can be examined simultaneously in order to assess the key features of the process that produced them. The multivariate approach enables us to explore the joint performance of the variables and to determine the effect of each variable in the presence of the others.
The goal of many multivariate approaches is simplification: we seek to express what is going on in terms of a reduced set of dimensions. Such exploratory techniques generate hypotheses rather than test them. If the goal is a formal hypothesis test, we turn to descriptive and inferential multivariate statistics, which allow several variables to be tested together while still preserving the significance level, and which do this for any intercorrelation structure of the variables.
Basic types of data A single sample with several variables measured on each sampling unit (subject or object) A single sample with two sets of variables measured on each unit Two samples with several variables measured on each unit Three or more samples with several variables measured on each unit
A single sample with several variables measured on each sampling unit (subject or object) Test the hypothesis that the means of the variables have specified values. Test the hypothesis that the variables are uncorrelated and have a common variance. Find a small set of linear combinations of the original variables that summarizes most of the variation in the data (principal components). Express the original variables as linear functions of a smaller set of underlying variables that account for the original variables and their intercorrelations (factor analysis).
A single sample with two sets of variables measured on each unit: Determine the number, the size, and the nature of relationships between the two sets of variables (canonical correlation). For example, you may wish to relate a set of interest variables to a set of achievement variables. How much overall correlation is there between these two sets? Find a model to predict one set of variables from the other set (multivariate multiple regression).
Two samples with several variables measured on each unit Compare the means of the variables across the two samples (Hotelling's T²-test). Find a linear combination of the variables that best separates the two samples (discriminant analysis). Find a function of the variables that accurately allocates the units into the two groups (classification analysis).
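As a sketch of the two-sample mean comparison above, here is a minimal Hotelling's T²-test on simulated data; the group sizes, dimension, and mean shift are illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy import stats

def hotelling_t2(X1, X2):
    """Two-sample Hotelling T^2 test of equal mean vectors."""
    n1, p = X1.shape
    n2, _ = X2.shape
    d = X1.mean(axis=0) - X2.mean(axis=0)                  # difference of mean vectors
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)   # pooled covariance
    t2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(Sp, d)
    # T^2 converts to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    p_value = stats.f.sf(f, p, n1 + n2 - p - 1)
    return t2, p_value

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(30, 3))   # first sample, 3 variables per unit
X2 = rng.normal(0.5, 1.0, size=(30, 3))   # second sample with shifted means
t2, p_value = hotelling_t2(X1, X2)
```

The test treats the three variables jointly, so the overall significance level is preserved regardless of how correlated they are.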
Three or more samples with several variables measured on each unit Compare the means of the variables across the groups (multivariate analysis of variance). Extension of discriminant analysis to more than two groups. Extension of classification analysis to more than two groups.
Canonical correlation Canonical correlation is concerned with the amount of linear relationship between two sets of variables. We often measure two types of variables on each research unit: for example, a set of aptitude variables and a set of achievement variables, a set of teacher behaviours and a set of student behaviours, or a set of ecological variables and a set of environmental variables.
Multivariate regression We consider the linear relationship between one or more y's (the dependent variables) and one or more x's (the independent variables). One aspect of interest will be choosing which variables to include in the model if this is not already known. We can distinguish three cases according to the number of variables: Simple linear regression: one y and one x. Multiple linear regression: one y and several x's. Multivariate multiple linear regression: several y's and several x's.
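The third case, multivariate multiple linear regression, can be sketched by fitting all y's at once with ordinary least squares; the coefficient matrix and noise level below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = rng.normal(size=(n, 3))                      # several x's
B_true = np.array([[1.0, -2.0],
                   [0.5,  0.0],
                   [0.0,  3.0]])                 # hypothetical coefficient matrix
Y = X @ B_true + rng.normal(scale=0.1, size=(n, 2))   # several y's

# Each column of B_hat is the multiple-regression coefficient vector
# (intercept first) for one dependent variable; lstsq fits them jointly.
X1 = np.column_stack([np.ones(n), X])            # add intercept column
B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)
```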
Discriminant Analysis: Description of Group Separation There are two major objectives in the separation of groups: 1. Description of group separation, in which linear functions of the variables (discriminant functions) are used to describe the differences between two or more groups. The goals of descriptive discriminant analysis include identifying the relative contribution of the p variables to the separation of the groups and finding the optimal plane on which the points can be projected to best illustrate the configuration of the groups. 2. Prediction or allocation of observations to groups, in which linear or quadratic functions of the variables (classification functions) are employed to assign an individual sampling unit to one of the groups. The measured values in the observation vector for an individual or object are evaluated by the classification functions to find the group to which the individual most likely belongs. Discriminant functions are linear combinations of the variables that best separate the groups.
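A minimal sketch of both objectives, using scikit-learn's linear discriminant analysis on two simulated groups (the group means and sizes are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
# Two simulated groups with different mean vectors
g1 = rng.normal([0, 0], 1.0, size=(50, 2))
g2 = rng.normal([3, 3], 1.0, size=(50, 2))
X = np.vstack([g1, g2])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
scores = X @ lda.coef_.T       # discriminant scores: the separating linear combination
accuracy = lda.score(X, y)     # allocation (classification) accuracy on the two groups
```

The coefficient vector plays the descriptive role (which variables drive the separation), while `score` illustrates the allocation role.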
Principal components analysis In principal component analysis, we seek to maximize the variance of a linear combination of the variables. For example, we might want to rank students on the basis of their scores on achievement tests in English, mathematics, reading, and so on. An average score would provide a single scale on which to compare the students, but with unequal weights we can spread the students out further on the scale and obtain a better ranking. Principal component analysis is a one-sample technique applied to data with no groupings among the observations.
Principal components analysis Principal components, on the other hand, are concerned only with the core structure of a single sample of observations on p variables. None of the variables is designated as dependent, and no grouping of observations is assumed. The first principal component is the linear combination with maximal variance; we are essentially searching for a dimension along which the observations are maximally separated or spread out. The second principal component is the linear combination with maximal variance in a direction orthogonal to the first principal component, and so on.
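The variance-maximizing construction above can be sketched directly from the eigendecomposition of the sample covariance matrix (the population covariance used to simulate the data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0, 0, 0],
                            [[3.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=500)

S = np.cov(X, rowvar=False)               # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)      # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # reorder largest-variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Column k of eigvecs is the k-th principal component direction; the
# directions are mutually orthogonal and ordered by the variance they carry.
Z = (X - X.mean(axis=0)) @ eigvecs        # component scores
```

The variances of the score columns equal the eigenvalues, confirming that each successive component carries the maximal remaining variance in an orthogonal direction.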
Principal components analysis In some applications, the principal components are an end in themselves and may be amenable to interpretation. More often they are obtained for use as input to another analysis. For example, two situations in regression where principal components may be useful are: (1) if the number of independent variables is large relative to the number of observations, a test may be ineffective or even impossible; (2) if the independent variables are highly correlated, the estimates of the regression coefficients may be unstable. In such cases, the independent variables can be reduced to a smaller number of principal components that will yield a better test or more stable estimates of the regression coefficients.
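A minimal sketch of the second situation, assuming hypothetical near-collinear predictors: the predictors are replaced by their leading principal component before regressing.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
t = rng.normal(size=n)
# Highly correlated predictors: five near-identical copies of one signal
X = np.column_stack([t + rng.normal(scale=0.01, size=n) for _ in range(5)])
y = 2.0 * t + rng.normal(scale=0.1, size=n)

# Reduce the unstable predictor set to its leading principal component
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = Xc @ eigvecs[:, -1]                 # component with the largest eigenvalue
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), pc1]), y, rcond=None)
```

Regressing on `pc1` avoids the near-singular normal equations that the five collinear columns would produce, at the cost of interpreting a derived variable rather than the originals.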
Factor Analysis In factor analysis we represent the variables as linear combinations of a few random variables called factors. The factors are underlying constructs or latent variables that “generate” the y’s. Like the original variables, the factors vary from individual to individual; but unlike the variables, the factors cannot be measured or observed. The existence of these hypothetical variables is therefore open to question. The goal of factor analysis is to reduce the redundancy among the variables by using a smaller number of factors.
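A minimal sketch of the model, generating five observed variables from two latent factors through a hypothetical loading matrix and fitting with scikit-learn's FactorAnalysis:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(6)
n = 300
f = rng.normal(size=(n, 2))                       # two unobservable factors
L = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.1, 0.9],
              [0.0, 0.8],
              [0.5, 0.5]])                        # hypothetical loading matrix
Y = f @ L.T + rng.normal(scale=0.3, size=(n, 5))  # observed y's "generated" by factors

fa = FactorAnalysis(n_components=2).fit(Y)
loadings = fa.components_.T                       # estimated loadings (5 variables x 2 factors)
scores = fa.transform(Y)                          # estimated factor scores per individual
```

Note the factors themselves are never observed; only `Y` enters the fit, and the loadings are recovered up to rotation.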
PCA vs Factor analysis Factor analysis is related to principal component analysis in that both seek a simpler structure in a set of variables, but they differ in many respects. Two differences in basic approach are as follows: Principal components are defined as linear combinations of the original variables; in factor analysis, the original variables are expressed as linear combinations of the factors. In principal component analysis, we explain a large part of the total variance of the variables; in factor analysis, we seek to account for the covariances or correlations among the variables.
Factor analysis In practice, there are some data sets for which the factor analysis model does not provide a satisfactory fit. Thus, factor analysis remains somewhat subjective in many applications, and it is considered controversial by some statisticians. Sometimes a few easily interpretable factors emerge, but for other data sets, neither the number of factors nor the interpretation is clear.
Cluster analysis In cluster analysis we search for patterns in a data set by grouping the (multivariate) observations into clusters. The goal is to find an optimal grouping for which the observations or objects within each cluster are similar, but the clusters are dissimilar to each other. Cluster analysis differs fundamentally from classification analysis: in classification analysis, we allocate the observations to a known number of predefined groups or populations, whereas in cluster analysis neither the number of groups nor the groups themselves are known in advance.
Cluster analysis To group the observations into clusters, many techniques begin with similarities between all pairs of observations. In many cases the similarities are based on some measure of distance. Other cluster methods use a preliminary choice for cluster centers or a comparison of within- and between-cluster variability. Two common approaches to clustering the observation vectors are hierarchical clustering and partitioning.
Cluster analysis In hierarchical clustering we typically start with n clusters, one for each observation, and end with a single cluster containing all n observations. At each step, an observation or a cluster of observations is absorbed into another cluster. In partitioning, we simply divide the observations into g clusters. This can be done by starting with an initial partitioning or with cluster centers and then reallocating the observations according to some optimality criterion.
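Both approaches can be sketched on simulated data (the three cluster centers below are illustrative): hierarchical agglomeration via scipy, and partitioning into a fixed g = 3 clusters via k-means:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Three well-separated simulated clusters of 30 observations each
X = np.vstack([rng.normal(c, 0.3, size=(30, 2))
               for c in ([0, 0], [4, 0], [0, 4])])

# Hierarchical: start from n singleton clusters and merge step by step;
# cutting the tree at 3 clusters recovers the grouping
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

# Partitioning: choose initial centers, then reallocate observations
# iteratively to minimize within-cluster variability
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```

The linkage matrix `Z` records the full merge history from n clusters down to one, while k-means commits to g clusters from the outset.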
Source and recommended literature for further reading: Rencher, A. C.: Methods of Multivariate Analysis, Second Edition. Wiley-Interscience, 2002. ISBN 978-0-471-41889-4.