Multivariate Analysis Harry R. Erwin, PhD School of Computing and Technology University of Sunderland.


1 Multivariate Analysis Harry R. Erwin, PhD School of Computing and Technology University of Sunderland

2 Resources Everitt, BS, and G Dunn (2001) Applied Multivariate Data Analysis, London: Arnold. Everitt, BS (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.

3 Roadmap
–PBL group assignments
–Multivariate data graphics tutorials
–Testing distributional assumptions
–Principal components analysis
–Cluster analysis
–Summary

4 PBL group assignments Two groups

5 Multivariate data graphics tutorials Available on the module website. They cover both standard and lattice graphics.

6 Testing distributional assumptions These techniques assume that the data follow a multivariate normal distribution. There are two ways of checking this:
–Examine each variable separately with a normal probability plot. Normality of every variable on its own is necessary, but it does not imply that the data are multivariate normal.
–Convert each observation to a single number (a generalised distance) and plot the ordered distances against the quantiles of the appropriate chi-squared distribution, whose degrees of freedom equal the number of variables.

7 Separate Examination X has two columns, and the combined data are bivariate normal:
par(mfrow=c(1,2))
qqnorm(X[,1],ylab="Ordered observations")
qqline(X[,1])
qqnorm(X[,2],ylab="Ordered observations")
qqline(X[,2])
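The slide does not show how X was built. As a minimal sketch, assuming simulated data (the sample size, seed, and covariance matrix are illustrative choices, not taken from the slides), a bivariate normal X can be generated in base R so that the per-column Q-Q plots have something to run on:

```r
# Hypothetical example data: 200 draws from a bivariate normal distribution.
set.seed(1)                               # arbitrary seed, for reproducibility
n <- 200
Sigma <- matrix(c(1.0, 0.6,
                  0.6, 1.0), nrow = 2)    # assumed covariance matrix
Z <- matrix(rnorm(n * 2), nrow = n)       # independent standard normals
X <- Z %*% chol(Sigma)                    # rows now have covariance Sigma

# Marginal checks: one normal Q-Q plot per column.
op <- par(mfrow = c(1, 2))
for (j in 1:2) {
  qqnorm(X[, j], ylab = "Ordered observations")
  qqline(X[, j])
}
par(op)                                   # restore the previous plot layout
```

Points that stay close to the reference line in both panels are consistent with normal margins, which is only the first of the two checks.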

8 Comparison to a chi-squared distribution Same data, using chisplot, available at http://biostatistics.iop.kcl.ac.uk/publications/everitt/
par(mfrow=c(1,1))
chisplot(X)
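If chisplot cannot be downloaded, the same idea can be sketched in base R. The helper below, chisq_qq, is a hypothetical name and is not Everitt's function; it assumes the generalised distance is the squared Mahalanobis distance from the sample mean, and it uses the plotting positions that ppoints returns:

```r
# Sketch of a chi-squared Q-Q plot of generalised (Mahalanobis) distances.
# chisq_qq is a hypothetical helper, not the chisplot function itself.
chisq_qq <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  # Squared generalised distance of each row from the sample mean vector.
  d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
  # Chi-squared quantiles with p degrees of freedom at n plotting positions.
  q <- qchisq(ppoints(n), df = p)
  plot(q, sort(d2),
       xlab = "Chi-squared quantile",
       ylab = "Ordered squared distance")
  abline(0, 1)   # points near this line are consistent with multivariate normality
  invisible(list(quantiles = q, distances = sort(d2)))
}

# Usage on simulated data (illustrative only).
set.seed(2)
X <- matrix(rnorm(300), ncol = 2)
res <- chisq_qq(X)
```

Systematic curvature away from the line, or a few points far above it at the right-hand end, suggests non-normality or outliers.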

9 Principal components analysis (PCA) Describes the variation in a set of multivariate data in terms of a set of uncorrelated variables, each a linear combination of the original variables. The goal is to reduce the data to a small number of meaningful variables that summarise the data set. Useful when the explanatory variables are highly correlated. A representative example of projection pursuit methods.
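A minimal sketch of PCA in base R follows. The built-in USArrests data set and the decision to standardise the variables are assumptions made for illustration; the slides do not name a data set:

```r
# PCA on the built-in USArrests data (an illustrative choice of data set).
# scale. = TRUE standardises the variables, so the analysis works from
# the correlation matrix rather than the covariance matrix.
pca <- prcomp(USArrests, scale. = TRUE)

summary(pca)            # proportion of variance explained by each component
pca$rotation            # loadings: each component as a linear combination
round(cor(pca$x), 10)   # component scores are mutually uncorrelated

screeplot(pca, type = "lines")   # helps choose how many components to keep
```

The columns of pca$x are the new uncorrelated variables; keeping only the first few is the reduction step described above.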

10 Cluster analysis A tool for classifying a phenomenon that sorts the samples into a small number of groups or clusters, usually non-overlapping. These clusters may not be unique.
–Predictive clustering
–Clustering based on causation
Hence a cluster analysis is neither true nor false, but is simply useful.

11 Cluster analysis approaches
–Agglomerative hierarchical clustering (fusion from the bottom up)
–K-means type methods (partition from the top down)
–Classification maximum likelihood methods (assume a model for the shape of the clusters)
Or you can simply use the tree library.
library(tree)
model <- tree(ozone ~ ., data = ozone.pollution)
plot(model)
text(model)
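The first two approaches can be sketched with functions from base R. The iris measurements, the complete-linkage method, and the choice of three clusters are all assumptions made for illustration. The third approach (classification maximum likelihood) needs an add-on package and is not shown:

```r
# Illustrative data: the four numeric columns of the built-in iris data set,
# standardised so that no single variable dominates the distances.
x <- scale(iris[, 1:4])

# Agglomerative hierarchical clustering (bottom-up fusion).
hc <- hclust(dist(x), method = "complete")
plot(hc, labels = FALSE)             # dendrogram of the fusions
hc_groups <- cutree(hc, k = 3)       # k = 3 is an assumed number of clusters

# K-means partitioning; nstart tries several random starting configurations.
set.seed(3)
km <- kmeans(x, centers = 3, nstart = 20)

# Cross-tabulate the two solutions to see how far they agree.
table(hierarchical = hc_groups, kmeans = km$cluster)
```

The two methods need not agree, which illustrates the point that a clustering is judged by usefulness rather than by being true or false.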

12 Summary Multivariate statistics is usually done from the point of view that there are no laws of scientific inference: ‘anything goes’. First, you explore the data to come up with hypotheses (the models). Then you confirm the models on a second data set. If you have only a single data set, split it into two parts, one for exploration and one for confirmation. Good data analysis is based on the skilful interpretation of evidence and the subsequent development of hunches.
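Splitting a single data set into exploration and confirmation parts can be sketched as follows. The USArrests data, the random 50/50 split, and the use of PCA as the exploratory model are illustrative assumptions:

```r
# Split one data set into exploration and confirmation halves.
# USArrests and the 50/50 split are illustrative assumptions.
set.seed(4)
dat <- USArrests
idx <- sample(nrow(dat), size = floor(nrow(dat) / 2))
explore <- dat[idx, ]     # use for graphics, PCA, clustering, model hunting
confirm <- dat[-idx, ]    # hold back; use only to check the chosen model

# Example: components found on one half, then applied to the other half.
pca <- prcomp(explore, scale. = TRUE)
confirm_scores <- predict(pca, newdata = confirm)
```

The confirmation rows play no part in choosing the model, so any structure that reappears in confirm_scores is evidence that it is not an artefact of the exploration half.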

