Multivariate Analysis Harry R. Erwin, PhD School of Computing and Technology University of Sunderland.

Resources
Everitt, B. S., and G. Dunn (2001) Applied Multivariate Data Analysis, London: Arnold.
Everitt, B. S. (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.

Roadmap
– PBL group assignments
– Multivariate data graphics tutorials
– Testing distributional assumptions
– Principal components analysis
– Cluster analysis
– Summary

PBL group assignments
Two groups.

Multivariate data graphics tutorials
Available on the module website; they cover both standard and lattice graphics.

Testing distributional assumptions
For these techniques to work as intended, the data should follow a multivariate normal distribution. There are two ways of testing this:
– Examine each variable separately (normal marginals alone do not imply that the data follow a multivariate normal distribution)
– Convert each observation to a single number (a generalised distance) and plot these against the quantiles of an appropriate chi-squared distribution.
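
The caveat on examining variables separately can be illustrated with a small simulated example. This is a minimal sketch, not taken from the slides: the names x1, x2, X_bad and the cutoff of 1.5 are hypothetical choices. x2 reflects x1 outside the cutoff, so by symmetry both columns are exactly standard normal, yet the pair is not bivariate normal.

```r
set.seed(2)
n <- 1000
x1 <- rnorm(n)                 # standard normal
cutoff <- 1.5                  # hypothetical; any positive cutoff works
# Reflect x1 outside [-cutoff, cutoff]; by symmetry x2 is still N(0, 1)
x2 <- ifelse(abs(x1) <= cutoff, x1, -x1)
X_bad <- cbind(x1, x2)

# Each column looks normal on its own Q-Q plot...
par(mfrow = c(1, 2))
qqnorm(X_bad[, 1]); qqline(X_bad[, 1])
qqnorm(X_bad[, 2]); qqline(X_bad[, 2])

# ...but x1 + x2 is exactly 0 for every reflected row and continuous
# otherwise, a mixture that no bivariate normal distribution can produce
mean(x1 + x2 == 0)
```

Roughly 13% of rows land exactly on zero for this cutoff, which is why a check on the joint distribution, such as the chi-squared plot, is needed as well.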

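Separate examination
Suppose X is a two-column matrix whose rows come from a bivariate normal distribution. A normal Q-Q plot of each column, with a reference line:

    par(mfrow = c(1, 2))  # two plots side by side
    qqnorm(X[, 1], ylab = "Ordered observations")
    qqline(X[, 1])
    qqnorm(X[, 2], ylab = "Ordered observations")
    qqline(X[, 2])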

Comparison to a chi-squared distribution
The same data, plotted with the chisplot() function (available at t/):

    par(mfrow = c(1, 1))  # back to a single plot
    chisplot(X)
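
The source of chisplot() is not reproduced in these slides, so here is a minimal sketch of the same idea in base R, assuming the generalised distance is the squared Mahalanobis distance of each row from the sample mean vector. The simulated X and the covariance matrix Sigma are assumptions for illustration. Under multivariate normality these distances are approximately chi-squared with p degrees of freedom, so the points should lie close to the line y = x.

```r
set.seed(1)
n <- 100
# Simulated bivariate normal data standing in for X (hypothetical)
Sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)
X <- matrix(rnorm(2 * n), ncol = 2) %*% chol(Sigma)
p <- ncol(X)

# Squared Mahalanobis (generalised) distance of each row from the mean vector
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Theoretical chi-squared quantiles with p degrees of freedom
q <- qchisq(ppoints(n), df = p)
plot(q, sort(d2), xlab = "Chi-squared quantiles", ylab = "Ordered distances")
abline(0, 1)  # reference line y = x
```

Marked curvature or a few points far above the line suggests non-normality or outliers.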

Principal components analysis (PCA)
PCA describes the variation of a set of multivariate data in terms of a set of uncorrelated variables, each a linear combination of the original variables. The goal is to reduce the data to a small number of new variables that summarise most of the variation in the data set. It is useful when the explanatory variables are highly correlated. PCA is representative of projection pursuit methods.
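
A short sketch with base R's prcomp(), on simulated data where two variables are highly correlated and a third is unrelated. The variable names and the data-generating model are hypothetical. The loadings show each component as a linear combination of the original variables, and the component scores come out uncorrelated.

```r
set.seed(1)
n <- 200
z <- rnorm(n)
# x1 and x2 share one latent factor z; x3 is independent noise (hypothetical)
X <- cbind(x1 = z + rnorm(n, sd = 0.3),
           x2 = 2 * z + rnorm(n, sd = 0.3),
           x3 = rnorm(n))

# scale. = TRUE works on the correlation matrix, so units do not dominate
pca <- prcomp(X, scale. = TRUE)
summary(pca)           # proportion of variance per component
pca$rotation           # loadings: each PC as a linear combination of x1..x3
round(cor(pca$x), 10)  # component scores are uncorrelated
```

Here the first two components should account for nearly all of the variation, so the three original variables can be summarised by two.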

Cluster analysis
Cluster analysis is a tool for classifying a phenomenon: it sorts the samples into a small number of groups, or clusters, which are usually non-overlapping. The clusters may not be unique, because different aims lead to different groupings:
– Predictive clustering
– Clustering based on causation
Hence a cluster analysis is neither true nor false, but is simply useful.

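Cluster analysis approaches
– Agglomerative hierarchical clustering (fusion from the bottom up)
– K-means type methods (partitioning into a chosen number of clusters by optimising a criterion)
– Classification maximum likelihood methods (assume a model for the shape of the clusters)
Alternatively, you can use the tree library. Note that tree() fits a classification or regression tree, a supervised method that splits the data on the explanatory variables to predict a response (here ozone), rather than an unsupervised clustering:

    library(tree)
    # ozone.pollution: a data frame with the response column ozone
    model <- tree(ozone ~ ., data = ozone.pollution)
    plot(model)  # draw the tree
    text(model)  # label the splits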
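
The first two approaches can be compared with base R's hclust() and kmeans(). This is a minimal sketch on two simulated, well-separated groups; the data, the choice of average linkage, and k = 2 are assumptions for illustration, not recommendations from the slides.

```r
set.seed(42)
# Two well-separated simulated groups of 25 points each in two dimensions
X <- rbind(matrix(rnorm(50, mean = 0), ncol = 2),
           matrix(rnorm(50, mean = 8), ncol = 2))

# Agglomerative hierarchical clustering: repeatedly fuse the closest groups
hc <- hclust(dist(X), method = "average")
hc_groups <- cutree(hc, k = 2)   # cut the dendrogram into 2 clusters

# K-means: reassign points to the nearest of k = 2 centres until stable
km <- kmeans(X, centers = 2, nstart = 10)

# Cross-tabulate the two solutions (cluster labels are arbitrary)
table(hierarchical = hc_groups, kmeans = km$cluster)
```

With groups this far apart both methods should agree up to relabelling; on real data they often disagree, which is one reason a clustering is judged by usefulness rather than truth.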

Summary
Multivariate statistics is usually done from the point of view that there are no laws of scientific inference: 'anything goes'. First, you explore the data to come up with hypotheses (the models). Then you confirm the models on a second data set. If you have only a single data set, split it into two parts, one for exploration and one for confirmation. Good data analysis is based on the skilful interpretation of evidence and the subsequent development of hunches.
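
Splitting a single data set can be done with a random sample of row indices. A minimal sketch, assuming a data frame called dat (hypothetical, filled here with simulated values) and an even 50/50 split; other proportions work the same way.

```r
set.seed(123)
# Hypothetical data frame standing in for a single data set
dat <- data.frame(x = rnorm(100), y = rnorm(100))

# Randomly assign half of the rows to exploration, the rest to confirmation
idx <- sample(nrow(dat), size = floor(nrow(dat) / 2))
explore <- dat[idx, ]
confirm <- dat[-idx, ]
```

Random assignment avoids systematic differences between the halves (for example, if the rows are ordered by date), and the confirmation half should stay untouched until the models have been fixed.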