Methods of multivariate analysis Ing. Jozef Palkovič, PhD.

Multivariate analysis
- Consists of a collection of methods that can be used when several measurements are made on each individual or object in one or more samples.
- We will refer to the measurements as variables and to the objects or individuals as units.
- Using multivariate analysis, the variables can be examined simultaneously in order to access the key features of the process that produced them.
- The multivariate approach enables us to explore the joint performance of the variables and to determine the effect of each variable in the presence of the others.

- The goal of many multivariate approaches is simplification.
- We seek to express what is going on in terms of a reduced set of dimensions -> exploratory techniques, which generate hypotheses rather than test them.
- If the goal is a formal hypothesis test -> descriptive and inferential statistics. Multivariate inferential techniques:
  - allow several variables to be tested while still preserving the significance level
  - do this for any intercorrelation structure of the variables

Basic types of data
- A single sample with several variables measured on each sampling unit (subject or object)
- A single sample with two sets of variables measured on each unit
- Two samples with several variables measured on each unit
- Three or more samples with several variables measured on each unit

A single sample with several variables measured on each sampling unit (subject or object)
- Test the hypothesis that the means of the variables have specified values.
- Test the hypothesis that the variables are uncorrelated and have a common variance.
- Find a small set of linear combinations of the original variables that summarizes most of the variation in the data (principal components).
- Express the original variables as linear functions of a smaller set of underlying variables that account for the original variables and their intercorrelations (factor analysis).

A single sample with two sets of variables measured on each unit
- Determine the number, the size, and the nature of relationships between the two sets of variables (canonical correlation). For example, you may wish to relate a set of interest variables to a set of achievement variables.
- How much overall correlation is there between these two sets?
- Find a model to predict one set of variables from the other set (multivariate multiple regression).

Two samples with several variables measured on each unit
- Compare the means of the variables across the two samples (Hotelling's T²-test).
- Find a linear combination of the variables that best separates the two samples (discriminant analysis).
- Find a function of the variables that accurately allocates the units into the two groups (classification analysis).
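The comparison of mean vectors across two samples can be sketched in NumPy. This is a minimal illustration, not the lecturer's own code: it assumes both groups share a common covariance matrix, and the function name `hotelling_t2` and the simulated data are hypothetical.

```python
import numpy as np

def hotelling_t2(sample1, sample2):
    """Two-sample Hotelling's T^2, assuming a common covariance matrix."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = x1.shape[0], x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled covariance matrix (atleast_2d keeps the one-variable case working)
    s_pooled = np.atleast_2d(
        ((n1 - 1) * np.cov(x1, rowvar=False)
         + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    )
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(s_pooled, diff)
    return float(t2)

# Hypothetical data: two groups, three variables measured on each unit
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, size=(30, 3))
group_b = rng.normal(loc=1.0, size=(25, 3))
n1, n2, p = 30, 25, 3

t2 = hotelling_t2(group_a, group_b)
# Under the null hypothesis of equal mean vectors, this rescaled statistic
# follows an F distribution with p and n1 + n2 - p - 1 degrees of freedom.
f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
print(t2, f_stat)
```

With a single variable the statistic reduces to the square of the pooled two-sample t statistic, which is a convenient sanity check.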

Three or more samples with several variables measured on each unit
- Compare the means of the variables across the groups (multivariate analysis of variance).
- Extension of discriminant analysis to more than two groups.
- Extension of classification analysis to more than two groups.

Canonical correlation
- Is concerned with the amount of linear relationship between two sets of variables.
- We often measure two types of variables on each research unit, for example a set of aptitude variables and a set of achievement variables, a set of teacher behaviour variables and a set of student behaviour variables, or a set of ecological variables and a set of environmental variables.
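As a rough sketch of how the amount of linear relationship between two sets can be computed, the canonical correlations can be obtained as the singular values of the product of orthonormal bases of the two centered data matrices. The function name `canonical_correlations` and the simulated aptitude and achievement data are assumptions for illustration, and both data matrices are assumed to have full column rank.

```python
import numpy as np

def canonical_correlations(x, y):
    """Canonical correlations between two sets of variables (rows are units)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data
    qx, _ = np.linalg.qr(xc)
    qy, _ = np.linalg.qr(yc)
    # Singular values of qx' qy are the canonical correlations, largest first
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Hypothetical data: 3 aptitude variables and 2 achievement variables
rng = np.random.default_rng(2)
n = 100
ability = rng.normal(size=n)  # shared latent trait linking the two sets
aptitude = np.column_stack([ability + rng.normal(size=n) for _ in range(3)])
achievement = np.column_stack([ability + rng.normal(size=n) for _ in range(2)])

rho = canonical_correlations(aptitude, achievement)
print(rho)  # one value per pair of canonical variates
```

The number of canonical correlations equals the size of the smaller set, here two.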

Multivariate regression
- We consider the linear relationship between one or more y's (the dependent variables) and one or more x's (the independent variables).
- One aspect of interest will be choosing which variables to include in the model if this is not already known.
- We can distinguish three cases according to the number of variables:
  - Simple linear regression: one y and one x
  - Multiple linear regression: one y and several x's
  - Multivariate multiple linear regression: several y's and several x's
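The third case, several y's and several x's, can be illustrated with ordinary least squares in NumPy. This is a sketch on simulated data: the coefficient matrix `true_b`, the intercepts, and the noise level are made-up values chosen only so the fit can be checked.

```python
import numpy as np

# Hypothetical data: 3 independent variables (x's), 2 dependent variables (y's)
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=(n, 3))
true_b = np.array([[1.0, -2.0],
                   [0.5,  0.0],
                   [0.0,  3.0]])
intercept = np.array([2.0, -1.0])
y = intercept + x @ true_b + 0.1 * rng.normal(size=(n, 2))

# Design matrix with a leading column of ones for the intercepts
design = np.column_stack([np.ones(n), x])

# Least squares fits all y's at once: one column of coefficients per y
b_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(b_hat)  # row 0: intercepts, rows 1-3: slopes
```

The coefficient estimates for each y are the same as those from a separate multiple regression of that y on the x's; the multivariate aspect matters for tests and confidence regions, which take the correlations among the y's into account.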

Discriminant Analysis: Description of Group Separation
- There are two major objectives in separation of groups:
  1. Description of group separation, in which linear functions of the variables (discriminant functions) are used to describe the differences between two or more groups. The goals of descriptive discriminant analysis include identifying the relative contribution of the p variables to separation of the groups and finding the optimal plane on which the points can be projected to best illustrate the configuration of the groups.
  2. Prediction or allocation of observations to groups, in which linear or quadratic functions of the variables (classification functions) are employed to assign an individual sampling unit to one of the groups. The measured values in the observation vector for an individual or object are evaluated by the classification functions to find the group to which the individual most likely belongs.
- Discriminant functions are linear combinations of variables that best separate groups.
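Both objectives can be sketched for the two-group case. The example below assumes equal covariance matrices and equal prior probabilities, and uses the midpoint between the projected group means as the cutoff; the function names and the simulated groups are hypothetical.

```python
import numpy as np

def fit_discriminant(group1, group2):
    """Linear discriminant function for two groups with a common covariance."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
    s_pooled = ((n1 - 1) * np.cov(g1, rowvar=False)
                + (n2 - 1) * np.cov(g2, rowvar=False)) / (n1 + n2 - 2)
    # Coefficients of the linear combination that best separates the groups
    a = np.linalg.solve(s_pooled, m1 - m2)
    # Midpoint between the projected group means (equal priors assumed)
    cutoff = a @ (m1 + m2) / 2
    return a, cutoff

def classify(units, a, cutoff):
    """Allocate each unit to group 1 or group 2."""
    return np.where(np.asarray(units, dtype=float) @ a > cutoff, 1, 2)

# Hypothetical data: two groups, two variables
rng = np.random.default_rng(4)
group1 = rng.normal(loc=[0.0, 0.0], size=(40, 2))
group2 = rng.normal(loc=[4.0, 4.0], size=(40, 2))

a, cutoff = fit_discriminant(group1, group2)
print(a)                          # descriptive use: discriminant coefficients
print(classify(group1[:5], a, cutoff))  # predictive use: allocation to groups
```

The coefficient vector serves the descriptive objective, while the cutoff rule is one simple classification function for the predictive objective.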

Principal components analysis
- In principal component analysis, we seek to maximize the variance of a linear combination of the variables.
- For example, we might want to rank students on the basis of their scores on achievement tests in English, mathematics, reading, and so on. An average score would provide a single scale on which to compare the students, but with unequal weights we can spread the students out further on the scale and obtain a better ranking.
- Principal component analysis is a one-sample technique applied to data with no groupings among the observations.

Principal components analysis
- Unlike discriminant functions, which separate groups, principal components are concerned only with the core structure of a single sample of observations on p variables. None of the variables is designated as dependent, and no grouping of observations is assumed.
- The first principal component is the linear combination with maximal variance; we are essentially searching for a dimension along which the observations are maximally separated or spread out.
- The second principal component is the linear combination with maximal variance in a direction orthogonal to the first principal component, and so on.
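A minimal sketch of this construction uses the eigenvectors of the sample covariance matrix, ordered by decreasing eigenvalue. The function name `principal_components` and the simulated test scores (English, mathematics, reading) are assumptions for illustration.

```python
import numpy as np

def principal_components(data):
    """Return variances, weights, and scores of the principal components."""
    x = np.asarray(data, dtype=float)
    xc = x - x.mean(axis=0)                 # center each variable
    cov = np.cov(xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = xc @ eigvecs                   # component scores for each unit
    return eigvals, eigvecs, scores

# Hypothetical test scores for 50 students: English, mathematics, reading
rng = np.random.default_rng(5)
ability = rng.normal(size=50)
tests = np.column_stack([
    60 + 10 * ability + 5 * rng.normal(size=50),
    55 + 15 * ability + 5 * rng.normal(size=50),
    65 + 8 * ability + 5 * rng.normal(size=50),
])

variances, weights, scores = principal_components(tests)
print(weights[:, 0])                 # unequal weights of the first component
print(variances / variances.sum())   # share of total variance per component
ranking = np.argsort(scores[:, 0])   # students ordered along the first component
```

The columns of `weights` are mutually orthogonal, and the sample variance of each column of `scores` equals the corresponding eigenvalue. The sign of each component is arbitrary, so the ranking may need to be reversed to run from lowest to highest.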

Principal components analysis
- In some applications, the principal components are an end in themselves and may be amenable to interpretation.
- More often they are obtained for use as input to another analysis.
- For example, two situations in regression where principal components may be useful are:
  1. If the number of independent variables is large relative to the number of observations, a test may be ineffective or even impossible.
  2. If the independent variables are highly correlated, the estimates of regression coefficients may be unstable.
- In such cases, the independent variables can be reduced to a smaller number of principal components that will yield a better test or more stable estimates of the regression coefficients.
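The second situation can be illustrated with a small principal components regression sketch. The data are simulated so that four independent variables are nearly copies of one latent variable; the noise levels and the choice of keeping a single component (`k = 1`) are assumptions for this example.

```python
import numpy as np

# Hypothetical data: four highly correlated independent variables
rng = np.random.default_rng(3)
n = 100
latent = rng.normal(size=n)
x = np.column_stack([latent + 0.01 * rng.normal(size=n) for _ in range(4)])
y = 2.0 * latent + 0.1 * rng.normal(size=n)

xc = x - x.mean(axis=0)
yc = y - y.mean()

# Principal component directions are the rows of vt
u, s, vt = np.linalg.svd(xc, full_matrices=False)
k = 1                                # number of components kept (assumed)
scores = xc @ vt[:k].T               # scores on the first k components

# Regress y on the component scores, then map back to the original variables
gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
beta_pcr = vt[:k].T @ gamma

# Ordinary least squares on the original variables, for comparison
beta_ols, *_ = np.linalg.lstsq(xc, yc, rcond=None)
print(beta_pcr)
print(beta_ols)
```

In this setup the principal components coefficients come out close to equal across the four variables, while the ordinary least squares coefficients can differ widely from one simulated sample to the next because the variables are nearly collinear.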

Factor Analysis
- In factor analysis we represent the variables as linear combinations of a few random variables called factors.
- The factors are underlying constructs or latent variables that "generate" the y's.
- Like the original variables, the factors vary from individual to individual; but unlike the variables, the factors cannot be measured or observed. The existence of these hypothetical variables is therefore open to question.
- The goal of factor analysis is to reduce the redundancy among the variables by using a smaller number of factors.

PCA vs Factor analysis
- Factor analysis is related to principal component analysis in that both seek a simpler structure in a set of variables, but they differ in many respects. Two differences in basic approach are as follows:
  - Principal components are defined as linear combinations of the original variables. In factor analysis, the original variables are expressed as linear combinations of the factors.
  - In principal component analysis, we explain a large part of the total variance of the variables. In factor analysis, we seek to account for the covariances or correlations among the variables.
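The first difference can be written out explicitly. The notation below is an assumption for illustration (p observed variables y_1, ..., y_p with means mu_i, component weights a_jk, m < p factors f_1, ..., f_m with loadings lambda_ik, and error terms epsilon_i); the slides themselves do not fix any symbols beyond the y's.

```latex
% Principal components: each component is a linear combination
% of the observed variables
z_j = a_{j1} y_1 + a_{j2} y_2 + \dots + a_{jp} y_p, \qquad j = 1, \dots, p

% Factor analysis: each observed variable is a linear combination
% of m < p unobserved factors plus an error term
y_i - \mu_i = \lambda_{i1} f_1 + \lambda_{i2} f_2 + \dots + \lambda_{im} f_m + \varepsilon_i,
\qquad i = 1, \dots, p
```

The direction of the relationship is reversed: the components are computed from the variables, whereas the variables are modelled as being generated by the factors.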

Factor analysis
- In practice, there are some data sets for which the factor analysis model does not provide a satisfactory fit. Thus, factor analysis remains somewhat subjective in many applications, and it is considered controversial by some statisticians.
- Sometimes a few easily interpretable factors emerge, but for other data sets, neither the number of factors nor the interpretation is clear.

Cluster analysis
- In cluster analysis we search for patterns in a data set by grouping the (multivariate) observations into clusters.
- The goal is to find an optimal grouping for which the observations or objects within each cluster are similar, but the clusters are dissimilar to each other.
- Cluster analysis differs fundamentally from classification analysis. In classification analysis, we allocate the observations to a known number of predefined groups or populations. In cluster analysis, neither the number of groups nor the groups themselves are known in advance.

Cluster analysis
- To group the observations into clusters, many techniques begin with similarities between all pairs of observations.
- In many cases the similarities are based on some measure of distance. Other cluster methods use a preliminary choice for cluster centers or a comparison of within- and between-cluster variability.
- Two common approaches to clustering the observation vectors are:
  - hierarchical clustering
  - partitioning

Cluster analysis
- In hierarchical clustering we typically start with n clusters, one for each observation, and end with a single cluster containing all n observations. At each step, an observation or a cluster of observations is absorbed into another cluster.
- In partitioning, we simply divide the observations into g clusters. This can be done by starting with an initial partitioning or with cluster centers and then reallocating the observations according to some optimality criterion.
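The partitioning approach can be sketched with the k-means procedure, which is one common partitioning method and is not named in the slides. It starts from g cluster centers, reallocates each observation to its nearest center, and recomputes the centers until they stop moving. The function name `kmeans`, the random choice of initial centers, and the simulated data are assumptions for illustration.

```python
import numpy as np

def kmeans(data, g, n_iter=100, seed=0):
    """Partition the observations into g clusters by reallocating to nearest centers."""
    x = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Initial cluster centers: g distinct observations chosen at random
    centers = x[rng.choice(len(x), size=g, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every observation to every center
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(g)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical data: two well-separated groups of 20 observations each
rng = np.random.default_rng(6)
blob1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2))
blob2 = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(20, 2))
observations = np.vstack([blob1, blob2])

labels, centers = kmeans(observations, g=2)
print(labels)
print(centers)
```

The optimality criterion here is the within-cluster sum of squared distances to the cluster centers. The result can depend on the initial centers, so in practice the procedure is often repeated from several starting points.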

Source and recommended literature for further reading: Rencher A: Methods of Multivariate analysis, Second Edition, Brigham Young University, Wiley-Interscience, 2002, ISBN