Published by Iris Gibbs. Modified over 8 years ago
1
Introduction to Multivariate Data Analysis Pekka Malo 30E00500 – Quantitative Empirical Research Spring 2016
2
Agenda Course schedule and practical arrangements Basic concepts of multivariate analysis Selecting a multivariate technique Guidelines for multivariate analysis Tutorial: Missing value analysis 04.01.16 Missing Value Analysis 2
3
What is Multivariate Data Analysis?
Analyzing situations when you have
– Many independent (i.e. explanatory) variables and/or
– Many dependent (i.e. response/explained) variables
– Varying degrees of correlation between variables
Multivariate statistics ~ statistical methods that simultaneously analyze several measurements on each data case (e.g. individual) under investigation.
4
Why Multivariate Statistics?
Difficulty of addressing complicated research questions with univariate tools
Several drivers for increasing popularity, e.g.
– Availability of nicely packaged software
– Greater complexity of contemporary research
– Large amounts of data
– Emergence of data mining perspective (finding unforeseen patterns and associations)
Measurement; Hypothesis testing; Explanation & prediction
5
Qualitative & Quantitative Research
[Diagram labels: Quantitative; Qualitative; Confirmatory; Exploratory]
6
Confirmatory Analysis
[Diagram labels: Theory / Pre-Knowledge; Hypothesis; Data / Random Sample; Statistical Analysis; Verification; New Hypothesis / Theory; Population]
7
Exploratory Analysis
[Diagram labels: New Theory / Invention; Data / Random Sample; Statistical Analysis; New Hypothesis / Theory; Population; New Data (Random); Verification; Data Analysis]
8
Experimental vs. Non-experimental Research
Experimental research
– Researcher has control over the levels of at least one IV
– Definition of levels
– Implementation
– Random assignment
– Control over other influential factors
– Attempt to create populations by “treating” subgroups from an originally homogeneous group differently
– Statistical tests to examine whether the treatment had an effect (i.e. do the samples still come from the same population)
Non-experimental research
– Researcher cannot control the assignment of subjects to the levels of IV(s)
– Difficult to attribute causality to an IV
– Most multivariate techniques have been developed for non-experimental research
– Investigation of relationships among variables in some predefined population (correlational / survey research)
9
Part I: Some Useful Concepts
10
Data Types and Measurement Scales
Data
– Qualitative (non-metric): Nominal (categorization), Ordinal (rank order)
– Quantitative (metric): Interval (differences), Ratio
11
Measurement Error
Measurement error ~ the degree to which observed values are not representative of true values (all variables tend to have some error)
Examples of potential causes
– Data entry errors
– Imprecise measurement
– Inability of respondents to provide accurate information
Affects observed relationships and reduces the strength of multivariate techniques
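To illustrate how measurement error weakens an observed relationship, here is a minimal simulation sketch. The data-generating model (a standard-normal true score, additive normal error) and the names `pearson` and `attenuation_demo` are assumptions made for illustration; they are not taken from the slides.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def attenuation_demo(n=20000, error_sd=1.0, seed=1):
    """Correlation of y with the true x versus an error-contaminated x."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0, 1) for _ in range(n)]
    y = [x + rng.gauss(0, 1) for x in true_x]                   # true relationship
    observed_x = [x + rng.gauss(0, error_sd) for x in true_x]   # noisy measurement
    return pearson(true_x, y), pearson(observed_x, y)


r_true, r_observed = attenuation_demo()
print(f"r with true x:     {r_true:.2f}")
print(f"r with observed x: {r_observed:.2f}")
```

Under these assumptions the correlation computed from the noisy measurement comes out clearly smaller than the one computed from the true score, even though the underlying relationship is unchanged.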
12
Validity and Reliability: Reducing Measurement Error
Validity ~ the degree to which a measure accurately represents what it is expected to represent
– Do we understand the target of measurement?
– Are we asking the right questions?
Reliability ~ the degree to which the observed variable measures the “true” value and is “error free”
– Does the measure produce consistent results?
13
Statistical Significance and Power
Type I error: H0 is true, but H1 is accepted (α risk)
Type II error: H1 is true, but H0 is accepted (β risk)
Power (1 − β): probability of rejecting H0 when it is false

                      H0 true             H0 false
Fail to reject H0     1 − α               β (Type II error)
Reject H0             α (Type I error)    1 − β (Power)

Ref: Hair et al.
14
Determinants of Statistical Power
Effect size
– Magnitude of the effect of interest
– Larger effects are easier to find
Alpha
– Choice of strict alpha reduces power
– Need to strike a balance between level of alpha risk and the resulting power
Sample size
– The larger the sample size, the higher the power
– Very large samples can also lead to oversensitivity
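The three determinants can be explored numerically. The sketch below assumes the simplest possible setting, a two-sided one-sample z-test with known variance, so it is an illustration rather than the calculation behind the course material; `z_test_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect_size is the true mean shift in standard-deviation units.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power for a fixed effect size at several sample sizes and two alpha levels.
for n in (20, 50, 200):
    print(n,
          round(z_test_power(0.3, n, alpha=0.05), 2),
          round(z_test_power(0.3, n, alpha=0.01), 2))
```

In this setting power rises with sample size and effect size and falls when alpha is made stricter; with an effect size of zero the "power" equals alpha itself.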
15
Power, Alpha-risk and Sample Size
[Figure not reproduced] Ref: Hair et al. (2010): Multivariate Data Analysis
16
Part II: Classification of multivariate methods
17
Classification of MV methods
Type of relationship being examined
– Can the variables be divided into independent and dependent classifications based on some theory?
Number of dependent variables
– How many variables are treated as dependent in a single analysis?
Type of dependent / independent variables
– How are the variables measured (metric vs. non-metric)?
18
Dependence vs. Interdependence
p1 p2: Dependence
– (Logistic) Regression Analysis
– Analysis of Variance (ANOVA)
– Discriminant Analysis
– Canonical Correlation Analysis
– Conjoint Analysis
p1 p2: Interdependence
– Principal Component Analysis
– Factor Analysis
– Cluster Analysis
– Loglinear Models
– Correspondence Analysis
19
A Decision Tree
Type of relationship
– Dependence (split by number of DV(s), then by scale of variables)
  ■ Several DV(s), multiple relations: Structural equations modeling
  ■ Several DV(s), single relation, metric DV(s) and metric IV(s): Canonical correlation analysis
  ■ Several DV(s), single relation, metric DV(s) and non-metric IV(s): Multivariate analysis of variance
  ■ One DV, single relation, metric DV: Multiple Regression; Conjoint Analysis
  ■ One DV, single relation, non-metric DV: Discriminant analysis; Linear probability models
– Interdependence (split by type of interdependence)
  ■ Between variables: Factor analysis (FA); Confirmatory FA; Principal component analysis
  ■ Between respondents/cases: Cluster analysis
20
Garbage in, roses out?
Data is the foundation for analytics
– Quality & quantity concerns
– Sample size affects all results
– Know your data well
Careful model evaluation needed
– Generalization vs. over-fitting
– Prefer a parsimonious model
– Ensure practical significance as well as statistical significance
21
Structured Approach to Modeling
Stage 1: Define the research problem
– What is the research problem and objectives?
– What multivariate techniques should be used?
Stage 2: Develop the analysis plan
– Implementation issues (sample sizes, allowable variable types, estimation)
Stage 3: Check the model assumptions
– Are the underlying assumptions of the chosen multivariate model satisfied? E.g. normality, linearity, independence of error terms, equality of variances
Stage 4: Evaluate overall model fit
– Does the model achieve acceptable levels on statistical criteria (e.g. significance)?
– Are the proposed relationships identified?
– Is the result practically significant?
Stage 5: Interpret the variates
– Analysing effects for individual variables by examining the estimated coefficients
– Is there empirical evidence of multivariate relationships that can be generalized?
Stage 6: Validate the model
– How does the model perform on a hold-out dataset?
– Demonstrate the generalizability of the results to the total population
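The validation step (Stage 6) can be sketched with a simple hold-out split. Everything below is an assumption made for illustration: the synthetic data, the 75/25 split, and the helper names `fit_line` and `mse`; a real analysis would use the model chosen in Stage 1.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def mse(xs, ys, a, b):
    """Mean squared prediction error of the fitted line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(42)
# Synthetic data (assumed for illustration): y = 2 + 0.5 x + noise.
xs = [rng.uniform(0, 10) for _ in range(400)]
ys = [2 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 25% of cases; the cases were generated in random order.
cut = int(0.75 * len(xs))
a, b = fit_line(xs[:cut], ys[:cut])
train_mse = mse(xs[:cut], ys[:cut], a, b)
holdout_mse = mse(xs[cut:], ys[cut:], a, b)
print(f"slope={b:.2f}  train MSE={train_mse:.2f}  hold-out MSE={holdout_mse:.2f}")
```

A hold-out error close to the training error suggests the fitted relationship generalizes; a much larger hold-out error would point to over-fitting.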
22
Part III: Preparing for multivariate analysis
Missing Value Analysis and Imputation
23
What is Missing Data?
Missing data often occur when a respondent fails to answer one or more questions in a survey.
Missing Data ~ information not available for a subject (or case) about whom other information is available.
25
Why Do We Have Missing Data?
Missing Data Process ~ any systematic event external to the respondent (such as data entry errors or data collection problems) or any action on the part of the respondent (such as refusal to answer a question) that leads to missing data
26
Why Do We Have Missing Data?
Examples of reasons (from survey research):
– Respondent does not want to respond to a question
– Respondent is not able to respond to a question
– Question too difficult or complicated
– Ignored by accident
– Missing value has a meaning
27
Is it serious? Well, it depends on …
1. The pattern of missing data
2. The amount of missing data
3. The reasons why it is missing
28
Patterns of Missing Data
MCAR = missing completely at random
– The distribution of missing data is unpredictable (i.e. the cases with missing data are indistinguishable from cases with complete data)
MAR = missing at random (a.k.a. ignorable non-response)
– The pattern is predictable from other variables in the data
MNAR = missing not at random or non-ignorable
– The pattern is related to the dependent variable and cannot be ignored
MCAR (The Good), MAR (The Bad), MNAR (The Ugly)
29
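Patterns of Missing Data (cont’d)
Let us assume we have a data set [Y, X]:
– Y denotes the complete data consisting of two parts: $Y_{obs}$, the observed data, and $Y_{mis}$, the data which has missing values
– X is additional data
– $M_i = 1$ if the i-th observation has a missing value in Y
In the standard formulation, the data is
– MCAR if $P(M \mid Y_{obs}, Y_{mis}, X) = P(M)$
– MAR if $P(M \mid Y_{obs}, Y_{mis}, X) = P(M \mid Y_{obs}, X)$
– MNAR if $P(M \mid Y_{obs}, Y_{mis}, X)$ still depends on $Y_{mis}$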
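The three mechanisms can be made concrete with a small simulation. The sketch below is an illustration under assumed settings: a covariate x that is always observed, a variable y that receives the gaps, and arbitrary missingness probabilities (0.3, 0.6, 0.1). The function name `observed_means` is hypothetical.

```python
import random


def observed_means(n=20000, seed=7):
    """Mean of y among observed cases under three missingness mechanisms."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]    # fully observed covariate
    y = [xi + rng.gauss(0, 1) for xi in x]     # variable that will get gaps

    def mean_where_observed(p_missing):
        # Keep a case when a uniform draw is at least its missingness probability.
        kept = [yi for xi, yi in zip(x, y) if rng.random() >= p_missing(xi, yi)]
        return sum(kept) / len(kept)

    return {
        "complete": sum(y) / n,
        # MCAR: missingness ignores both x and y.
        "MCAR": mean_where_observed(lambda xi, yi: 0.3),
        # MAR: missingness depends only on the observed covariate x.
        "MAR": mean_where_observed(lambda xi, yi: 0.6 if xi > 0 else 0.1),
        # MNAR: missingness depends on the very value of y that goes missing.
        "MNAR": mean_where_observed(lambda xi, yi: 0.6 if yi > 0 else 0.1),
    }


for name, value in observed_means().items():
    print(f"{name:8s} mean of observed y = {value:+.2f}")
```

In this setup the MCAR mean stays close to the complete-data mean, while the naive MAR and MNAR means are pulled downward. The difference is that under MAR the bias can in principle be corrected using x, whereas under MNAR the information needed for the correction is itself missing.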
30
Practice Spotting the Patterns
In groups of 2-3 people, go through the stack of cases and label them as MCAR, MAR, or MNAR. Write down your choice on each slide. Prepare to explain your choice for the other groups.
31
How Much Missing Data Is Too Much? (Hair et al., 2010)
– Missing data under 10% for an individual case or observation can generally be ignored, except when the missing data occur in a specific non-random fashion (e.g., concentration in a specific set of questions, attrition at the end of the questionnaire, etc.).
– The number of cases with no missing data must be sufficient for the selected analysis technique if replacement values will not be substituted (imputed) for the missing data.
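Both rules of thumb start from simple counts: the share of missing values per case and per variable, and the number of complete cases. A minimal sketch follows; it assumes missing answers are stored as `None`, and the toy data and helper names are invented for illustration.

```python
def missing_share_by_case(rows):
    """Fraction of missing (None) values in each case (row)."""
    return [sum(v is None for v in row) / len(row) for row in rows]


def missing_share_by_variable(rows):
    """Fraction of missing (None) values in each variable (column)."""
    return [sum(v is None for v in col) / len(col) for col in zip(*rows)]


# Toy survey data (invented): 5 respondents, 4 questions, None = no answer.
rows = [
    [1, 2, 3, 4],
    [None, 2, 3, 4],
    [1, None, None, 4],
    [1, 2, 3, None],
    [1, 2, 3, 4],
]

case_share = missing_share_by_case(rows)
variable_share = missing_share_by_variable(rows)
complete_cases = sum(share == 0 for share in case_share)
# Cases at or above the 10% guideline deserve a closer look.
flagged_cases = [i for i, share in enumerate(case_share) if share >= 0.10]

print(case_share)        # [0.0, 0.25, 0.5, 0.25, 0.0]
print(variable_share)    # [0.2, 0.2, 0.2, 0.2]
print(complete_cases, flagged_cases)   # 2 [1, 2, 3]
```

Counts like these only address the amount of missing data; whether the gaps are concentrated in a non-random way still has to be judged separately.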
32
How to Deal with Missing Values?
34
Imputation = inserting a value into data in a “more or less fabricated way”
35
Ways to Deal With Missing Values
Use all available
– Compute distribution characteristics and relationships from all valid values
Replacement
– Case substitution
– Mean substitution
– Cold deck (i.e. external source)
– Hot deck
– Model based
  ■ Expectation maximization
  ■ Regression
  ■ Multiple imputation (combine several models)
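Two of the replacement options listed above, mean substitution and regression imputation, are simple enough to sketch in a few lines. The example data and function names are invented for illustration, and the regression variant assumes one fully observed predictor x.

```python
from statistics import mean, variance


def mean_substitution(ys):
    """Replace every missing value with the mean of the valid values."""
    fill = mean(y for y in ys if y is not None)
    return [fill if y is None else y for y in ys]


def regression_imputation(xs, ys):
    """Replace missing y with the prediction from a line fitted on complete pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x, _ in pairs))
    intercept = my - slope * mx
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]


# Invented example: x is fully observed, y has two gaps.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, None, 8.2, None, 11.9]

complete_only = [y for y in ys if y is not None]   # valid values only
by_mean = mean_substitution(ys)
by_regression = regression_imputation(xs, ys)

print("variance, valid values only:", round(variance(complete_only), 2))
print("variance, mean substitution:", round(variance(by_mean), 2))
print("regression-imputed values:  ", [round(v, 2) for v in by_regression])
```

Mean substitution keeps the mean but shrinks the variance, because every imputed value sits exactly at the centre. Regression imputation follows the relationship with x, though a single deterministic fill still understates uncertainty, which is the motivation for multiple imputation.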
36
Rules of Thumb for Imputation (Hair et al., 2010)
When the amount of missing data is …
Under 10%:
– Any of the imputation methods should be fine.
10 to 20%:
– For MCAR data, consider hot deck, case substitution and regression methods
– For MAR, model-based methods are necessary
Over 20%:
– If imputation is considered necessary, then use regression for MCAR and model-based methods for MAR
37
Imputation with valid data
38
Imputation using known values
39
Imputation by calculating values
40
Imputation of MAR processes
41
Imputation can also be used when…
– Existing value is partially missing (e.g., available as an interval but the unique value is not given)
– Existing value appears to be incorrect or “corrupted”
– Existing value is too confidential and cannot be revealed (e.g., some company datasets with detailed personal information)
42
Watch out for data with time dependence!
Source: anychart.com
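One way to see why time dependence matters: filling gaps in a trending series with the overall mean ignores where in time the gap sits. The sketch below contrasts that with linear interpolation between neighbouring observations. The series is invented and interpolation is offered only as one illustrative alternative, not as the method shown on the original slide.

```python
def mean_fill(series):
    """Replace gaps with the overall mean, ignoring time order."""
    valid = [v for v in series if v is not None]
    fill = sum(valid) / len(valid)
    return [fill if v is None else v for v in series]


def linear_interpolate(series):
    """Fill interior gaps on a straight line between neighbouring valid points.

    Gaps before the first or after the last valid point are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            step = (series[right] - series[left]) * (i - left) / (right - left)
            filled[i] = series[left] + step
    return filled


# Invented upward-trending series with gaps.
series = [10, 12, None, 16, None, None, 22]
print(mean_fill(series))           # [10, 12, 15.0, 16, 15.0, 15.0, 22]
print(linear_interpolate(series))  # [10, 12, 14.0, 16, 18.0, 20.0, 22]
```

The mean-filled series dips back to 15 in the middle of an upward trend, while the interpolated values respect the ordering of the observations.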
44
Tutorial I: Missing values
Form groups of 1-3 students. Check that SPSS is installed on your computer.
Q1: Why worry about missing values?
Q2: How to fix the missing value problem?
45
Further reading
Flexible Imputation of Missing Data (Chapman & Hall/CRC Interdisciplinary Statistics) by Stef van Buuren
Links:
– http://www.stefvanbuuren.nl/mi/index.html
– http://www.stefvanbuuren.nl/mi/Course.html
46
Thank you!