Multivariate Data Analysis

1 Multivariate Data Analysis
Chao-Min Chiu National Sun Yat-sen University 2010 Lecture 1

2 The Instructor
BA: National Taiwan Institute of Technology, Industrial Management
MS: National Taiwan Institute of Technology, Information Management
PhD: Rutgers University (The State University of New Jersey), Computers & Information Systems
Phone: #4733

3 What Is It?
Multivariate analysis refers to all statistical methods that simultaneously analyze multiple measurements on each individual or object under investigation.
Loosely: any simultaneous analysis of more than two variables.
Truly multivariate: all variables must be random and interrelated in such ways that their different effects cannot meaningfully be interpreted separately.
Purpose: to measure, explain, and predict the degree of relationship among variates (weighted combinations of variables).

4 Basic Concepts: The Variate
A linear combination of variables with empirically determined weights.
A variate of n weighted variables (X1 to Xn): Variate value = w1X1 + w2X2 + … + wnXn
Each respondent has a variate value (Y’). The Y’ value is a linear combination of the entire set of variables. It is the dependent variable.
Potential independent variables: X1 = income, X2 = education, X3 = family size, X4 = ??
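
A minimal sketch of how a variate value is computed for one respondent. The weights below are hypothetical and chosen only for illustration; in practice the multivariate technique determines them empirically. The function name `variate_value` is likewise an assumption, not part of the lecture.

```python
def variate_value(weights, values):
    """Return w1*X1 + w2*X2 + ... + wn*Xn for one respondent."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical weights for X1 = income (in thousands), X2 = education (years),
# X3 = family size. These numbers are illustrative, not estimated from data.
weights = [0.5, 2.0, -1.0]
respondent = [40.0, 16.0, 3.0]

score = variate_value(weights, respondent)
print(score)  # 0.5*40 + 2.0*16 - 1.0*3 = 49.0
```

Each respondent gets one such score, so a data set of many respondents yields one column of variate values.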

5 Measurement Scales
The key: measurement
Important in accurately representing the concept of interest
Instrumental in selecting the appropriate multivariate method of analysis
Two types of data: nonmetric (qualitative) and metric (quantitative)

6 Types of Data and Measurement Scales
Nonmetric or qualitative: nominal scale, ordinal scale
Metric or quantitative: interval scale, ratio scale

7 Measurement Scales
Nominal scales: assign numbers as a way to label or identify subjects or objects.
E.g., gender (0: female; 1: male). Demographic attributes: gender, religion, occupation, etc.
Ordinal scales: variables can be ordered or ranked in relation to the amount of the attribute possessed.
E.g., satisfaction with products (1: product A > 2: product B > 3: product C)

8 Measurement Scales
Interval scales: permit nearly any mathematical operation to be performed. Differences between any two adjacent points on any part of the scale are equal. They use an arbitrary zero point, e.g., temperature.
Ratio scales: all mathematical operations are permissible. They use an absolute zero point, e.g., height.

9 Measurement Error
Measurement error: the degree to which the observed values are not representative of the “true” values
Must address both validity and reliability
Validity is the degree to which a measure accurately represents what it is supposed to.
Reliability is the degree to which the observed variable measures the true value and is error free; thus, it is the opposite of measurement error.

10 Measurement Error
Multivariate measurements (summated scales)
Several variables are joined in a composite measure to represent a concept
Using several variables as indicators, e.g., Satisfaction (S1, S2, S3, and S4) and Perceived Usefulness (PU1, …, PU6)
The impact of measurement error cannot be directly seen, so deliberate effort is required to reduce it.
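
A small sketch of a summated scale, assuming the composite is formed by averaging the indicator scores (summing is an equally common choice). The responses are hypothetical 7-point ratings, and `summated_scale` is an illustrative name.

```python
from statistics import mean


def summated_scale(item_scores):
    """Average several indicator scores into one composite score."""
    return mean(item_scores)


# Hypothetical 7-point responses from one respondent to items S1..S4.
satisfaction_items = {"S1": 6, "S2": 5, "S3": 7, "S4": 6}

satisfaction = summated_scale(list(satisfaction_items.values()))
print(satisfaction)
```

Because the composite pools several indicators, random error in any single item has less influence on the final score than it would if that item were used alone.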

11 Statistical Significance & Statistical Power
All multivariate techniques, except for cluster analysis and multidimensional scaling, rely on statistical inference
They require specifying the acceptable levels of statistical error
Most common: Type I error, or alpha (α)
The probability of rejecting the null hypothesis when it is actually true
Showing statistical significance when it actually is not present: a “false positive”

12 When specifying Type I error, you also determine Type II error, or beta (β)
Type II error: failing to reject the null hypothesis when it is actually false
A more interesting probability is 1 - β, the power of the statistical inference test: the probability of correctly rejecting the null hypothesis when it should be rejected; i.e., statistical significance will be indicated if it is present

13 Hypothesis testing
                                        Reality
Statistical decision     H0 true (no difference)     H0 false (difference)
Accept H0                1 - Alpha                   Beta (Type II error)
Reject H0 (Ha)           Alpha (Type I error)        1 - Beta (Power)

14 Power is determined by three factors
Effect size: the actual magnitude of the effect of interest in the population, e.g., the mean difference expressed in standard deviation units, or the actual correlation between the variables
Alpha: as alpha becomes more restrictive (smaller), power decreases. Reducing the chance of finding an incorrect significant effect also reduces the probability of correctly finding an effect
Sample size: given an alpha level, increased sample size always produces greater power. The danger is producing “too much” power, where even very small effects become statistically significant
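
The way the three factors move power can be sketched numerically. The example below assumes a two-sided, two-sample z-test with equal group sizes and a known standard deviation (a normal approximation, not a method named in the lecture); `power_two_sample` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the mean difference in standard deviation units.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value for alpha
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Larger effect, larger sample, or less restrictive alpha each raise power.
print(power_two_sample(0.2, 50), power_two_sample(0.8, 50))
print(power_two_sample(0.5, 50), power_two_sample(0.5, 100))
print(power_two_sample(0.5, 50, alpha=0.01), power_two_sample(0.5, 50, alpha=0.05))
```

With an effect size of zero the function returns alpha itself, which matches the definition of Type I error as the rejection rate when the null hypothesis is true.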

17 The relationships among alpha, sample size, effect size and power are very complicated
General guideline of Cohen: an alpha level of at least .05 (i.e., .05 or stricter) with power levels of 80%
Power becomes acceptable at sample sizes of 100 or more in situations with a moderate effect size (0.5)
If the effects are anticipated to be small, plan for larger sample sizes and/or less restrictive alpha levels
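
As a rough illustration of the guideline, the sketch below solves for the sample size per group needed to reach a target power. It assumes a two-sided, two-sample z-test under a normal approximation, so the figures are per group and need not match rule-of-thumb totals exactly; `n_per_group_for_power` is a hypothetical name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_power(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the alpha level
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


print(n_per_group_for_power(0.5))  # moderate effect: 63 per group
print(n_per_group_for_power(0.2))  # small effect: 393 per group
```

The jump from a moderate to a small effect shows why anticipating small effects calls for much larger samples or a less restrictive alpha.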

18 A Classification
Dependence techniques: have a dependent variable to be predicted, e.g., multiple regression
Interdependence techniques: simultaneous analysis of a set of variables, e.g., factor analysis

19 Dependence Techniques
Two characteristics distinguish them:
The number of dependent variables: single vs. multiple
The type of measurement scale employed by the variables: metric (quantitative/numerical) vs. nonmetric (qualitative/categorical)

20 Dependence Methods
Canonical correlation: Y1 + Y2 + … + Yn (metric, nonmetric) = X1 + X2 + … + Xn (metric, nonmetric)
MANOVA: Y1 + Y2 + … + Yn (metric) = X1 + X2 + … + Xn (nonmetric)
ANOVA: Y1 (metric) = X1 + X2 + … + Xn (nonmetric)
Multiple discriminant analysis: Y1 (nonmetric) = X1 + X2 + … + Xn (metric)

21 Dependence Methods (continued)
Multiple regression analysis: Y1 (metric) = X1 + X2 + … + Xn (metric, nonmetric)
Conjoint analysis: Y1 (metric, nonmetric) = X1 + X2 + … + Xn (nonmetric)
Structural equation modeling:
Y1 (metric) = X11 + X12 + … + X1n (metric, nonmetric)
Y2 (metric) = X21 + X22 + … + X2n (metric, nonmetric)
Ym (metric) = Xm1 + Xm2 + … + Xmn (metric, nonmetric)

22 TAM
[Figure: model diagram. Constructs shown: Ease of Use, Perceived Usefulness, Intention to Use, Usage Behavior. Values shown for the Perceived Usefulness indicators: PU1 0.82, PU2 0.88, PU3 0.90.]

23 Problems
Multivariate techniques can be misused as a substitute for the necessary conceptual development
Complexity in the results and their interpretation
No single “answer” exists
Some guidelines serve as a philosophy of multivariate analysis

24 Guidelines and Interpretation
Practical & statistical significance
Avoid focusing solely on the achieved significance of the results without understanding their interpretation, good or bad
Look at practical significance: “So what?”
Consider the substantive and theoretical implications
E.g., repurchase intention predicted as 50%, significant at the .05 level, yet it could vary by as much as ±20%

25 Sample Size Affects All Results
Small sample size: too little statistical power for the test to identify significant results, or too easily an “overfitting” of the data such that the results are artificially good (no generalizability)
Large sample size: the statistical tests become overly sensitive
Multiple groups with unequal sample sizes also affect the results
Always assess the results in light of the sample used in the analysis

26 Know Your Data
Complex relationships require careful examination
Examine the data more rigorously because of the influence of outliers, violations of assumptions, and missing data
“Know where to look” for alternative formulations of the original model, such as nonlinear and interactive relationships

27 Strive for Model Parsimony
Conceptual model development comes first
Specification error: omitting a critical predictor variable
Inserting variables indiscriminately and letting the technique “sort out” the relevant ones has costs:
It increases fit at the risk of overfitting
It can mask the true effects because of multicollinearity
Avoid conceptually irrelevant variables

28 Look at Your Error
Rarely do we achieve the best prediction in the first analysis, so modification is usually needed. But “where does one go from here?”
Look at the errors in prediction: residuals, misclassifications, outliers
Errors are not a measure of failure, nor merely something to eliminate
Use them as a starting point for diagnosing the validity of the results and as an indication of the remaining unexplained relationships

29 Validate Your Results
Complex interrelationships mean the results may be specific only to the sample and not generalizable
Ensure sufficient observations per estimated parameter to avoid overfitting
Split the sample: estimate on one part, hold out the other for testing
Use a bootstrapping technique
Gather a separate sample
The objective is not to find the “best” fit, but to develop a model that best describes the population as a whole.
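
A minimal sketch of the split-sample idea: the observations are shuffled and divided into an analysis sample (used to estimate the model) and a holdout sample (used only to test it). The 30% holdout fraction, the fixed seed, and the name `split_sample` are assumptions for illustration.

```python
import random


def split_sample(observations, holdout_fraction=0.3, seed=0):
    """Randomly split observations into (analysis, holdout) lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(observations)  # copy, leaving the input untouched
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical respondent IDs 0..99.
analysis, holdout = split_sample(range(100))
print(len(analysis), len(holdout))  # 70 30
```

A model that fits the analysis sample well but predicts the holdout sample poorly is a sign that the results are specific to the sample.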

30 Model Building
A series of guidelines that emphasizes a model-building approach
Focus on analyzing a well-defined research plan, starting with a conceptual model detailing the relationships to be examined
Then empirical issues can be addressed
Interpretation of results
Diagnosis of generalizability

31 Define the Research Problem, Objectives, and Multivariate Technique to Be Used
Specify the problem in conceptual terms
The conceptual model, or theory, is of primary importance
Define the concepts and identify the fundamental relationships to be investigated
A conceptual model need not be complex and detailed
A concept, rather than a variable, is defined
Identify the ideas or topics of interest
Dependence vs. interdependence

32 Develop the Analysis Plan
Turn to the implementation issues
With a technique chosen, develop an analysis plan that addresses the set of issues particular to its purpose and design:
Minimum or desired sample sizes
Allowable or required types of variables
Estimation methods
Resolve the specific details and finalize the model formulation and the requirements for data collection

33 Evaluate the Assumptions Underlying the Multivariate Technique
First: evaluate the underlying assumptions, both conceptual and statistical
For dependence techniques: multivariate normality, linearity, independence of error terms, equality of variance
Conceptual assumptions: model formulation and the types of relationship represented

34 Estimate the Multivariate Model and Assess Overall Model Fit
Estimate the model and assess overall model fit
Choose among options to meet specific characteristics of the data or to maximize the fit to the data
Consider the level of significance, the proposed relationships, and practical significance
Before proceeding, obtain an acceptable model
Are the results unduly affected by any single observation or small set of observations?
Confirm that the results are robust and stable

35 Interpret the Variate(s)
Interpreting the variate(s) reveals the nature of the multivariate relationship
Individual variables: estimated coefficients
Multiple variates: underlying dimensions of comparison or association
May lead to additional respecification of the variables and/or model formulation
Identify empirical evidence of multivariate relationships in the sample data that can be generalized to the total population

36 Validate the Multivariate Model
Before accepting the results, subject them to one final set of diagnostic analyses that assess their degree of generalizability using the available validation methods
Validation adds little to the interpretation of the results, but can be viewed as “insurance” that the results are the most descriptive of the data and generalize to the population

37 Bootstrapping, a computational nonparametric technique for "re-sampling," enables researchers to draw conclusions about the characteristics of a population strictly from the existing sample
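
One common form of bootstrapping can be sketched as follows: draw many resamples with replacement from the existing sample, recompute the statistic each time, and read an interval off the resulting distribution. The percentile interval, the 2000 resamples, the fixed seed, and the data values are all assumptions for illustration; `bootstrap_ci` is a hypothetical name.

```python
import random
from statistics import mean


def bootstrap_ci(sample, statistic=mean, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for a statistic of one sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample has the same size as the sample, drawn with replacement.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower_idx = int((1 - level) / 2 * n_resamples)
    upper_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical satisfaction scores from ten respondents.
data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 5.8, 4.4]
low, high = bootstrap_ci(data)
print(low, high)
```

No distributional assumption about the population is needed; the spread of the resampled estimates stands in for the sampling variability of the statistic.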

