Multivariate Analysis Lec 4
Regression Analysis
A single dependent (criterion) variable and several independent (predictor) variables
Weighted independent variables to ensure maximal prediction
  Regression variate
To apply
  The data must be metric, or appropriately transformed
  Decide which variable is to be dependent and which remaining variables will be independent
Fall, 2008 Multivariate Analysis Lec 4
An Example
Setting a Baseline: Prediction without an Independent Variable
Predicted # of credit cards = average # of credit cards
But how accurate is the baseline prediction?
The sum of squared errors (SSE) gives the amount of prediction error
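The baseline can be sketched in a few lines of Python. The eight credit-card counts below are illustrative assumptions, not values taken from the slides.

```python
# Illustrative data (an assumption, not from the slides):
# number of credit cards held by eight families.
cards = [4, 6, 6, 7, 8, 7, 8, 10]

# Baseline prediction: use the mean for every family.
baseline = sum(cards) / len(cards)

# SSE of the baseline: squared distance of each family from the mean.
sse_baseline = sum((y - baseline) ** 2 for y in cards)

print(f"baseline prediction = {baseline}")
print(f"baseline SSE = {sse_baseline}")
```

Any model that uses an independent variable should be judged by how far it reduces this baseline SSE.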
Simple Regression
Specifying the Equation
Ŷ = b0 + b1V1
Regression coefficient
Prediction error: the residual (e)
Least squares
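A minimal sketch of the least-squares estimates for the simple regression equation. The helper `fit_simple` and the eight data points (family size and credit cards held) are assumptions made for illustration; they do not come from the slides.

```python
def fit_simple(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Illustrative data (assumed): family size (V1) and credit cards held (Y).
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
cards = [4, 6, 6, 7, 8, 7, 8, 10]

b0, b1 = fit_simple(family_size, cards)

# Residual e = observed value minus predicted value.
residuals = [y - (b0 + b1 * x) for x, y in zip(family_size, cards)]
sse = sum(e ** 2 for e in residuals)

print(f"Y-hat = {b0:.3f} + {b1:.3f} * V1")
print(f"SSE = {sse:.3f}")
```

Least squares chooses b0 and b1 so that this SSE is as small as possible, which is why the residuals sum to zero.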
Confidence Interval for the Prediction
We would like to estimate the range of predicted values that we might expect, rather than relying on just the single (point) estimate
Standard error of the estimate (SEE): establishes the upper and lower bounds for our prediction
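A rough sketch of how the SEE turns a point estimate into a range. It assumes the simplified interval Ŷ ± t × SEE, which ignores the extra uncertainty in the estimated coefficients. The data are illustrative, and the t value (2.447, two-sided 95% with 6 degrees of freedom) is taken from a standard t table rather than from the slides.

```python
import math

# Illustrative data (assumed): family size and credit cards held.
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(cards)
x_bar, y_bar = sum(family_size) / n, sum(cards) / n
sxx = sum((x - x_bar) ** 2 for x in family_size)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(family_size, cards))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# SEE: square root of SSE divided by its degrees of freedom (n - 2).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(family_size, cards))
see = math.sqrt(sse / (n - 2))

T_CRIT = 2.447  # assumed: 95% two-sided t value for n - 2 = 6 df


def predict_interval(x_new):
    """Return (lower, point estimate, upper) using Y-hat +/- t * SEE."""
    y_hat = b0 + b1 * x_new
    margin = T_CRIT * see
    return y_hat - margin, y_hat, y_hat + margin


lower, y_hat, upper = predict_interval(4)
print(f"SEE = {see:.3f}")
print(f"family of 4: {lower:.2f} to {upper:.2f} (point estimate {y_hat:.2f})")
```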
Assessing Prediction Accuracy
The sum of squares regression (SSR)
Total sum of squares (TSS)
  TSS = SSE + SSR
Coefficient of determination (R²) = SSR/TSS = (TSS - SSE)/TSS
Sign and strength of the relationship
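The decomposition TSS = SSE + SSR can be checked numerically. This is a sketch on illustrative data (assumed, not from the slides); the correlation r is recovered as the square root of R² carrying the sign of the slope.

```python
import math

# Illustrative data (assumed): family size and credit cards held.
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(cards)
x_bar, y_bar = sum(family_size) / n, sum(cards) / n
sxx = sum((x - x_bar) ** 2 for x in family_size)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(family_size, cards))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
predicted = [b0 + b1 * x for x in family_size]

tss = sum((y - y_bar) ** 2 for y in cards)                   # baseline error
sse = sum((y - p) ** 2 for y, p in zip(cards, predicted))    # error left over
ssr = sum((p - y_bar) ** 2 for p in predicted)               # error explained

r2 = ssr / tss
r = math.copysign(math.sqrt(r2), b1)  # sign comes from the slope

print(f"TSS = {tss:.3f}, SSE = {sse:.3f}, SSR = {ssr:.3f}")
print(f"R^2 = {r2:.3f}, r = {r:.3f}")
```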
Multiple Regression
The impact of multicollinearity
  Reduces the predictive power of any single independent variable
The multiple regression equation
  # of cards = b0 + b1V1 + b2V2 + e, so predicted # of cards = b0 + b1V1 + b2V2
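A minimal sketch of estimating b0, b1, and b2 by solving the normal equations X'X b = X'y in plain Python. The helpers `solve` and `fit_ols` are hypothetical names, and the data (family size, family income, credit cards) are illustrative assumptions rather than values from the slides.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(columns, y):
    """Coefficients [b0, b1, ...] from the normal equations X'X b = X'y."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Illustrative data (assumed): V1 = family size, V2 = family income.
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
family_income = [14, 16, 14, 17, 18, 21, 17, 25]
cards = [4, 6, 6, 7, 8, 7, 8, 10]

b = fit_ols([family_size, family_income], cards)
residuals = [
    y - (b[0] + b[1] * v1 + b[2] * v2)
    for v1, v2, y in zip(family_size, family_income, cards)
]
sse_multiple = sum(e ** 2 for e in residuals)

print("b0, b1, b2 =", [round(v, 3) for v in b])
print(f"SSE = {sse_multiple:.3f}")
```

Because the two predictors are correlated with each other, the coefficient on family size is smaller here than it would be in a simple regression on family size alone.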
A Decision Process for MRA
Factors that impact the creation, estimation, interpretation, and validation of a regression analysis
Stage 1: Objectives of MR
Research problems appropriate for MR
Prediction
  Maximize the overall predictive power (achieve acceptable levels of predictive accuracy)
  Compare two or more sets of independent variables
Explanation: interpretation of the variate
  The importance of the IVs
  The types of relationships found
  The interrelationships among the independent variables
Specifying a statistical relationship
A functional relationship vs. a statistical relationship
Selection of DV and IVs
Avoid selecting variables indiscriminately or solely on empirical grounds
Measurement error in the DV
Specification error in the selection of IVs
  Inclusion of irrelevant variables
    Reduces model parsimony
    May mask or replace the effects of more useful variables
    Makes the tests of significance less precise
  Exclusion of relevant variables (most troublesome)
    Biases the results
Measurement error in the IVs
Stage 2: Research Design
Sample size
  Statistical power and sample size
  Detecting significant R² and coefficients
    Fewer than 20 observations: suitable only for simple regression, and power is too low
    More than 1,000 observations: tests are too powerful (overly sensitive)
Fixed vs. Random Effects Predictors
A random IV: its levels are selected at random
Most regression models based on survey data are random effects models: the values of the IVs are randomly selected from the population (inference regarding the population)
The two estimation procedures are the same except for the error terms
  In the random effects model, a portion of the random error comes from the sampling of the independent variables
Procedures based on the fixed model are robust
Creating additional variables
The basic relationship: the linear association between a metric DV and the IVs, based on the product-moment correlation
Transformations
  Motivated by the desire to deal with nonmetric data and nonlinear relationships
  Theoretical reason: the nature of the data
  Data derived: suggested by examining the data
Incorporating nonmetric data with dummy variables
Indicator coding
  Coefficients are differences in means from the reference category (the all-zeros category)
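A small sketch of indicator coding. The helper names, the `region` categories, and the `spend` values are all hypothetical. The code builds the 0/1 columns and then computes the differences in group means from the reference category, which are the values the dummy coefficients would take in a regression containing only these dummies.

```python
def indicator_code(values, reference):
    """Return one 0/1 column per non-reference category."""
    levels = sorted(set(values) - {reference})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}


def group_means(values, y):
    """Mean of y within each category of values."""
    sums, counts = {}, {}
    for v, yi in zip(values, y):
        sums[v] = sums.get(v, 0.0) + yi
        counts[v] = counts.get(v, 0) + 1
    return {v: sums[v] / counts[v] for v in sums}


# Hypothetical nonmetric IV (region) and metric DV (spend).
region = ["north", "south", "west", "south", "north", "west"]
spend = [10, 14, 9, 16, 12, 11]

# "north" is the reference category: it is the row of all zeros.
dummies = indicator_code(region, reference="north")
means = group_means(region, spend)
diffs = {lvl: means[lvl] - means["north"] for lvl in dummies}

print(dummies)
print(diffs)
```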
Representing curvilinear effects with polynomials
Power transformations of an independent variable
Only simple curvilinear relationships (U-shaped relationships)
No statistical means for assessing whether the curvilinear or linear relationship model is more appropriate
Accommodate only univariate relationships
Y = b0 + b1X1 + b2X1²
Interaction or moderator effects
  Y = b0 + b1X1 + b2X2 + b3X1X2
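A short sketch of how the squared and interaction terms are built as extra columns, and of what the interaction coefficient means. All names and numbers are hypothetical. In the interaction model the effect of X1 is no longer a constant b1 but b1 + b3X2.

```python
def squared_term(x):
    """Column for the polynomial term X1 squared."""
    return [xi ** 2 for xi in x]


def interaction_term(x1, x2):
    """Column for the moderator term X1 * X2."""
    return [a * b for a, b in zip(x1, x2)]


def interaction_predict(b0, b1, b2, b3, x1, x2):
    """Y = b0 + b1*X1 + b2*X2 + b3*X1*X2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_of_x1(b1, b3, x2):
    """Effect of a one-unit change in X1 at a given level of X2."""
    return b1 + b3 * x2


# Hypothetical predictor values.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 1.0, 0.0, 1.0]

x1_squared = squared_term(x1)
x1_x2 = interaction_term(x1, x2)

# Hypothetical coefficients: the slope of X1 changes with X2.
print(slope_of_x1(b1=2.0, b3=0.5, x2=0.0))
print(slope_of_x1(b1=2.0, b3=0.5, x2=4.0))
```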
Stage 3: Assumptions
Assessing individual variables vs. the variate
Principal measure of prediction error for the variate: the residual
  Needs some form of standardization: the studentized residual
Residual plot
  The residuals vs. the predicted dependent values
  Null plot: the pattern when all assumptions are met
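A sketch of studentized residuals for a simple regression, on illustrative data (assumed). It uses the internally studentized form e / (s × sqrt(1 - h)), with leverage h = 1/n + (x - x̄)²/Sxx; some texts instead compute s with the observation deleted, so treat this as one possible definition.

```python
import math


def studentized_residuals(x, y):
    """Return (residuals, leverages, studentized residuals) for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # s: standard error of the estimate, with n - 2 degrees of freedom.
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
    # Leverage grows as an observation moves away from the mean of x.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    student = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
    return resid, leverage, student


# Illustrative data (assumed): family size and credit cards held.
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
cards = [4, 6, 6, 7, 8, 7, 8, 10]

resid, leverage, student = studentized_residuals(family_size, cards)
for e, t in zip(resid, student):
    print(f"residual {e:+.3f} -> studentized {t:+.3f}")
```

Plotting `student` against the predicted values gives the residual plot described above; values beyond roughly ±2 are the ones usually inspected first.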
Linearity of the phenomenon
The degree to which the change in the dependent variable is associated with the IVs
  The regression coefficient is constant across the range of values of the IV
The concept of correlation is based on linearity
Partial regression plots
  Show the relationship between a specific IV and the DV
  A linear relationship appears as a non-horizontal line, sloping up or down
  Examine the residuals around the line
Constant variance of the error term
Unequal variance (heteroscedasticity): the most common assumption violation
  The most common residual-plot pattern: triangle-shaped in either direction
  Diamond-shaped: more variance in the midrange
  A number of violations can occur simultaneously
Levene test for homogeneity of variance
Remedies
  Weighted least squares
  Variance-stabilizing transformations
Independence of the Error Terms
Basic assumption in regression: each predicted value is independent, i.e., not sequenced by any variable
  Association with time
  Basic model conditions change: seasonal effect
Normality of the error term distribution
  Normal probability plot
Stage 4: Estimating the Regression Model and Overall Model Fit
Select a method for model specification
Assess the significance of the overall model
Check for any unduly influential observations
General approaches to variable selection
Confirmatory specification: simple in concept, but the researcher must be assured that the set of variables achieves the maximum prediction while maintaining a parsimonious model
Sequential search methods
  Stepwise estimation
    Based on incremental contribution
    Only one variable at a time, no combined effect
  Forward addition and backward elimination
    Largely a trial-and-error process
Caveats
  Multicollinearity
  Multiple significance tests in the stepwise procedure
Testing the regression variate for meeting the regression assumptions
Examining the statistical significance of our model
  The F ratio
  Adjusted R²
Significance tests of regression coefficients
  Sampling variation for estimated regression coefficients (Table 4.8)
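A sketch of the F ratio and adjusted R² computed from the sums of squares. The function names and the input values (n = 8 observations, k = 1 predictor, TSS = 22.0, SSE = 5.486) are illustrative assumptions, not results from the slides.

```python
def f_ratio(ssr, sse, n, k):
    """F = (SSR / k) / (SSE / (n - k - 1)) for k predictors and n cases."""
    return (ssr / k) / (sse / (n - k - 1))


def adjusted_r2(sse, tss, n, k):
    """R^2 penalized for the number of predictors relative to sample size."""
    return 1 - (sse / (n - k - 1)) / (tss / (n - 1))


# Illustrative sums of squares (assumed).
n, k = 8, 1
tss, sse = 22.0, 5.486
ssr = tss - sse

f_value = f_ratio(ssr, sse, n, k)
adj_r2 = adjusted_r2(sse, tss, n, k)

print(f"R^2 = {ssr / tss:.3f}")
print(f"F({k}, {n - k - 1}) = {f_value:.2f}")
print(f"adjusted R^2 = {adj_r2:.3f}")
```

The F value still has to be compared with a critical value for (k, n - k - 1) degrees of freedom, which this sketch does not look up.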
Identifying influential observations
Stage 5: Interpreting the Regression Variate
Using the regression coefficients
  Used to calculate the predicted values
  Interpretation: each individual IV's impact
The problem of measurement scales
  Beta coefficient (unit: SD)
    Used when collinearity is minimal
    Interpreted only in the context of (in relation to) the other IVs in the equation
    The levels affect the beta value
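A sketch of converting unstandardized coefficients to beta coefficients with beta = b × SD(X) / SD(Y). The data and the two unstandardized coefficients (0.632 and 0.216) are illustrative assumptions standing in for the output of an earlier fit.

```python
import statistics


def beta_coefficient(b, x, y):
    """Standardized coefficient: b rescaled to standard-deviation units."""
    return b * statistics.stdev(x) / statistics.stdev(y)


# Illustrative data (assumed): two IVs on very different scales.
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
family_income = [14, 16, 14, 17, 18, 21, 17, 25]
cards = [4, 6, 6, 7, 8, 7, 8, 10]

# Unstandardized coefficients, assumed to come from a prior regression
# of cards on both predictors (rounded).
b_size, b_income = 0.632, 0.216

beta_size = beta_coefficient(b_size, family_size, cards)
beta_income = beta_coefficient(b_income, family_income, cards)

print(f"beta(family size)   = {beta_size:.3f}")
print(f"beta(family income) = {beta_income:.3f}")
```

On the raw scale the size coefficient looks about three times the income coefficient, but in standard-deviation units the two are much closer, which is the measurement-scale problem the beta coefficient addresses.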
Assessing multicollinearity
The effect of multicollinearity
  Estimation: the ability of the regression procedure and the researcher to represent and understand the effects of each IV in the regression variate
  Limits the size of the coefficient of determination (difficult to add unique explanatory prediction)
  Makes it difficult to determine the contribution of each individual IV
Identifying multicollinearity
The extent of collinearity
The degree to which the estimated coefficients are affected
The simplest and most obvious means: the correlation matrix (generally, correlations of 0.9 and above)
  However, collinearity may be due to the combined effect of several IVs
Common measures
  The tolerance value (the share of an IV's variation not explained by the other IVs): cutoff 0.1
  Its inverse, the variance inflation factor (VIF): cutoff 10
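A sketch of tolerance and VIF for the two-predictor case. With only two IVs, the R² from regressing one IV on the other equals their squared correlation, so tolerance = 1 - r². The data are illustrative assumptions; with more IVs each tolerance would need its own auxiliary regression.

```python
import math


def pearson_r(a, b):
    """Product-moment correlation between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


# Illustrative predictors (assumed).
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
family_income = [14, 16, 14, 17, 18, 21, 17, 25]

r = pearson_r(family_size, family_income)
tolerance = 1 - r ** 2      # variation NOT explained by the other IV
vif = 1 / tolerance         # variance inflation factor

print(f"r = {r:.3f}, tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
print("problematic" if tolerance < 0.1 else "within the usual cutoffs")
```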
Remedies for multicollinearity
Omit some IV(s)
Use the model for prediction only
Use the simple correlation between a DV and an IV to understand the relationship
Use a more sophisticated method, such as Bayesian regression or regression on principal components
Make a judgment on the variables included in the regression variate, which should always be guided by the theoretical background of the study
Stage 6: Validating the Results
Additional or split sample
Calculating the PRESS statistic
  Estimate n regression models, each omitting one observation (fitted on n - 1 cases)
  Similar to R² for predictive accuracy
  Similar to bootstrapping
Comparing regression models
  R² increases with the # of IVs
  Use adjusted R²
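A sketch of the PRESS statistic for a simple regression: each observation is left out in turn, the model is refitted on the remaining n - 1 cases, and the squared error in predicting the omitted case is accumulated. The data and helper names are illustrative assumptions, and the predictive R² shown is one common way to put PRESS on an R²-like scale.

```python
def fit_simple(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def press(x, y):
    """Sum of squared prediction errors for each left-out observation."""
    total = 0.0
    for i in range(len(x)):
        # Refit without observation i, then predict it.
        b0, b1 = fit_simple(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


# Illustrative data (assumed): family size and credit cards held.
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
cards = [4, 6, 6, 7, 8, 7, 8, 10]

b0_full, b1_full = fit_simple(family_size, cards)
sse_full = sum((y - (b0_full + b1_full * x)) ** 2
               for x, y in zip(family_size, cards))
y_bar = sum(cards) / len(cards)
tss = sum((y - y_bar) ** 2 for y in cards)

press_value = press(family_size, cards)
predictive_r2 = 1 - press_value / tss

print(f"SSE (fitted) = {sse_full:.3f}, PRESS = {press_value:.3f}")
print(f"R^2 = {1 - sse_full / tss:.3f}, predictive R^2 = {predictive_r2:.3f}")
```

PRESS is always at least as large as the fitted SSE, so the predictive R² gives a more conservative picture of accuracy than the ordinary R².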
Predicting with the model
Apply the model to a new set of data
Factors to be considered
  Sampling variation from both samples: confidence intervals of the predictions
  Check that conditions and relationships have not changed
  Avoid using the model to estimate beyond the range of the IVs in the sample
Interpreting the regression variate
Y = X9 + .369X X12 + (-.417)X7 + .174X11
Multicollinearity