SW388R7 Data Analysis & Computers II Slide 1 Multiple Regression – Basic Relationships Purpose of multiple regression Different types of multiple regression Standard multiple regression Steps in solving standard multiple regression problems
Slide 2 Purpose of multiple regression The purpose of multiple regression is to analyze the relationship between metric or dichotomous independent variables and a metric dependent variable. If there is a relationship, using the information in the independent variables will improve our accuracy in predicting values for the dependent variable.
Slide 3 Types of multiple regression There are three types of multiple regression, each of which is designed to answer a different question: Standard multiple regression is used to evaluate the relationships between a set of independent variables and a dependent variable. Hierarchical, or sequential, regression is used to examine the relationships between a set of independent variables and a dependent variable, after controlling for the effects of some other independent variables on the dependent variable. Stepwise, or statistical, regression is used to identify the subset of independent variables that has the strongest relationship to a dependent variable.
Slide 4 Standard multiple regression - 1 In standard multiple regression, all of the independent variables are entered into the regression equation at the same time. The minimum expectation for multiple regression is that there is a statistically significant relationship between the set of independent variables and the dependent variable. An F test is used to determine if the relationship can be generalized to the population represented by the sample. Multiple R and R² measure the strength of the relationship between the set of independent variables and the dependent variable.
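The connection between R² and the F test can be sketched in a few lines of Python. This is a minimal illustration, not SPSS output: the function name `f_statistic` and the example numbers (R² = 0.5, 23 cases, 2 predictors) are assumptions chosen only to keep the arithmetic easy to follow. It relies on the standard identity F = (R² / k) / ((1 − R²) / (n − k − 1)) for a regression with an intercept, k predictors, and n cases.

```python
import math

def f_statistic(r_squared, n_cases, n_predictors):
    """Overall F statistic for a regression that includes an intercept."""
    df_regression = n_predictors
    df_residual = n_cases - n_predictors - 1
    if df_residual <= 0:
        raise ValueError("need more cases than predictors plus one")
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R squared must be at least 0 and below 1")
    explained_per_df = r_squared / df_regression
    unexplained_per_df = (1.0 - r_squared) / df_residual
    return explained_per_df / unexplained_per_df

# Hypothetical values, not taken from the slides.
example_r_squared = 0.5
example_f = f_statistic(example_r_squared, n_cases=23, n_predictors=2)
# Multiple R is the square root of R squared.
example_multiple_r = math.sqrt(example_r_squared)
print(round(example_f, 3), round(example_multiple_r, 3))
```

The probability attached to F still has to come from the F distribution with (k, n − k − 1) degrees of freedom, which SPSS reports in the ANOVA table.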
Slide 5 Standard multiple regression - 2 If there is an overall relationship between the set of independent variables and the dependent variable, we interpret the individual relationships of the independent variables. A t-test is used to evaluate the individual relationship between each independent variable and the dependent variable. If the relationship is statistically significant, its impact on the dependent variable is stated in terms of direction: for a positive coefficient, higher scores on the independent variable are associated with higher scores on the dependent variable; for a negative coefficient, higher scores on the independent variable are associated with lower scores on the dependent variable.
Slide 6 Standard multiple regression - 3 If there is an overall relationship between the set of independent variables and the dependent variable, we can answer the question of which of the statistically significant predictors has the largest influence on the dependent variable, that is, which makes the largest difference in the value of the dependent variable. The b coefficients represent the change in the dependent variable for a one-unit change in the independent variable, but we cannot compare them with one another because they are scaled in different units. The beta coefficients, however, are standardized so that they can be compared. The variable with the largest absolute value of beta (ignoring whether the sign is positive or negative) has the largest influence on the value of the dependent variable.
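A small calculation may make it clearer why b coefficients depend on the units of measurement while beta coefficients do not. The sketch below assumes the usual conversion beta = b × (standard deviation of the predictor ÷ standard deviation of the dependent variable). The function name and all data values are hypothetical; the same predictor is expressed once in dollars and once in thousands of dollars.

```python
from statistics import stdev

def standardized_beta(b, x_values, y_values):
    """Convert an unstandardized b coefficient to a standardized beta."""
    return b * stdev(x_values) / stdev(y_values)

# Hypothetical data: one predictor recorded in two different units.
income_thousands = [20, 30, 40, 50, 60]
income_dollars = [value * 1000 for value in income_thousands]
outcome = [3, 4, 5, 6, 7]

# The slope is 0.1 per thousand dollars, which is 0.0001 per dollar.
b_per_thousand = 0.1
b_per_dollar = 0.0001

beta_from_thousands = standardized_beta(b_per_thousand, income_thousands, outcome)
beta_from_dollars = standardized_beta(b_per_dollar, income_dollars, outcome)
print(beta_from_thousands, beta_from_dollars)
```

The two b values differ by a factor of 1,000, yet both convert to the same beta, which is why beta is the coefficient to compare across predictors.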
Slide 7 Plan for regression assignments In this class, we will focus on the basic evaluation of relationships in standard multiple regression. In the next class, we will include the evaluation of assumptions and outliers, and validation analysis to produce a more complete standard multiple regression solution. In the following class, we will look at alternate methods for including variables in multiple regression: hierarchical multiple regression and stepwise multiple regression.
Slide 8 Question 1 To answer the first question, we examine the level of measurement for each variable listed in the problem. Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.
Slide 9 Answer 1 "Frequency of attendance at religious services" [attend] is ordinal, satisfying the metric level of measurement requirement for the dependent variable, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation. "Strength of religious affiliation" [reliten] and "frequency of prayer" [pray] are ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation. True with caution is the correct answer.
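The level-of-measurement reasoning above can be written as a short decision function. This is a sketch under stated assumptions: the function name, the level labels ("metric", "dichotomous", "ordinal", "nominal"), and the returned strings are hypothetical conventions, and ordinal variables are treated as metric with a caution, as the answer describes.

```python
def check_measurement_levels(dependent_level, independent_levels):
    """Answer the level-of-measurement question for a regression problem."""
    caution = False

    # The dependent variable must be metric; ordinal is accepted with caution.
    if dependent_level == "ordinal":
        caution = True
    elif dependent_level != "metric":
        return "incorrect application of a statistic"

    # Each independent variable must be metric or dichotomous;
    # ordinal is again accepted with caution.
    for level in independent_levels:
        if level == "ordinal":
            caution = True
        elif level not in ("metric", "dichotomous"):
            return "incorrect application of a statistic"

    return "true with caution" if caution else "true"

# attend, reliten, and pray are all ordinal in the example problem.
print(check_measurement_levels("ordinal", ["ordinal", "ordinal"]))
```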
Slide 10 Question 2 Having satisfied the level of measurement requirements, we turn our attention to the sample size requirements. To answer this question, and those after it, we need to compute the standard multiple regression in SPSS.
Slide 11 Request a standard multiple regression To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.
Slide 12 Specify the variables and selection method First, move the dependent variable attend to the Dependent text box. Second, move the independent variables reliten and pray to the Independent(s) list box. Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we accept the default of Enter for direct entry of all variables, which produces a standard multiple regression. Fourth, click on the Statistics… button to specify the statistics options that we want.
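To show what entering all variables at the same time means computationally, here is a minimal pure-Python sketch of ordinary least squares with every predictor in the equation at once. It is not the routine SPSS uses internally; it solves the normal equations by Gauss-Jordan elimination, which is adequate only for small, well-conditioned examples. The function name and the data are hypothetical: the data are constructed so that the true equation is y = 1 + 2·x1 + 3·x2.

```python
def fit_standard_regression(y, predictors):
    """Least-squares fit with an intercept and all predictors entered together.

    predictors is a list of columns; returns [intercept, b1, b2, ...].
    """
    n_cases = len(y)
    # Design matrix: a constant 1 for the intercept, then every predictor.
    rows = [[1.0] + [column[i] for column in predictors] for i in range(n_cases)]
    size = len(rows[0])

    # Augmented normal equations: (X'X | X'y).
    aug = []
    for a in range(size):
        equation = [sum(row[a] * row[c] for row in rows) for c in range(size)]
        equation.append(sum(row[a] * value for row, value in zip(rows, y)))
        aug.append(equation)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for other in range(size):
            if other != col:
                factor = aug[other][col] / aug[col][col]
                aug[other] = [v - factor * w for v, w in zip(aug[other], aug[col])]

    return [aug[a][size] / aug[a][a] for a in range(size)]

# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2.
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [4, 3, 11, 10, 18]
coefficients = fit_standard_regression(y, [x1, x2])
print([round(value, 6) for value in coefficients])
```

Because both predictors are in the equation from the start, each b coefficient describes that predictor's relationship to the dependent variable with the other predictor held constant.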
Slide 13 Specify the statistics output options First, mark the checkbox for Estimates on the Regression Coefficients panel. Second, mark the checkboxes for Model Fit and Descriptives. Third, click on the Continue button to close the dialog box.
Slide 14 Request the regression output Click on the OK button to request the regression output.
Slide 15 Answer 2 In the Descriptive Statistics table in the SPSS output, we see the number of cases with valid data for all of the variables included in our analysis. With 2 independent variables, we satisfy both the minimum and the preferred sample size requirement.
Slide 16 Question 3 In order for the finding about overall relationship to be true, it must satisfy two conditions. First, the F test for the regression must be statistically significant at the stated alpha level. Second, the strength of the relationship must be correctly stated. If the relationship is true, but involves ordinal variables, a caution is added.
Slide 17 Overall Relationship Between Independent Variables and the Dependent Variable - 1 The probability of the F statistic (49.824) for the overall regression relationship is <0.001, which is less than or equal to the stated level of significance. We reject the null hypothesis that there is no relationship between the set of independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the set of independent variables and the dependent variable.
Slide 18 Overall Relationship Between Independent Variables and the Dependent Variable - 2 The Multiple R for the relationship between the set of independent variables and the dependent variable is 0.689, which would be characterized as strong using the rule of thumb that a correlation less than or equal to 0.20 is characterized as very weak; greater than 0.20 and less than or equal to 0.40 is weak; greater than 0.40 and less than or equal to 0.60 is moderate; greater than 0.60 and less than or equal to 0.80 is strong; and greater than 0.80 is very strong.
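The rule of thumb above translates directly into a lookup. The following is a minimal sketch; the function name `describe_strength` is an assumption, while the cut-off values are the ones stated on the slide.

```python
def describe_strength(correlation):
    """Label a correlation (or Multiple R) using the slide's rule of thumb."""
    size = abs(correlation)  # strength ignores the sign
    if size <= 0.20:
        return "very weak"
    if size <= 0.40:
        return "weak"
    if size <= 0.60:
        return "moderate"
    if size <= 0.80:
        return "strong"
    return "very strong"

# Multiple R reported for the example problem.
print(describe_strength(0.689))
```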
Slide 19 Answer 3 We satisfied both conditions: the F test for the regression was statistically significant and the strength of the relationship was correctly identified. A caution results from the inclusion of ordinal variables.
Slide 20 Question 4 In order for findings about individual relationships to be true, they must satisfy two conditions. First, the t test for the b coefficient must be statistically significant at the stated alpha level. Second, the statement of the relationship must be correct. If the relationship is true, but involves ordinal variables, a caution is added.
Slide 21 Relationship of Individual Independent Variable to Dependent Variable - 1 Based on the statistical test of the b coefficient (t = 5.857, p<0.001) for the independent variable "strength of religious affiliation" [reliten], the null hypothesis that the slope or b coefficient was equal to 0 was rejected. The research hypothesis that there was a relationship between strength of religious affiliation and frequency of attendance at religious services was supported.
Slide 22 Relationship of Individual Independent Variable to Dependent Variable - 2 To check whether the statement of the relationship is correct, we need to understand the pattern of the coding for the variable when it is at the ordinal level of measurement. Higher numeric values for strength of religious affiliation meant that survey respondents have been more strongly affiliated with their religion.
Slide 23 Relationship of Individual Independent Variable to Dependent Variable - 3 Higher numeric values for frequency of attendance at religious services meant that survey respondents have attended religious services more often.
Slide 24 Relationship of Individual Independent Variable to Dependent Variable - 4 The positive sign of the b coefficient (1.138) meant the relationship between the numeric values for strength of religious affiliation and frequency of attendance at religious services was a direct relationship, implying that higher numeric values for the independent variable (strength of religious affiliation) were associated with higher numeric values for the dependent variable (frequency of attendance at religious services). The correct statement of the relationship is: "survey respondents who have been more strongly affiliated with their religion have attended religious services more often".
Slide 25 Answer 4 While the hypothesis test supports the existence of a relationship, the statement of the relationship in the problem is opposite to the correct statement, so the answer to the question is false.
Slide 26 Question 5 The next question asks us to evaluate the relationship for the second independent variable. There will be a separate question for each of the independent variables.
Slide 27 Relationship of Individual Independent Variable to Dependent Variable - 1 Based on the statistical test of the b coefficient (t = 4.145, p<0.001) for the independent variable "frequency of prayer" [pray], the null hypothesis that the slope or b coefficient was equal to 0 was rejected. The research hypothesis that there was a relationship between frequency of prayer and frequency of attendance at religious services was supported.
Slide 28 Relationship of Individual Independent Variable to Dependent Variable - 2 To check whether the statement of the relationship is correct, we need to understand the pattern of the coding for the variable when it is at the ordinal level of measurement. Higher numeric values for frequency of prayer meant that survey respondents have prayed more often.
Slide 29 Relationship of Individual Independent Variable to Dependent Variable - 3 The positive sign of the b coefficient (0.554) meant the relationship between frequency of prayer and frequency of attendance at religious services was a direct relationship, implying that higher numeric values for the independent variable (frequency of prayer) were associated with higher numeric values for the dependent variable (frequency of attendance at religious services). The correct statement of the relationship is: "survey respondents who have prayed more often have attended religious services more often".
Slide 30 Answer 5 The hypothesis test supports the existence of the relationship, and the statement of the relationship in the problem is correct, so the answer to the question is true. A caution results from the inclusion of ordinal variables.
Slide 31 Question 6 The next question asks us to identify which predictor has the largest effect on the dependent variable. The largest effect is operationally defined as the largest change in the dependent variable associated with a one-standard-deviation change in an independent variable, which is what the standardized beta coefficients measure.
Slide 32 Independent Variable with Largest Effect on the Dependent Variable - 1 To answer this question, we look for the largest value in the column of standardized beta coefficients, irrespective of sign. In this example, the beta coefficient for strength of affiliation is larger than the beta coefficient for how often the respondent prays.
Slide 33 Answer 6 The answer to the question is true because the correct variable was identified as having the largest influence on the dependent variable. A caution results from the inclusion of ordinal variables.
Slide 34 Steps in answering questions about standard multiple regression - 1 Question: Do the variables included in the analysis satisfy the level of measurement requirements? Is the dependent variable metric and are the independent variables metric or dichotomous? If no, the answer is: Incorrect application of a statistic. If yes, continue to the next step.
Slide 35 Standard multiple regression - 2 Compute the standard multiple regression in SPSS. Question: Do the number of variables and cases satisfy the sample size requirements? Is the ratio of cases to independent variables at least 5 to 1? If no, the answer is: Inappropriate application of a statistic. If yes, is the ratio of cases to independent variables at the preferred sample size of at least 15 to 1? If yes, the answer is: True. If no, the answer is: True with caution.
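The sample size check can be expressed as a short function. This is a sketch: the function name, parameter names, and returned strings are assumptions, while the 5-to-1 minimum and 15-to-1 preferred ratios come from the slide. The case counts in the example are hypothetical.

```python
def check_sample_size(n_cases, n_predictors, minimum_ratio=5, preferred_ratio=15):
    """Answer the sample size question from the ratio of cases to predictors."""
    if n_predictors < 1:
        raise ValueError("at least one independent variable is required")
    ratio = n_cases / n_predictors
    if ratio < minimum_ratio:
        return "inappropriate application of a statistic"
    if ratio < preferred_ratio:
        return "true with caution"
    return "true"

# Hypothetical case counts with 2 independent variables.
for cases in (8, 20, 100):
    print(cases, check_sample_size(cases, 2))
```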
Slide 36 Standard multiple regression - 3 Question: Finding about the overall relationship between the dependent variable and the independent variables. Is the probability of the F test of the regression less than or equal to the level of significance? If no, the answer is: False. If yes, is the strength of the relationship for the included variables interpreted correctly? If no, the answer is: False. If yes, are ordinal variables included in the relationship? If no, the answer is: True. If yes, the answer is: True with caution.
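The decision path for the overall relationship might be sketched as follows. The function and parameter names are hypothetical. In the example call, 0.0005 stands in for the reported "<0.001" and 0.05 is an assumed level of significance; neither exact value appears in the slides.

```python
def evaluate_overall_finding(f_probability, alpha,
                             strength_stated_correctly, includes_ordinal):
    """Walk the three questions for the overall relationship, in order."""
    # 1. The F test must be significant at the stated alpha level.
    if f_probability > alpha:
        return "false"
    # 2. The strength of the relationship must be described correctly.
    if not strength_stated_correctly:
        return "false"
    # 3. Ordinal variables turn a true finding into true with caution.
    return "true with caution" if includes_ordinal else "true"

# Assumed stand-in values: p = 0.0005 for "<0.001", alpha = 0.05.
print(evaluate_overall_finding(0.0005, 0.05,
                               strength_stated_correctly=True,
                               includes_ordinal=True))
```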
Slide 37 Standard multiple regression - 4 Question: Finding about the individual relationship between an independent variable (IV) and the dependent variable (DV). Is the probability of the t test between the IV and the DV less than or equal to the level of significance? If no, the answer is: False. If yes, is the direction of the relationship between the IV and the DV interpreted correctly? If no, the answer is: False. If yes, are ordinal variables included in the relationship? If no, the answer is: True. If yes, the answer is: True with caution.
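The same path for an individual predictor can be sketched with the direction derived from the sign of the b coefficient. The names and the "direct"/"inverse" labels are hypothetical conventions. The b values 1.138 and 0.554 are the ones reported for reliten and pray; 0.0005 stands in for "p<0.001" and 0.05 is an assumed level of significance.

```python
def evaluate_individual_finding(b, t_probability, alpha,
                                stated_direction, includes_ordinal):
    """Walk the three questions for one independent variable, in order."""
    # 1. The t test for the b coefficient must be significant.
    if t_probability > alpha:
        return "false"
    # 2. A positive b is a direct relationship, a negative b an inverse one.
    actual_direction = "direct" if b > 0 else "inverse"
    if stated_direction != actual_direction:
        return "false"
    # 3. Ordinal variables turn a true finding into true with caution.
    return "true with caution" if includes_ordinal else "true"

# Question 4: the problem stated the opposite of the direct relationship.
print(evaluate_individual_finding(1.138, 0.0005, 0.05, "inverse", True))
# Question 5: the problem stated the direct relationship correctly.
print(evaluate_individual_finding(0.554, 0.0005, 0.05, "direct", True))
```

This assumes higher numeric codes mean more of the attribute for both variables, which is why the coding of ordinal variables has to be checked before the sign is interpreted.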
Slide 38 Standard multiple regression - 5 Question: Finding about the independent variable with the largest impact on the dependent variable. Does the stated variable have the largest beta coefficient (ignoring sign)? If no, the answer is: False. If yes, are ordinal variables included in the relationship? If no, the answer is: True. If yes, the answer is: True with caution.
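Picking the predictor with the largest beta, ignoring sign, is a one-line comparison on absolute values. In the sketch below the function name, the predictor names, and the beta values are all hypothetical.

```python
def evaluate_largest_effect(betas, stated_variable, includes_ordinal):
    """Check whether the stated variable has the largest absolute beta."""
    strongest = max(betas, key=lambda name: abs(betas[name]))
    if stated_variable != strongest:
        return "false"
    return "true with caution" if includes_ordinal else "true"

# Hypothetical standardized coefficients; the negative one is larger in size.
example_betas = {"predictor_a": 0.25, "predictor_b": -0.40}
print(evaluate_largest_effect(example_betas, "predictor_b", includes_ordinal=False))
```

Comparing absolute values matters here: a comparison of the raw values would pick predictor_a, even though predictor_b makes the larger difference in the dependent variable.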