

SW388R7 Data Analysis & Computers II
Slide 1: Multiple Regression – Basic Relationships
• Purpose of multiple regression
• Different types of multiple regression
• Standard multiple regression
• Steps in solving standard multiple regression problems

Slide 2: Purpose of multiple regression
• The purpose of multiple regression is to analyze the relationship between metric or dichotomous independent variables and a metric dependent variable.
• If there is a relationship, using the information in the independent variables will improve our accuracy in predicting values for the dependent variable.
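A minimal sketch of that idea in Python with numpy, using made-up data (the variables `x1`, `x2`, `y` and their generating values are hypothetical, not from the slides). Predicting every case with the mean of the dependent variable is the baseline; an ordinary least squares fit on the independent variables is compared against it by the sum of squared errors (SSE).

```python
import numpy as np

# Hypothetical data: two metric predictors and a metric outcome, for illustration only
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + 0.5 * x2 + rng.normal(size=n)

# Baseline prediction: the mean of y for every case (no independent variables)
sse_mean = float(np.sum((y - y.mean()) ** 2))

# Multiple regression: design matrix with an intercept column, solved by least squares
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
sse_reg = float(np.sum((y - X @ coef) ** 2))

print(f"SSE predicting with the mean only: {sse_mean:.1f}")
print(f"SSE predicting with x1 and x2:     {sse_reg:.1f}")
```

Because the fitted model includes an intercept, its SSE can never exceed the mean-only SSE; a real relationship shows up as a substantial reduction.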

Slide 3: Types of multiple regression
• There are three types of multiple regression, each of which is designed to answer a different question:
• Standard multiple regression is used to evaluate the relationships between a set of independent variables and a dependent variable.
• Hierarchical, or sequential, regression is used to examine the relationships between a set of independent variables and a dependent variable, after controlling for the effects of some other independent variables on the dependent variable.
• Stepwise, or statistical, regression is used to identify the subset of independent variables that has the strongest relationship to a dependent variable.

Slide 4: Standard multiple regression - 1
• In standard multiple regression, all of the independent variables are entered into the regression equation at the same time.
• The minimum expectation for multiple regression is a statistically significant relationship between the set of independent variables and the dependent variable. An F test is used to determine whether the relationship can be generalized to the population represented by the sample.
• Multiple R and R² measure the strength of the relationship between the set of independent variables and the dependent variable.
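R², Multiple R, and the F test are linked by a standard formula, F = (R²/k) / ((1 - R²)/(n - k - 1)), for k independent variables and n cases. A minimal sketch of that relationship; the function name `overall_fit` and the example numbers are hypothetical. In SPSS output, R and R Square appear in the Model Summary table and F in the ANOVA table.

```python
import math


def overall_fit(r_squared: float, n: int, k: int) -> tuple[float, float]:
    """Return (Multiple R, F) for k predictors and n cases.

    F tests H0: R^2 = 0 with k and n - k - 1 degrees of freedom.
    """
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must be in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors + 1")
    multiple_r = math.sqrt(r_squared)
    f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))
    return multiple_r, f_stat


# Hypothetical example: R^2 = 0.30 with 2 predictors and 100 cases
r, f = overall_fit(0.30, n=100, k=2)
print(f"Multiple R = {r:.3f}, F(2, 97) = {f:.3f}")
```

The probability reported next to F comes from the F distribution with k and n - k - 1 degrees of freedom; that lookup is left out of this sketch.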

Slide 5: Standard multiple regression - 2
• If there is an overall relationship between the set of independent variables and the dependent variable, we interpret the individual relationships of the independent variables.
• A t-test is used to evaluate the individual relationship between each independent variable and the dependent variable.
• If the relationship is statistically significant, its direction is stated from the sign of the b coefficient: for a positive b, higher scores on the independent variable are associated with higher scores on the dependent variable; for a negative b, higher scores on the independent variable are associated with lower scores on the dependent variable.
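A minimal sketch of the t-test for each b coefficient, t = b / SE(b), using numpy and made-up data (the helper `ols_t_tests` and the data are hypothetical). The standard errors come from the usual OLS formula SE = sqrt(diag(s² (XᵀX)⁻¹)), where s² is the residual variance with n - k - 1 degrees of freedom.

```python
import numpy as np


def ols_t_tests(X: np.ndarray, y: np.ndarray):
    """Return b coefficients, their standard errors, and t = b / SE.

    X must already contain a leading column of ones for the intercept.
    """
    n, p = X.shape                          # p = k predictors + 1 intercept
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (n - p)        # residual variance, df = n - k - 1
    cov_b = sigma2 * np.linalg.inv(X.T @ X)  # sampling covariance of b
    se = np.sqrt(np.diag(cov_b))
    return b, se, b / se


# Hypothetical data: x1 has a positive effect, x2 a negative one
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.8 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, se, t = ols_t_tests(X, y)
for name, bi, ti in zip(["(constant)", "x1", "x2"], b, t):
    print(f"{name:>10}: b = {bi:6.3f}, t = {ti:7.2f}")
```

The p-value in the SPSS Sig. column comes from a t distribution with n - k - 1 degrees of freedom; that step is omitted here.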

Slide 6: Standard multiple regression - 3
• If there is an overall relationship between the set of independent variables and the dependent variable, we can answer the question of which of the statistically significant predictors has the largest influence on the dependent variable, that is, which makes the largest difference in the value of the dependent variable.
• The b coefficients represent the change in the dependent variable for a one-unit change in the independent variable, holding the other independent variables constant. However, we cannot compare the b coefficients with one another because the independent variables are measured in different units.
• The beta coefficients, by contrast, are standardized and can be compared. The variable with the largest value for beta (positive or negative) has the largest influence on the value of the dependent variable.
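A minimal sketch with numpy and made-up data (all names and values hypothetical) of why b coefficients cannot be compared directly. Each slope is converted with beta_j = b_j × SD(x_j) / SD(y), and the result is cross-checked against a regression run on z-scores, which yields the same standardized coefficients.

```python
import numpy as np


def sd(v: np.ndarray) -> float:
    return float(v.std(ddof=1))  # sample standard deviation


def zscore(v: np.ndarray) -> np.ndarray:
    return (v - v.mean()) / sd(v)


# Hypothetical predictors measured on very different scales
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(50, 10, n)   # SD near 10
x2 = rng.normal(0, 0.5, n)   # SD near 0.5
y = 3 + 0.2 * x1 + 2.0 * x2 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standardize the slopes: beta_j = b_j * SD(x_j) / SD(y)
beta = b[1:] * np.array([sd(x1), sd(x2)]) / sd(y)

# Cross-check: regress z-scored y on z-scored predictors
Z = np.column_stack([np.ones(n), zscore(x1), zscore(x2)])
beta_check, *_ = np.linalg.lstsq(Z, zscore(y), rcond=None)

print("b (x1, x2):   ", np.round(b[1:], 3))
print("beta (x1, x2):", np.round(beta, 3))
```

Here x2 has the larger b but the smaller beta: a one-unit change in x2 is a large move (about 2 standard deviations), while a one-unit change in x1 is a small one (about 0.1 standard deviation).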

Slide 7: Plan for regression assignments
• In this class, we will focus on the basic evaluation of relationships in standard multiple regression.
• In the next class, we will include the evaluation of assumptions and outliers, and validation analysis to produce a more complete standard multiple regression solution.
• In the following class, we will look at alternate methods for including variables in multiple regression: hierarchical multiple regression and stepwise multiple regression.

Slide 8: Question 1
To answer the first question, we examine the level of measurement for each variable listed in the problem. Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.

Slide 9: Answer 1
"Frequency of attendance at religious services" [attend] is ordinal, satisfying the metric level of measurement requirement for the dependent variable, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
"Strength of religious affiliation" [reliten] and "frequency of prayer" [pray] are ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.
True with caution is the correct answer.
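The level-of-measurement rule can be expressed as a small check. A minimal sketch; the helper `check_measurement` and the level labels ("interval", "ratio", "ordinal", "dichotomous", "nominal") are hypothetical conventions, with "metric" taken to mean interval or ratio.

```python
# Hypothetical labels for level of measurement; "metric" means interval or ratio
METRIC = {"interval", "ratio"}


def check_measurement(dv_level: str, iv_levels: list[str]) -> str:
    """Apply the level-of-measurement rule for standard multiple regression.

    The DV must be metric; each IV must be metric or dichotomous.
    Ordinal variables are accepted as metric by convention, with a caution.
    """
    dv_ok = dv_level in METRIC or dv_level == "ordinal"
    ivs_ok = all(
        lvl in METRIC or lvl in {"ordinal", "dichotomous"} for lvl in iv_levels
    )
    if not (dv_ok and ivs_ok):
        return "Incorrect application of a statistic"
    if "ordinal" in [dv_level, *iv_levels]:
        return "True with caution"
    return "True"


# Question 1: attend, reliten, and pray are all ordinal
print(check_measurement("ordinal", ["ordinal", "ordinal"]))  # True with caution
```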

Slide 10: Question 2
Having satisfied the level of measurement requirements, we turn our attention to the sample size requirements. To answer this question, and those after it, we need to compute the standard multiple regression in SPSS.

Slide 11: Request a standard multiple regression
To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.

Slide 12: Specify the variables and selection method
First, move the dependent variable attend to the Dependent text box. Second, move the independent variables reliten and pray to the Independent(s) list box. Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we accept the default of Enter for direct entry of all variables, which produces a standard multiple regression. Fourth, click on the Statistics… button to specify the statistics options that we want.

Slide 13: Specify the statistics output options
First, mark the checkbox for Estimates on the Regression Coefficients panel. Second, mark the checkboxes for Model Fit and Descriptives. Third, click on the Continue button to close the dialog box.

Slide 14: Request the regression output
Click on the OK button to request the regression output.

Slide 15: Answer 2
In the Descriptive Statistics table in the SPSS output, we see the number of cases with valid data for all of the variables included in our analysis. Given that number of cases and 2 independent variables, we satisfy both the minimum (at least 5 cases per independent variable) and the preferred (at least 15 cases per independent variable) sample size requirements.
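The sample size rule from the decision chart (at least 5 cases per independent variable as the minimum, at least 15 as the preferred size) fits in a few lines. A minimal sketch; the helper name and the case counts in the examples are hypothetical, since the actual count from the Descriptive Statistics table is not reproduced here.

```python
def check_sample_size(n_cases: int, n_ivs: int,
                      minimum: float = 5, preferred: float = 15) -> str:
    """Compare the ratio of cases to independent variables with both thresholds."""
    ratio = n_cases / n_ivs
    if ratio < minimum:
        return "Inappropriate application of a statistic"
    if ratio < preferred:
        return "True with caution"
    return "True"


# Hypothetical case counts with 2 independent variables
print(check_sample_size(100, 2))  # True
print(check_sample_size(20, 2))   # True with caution
```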

Slide 16: Question 3
In order for the finding about the overall relationship to be true, it must satisfy two conditions. First, the F test for the regression must be statistically significant at the stated alpha level. Second, the strength of the relationship must be correctly stated. If the relationship is true but involves ordinal variables, a caution is added.

Slide 17: Overall Relationship Between Independent Variables and the Dependent Variable - 1
The probability of the F statistic (49.824) for the overall regression relationship is <0.001, which is less than or equal to the stated level of significance. We reject the null hypothesis that there is no relationship between the set of independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the set of independent variables and the dependent variable.

Slide 18: Overall Relationship Between Independent Variables and the Dependent Variable - 2
The Multiple R for the relationship between the set of independent variables and the dependent variable is 0.689, which would be characterized as strong using the rule of thumb that a correlation less than or equal to 0.20 is characterized as very weak; greater than 0.20 and less than or equal to 0.40 is weak; greater than 0.40 and less than or equal to 0.60 is moderate; greater than 0.60 and less than or equal to 0.80 is strong; and greater than 0.80 is very strong.
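The rule of thumb above maps directly onto a chain of thresholds. A minimal sketch; the function name is hypothetical, and the cut-points are the ones stated on the slide.

```python
def describe_strength(r: float) -> str:
    """Label a correlation (or Multiple R) using the slide's rule of thumb."""
    r = abs(r)  # Multiple R is never negative; abs() also handles Pearson's r
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"


print(describe_strength(0.689))  # strong
```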

Slide 19: Answer 3
We satisfied both conditions: the F test for the regression was statistically significant and the strength of the relationship was correctly identified. A caution results from the inclusion of ordinal variables.

Slide 20: Question 4
In order for findings about individual relationships to be true, they must satisfy two conditions. First, the t test for the b coefficient must be statistically significant at the stated alpha level. Second, the statement of the relationship must be correct. If the relationship is true but involves ordinal variables, a caution is added.

Slide 21: Relationship of Individual Independent Variable to Dependent Variable - 1
Based on the statistical test of the b coefficient (t = 5.857, p<0.001) for the independent variable "strength of religious affiliation" [reliten], the null hypothesis that the slope or b coefficient was equal to 0 was rejected. The research hypothesis that there was a relationship between strength of religious affiliation and frequency of attendance at religious services was supported.

Slide 22: Relationship of Individual Independent Variable to Dependent Variable - 2
To check whether the statement of the relationship is correct, we need to understand how the variable is coded, because it is measured at the ordinal level. Higher numeric values for strength of religious affiliation meant that survey respondents have been more strongly affiliated with their religion.

Slide 23: Relationship of Individual Independent Variable to Dependent Variable - 3
Higher numeric values for frequency of attendance at religious services meant that survey respondents have attended religious services more often.

Slide 24: Relationship of Individual Independent Variable to Dependent Variable - 4
The positive sign of the b coefficient (1.138) meant that the relationship between the numeric values for strength of religious affiliation and frequency of attendance at religious services was a direct relationship: higher numeric values for the independent variable (strength of religious affiliation) were associated with higher numeric values for the dependent variable (frequency of attendance at religious services). The correct statement of the relationship is: "survey respondents who have been more strongly affiliated with their religion have attended religious services more often".
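Reading the direction from the sign of b is mechanical once the coding is known. A minimal sketch, assuming (as checked on the previous slides) that higher codes mean "more" of the attribute for both variables; the helper name is hypothetical, and the b values are the ones reported for reliten and pray.

```python
def describe_direction(b: float, iv: str, dv: str) -> str:
    """State the direction of a significant relationship from the sign of b.

    Assumes higher numeric codes mean "more" for both variables.
    """
    if b > 0:
        return f"direct: higher {iv} is associated with higher {dv}"
    if b < 0:
        return f"inverse: higher {iv} is associated with lower {dv}"
    return f"none: b = 0 for {iv}"


print(describe_direction(1.138, "strength of religious affiliation",
                         "frequency of attendance"))
print(describe_direction(0.554, "frequency of prayer",
                         "frequency of attendance"))
```

If a variable were reverse-coded (higher codes meaning "less"), the wording would have to be flipped by hand, which is why the slides check the coding before stating the relationship.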

Slide 25: Answer 4
While the hypothesis test supports the existence of a relationship, the statement of the relationship in the problem is opposite to the correct statement, so the answer to the question is false.

Slide 26: Question 5
The next question asks us to evaluate the relationship for the second independent variable. There will be a separate question for each of the independent variables.

Slide 27: Relationship of Individual Independent Variable to Dependent Variable - 1
Based on the statistical test of the b coefficient (t = 4.145, p<0.001) for the independent variable "frequency of prayer" [pray], the null hypothesis that the slope or b coefficient was equal to 0 was rejected. The research hypothesis that there was a relationship between frequency of prayer and frequency of attendance at religious services was supported.

Slide 28: Relationship of Individual Independent Variable to Dependent Variable - 2
To check whether the statement of the relationship is correct, we need to understand how the variable is coded, because it is measured at the ordinal level. Higher numeric values for frequency of prayer meant that survey respondents have prayed more often.

Slide 29: Relationship of Individual Independent Variable to Dependent Variable - 3
The positive sign of the b coefficient (0.554) meant that the relationship between frequency of prayer and frequency of attendance at religious services was a direct relationship: higher numeric values for the independent variable (frequency of prayer) were associated with higher numeric values for the dependent variable (frequency of attendance at religious services). The correct statement of the relationship is: "survey respondents who have prayed more often have attended religious services more often".

Slide 30: Answer 5
The hypothesis test supports the existence of the relationship, and the statement of the relationship in the problem is correct, so the answer to the question is true. A caution results from the inclusion of ordinal variables.

Slide 31: Question 6
The next question asks us to identify which predictor has the largest effect on the dependent variable. The largest effect is operationally defined as the largest change in the dependent variable, measured in standard deviations, associated with a one standard deviation change in the independent variable, which is what the standardized beta coefficient measures.

Slide 32: Independent Variable with Largest Effect on the Dependent Variable - 1
To answer this question, we look for the largest value in the column of standardized beta coefficients, irrespective of sign. In this example, the beta coefficient for strength of affiliation is larger than the beta coefficient for how often the respondent prays.
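"Largest irrespective of sign" means comparing absolute values. A minimal sketch; the helper name is hypothetical and the beta values are made up for illustration, not the values from the SPSS output.

```python
def largest_effect(betas: dict[str, float]) -> str:
    """Return the predictor whose standardized beta is largest in absolute value."""
    return max(betas, key=lambda name: abs(betas[name]))


# Made-up betas: the negative one wins because only magnitude matters
print(largest_effect({"x1": 0.25, "x2": -0.40}))  # x2
```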

Slide 33: Answer 6
The answer to the question is true because the correct variable was identified as having the largest influence on the dependent variable. A caution results from the inclusion of ordinal variables.

Slide 34: Steps in answering questions about standard multiple regression - 1
Question: Do the variables included in the analysis satisfy the level of measurement requirements?
1. Is the dependent variable metric and are the independent variables metric or dichotomous? No: Incorrect application of a statistic. Yes: go on to the sample size question.

Slide 35: Standard multiple regression - 2
Question: Do the number of variables and cases satisfy the sample size requirements?
1. Compute the standard multiple regression in SPSS.
2. Is the ratio of cases to independent variables at least 5 to 1? No: Inappropriate application of a statistic. Yes: go to step 3.
3. Is the ratio of cases to independent variables at the preferred sample size of at least 15 to 1? Yes: True. No: True with caution.

Slide 36: Standard multiple regression - 3
Question: Finding about the overall relationship between the dependent variable and the independent variables.
1. Is the probability of the F test of the regression less than or equal to the level of significance? No: False. Yes: go to step 2.
2. Is the strength of the relationship for the included variables interpreted correctly? No: False. Yes: go to step 3.
3. Are ordinal variables included in the relationship? No: True. Yes: True with caution.

Slide 37: Standard multiple regression - 4
Question: Finding about the individual relationship between an independent variable and the dependent variable.
1. Is the probability of the t test between the IV and the DV less than or equal to the level of significance? No: False. Yes: go to step 2.
2. Is the direction of the relationship between the IV and the DV interpreted correctly? No: False. Yes: go to step 3.
3. Are ordinal variables included in the relationship? No: True. Yes: True with caution.

Slide 38: Standard multiple regression - 5
Question: Finding about the independent variable with the largest impact on the dependent variable.
1. Does the stated variable have the largest beta coefficient (ignoring sign)? No: False. Yes: go to step 2.
2. Are ordinal variables included in the relationship? No: True. Yes: True with caution.
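The decision charts for the overall relationship, the individual relationships, and the largest effect share one pattern. Every yes/no check must pass, otherwise the finding is false; if they all pass, the inclusion of ordinal variables downgrades "True" to "True with caution". A minimal sketch of that pattern (the function and parameter names are hypothetical), applied to the outcomes of Questions 4 and 5.

```python
def evaluate_finding(checks: list[bool], ordinal_included: bool) -> str:
    """Answer a finding question from the decision charts.

    checks: the yes/no tests that must all pass,
            e.g. [p <= alpha, relationship stated correctly]
    """
    if not all(checks):
        return "False"
    return "True with caution" if ordinal_included else "True"


# Question 4: significant, but the relationship was stated backwards
print(evaluate_finding([True, False], ordinal_included=True))  # False
# Question 5: significant and stated correctly, with ordinal variables
print(evaluate_finding([True, True], ordinal_included=True))   # True with caution
```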