Published by Alvin Bridges
Chapter Sixteen
Data Analysis: Testing for Association
Copyright © 2006 McGraw-Hill/Irwin
Learning Objectives
1. Understand and evaluate the types of relationships between variables.
2. Explain the concepts of association and covariation.
3. Discuss the differences in chi-square, Pearson correlation, and Spearman correlation.
4. Explain the concept of statistical significance versus practical significance.
5. Understand when and how to use regression analysis.
Relationships Between Variables (Objective: Understand and evaluate the types of relationships between variables)
Relationship: a consistent and systematic link between two or more variables.
– First issue: are two or more variables related at all?
  Presence of a relationship: a systematic relationship exists between two or more variables.
  Statistical significance: measures whether a relationship is present.
– Second issue: the direction of the relationship, positive or negative.
Relationships Between Variables (Objective: Understand and evaluate the types of relationships between variables)
– Third issue: the strength of the association.
  Weak: a low probability that the variables share a consistent and systematic relationship
  Moderate
  Strong: a high probability that a consistent and systematic relationship exists
Relationships Between Variables (Objective: Understand and evaluate the types of relationships between variables)
– Fourth issue: the type of relationship. If two variables are related, what is the nature of the relationship?
  Linear relationship: the strength and nature of the relationship between two variables remain the same over the range of both variables.
  Curvilinear relationship: the strength and/or direction of the relationship between two variables changes over the range of both variables.
Relationships Between Variables (Objective: Understand and evaluate the types of relationships between variables)
Three questions to ask about a relationship between two variables:
– Is there a relationship between the two variables we are interested in?
– How strong is the relationship?
– How can that relationship best be described?
Using Covariation to Describe Variable Relationships (Objective: Explain the concepts of association and covariation)
Covariation: the amount of change in one variable that is consistently related to the change in another variable of interest.
Scatter diagram: a graphic plot of the relative position of two variables, using a horizontal and a vertical axis to represent the values of the respective variables.
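The slide defines covariation in words. One common way to put a number on it, not named in the slides, is the sample covariance: the sum of the cross-products of each variable's deviation from its mean, divided by n - 1. A minimal sketch, assuming small hypothetical lists for advertising spend and sales:

```python
def sample_covariance(x, y):
    """Sample covariance: sum of cross-products of deviations, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Hypothetical data: weekly advertising spend and unit sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [10, 12, 15, 19, 24]

print(sample_covariance(ad_spend, sales))        # 8.75: positive, the two rise together
print(sample_covariance([1, 2, 3], [3, 2, 1]))   # -1.0: negative, one falls as the other rises
```

The sign gives the direction of covariation, but the size depends on the units of measurement, which is one reason correlation coefficients are usually reported instead.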
Exhibit 16.1 (Objective: Explain the concepts of association and covariation)
Exhibit 16.2 (Objective: Explain the concepts of association and covariation)
Exhibit 16.3 (Objective: Explain the concepts of association and covariation)
Exhibit 16.4 (Objective: Explain the concepts of association and covariation)
Using Covariation to Describe Variable Relationships (Objective: Discuss the differences in chi-square, Pearson correlation, and Spearman correlation)
Chi-square (χ²) analysis: a test of significance between the frequency distributions of two or more nominally scaled variables in a cross-tabulation table, used to determine whether there is any association.
– It assesses how closely the observed frequencies fit the pattern of the expected frequencies and is referred to as a "goodness-of-fit" test.
– It is used to analyze nominal data, which cannot be analyzed with other types of statistical analysis such as ANOVA or t-tests.
– Results will be distorted if more than 20 percent of the cells have an expected count of less than 5.
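A minimal sketch of the usual computation: χ² = Σ (O - E)² / E, where each expected count E = row total × column total / grand total, with (rows - 1)(columns - 1) degrees of freedom. The function also reports the share of cells with an expected count below 5, the check mentioned above. The cross-tab is hypothetical:

```python
def chi_square_test(observed):
    """Chi-square test of association for a cross-tabulation given as a list of rows.

    Returns (statistic, degrees_of_freedom, share_of_cells_with_expected_below_5).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    small_cells = 0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected < 5:
                small_cells += 1
            statistic += (obs - expected) ** 2 / expected

    n_rows, n_cols = len(observed), len(observed[0])
    dof = (n_rows - 1) * (n_cols - 1)
    return statistic, dof, small_cells / (n_rows * n_cols)


# Hypothetical cross-tab: rows = gender, columns = preferred brand (A, B).
table = [[30, 20],
         [20, 30]]
stat, dof, small_share = chi_square_test(table)
print(stat, dof, small_share)  # 4.0 1 0.0
```

For this made-up table, 4.0 exceeds 3.84, the 0.05 critical value for 1 degree of freedom, so the null hypothesis of no association would be rejected; a `small_share` above 0.20 would flag the distortion risk noted on the slide.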
Exhibit 16.5 (Objective: Explain the concepts of association and covariation)
Using Covariation to Describe Variable Relationships (Objective: Discuss the differences in chi-square, Pearson correlation, and Spearman correlation)
Pearson correlation coefficient: a statistical measure of the strength of a linear relationship between two metric variables.
– It varies between -1.00 and +1.00, with 0 representing no linear association between the two variables, and -1.00 and +1.00 representing a perfect linear linkage between them.
– The larger the absolute value of the correlation coefficient, the stronger the level of association.
– The correlation coefficient can be either positive or negative, depending on the direction of the relationship between the two variables.
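The coefficient is the covariation of the two variables divided by the product of their standard deviations. A dependency-free sketch with hypothetical data:

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariation scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0: perfect positive linear linkage
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0: perfect negative linear linkage

# A perfect but curved (U-shaped) link: y = x squared on a symmetric range.
u_x = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(u_x, [v * v for v in u_x]))  # 0.0: no *linear* association
```

The last case connects to the curvilinear relationships described earlier: a coefficient near 0 rules out a linear link, not every link, so plotting a scatter diagram first is worthwhile.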
Using Covariation to Describe Variable Relationships (Objective: Discuss the differences in chi-square, Pearson correlation, and Spearman correlation)
– The null hypothesis states that there is no association between the two variables in the population, that is, the population correlation coefficient is zero.
– If the correlation coefficient is statistically significant, the null hypothesis is rejected and the conclusion is that the two variables do share some association in the population.
– The size of the correlation coefficient can then be used to describe quantitatively the strength of the association between the two variables.
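The slides do not give the test formula. A standard way to test the null hypothesis of zero population correlation converts r into a t statistic with n - 2 degrees of freedom: t = r·√(n - 2) / √(1 - r²). A sketch, with a hypothetical r and sample size:

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 observations")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical result: r = 0.50 from a sample of 27 respondents.
t = correlation_t_statistic(0.50, 27)
print(round(t, 3), "with", 27 - 2, "degrees of freedom")  # 2.887 with 25 degrees of freedom
```

The result is compared with the critical value from a t table (two-tailed, 25 degrees of freedom here, about 2.06 at the 0.05 level); statistical software usually reports the associated p-value directly.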
Exhibit 16.6 (Objective: Explain the concepts of association and covariation)
Using Covariation to Describe Variable Relationships (Objective: Discuss the differences in chi-square, Pearson correlation, and Spearman correlation)
The Pearson correlation coefficient makes several assumptions about the nature of the data:
– The two variables are assumed to have been measured using interval or ratio-scaled measures.
– The nature of the relationship to be measured is linear.
– The variables to be analyzed come from a bivariate normally distributed population.
Exhibit 16.7 (Objective: Explain the concepts of association and covariation)
Using Covariation to Describe Variable Relationships (Objective: Discuss the differences in chi-square, Pearson correlation, and Spearman correlation)
Coefficient of determination (r²): a number measuring the proportion of variation in one variable accounted for by another.
– The r² measure can be thought of as a percentage and varies from 0.0 to 1.00.
– The larger the coefficient of determination, the stronger the linear relationship between the two variables under study.
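A short worked example, using a hypothetical correlation of r = 0.60, shows how r² becomes a percentage:

```latex
\begin{aligned}
r = 0.60 &\;\Rightarrow\; r^2 = (0.60)^2 = 0.36 \quad (36\%\ \text{explained},\ 64\%\ \text{unexplained})\\
r = -0.60 &\;\Rightarrow\; r^2 = (-0.60)^2 = 0.36
\end{aligned}
```

Because the coefficient is squared, r² conveys the strength of the linear relationship but not its direction.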
Using Covariation to Describe Variable Relationships (Objective: Discuss the differences in chi-square, Pearson correlation, and Spearman correlation)
Spearman rank order correlation coefficient: a statistical measure of the association between two variables where both have been measured using ordinal (rank order) scales. Because it works on ranks, it measures monotonic rather than strictly linear association.
– If either one of the variables is represented by rank order data, use the Spearman rank order correlation coefficient.
– The Spearman rank order correlation coefficient tends to produce a lower coefficient than the Pearson coefficient and is considered the more conservative measure.
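Spearman's coefficient is usually computed by replacing each value with its rank and applying the Pearson formula to the ranks, with tied values receiving the average of their ranks. A self-contained sketch with hypothetical data:

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank order correlation: Pearson's formula applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
# Hypothetical data: y always rises with x, but not at a constant rate.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 100]))  # 1.0
```

Because only the order matters, any relationship in which one variable consistently rises (or falls) with the other scores +1 (or -1), even if the link is not a straight line.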
Exhibit 16.8 (Objective: Explain the concepts of association and covariation)
Exhibit 16.9 (Objective: Explain the concepts of association and covariation)
What is Regression Analysis? (Objective: Understand when and how to use regression analysis)
Regression analysis: one method for arriving at more detailed answers than the correlation coefficient can provide.
– A marketing manager may want to predict future sales levels, or how much impact a potential price increase will have on the company's profits or market share. There are a number of ways to make such predictions:
  Extrapolation from the past behavior of the variable
  Simple guesses
  Use of a regression equation, which compares information about related variables to assist in the prediction
What is Regression Analysis? (Objective: Understand when and how to use regression analysis)
Bivariate regression analysis: a statistical technique that analyzes the linear relationship between two variables by estimating the coefficients of the equation for a straight line. One variable is designated the dependent variable and the other is called the independent or predictor variable.
– Assumptions behind regression analysis: just like correlation analysis, regression analysis assumes that a linear relationship will provide a good description of the relationship between two variables.
– Even though regression terminology labels the variables dependent and independent, those names do not mean that one variable causes the behavior of the other.
What is Regression Analysis? (Objective: Understand when and how to use regression analysis)
The use of a simple regression model assumes that:
1. The variables of interest are measured on interval or ratio scales (except in the case of dummy variables).
2. These variables come from a bivariate normal population.
3. The error terms associated with making predictions are normally and independently distributed.
Exhibit 16.10 (Objective: Explain the concepts of association and covariation)
What is Regression Analysis? (Objective: Understand when and how to use regression analysis)
Formula for a straight line:
$y = a + bX + e_i$
where
y = the dependent variable
a = the intercept (the point where the straight line intersects the y-axis, i.e., the value of y when X = 0)
b = the slope (the change in y for every 1-unit change in X)
X = the independent variable used to predict y
$e_i$ = the error for the prediction
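To make the roles of the symbols concrete, here is a small sketch with made-up values of a and b: the prediction uses a + bX, and $e_i$ is whatever the line misses for one observation.

```python
def predict(a, b, x):
    """Predicted y on the straight line y = a + bX (the error term is left out)."""
    return a + b * x


# Hypothetical coefficients: intercept a = 2.0, slope b = 0.5.
a, b = 2.0, 0.5
x_i, actual_y = 10, 8.0

predicted_y = predict(a, b, x_i)  # 2.0 + 0.5 * 10 = 7.0
error_i = actual_y - predicted_y  # e_i = 8.0 - 7.0 = 1.0
print(predicted_y, error_i)       # 7.0 1.0
```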
What is Regression Analysis? (Objective: Understand when and how to use regression analysis)
Regression analysis: examines the relationship between the independent variable X and the dependent variable Y.
Least squares procedure: determines the best-fitting line by minimizing the sum of the squared vertical distances of all points from the line.
Test of statistical significance: a t-test is used to determine whether the computed intercept and slope are significantly different from zero.
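For bivariate regression, the least squares line has a well-known closed form: the slope is b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and the intercept is a = Ȳ - bX̄. A sketch with hypothetical data:

```python
def fit_line(x, y):
    """Least squares intercept a and slope b for y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # the fitted line passes through (mean_x, mean_y)
    return a, b


# Hypothetical data: ad exposures (X) and purchase-intent score (Y).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```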
Exhibit 16.11 (Objective: Explain the concepts of association and covariation)
What is Regression Analysis? (Objective: Understand when and how to use regression analysis)
Ordinary least squares (OLS): a statistical procedure that estimates the regression equation coefficients which produce the lowest sum of squared differences between the actual and predicted values of the dependent variable.
Errors in regression
– Differences between the actual and predicted values of Y are represented by $e_i$ (the error term of the regression equation).
– If we square the error for each observation (the difference between the actual value of Y and the predicted value of Y) and add them up, the total is an aggregate or overall measure of the accuracy of the regression equation.
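A sketch of that aggregate error measure, the sum of squared errors (SSE), evaluated for the OLS line and for an arbitrary competing line on the same hypothetical data. The OLS estimates a = 2.2 and b = 0.6 were worked out by hand for this data (slope = 6 / 10, intercept = 4 - 0.6 × 3):

```python
def sum_squared_errors(a, b, x, y):
    """Square each error e_i = actual Y - predicted Y and add them up."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ols_sse = sum_squared_errors(2.2, 0.6, x, y)    # OLS estimates for this data
other_sse = sum_squared_errors(2.0, 0.7, x, y)  # an arbitrary competing line
print(round(ols_sse, 2), round(other_sse, 2))   # 2.4 2.55
```

No other choice of a and b gives a total below 2.4 for this data, because OLS picks exactly the coefficients that minimize it.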
What is Regression Analysis? (Objective: Understand when and how to use regression analysis)
Errors in regression
– Regression equations calculated through OLS procedures always give the lowest total of squared errors, which is why both bivariate and multiple regression analysis are sometimes referred to as OLS regression.
– The error terms can also be used to diagnose potential problems caused by data observations that do not meet the assumptions described above.
– The pattern obtained by comparing the actual values of Y with the predicted Y values indicates whether the errors are normally distributed and/or have equal variances across the range of X values.
Exhibit 16.12 (Objective: Explain the concepts of association and covariation)
Exhibit 16.13 (Objective: Explain the concepts of association and covariation)
Exhibit 16.14 (Objective: Explain the concepts of association and covariation)
Exhibit 16.15 (Objective: Explain the concepts of association and covariation)
What is Regression Analysis? (Objective: Understand when and how to use regression analysis)
Adjusted R-square: an adjustment that reduces R² by taking into account the sample size and the number of independent variables in the regression equation. A large gap between R² and adjusted R² signals that the multiple regression equation may have too many independent variables.
Explained variance: the amount of variation in the dependent construct that can be accounted for by the combination of independent variables.
Unexplained variance: the amount of variation in the dependent construct that cannot be accounted for by the combination of independent variables.
Regression coefficient: an indicator of the importance of an independent variable in predicting a dependent variable. Large coefficients indicate good predictors and small coefficients weak predictors, provided the independent variables are measured on comparable scales.
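The slides do not spell out the adjustment. One standard formula, with n the sample size and k the number of independent variables, is adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1). A sketch with hypothetical values:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-square for a model with k independent variables and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than independent variables + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same hypothetical R-square of 0.50 from 30 respondents, with more and more predictors.
for k in (1, 3, 10):
    print(k, round(adjusted_r2(0.50, 30, k), 3))
# 1 0.482
# 3 0.442
# 10 0.237
```

Holding R² fixed, each extra independent variable pulls the adjusted value further down, which is why a widening gap is a warning sign.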
What is Regression Analysis? (Objective: Understand when and how to use regression analysis)
Statistical significance of the regression coefficients
1. If the regression coefficient is statistically significant, we have answered the first question about the relationship: is there a relationship between the dependent and independent variable?
2. How strong is the relationship? The coefficient of determination (r²) tells what percentage of the total variation in the dependent variable is accounted for by the independent variable.
3. The r² measure varies between .00 and 1.00, and its size indicates the strength of the relationship: the closer to 1.00, the stronger the relationship.
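A sketch of how r² comes out of a fitted bivariate regression: it is 1 minus the ratio of unexplained variation (SSE) to total variation around the mean of Y. The data are hypothetical:

```python
def r_squared(x, y):
    """r-square of a bivariate least squares fit: 1 - SSE / total sum of squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)                    # total variation
    return 1 - sse / sst


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 2))  # 0.6 -> 60% of the variation in Y is accounted for by X
```

In the bivariate case this equals the squared Pearson correlation; here r = 6 / √60 ≈ 0.775, and 0.775² ≈ 0.60.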
End here today. Hand back Lab 1.
Multiple Regression Analysis (Objective: Understand when and how to use regression analysis)
Multiple regression analysis: a statistical technique that analyzes the linear relationship between a dependent variable and multiple independent variables by estimating the coefficients of a linear equation (the extension of the straight-line equation to several independent variables).
Multiple Regression Analysis (Objective: Understand when and how to use regression analysis)
– The relationship between each independent variable and the dependent measure is still linear.
– To analyze the relationships, examine the regression coefficient for each independent variable.
– Each coefficient describes the average amount of change to be expected in Y given a unit change in the value of that particular independent variable, holding the other independent variables constant; this describes the relationship of that independent variable to the dependent variable.
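With several independent variables, the coefficients can be obtained by solving the normal equations (X'X)w = X'y, where X carries a leading column of 1s for the intercept. A dependency-free sketch for small illustrative data (statistical packages typically use more numerically robust methods such as QR decomposition):

```python
def solve(A, b):
    """Solve the linear system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if M[col][col] == 0:
            raise ValueError("singular system (e.g., perfectly collinear predictors)")
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def fit_multiple(X, y):
    """OLS coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
print([round(w, 6) for w in fit_multiple(X, y)])  # [1.0, 2.0, 3.0]
```

Each recovered coefficient (2 for x1, 3 for x2) is the expected change in y for a one-unit change in that variable with the other held constant.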
Multiple Regression Analysis (Objective: Understand when and how to use regression analysis)
Concerns
– Each independent variable may be measured using a different scale.
– Different scales do not allow relative comparisons between regression coefficients to see which independent variable has the most influence on the dependent variable.
Multiple Regression Analysis (Objective: Understand when and how to use regression analysis)
A standardized regression coefficient corrects this problem.
Beta coefficient: an estimated regression coefficient that has been recalculated as if all variables had been standardized to a mean of 0 and a standard deviation of 1.
– Such a change enables independent variables with different units of measurement to be directly compared on their association with the dependent variable.
– Standardization removes the effects of different scales.
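For each independent variable, beta = b × (s_X / s_Y), where s is the sample standard deviation; this gives the same value as refitting the model on standardized (z-score) variables. A sketch, assuming hypothetical data whose least squares slope is b = 0.6:

```python
from statistics import stdev


def beta_coefficient(b, x, y):
    """Standardized (beta) coefficient: slope b rescaled by the standard deviations of X and Y."""
    return b * stdev(x) / stdev(y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(beta_coefficient(0.6, x, y), 3))  # 0.775

# Measure X in different units (x 100): the raw slope shrinks to 0.006,
# but the beta coefficient is unchanged.
x_rescaled = [100 * v for v in x]
print(round(beta_coefficient(0.006, x_rescaled, y), 3))  # 0.775
```

The raw slopes differ by a factor of 100 purely because of units, while the betas agree, which is the point of standardizing.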
Multiple Regression Analysis (Objective: Explain the concept of statistical significance versus practical significance)
Each regression coefficient is divided by its standard error to produce a t-test statistic, which is compared against the critical value to determine whether the null hypothesis can be rejected.
– Examine the t-test statistic for each regression coefficient.
– Often not all the independent variables in a regression equation will be statistically significant.
– With multiple regression analysis, also examine the overall statistical significance of the regression model.
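A sketch of the coefficient-over-standard-error calculation for the slope of a bivariate regression, using the textbook standard error SE(b) = √(SSE / (n - 2) / Σ(X - X̄)²). The data are hypothetical:

```python
import math


def slope_t_statistic(x, y):
    """t = b / SE(b) for the slope of a bivariate least squares regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(sse / (n - 2) / sxx)  # n - 2 degrees of freedom
    return b / standard_error


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(slope_t_statistic(x, y), 3))  # 2.121
```

With only 3 degrees of freedom, 2.121 falls short of the two-tailed 0.05 critical value (about 3.18), so this slope would not be judged statistically significant even though X accounts for 60 percent of the variation in Y in this tiny sample.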
Multiple Regression Analysis (Objective: Explain the concept of statistical significance versus practical significance)
Model F statistic: compared against a critical value to determine whether or not to reject the null hypothesis.
– If the F statistic is statistically significant, the chance that the regression model would produce a large r² for your sample when the population r² is actually 0 is acceptably small.
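The overall F statistic can be computed from R² as F = (R² / k) / ((1 - R²) / (n - k - 1)), with k and n - k - 1 degrees of freedom (a standard identity, not given in the slides). A sketch with hypothetical results:

```python
def model_f_statistic(r2, n, k):
    """Overall F statistic for a regression with k independent variables and n observations."""
    if not 0 <= r2 < 1 or n - k - 1 <= 0:
        raise ValueError("need 0 <= r2 < 1 and n > k + 1")
    return (r2 / k) / ((1 - r2) / (n - k - 1))


print(round(model_f_statistic(0.60, 5, 1), 2))   # 4.5 (1 and 3 degrees of freedom)
print(round(model_f_statistic(0.50, 30, 3), 2))  # 8.67 (3 and 26 degrees of freedom)
```

With a single independent variable, F equals the square of the slope's t statistic, so the overall test and the coefficient test agree.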
Multiple Regression Analysis (Objective: Explain the concept of statistical significance versus practical significance)
Appropriate procedure to follow in evaluating the results of a regression analysis:
– Assess the statistical significance of the overall regression model using the F statistic and its associated probability.
– Evaluate the obtained r² to see how large it is.
– Examine the individual regression coefficients and their t-test statistics to see which are statistically significant.
– Look at the beta coefficients to assess relative influence.
Exhibit 16.16 (Objective: Explain the concept of statistical significance versus practical significance)
Multiple Regression Analysis (Objective: Understand when and how to use regression analysis)
Dummy variables: artificial variables introduced into a regression equation to represent the categories of a nominally scaled variable.
Sometimes the independent variables you want to use to predict a dependent variable are not measured using interval or ratio scales.
– In this case, use dummy variables.
– Each dummy variable represents one of the nominal categories of the independent variable, and its values will typically be 0 or 1. When the equation includes an intercept, one category is left out as the reference group (k categories need k - 1 dummies); a dummy for every category would be perfectly collinear with the intercept.
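A minimal sketch of 0/1 dummy coding. It assumes the common convention for an equation with an intercept: one category (here a hypothetical "North" region) serves as the reference and gets no column of its own, because a full set of dummies would always sum to 1 and duplicate the intercept. Each dummy's coefficient is then read as the difference from the reference category.

```python
def dummy_code(values, reference):
    """Turn a nominal variable into 0/1 dummy columns, leaving out the reference category."""
    categories = sorted(set(values) - {reference})
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical nominal variable: store region, with "North" as the reference category.
regions = ["North", "South", "West", "North", "West"]
columns, dummies = dummy_code(regions, reference="North")
print(columns)   # ['South', 'West']
print(dummies)   # [[0, 0], [1, 0], [0, 1], [0, 0], [0, 1]]
```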
Exhibit 16.17 (Objective: Understand when and how to use regression analysis)
Multiple Regression Analysis (Objective: Understand when and how to use regression analysis)
Multicollinearity: a situation in which several independent variables are highly correlated with each other.
– This characteristic can make it difficult to estimate separate or independent regression coefficients for the correlated variables.
Multiple Regression Analysis (Objective: Understand when and how to use regression analysis)
– Multicollinearity inflates the standard error of the coefficient and lowers the t statistic associated with it.
– Its major impact is confined to the statistical significance of the individual regression coefficients. Multicollinearity problems do not affect the size of r² or the ability to predict values of the dependent variable.
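One common diagnostic, not covered in the slides, is the variance inflation factor (VIF): the factor by which a coefficient's sampling variance grows because of its correlation with the other predictors, so √VIF is the factor by which its standard error grows. With exactly two independent variables, VIF = 1 / (1 - r₁₂²). A sketch with hypothetical predictors:

```python
import math


def vif_two_predictors(x1, x2):
    """Variance inflation factor when there are exactly two independent variables."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = s12 * s12 / (s11 * s22)  # squared correlation between the predictors
    return 1 / (1 - r_squared)


# Hypothetical predictors that are fairly strongly correlated (r of about 0.77).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2), round(math.sqrt(vif), 2))  # 2.5 1.58
```

Here each coefficient's standard error is about 1.58 times what it would be with uncorrelated predictors, which lowers the t statistics in the way the slide describes.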
Exhibit 16.18 (Objective: Understand when and how to use regression analysis)
Summary
– Relationships Between Variables
– Using Covariation to Describe Variable Relationships
– What is Regression Analysis?
– Multiple Regression
The End