
Chapter 3 Association: Contingency, Correlation, and Regression




1 Chapter 3 Association: Contingency, Correlation, and Regression
Section 3.1 The Association Between Two Categorical Variables

2 Response and Explanatory Variables
Response variable (dependent variable): the outcome variable on which comparisons are made. Explanatory variable (independent variable): when the explanatory variable is categorical, it defines the groups to be compared with respect to values on the response variable; when the explanatory variable is quantitative, it defines the different numerical values to be compared with respect to the values of the response variable. Examples (Response / Explanatory): Survival status / Smoking status; Carbon dioxide (CO2) level / Amount of gasoline used by cars; College GPA / Number of hours a week spent studying

3 Association Between Two Variables
The main purpose of data analysis with two variables is to investigate whether there is an association and to describe that association. An association exists between two variables if a particular value for one variable is more likely to occur with certain values of the other variable.

4 Contingency Tables A Contingency Table:
Displays two categorical variables The rows list the categories of one variable The columns list the categories of the other variable Entries in the table are frequencies
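The idea of a contingency table as a frequency count of category pairs can be sketched in a few lines of Python; the observations below are hypothetical, just to show the mechanics:

```python
from collections import Counter

# Hypothetical observations: (food type, pesticide status) for six sampled items
observations = [
    ("organic", "yes"), ("organic", "no"), ("organic", "no"),
    ("conventional", "yes"), ("conventional", "yes"), ("conventional", "no"),
]

# A contingency table is just a frequency for each (row category, column category) pair
table = Counter(observations)
print(table[("organic", "no")], table[("conventional", "yes")])
```

Each key of `table` identifies one cell (one row category paired with one column category), and its value is that cell's frequency.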

5 Contingency Tables: Calculate Proportions and Conditional Proportions
Questions and answers:
1. What proportion of foods are organic with pesticides? 29/26698 = 0.001
2. What proportion of foods are conventionally grown with pesticides? 19485/26698 = 0.730
3. What proportion of organic foods contain pesticides? 29/127 = 0.228
4. What proportion of conventionally grown foods contain pesticides? 19485/26571 = 0.733
5. What proportion of all sampled items contain pesticides? 19514/26698 = 0.731

6 Example A survey includes the question, "Taken all together, would you say that you are very happy, pretty happy, or not too happy?" The table uses the survey to cross-tabulate happiness with family income, measured as the response to the question, "Compared with families in general, would you say that your family income is below average, average, or above average?" Answer: (A)

7 75/504 = 284/504 = 145/504 = 61/450 = NO 390/1426= b. Construct the conditional proportions on happiness at each level of income. Interpret and summarize the association between these variables. 75/504 = 284/504 = 145/504 = 61/450 = 390/1426= A B

8 Calculate Proportions and Conditional Proportions
For questions 3 and 4 from the previous slide, these proportions are called conditional proportions because their formation is conditional on (in this example) food type. Table 3.2 Conditional Proportions on Pesticide Status, for Two Food Types. These conditional proportions (using two decimal places) treat pesticide status as the response variable. The sample size n in a row shows the total on which the conditional proportions in that row were based.
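The conditional-proportion arithmetic above can be sketched directly, using the organic/conventional pesticide counts quoted in these slides:

```python
# Counts quoted in these slides: 29 of 127 organic items and 19485 of
# 26571 conventionally grown items contained pesticide residues
organic_yes, organic_n = 29, 127
conv_yes, conv_n = 19485, 26571

# Conditional proportions: condition on food type (the row), then look
# at the share of each pesticide status within that row
p_organic = organic_yes / organic_n            # P(pesticides | organic)
p_conv = conv_yes / conv_n                     # P(pesticides | conventional)
p_overall = (organic_yes + conv_yes) / (organic_n + conv_n)

print(round(p_organic, 3), round(p_conv, 3), round(p_overall, 3))
```

Note how the denominator changes with the conditioning: 127 when conditioning on organic, 26571 when conditioning on conventional, and the grand total 26698 for the overall proportion.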

9 Chapter 3 Association: Contingency, Correlation, and Regression
Section 3.2 The Association Between Two Quantitative Variables

10 An example Here, we have two quantitative variables for each of 16 students: 1) how many beers they drank, and 2) their blood alcohol content (BAC). We are interested in the relationship between the two variables: how is one affected by changes in the other one? [Table: Student / Beers / Blood Alcohol (BAC) for the 16 students; the individual values are scrambled in this transcript.]

11 Scatterplots In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph. With quantitative data we have two measurements per individual and wonder whether there is an association between them; this is very important in biology, as we not only want to describe individuals but also understand various things about them. Here, for example, we plot BAC vs. number of beers (the data from slide 10). We can clearly see a pattern: when you drink more beers you generally have a higher BAC. The dots are arranged in a fairly straight line, so the relationship is linear; and since when one variable goes up the other does too, it is a positive linear relationship.

12 How to Examine a Scatterplot
We examine a scatterplot to study association. How do values on the response variable change as values of the explanatory variable change? You can describe the overall pattern of a scatterplot by the trend, direction, and strength of the relationship between the two variables. Trend: linear, curved, clusters, no pattern Direction: positive, negative, no direction Strength: how closely the points fit the trend Also look for outliers from the overall trend.

13 Form and direction of an association
Linear No relationship Nonlinear

14 No relationship: X and Y vary independently
No relationship: X and Y vary independently. Knowing X tells you nothing about Y. One way to think about this is to remember the following: The equation for this line is y = 5. x is not involved.

15 Example: 100 cars on the lot of a used-car dealership
Question1: Would you expect a positive association, a negative association or no association between the age of the car and the mileage on the odometer? Question2: Would you expect a positive association, a negative association or no association between the age of the car and the resale value?

16 Interpreting Scatterplots: Direction/Association
Two quantitative variables x and y are Positively associated when high values of x tend to occur with high values of y. low values of x tend to occur with low values of y. Negatively associated when high values of one variable tend to pair with low values of the other variable.

17 Strength of the association
The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form. With a strong relationship, you can get a pretty good estimate of y if you know x. With a weak relationship, for any x you might get a wide range of y values.

18 Strength of the relationship or association ...
This is a weak relationship. For a particular state median household income, you can’t predict the state per capita income very well. This is a very strong relationship. The daily amount of gas consumed can be predicted quite accurately for a given temperature value.

19 Correlation The correlation coefficient “r”
r does not distinguish between x and y r has no units of measurement r ranges from -1 to +1

20 The correlation coefficient "r"
The correlation coefficient is a measure of the direction and strength of a linear relationship between two numerical variables. r ranges from -1 to +1. The sign of r gives the direction of a scatter plot. |r| gives the strength of a scatter plot: If |r| is close to 1, then the strength is strong. If |r| is close to 0.5, then the strength is moderate. If |r| is close to 0, then the strength is weak. Note: if r=1, then it is perfect positive; if r=-1, then it is perfect negative.

21 Summarizing the Strength of Association: The Correlation, r
The Correlation measures the strength and direction of the linear association between x and y. A positive r value indicates a positive linear association. A negative r value indicates a negative linear association. An r value close to +1 or -1 indicates a strong linear association. An r value close to 0 indicates a weak linear association.

22 Calculating the Correlation Coefficient
Example: Per Capita Gross Domestic Product (x, in thousands) and Average Life Expectancy (y) for Countries in Western Europe.

x      y       zx       zy       zx*zy
21.4   77.48   -0.078   -0.345    0.027
23.2   77.53    1.097   -0.282   -0.309
20.0   77.32   -0.992   -0.546    0.542
22.7   78.63    0.770    1.102    0.849
20.8   77.17   -0.470   -0.735    0.345
18.6   76.39   -1.906   -1.716    3.271
21.5   78.51   -0.013    0.951   -0.012
22.0   78.15    0.313    0.498    0.156
23.8   78.99    1.489    1.555    2.315
21.2   77.37   -0.209   -0.483    0.101

x-bar = 21.52, y-bar = 77.75; sx = 1.532, sy = 0.795; sum of zx*zy = 7.285
r = 7.285 / (10 - 1) = 0.809
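The z-score formula for r used in this worked example can be checked directly; a minimal pure-Python sketch with the ten (x, y) pairs from the table:

```python
import math

# GDP per capita (x, in thousands) and life expectancy (y) for ten
# Western European countries, from the slide's worked example
x = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
y = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]
n = len(x)

def mean(v):
    return sum(v) / len(v)

def sample_sd(v):
    m = mean(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

xbar, ybar = mean(x), mean(y)
sx, sy = sample_sd(x), sample_sd(y)

# r is the sum of the products of z-scores, divided by n - 1
zx = [(xi - xbar) / sx for xi in x]
zy = [(yi - ybar) / sy for yi in y]
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
print(round(r, 3))
```

This reproduces the slide's summary statistics (x-bar = 21.52, sx = 1.532, sy = 0.795) and gives r close to 0.809.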

23 Example to calculate “r” by calculator
X: 1, 3, 5; Y: 7, 11, … r = ? Input the data: Stat → Edit → input the X-values into L1 and the Y-values into L2. Calculate the correlation coefficient r: Stat → Calc → option 4. If you can't find r on your calculator, follow the next slide to get the option of r back…

24 Subject: For TI 84, where's the correlation coefficient
Subject: For TI 84, where's the correlation coefficient? To find the correlation coefficient: 1. First, your calculator must be set up to display the correlation. (You only have to set it up once, so if you've done it in class, skip this part. Sometimes if you change batteries you have to do it again.) 2. Hit 2nd CATALOG (this is over the 0 button). Go down to DiagnosticOn, hit ENTER, then ENTER again. It is now set up to display correlation with the regression line. 3. Enter the X values in one list and the Y values in another. 4. Go to STAT>CALC 8:LinReg(a+bx) and hit ENTER. It is now pasted to the home screen. You must input the name of the list containing the X values, followed by a comma, then the list containing the Y values. For example, if my X values are in L1 and Y values are in L2, I would enter LinReg(a+bx) L1,L2

25 How do I restore deleted lists on a TI-83 family or TI-84 Plus family graphing calculator?
The instructions below detail how to restore deleted lists on a TI-83 family or TI-84 Plus family graphing calculator. To restore the original list names (L1 - L6): • Press [STAT] • Select 5:SetUpEditor • Press [ENTER] (Done should appear on the screen) The original lists, L1 - L6, should now appear when using [STAT] [ENTER]. Please see the TI-83 family and TI-84 Plus family guidebooks for additional information.

26 Examples for correlation coefficient “r”
Data: X = 1, 3, 4, 6; Y = 3, 5, 7, 9. U = 1, 0, 2, 1; V = 0, 1, 1, 2. Ex1. Find the correlation coefficient of X and Y. Ex2. Find the correlation coefficient of X and Z, where Z = 2*X. Ex3. Find the correlation coefficient of X and Z, where Z = -2*X. Ex4. Find the correlation coefficient of X and Z, where Z = X+10. Ex5. Find the correlation coefficient of Y and X. Ex6. Find the correlation coefficient of U and V. Plot the scatter plots for Ex2, Ex3, Ex4, and Ex6. Now summarize all the properties we obtain from these exercises.

27 Examples for correlation coefficient “r”
Data: X = 1, 3, 4, 6; Y = 3, 5, 7, 9. U = 1, 0, 2, 1; V = 0, 1, 1, 2. Exercises Ex1-Ex6 as on the previous slide. Scatter plots for Ex2, Ex3, Ex4, and Ex6 are shown on the slide.

28 Properties of Correlation
Sign of correlation denotes direction: (-) indicates negative linear association (ex3); (+) indicates positive linear association (ex2). Two variables have the same correlation no matter which is treated as the response variable (ex5 and ex1 give the same number). Correlation is a unit-less measure; it does not depend on the variables' units (shifting or rescaling X, as in ex4 and ex2, leaves |r| unchanged). Correlation is not resistant to outliers. Correlation only measures the strength of a linear relationship. It always falls between -1 and +1.
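Several of these properties can be verified numerically; a small sketch using the exercise data X = 1, 3, 4, 6 and Y = 3, 5, 7, 9 from the previous slides (the helper `corr` below is our own, not a library function):

```python
import math

def corr(x, y):
    # Pearson correlation from deviation sums
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

X = [1, 3, 4, 6]
Y = [3, 5, 7, 9]

r_xy = corr(X, Y)
print(round(r_xy, 3))
print(corr(X, [2 * v for v in X]))        # Z = 2X  -> r = 1 (perfect positive)
print(corr(X, [-2 * v for v in X]))       # Z = -2X -> r = -1 (perfect negative)
print(corr(X, [v + 10 for v in X]))       # shifting by a constant leaves r = 1
print(corr(Y, X) == r_xy)                 # symmetric in x and y
```

Rescaling or shifting a variable leaves |r| at 1 for an exact linear transform, and swapping the roles of the two variables leaves r unchanged.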

29 Correlation estimation
Estimate the correlation coefficient from each scatter plot. [Scatter plots not reproduced in this transcript; the answer values shown were r = 0.393, 0.861, 0.778, -0.05, and 1.]

30 Correlation Estimation
Estimate the correlation coefficient from each scatter plot. [Scatter plots not reproduced in this transcript; the answer values shown include R = -0.9, -0.05, and 1.]

31 Correlation Coefficient: Measuring Strength and Direction of a Linear Relationship
Let's get a feel for the correlation r by looking at its values for the scatterplots shown in Figure 3.7: Figure 3.7 Some Scatterplots and Their Correlations. The correlation gets closer to ±1 when the data points fall closer to a straight line. Question: Why are the cases in which the data points are closer to a straight line considered to represent stronger association?

32 Chapter 3 Association: Contingency, Correlation, and Regression
Section 3.3 Predicting the Outcome of a Variable

33 But which line best describes our data?
Correlation tells us about strength (scatter) and direction of the linear relationship between two quantitative variables. In addition, we would like to have a numerical description of how both variables vary together. For instance, is one variable increasing faster than the other one? And we would like to make predictions based on that numerical description. But which line best describes our data?

34 Regression Line The first step of a regression analysis is to identify the response and explanatory variables. We use y to denote the response variable and x to denote the explanatory variable. The regression line predicts the value for the response variable y as a straight-line function of the value x of the explanatory variable. Let ŷ denote the predicted value of y. The equation for the regression line has the form ŷ = a + bx. In this formula, a denotes the y-intercept and b denotes the slope.

35 The least-squares regression line
Error=observed value – predicted value. The least-squares regression line is the unique line such that the sum of the squared vertical (y) distances between the data points and the line is the smallest possible. Distances between the points and line are squared so all are positive values. This is done so that distances can be properly added (Pythagoras).

36 How to: First we calculate the slope of the line, b, from statistics we already know: b = r(sy/sx), where r is the correlation, sy is the standard deviation of the response variable y, and sx is the standard deviation of the explanatory variable x. Once we know b, the slope, we can calculate a, the y-intercept: a = y-bar - b*x-bar, where x-bar and y-bar are the sample means of the x and y variables. This means that we don't have to calculate a lot of squared distances to find the least-squares regression line for a data set; we can instead rely on these equations. But typically, we use a 2-var stats calculator or stats software.
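The two formulas can be exercised with made-up summary statistics (the values of r, sx, sy, and the means below are hypothetical, chosen only to make the arithmetic easy to follow):

```python
# Slope and intercept from summary statistics:
#   b = r * (sy / sx)            (slope)
#   a = ybar - b * xbar          (y-intercept)
# Hypothetical summary stats:
r, sx, sy = 0.9, 2.0, 4.0
xbar, ybar = 10.0, 50.0

b = r * (sy / sx)
a = ybar - b * xbar
print(b, a)
```

With these inputs the slope is 1.8 and the intercept is 32.0; note that the intercept formula forces the fitted line to pass through the point (x-bar, y-bar).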

37 Example 1: Beers vs. BAC
Here, we have the two quantitative variables from slide 10 for each of 16 students: 1) how many beers they drank, and 2) their blood alcohol content (BAC). We are interested in the relationship between the two variables: how is one affected by changes in the other one? Q1: Find the regression line. Q2: If a student has 6.5 cups of beer, what is the BAC we expect?

38 Example 1: Beers vs. BAC (continued, same data as slide 10) Q3: If a student turned out to drink 6.5 cups of beer with BAC ____, what is the corresponding prediction error?

39 Solution: Beers vs. BAC - Q1
Use the TI-84 to construct the regression line (data from slide 10): enter the X values in one list and the Y values in another. Go to STAT>CALC 8:LinReg(a+bx) and hit ENTER; it is pasted to the home screen. Input the name of the list containing the X values, then a comma, then the list containing the Y values. For example, if the X values are in L1 and the Y values are in L2, enter LinReg(a+bx) L1,L2.

40 Making predictions: interpolation
The equation of the least-squares regression line allows us to predict y for any x within the range studied. This is called interpolating. Q2: If a student has 6.5 cups of beer, what is the BAC we expect? Solution: Nobody in the study drank 6.5 beers, but by finding the value of ŷ from the regression line for x = 6.5 we would expect a blood alcohol content of about 0.104 mg/ml.
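The interpolation step can be sketched as follows; the slope 0.018 is the value quoted later in these slides, while the intercept -0.0127 is an assumed value for this dataset:

```python
# Prediction from a fitted line yhat = a + b*x
# Assumed coefficients: slope 0.018 (quoted in the slides), intercept -0.0127
a, b = -0.0127, 0.018

def predict(x):
    return a + b * x

# Interpolation: x = 6.5 lies inside the observed 1-9 beer range
bac = predict(6.5)
print(round(bac, 3))
```

Since 6.5 beers falls inside the range of observed x-values, this is interpolation, which is the safe use of the regression equation.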

41 Example 2 (powerboats in 1000's) Q1: Find the regression line; Q2: If there were 500,000 powerboats, what number of manatee deaths would we expect? There is a positive linear relationship between the number of powerboats registered and the number of manatee deaths. The least-squares regression line has the equation ŷ = 0.125x - 41.4. Thus if we were to limit the number of powerboat registrations to 500,000 (x = 500), we could expect 0.125*500 - 41.4 = 21.1, roughly 21 manatee deaths.

42 Q3: Suppose in 1991 it turned out that there were 500,000 powerboats and 18 manatee deaths; what is the corresponding prediction error? The regression line predicts 0.125*500 - 41.4 = 21.1 deaths, so:
Error = observed value - predicted value = 18 - 21.1 = -3.1.

43 Q3: How to plot a regression line?
The equation completely describes the regression line. To plot the regression line you only need to plug two x values into the equation, get y, and draw the line that goes through those points. Hint: The regression line always passes through the mean of x and y. The points you use for drawing the regression line are derived from the equation. They are NOT points from your sample data (except by pure coincidence).

44 Interpreting the y-Intercept
The predicted value for y when x = 0, which helps in plotting the line. May not have any interpretative value. When the number of beers is zero, the predicted BAC = -0.0127 + 0.018*0 = -0.0127. However, BAC can't be negative, so we can't interpret this value.

45 The interpretation of the slope b
The interpretation of the slope b: for each one-unit change in x, the amount of change in ŷ. For each 1,000 more boats, manatee deaths increase by 0.125 (from ŷ = 0.125x - 41.4). For each extra beer, BAC will increase by 0.018.

46 The interpretation of the slope b Exercise:
For each extra beer, BAC will increase by 0.018. For each extra 5 beers, BAC will increase by 5*0.018 = 0.09. For each extra 2 beers, BAC will increase by ________. For each extra 3 beers, BAC will increase by ________.

47 Slope Values: Positive, Negative, Equal to 0
Positive association Positive slope Negative association Negative slope Figure Three Regression Lines Showing Positive Association (slope > 0), Negative Association (slope < 0) and No Association (slope = 0). Question: Would you expect a positive or negative slope when y = BAC and x = number of beers?

48 Residuals Measure the Size of Prediction Errors
Residuals measure the size of the prediction errors: the vertical distance between the point and the regression line. Each observation has a residual. Calculation for each residual: residual = observed y - predicted ŷ. A large residual indicates an unusual observation. The smaller the absolute value of a residual, the closer the predicted value is to the actual value, and so the better the prediction.
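The residual calculation is one subtraction per observation; a sketch with a hypothetical dataset and a hypothetical fitted line ŷ = 1 + 2x:

```python
# Hypothetical data and a hypothetical fitted line yhat = 1 + 2x
xs = [1.0, 2.0, 3.0]
ys = [3.5, 4.5, 7.5]

# residual = observed y - predicted yhat, one value per observation
predicted = [1 + 2 * x for x in xs]
residuals = [y - yhat for y, yhat in zip(ys, predicted)]
print(residuals)
```

A point above the line gets a positive residual, a point below gets a negative one; here the residuals come out as 0.5, -0.5, and 0.5.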

49 Residual Example: Beer vs BAC
We know that the regression line (for the Beers/BAC data on slide 10) is ŷ = -0.0127 + 0.018x. 1. Find the residual for the 3rd student. 2. Find the residual for the 16th student.

50 The Method of Least Squares Yields the Regression Line
Residual sum of squares: sum of (y - ŷ)² over all observations. The least-squares regression line is the line that minimizes the vertical distances between the points and their predictions; i.e., it minimizes the residual sum of squares. Note: The sum of the residuals about the regression line will always be zero.
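Both claims, that least squares minimizes the residual sum of squares and that the residuals sum to zero, can be checked numerically; a sketch with hypothetical data:

```python
# Hypothetical data, roughly linear
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least-squares fit from the standard formulas
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
     / sum((x - xbar) ** 2 for x in xs))
a = ybar - b * xbar

def rss(a_, b_):
    # Residual sum of squares for the line yhat = a_ + b_ * x
    return sum((y - (a_ + b_ * x)) ** 2 for x, y in zip(xs, ys))

# (1) residuals sum to (numerically) zero at the least-squares fit
resid_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))
# (2) nudging the slope away from the fit only increases the RSS
print(round(resid_sum, 10), rss(a, b) < rss(a, b + 0.1), rss(a, b) < rss(a, b - 0.1))
```

Perturbing the fitted slope in either direction raises the residual sum of squares, which is exactly what "least squares" means.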

51 Regression Formulas for y-Intercept and Slope
Slope: b = r(sy/sx). y-intercept: a = y-bar - b*x-bar. Notice that the slope b is directly related to the correlation r, and the y-intercept depends on the slope.

52 Calculating the slope and y-intercept for the regression line, Example 1
Using the BAC and Beer data, we have Beers as x and BAC as y. Find the regression line predicting BAC from Beers. Solution: The regression line to predict BAC from Beers is ŷ = -0.0127 + 0.018x.

53 Calculating the slope and y-intercept for the regression line, Example 2
Using the Manatee and Power Boat data, we have Power Boats (in 1000's) as x and Manatee deaths as y. Find the regression line predicting manatee deaths from powerboat registrations. Solution: The regression line is ŷ = 0.125x - 41.4.

54 The Slope and the Correlation:
Correlation: describes the strength of the linear association between the two variables; does not change when the units of measurement change; does not depend upon which variable is the response and which is the explanatory. Slope: its numerical value depends on the units used to measure the variables; does not tell us whether the association is strong or weak; the two variables must be identified as response and explanatory variables; the regression equation can be used to predict values of the response variable for given values of the explanatory variable.

55 Note: Only difference b/w these two examples is to switch x and y.
The distinction between explanatory and response variables is crucial in regression. If you exchange y for x in calculating the regression line, you will get the wrong line, even though the correlation coefficient r does not distinguish x and y. Regression examines the distance of all points from the line in the y direction only. Q: Find the regression line for the following two examples. Do you get the same regression line? Note: the only difference between the two examples is that x and y are switched. [The two small data tables and their fitted equations are garbled in this transcript.]
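The asymmetry can be demonstrated numerically with hypothetical data: if regressing x on y gave the same line as regressing y on x, the second slope would be the reciprocal of the first, and it is not (unless |r| = 1).

```python
def fit(x, y):
    # Least-squares intercept and slope for yhat = a + b*x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Hypothetical data with imperfect correlation
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

a_yx, b_yx = fit(xs, ys)    # y regressed on x
a_xy, b_xy = fit(ys, xs)    # x regressed on y (roles swapped)

# If both fits described the same line, b_xy would equal 1 / b_yx
print(b_yx, b_xy, 1 / b_yx)
```

Here the y-on-x slope is 1.4, but the x-on-y slope is 0.7 rather than 1/1.4 ≈ 0.714, so the two regressions describe different lines.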

56 The distinction between explanatory and response variables is crucial in regression. If you exchange y for x in calculating the regression line, you will get the wrong line, even though the correlation coefficient r does not distinguish x and y. Regression examines the distance of all points from the line in the y direction only. Hubble telescope data about galaxies moving away from Earth: these two lines are the regression lines calculated either correctly (x = distance, y = velocity, solid line) or incorrectly (x = velocity, y = distance, dotted line).

57 The Squared Correlation
The typical way to interpret r2 is as the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x. When a strong linear association exists, the regression equation predictions tend to be much better than the predictions using only y-bar. We measure the proportional reduction in error and call it r2.

58 Coefficient of determination, r2
r2, the coefficient of determination, is the square of the correlation coefficient. It gives the proportion of the variation in the values of y that is explained by the least-squares regression of y on x. r2 is always between 0 and 1 (the closer to 1, the better the line is for prediction). Residuals should be scattered randomly around the line; if they are not, something is wrong with the linear model (nonlinearity, outliers, etc.).
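The "proportional reduction in error" reading of r2 can be verified numerically; a sketch with hypothetical data comparing the error of the regression line against the error of predicting with y-bar alone:

```python
import math

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [1.0, 3.0, 4.0, 6.0, 6.0]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)     # correlation
b = sxy / sxx                      # least-squares slope
a = my - b * mx                    # least-squares intercept

tss = syy                                                  # error using ybar only
rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # error using the line
reduction = 1 - rss / tss                                  # proportional reduction

print(round(r ** 2, 6), round(reduction, 6))
```

The square of the correlation and the proportional reduction in error agree (about 0.939 here), which is why the same quantity carries both interpretations.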

59 Here are two plots of height (response) against age (explanatory) of some children. Notice how r2 relates to the variation in heights... r=0.994, r-square=0.988 r=0.921, r-square=0.848

60 The Squared Correlation
r2 measures the proportion of the variation in the y-values that is accounted for by the linear relationship of y with x. A correlation of 0.9 means that 81% of the variation in the y-values can be explained by the explanatory variable, x.

61 ŷ = 0.125x - 41.4

62 Here the change in x only explains 76% of the change in y
Here the variation in x explains only 76% of the variation in y; the rest of the variation in y (the vertical scatter, shown as red arrows) must be explained by something other than x. r = 0.87, r2 = 0.76. When r = -1, r2 = 1: changes in x explain 100% of the variation in y, and y can be entirely predicted for any given value of x. When r = 0, r2 = 0: changes in x explain 0% of the variation in y, and the value y takes is entirely independent of what value x takes.

63 Chapter 3 Association: Contingency, Correlation, and Regression
Section 3.4 Cautions in Analyzing Associations

64 Extrapolation Is Dangerous
Extrapolation: Using a regression line to predict y-values for x-values outside the observed range of the data. It is riskier the farther we move from the range of the given x-values. There is no guarantee that the relationship given by the regression equation holds outside the range of sampled x-values.
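A quick numeric illustration of why extrapolation is risky, reusing the beer/BAC line (slope 0.018 as quoted in these slides; intercept -0.0127 assumed): outside the observed 1-9 beer range the line produces impossible predictions.

```python
# Assumed fitted line for the beer/BAC example
a, b = -0.0127, 0.018

def predict(x):
    return a + b * x

print(predict(0) < 0)    # extrapolating to 0 beers gives a negative BAC: nonsense
print(predict(60) > 1)   # extrapolating to 60 beers gives a BAC over 1: nonsense
```

The line fits well inside the sampled range, but nothing in the data supports its behavior far outside that range.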

65 Be Cautious of Influential Outliers
One reason to plot the data before you do a correlation or regression analysis is to check for unusual observations. Search for observations that are regression outliers, being well removed from the trend that the rest of the data follow.

66 Outliers and Influential Points
A regression outlier is an observation that lies far away from the trend that the rest of the data follow. An observation is influential if (1) its x value is relatively low or high compared to the remainder of the data, and (2) the observation is a regression outlier. Influential observations tend to pull the regression line toward that data point and away from the rest of the data points.

67 Outliers and influential points
Outlier: observation that lies outside the overall pattern of observations. “Influential individual”: observation that markedly changes the regression if removed. This is often an outlier on the x-axis. Child 19 = outlier in y direction Child 18 = outlier in x direction Child 19 is an outlier of the relationship. Child 18 is only an outlier in the x direction and thus might be an influential point.

68 Correlation Does Not Imply Causation: 1st example
In a regression analysis, suppose that as x goes up, y also tends to go up (or down). Can we conclude that there’s a causal connection, with changes in x causing changes in y? A strong correlation between x and y means that there is a strong linear association that exists between the two variables. A strong correlation between x and y, does not mean that x causes y to change.

69 Correlation Does Not Imply Causation
Data are available for all fires in Chicago last year on x = number of firefighters at the fire and y = cost of damages due to the fire. 1. Would you expect the correlation to be negative, zero, or positive? 2. If the correlation is positive, does this mean that having more firefighters at a fire causes the damages to be worse? Yes or No? 3. Identify a third variable that could be considered a common cause of x and y: Distance from the fire station Intensity of the fire Size of the fire

70 Correlation Does Not Imply Causation:
2nd example

71 Lurking Variables & Confounding
A lurking variable is a variable, usually unobserved, that influences the association between the variables of primary interest. Ice cream sales and drowning - lurking variable = temperature. Reading level and shoe size - lurking variable = age. Childhood obesity rate and GDP - lurking variable = time. When two explanatory variables are both associated with a response variable but are also associated with each other, there is said to be confounding. Lurking variables are not measured in the study but have the potential for confounding.

72 Simpson’s Paradox Simpson’s Paradox:
When the direction of an association between two variables changes after we include a third variable and analyze the data at separate levels of that third variable.

73 Simpson’s Paradox Example: Smoking and Health
Is Smoking Actually Beneficial to Your Health? Table 3.7 Smoking Status and 20-Year Survival in Women Probability of Death of Smoker = 139/582= 24% Probability of Death of Nonsmoker = 230/732= 31% This can’t be true that smoking improves your chances of living! What’s going on?!

74 Simpson’s Paradox Example: Smoking and Health
Break out Data by Age Table 3.8 Smoking Status and 20-Year Survival, for Four Age Groups

75 Simpson’s Paradox Example: Smoking and Health
For instance, for smokers of age 18–34, from Table 3.8 the proportion who died was 5/179 = 0.028, or 2.8%. Could age explain the association? Table 3.9 Conditional Percentages of Deaths for Smokers and Nonsmokers, by Age
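The paradox itself is easy to reproduce with toy numbers; the counts below are hypothetical (not the Table 3.8 values), chosen so that smokers fare worse within every age group yet better in the aggregate:

```python
# Hypothetical (deaths, total) counts for smokers and nonsmokers by age band.
# Most smokers sit in the young, low-mortality group, which drives the reversal.
groups = {
    "young": {"smoker": (10, 400), "nonsmoker": (2, 100)},
    "old":   {"smoker": (60, 100), "nonsmoker": (200, 400)},
}

def rate(deaths, total):
    return deaths / total

# Within each age group, smokers have the higher death rate...
within = all(rate(*g["smoker"]) > rate(*g["nonsmoker"]) for g in groups.values())

# ...but pooling the groups reverses the comparison
smoker_d = sum(g["smoker"][0] for g in groups.values())
smoker_n = sum(g["smoker"][1] for g in groups.values())
non_d = sum(g["nonsmoker"][0] for g in groups.values())
non_n = sum(g["nonsmoker"][1] for g in groups.values())

print(within, rate(smoker_d, smoker_n) < rate(non_d, non_n))
```

This is exactly the structure of the smoking example: the comparison flips because age (the third variable) is associated with both smoking status and survival.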

76 Simpson’s Paradox Example: Smoking and Health
Figure 3.23 MINITAB Bar Graph Comparing Percentage of Deaths for Smokers and Nonsmokers, by Age. This side-by-side bar graph shows the conditional percentages from Table 3.9. An association can look quite different after adjusting for the effect of a third variable by grouping the data according to the values of the third variable (age).

77 The Effect of Lurking Variables on Associations
Lurking variables can affect associations in many ways. For instance, a lurking variable may be a common cause of both the explanatory and response variable. In practice, there’s usually not a single variable that causally explains a response variable or the association between two variables. More commonly, there are multiple causes . When there are multiple causes, the association among them makes it difficult to study the effect of any single variable.

78 The Effect of Confounding on Associations
When two explanatory variables are both associated with a response variable but are also associated with each other, confounding occurs. It is difficult to determine whether either of them truly causes the response because a variable’s effect could be at least partly due to its association with the other variable.

