AP Statistics Causation & Relations in Categorical Data.


1 AP Statistics Causation & Relations in Categorical Data

2 HW Questions???

3 Causation  The old adage: “Correlation does not necessarily imply causation.”  Many times in statistics we can find a “connection,” a correlation, or an association between two variables, but it is more difficult to prove that the explanatory variable actually causes the response variable to respond.

4 Examples  X = number of complete passes a quarterback throws  Y= passing yardage for the quarterback  It’s reasonable to assume that the more complete passes he throws, the more yards he’ll rack up.

5 Example 2  X = the number of ounces of alcohol consumed  Y = fine motor skills ability  Again, it’s reasonable to assume that x is actually causing y to respond.

6 Common Response  Two variables show a strong association because a third variable is causing both of them to respond.  X = a student’s ACT score  Y = a student’s SAT score  It’s fair to assume that one’s intelligence, or lack of it, will cause high, or low, scores on both tests. Having a high ACT score doesn’t cause the SAT score to be high.

7 Confounding  Two variables are confounded when their effects on a response variable cannot be distinguished from one another.  It looks like x is causing y, but another variable z is also acting on y, and it’s hard to sort out which one is doing what.

8 Example 1  X = # of Mentos dropped into the soda bottle  Y = height of soda spray  One might think that x causes y to respond (more Mentos = higher spray), but if z, the temperature of the soda, is also changing, you can’t tell what did what.

9 Example 2  X = latitude at which a person lives  Y = lifespan  The variables are associated, but it’s hard to know whether living at a higher latitude causes you to live longer, or whether it just happens that poorer countries tend to be in the tropics and it’s the poverty that is reducing the lifespan.

10 How to establish causation???  You need a controlled experiment, where the effects of lurking variables are controlled and minimized. See chapter 5!!

11 Relations in Categorical Data  What if we want to see if there is an association between two categorical variables? Obviously we can’t make a scatterplot and compute the correlation, do a regression, etc.  We make a two-way table.

12 Example 1: College Students

                      Gender
Age Group        Female     Male     Total
15 – 17              89       61       150
18 – 24           5,668    4,697    10,365
25 – 34           1,904    1,589     3,494
35 or older       1,660      970     2,630
Total             9,321    7,317    16,639

13 Conditional and Marginal Distribution  The marginal distribution is the distribution of one variable alone, that is, a row or column total out of the grand total. Ex.: % of males in college.  The conditional distribution is the distribution of one variable within a single category of the other variable. Ex.: % of women among 15 – 17 year olds.
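
The two examples on this slide can be worked out from the College Students counts. The following is a minimal Python sketch, not part of the original slides; the variable names are illustrative. It recomputes totals from the individual cells, so the grand total comes out one lower than the printed 16,639 (the printed 25 – 34 row total of 3,494 is one more than 1,904 + 1,589).

```python
# Cell counts copied from the "College Students" two-way table.
# Names such as `counts` are illustrative, not from the slides.
counts = {
    "15-17": {"female": 89, "male": 61},
    "18-24": {"female": 5668, "male": 4697},
    "25-34": {"female": 1904, "male": 1589},
    "35 or older": {"female": 1660, "male": 970},
}

# Totals are recomputed from the cells rather than taken from the
# printed margins (assumption: the cells are the reliable figures).
male_total = sum(row["male"] for row in counts.values())
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution example: % of males among all college students.
pct_male = 100 * male_total / grand_total

# Conditional distribution example: % of women among 15-17 year olds.
teens = counts["15-17"]
pct_women_15_17 = 100 * teens["female"] / sum(teens.values())

print(f"Marginal: {pct_male:.1f}% of students are male")
print(f"Conditional: {pct_women_15_17:.1f}% of 15-17 year olds are women")
```

The marginal percentage uses the grand total as its denominator, while the conditional percentage uses only the total for the one age group being conditioned on.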

14 Looking for association  If age is unrelated to gender among college students, then we’d expect the conditional percentages to be roughly equal across age groups. If there is a big disparity, then we might conclude that age and gender in college are associated.  Compute the conditional distributions of gender within each age group.
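
One way to carry out this computation is sketched below in Python. This is an illustration rather than the presenter's own method; the helper name `conditional_distribution` is hypothetical, and each row total is recomputed from its cells.

```python
# Cell counts copied from the "College Students" two-way table.
counts = {
    "15-17": {"female": 89, "male": 61},
    "18-24": {"female": 5668, "male": 4697},
    "25-34": {"female": 1904, "male": 1589},
    "35 or older": {"female": 1660, "male": 970},
}


def conditional_distribution(row):
    """Return each cell of `row` as a percentage of that row's total."""
    total = sum(row.values())
    return {category: 100 * n / total for category, n in row.items()}


# Conditional distribution of gender within each age group.
gender_given_age = {
    age: conditional_distribution(row) for age, row in counts.items()
}

for age, dist in gender_given_age.items():
    print(f"{age:12s} female {dist['female']:.1f}%  male {dist['male']:.1f}%")
```

If the percentage of women differs noticeably from one age group to the next, that disparity is the evidence of association the slide describes.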

15 Do Medical Helicopters Save Lives?  A businessman is trying to cut costs at a hospital and knows that the helicopter program is quite expensive. He gets some data on whether or not the program is effective.

                   Helicopter    Road
Victim died                64     260
Victim survived           136     840
Total                     200    1100

16 Doesn’t look good  The conditional death rates by vehicle type are 64/200 = 32% for the helicopter and 260/1100 ≈ 24% for the road.  As the hospital statistician, what might you do to try to save the helicopter program? I.e., what lurking variables are out there?

17 Statistics to the Rescue…  Since helicopters are more likely to respond to serious accidents…

             Serious Accidents        Less Serious
             Helicopter    Road       Helicopter    Road
Died                 48      60               16     200
Survived             52      40               84     800
Total               100     100              100    1000

18 Simpson’s Paradox  The reversal of the direction of a comparison or association when data from several groups are combined to form a single group.  I.e., when you have data that isn’t broken out by the relevant lurking variables, it might not be a true representation.
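
The reversal can be checked directly with the helicopter data. The Python sketch below is an illustration added to the transcript, with hypothetical variable names. It computes the death rate within each accident severity and then again after combining the two groups.

```python
# (died, total) counts from the helicopter slides, split by severity.
# The dictionary layout and names are illustrative, not from the slides.
strata = {
    "serious": {"helicopter": (48, 100), "road": (60, 100)},
    "less serious": {"helicopter": (16, 100), "road": (200, 1000)},
}
vehicles = ("helicopter", "road")

# Death rate within each severity group.
rates_by_severity = {
    severity: {v: group[v][0] / group[v][1] for v in vehicles}
    for severity, group in strata.items()
}

# Death rate after combining the severity groups into one table.
combined = {}
for v in vehicles:
    died = sum(group[v][0] for group in strata.values())
    total = sum(group[v][1] for group in strata.values())
    combined[v] = died / total

for severity, rates in rates_by_severity.items():
    print(f"{severity}: helicopter {rates['helicopter']:.0%}, road {rates['road']:.0%}")
print(f"combined: helicopter {combined['helicopter']:.0%}, road {combined['road']:.0%}")
```

Within each severity group the helicopter has the lower death rate, yet in the combined table it has the higher one. The helicopter's cases are split evenly between serious and less serious accidents, while most road cases are less serious, and that imbalance drives the reversal.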

19 Homework  4.33, 4.36, 4.37; 4.52 – 54, 4.60

