The binomial applied: absolute and relative risks, chi-square


1 The binomial applied: absolute and relative risks, chi-square

2 Probability speak (just shorthand!)…
P(X) = “the probability of event X”
P(D) = “the probability of disease”
P(E) = “the probability of exposure”
P(~D) = “the probability of not getting the disease”
P(~E) = “the probability of not being exposed”
P(D/E) = “the probability of disease given exposure” or “the probability of disease among the exposed”
P(D/~E) = “the probability of disease given no exposure” or “the probability of disease among the unexposed”

3 Things that follow a binomial distribution…
Cohort study (or cross-sectional):
- The number of exposed individuals in your sample that develop the disease
- The number of unexposed individuals in your sample that develop the disease
Case-control study:
- The number of cases that have had the exposure
- The number of controls that have had the exposure

4 Cohort study example: You sample 100 smokers and 100 non-smokers and follow them for 5 years to see who develops heart disease.

5 Seeing it as a binomial…
The number of smokers that develop heart disease in your study follows a binomial distribution with N=100, p=P(D/E).
The number of non-smokers that develop heart disease in your study follows a binomial distribution with N=100, p=P(D/~E).

6 A possible outcome:
                    Smoker (E)   Non-smoker (~E)
Heart disease (D)       21             13
No Disease (~D)         79             87
Total                  100            100

7 Statistics for these data
1. Risk ratio (relative risk)
2. Difference in proportions (absolute risk)
3. Chi-square test of independence. For 2x2 tables, this is mathematically equivalent to the difference in proportions Z-test.

8 1. Risk ratio (relative risk)
                    Exposure (E)   No Exposure (~E)
Disease (D)              a               b
No Disease (~D)          c               d
Total                   a+c             b+d

RR = (risk to the exposed) / (risk to the unexposed) = [a/(a+c)] / [b/(b+d)]

9 In probability terms…
                    Exposure (E)   No Exposure (~E)
Disease (D)              a               b
No Disease (~D)          c               d
Total                   a+c             b+d

RR = (risk of disease in the exposed) / (risk of disease in the unexposed) = P(D/E) / P(D/~E)

10 Risk ratio calculation:
                    Smoker (E)   Non-smoker (~E)
Heart disease (D)       21             13
No Disease (~D)         79             87
Total                  100            100

RR = (21/100) / (13/100) = 0.21/0.13 ≈ 1.62
Interpretation: there is roughly a 62% increase in risk of heart disease in smokers vs. non-smokers.
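A minimal Python sketch of the risk ratio calculation, using the a/b/c/d cell layout from the earlier slides. The function name `risk_ratio` is only illustrative.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio for a 2x2 table with columns = exposure, rows = disease.

    a = exposed with disease,    b = unexposed with disease
    c = exposed without disease, d = unexposed without disease
    """
    risk_exposed = a / (a + c)      # P(D/E)
    risk_unexposed = b / (b + d)    # P(D/~E)
    return risk_exposed / risk_unexposed

# Smoking / heart disease table from the cohort example
rr = risk_ratio(21, 13, 79, 87)
print(round(rr, 3))  # about 1.615
```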

11 Inferences about risk ratio…
Is our observed risk ratio statistically different from 1.0? What is the p-value?
I’m going to present statistical inference for the odds ratio; the risk ratio is similar. So, for now, just get the answer from SAS:
95% confidence interval: 0.86 to 3.04
P-value > .05
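The slide does not say how SAS computes this interval. As an assumption, the sketch below uses the common log-based (Wald) interval for a risk ratio; for the smoking table it gives roughly 0.86 to 3.04, which agrees with the quoted output.

```python
import math

def risk_ratio_ci(a, b, c, d, z_crit=1.96):
    """Risk ratio with a log-based Wald confidence interval (assumed method).

    a, c = exposed with / without disease; b, d = unexposed with / without.
    """
    rr = (a / (a + c)) / (b / (b + d))
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(1 / a - 1 / (a + c) + 1 / b - 1 / (b + d))
    lower = math.exp(math.log(rr) - z_crit * se_log_rr)
    upper = math.exp(math.log(rr) + z_crit * se_log_rr)
    return rr, lower, upper

rr, lower, upper = risk_ratio_ci(21, 13, 79, 87)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

Because the interval includes 1.0, it is consistent with a p-value above .05.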

12 2. Difference in proportions
                    Exposure (E)   No Exposure (~E)
Disease (D)              a               b
No Disease (~D)          c               d
Total                   a+c             b+d

Difference in proportions = a/(a+c) - b/(b+d)

13 2. Difference in proportions
                    Smoker (E)   Non-smoker (~E)
Heart disease (D)       21             13
No Disease (~D)         79             87
Total                  100            100

Difference in proportions = 21/100 - 13/100 = 8%
Absolute, rather than relative, risk difference!

14 Difference in proportions test
Null hypothesis: difference in proportions = 0
Under the null, the groups have the same risk of heart disease (= overall risk in the study, 34/200 = .17):
- The number of smokers that develop heart disease in your study follows a binomial distribution with N=100, p=.17
- The number of non-smokers that develop heart disease in your study follows a binomial distribution with N=100, p=.17

15 Difference in proportions test
Null hypothesis: the difference in proportions is 0.
Recall that the variance of a proportion is p(1-p)/n.
Z = (p1 - p2) / sqrt( p(1-p)/n1 + p(1-p)/n2 )
Z follows a normal distribution because the binomial can be approximated with a normal.
Use the average (or pooled) proportion p in the standard error formula, because under the null hypothesis the groups have equal proportions.

16 Z-test applied here…
Z = (.21 - .13) / sqrt( .17*.83/100 + .17*.83/100 ) = .08/.053 ≈ 1.5
Corresponding two-sided p-value is .131.
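A short Python sketch of the pooled two-proportion Z-test for the smoking data. The helper name is illustrative, and the two-sided p-value comes from the normal approximation via `math.erfc`.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion Z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common risk under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

z, p_value = two_proportion_z_test(21, 100, 13, 100)
print(round(z, 2), round(p_value, 3))
```

Without intermediate rounding, Z comes out near 1.51 and the p-value near .13, in line with the slide's rounded values.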

17 Corresponding 95% confidence interval…
If the 95% confidence interval crosses the null value (here=0), then p>.05
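The transcript does not show how the interval itself was built. As an assumption, this sketch uses the usual unpooled (Wald) interval for a difference in proportions; the pooled standard error could be used instead and gives nearly the same result here.

```python
import math

def diff_in_proportions_ci(x1, n1, x2, n2, z_crit=1.96):
    """Wald CI for p1 - p2 using the unpooled standard error (assumed method)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

lower, upper = diff_in_proportions_ci(21, 100, 13, 100)
print(round(lower, 3), round(upper, 3))
# The interval crosses 0, which matches p > .05.
```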

18 OR, use computer simulation to make inferences…
1. In SAS, assume an infinite population of smokers and non-smokers with equal disease risk, p=.17 (UNDER THE NULL!)
2. Use the random binomial function to randomly select n=100 smokers and n=100 non-smokers, each with p=.17
3. Calculate the observed difference in proportions.
4. Repeat this 1000 times (or some large number of times).
5. Observe the distribution of differences under the null hypothesis.
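The steps above can be sketched in Python instead of SAS. This is not the original SAS program; the function name, the seed, and the use of the standard-library `random` module are assumptions made for illustration, so exact counts will differ from those on the slides.

```python
import random
import statistics

def simulate_null_differences(n=100, p=0.17, reps=1000, seed=1):
    """Simulate differences in proportions when both groups share risk p."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Each person develops disease with probability p (binomial draw)
        smokers = sum(rng.random() < p for _ in range(n))
        non_smokers = sum(rng.random() < p for _ in range(n))
        diffs.append((smokers - non_smokers) / n)
    return diffs

diffs = simulate_null_differences()
empirical_se = statistics.stdev(diffs)

observed = (21 - 13) / 100
# Two-sided empirical p-value: share of simulated results at least as extreme
p_value = sum(abs(d) >= observed for d in diffs) / len(diffs)
print(round(empirical_se, 3), p_value)
```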

19 Computer Simulation Results
Empirical standard error is about 5.3%

20 P-value from our simulation…
When we ran this study 1000 times, by chance, we got 72 results as big or bigger than 8%. We also got 82 results as small or smaller than –8%.

21 P-value From our simulation, we estimate the p-value to be:
154/1000 or .154

22 3. chi-square test of independence
                    Smoker (E)   Non-smoker (~E)
Heart disease (D)       21             13
No Disease (~D)         79             87
Total                  100            100

Null hypothesis: smoking and heart disease are independent

23 What does it mean to be “independent” in stats?
Under independence, P(A&B) = P(A)*P(B).
In words: the “joint probability” equals the product of the “marginal probabilities.” Or: the probability of both A and B happening is equal to the probability of A times the probability of B.
If smoking and heart disease are independent, then P(smoker & heart disease) = P(smoker)*P(heart disease).

24 Calculate expected counts under independence…
                    Smoker (E)   Non-smoker (~E)
Heart disease (D)       21             13
No Disease (~D)         79             87
Total                  100            100

IF smoking and heart disease are independent THEN: P(HeartDisease&Smoker) = P(HeartDisease)*P(Smoker)
P(HeartDisease) = 34/200 = 17%
P(Smoker) = 100/200 = 50%
IF INDEPENDENT, then P(HeartDisease&Smoker) should be 17% * 50% = 8.5%; 8.5% of 200 = 17

25 Fill in the expected table…
                    Smoker (E)   Non-smoker (~E)   Total
Heart disease (D)       17             17            34
No Disease (~D)         83             83           166
Total                  100            100           200

Marginals are fixed! Notice that the rest of the table is determined after you fill in 17 for cell A. There are no degrees of freedom left! (This table has only 1 degree of freedom.)
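A small Python sketch of how the expected table follows from the fixed marginals: each expected count is (row total × column total) / grand total. The function name is illustrative.

```python
def expected_counts(table):
    """Expected cell counts under independence, given the observed marginals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: heart disease, no disease. Columns: smoker, non-smoker.
observed = [[21, 13], [79, 87]]
expected = expected_counts(observed)
print(expected)  # [[17.0, 17.0], [83.0, 83.0]]
```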

26 Compare expected and observed counts…
Expected:
                    Smoker (E)   Non-smoker (~E)
Heart disease (D)       17             17
No Disease (~D)         83             83

Observed:
                    Smoker (E)   Non-smoker (~E)
Heart disease (D)       21             13
No Disease (~D)         79             87

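27 Chi-Square test
Chi-square = sum over all cells of (observed - expected)²/expected
= (21-17)²/17 + (13-17)²/17 + (79-83)²/83 + (87-83)²/83 ≈ 2.27 (2.25 = 1.5-squared when Z is rounded to 1.5).
The chi-square test produces exactly the square of the Z-test statistic and the same p-value.
Degrees of freedom = (rows-1)*(columns-1) = (2-1)*(2-1) = 1
Rule of thumb: a chi-square statistic much greater than its degrees of freedom indicates statistical significance. Here the statistic is not quite big enough; p=.131.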
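A minimal Python sketch of the Pearson chi-square statistic for the smoking table, with no continuity correction. For 1 degree of freedom, the upper-tail p-value can be taken from the normal distribution, since a chi-square with 1 df is a squared standard normal; that shortcut is only valid for 2x2 tables.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            stat += (obs - exp) ** 2 / exp
    # With 1 df: P(chi-square > stat) = 2 * (1 - Phi(sqrt(stat)))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_square_2x2([[21, 13], [79, 87]])
print(round(stat, 2), round(p_value, 3))
```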

28 Bonus material: the chi-square distribution is the distribution of a sum of squared standard normal deviates
The expected value and variance of a chi-square: E(x)=df, Var(x)=2(df)

29 Case-control study example:
You sample 50 stroke patients and 50 controls without stroke and ask about their smoking in the past.

30 Possible study results:
                  Smoker (E)   Non-smoker (~E)   Total
Stroke (D)            15             35            50
No Stroke (~D)         8             42            50

31 Statistics for these data
1. Odds ratio (relative risk)
2. Difference in proportions exposed (absolute risk)
3. Chi-square

32 What’s the risk ratio here?
                  Smoker (E)   Non-smoker (~E)   Total
Stroke (D)            15             35            50
No Stroke (~D)         8             42            50

Tricky: There is no risk ratio, because we cannot calculate the risk of disease!!

33 The odds ratio… We cannot calculate a risk ratio from a case-control study. BUT, we can calculate a measure called the odds ratio…

34 Odds vs. Risk
If the risk is…     Then the odds are…
½ (50%)             1:1
¾ (75%)             3:1
1/10 (10%)          1:9
1/100 (1%)          1:99
Note: An odds is always higher than its corresponding probability, unless the probability is 0 (then both are 0). At a probability of 100% the odds are undefined.
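The conversion behind this table is odds = p / (1 - p). A small Python sketch using exact fractions, so that 1:9 prints as 1/9 rather than a rounded decimal:

```python
from fractions import Fraction

def odds(p):
    """Convert a probability p (0 <= p < 1) into odds, p / (1 - p)."""
    return p / (1 - p)

for risk in [Fraction(1, 2), Fraction(3, 4), Fraction(1, 10), Fraction(1, 100)]:
    # For example, a risk of 1/10 gives odds of 1/9, i.e. 1:9
    print(f"risk {risk} -> odds {odds(risk)}")
```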

35 The Odds Ratio (OR)
                    Exposure (E)   No Exposure (~E)
Disease (D)              a               b            a+b = cases
No Disease (~D)          c               d            c+d = controls

The proportion of cases to controls is set by the investigator; therefore, it does not represent the risk (probability) of developing disease.
OR = (odds of exposure in the cases) / (odds of exposure in the controls) = (a/b) / (c/d) = ad/bc

36 The Odds Ratio (OR)
(Odds of exposure in the cases) / (odds of exposure in the controls) = [P(E/D)/P(~E/D)] / [P(E/~D)/P(~E/~D)]
Backward from what we want…
This expression is mathematically equivalent to:
(Odds of disease in the exposed) / (odds of disease in the unexposed) = [P(D/E)/P(~D/E)] / [P(D/~E)/P(~D/~E)]
The direction of interest!

37 Proof via Bayes’ Rule (optional)
(Odds of exposure in the cases) / (odds of exposure in the controls)
= [apply Bayes’ Rule to each term]
= (odds of disease in the exposed) / (odds of disease in the unexposed)
What we want!
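The intermediate algebra is not in the transcript. The following is a standard reconstruction, not necessarily the slide's own layout; P(D | E) here is what the slides write as P(D/E). Each conditional probability is rewritten with Bayes' Rule, and the P(D), P(~D), P(E) and P(~E) terms cancel.

```latex
\begin{align*}
\text{OR}
&= \frac{P(E \mid D)\,/\,P(\sim\!E \mid D)}{P(E \mid \sim\!D)\,/\,P(\sim\!E \mid \sim\!D)} \\[4pt]
&= \frac{\dfrac{P(D \mid E)\,P(E)/P(D)}{P(D \mid \sim\!E)\,P(\sim\!E)/P(D)}}
        {\dfrac{P(\sim\!D \mid E)\,P(E)/P(\sim\!D)}{P(\sim\!D \mid \sim\!E)\,P(\sim\!E)/P(\sim\!D)}} \\[4pt]
&= \frac{P(D \mid E)\,P(\sim\!D \mid \sim\!E)}{P(D \mid \sim\!E)\,P(\sim\!D \mid E)} \\[4pt]
&= \frac{P(D \mid E)\,/\,P(\sim\!D \mid E)}{P(D \mid \sim\!E)\,/\,P(\sim\!D \mid \sim\!E)}
\end{align*}
```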

38 The odds ratio
                  Smoker (E)   Non-smoker (~E)   Total
Stroke (D)            15             35            50
No Stroke (~D)         8             42            50

OR = (15/35) / (8/42) = (15*42) / (35*8) = 2.25
Interpretation: there is a 2.25-fold higher odds of stroke in smokers vs. non-smokers.
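A short Python sketch of the odds ratio for this table. It also checks numerically that the "odds of exposure" form and the "odds of disease" form give the same number. The function name is illustrative.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad/bc.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    return (a * d) / (b * c)

a, b, c, d = 15, 35, 8, 42
or_value = odds_ratio(a, b, c, d)

exposure_form = (a / b) / (c / d)  # odds of exposure: cases vs. controls
disease_form = (a / c) / (b / d)   # odds of disease: exposed vs. unexposed
print(or_value, exposure_form, disease_form)  # all approximately 2.25
```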

39 Inferences about the odds ratio…
Does the sampling distribution follow a normal distribution? What is the standard error?

40 Simulation…
1. In SAS, assume an infinite population of cases and controls with equal proportion of smokers (exposure), p=.23 (UNDER THE NULL!)
2. Use the random binomial function to randomly select n=50 cases and n=50 controls, each with p=.23 chance of being a smoker.
3. Calculate the observed odds ratio for the resulting 2x2 table.
4. Repeat this 1000 times (or some large number of times).
5. Observe the distribution of odds ratios under the null hypothesis.
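A Python sketch of this simulation (again, not the original SAS code). The function name and seed are assumptions, and simulated tables with a zero cell are skipped so that the log odds ratio is always defined; that handling is a choice made here, not something taken from the slides.

```python
import math
import random
import statistics

def simulate_null_log_or(n=50, p=0.23, reps=1000, seed=1):
    """Simulate ln(OR) when cases and controls share exposure probability p."""
    rng = random.Random(seed)
    log_ors = []
    for _ in range(reps):
        a = sum(rng.random() < p for _ in range(n))  # exposed cases
        c = sum(rng.random() < p for _ in range(n))  # exposed controls
        b, d = n - a, n - c                          # unexposed cases, controls
        if min(a, b, c, d) == 0:
            continue  # skip tables where the odds ratio is undefined
        log_ors.append(math.log((a * d) / (b * c)))
    return log_ors

log_ors = simulate_null_log_or()
empirical_se = statistics.stdev(log_ors)

observed = math.log(2.25)
# Two-sided empirical p-value on the log scale
p_value = sum(abs(x) >= observed for x in log_ors) / len(log_ors)
print(round(empirical_se, 2), round(p_value, 3))
```

Working on the log scale matters because the odds ratio itself is right-skewed, while its logarithm is close to symmetric around 0 under the null.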

41 Properties of the OR (simulation) (50 cases/50 controls/23% exposed)
Under the null, this is the expected variability of the sample OR; note the right skew.

42 Properties of the lnOR Normal!

43 Properties of the lnOR
From the simulation, we can get the empirical standard error (~0.5) and p-value (~.10)

44 Properties of the lnOR
Or, in general, standard error of ln(OR) = sqrt(1/a + 1/b + 1/c + 1/d)

45 Inferences about the ln(OR)
                  Smoker (E)   Non-smoker (~E)   Total
Stroke (D)            15             35            50
No Stroke (~D)         8             42            50

Z = (ln(2.25) - 0) / sqrt(1/15 + 1/35 + 1/8 + 1/42) = 0.81/0.49 ≈ 1.64
p=.10

46 Confidence interval…
                  Smoker (E)   Non-smoker (~E)   Total
Stroke (D)            15             35            50
No Stroke (~D)         8             42            50

95% CI for the OR: exp( ln(2.25) ± 1.96*0.494 ) = (0.85, 5.92)
Final answer: 2.25 (0.85, 5.92)
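A Python sketch that puts the ln(OR) inference together: the standard error sqrt(1/a + 1/b + 1/c + 1/d), the Z statistic, a two-sided normal p-value, and the back-transformed 95% interval. The function name is illustrative.

```python
import math

def log_odds_ratio_inference(a, b, c, d, z_crit=1.96):
    """Z-test and Wald confidence interval for the odds ratio via ln(OR)."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se                             # null value of ln(OR) is 0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return z, p_value, ci

z, p_value, (lower, upper) = log_odds_ratio_inference(15, 35, 8, 42)
print(round(z, 2), round(p_value, 2), round(lower, 2), round(upper, 2))
```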

47 2. Difference in proportions exposed
                  Smoker (E)   Non-smoker (~E)   Total
Stroke (D)            15             35            50
No Stroke (~D)         8             42            50

Difference in proportions exposed = 15/50 - 8/50 = 30% - 16% = 14%

48 2. Difference in proportions exposed

49 3. chi-square test of independence
                  Smoker (E)   Non-smoker (~E)
Stroke (D)            15             35
No Stroke (~D)         8             42

Expected count for cell A:
proportion: 0.5*.23 = .115
count: .115*100 = 11.5

50 expected and observed counts…
Expected:
                  Smoker (E)   Non-smoker (~E)
Stroke (D)           11.5           38.5
No Stroke (~D)       11.5           38.5

Observed:
                  Smoker (E)   Non-smoker (~E)
Stroke (D)            15             35
No Stroke (~D)         8             42

51 Chi-Square test
2.78=1.67-squared. Not quite sufficient evidence to reject null…
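A Python sketch that checks the link between the two case-control tests: the pooled Z-test on the difference in proportions exposed, and the Pearson chi-square on the same table. Computed without intermediate rounding, the statistic is about 2.77, close to the rounded 2.78 quoted above, and it equals Z squared.

```python
import math

# Rows: stroke cases, controls. Columns: smoker, non-smoker.
observed = [[15, 35], [8, 42]]

# Pooled Z-test on the proportion exposed (smokers) in cases vs. controls
n_cases, n_controls = sum(observed[0]), sum(observed[1])
p_cases = observed[0][0] / n_cases
p_controls = observed[1][0] / n_controls
pooled = (observed[0][0] + observed[1][0]) / (n_cases + n_controls)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
z = (p_cases - p_controls) / se

# Pearson chi-square: sum of (observed - expected)^2 / expected
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)
chi_square = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2)
    for j in range(2)
)

p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided; same for both tests
print(round(z, 2), round(chi_square, 2), round(p_value, 3))
```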

