
1 Statistical Inference II

2 Confidence Intervals give: *A plausible range of values for a population parameter. *The precision of an estimate. (When sampling variability is high, the confidence interval will be wide to reflect the uncertainty of the observation.) *Statistical significance (if the 95% CI does not cross the null value, it is significant at .05)

3 Confidence Intervals: Estimating the Size of the Effect (Sample statistic) ± (measure of how confident we want to be) × (standard error)

4 Common Levels of Confidence Commonly used confidence levels are 90%, 95%, and 99%
Confidence Level   Z value
80%                1.28
90%                1.645
95%                1.96
98%                2.33
99%                2.58
99.8%              3.08
99.9%              3.27
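
The Z values on this slide can be recovered from the standard normal distribution. The following is a minimal sketch using only the Python standard library; the helper name `z_critical` is ours, not from the slides. Values computed this way may differ from printed tables in the second decimal place for the most extreme levels.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical value: the Z that leaves (1 - confidence)/2 in each tail."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Print the critical value for the most commonly used confidence levels.
for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  Z = {z_critical(level):.3f}")
```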

5 The true meaning of a confidence interval A computer simulation: Imagine that the true population value is 10. Have the computer take 50 samples of the same size from the same population and calculate the 95% confidence interval for each sample. Here are the results…

6 95% Confidence Intervals

7 95% Confidence Intervals: 3 misses = 6% error rate. For a 95% confidence interval, you can be 95% confident that you captured the true population value.
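
The simulation described on slides 5 to 7 can be sketched as follows. This is an illustration under stated assumptions, not the code behind the slides: the population is taken to be normal with true mean 10, and the standard deviation (2.0), the sample size (30), and the seed are arbitrary choices of ours.

```python
import random
from statistics import NormalDist, mean

def coverage(n_samples=50, n=30, mu=10.0, sigma=2.0, confidence=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sigma / n ** 0.5              # sigma is treated as known here
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1
    return hits / n_samples

print(coverage())                  # 50 intervals, as on the slide
print(coverage(n_samples=5000))    # long-run rate; should sit near 0.95
```

With only 50 intervals the miss rate bounces around 5% (the slide saw 3 misses, or 6%); with many more intervals it settles close to 5%.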

8 Confidence Intervals for our weight example… (Sample statistic) ± (measure of how confident we want to be) × (standard error) 95% CI: 160 ± 1.96 × 1.5 = 157-163 lbs 99% CI: 160 ± 2.58 × 1.5 = 156-164 lbs Note how the confidence intervals do not cross the null value of 150!
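
The arithmetic on this slide can be checked with a few lines of Python. The function name `z_interval` is a hypothetical helper; the inputs (mean 160, standard error 1.5, and the Z values) are the ones on the slide.

```python
def z_interval(estimate, z, standard_error):
    """Return (low, high) for estimate ± z × standard error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

low95, high95 = z_interval(160, 1.96, 1.5)
low99, high99 = z_interval(160, 2.58, 1.5)
print(round(low95, 2), round(high95, 2))   # 157.06 162.94
print(round(low99, 2), round(high99, 2))   # 156.13 163.87
```

Both intervals lie entirely above the null value of 150, which is what the slide means by "do not cross".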

9 Confidence intervals give the same information as hypothesis tests (and more)…

10 Duality with hypothesis tests. Null hypothesis: Average weight is 150 lbs. Alternative hypothesis: Average weight is not 150 lbs. [Figure: number line from 150 to 163 lbs marking the null value and the 95% confidence interval.] P-value < .05

11 Duality with hypothesis tests. Null hypothesis: Average weight is 150 lbs. Alternative hypothesis: Average weight is not 150 lbs. [Figure: number line from 150 to 163 lbs marking the null value and the 99% confidence interval.] P-value < .01

12 Review Question 1 A 95% confidence interval for a mean: a. Is wider than a 99% confidence interval. b. Is wider when the sample size is larger. c. In repeated samples will include the population mean 95% of the time. d. Will include 95% of the observations of a sample.

14 Review Question 2 Spine bone density is normally distributed in young women, with a mean of 1.0 g/cm² and a standard deviation of 0.1 g/cm². In my sample of 100 young women runners, the average spine bone density is .93 g/cm². What is the 95% confidence interval? a. .93 ± 1.96(.1) = .73-1.13 b. .93 ± (.1) = .83-1.03 c. 1.0 ± (.1) = .90-1.10 d. .93 ± 1.96(.01) = .91-.95

15 Review Question 2 (answer) d. .93 ± 1.96(.01) = .91-.95, because the standard error is .1/√100 = .01. Note how the confidence interval does not cross the null value of 1.0!

16 Summary: Single population mean (known σ) Hypothesis test: Confidence Interval
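
The formulas for this slide did not survive in the transcript. The standard textbook forms for a single mean with known σ, which match the weight and bone-density examples above, are given here as a reconstruction rather than a copy of the slide:

```latex
Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}},
\qquad
\text{CI: } \bar{X} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}
```

Here $\bar{X}$ is the sample mean, $\mu_0$ the null value, $n$ the sample size, and $Z_{\alpha/2}$ the critical value (1.96 for 95%).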

17 Examples of Sample Statistics: Single population mean (known σ); Single population mean (unknown σ); Single population proportion; Difference in means (ttest); Difference in proportions (Z-test); Odds ratio/risk ratio; Correlation coefficient; Regression coefficient; …

18 Standard deviation is unknown NOTE: if we were actually doing the above experiment, i.e. sampling 100 doctors, we may not know the standard deviation of weights in the whole population (σ) ahead of time (unlike with dice, there is no theoretical variance, only a population variance that we can never know exactly without measuring the entire population). To estimate σ, use the sample standard deviation s. Estimated standard error of the mean: s/√n (basically dividing by n twice…)

19 Standard error of the mean when true sigma is unknown

20 When σ is unknown, use t rather than Z! A t-distribution is like a Z distribution, except it has slightly fatter tails to reflect the uncertainty added by estimating σ. The bigger the sample size (i.e., the bigger the sample size used to estimate σ), the closer t becomes to Z. If n>100, t approaches Z.

21 Student’s t Distribution: t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal. [Figure: density curves centered at 0 for the standard normal (t with df = ∞), t (df = 13), and t (df = 5).] Note: t → Z as n increases. From “Statistics for Managers Using Microsoft® Excel”, 4th Edition, Prentice-Hall 2004

22 Student’s t Table (upper tail area)
df    .25     .10     .05
1     1.000   3.078   6.314
2     0.817   1.886   2.920
3     0.765   1.638   2.353
The body of the table contains t values, not probabilities. Let: n = 3, df = n - 1 = 2, α = .10, α/2 = .05, so t = 2.920. From “Statistics for Managers Using Microsoft® Excel”, 4th Edition, Prentice-Hall 2004. SEE APPENDIX A in your book!

23 t distribution values, with comparison to the Z value
Confidence Level   t (10 d.f.)   t (20 d.f.)   t (30 d.f.)   Z
.80                1.372         1.325         1.310         1.28
.90                1.812         1.725         1.697         1.64
.95                2.228         2.086         2.042         1.96
.99                3.169         2.845         2.750         2.58
Note: t → Z as n increases. From “Statistics for Managers Using Microsoft® Excel”, 4th Edition, Prentice-Hall 2004

24 Practice problem You want to estimate the average ages of kids that ride a particular kid’s ride at Disneyland. You take a random sample of 8 kids exiting the ride, and find that their ages are: 2,3,4,5,6,6,7,7. a. Calculate the sample mean. b. Calculate the sample standard deviation. c. Calculate the standard error of the mean. d. Calculate the 99% confidence interval.

25 Answer (a,b) a. Calculate the sample mean. b. Calculate the sample standard deviation.

26 Answer (c) c. Calculate the standard error of the mean.

27 Answer (d) d. Calculate the 99% confidence interval. t (df = 7, upper-tail area .005) = 3.5
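
The worked answers for slides 25 to 27 were images and are missing from the transcript. The four quantities can be computed from the ages given in the problem, as sketched below. The critical value 3.5 is the one quoted on slide 27; it is hard-coded here because the Python standard library has no t-distribution quantile function.

```python
from statistics import mean, stdev

ages = [2, 3, 4, 5, 6, 6, 7, 7]
n = len(ages)

xbar = mean(ages)          # (a) sample mean
s = stdev(ages)            # (b) sample standard deviation (n - 1 in the denominator)
se = s / n ** 0.5          # (c) standard error of the mean
t_crit = 3.5               # (d) t with 7 df, .005 in each tail, as quoted on the slide
ci = (xbar - t_crit * se, xbar + t_crit * se)

print(xbar, round(s, 2), round(se, 2), tuple(round(v, 2) for v in ci))
```

The mean is 5 years, the standard deviation about 1.85, the standard error about 0.65, and the 99% interval roughly 2.7 to 7.3 years.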

28 Review Question 3 A t-distribution: a. Is approximately a normal distribution if n>100. b. Can be used interchangeably with a normal distribution as long as the sample size is large enough. c. Reflects the uncertainty introduced when using the sample, rather than population, standard deviation. d. All of the above.

30 Example problem, class data: A two-tailed hypothesis test: A researcher claims that Stanford affiliates eat fewer than the recommended intake of 5 servings of fruits and vegetables per day. We have data to address this claim: 20 people in the class provided data on their daily fruit and vegetable intake. Do we have evidence to dispute her claim?

31 Fruit and veggie consumption, this class… Mean=3.9 servings Median=3.5 servings Mode=3.0 servings Std Dev=1.5 servings

32 Answer 1. Define your hypotheses (null, alternative) H0: μ (average servings) = 5.0 Ha: μ (average servings) ≠ 5.0 servings (two-sided) 2. Specify your null distribution We do not know the true standard deviation of fruit and veggie consumption, so we must use a T-distribution to make inferences, rather than a Z-distribution.

33 Answer, continued 3. Do an experiment: observed mean in our experiment = 3.9 servings. 4. Calculate the p-value of what you observed: the T critical value with 19 df for p < .05, two-tailed, is 2.093; p-value < .05. 5. Reject or fail to reject (~accept) the null hypothesis: Reject! Stanford affiliates eat significantly fewer than the recommended servings of fruits and veggies.

34 95% Confidence Interval H0: μ (average servings) = 5.0 The 95% CI excludes 5, so p-value < .05
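
The test statistic and interval for the fruit-and-vegetable example can be reproduced from the summary numbers on slides 31 to 34 (mean 3.9, standard deviation 1.5, n = 20, critical value 2.093). The function below is an illustrative helper, not something taken from the slides.

```python
def one_sample_t(xbar, mu0, s, n):
    """Return the t statistic and the estimated standard error of the mean."""
    se = s / n ** 0.5
    return (xbar - mu0) / se, se

t_stat, se = one_sample_t(xbar=3.9, mu0=5.0, s=1.5, n=20)
t_crit = 2.093                      # t with 19 df, two-tailed .05, from the slide
reject = abs(t_stat) > t_crit
ci = (3.9 - t_crit * se, 3.9 + t_crit * se)

print(round(t_stat, 2), reject, tuple(round(v, 2) for v in ci))
```

The statistic is about -3.28, beyond the cut-off of 2.093 in absolute value, and the interval of roughly 3.2 to 4.6 servings excludes 5.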

35 Paired data (repeated measures)
Patient   BP Before (diastolic)   BP After
1         100                     92
2         89                      84
3         83                      80
4         98                      93
5         108                     98
6         95                      90
What about these data? How do you analyze these?

36 Example problem: paired ttest
Patient   Diastolic BP Before   D. BP After   Change
1         100                   92            -8
2         89                    84            -5
3         83                    80            -3
4         98                    93            -5
5         108                   98            -10
6         95                    90            -5
Null Hypothesis: Average Change = 0

37 Example problem: paired ttest Change: -8, -5, -3, -5, -10, -5 Null Hypothesis: Average Change = 0 With 5 df, |T| > 2.571 corresponds to p < .05 (two-sided test)

38 Example problem: paired ttest Change -8 -5 -3 -5 -10 -5 Note: does not include 0.
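
The calculations behind slides 37 and 38 were images. A sketch of the paired analysis, using the blood pressure values from the table and the critical value 2.571 quoted on slide 37:

```python
from statistics import mean, stdev

before = [100, 89, 83, 98, 108, 95]
after = [92, 84, 80, 93, 98, 90]
change = [a - b for a, b in zip(after, before)]   # one difference per patient

n = len(change)
d_bar = mean(change)            # average change
s_d = stdev(change)             # standard deviation of the changes
se = s_d / n ** 0.5             # standard error of the average change
t_stat = d_bar / se             # null value is 0
t_crit = 2.571                  # t with 5 df, two-sided .05, from the slide
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(d_bar, round(t_stat, 2), tuple(round(v, 2) for v in ci))
```

The average change is -6 with a t statistic of about -5.8, and the 95% interval of roughly -8.7 to -3.3 does not include 0.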

39 Summary: Single population mean (unknown σ) Hypothesis test: Confidence Interval

40 Summary: paired ttest Hypothesis test: Confidence Interval Where d=change over time or difference within a pair.
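
The formulas on this slide are missing from the transcript. A paired t-test is a one-sample t-test applied to the within-pair differences, so the standard forms (a reconstruction, with $s_d$ denoting the sample standard deviation of the differences and $n$ the number of pairs) are:

```latex
T = \frac{\bar{d} - 0}{s_d / \sqrt{n}} \sim t_{n-1},
\qquad
\text{CI: } \bar{d} \pm t_{n-1,\,\alpha/2} \cdot \frac{s_d}{\sqrt{n}}
```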

41 Review Question 4 If we have a p-value of 0.03 and so decide that our effect is statistically significant, what is the probability that we’re wrong (i.e., that the hypothesis test gave us a false positive)? a. .03 b. .06 c. Cannot tell d. 1.96 e. 95%

43 Review Question 5 Suppose we take a random sample of 100 people, both men and women. We form a 90% confidence interval of the true mean population height. Would we expect that confidence interval to be wider or narrower than if we had done everything the same but sampled only women? a. Narrower b. Wider c. It is impossible to predict

44 Review Question 5 (answer) b. Wider. If we sample only women, the standard deviation of height decreases, so the standard error decreases.

45 Review Question 6 Suppose we take a random sample of 100 people, both men and women. We form a 90% confidence interval of the true mean population height. Would we expect that confidence interval to be wider or narrower than if we had done everything the same except sampled 200 people? a. Narrower b. Wider c. It is impossible to predict

46 Review Question 6 (answer) b. Wider. With 200 people, N increases so the standard error decreases.

47 Review Question 7 I am calculating the mean, median, standard deviation, and standard error for several variables (age, height, weight, income, blood pressure, etc.) from a sample of 246 patients. If I receive data for an additional 100 patients, which of the above statistics (mean, median, standard deviation, or standard error) would be expected to change substantially? a. All of them b. Mean, standard deviation, standard error c. Standard deviation, standard error d. Standard deviation only e. Standard error only

49 Examples of Sample Statistics: Single population mean (known σ); Single population mean (unknown σ); Single population proportion; Difference in means (ttest); Difference in proportions (Z-test); Odds ratio/risk ratio; Correlation coefficient; Regression coefficient; …

50 Sampling distribution of a sample proportion Always a normal distribution! p=true population proportion. BUT… if you knew p you wouldn’t be doing the experiment!

51 Example You poll 100 random people in Ohio and find that 90% approve of Obama’s job as President. Form a 99% confidence interval for the true proportion of Obama supporters in Ohio.

52 Answer
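
The answer on this slide was an image. A sketch of the calculation, assuming the usual normal-approximation interval for a proportion and the Z value 2.58 from the table on slide 4 (the helper name is ours):

```python
def proportion_ci(p_hat, n, z):
    """Normal-approximation interval for a proportion: p_hat ± z × sqrt(p_hat(1 - p_hat)/n)."""
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - z * se, p_hat + z * se, se

low, high, se = proportion_ci(p_hat=0.90, n=100, z=2.58)
print(round(se, 3), round(low, 3), round(high, 3))
```

The standard error is .03, so the 99% interval runs from about 82% to 98%.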

53 Key one-sample Hypothesis Tests… Test for H0: μ = μ0 (σ² unknown): Test for H0: p = p0:

54 Corresponding confidence intervals… For a mean (σ² unknown): For a proportion:
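
The formulas for slides 53 and 54 are missing from the transcript. The standard forms, consistent with the worked examples above, are reconstructed here; note that the test for a proportion uses the null value $p_0$ in the standard error, while the confidence interval uses the sample proportion $\hat{p}$.

```latex
% One-sample tests (slide 53)
T = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{n-1}
\qquad
Z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}

% Corresponding confidence intervals (slide 54)
\bar{X} \pm t_{n-1,\,\alpha/2} \cdot \frac{s}{\sqrt{n}}
\qquad
\hat{p} \pm Z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n}}
```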

55 Symbol overload! n: Sample size; Z: Z-statistic (standard normal); t_df: T-statistic (t-distribution with df degrees of freedom); p̂ (“p-hat”): sample proportion; X̄ (“X-bar”): sample mean; s: Sample standard deviation; p0: Null hypothesis proportion; μ0: Null hypothesis mean

56 Two-sample tests

57 Examples of Sample Statistics: Single population mean (known σ); Single population mean (unknown σ); Single population proportion; Difference in means (ttest); Difference in proportions (Z-test); Odds ratio/risk ratio; Correlation coefficient; Regression coefficient; …

58 The two-sample t-test

59 The two-sample T-test Is the difference in means that we observe between two groups more than we’d expect to see based on chance alone?

60 The standard error of the difference of two means ** First add the variances and then take the square root of the sum to get the standard error.

61 Distribution of differences If X and Y are the averages of n and m subjects, respectively:

62 But… As before, you usually have to use the sample SD, since you won’t know the true SD ahead of time… So, again becomes a T-distribution...

63 Estimated standard error (using pooled variance estimate) The degrees of freedom are n+m-2
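
The formulas on slides 60 to 63 did not survive extraction. In standard notation, with $\bar{X}$ and $\bar{Y}$ the averages of $n$ and $m$ subjects and $s_x$, $s_y$ the two sample standard deviations (the subscripts are our labels), they read:

```latex
\bar{X} - \bar{Y} \sim N\!\left(\mu_x - \mu_y,\; \frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}\right)

s_p^2 = \frac{(n-1)\,s_x^2 + (m-1)\,s_y^2}{n + m - 2},
\qquad
\widehat{SE}(\bar{X} - \bar{Y}) = \sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{m}}
```

The pooled variance $s_p^2$ is a weighted average of the two sample variances, which is why the resulting t statistic has $n + m - 2$ degrees of freedom.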

64 Example: two-sample t-test In 1980, some researchers reported that “men have more mathematical ability than women” as evidenced by the 1979 SAT’s, where a sample of 30 random male adolescents had a mean score ± 1 standard deviation of 436±77 and 30 random female adolescents scored lower: 416±81 (genders were similar in educational backgrounds, socio-economic status, and age). Do you agree with the authors’ conclusions?

65 Data Summary
                 n    Sample Mean   Sample Standard Deviation
Group 1: women   30   416           81
Group 2: men     30   436           77

66 Two-sample t-test 1. Define your hypotheses (null, alternative) H 0 : ♂-♀ math SAT = 0 Ha: ♂-♀ math SAT ≠ 0 [two-sided]

67 Two-sample t-test 2. Specify your null distribution: F and M have approximately equal standard deviations/variances, so make a “pooled” estimate of variance.

68 Two-sample t-test 3. Observed difference in our experiment = 20 points

69 Two-sample t-test 4. Calculate the p-value of what you observed 5. Do not reject null! No evidence that men are better in math ;)
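
The numbers behind steps 2 to 5 were images. A sketch of the pooled two-sample calculation from the summary data on slide 65 (the function name and argument order are ours):

```python
def pooled_t(mean_x, s_x, n, mean_y, s_y, m):
    """Two-sample t statistic with a pooled variance estimate."""
    pooled_var = ((n - 1) * s_x ** 2 + (m - 1) * s_y ** 2) / (n + m - 2)
    se = (pooled_var / n + pooled_var / m) ** 0.5
    t_stat = (mean_x - mean_y) / se
    return t_stat, se, n + m - 2

# Men: 436 ± 77 (n = 30); women: 416 ± 81 (n = 30)
t_stat, se, df = pooled_t(436, 77, 30, 416, 81, 30)
print(round(t_stat, 2), round(se, 1), df)
```

The 20-point difference is about one standard error (roughly 20.4 points), giving t near 0.98 on 58 df. That is well short of the two-sided .05 cut-off of roughly 2.0, which is why the null is not rejected.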

70 Example 2 Example: Rosenthal, R. and Jacobson, L. (1966) Teachers’ expectancies: Determinants of pupils’ I.Q. gains. Psychological Reports, 19, 115-118.

71 The Experiment (note: exact numbers have been altered) Grade 3 students at Oak School were given an IQ test at the beginning of the academic year (n=90). Classroom teachers were given a list of names of students in their classes who had supposedly scored in the top 20 percent; these students were identified as “academic bloomers” (n=18). BUT: the children on the teachers’ lists had actually been randomly assigned to the list. At the end of the year, the same I.Q. test was re-administered.

72 Example 2 Statistical question: Do students in the treatment group have more improvement in IQ than students in the control group? What will we actually compare? One-year change in IQ score in the treatment group vs. one-year change in IQ score in the control group.

73 Results
                      “Academic bloomers” (n=18)   Controls (n=72)
Change in IQ score:   12.2 (2.0)                   8.2 (2.0)
Difference = 12.2 points - 8.2 points = 4 points. The standard deviation of change scores was 2.0 in both groups. This affects statistical significance…

74 What does a 4-point difference mean? Before we perform any formal statistical analysis on these data, we already have a lot of information. Look at the basic numbers first; THEN consider statistical significance as a secondary guide.

75 Is the association statistically significant? This 4-point difference could reflect a true effect or it could be a fluke. The question: is a 4-point difference bigger or smaller than the expected sampling variability?

76 Hypothesis testing Step 1: Assume the null hypothesis. Null hypothesis: There is no difference between “academic bloomers” and normal students (= the difference is 0 points)

77 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true. These predictions can be made by mathematical theory or by computer simulation.

78 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true—math theory:

79 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true (computer simulation): In computer simulation, you simulate taking repeated samples of the same size from the same population and observe the sampling variability. I used computer simulation to take 1000 samples of 18 treated and 72 controls.

80 Computer Simulation Results Standard error is about 0.52
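
A simulation along the lines of slide 79 can be sketched as follows. This is not the author's code: it assumes change scores are normally distributed with standard deviation 2.0 in both groups and, under the null hypothesis, share the same mean (set to 0 here, since only the difference matters). The seed is arbitrary.

```python
import random
from statistics import mean, stdev

def simulate_differences(n_treated=18, n_control=72, sd=2.0, n_sims=1000, seed=1):
    """Differences in group means when both groups come from the same population."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        treated = [rng.gauss(0.0, sd) for _ in range(n_treated)]
        control = [rng.gauss(0.0, sd) for _ in range(n_control)]
        diffs.append(mean(treated) - mean(control))
    return diffs

diffs = simulate_differences()
simulated_se = stdev(diffs)                            # spread of the 1000 differences
theory_se = (2.0 ** 2 / 18 + 2.0 ** 2 / 72) ** 0.5     # sqrt(s²/n + s²/m)
print(round(simulated_se, 2), round(theory_se, 2))
```

The simulated standard error varies a little from run to run but lands close to the theoretical value of about 0.53, in line with the slide's "about 0.52".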

81 3. Empirical data Observed difference in our experiment = 12.2-8.2 = 4.0

82 4. P-value: a t-curve with 88 df has slightly wider cut-offs for 95% area (t=1.99) than a normal curve (Z=1.96). p-value < .0001

83 If we ran this study 1000 times, we wouldn’t expect to get 1 result as big as a difference of 4 (under the null hypothesis). Visually…

84 5. Reject null! Conclusion: I.Q. scores can bias expectancies in the teachers’ minds and cause them to unintentionally treat “bright” students differently from those seen as less bright.

85 Confidence interval (more information!!) 95% CI for the difference: 4.0 ± 1.99(.52) = (3.0 – 5.0) A t-curve with 88 df has slightly wider cut-offs for 95% area (t=1.99) than a normal curve (Z=1.96).

86 Summary: ttest, pooled variance

87 What if our standard deviation had been higher? The standard deviation for change scores in treatment and control were each 2.0. What if change scores had been much more variable—say a standard deviation of 10.0 (for both)?

88 Std. dev in change scores = 2.0: standard error is 0.54. Std. dev in change scores = 10.0: standard error is 2.58.

89 With a std. dev. of 10.0… LESS STATISTICAL POWER! Standard error is 2.58. If we ran this study 1000 times, we would expect to get ≥ +4.0 or ≤ –4.0 12% of the time. P-value = .12
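
The effect of the larger standard deviation can also be checked with the formula rather than by simulation. The sketch below uses a normal approximation for the p-value (the slides use a t-curve with 88 df, which is very close); the function names are ours. The slide's figures of 0.54, 2.58, and .12 came from simulation, so they differ slightly from the theoretical values here.

```python
from statistics import NormalDist

def se_difference(sd, n, m):
    """Standard error of a difference in means when both groups share the same SD."""
    return (sd ** 2 / n + sd ** 2 / m) ** 0.5

def two_sided_p_normal(diff, se):
    """Two-sided p-value for an observed difference, using the normal approximation."""
    z = abs(diff) / se
    return 2 * (1 - NormalDist().cdf(z))

se_small = se_difference(2.0, 18, 72)     # SD of change scores = 2.0
se_large = se_difference(10.0, 18, 72)    # SD of change scores = 10.0
p_large = two_sided_p_normal(4.0, se_large)
print(round(se_small, 2), round(se_large, 2), round(p_large, 2))
```

The same 4-point difference that was overwhelming with a standard error near 0.5 is no longer significant once the standard error grows to about 2.6.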

90 Don’t forget: The paired T-test Did the control group in the previous experiment improve at all during the year? Do not apply a two-sample ttest to answer this question! After-Before yields a single sample of differences…

91 Data Summary
                  n    Sample Mean   Sample Standard Deviation
Group 1: Change   72   +8.2          2.0

92 Paired Ttest p-value <.0001

93 Paired Ttest Correlated (paired) data: either the same person on different occasions or pairs of people who are more similar to each other than to individuals from other pairs (husband-wife pairs, twin pairs, matched cases and controls, etc.)

94 Review Question 8 In a medical student class, the 6 people born on odd days had heights of 64.6 ± 4 inches; the 10 people born on even days had heights of 71.1 ± 5 inches. Height is roughly normally distributed. Which of the following best represents the correct statistical test for these data? a. b. c. d.

96 Review Question 9 Fifty percent of the people born on odd days commute to school by car two or more times per week, whereas only 40 percent of people born on even days do. To test whether this difference is more than expected by chance, we would use: a. A two-sample ttest b. A paired ttest c. A one-sample proportions test d. A two-sample proportions test

97 Review Question 9 (answer) d. A two-sample proportions test. The outcome is binary (commutes by car or not) and the two groups are independent.

98 Review Question 10 Standard error is: a. For a given variable, its standard deviation divided by the square root of n. b. A measure of the variability of a sample statistic. c. The inverse of sample size. d. A measure of the variability of a characteristic. e. All of the above.

100 Two sample proportions (Z test) Compare the difference in proportions between two independent samples…(binary outcome rather than continuous outcome)

101 Z-test

102 Example: Difference in proportions Research Question: Are antidepressants a risk factor for suicide attempts in children and adolescents? Example modified from: “Antidepressant Drug Therapy and Suicide in Severely Depressed Children and Adults”; Olfson et al. Arch Gen Psychiatry. 2006;63:865-872.

103 Example: Difference in Proportions Design: Case-control study Methods: Researchers used Medicaid records to compare prescription histories between 263 children and teenagers (6-18 years) who had attempted suicide and 1241 controls who had never attempted suicide (all subjects suffered from depression). Statistical question: Is a history of use of antidepressants more common among cases than controls?

104 Example Statistical question: Is a history of use of antidepressants more common among suicide-attempt cases than controls? What will we actually compare? Proportion of cases who used antidepressants in the past vs. proportion of controls who did

105 Results
                               No. (%) of cases (n=263)   No. (%) of controls (n=1241)
Any antidepressant drug ever   120 (46%)                  448 (36%)
46% vs. 36%: Difference = 10%

106 What does a 10% difference mean? Before we perform any formal statistical analysis on these data, we already have a lot of information. Look at the basic numbers first; THEN consider statistical significance as a secondary guide.

107 Is the association statistically significant? This 10% difference could reflect a true association or it could be a fluke in this particular sample. The question: is 10% bigger or smaller than the expected sampling variability?

108 Hypothesis testing Null hypothesis: There is no association between antidepressant use and suicide attempts in the target population (= the difference is 0%) Step 1: Assume the null hypothesis.

109 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true. These predictions can be made by mathematical theory or by computer simulation.

110 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true—mathematical theory:

111 Hypothesis Testing Step 2: Predict the sampling variability assuming the null hypothesis is true (computer simulation): In computer simulation, you simulate taking repeated samples of the same size from the same population and observe the sampling variability. I used computer simulation to take 1000 samples of 263 cases and 1241 controls.

112 Also: Computer Simulation Results Standard error is about 3.3%

113 Hypothesis Testing Step 3: Do an experiment We observed a difference of 10% between cases and controls.

114 Hypothesis Testing Step 4: Calculate a p-value—mathematical theory:
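
The mathematical-theory calculation on this slide was an image. A sketch of the usual two-sample Z-test for proportions, using the counts from slide 105 and a pooled proportion for the standard error under the null (the function name is ours):

```python
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z-test for a difference in proportions, pooling under the null hypothesis."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided
    return z, se, p_value

# 120 of 263 cases and 448 of 1241 controls had used antidepressants
z, se, p_value = two_proportion_z(120, 263, 448, 1241)
print(round(z, 2), round(se, 3), round(p_value, 4))
```

The standard error comes out at about 3.3%, matching the simulation, and the two-sided p-value is close to the simulated estimate of .004.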

115 P-value from our simulation… When we ran this study 1000 times, we got 1 result as big or bigger than 10%. We also got 3 results as small or smaller than –10%.

116 P-value From our simulation, we estimate the p-value to be: 4/1000 or .004

117 Hypothesis Testing Step 5: Reject or do not reject the null hypothesis. Here we reject the null. Alternative hypothesis: There is an association between antidepressant use and suicide in the target population.

118 What would a lack of statistical significance mean? If this study had sampled only 50 cases and 50 controls, the sampling variability would have been much higher—as shown in this computer simulation…

119 50 cases and 50 controls: standard error is about 10%. 263 cases and 1241 controls: standard error is about 3.3%.

120 With only 50 cases and 50 controls… Standard error is about 10%. If we ran this study 1000 times, we would expect to get values of 10% or higher 170 times (or 17% of the time).

121 Two-tailed p-value Two-tailed p-value = 17% × 2 = 34%
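
The small-sample scenario can be simulated directly. The sketch below is an illustration, not the author's simulation: it assumes that under the null both groups use antidepressants at a common rate of 38% (roughly the pooled rate in the real data), and it uses more repetitions than the slide's 1000 to steady the estimate. Because both groups have 50 people, a difference of 10 percentage points or more is the same as at least 5 more users among cases, and counting people avoids floating-point comparisons.

```python
import random

def simulate_tail(n_cases=50, n_controls=50, p_null=0.38,
                  min_diff_count=5, n_sims=10000, seed=1):
    """Share of null simulations where cases exceed controls by min_diff_count users or more."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        cases = sum(rng.random() < p_null for _ in range(n_cases))
        controls = sum(rng.random() < p_null for _ in range(n_controls))
        if cases - controls >= min_diff_count:
            extreme += 1
    return extreme / n_sims

one_tail = simulate_tail()
print(round(one_tail, 2), round(2 * one_tail, 2))   # one-tailed and two-tailed estimates
```

Under these assumptions the one-tailed share should land near the slide's 17%, so a 10% difference in a study this small would be unremarkable.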

122 Key two-sample Hypothesis Tests… Test for H0: μx - μy = 0 (σ² unknown, but roughly equal): Test for H0: p1 - p2 = 0:

123 Corresponding confidence intervals… For a difference in means, 2 independent samples (σ²’s unknown but roughly equal): For a difference in proportions, 2 independent samples:
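
The formulas for slides 122 and 123 are missing from the transcript. The standard forms are reconstructed below, where $s_p^2$ is the pooled variance estimate with $n + m - 2$ degrees of freedom and $\bar{p}$ is the pooled proportion across both samples. The test for proportions pools under the null, while the interval uses each sample's own proportion.

```latex
% Two-sample tests (slide 122)
T = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{s_p^2}{n} + \dfrac{s_p^2}{m}}} \sim t_{n+m-2}
\qquad
Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}\,(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}

% Corresponding confidence intervals (slide 123)
(\bar{X} - \bar{Y}) \pm t_{n+m-2,\,\alpha/2} \cdot \sqrt{\frac{s_p^2}{n} + \frac{s_p^2}{m}}
\qquad
(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}
```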

