Published by David Tucker. Modified over 9 years ago.
1
STAT E102 Midterm Review March 15, 2006
2
Review Topics
Populations, parameters, sampling
Types of data (i.e., nominal, ordinal, discrete, continuous)
Graphical/tabular representations of data
Summary statistics
  Central tendency: mean, median, mode
  Variability: variance, standard deviation, range, interquartile range
3
Review Topics
Normal distribution
  Standardization
  Z-statistic
Probability, conditional probability, Bayes’ Theorem
Diagnostic/screening tests
  Sensitivity, specificity, predictive value
CLT, sampling distributions
4
Review Topics
Hypothesis testing, p-values, alpha
  1- vs. 2-sided
T distribution, t-tests
  1- vs. 2-sample
  Independent vs. paired (for 2-sample)
Type I / Type II error
Power, sample size estimation
5
Question #1
A clinical trial is in development to compare a new drug for Progressive Multifocal Leukoencephalopathy (PML) to placebo. It is decided that a 2-sample independent t-test will compare the groups at the end of the trial with respect to a continuous endpoint. Early drafts of the study protocol estimate that the sample size would be 200 (100 per group). However, a recently completed study suggests that the variability of the endpoint in the two groups is larger than originally planned (i.e., larger than assumed in the original sample size calculations). The sample size is re-estimated using the new variability estimates from the recently completed trial. The result is:
A) A larger total sample size
B) A smaller total sample size
C) The sample size does not change
D) We cannot perform a t-test on these data any longer
6
Question #1
Sample size equation: $n = \left[ \frac{(z_{\alpha/2} + z_{\beta})\,\sigma}{\mu_1 - \mu_0} \right]^2$
Variability is represented by σ in the numerator. As variability (standard deviation σ) increases, n increases.
The result is: A) A larger total sample size
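As a rough illustration of the sample size equation, the Python sketch below evaluates n for a smaller and a larger σ. The function name `sample_size`, the 80% power, and the numeric inputs (σ = 10 vs. 12, a difference of 5) are assumptions chosen for illustration; none of them come from the PML trial.

```python
import math
from statistics import NormalDist


def sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Evaluate n = [(z_{alpha/2} + z_beta) * sigma / delta]^2, rounded up.

    sigma: assumed standard deviation of the endpoint
    delta: difference in means to detect (mu_1 - mu_0)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical inputs: same difference to detect, larger sigma the second time.
n_original = sample_size(sigma=10, delta=5)
n_revised = sample_size(sigma=12, delta=5)
print(n_original, n_revised)  # the second value is larger
```

Because σ enters the formula squared, a 20% increase in σ raises n by roughly 44% under these assumptions.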
7
Question #2
T-tests are most useful for what type of data (variables)?
A) Continuous
B) Ordinal
C) Nominal
D) Binary
8
Question #2
T-tests are used for continuous data.
Recall:
Ordinal data have a natural order without defined magnitude (e.g., low, moderate, high)
Nominal data have categories with no order or rank (e.g., gender, race)
Binary data are discrete data with only two possible outcomes (e.g., cancer vs. no cancer)
Answer: A) Continuous
9
Question #3
The probabilities that a 25- to 34-year-old U.S. male’s cholesterol level belongs to one of the following intervals are listed. What is the probability that a male from this population has cholesterol <200 mg/dl?
A) 0.414
B) 0.567
C) 0.847
D) 0.280

Cholesterol (mg/dl)   Probability
80-119                0.012
120-159               0.141
160-199               0.414
200-239               0.280
240-279               0.108
280-319               0.032
320-359               0.008
360-399               0.005
11
Question #3
These categories are mutually exclusive.
P(<200) = P(80-119) + P(120-159) + P(160-199)
= 0.012 + 0.141 + 0.414
= 0.567
Answer: B) 0.567
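The addition rule for mutually exclusive intervals can be sketched in a few lines of Python. The dictionary layout and the helper name `prob_below` are illustrative choices, and the 240-279 interval assumes that the "240-249" label in the table is a typo, since the next interval starts at 280.

```python
# Probabilities from the cholesterol table, keyed by (low, high) in mg/dl.
cholesterol_probs = {
    (80, 119): 0.012,
    (120, 159): 0.141,
    (160, 199): 0.414,
    (200, 239): 0.280,
    (240, 279): 0.108,  # assumed correction of the "240-249" label
    (280, 319): 0.032,
    (320, 359): 0.008,
    (360, 399): 0.005,
}


def prob_below(cutoff, table):
    """Sum the probabilities of all intervals lying entirely below cutoff."""
    return sum(p for (low, high), p in table.items() if high < cutoff)


p_below_200 = prob_below(200, cholesterol_probs)
print(round(p_below_200, 3))  # 0.567
```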
12
Question #4
The probabilities that a 25- to 34-year-old U.S. male’s cholesterol level belongs to one of the following intervals are listed. Given that a person from this population has serum cholesterol level <240 mg/dL, compute the conditional probability that he will have cholesterol <200 mg/dL.
A) 0.669
B) 0.567
C) 0.331
D) 0.280

Cholesterol (mg/dl)   Probability
80-119                0.012
120-159               0.141
160-199               0.414
200-239               0.280
240-279               0.108
280-319               0.032
320-359               0.008
360-399               0.005
13
Question #4
A = cholesterol <240
B = cholesterol <200
P(B|A) = P(A ∩ B) / P(A)
P(A ∩ B) = P(<240 & <200) = 0.567 (calculated in #3)
P(A) = P(<240) = 0.012 + 0.141 + 0.414 + 0.280 = 0.847
P(B|A) = 0.567 / 0.847 = 0.669
Answer: A) 0.669
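The same conditional probability can be checked numerically. This is a minimal sketch; the variable names are illustrative, and only the four intervals below 240 mg/dl are needed.

```python
# Interval probabilities below 240 mg/dl, taken from the cholesterol table.
p_80_119, p_120_159, p_160_199, p_200_239 = 0.012, 0.141, 0.414, 0.280

# B (<200) is contained in A (<240), so P(A and B) is just P(B).
p_a_and_b = p_80_119 + p_120_159 + p_160_199
p_a = p_a_and_b + p_200_239

p_b_given_a = p_a_and_b / p_a
print(round(p_a, 3), round(p_b_given_a, 3))  # 0.847 0.669
```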
14
Question #5
In the following table, the performance of a diagnostic test is summarized. Given that the prevalence of this condition in the overall population is 5%, please answer the following questions. What are the sensitivity and specificity of this test?
A) 0.734 and 0.918
B) 0.918 and 0.734
C) 0.833 and 0.813
D) 0.813 and 0.833

              + Diagnosis   - Diagnosis   Total
Disease            50            10          60
No Disease        103           449         552
Total             153           459         612
15
Question #5
Sensitivity = P(T+ | D+) = 50/60 = 0.833
Specificity = P(T- | D-) = 449/552 = 0.813
Answer: C) 0.833 and 0.813
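A small Python sketch of the same calculation, using the four cell counts from the diagnostic table. The variable names are illustrative.

```python
# Cell counts from the 2x2 table (rows: true disease status, columns: test result).
true_pos, false_neg = 50, 10     # Disease row
false_pos, true_neg = 103, 449   # No Disease row

# Sensitivity: proportion of diseased subjects who test positive.
sensitivity = true_pos / (true_pos + false_neg)
# Specificity: proportion of disease-free subjects who test negative.
specificity = true_neg / (true_neg + false_pos)

print(round(sensitivity, 3), round(specificity, 3))  # 0.833 0.813
```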
17
Question #6
In the following table, the performance of a diagnostic test is summarized. Given that the prevalence of this condition in the overall population is 5%, please answer the following questions. What is the probability that a subject has the disease given that the test was positive?
A) 19%
B) 81%
C) 32%
D) 68%

              + Diagnosis   - Diagnosis   Total
Disease            50            10          60
No Disease        103           449         552
Total             153           459         612
18
Question #6
Prevalence = 0.05
Positive Predictive Value = P(D+|T+)
Bayes’ Theorem:
P(D+|T+) = P(D+)P(T+|D+) / [P(D+)P(T+|D+) + P(D-)P(T+|D-)]
= (Prev)(Sens) / [(Prev)(Sens) + (1-Prev)(1-Spec)]
= (0.05)(0.833) / [(0.05)(0.833) + (0.95)(0.187)]
= 0.042 / (0.042 + 0.178)
= 0.191
Answer: A) 19%
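The Bayes' Theorem step can be sketched as a short Python function. The name `positive_predictive_value` is an illustrative choice. The sketch uses unrounded sensitivity and specificity, so the result (about 0.190) differs slightly in the third decimal from the 0.191 obtained above with rounded intermediate values; both round to 19%.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(D+ | T+) via Bayes' Theorem."""
    true_positive_mass = prevalence * sensitivity
    false_positive_mass = (1 - prevalence) * (1 - specificity)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Sensitivity and specificity from the 2x2 table; prevalence as stated (5%).
ppv = positive_predictive_value(
    prevalence=0.05,
    sensitivity=50 / 60,
    specificity=449 / 552,
)
print(round(ppv, 2))  # 0.19
```

Note that the table itself has 60 diseased subjects out of 612, which is not a 5% prevalence; that is why the stated population prevalence is plugged into Bayes' Theorem rather than reading 50/153 off the table.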
20
Question #7
A small p-value for a hypothesis test (i.e., <0.05) signifies which of the following?
A) The conditional probability that the null hypothesis is true is <0.05
B) The probability that the alternative hypothesis is true is >0.95
C) The conditional probability of the data being this extreme if the null hypothesis were true is <0.05
D) The conditional probability of the data being this extreme if the alternative hypothesis is true is >0.05
21
Question #7
Answer: C) The conditional probability of the data being this extreme if the null hypothesis were true is <0.05
22
Question #8
The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05.

Smokers:     Number of obs = 75
Non-smokers: Number of obs = 121
------------------------------------------------------------------------------
Variable |      Mean   Std. Err.         t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
Smokers  |       4.1    .2309401   17.7535   0.0000     3.639842    4.560158
Non-smok |       1.3    .1818182      7.15   0.0000     .9400127    1.659987
---------+--------------------------------------------------------------------
diff     |       2.8    .2939238   9.52628   0.0000     2.220304    3.379696
------------------------------------------------------------------------------
Degrees of freedom: ???

Ho: mean(smoker) - mean(non-smoker) = diff = 0

Ha: diff < 0           Ha: diff != 0           Ha: diff > 0
t = 9.5263             t = 9.5263              t = 9.5263
P < t = 1.0000         P > |t| = 0.0000        P > t = 0.0000
23
Question #8
What is the proper test of this hypothesis?
A) A z test
B) A one-sample t test
C) A paired t test
D) An unpaired t test
24
Question #8
Answer: D) An unpaired t test
25
Question #9
The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05.
What are the null and alternative hypotheses for this test? µ1 is the mean carboxyhemoglobin level among smokers, and µ2 is the mean level among non-smokers.
A) H0: µ1 = µ2; HA: µ1 ≠ µ2
B) H0: µ1 ≤ µ2; HA: µ1 ≠ µ2
C) H0: µ1 ≤ µ2; HA: µ1 > µ2
D) H0: µ1 ≥ µ2; HA: µ1 < µ2
26
Question #9
Answer: C) H0: µ1 ≤ µ2; HA: µ1 > µ2
27
Question #10
The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05.
What is the number of degrees of freedom associated with this test?
A) df = n1 - 1 = 74
B) df = n2 - 1 = 120
C) df = n1 + n2 - 1 = 195
D) df = n1 + n2 - 2 = 194
28
Question #10
This is a t-test of two independent samples, therefore:
df = n1 + n2 - 2 = 75 + 121 - 2 = 194
Answer: D) df = n1 + n2 - 2 = 194
29
Question #11
Based on the output, what is the decision?
A) Reject H0; the carboxyhemoglobin levels are higher among non-smokers
B) Do not reject H0; the carboxyhemoglobin levels are equal
C) Do not reject H0; the carboxyhemoglobin levels are lower among non-smokers
D) Reject H0; the carboxyhemoglobin levels are higher among smokers
30
Question #11
To ease interpretation, rearrange the hypotheses:
H0: µ1 ≤ µ2 can be rewritten as H0: µ1 - µ2 ≤ 0
HA: µ1 > µ2 can be rewritten as HA: µ1 - µ2 > 0
Now select the matching part of the output (circled on the slide): Ha: diff > 0, t = 9.5263, P > t = 0.0000.
p < 0.05; reject H0 and conclude that smokers have higher carboxyhemoglobin levels than non-smokers.
Answer: D) Reject H0; the carboxyhemoglobin levels are higher among smokers
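As a sanity check on the output for Question #8, the t statistic and degrees of freedom can be recomputed from the reported means and standard errors. This is only a sketch: combining the two standard errors as the square root of the sum of their squares is an assumption that happens to agree with the reported value of .2939238, not a statement of how the software computed it.

```python
import math

# Values read from the output for Question #8.
n_smokers, n_nonsmokers = 75, 121
mean_smokers, se_smokers = 4.1, 0.2309401
mean_nonsmokers, se_nonsmokers = 1.3, 0.1818182

diff = mean_smokers - mean_nonsmokers
# Assumed way of combining the two standard errors (matches the output here).
se_diff = math.sqrt(se_smokers ** 2 + se_nonsmokers ** 2)
t_stat = diff / se_diff
df = n_smokers + n_nonsmokers - 2

print(round(se_diff, 4), round(t_stat, 3), df)
```

A t statistic near 9.5 on 194 degrees of freedom lies far in the upper tail, which is consistent with the reported one-sided P > t = 0.0000.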
32
Question #12
Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The result for one particular hour, based on N=17 samples, is a mean of 8.42 and a standard deviation of 0.16. Interest focuses on whether there is sufficient evidence to conclude that the mean pH level in the water differs from 8.5. The hypotheses are then:
A) H0: µ = 8.5, HA: µ ≠ 8.5
B) H0: µ ≠ 8.5, HA: µ = 8.5
C) H0: µ = 0, HA: µ ≠ 0
D) H0: µ = 0, HA: µ ≠ 8.5
33
Question #12
Answer: A) H0: µ = 8.5, HA: µ ≠ 8.5
34
Question #13
Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The result for one particular hour, based on N=17 samples, is a mean of 8.42 and a standard deviation of 0.16. The degrees of freedom for this test are:
A) 7
B) 16
C) 8.5
D) Not applicable
35
Question #13
This is a one-sample t-test.
*NOTE* Do not be confused by the use of the word “samples” in the set-up; it refers to the number of water samples.
df = n - 1 = 17 - 1 = 16
Answer: B) 16
36
Question #14
Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The result for one particular hour, based on N=17 samples, is a mean of 8.42 and a standard deviation of 0.16. Is there sufficient evidence to conclude that the mean pH level in the water differs from 8.5 at the 0.05 level of significance?
A) Yes, because the sample mean was 8.42
B) Yes, since p<0.05
C) No, since p>0.05
D) No, since the sample mean (8.42) is practically equal to 8.5
37
Question #14 Carry out the one-sample, two-sided t-test at alpha=0.05: $t = (\bar{x} - \mu) / (s / \sqrt{n}) = (8.42 - 8.5) / (0.16 / \sqrt{17}) = -2.062$ with 16 df. From the t table, $2(0.025) < p < 2(0.05)$, so p>0.05; do not reject $H_0$. In STATA: ttesti 17 8.42 0.16 8.5 gives p=0.0559; do not reject $H_0$. Is there sufficient evidence to conclude that the mean pH level in the water differs from 8.5 at the 0.05 level of significance? A) Yes, because the sample mean was 8.42 B) Yes, since p<0.05 C) No, since p>0.05 D) No, since the sample mean (8.42) is practically equal to 8.5
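The calculation above can also be checked outside STATA. The following is a minimal Python sketch, not part of the original slides: the helper names (t_pdf, t_cdf, one_sample_t) are made up for illustration, and the t-distribution CDF is approximated by numerical integration (Simpson's rule) rather than taken from a statistics library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=20000):
    """Approximate P(T <= x) with Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(n, mean, sd, mu0):
    """Return (t statistic, degrees of freedom, two-sided p-value) from summary statistics."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    df = n - 1
    p_two_sided = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p_two_sided

# Summary statistics from the pH example: n=17, mean 8.42, SD 0.16, target 8.5.
t, df, p = one_sample_t(n=17, mean=8.42, sd=0.16, mu0=8.5)
print(f"t = {t:.3f}, df = {df}, two-sided p = {p:.4f}")
```

The printed values should agree with the slide (t of about -2.062 on 16 df, p of about 0.056), so the conclusion is the same: p>0.05, do not reject the null hypothesis.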
38
Question #15 A clinical trial is conducted to compare two treatments for asthma. Asthmatics are split into two groups with one group receiving Treatment A and the other group receiving Treatment B. At the end of the trial, the groups are compared with respect to FEV (forced expiratory volume). Differences found at the end of the trial may be due to: A) Treatment effect differences B) Chance C) Group differences with respect to other variables (e.g. sex, age, race, severity of asthma, etc.) D) All of the above
40
Question #16 A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg. What is the probability that a randomly chosen woman will have diastolic blood pressure lower than 89.6 mm Hg? A) 2.5% B) 5% C) 97.5% D) 95%
41
Question #16 $Z = (X - \mu) / \sigma = (89.6 - 70) / 10 = 1.96$; $P(Z < 1.96) = 1 - 0.025 = 0.975$. Or, in STATA: display normprob(1.96) returns 0.975. What is the probability that a randomly chosen woman will have diastolic blood pressure lower than 89.6 mm Hg? A) 2.5% B) 5% C) 97.5% D) 95%
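As a cross-check of the standardization step, here is a short Python sketch using the standard library's statistics.NormalDist. It is an illustration added to these notes, not part of the original slides; the variable names are arbitrary.

```python
from statistics import NormalDist

mu, sigma = 70.0, 10.0   # population mean and SD from the question (mm Hg)
x = 89.6                 # cut-off of interest (mm Hg)

# Standardize, then look up the standard normal CDF.
z = (x - mu) / sigma
p_below = NormalDist().cdf(z)

# Equivalent: evaluate the CDF of N(70, 10) directly at 89.6.
p_direct = NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.2f}, P(X < {x}) = {p_below:.4f}")
```

Both routes give a probability of about 0.975, matching answer C.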
42
Question #17 A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg. Suppose that you measure the diastolic blood pressure in n=25 women. What is the sampling distribution of $\bar{X}_{25}$, the mean diastolic blood pressure of the women in the sample? A) Normal, $\mu_{\bar{x}}$ = 70 mmHg and $\sigma_{\bar{x}}$ = 10 mmHg B) Normal, $\mu_{\bar{x}}$ = 7 mmHg and $\sigma_{\bar{x}}$ = 10 mmHg C) Normal, $\mu_{\bar{x}}$ = 70 mmHg and $\sigma_{\bar{x}}$ = 2 mmHg D) Normal, $\mu_{\bar{x}}$ = 7 mmHg and $\sigma_{\bar{x}}$ = 2 mmHg
43
Question #17 If the underlying distribution is normal, the sampling distribution of the mean is also normal. $\mu_{\bar{x}} = \mu_0 = 70$ mmHg; $\sigma_{\bar{x}} = \sigma_0 / \sqrt{n} = 10 / \sqrt{25} = 2$ mmHg. Suppose that you measure the diastolic blood pressure in n=25 women. What is the sampling distribution of $\bar{X}_{25}$, the mean diastolic blood pressure of the women in the sample? A) Normal, $\mu_{\bar{x}}$ = 70 mmHg and $\sigma_{\bar{x}}$ = 10 mmHg B) Normal, $\mu_{\bar{x}}$ = 7 mmHg and $\sigma_{\bar{x}}$ = 10 mmHg C) Normal, $\mu_{\bar{x}}$ = 70 mmHg and $\sigma_{\bar{x}}$ = 2 mmHg D) Normal, $\mu_{\bar{x}}$ = 7 mmHg and $\sigma_{\bar{x}}$ = 2 mmHg
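The standard-error result can be made concrete with a small simulation. The Python sketch below is an added illustration, not from the slides: the number of replicates (10,000) and the random seed are arbitrary choices. It draws repeated samples of 25 from a normal population with mean 70 and SD 10 and compares the spread of the sample means with the theoretical value of 2.

```python
import random
import statistics

random.seed(0)                      # arbitrary seed, for reproducibility only
mu, sigma, n = 70.0, 10.0, 25       # population parameters and sample size from the question
replicates = 10_000                 # assumed number of simulated samples

theory_se = sigma / n ** 0.5        # sigma / sqrt(n) = 2 mm Hg

# Mean of each simulated sample of 25 women.
means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(replicates)
]

sim_mean = statistics.fmean(means)  # should be close to 70
sim_sd = statistics.stdev(means)    # should be close to 2

print(f"theoretical SE = {theory_se:.2f}")
print(f"simulated mean of sample means = {sim_mean:.2f}, SD = {sim_sd:.2f}")
```

The simulated mean and SD of the sample means should land near 70 and 2 respectively, in line with answer C.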
44
Question #18 A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg. What is the lower 5th percentile of the sampling distribution above (i.e., the point below which the mean diastolic blood pressure of a 25-woman sample will fall 5% of the time)? A) 66.08 mmHg B) 66.71 mmHg C) 50.40 mmHg D) 27.55 mmHg
45
Question #18 From the Z table, we know that Z=-1.645 cuts off the lower 5th percentile. Alternatively, get this in STATA: display invnorm(.05). Rearrange the Z equation and solve for $\bar{X}$: $Z = (\bar{X} - \mu_{\bar{x}}) / \sigma_{\bar{x}}$, so $\bar{X} = Z \sigma_{\bar{x}} + \mu_{\bar{x}} = (-1.645)(2) + 70 = 66.71$ mmHg. What is the lower 5th percentile of the sampling distribution above (i.e., the point below which the mean diastolic blood pressure of a 25-woman sample will fall 5% of the time)? A) 66.08 mmHg B) 66.71 mmHg C) 50.40 mmHg D) 27.55 mmHg
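The percentile calculation can be reproduced with the inverse normal CDF. This Python sketch is an added illustration (not from the slides) using statistics.NormalDist from the standard library; variable names are arbitrary.

```python
from statistics import NormalDist

mu, sigma, n = 70.0, 10.0, 25
se = sigma / n ** 0.5                   # SD of the sample mean: 2 mm Hg

# Z value that cuts off the lower 5% of the standard normal.
z05 = NormalDist().inv_cdf(0.05)

# Back-transform to the scale of the sample mean.
x05 = mu + z05 * se

# Same result directly from the sampling distribution N(70, 2).
x05_direct = NormalDist(mu, se).inv_cdf(0.05)

print(f"z = {z05:.3f}, 5th percentile of the sample mean = {x05:.2f} mm Hg")
```

This gives a Z value of about -1.645 and a 5th percentile of about 66.71 mm Hg, matching answer B.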
46
Question #19 Consider the following: a hypothesis test was performed. The resulting p-value was higher than the pre-specified alpha and thus there was a failure to reject the null hypothesis. Which of the following are correct? A) The null hypothesis must be true B) An evaluation of power and type II error may be warranted C) It is still feasible that the alternative hypothesis is true D) Both B and C
47
Question #19 We haven’t proven the null by failing to reject it. It is still possible that the alternative is actually true. Recall that power (1-β) is the probability of rejecting the null when it is false. β is the probability of making a Type II error (failing to reject the null when it is false). In this case it is possible that statistical power was too low, leading to a Type II error where we have failed to reject $H_0$ when it is actually false. An evaluation is appropriate. A) The null hypothesis must be true B) An evaluation of power and type II error may be warranted C) It is still feasible that the alternative hypothesis is true D) Both B and C
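To illustrate what "an evaluation of power" might look like, here is a hedged Python sketch for the simplest case: a two-sided, one-sample z-test with known σ. Everything numeric in it is hypothetical and chosen only for illustration (σ=10 and n=25 are borrowed from the blood pressure questions; the true shift of 3 units is an assumption), and the function name z_test_power is made up.

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean differs from
    the null value by delta (known sigma, sample size n)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    shift = delta * n ** 0.5 / sigma             # true effect in standard-error units
    # Probability the test statistic falls in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

# Hypothetical scenario: sigma = 10, true shift of 3 units.
power_n25 = z_test_power(delta=3.0, sigma=10.0, n=25)
power_n100 = z_test_power(delta=3.0, sigma=10.0, n=100)
beta_n25 = 1 - power_n25                         # Type II error probability

print(f"n=25:  power = {power_n25:.3f}, beta = {beta_n25:.3f}")
print(f"n=100: power = {power_n100:.3f}")
```

Under these assumed numbers the test with n=25 has power of roughly one third, so a failure to reject would be unsurprising even if the alternative were true; quadrupling the sample size raises power substantially.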
48
Question #20 The mean cholesterol level of healthy 20-74 year old males in the US is 211 mg/mL, with an unknown standard deviation. The average cholesterol level among 16 hypertensive males was 232 mg/mL. Conduct a statistical test at the 10% level about whether hypertension is associated with higher cholesterol levels (consult the STATA output below). Assuming that the α-level of the test is 10%, what can you conclude?
                                                    Number of obs = 16
------------------------------------------------------------------------------
Variable |     Mean   Std. Err.        t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      232         12   19.3333   0.0000     206.4226    257.5774
------------------------------------------------------------------------------
Degrees of freedom: 15
                          Ho: mean(x) = 211
   Ha: mean < 211         Ha: mean != 211          Ha: mean > 211
     t =   1.7500           t =   1.7500             t =   1.7500
 P < t =   0.9497       P > |t| =   0.1005       P > t =   0.0503
49
Question #20 This is a one-sample, one-sided t-test. Set up the hypotheses and select relevant output: $H_0: \mu \le 211$; $H_A: \mu > 211$. The alpha level is 0.10. The p-value is 0.0503. p<0.10; reject $H_0$; conclude that hypertensive males have higher cholesterol levels than the healthy population of US males 20-74.
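The three p-values in the STATA output are related to one another, which the following Python sketch tries to make explicit. It is an added illustration rather than part of the slides: the t_cdf helper is a hypothetical name and approximates the t-distribution CDF by Simpson's rule, and the inputs (mean 232, standard error 12, n=16, null value 211) are read off the output table.

```python
import math

def t_cdf(x, df, steps=20000):
    """Approximate P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Values taken from the STATA output.
mean, mu0, std_err, n = 232.0, 211.0, 12.0, 16
t = (mean - mu0) / std_err          # test statistic against the null value 211
df = n - 1

p_lower = t_cdf(t, df)              # Ha: mean < 211
p_upper = 1 - p_lower               # Ha: mean > 211 (the test of interest here)
p_two = 2 * min(p_lower, p_upper)   # Ha: mean != 211

print(f"t = {t:.4f}, df = {df}")
print(f"P < t = {p_lower:.4f}, P > |t| = {p_two:.4f}, P > t = {p_upper:.4f}")
```

The upper-tail value (about 0.050) is the one that matches the one-sided alternative; it is half the two-sided value and the complement of the lower-tail value.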
50
Question #20 Assuming that the α-level of the test is 10%, what can you conclude? A) The p-value of the test is 0.1005 and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a higher cholesterol level. B) The p-value of the test is 0.9497 and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a lower cholesterol level. C) The p-value of the test is 0.0503 and thus we fail to reject the null hypothesis and conclude that hypertension is not associated with higher cholesterol levels. D) The p-value of the test is 0.0503 and thus we reject the null hypothesis and conclude that the mean cholesterol level for hypertensives is higher than healthy 20-74 year old US males.
51
Question #21 Ten infants were involved in a study to compare the effectiveness of two medications for the treatment of diaper rash. For each baby, two areas of approximately the same size and rash severity were selected, and one area was treated with medication A and the other area was treated with medication B. The number of hours for the rash to disappear was recorded for each medication and each infant. The question of interest: is there sufficient evidence to conclude that a difference exists in the mean time required for elimination of the rash? The appropriate statistical methodology to use is: A) A 2-sample (independent) t-test B) A z-test C) A paired t-test D) A sensitivity analysis
52
Question #21 Medication A and Medication B are “matched” to the same infant. The samples are not independent. We were not given a known standard deviation σ. A sensitivity analysis is typically a sub-analysis that is performed, changing model parameters to see if the results remain consistent. This concept has not been covered in class and is not relevant to this question. This study calls for a paired t-test. A) A 2-sample (independent) t-test B) A z-test C) A paired t-test D) A sensitivity analysis
53
GOOD LUCK!