STAT E102 Midterm Review March 15, 2006
Review Topics
Populations, parameters, sampling
Types of data (nominal, ordinal, discrete, continuous)
Graphical/tabular representations of data
Summary statistics
  Central tendency: mean, median, mode
  Variability: variance, standard deviation, range, interquartile range

Review Topics
Normal distribution
  Standardization
  Z-statistic
Probability, conditional probability, Bayes' Theorem
Diagnostic/screening tests
  Sensitivity, specificity, predictive value
CLT, sampling distributions

Review Topics
Hypothesis testing, p-values, alpha
  1- vs. 2-sided
T distribution, t-tests
  1- vs. 2-sample
  Independent vs. paired (for 2-sample)
Type I / Type II error
Power, sample size estimation
Question #1
A clinical trial is in development to compare a new drug for Progressive Multifocal Leukoencephalopathy (PML) to placebo. It is decided that a 2-sample independent t-test will compare the groups at the end of the trial with respect to a continuous endpoint. Early drafts of the study protocol estimate a sample size of 200 (100 per group). However, a recently completed study suggests that the variability of the endpoint in the two groups is larger than was assumed in the original sample size calculations. The sample size is re-estimated using the variability estimates from the recently completed trial. The result is:
A) A larger total sample size
B) A smaller total sample size
C) The sample size does not change
D) We cannot perform a t-test on these data any longer
Question #1
Sample size equation: $n = \left[ \dfrac{(z_{\alpha/2} + z_{\beta})\,\sigma}{\mu_1 - \mu_0} \right]^2$
Variability is represented by $\sigma$ in the numerator. As variability (standard deviation $\sigma$) increases, $n$ increases.
Answer: A) A larger total sample size
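The dependence of n on σ in the equation above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function name `sample_size` and the planning values (σ = 10 or 12, difference of 4, 80% power) are hypothetical, chosen only to illustrate the direction of the change.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, delta, alpha=0.05, power=0.80):
    """n = [(z_{alpha/2} + z_beta) * sigma / delta]^2, rounded up."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile matching the desired power
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical planning values: only sigma changes between the two calls.
n_original = sample_size(sigma=10.0, delta=4.0)
n_revised = sample_size(sigma=12.0, delta=4.0)
print(n_original, n_revised)
```

Because σ enters the formula squared, even a modest increase in the assumed standard deviation raises the required n noticeably.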
Question #2
T-tests are most useful for what type of data (variables)?
A) Continuous
B) Ordinal
C) Nominal
D) Binary

Question #2
T-tests are used for continuous data. Recall:
Ordinal data have a natural order without defined magnitude (e.g., low, moderate, high)
Nominal data have categories with no order or rank (e.g., gender, race)
Binary data are discrete data with only two possible outcomes (e.g., cancer vs. no cancer)
Answer: A) Continuous
Question #3
The probabilities that a 25- to 34-year-old U.S. male's cholesterol level belongs to one of the following intervals are listed. What is the probability that a male from this population has cholesterol <200 mg/dl?
[Table: Cholesterol (mg/dl) vs. Probability; values and answer choices A) through D) lost in extraction]
Question #3
These categories are mutually exclusive, so their probabilities add:
$P(<200) = P(80\text{-}119) + P(120\text{-}159) + P(160\text{-}199) = 0.567$
[Table: Cholesterol (mg/dl) vs. Probability; only the entry 0.280 survives extraction]
Question #4
The probabilities that a 25- to 34-year-old U.S. male's cholesterol level belongs to one of the following intervals are listed. Given that a person from this population has serum cholesterol level <240 mg/dL, compute the conditional probability that he will have cholesterol <200 mg/dL.
[Table: Cholesterol (mg/dl) vs. Probability; values and answer choices A) through D) lost in extraction]
Question #4
A = cholesterol <240, B = cholesterol <200
$P(B \mid A) = P(A \cap B) / P(A)$
$P(A \cap B) = P(<240 \text{ and } <200) = P(<200) = 0.567$ (calculated in #3)
$P(A) = P(<240)$ = [value missing]
$P(B \mid A) = 0.567 / P(<240)$ = [value missing]
Question #5
In the following table, the performance of a diagnostic test is summarized. Given that the prevalence of this condition in the overall population is 5%, please answer the following questions.
What are the sensitivity and specificity of this test?
[Answer choices A) through D) lost in extraction]

              + Diagnosis   - Diagnosis   Total
Disease            50            10          60
No Disease        103           449         552
Total             153           459         612
Question #5
Sensitivity = $P(T^+ \mid D^+) = 50/60 = 0.833$
Specificity = $P(T^- \mid D^-) = 449/552 = 0.813$
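The two ratios above are simple enough to verify directly. A minimal Python sketch using only the counts quoted on the slide (the variable names are illustrative):

```python
# Counts quoted on the slide: 50 of 60 diseased subjects test positive,
# 449 of 552 non-diseased subjects test negative.
true_pos, diseased = 50, 60
true_neg, non_diseased = 449, 552

sensitivity = true_pos / diseased        # P(T+ | D+)
specificity = true_neg / non_diseased    # P(T- | D-)

print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```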
Question #6
In the following table, the performance of a diagnostic test is summarized. Given that the prevalence of this condition in the overall population is 5%, please answer the following questions.
What is the probability that a subject has the disease given that the test was positive?
A) 19%
B) 81%
C) 32%
D) 68%

              + Diagnosis   - Diagnosis   Total
Disease            50            10          60
No Disease        103           449         552
Total             153           459         612
Question #6
Prevalence = 0.05
Positive Predictive Value = $P(D^+ \mid T^+)$
Bayes' Theorem:
$P(D^+ \mid T^+) = \dfrac{P(D^+)\,P(T^+ \mid D^+)}{P(D^+)\,P(T^+ \mid D^+) + P(D^-)\,P(T^+ \mid D^-)}$
$= \dfrac{(\text{Prev})(\text{Sens})}{(\text{Prev})(\text{Sens}) + (1-\text{Prev})(1-\text{Spec})}$
$= \dfrac{(0.05)(0.833)}{(0.05)(0.833) + (0.95)(0.187)}$
$= \dfrac{0.042}{0.042 + 0.178} = 0.19$
Answer: A) 19%
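The Bayes' Theorem calculation above can be sketched in Python as follows. The function name `positive_predictive_value` is illustrative, not from the slides; the inputs are the 5% prevalence and the sensitivity and specificity from Question #5.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(D+ | T+) by Bayes' Theorem."""
    true_pos = prevalence * sensitivity               # P(D+) * P(T+ | D+)
    false_pos = (1 - prevalence) * (1 - specificity)  # P(D-) * P(T+ | D-)
    return true_pos / (true_pos + false_pos)


ppv = positive_predictive_value(
    prevalence=0.05, sensitivity=50 / 60, specificity=449 / 552
)
print(f"PPV = {ppv:.2f}")
```

Dividing the counts in the study directly (50 diseased among the 50 + 103 positive tests) would instead reflect the disease frequency among the study subjects (60 of 612), not the 5% population prevalence, which is why the prevalence is passed in explicitly.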
Question #7
A small p-value for a hypothesis test (i.e., <0.05) signifies which of the following?
A) The conditional probability that the null hypothesis is true is <0.05
B) The probability that the alternative hypothesis is true is >0.95
C) The conditional probability of the data being this extreme if the null hypothesis were true is <0.05
D) The conditional probability of the data being this extreme if the alternative hypothesis is true is >0.05
Question #7
Answer: C) The conditional probability of the data being this extreme if the null hypothesis were true is <0.05
A p-value is computed assuming the null hypothesis is true. It is not the probability that either hypothesis is true, which rules out A and B.
Question #8
The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05.
[Stata two-sample t-test output; numeric values lost in extraction. Smokers: Number of obs = 75; Non-smokers: Number of obs = 121. Columns: Variable | Mean | Std. Err. | t | P>|t| | [95% Conf. Interval]; rows: Smokers, Non-smok, diff. Degrees of freedom: ??? Ho: mean(smoker) - mean(non-smoker) = diff = 0, followed by a t statistic and p-value for each of the three alternatives Ha.]
Question #8
What is the proper test of this hypothesis?
A) A z test
B) A one-sample t test
C) A paired t test
D) An unpaired t test
[Stata output as on the previous Question #8 slide]
Question #8
Answer: D) An unpaired t test
Smokers and non-smokers are two independent groups, so their means are compared with a two-sample (unpaired) t test.
Question #9
The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05.
What are the null and alternative hypotheses for this test? $\mu_1$ is the mean carboxyhemoglobin level among smokers, and $\mu_2$ is the mean level among non-smokers.
A) $H_0: \mu_1 = \mu_2$; $H_A: \mu_1 \neq \mu_2$
B) $H_0: \mu_1 \le \mu_2$; $H_A: \mu_1 \neq \mu_2$
C) $H_0: \mu_1 \le \mu_2$; $H_A: \mu_1 > \mu_2$
D) $H_0: \mu_1 \ge \mu_2$; $H_A: \mu_1 < \mu_2$
Question #9
Answer: C) $H_0: \mu_1 \le \mu_2$; $H_A: \mu_1 > \mu_2$
The belief that smokers have the higher mean level is the one-sided alternative; the null hypothesis covers the remaining cases.
Question #10
The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05.
What is the number of degrees of freedom associated with this test?
A) $df = n_1 - 1 = 74$
B) $df = n_2 - 1 = 120$
C) $df = n_1 + n_2 - 1 = 195$
D) $df = n_1 + n_2 - 2 = 194$
Question #10
This is a t-test of two independent samples, therefore:
$df = n_1 + n_2 - 2 = 75 + 121 - 2 = 194$
Answer: D) $df = n_1 + n_2 - 2 = 194$
Question #11
Based on the output, what is the decision?
A) Reject $H_0$; the carboxyhemoglobin levels are higher among non-smokers
B) Do not reject $H_0$; the carboxyhemoglobin levels are equal
C) Do not reject $H_0$; the carboxyhemoglobin levels are lower among non-smokers
D) Reject $H_0$; the carboxyhemoglobin levels are higher among smokers
[Stata output as in Question #8]
Question #11
To ease interpretation, rearrange the hypotheses:
$H_0: \mu_1 \le \mu_2$ can be rewritten as $H_0: \mu_1 - \mu_2 \le 0$
$H_A: \mu_1 > \mu_2$ can be rewritten as $H_A: \mu_1 - \mu_2 > 0$
Now read the p-value for the matching alternative in the output (Ha: diff > 0, circled on the original slide).
p < 0.05; reject $H_0$ and conclude that smokers have higher carboxyhemoglobin levels than non-smokers.
Answer: D) Reject $H_0$; the carboxyhemoglobin levels are higher among smokers
Question #12
Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The results for one particular hour, based on N=17 samples, are a mean of 8.42 and a standard deviation of 0.16.
Interest focuses on whether there is sufficient evidence to conclude that the mean pH level in the water differs from 8.5. The hypotheses are then:
A) $H_0: \mu = 8.5$, $H_A: \mu \neq 8.5$
B) $H_0: \mu \neq 8.5$, $H_A: \mu = 8.5$
C) $H_0: \mu = 0$, $H_A: \mu \neq 0$
D) $H_0: \mu = 0$, $H_A: \mu \neq 8.5$
Question #12
Answer: A) $H_0: \mu = 8.5$, $H_A: \mu \neq 8.5$
The null hypothesis states that the mean equals the target value; "differs from 8.5" gives a two-sided alternative.
Question #13
Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The results for one particular hour, based on N=17 samples, are a mean of 8.42 and a standard deviation of 0.16.
The degrees of freedom for this test are:
A) 7
B) 16
C) 8.5
D) Not applicable
Question #13
This is a one-sample t-test.
*NOTE* Do not be confused by the use of the word "samples" in the set-up; it refers to the number of water samples.
$df = n - 1 = 17 - 1 = 16$
Answer: B) 16
Question #14
Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The results for one particular hour, based on N=17 samples, are a mean of 8.42 and a standard deviation of 0.16.
Is there sufficient evidence to conclude that the mean pH level in the water differs from 8.5 at the 0.05 level of significance?
A) Yes, because the sample mean was 8.42
B) Yes, since p<0.05
C) No, since p>0.05
D) No, since the sample mean (8.42) is practically equal to 8.5
Question #14
Carry out the one-sample, two-sided t-test at alpha=0.05:
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}} = \dfrac{8.42 - 8.5}{0.16 / \sqrt{17}} = -2.06$; 16 df
From the t table, $2(0.025) < p < 2(0.05)$, i.e., $0.05 < p < 0.10$
p>0.05; do not reject $H_0$
In STATA: ttesti 17 8.42 0.16 8.5
p=0.0559; do not reject $H_0$
Answer: C) No, since p>0.05
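The test statistic for this one-sample t-test can be reproduced with a few lines of Python. This is a sketch using only the standard library, so the decision is made by comparing |t| with the tabulated two-sided critical value for 16 df (2.120) rather than by computing the exact p-value that STATA reports.

```python
from math import sqrt

# Summary statistics from the question set-up
n, xbar, s, mu0 = 17, 8.42, 0.16, 8.5

t_stat = (xbar - mu0) / (s / sqrt(n))   # one-sample t statistic, df = n - 1

# Two-sided critical value for alpha = 0.05 with 16 df, read from a t table
t_crit = 2.120
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```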
Question #15
A clinical trial is conducted to compare two treatments for asthma. Asthmatics are split into two groups, with one group receiving Treatment A and the other group receiving Treatment B. At the end of the trial, the groups are compared with respect to FEV (forced expiratory volume). Differences found at the end of the trial may be due to:
A) Treatment effect differences
B) Chance
C) Group differences with respect to other variables (e.g., sex, age, race, severity of asthma)
D) All of the above
Question #15
Answer: D) All of the above
A true treatment effect, chance, and imbalance between the groups in other variables can each produce an observed difference.
Question #16
A large study has determined that the diastolic blood pressure among women ages [range missing] is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg.
What is the probability that a randomly chosen woman will have diastolic blood pressure lower than 89.6 mm Hg?
A) 2.5%
B) 5%
C) 97.5%
D) 95%
Question #16
$Z = \dfrac{X - \mu}{\sigma} = \dfrac{89.6 - 70}{10} = 1.96$
$P(Z < 1.96) = 1 - 0.025 = 0.975$
Or, in STATA: display normprob(1.96) = 0.975
Answer: C) 97.5%
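As a cross-check on the table lookup, the same probability can be computed in Python with the standard library's `statistics.NormalDist` (a sketch alongside the STATA command, not a replacement for it):

```python
from statistics import NormalDist

mu, sigma, x = 70.0, 10.0, 89.6

z = (x - mu) / sigma                        # standardize
p_from_z = NormalDist().cdf(z)              # P(Z < z) on the standard normal
p_direct = NormalDist(mu, sigma).cdf(x)     # same probability without standardizing

print(f"z = {z:.2f}, P(X < {x}) = {p_direct:.3f}")
```

Both routes give the same value, which is the point of standardization: any normal probability can be read from the single standard normal table.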
Question #17
A large study has determined that the diastolic blood pressure among women ages [range missing] is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg.
Suppose that you measure the diastolic blood pressure in n=25 women. What is the sampling distribution of $\bar{X}_{25}$, the mean diastolic blood pressure of the women in the sample?
A) Normal, $\mu_{\bar{x}} = 70$ mmHg and $\sigma_{\bar{x}} = 10$ mmHg
B) Normal, $\mu_{\bar{x}} = 7$ mmHg and $\sigma_{\bar{x}} = 10$ mmHg
C) Normal, $\mu_{\bar{x}} = 70$ mmHg and $\sigma_{\bar{x}} = 2$ mmHg
D) Normal, $\mu_{\bar{x}} = 7$ mmHg and $\sigma_{\bar{x}} = 2$ mmHg
Question #17
If the underlying distribution is normal, the sampling distribution of the sample mean is also normal.
µ xbar = µ = 70 mmHg
σ xbar = σ / √n = 10 / √25 = 2 mmHg
Answer: C) Normal, µ xbar =70 mmHg and σ xbar =2 mmHg
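The two parameters of the sampling distribution can be computed directly. This is a small Python sketch using the values stated in Question #17; the variable names are assumptions made for illustration.

```python
from math import sqrt

# Population values stated in Question #17
mu = 70.0      # population mean, mm Hg
sigma = 10.0   # population standard deviation, mm Hg
n = 25         # sample size

# The mean of the sampling distribution equals the population mean;
# its standard deviation (the standard error) shrinks by a factor of sqrt(n).
mu_xbar = mu
sigma_xbar = sigma / sqrt(n)

print(f"mu_xbar = {mu_xbar} mm Hg, sigma_xbar = {sigma_xbar} mm Hg")
```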
Question #18
A large study has determined that the diastolic blood pressure among women ages is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg. What is the lower 5th percentile of the sampling distribution from Question #17 (i.e., the point below which the mean diastolic blood pressure of these 25-woman samples will fall 5% of the time)?
A) mmHg
B) mmHg
C) mmHg
D) mmHg
Question #18
From the Z table, we know that Z = -1.645 cuts off the lower 5th percentile. Alternatively, this value can be obtained in STATA: display invnorm(.05)
Rearrange the Z equation and solve for X bar:
Z = (X bar - µ xbar) / σ xbar
X bar = (Z)(σ xbar) + µ xbar = (-1.645)(2) + 70 = 66.71 mmHg
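The percentile calculation can also be checked with an inverse normal CDF. The sketch below uses Python's `statistics.NormalDist.inv_cdf` in place of the STATA `invnorm` call; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the mean for n = 25 (from Question #17)
sampling_dist = NormalDist(mu=70.0, sigma=10.0 / sqrt(25))

# Z value that cuts off the lower 5% of the standard normal
z_05 = NormalDist().inv_cdf(0.05)

# Lower 5th percentile of the sample mean, obtained two equivalent ways
x_bar_05 = sampling_dist.inv_cdf(0.05)
x_bar_05_by_hand = z_05 * sampling_dist.stdev + sampling_dist.mean

print(f"Z(0.05) = {z_05:.3f}")
print(f"5th percentile of X bar = {x_bar_05:.2f} mm Hg")
```

Both routes give the same value because the inverse CDF of a normal distribution is just the standard normal quantile rescaled by σ xbar and shifted by µ xbar.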
Question #19
Consider the following: a hypothesis test was performed. The resulting p-value was higher than the pre-specified alpha and thus there was a failure to reject the null hypothesis. Which of the following are correct?
A) The null hypothesis must be true
B) An evaluation of power and type II error may be warranted
C) It is still feasible that the alternative hypothesis is true
D) Both B and C
Question #19
We haven't proven the null by failing to reject it. It is still possible that the alternative is actually true.
Recall that power (1-β) is the probability of rejecting the null when it is false. β is the probability of making a Type II error (failing to reject the null when it is false).
In this case it is possible that statistical power was too low, leading to a Type II error in which we failed to reject H0 when it is actually false. An evaluation of power is appropriate.
Answer: D) Both B and C
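To make the link between sample size, power, and Type II error concrete, here is a hedged Python sketch for an upper-tailed one-sample z-test. Question #19 does not specify a test or any numbers, so the function name `power_one_sided_z` and every input value below are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of an upper-tailed one-sample z-test when the true mean is mu1 > mu0."""
    z_crit = NormalDist().inv_cdf(1 - alpha)       # rejection cutoff on the Z scale
    shift = (mu1 - mu0) / (sigma / sqrt(n))        # true effect in standard-error units
    return 1 - NormalDist().cdf(z_crit - shift)    # P(reject H0 | true mean is mu1)

# Hypothetical illustration values (not taken from the slides)
for n in (10, 25, 100):
    pw = power_one_sided_z(mu0=70, mu1=74, sigma=10, n=n, alpha=0.05)
    print(f"n = {n:3d}: power = {pw:.3f}, beta = {1 - pw:.3f}")
```

Under these assumed values, power rises and β falls as n grows, which is why a non-significant result from a small study may reflect a Type II error rather than a true null.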
Question #20
The mean cholesterol level of healthy year old males in the US is 211 mg/mL, with an unknown standard deviation. The average cholesterol level among 16 hypertensive males was 232 mg/mL. Conduct a statistical test at the 10% level about whether hypertension is associated with higher cholesterol levels (consult the STATA output below). Assuming that the α-level of the test is 10%, what can you conclude?

Number of obs =
Variable | Mean Std. Err. t P>|t| [95% Conf. Interval]
x |
Degrees of freedom: 15
Ho: mean(x) = 211
Ha: mean < 211      Ha: mean != 211      Ha: mean > 211
t =                 t =                  t =
P < t =             P > |t| =            P > t =
Question #20
This is a one-sample, one-sided t-test. Set up the hypotheses and select the relevant output:
H0: µ ≤ 211
HA: µ > 211
The alpha level is 0.10. The relevant p-value is the one-sided, upper-tail value (P > t in the output). Since p < .10, reject H0 and conclude that hypertensive males have higher cholesterol levels than the healthy population of US males.
Question #20
Assuming that the α-level of the test is 10%, what can you conclude?
A) The p-value of the test is and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a higher cholesterol level.
B) The p-value of the test is and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a lower cholesterol level.
C) The p-value of the test is and thus we fail to reject the null hypothesis and conclude that hypertension is not associated with higher cholesterol levels.
D) The p-value of the test is and thus we reject the null hypothesis and conclude that the mean cholesterol level for hypertensives is higher than healthy year old US males.
Question #21
Ten infants were involved in a study to compare the effectiveness of two medications for the treatment of diaper rash. For each baby, two areas of approximately the same size and rash severity were selected, and one area was treated with medication A and the other area was treated with medication B. The number of hours for the rash to disappear was recorded for each medication and each infant. The question of interest: is there sufficient evidence to conclude that a difference exists in the mean time required for elimination of the rash? The appropriate statistical methodology to use is:
A) A 2-sample (independent) t-test
B) A z-test
C) A paired t-test
D) A sensitivity analysis
Question #21
Medication A and Medication B are "matched" to the same infant. The samples are not independent, so a 2-sample (independent) t-test does not apply.
We were not given a known standard deviation σ, so a z-test does not apply.
A sensitivity analysis is typically a sub-analysis in which model parameters are changed to see whether the results remain consistent. This concept has not been covered in class and is not relevant to this question.
This study calls for a paired t-test.
Answer: C) A paired t-test
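A paired t-test is a one-sample t-test on the within-infant differences. The Python sketch below shows the arithmetic; the slides do not give the recorded hours, so the two data lists are made-up values used only to illustrate the calculation, and the variable names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# HYPOTHETICAL hours to rash disappearance for 10 infants (not from the slides)
hours_a = [48, 50, 47, 52, 45, 49, 51, 46, 50, 53]  # area treated with medication A
hours_b = [47, 48, 44, 50, 44, 47, 48, 44, 49, 50]  # area treated with medication B

# Pairing: take the difference within each infant, then analyze the differences
diffs = [a - b for a, b in zip(hours_a, hours_b)]
n = len(diffs)

d_bar = mean(diffs)          # mean difference
s_d = stdev(diffs)           # sample standard deviation of the differences
t_stat = d_bar / (s_d / sqrt(n))
df = n - 1

print(f"mean difference = {d_bar:.2f} hours, t = {t_stat:.3f} on {df} df")
```

The resulting statistic would be compared against a t distribution with n - 1 = 9 degrees of freedom, using a two-sided p-value because the question asks only whether a difference exists.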
GOOD LUCK!