STAT E102 Midterm Review, March 15, 2006

Review Topics
Populations, parameters, sampling
Types of data (nominal, ordinal, discrete, continuous)
Graphical/tabular representations of data
Summary statistics
  Central tendency: mean, median, mode
  Variability: variance, standard deviation, range, interquartile range

Review Topics
Normal distribution
  Standardization
  Z-statistic
Probability, conditional probability, Bayes’ Theorem
Diagnostic/screening tests
  Sensitivity, specificity, predictive value
Central Limit Theorem (CLT), sampling distributions

Review Topics
Hypothesis testing, p-values, alpha
  1- vs. 2-sided
T distribution, t-tests
  1- vs. 2-sample
  Independent vs. paired (for 2-sample)
Type I / Type II error
Power, sample size estimation

Question #1
A clinical trial is in development to compare a new drug for Progressive Multifocal Leukoencephalopathy (PML) to placebo. It is decided that a 2-sample independent t-test will compare the groups at the end of the trial with respect to a continuous endpoint. Early drafts of the study protocol estimate that the sample size would be 200 (100 per group). However, a recently completed study suggests that the variability of the endpoint in the two groups is larger than originally planned (i.e., larger than assumed in the original sample size calculations). The sample size is re-estimated using the new variability estimates from the recently completed trial. The result is:
A) A larger total sample size
B) A smaller total sample size
C) The sample size does not change
D) We cannot perform a t-test on these data any longer

Question #1
Sample size equation:
$n = \left[ \frac{(z_{\alpha/2} + z_{\beta})\,\sigma}{\mu_1 - \mu_0} \right]^2$
Variability is represented by $\sigma$ in the numerator. As variability (standard deviation $\sigma$) increases, n increases in proportion to $\sigma^2$.
Answer: A) A larger total sample size
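The effect of σ on n can be checked numerically. The sketch below evaluates the sample size equation for two standard deviations. The alpha, power, σ, and detectable difference used here are hypothetical planning values chosen only for illustration; the slides do not give them.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, delta, alpha=0.05, power=0.80):
    """n = [(z_{alpha/2} + z_beta) * sigma / delta]^2, the equation from the slide."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    return ((z_alpha + z_beta) * sigma / delta) ** 2


# Hypothetical planning values (assumptions, not from the slides)
delta = 5.0                                         # difference mu_1 - mu_0 to detect
n_original = sample_size(sigma=10.0, delta=delta)   # original variability assumption
n_revised = sample_size(sigma=15.0, delta=delta)    # larger variability from the new study

# Round up, since a fractional subject cannot be enrolled
print(ceil(n_original), ceil(n_revised))
```

Because σ enters the formula squared, multiplying σ by 1.5 multiplies n by 1.5² = 2.25, whatever the other inputs are.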

Question #2
T-tests are most useful for what type of data (variables)?
A) Continuous
B) Ordinal
C) Nominal
D) Binary

Question #2
T-tests are used for continuous data.
Recall:
Ordinal data have a natural order without defined magnitude (e.g., low, moderate, high)
Nominal data have categories with no order or rank (e.g., gender, race)
Binary data are discrete data with only two possible outcomes (e.g., cancer vs. no cancer)
Answer: A) Continuous

Question #3
The probabilities that a 25- to 34-year-old U.S. male’s cholesterol level belongs to one of the following intervals are listed. What is the probability that a male from this population has cholesterol <200 mg/dL?
A) 0.414
B) 0.567
C) 0.847
D) 0.280

Cholesterol (mg/dL)   Probability
80-119                0.012
120-159               0.141
160-199               0.414
200-239               0.280
240-279               0.108
280-319               0.032
320-359               0.008
360-399               0.005

Question #3
These categories are mutually exclusive, so their probabilities add.
P(<200) = P(80-119) + P(120-159) + P(160-199)
        = 0.012 + 0.141 + 0.414
        = 0.567
Answer: B) 0.567
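The addition rule for mutually exclusive intervals can be sketched in a few lines. The dictionary below copies the probabilities from the cholesterol table, keyed by each interval's lower bound; the variable names are illustrative only.

```python
# Probabilities from the cholesterol table: interval lower bound (mg/dL) -> probability
cholesterol = {
    80: 0.012,
    120: 0.141,
    160: 0.414,
    200: 0.280,
    240: 0.108,
    280: 0.032,
    320: 0.008,
    360: 0.005,
}

# The intervals do not overlap, so P(<200) is the sum over intervals starting below 200
p_below_200 = sum(p for lower, p in cholesterol.items() if lower < 200)

print(round(p_below_200, 3))
```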

Question #4
Use the cholesterol probabilities for 25- to 34-year-old U.S. males listed in the table for Question #3. Given that a person from this population has a serum cholesterol level <240 mg/dL, compute the conditional probability that he will have cholesterol <200 mg/dL.
A) 0.669
B) 0.567
C) 0.331
D) 0.280

Question #4
A = cholesterol <240
B = cholesterol <200
$P(B \mid A) = P(A \cap B) / P(A)$
$P(A \cap B)$ = P(<240 and <200) = P(<200) = 0.567 (calculated in #3)
$P(A)$ = P(<240) = 0.012 + 0.141 + 0.414 + 0.280 = 0.847
$P(B \mid A)$ = 0.567 / 0.847 = 0.669
Answer: A) 0.669
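The same conditional probability can be sketched in code. Only the four intervals below 240 mg/dL are needed; the probabilities are copied from the cholesterol table and the names are illustrative.

```python
# Intervals below 240 mg/dL: lower bound -> probability (from the cholesterol table)
below_240 = {80: 0.012, 120: 0.141, 160: 0.414, 200: 0.280}

p_a = sum(below_240.values())                                     # P(A) = P(<240)
p_a_and_b = sum(p for lower, p in below_240.items() if lower < 200)  # P(A and B) = P(<200)

# Definition of conditional probability: P(B | A) = P(A and B) / P(A)
p_b_given_a = p_a_and_b / p_a

print(round(p_a, 3), round(p_b_given_a, 3))
```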

Question #5
In the following table, the performance of a diagnostic test is summarized. Given that the prevalence of this condition in the overall population is 5%, please answer the following questions. What are the sensitivity and specificity of this test?
A) 0.734 and 0.918
B) 0.918 and 0.734
C) 0.833 and 0.813
D) 0.813 and 0.833

              + Diagnosis   - Diagnosis   Total
Disease                50            10      60
No Disease            103           449     552
Total                 153           459     612

Question #5
Sensitivity = $P(T^+ \mid D^+)$ = 50/60 = 0.833
Specificity = $P(T^- \mid D^-)$ = 449/552 = 0.813
Answer: C) 0.833 and 0.813
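As a quick check, sensitivity and specificity can be computed straight from the four cells of the diagnostic table. The variable names below are illustrative.

```python
# Cell counts from the diagnostic table
true_pos, false_neg = 50, 10     # diseased: test positive / test negative
false_pos, true_neg = 103, 449   # not diseased: test positive / test negative

# Sensitivity = P(T+ | D+): positives among the diseased
sensitivity = true_pos / (true_pos + false_neg)
# Specificity = P(T- | D-): negatives among the non-diseased
specificity = true_neg / (true_neg + false_pos)

print(round(sensitivity, 3), round(specificity, 3))
```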

Question #6
Use the diagnostic test table from Question #5, again with a prevalence of 5% in the overall population. What is the probability that a subject has the disease given that the test was positive?
A) 19%
B) 81%
C) 32%
D) 68%

Question #6
Prevalence = 0.05
Positive Predictive Value = $P(D^+ \mid T^+)$
Bayes’ Theorem:
$P(D^+ \mid T^+) = \frac{P(D^+)\,P(T^+ \mid D^+)}{P(D^+)\,P(T^+ \mid D^+) + P(D^-)\,P(T^+ \mid D^-)}$
= (Prev)(Sens) / [(Prev)(Sens) + (1-Prev)(1-Spec)]
= (0.05)(0.833) / [(0.05)(0.833) + (0.95)(0.187)]
= 0.042 / (0.042 + 0.178)
= 0.191
Answer: A) 19%
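The Bayes' Theorem calculation can be sketched as a small function. The function name `positive_predictive_value` is illustrative. Note that carrying unrounded sensitivity and specificity through gives about 0.190 rather than the 0.191 obtained from rounded intermediate values; both round to 19%.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(D+ | T+) by Bayes' Theorem."""
    true_pos_rate = prevalence * sensitivity                 # P(D+) * P(T+ | D+)
    false_pos_rate = (1 - prevalence) * (1 - specificity)    # P(D-) * P(T+ | D-)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Sensitivity and specificity from the diagnostic table, prevalence from the question
ppv = positive_predictive_value(
    prevalence=0.05,
    sensitivity=50 / 60,
    specificity=449 / 552,
)

print(round(ppv, 3))
```

The prevalence in the study table (60/612, about 10%) is not used here, because the question specifies the 5% population prevalence.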

Question #7
A small p-value for a hypothesis test (e.g., <0.05) signifies which of the following?
A) The conditional probability that the null hypothesis is true is <0.05
B) The probability that the alternative hypothesis is true is >0.95
C) The conditional probability of the data being this extreme if the null hypothesis were true is <0.05
D) The conditional probability of the data being this extreme if the alternative hypothesis were true is >0.05

Question #8
The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05.

Smokers: Number of obs = 75
Non-smokers: Number of obs = 121
------------------------------------------------------------------------------
    Variable |     Mean   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     Smokers |      4.1   .2309401   17.7535   0.0000     3.639842    4.560158
    Non-smok |      1.3   .1818182      7.15   0.0000     .9400127    1.659987
-------------+----------------------------------------------------------------
        diff |      2.8   .2939238   9.52628   0.0000     2.220304    3.379696
------------------------------------------------------------------------------
Degrees of freedom: ???

Ho: mean(smoker) - mean(non-smoker) = diff = 0

   Ha: diff < 0          Ha: diff != 0          Ha: diff > 0
    t = 9.5263            t = 9.5263             t = 9.5263
 P < t = 1.0000       P > |t| = 0.0000        P > t = 0.0000

What is the proper test of this hypothesis?
A) A z test
B) A one-sample t test
C) A paired t test
D) An unpaired t test

Question #9
For the carboxyhemoglobin experiment in Question #8 (smokers n1=75, non-smokers n2=121; smokers are believed to have the higher mean; identical underlying variances assumed; alpha=0.05), what are the null and alternative hypotheses for this test? $\mu_1$ is the mean carboxyhemoglobin level among smokers, and $\mu_2$ is the mean level among non-smokers.
A) $H_0: \mu_1 = \mu_2$; $H_A: \mu_1 \neq \mu_2$
B) $H_0: \mu_1 \leq \mu_2$; $H_A: \mu_1 \neq \mu_2$
C) $H_0: \mu_1 \leq \mu_2$; $H_A: \mu_1 > \mu_2$
D) $H_0: \mu_1 \geq \mu_2$; $H_A: \mu_1 < \mu_2$

Question #10
For the carboxyhemoglobin experiment in Question #8 (smokers n1=75, non-smokers n2=121; identical underlying variances assumed; alpha=0.05), what is the number of degrees of freedom associated with this test?
A) $df = n_1 - 1 = 74$
B) $df = n_2 - 1 = 121$
C) $df = n_1 + n_2 - 1 = 195$
D) $df = n_1 + n_2 - 2 = 194$

Question #10
This is a t-test of two independent samples with equal variances assumed, therefore:
$df = n_1 + n_2 - 2 = 75 + 121 - 2 = 194$
Answer: D) $df = n_1 + n_2 - 2 = 194$
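The degrees of freedom and the t statistic in the Stata output for Question #8 can be reproduced from its summary values. The sketch below assumes the group standard deviations can be back-calculated as Std. Err. × √n, since the output does not print them; the variable names are illustrative.

```python
from math import sqrt

# Summary values read from the Stata output (Mean and Std. Err. columns)
n1, mean1, se1 = 75, 4.1, 0.2309401     # smokers
n2, mean2, se2 = 121, 1.3, 0.1818182    # non-smokers

# Assumption: SDs are not printed, so recover them as SE * sqrt(n)
sd1 = se1 * sqrt(n1)
sd2 = se2 * sqrt(n2)

# Pooled-variance (equal variances) two-sample t-test
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_stat = (mean1 - mean2) / se_diff

print(df, round(se_diff, 4), round(t_stat, 4))
```

The recomputed standard error and t statistic agree with the `diff` row of the output, which supports reading the test as an unpaired, pooled-variance t-test.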

Question #11
Based on the output shown in Question #8, what is the decision?
A) Reject $H_0$; the carboxyhemoglobin levels are higher among non-smokers
B) Do not reject $H_0$; the carboxyhemoglobin levels are equal
C) Do not reject $H_0$; the carboxyhemoglobin levels are lower among non-smokers
D) Reject $H_0$; the carboxyhemoglobin levels are higher among smokers

Question #11
To ease interpretation, rearrange the hypotheses:
$H_0: \mu_1 \leq \mu_2$ can be rewritten as $H_0: \mu_1 - \mu_2 \leq 0$
$H_A: \mu_1 > \mu_2$ can be rewritten as $H_A: \mu_1 - \mu_2 > 0$
Now select the matching part of the output: the one-sided result for Ha: diff > 0, where t = 9.5263 and P > t = 0.0000.
p < 0.05; reject $H_0$ and conclude that smokers have higher carboxyhemoglobin levels than non-smokers.
Answer: D) Reject $H_0$; the carboxyhemoglobin levels are higher among smokers

Question #12
Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The results for one particular hour, based on N=17 samples, are a mean of 8.42 and a standard deviation of 0.16. Interest focuses on whether there is sufficient evidence to conclude that the mean pH level in the water differs from 8.5. The hypotheses are then:
A) $H_0: \mu = 8.5$, $H_A: \mu \neq 8.5$
B) $H_0: \mu \neq 8.5$, $H_A: \mu = 8.5$
C) $H_0: \mu = 0$, $H_A: \mu \neq 0$
D) $H_0: \mu = 0$, $H_A: \mu \neq 8.5$

Question #13
For the water treatment plant in Question #12 (target pH of 8.5; N=17 samples in one hour with a mean of 8.42 and a standard deviation of 0.16), the degrees of freedom for this test are:
A) 7
B) 16
C) 8.5
D) Not applicable

Question #13
This is a one-sample t-test.
*NOTE* Do not be confused by the use of the word “samples” in the set-up. It refers to the number of water samples (observations), not to the number of groups being compared.
$df = n - 1 = 17 - 1 = 16$
Answer: B) 16

Question #14
For the water treatment plant in Question #12 (target pH of 8.5; N=17 samples in one hour with a mean of 8.42 and a standard deviation of 0.16), is there sufficient evidence to conclude that the mean pH level in the water differs from 8.5 at the 0.05 level of significance?
A) Yes, because the sample mean was 8.42
B) Yes, since p<0.05
C) No, since p>0.05
D) No, since the sample mean (8.42) is practically equal to 8.5

Question #14
Carry out the one-sample, two-sided t-test at alpha=0.05:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{8.42 - 8.5}{0.16 / \sqrt{17}} = -2.062$, with 16 df
From the t table, $2(0.025) < p < 2(0.05)$, i.e., $0.05 < p < 0.10$, so p>0.05; do not reject $H_0$.
In Stata: ttesti 17 8.42 0.16 8.5 gives p=0.0559; do not reject $H_0$.
Answer: C) No, since p>0.05
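The t statistic and its two-sided p-value can also be sketched without a statistics package. The Python standard library has no t distribution, so the sketch below integrates the Student t density numerically with Simpson's rule; that integration approach and the helper names `t_pdf` and `two_sided_p` are choices made here for illustration, not what Stata does internally.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 2 * P(T > |t|), via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 2 * (0.5 - area_0_to_t)


# Values from the pH example
n, x_bar, s, mu_0 = 17, 8.42, 0.16, 8.5

t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1
p_value = two_sided_p(t_stat, df)

print(round(t_stat, 3), df, round(p_value, 4))
```

The result falls between 0.05 and 0.10, consistent with the t-table bounds, so the null hypothesis is not rejected at alpha=0.05.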

Question #15
A clinical trial is conducted to compare two treatments for asthma. Asthmatics are split into two groups, with one group receiving Treatment A and the other group receiving Treatment B. At the end of the trial, the groups are compared with respect to FEV (forced expiratory volume). Differences found at the end of the trial may be due to:
A) Treatment effect differences
B) Chance
C) Group differences with respect to other variables (e.g., sex, age, race, severity of asthma, etc.)
D) All of the above

Question #16
A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg. What is the probability that a randomly chosen woman will have diastolic blood pressure lower than 89.6 mm Hg?
A) 2.5%
B) 5%
C) 97.5%
D) 95%

Question #16
$Z = (X - \mu) / \sigma = (89.6 - 70) / 10 = 1.96$
$P(X < 89.6) = P(Z < 1.96) = 1 - P(Z > 1.96) = 1 - 0.025 = 0.975$
Or, in Stata: display normprob(1.96) returns 0.975
Answer: C) 97.5%
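The standardization step and the normal probability can be sketched with Python's standard library, where `NormalDist().cdf` plays the role of the Z table. The variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 70.0, 10.0   # population mean and SD of diastolic blood pressure (mm Hg)
x = 89.6                 # cutoff of interest

# Standardize, then look up the standard normal cumulative probability
z = (x - mu) / sigma
p_below = NormalDist().cdf(z)

print(round(z, 2), round(p_below, 3))
```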

Question #17
A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg. Suppose that you measure the diastolic blood pressure in n=25 women. What is the sampling distribution of $\bar{X}_{25}$, the mean diastolic blood pressure of the women in the sample?
A) Normal, $\mu_{\bar{x}}$ = 70 mm Hg and $\sigma_{\bar{x}}$ = 10 mm Hg
B) Normal, $\mu_{\bar{x}}$ = 7 mm Hg and $\sigma_{\bar{x}}$ = 10 mm Hg
C) Normal, $\mu_{\bar{x}}$ = 70 mm Hg and $\sigma_{\bar{x}}$ = 2 mm Hg
D) Normal, $\mu_{\bar{x}}$ = 7 mm Hg and $\sigma_{\bar{x}}$ = 2 mm Hg

Question #17
If the underlying distribution is normal, the sampling distribution of the mean is also normal.
$\mu_{\bar{x}} = \mu = 70$ mm Hg
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 10 / \sqrt{25} = 2$ mm Hg
Answer: C) Normal, $\mu_{\bar{x}}$ = 70 mm Hg and $\sigma_{\bar{x}}$ = 2 mm Hg
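A short simulation can make the sampling distribution concrete. The sketch below computes the standard error from the formula and then draws repeated samples of 25 women to see how their means vary. The number of replications and the random seed are arbitrary choices made here, not part of the original problem.

```python
import random
from math import sqrt
from statistics import mean, stdev

mu, sigma, n = 70.0, 10.0, 25

# Standard error of the mean from the formula sigma / sqrt(n)
se = sigma / sqrt(n)

# Simulation: draw many samples of size n and record each sample mean
rng = random.Random(0)   # fixed seed (arbitrary) so the run is repeatable
replications = 5000      # arbitrary number of simulated samples
sample_means = [
    mean(rng.gauss(mu, sigma) for _ in range(n))
    for _ in range(replications)
]

# The simulated means should center near mu with spread near se
print(se, round(mean(sample_means), 2), round(stdev(sample_means), 2))
```

The simulated values only approximate 70 and 2; the formula gives the exact parameters of the sampling distribution.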

Question #18
A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg. What is the lower 5th percentile of the sampling distribution from Question #17 (i.e., the point below which the mean diastolic blood pressure of a sample of 25 women will fall 5% of the time)?
A) 66.08 mm Hg
B) 66.71 mm Hg
C) 50.40 mm Hg
D) 27.55 mm Hg

Question #18
From the Z table, we know that Z = -1.645 cuts off the lower 5th percentile. Alternatively, we can get this in STATA: display invnorm(.05)
Rearrange the Z equation and solve for X bar:
Z = (X bar - µ_xbar) / σ_xbar
X bar = (Z)(σ_xbar) + µ_xbar = (-1.645)(2) + 70 = 66.71
Answer: B) 66.71 mmHg
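For readers without STATA at hand, the same percentile can be reproduced with Python's standard library. This is a sketch, not the course's method; `NormalDist` comes from the `statistics` module (Python 3.8 or later), and the variable names are illustrative.

```python
from statistics import NormalDist

# Sampling distribution of the mean from Question #17: Normal(70, 2).
xbar_dist = NormalDist(mu=70.0, sigma=2.0)

# Z value that cuts off the lower 5% of the standard normal distribution.
z_05 = NormalDist().inv_cdf(0.05)

# Lower 5th percentile of the sampling distribution,
# equivalent to z_05 * sigma_xbar + mu_xbar.
percentile_5 = xbar_dist.inv_cdf(0.05)

print(round(z_05, 3), round(percentile_5, 2))
```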

Question #19
Consider the following: a hypothesis test was performed. The resulting p-value was higher than the pre-specified alpha and thus there was a failure to reject the null hypothesis. Which of the following are correct?
A) The null hypothesis must be true
B) An evaluation of power and type II error may be warranted
C) It is still feasible that the alternative hypothesis is true
D) Both B and C

Question #19
We haven't proven the null by failing to reject it. It is still possible that the alternative is actually true, so A is incorrect and C is correct.
Recall that power (1-β) is the probability of rejecting the null when it is false. β is the probability of making a Type II error (failing to reject the null when it is false).
In this case it is possible that statistical power was too low, leading to a Type II error in which we failed to reject H0 when it is actually false. An evaluation of power is appropriate, so B is also correct.
Answer: D) Both B and C
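To make the link between power and Type II error concrete, here is a minimal Python sketch for a one-sided z-test. Every number in it is a hypothetical chosen for illustration (σ = 10, α = 0.05, a true shift of 3 units, and sample sizes of 25 and 100); none comes from the question, and the function name is made up.

```python
import math
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha):
    """Power of an upper-tailed z-test when the true mean exceeds the null mean by delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)     # rejection cutoff on the Z scale
    shift = delta / (sigma / math.sqrt(n))     # true shift in standard-error units
    return 1 - std_normal.cdf(z_crit - shift)  # P(reject H0 | H0 is false)

# Hypothetical scenarios: same effect size, different sample sizes.
power_small = z_test_power(delta=3.0, sigma=10.0, n=25, alpha=0.05)
power_large = z_test_power(delta=3.0, sigma=10.0, n=100, alpha=0.05)

for n, power in ((25, power_small), (100, power_large)):
    print(f"n = {n}: power = {power:.2f}, beta = {1 - power:.2f}")
```

With the smaller sample, the test misses a real effect more often than it detects it, which is why a non-significant result calls for a look at power before drawing conclusions.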

Question #20
The mean cholesterol level of healthy 20-74 year old males in the US is 211 mg/mL, with an unknown standard deviation. The average cholesterol level among 16 hypertensive males was 232 mg/mL. Conduct a statistical test at the 10% level of whether hypertension is associated with higher cholesterol levels (consult the STATA output below). Assuming that the α-level of the test is 10%, what can you conclude?

Number of obs = 16
------------------------------------------------------------------------------
Variable |     Mean   Std. Err.        t    P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      232         12   19.3333   0.0000     206.4226    257.5774
------------------------------------------------------------------------------
Degrees of freedom: 15

                          Ho: mean(x) = 211

  Ha: mean < 211          Ha: mean != 211           Ha: mean > 211
    t =  1.7500             t =  1.7500               t =  1.7500
  P < t =  0.9497         P > |t| =  0.1005         P > t =  0.0503

Question #20
This is a one-sample, one-sided t-test. Set up the hypotheses and select the relevant output:
H0: µ ≤ 211
HA: µ > 211
The alpha level is 0.10. The p-value for the upper-tailed alternative is 0.0503. Since p < .10, reject H0 and conclude that hypertensive males have a higher mean cholesterol level than the healthy population of US males ages 20-74.
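The test statistic and p-values in the STATA output can be reproduced by hand. The Python sketch below uses only the standard library, so the t-distribution tail area is approximated by numerical integration (Simpson's rule) instead of a statistics package; the helper names `t_pdf` and `t_upper_tail` are illustrative, not from the course.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the probability lies above 0; subtract the area between 0 and t.
    return 0.5 - total * h / 3

# Values from the question and the STATA output.
sample_mean, null_mean, std_err, n = 232.0, 211.0, 12.0, 16
alpha = 0.10

t_stat = (sample_mean - null_mean) / std_err   # (232 - 211) / 12
df = n - 1
p_one_sided = t_upper_tail(t_stat, df)         # matches "P > t" in the output
p_two_sided = 2 * p_one_sided                  # matches "P > |t|" in the output
reject = p_one_sided < alpha

print(f"t = {t_stat:.4f}, df = {df}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}, reject H0: {reject}")
```

The comparison shows why the choice of alternative matters here: the one-sided p-value falls below 0.10, while the two-sided p-value does not.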

Question #20
Assuming that the α-level of the test is 10%, what can you conclude?
A) The p-value of the test is 0.1005 and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a higher cholesterol level.
B) The p-value of the test is 0.9497 and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a lower cholesterol level.
C) The p-value of the test is 0.0503 and thus we fail to reject the null hypothesis and conclude that hypertension is not associated with higher cholesterol levels.
D) The p-value of the test is 0.0503 and thus we reject the null hypothesis and conclude that the mean cholesterol level for hypertensives is higher than healthy 20-74 year old US males.
Answer: D

Question #21
Ten infants were involved in a study to compare the effectiveness of two medications for the treatment of diaper rash. For each baby, two areas of approximately the same size and rash severity were selected, and one area was treated with medication A and the other area was treated with medication B. The number of hours for the rash to disappear was recorded for each medication and each infant. The question of interest: is there sufficient evidence to conclude that a difference exists in the mean time required for elimination of the rash? The appropriate statistical methodology to use is:
A) A 2-sample (independent) t-test
B) A z-test
C) A paired t-test
D) A sensitivity analysis

Question #21
Medication A and Medication B are "matched" to the same infant, so the samples are not independent. This rules out the 2-sample (independent) t-test.
We were not given a known standard deviation σ, which rules out the z-test.
A sensitivity analysis is typically a sub-analysis in which model parameters are changed to see whether the results remain consistent. This concept has not been covered in class and is not relevant to this question.
This study calls for a paired t-test.
Answer: C) A paired t-test
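A paired t-test reduces the two measurements per infant to a single difference and then runs a one-sample t-test on those differences. The Python sketch below shows the mechanics; the hours listed are made-up illustrative values, since the question gives no data, and the variable names are hypothetical.

```python
import math
from statistics import mean, stdev

# HYPOTHETICAL hours until the rash disappeared, one pair per infant.
hours_a = [46, 50, 46, 51, 43, 45, 47, 48, 46, 48]
hours_b = [43, 49, 48, 47, 40, 40, 47, 46, 43, 45]

# Step 1: one difference per infant (A minus B).
diffs = [a - b for a, b in zip(hours_a, hours_b)]

# Step 2: one-sample t statistic on the differences, testing mean difference = 0.
n = len(diffs)
mean_diff = mean(diffs)
se_diff = stdev(diffs) / math.sqrt(n)   # sample SD of the differences over sqrt(n)
t_stat = mean_diff / se_diff
df = n - 1                              # number of pairs minus one, not 2n - 2

print(f"mean difference = {mean_diff:.2f} hours, t = {t_stat:.2f}, df = {df}")
```

The degrees of freedom are based on the number of pairs (10 infants, so 9), which is one practical way the paired analysis differs from the independent 2-sample test.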

GOOD LUCK!
