STAT E102 Midterm Review March 15, 2006

Review Topics
- Populations, parameters, sampling
- Types of data, i.e., nominal, ordinal, discrete, continuous
- Graphical/tabular representations of data
- Summary statistics
  - Central tendency: mean, median, mode
  - Variability: variance, standard deviation, range, interquartile range

Review Topics
- Normal distribution
- Standardization
- Z-statistic
- Probability, conditional probability, Bayes’ Theorem
- Diagnostic/screening tests
- Sensitivity, specificity, predictive value
- CLT, sampling distributions

Review Topics
- Hypothesis testing, p-values, alpha
- 1- vs. 2-sided
- T distribution, t-tests
- 1- vs. 2-sample
- Independent vs. paired (for 2-sample)
- Type I / Type II error
- Power, sample size estimation

Question #1
A clinical trial is in development to compare a new drug for Progressive Multifocal Leukoencephalopathy (PML) with placebo. At the end of the trial, a 2-sample independent t-test will compare the groups with respect to a continuous endpoint. Early drafts of the study protocol estimate a sample size of 200 (100 per group). However, a recently completed study suggests that the variability of the endpoint in the two groups is larger than originally planned (i.e., larger than assumed in the original sample size calculations). The sample size is re-estimated using the new variability estimates from the recently completed trial. The result is:
A) A larger total sample size
B) A smaller total sample size
C) The sample size does not change
D) We cannot perform a t-test on these data any longer

Question #1 (answer)
Sample size equation: $n = \left[\frac{(z_{\alpha/2} + z_{\beta})\,\sigma}{\mu_1 - \mu_0}\right]^2$
Variability is represented by σ in the numerator. As the variability (standard deviation σ) increases, n increases.
Answer: A) A larger total sample size
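A minimal Python sketch of the sample size equation above, using only the standard library (`statistics.NormalDist`, Python 3.8+). The function name `sample_size`, the default power of 0.80, and the σ and difference values in the usage loop are illustrative assumptions, not figures from the PML trial. Two-sample designs are often planned with a per-group version of this formula that includes an extra factor of 2; the sketch follows the slide's formula as written.

```python
import math
from statistics import NormalDist

def sample_size(sigma, delta, alpha=0.05, power=0.80):
    """n = [(z_{alpha/2} + z_beta) * sigma / delta]^2, rounded up to a whole subject."""
    std_normal = NormalDist()                    # N(0, 1)
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical inputs: same difference to detect (delta), larger standard deviation
for sigma in (10, 15):
    print(f"sigma = {sigma}: n = {sample_size(sigma, delta=5)}")
```

Holding the difference fixed and raising σ from 10 to 15 raises n, which is the behavior behind answer A.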

Question #2
T-tests are most useful for what type of data (variables)?
A) Continuous
B) Ordinal
C) Nominal
D) Binary

Question #2 (answer)
T-tests are used for continuous data. Recall:
- Ordinal data have a natural order without a defined magnitude (e.g., low, moderate, high)
- Nominal data have categories with no order or rank (e.g., gender, race)
- Binary data are discrete data with only two possible outcomes (e.g., cancer vs. no cancer)
Answer: A) Continuous

Question #3
The probabilities that a 25- to 34-year-old U.S. male’s cholesterol level belongs to each of the following intervals are listed (table columns: Cholesterol (mg/dl), Probability). What is the probability that a male from this population has cholesterol <200 mg/dl?

Question #3 (answer)
These categories are mutually exclusive, so P(<200) is the sum of the probabilities of the three intervals below 200 mg/dl, starting with P(80–119):
P(<200) = P(80–119) + P(…) + P(…) = 0.567

Question #4
The probabilities that a 25- to 34-year-old U.S. male’s cholesterol level belongs to each of the following intervals are listed (table columns: Cholesterol (mg/dl), Probability). Given that a person from this population has a serum cholesterol level <240 mg/dl, compute the conditional probability that he will have cholesterol <200 mg/dl.

Question #4 (answer)
Let A = cholesterol <240 and B = cholesterol <200. Then
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
Every male with cholesterol <200 also has cholesterol <240, so $A \cap B = B$ and $P(A \cap B) = P(<240 \text{ and } <200) = P(<200) = 0.567$ (calculated in #3).
$P(A) = P(<240)$, the sum of the probabilities of all intervals below 240 mg/dl.
$P(B \mid A) = 0.567 / P(<240)$
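The two calculations in Questions #3 and #4 (adding mutually exclusive interval probabilities, then dividing for a conditional probability) can be sketched in Python as below. The interval table `TOY_TABLE` holds made-up numbers purely for illustration; it is not the slide's cholesterol table, and the helper names are hypothetical.

```python
# Hypothetical interval table: (lower, upper) in mg/dl -> probability.
# These numbers are made up for illustration; they are NOT the slide's table.
TOY_TABLE = {
    (80, 119): 0.10,
    (120, 159): 0.20,
    (160, 199): 0.30,
    (200, 239): 0.20,
    (240, 279): 0.15,
    (280, 319): 0.05,
}

def prob_below(table, cutoff):
    """P(X < cutoff): add the probabilities of the mutually exclusive
    intervals whose upper limit lies below the cutoff."""
    return sum(p for (lo, hi), p in table.items() if hi < cutoff)

def conditional_below(table, b_cutoff, a_cutoff):
    """P(X < b_cutoff | X < a_cutoff) for b_cutoff <= a_cutoff.
    Since {X < b_cutoff} is contained in {X < a_cutoff}, P(A and B) = P(B)."""
    return prob_below(table, b_cutoff) / prob_below(table, a_cutoff)

print(round(prob_below(TOY_TABLE, 200), 3))
print(round(conditional_below(TOY_TABLE, 200, 240), 3))
```

Substituting the real table's probabilities for the toy values would reproduce P(<200) = 0.567 and the conditional probability asked for in #4.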

Question #5
In the following table, the performance of a diagnostic test is summarized. Given that the prevalence of this condition in the overall population is 5%, please answer the following questions. What are the sensitivity and specificity of this test?

Disease status | + Diagnosis | - Diagnosis | Total
Disease | 50 | 10 | 60
No Disease | 103 | 449 | 552
Total | 153 | 459 | 612

Question #5 (answer)
Sensitivity = P(T+ | D+) = 50/60 = 0.833
Specificity = P(T- | D-) = 449/552 = 0.813
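The two ratios above can be computed from the 2x2 counts with a small Python helper. The function name is hypothetical; the true positives (50) and true negatives (449) come from the slide, while the false negatives (60 - 50 = 10) and false positives (552 - 449 = 103) are derived from the row totals.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from 2x2 diagnostic-test counts."""
    sensitivity = tp / (tp + fn)  # P(T+ | D+): positive tests among the diseased
    specificity = tn / (tn + fp)  # P(T- | D-): negative tests among the non-diseased
    return sensitivity, specificity

# tp and tn from the slide; fn = 60 - 50 and fp = 552 - 449 derived from the totals
sens, spec = sensitivity_specificity(tp=50, fn=10, fp=103, tn=449)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```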

Question #6
For the diagnostic test summarized in the table for Question #5, and given that the prevalence of this condition in the overall population is 5%, what is the probability that a subject has the disease given that the test was positive?
A) 19%
B) 81%
C) 32%
D) 68%

Question #6 (answer)
Prevalence = 0.05; Positive Predictive Value = P(D+ | T+)
Bayes’ Theorem:
$P(D^+ \mid T^+) = \frac{P(D^+)\,P(T^+ \mid D^+)}{P(D^+)\,P(T^+ \mid D^+) + P(D^-)\,P(T^+ \mid D^-)} = \frac{(\text{Prev})(\text{Sens})}{(\text{Prev})(\text{Sens}) + (1-\text{Prev})(1-\text{Spec})}$
$= \frac{(0.05)(0.833)}{(0.05)(0.833) + (0.95)(0.187)} = \frac{0.042}{0.042 + 0.178} \approx 0.19$
Answer: A) 19%
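The Bayes’ Theorem calculation above translates directly into a short Python function; the function name is hypothetical, and the inputs are the 5% prevalence and the sensitivity and specificity from Question #5. Working with the unrounded fractions avoids the small rounding drift of the hand calculation.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Bayes' theorem: P(D+ | T+) from prevalence, sensitivity and specificity."""
    true_pos = prevalence * sensitivity               # P(D+) P(T+ | D+)
    false_pos = (1 - prevalence) * (1 - specificity)  # P(D-) P(T+ | D-)
    return true_pos / (true_pos + false_pos)

ppv = positive_predictive_value(0.05, 50 / 60, 449 / 552)
print(f"PPV = {ppv:.2f}")
```

Because the false-positive term is weighted by 1 - prevalence = 0.95, most positive tests come from the much larger disease-free group, which is why the PPV is low despite a sensitivity above 80%.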

Question #7
A small p-value for a hypothesis test (i.e., <0.05) signifies which of the following?
A) The conditional probability that the null hypothesis is true is <0.05
B) The probability that the alternative hypothesis is true is >0.95
C) The conditional probability of the data being this extreme if the null hypothesis were true is <0.05
D) The conditional probability of the data being this extreme if the alternative hypothesis were true is >0.05

Question #7 (answer)
The p-value is computed assuming the null hypothesis is true; it is not the probability that H0 (or HA) is true.
Answer: C) The conditional probability of the data being this extreme if the null hypothesis were true is <0.05
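To make "the data being this extreme if the null hypothesis were true" concrete, here is a minimal Monte Carlo sketch in Python. It draws test statistics from their null distribution (assumed here to be standard normal, i.e., a z statistic) and counts how often they are at least as extreme as an observed value. The observed value 1.96, the seed, and the number of simulations are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist

def simulated_two_sided_p(observed_z, n_sims=100_000, seed=1):
    """Fraction of null-distribution draws (Z ~ N(0, 1)) at least as extreme as observed_z."""
    rng = random.Random(seed)
    hits = sum(abs(rng.gauss(0.0, 1.0)) >= abs(observed_z) for _ in range(n_sims))
    return hits / n_sims

z_obs = 1.96  # hypothetical observed test statistic
print(f"simulated p = {simulated_two_sided_p(z_obs):.3f}")
print(f"exact p     = {2 * (1 - NormalDist().cdf(z_obs)):.3f}")
```

Every draw is generated under H0, so the resulting fraction is a probability about the data given H0; nothing in the calculation estimates how likely H0 itself is.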

Question #8
The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers is higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05.

Smokers: Number of obs = 75
Non-smokers: Number of obs = 121
Variable | Mean   Std. Err.   t   P>|t|   [95% Conf. Interval]
Smokers  |
Non-smok |
diff     |
Degrees of freedom: ???
Ho: mean(smoker) - mean(non-smoker) = diff = 0
Ha: diff < 0        Ha: diff != 0        Ha: diff > 0
t =                 t =                  t =
P < t =             P > |t| =            P > t =

Question #8
What is the proper test of this hypothesis?
A) A z test
B) A one-sample t test
C) A paired t test
D) An unpaired t test

Question #8 (answer)
Smokers and non-smokers form two independent groups, the population standard deviations are unknown, and the variances are assumed equal, so the comparison calls for a two-sample (unpaired) t test with a pooled variance.
Answer: D) An unpaired t test

Question #9
For the smokers vs. non-smokers carboxyhemoglobin experiment, what are the null and alternative hypotheses for this test? µ1 is the mean carboxyhemoglobin level among smokers, and µ2 is the mean level among non-smokers.
A) H0: µ1 = µ2; HA: µ1 ≠ µ2
B) H0: µ1 ≤ µ2; HA: µ1 ≠ µ2
C) H0: µ1 ≤ µ2; HA: µ1 > µ2
D) H0: µ1 ≥ µ2; HA: µ1 < µ2

Question #9 (answer)
The researchers believe smokers have the higher mean, so that claim goes in the alternative hypothesis and the test is one-sided.
Answer: C) H0: µ1 ≤ µ2; HA: µ1 > µ2

Question #10
For the same experiment (n1 = 75 smokers, n2 = 121 non-smokers, equal variances assumed), what is the number of degrees of freedom associated with this test?
A) df = n1 – 1 = 74
B) df = n2 – 1 = 120
C) df = n1 + n2 – 1 = 195
D) df = n1 + n2 – 2 = 194

Question #10 (answer)
This is a t-test of two independent samples with a pooled variance, therefore:
df = n1 + n2 – 2 = 75 + 121 – 2 = 194
Answer: D) df = n1 + n2 – 2 = 194
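The pooled-variance (equal-variance) two-sample t test behind Questions #8 to #11 can be sketched in Python as follows. The group sizes 75 and 121 come from the slides; the means and standard deviations in the usage line are hypothetical placeholders, not the study's values, and the function name `pooled_t` is illustrative.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their own df
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of mean1 - mean2
    return (mean1 - mean2) / se, df

# Group sizes from the slides; means and SDs are hypothetical placeholders
t, df = pooled_t(mean1=4.0, sd1=2.0, n1=75, mean2=1.5, sd2=1.5, n2=121)
print(f"t = {t:.2f} on {df} df")
```

For the one-sided alternative HA: µ1 > µ2, H0 is rejected at alpha = 0.05 when t exceeds the upper 5% critical value of the t distribution with 194 df, or equivalently when the one-sided p-value (P > t in the output) is below 0.05.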

Question #11
Based on the output, what is the decision?
A) Reject H0; the carboxyhemoglobin levels are higher among non-smokers
B) Do not reject H0; the carboxyhemoglobin levels are equal
C) Do not reject H0; the carboxyhemoglobin levels are lower among non-smokers
D) Reject H0; the carboxyhemoglobin levels are higher among smokers

Question #11 (answer)
To ease interpretation, rearrange the hypotheses:
H0: µ1 ≤ µ2 can be rewritten as H0: µ1 - µ2 ≤ 0
HA: µ1 > µ2 can be rewritten as HA: µ1 - µ2 > 0
Now select the matching one-sided column of the output, Ha: diff > 0 (P > t). Since p < 0.05, reject H0 and conclude that smokers have higher carboxyhemoglobin levels than non-smokers.

Answer: D) Reject H0; the carboxyhemoglobin levels are higher among smokers

Question #12
Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The results for one particular hour, based on N=17 samples, are a mean of 8.42 and a standard deviation of 0.16. Interest focuses on whether there is sufficient evidence to conclude that the mean pH level in the water differs from 8.5. The hypotheses are then:
A) H0: µ = 8.5, HA: µ ≠ 8.5
B) H0: µ ≠ 8.5, HA: µ = 8.5
C) H0: µ = 0, HA: µ ≠ 0
D) H0: µ = 0, HA: µ ≠ 8.5

Question #12 (answer)
The question asks whether the mean differs from the target of 8.5 in either direction, so the test is two-sided and the equality belongs in the null hypothesis.
Answer: A) H0: µ = 8.5, HA: µ ≠ 8.5

Question #13
For the same pH data (N=17 samples, mean 8.42, standard deviation 0.16), the degrees of freedom for this test are:
A) 7
B) 16
C) 8.5
D) Not applicable

Question #13 (answer)
This is a one-sample t-test. *NOTE* Do not be confused by the use of the word “samples” in the set-up; it refers to the number of water samples.
df = n – 1 = 17 – 1 = 16
Answer: B) 16

Question #14 Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5, as most facilities try to maintain a slightly alkaline level. The results for one particular hour, based on N = 17 samples, are a mean of 8.42 and a standard deviation of 0.16. Is there sufficient evidence to conclude that the mean pH level in the water differs from 8.5 at the 0.05 level of significance?
A) Yes, because the sample mean was 8.42
B) Yes, since p < 0.05
C) No, since p > 0.05
D) No, since the sample mean (8.42) is practically equal to 8.5

Question #14 Carry out the one-sample, two-sided t-test at α = 0.05:
t = (x̄ − µ0) / (s / √n) = (8.42 − 8.5) / (0.16 / √17) = −2.06, with 16 df.
From the t table with 16 df, the one-tailed critical values are 1.746 (upper-tail area 0.05) and 2.120 (upper-tail area 0.025). Because |t| = 2.06 lies between them, 2(0.025) < p < 2(0.05), i.e. 0.05 < p < 0.10. Since p > 0.05, do not reject H0.
In STATA: ttesti 17 8.42 .16 8.5 gives p = 0.0559; do not reject H0.
Is there sufficient evidence to conclude that the mean pH level in the water differs from 8.5 at the 0.05 level of significance?
A) Yes, because the sample mean was 8.42
B) Yes, since p < 0.05
C) No, since p > 0.05
D) No, since the sample mean (8.42) is practically equal to 8.5
Answer: C) No, since p > 0.05
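As a cross-check on the STATA result, the sketch below recomputes the Question #14 test in Python using only the standard library. It assumes s = 0.16, the value used in the worked solution. The standard library has no t distribution, so the hypothetical helpers `t_pdf` and `t_two_sided_p` integrate the t density numerically with Simpson's rule. This is an approximation chosen for self-containment, not STATA's own algorithm.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(|T| >= |t|) by integrating the density over [0, |t|].

    Uses composite Simpson's rule; `steps` must be even.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2         # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_pdf(i * h, df)
    central_half = total * h / 3           # P(0 <= T <= |t|)
    return 1 - 2 * central_half            # both tails, by symmetry


# Question #14: one-sample, two-sided t-test of H0: mu = 8.5
n, xbar, s, mu0, alpha = 17, 8.42, 0.16, 8.5, 0.05
se = s / math.sqrt(n)                      # standard error of the mean
t = (xbar - mu0) / se
df = n - 1
p = t_two_sided_p(t, df)
reject = p < alpha
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}, reject H0: {reject}")
```

The printed p-value should land close to STATA's 0.0559 and inside the 0.05 to 0.10 bracket found from the table.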

Question #15 A clinical trial is conducted to compare two treatments for asthma. Asthmatics are split into two groups, with one group receiving Treatment A and the other receiving Treatment B. At the end of the trial, the groups are compared with respect to FEV (forced expiratory volume). Differences found at the end of the trial may be due to:
A) Treatment effect differences
B) Chance
C) Group differences with respect to other variables (e.g., sex, age, race, severity of asthma)
D) All of the above

Question #15 Answer: D) All of the above. An end-of-trial difference in FEV can have three sources: a true difference between the treatments, chance variation, or imbalance between the groups in other variables such as sex, age, race, or asthma severity.

Question #16 A large study has determined that diastolic blood pressure among women aged … is normally distributed with mean µ = 70 mmHg and standard deviation σ = 10 mmHg. What is the probability that a randomly chosen woman will have a diastolic blood pressure lower than 89.6 mmHg?
A) 2.5%
B) 5%
C) 97.5%
D) 95%

Question #16 Z = (X − µ) / σ = (89.6 − 70) / 10 = 1.96
P(X < 89.6) = P(Z < 1.96) = 1 − 0.025 = 0.975
Or, in STATA: display normprob(1.96), which returns approximately 0.975.
What is the probability that a randomly chosen woman will have a diastolic blood pressure lower than 89.6 mmHg?
A) 2.5%
B) 5%
C) 97.5%
D) 95%
Answer: C) 97.5%
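The same probability can be checked in Python. The sketch below uses `statistics.NormalDist` from the standard library in place of the STATA function, once on the standardized value and once directly on the mmHg scale. The two results should match.

```python
from statistics import NormalDist

mu, sigma, x = 70, 10, 89.6               # mmHg, from Question #16

z = (x - mu) / sigma                      # standardize: about 1.96
p_std = NormalDist().cdf(z)               # standard normal CDF, the role normprob(z) plays in STATA
p_raw = NormalDist(mu, sigma).cdf(x)      # same probability without standardizing
print(f"z = {z:.2f}, P(X < {x}) = {p_std:.4f} (direct: {p_raw:.4f})")
```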

Question #17 A large study has determined that diastolic blood pressure among women aged … is normally distributed with mean µ = 70 mmHg and standard deviation σ = 10 mmHg. Suppose that you measure the diastolic blood pressure in n = 25 women. What is the sampling distribution of X̄25, the mean diastolic blood pressure of the women in the sample?
A) Normal, µx̄ = 70 mmHg and σx̄ = 10 mmHg
B) Normal, µx̄ = 7 mmHg and σx̄ = 10 mmHg
C) Normal, µx̄ = 70 mmHg and σx̄ = 2 mmHg
D) Normal, µx̄ = 7 mmHg and σx̄ = 2 mmHg

Question #17 If the underlying distribution is normal, the sampling distribution of the sample mean is also normal, for any sample size.
µx̄ = µ = 70 mmHg
σx̄ = σ / √n = 10 / √25 = 2 mmHg
Suppose that you measure the diastolic blood pressure in n = 25 women. What is the sampling distribution of X̄25, the mean diastolic blood pressure of the women in the sample?
A) Normal, µx̄ = 70 mmHg and σx̄ = 10 mmHg
B) Normal, µx̄ = 7 mmHg and σx̄ = 10 mmHg
C) Normal, µx̄ = 70 mmHg and σx̄ = 2 mmHg
D) Normal, µx̄ = 7 mmHg and σx̄ = 2 mmHg
Answer: C) Normal, µx̄ = 70 mmHg and σx̄ = 2 mmHg
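A quick simulation can make σx̄ = σ / √n concrete. The Python sketch below repeatedly draws samples of 25 women from N(70, 10²) and records each sample mean. The mean and standard deviation of those simulated means should come out near 70 and 2 mmHg. The number of repetitions and the seed are arbitrary illustrative choices, not part of the question.

```python
import math
import random
import statistics

mu, sigma, n = 70, 10, 25        # population and sample size from Question #17
reps = 10_000                    # number of simulated samples (arbitrary choice)

random.seed(2006)                # fixed seed so reruns give the same numbers
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

theory_se = sigma / math.sqrt(n)                  # 10 / 5 = 2 mmHg
print(f"mean of x-bar: simulated {statistics.fmean(sample_means):.2f}, theory {mu}")
print(f"SD of x-bar:   simulated {statistics.stdev(sample_means):.2f}, theory {theory_se}")
```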

Question #18 A large study has determined that diastolic blood pressure among women aged … is normally distributed with mean µ = 70 mmHg and standard deviation σ = 10 mmHg. What is the lower 5th percentile of the above distribution (i.e., the point below which the mean diastolic blood pressure of these 25-woman samples will fall 5% of the time)?
A) … mmHg
B) … mmHg
C) … mmHg
D) … mmHg

Question #18 From the Z table, Z = −1.645 cuts off the lower 5th percentile. Alternatively, get this in STATA: display invnorm(.05)
Rearrange the Z equation and solve for X̄:
Z = (X̄ − µx̄) / σx̄
X̄ = (Z)(σx̄) + µx̄ = (−1.645)(2) + 70 = 66.71 mmHg
What is the lower 5th percentile of the above distribution (i.e., the point below which the mean diastolic blood pressure of these 25-woman samples will fall 5% of the time)?
A) … mmHg
B) … mmHg
C) … mmHg
D) … mmHg
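The rearranged Z equation can be checked in Python with the standard library's `NormalDist.inv_cdf`, which is used here in place of STATA's inverse-normal function. The sketch computes the cutoff in two ways: by standardizing, and directly from the sampling distribution N(70, 2²). Both should give about 66.71 mmHg.

```python
import math
from statistics import NormalDist

mu, sigma, n = 70, 10, 25
se = sigma / math.sqrt(n)                    # sigma_xbar = 2 mmHg

z05 = NormalDist().inv_cdf(0.05)             # about -1.645, the Z-table value
cutoff = z05 * se + mu                       # X-bar = Z * sigma_xbar + mu_xbar
cutoff_direct = NormalDist(mu, se).inv_cdf(0.05)   # same point, no standardizing
print(f"z = {z05:.3f}, lower 5th percentile of x-bar = {cutoff:.2f} mmHg")
```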

Question #19 Consider the following: a hypothesis test was performed. The resulting p-value was higher than the pre-specified alpha, so we failed to reject the null hypothesis. Which of the following is correct?
A) The null hypothesis must be true
B) An evaluation of power and Type II error may be warranted
C) It is still feasible that the alternative hypothesis is true
D) Both B and C

Question #19 Failing to reject the null does not prove it, so it is still possible that the alternative is true. Recall that power (1 − β) is the probability of rejecting the null when it is false, and β is the probability of a Type II error (failing to reject the null when it is false). Here it is possible that statistical power was too low, leading to a Type II error in which we failed to reject H0 even though it is false. An evaluation of power is therefore appropriate.
A) The null hypothesis must be true
B) An evaluation of power and Type II error may be warranted
C) It is still feasible that the alternative hypothesis is true
D) Both B and C
Answer: D) Both B and C
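The link between sample size, power, and β can be sketched numerically. The example below is a simplification because it uses a two-sided one-sample z-test, which assumes σ is known, rather than the t-tests elsewhere in this review. The scenario (a true mean 3 mmHg away from µ0, with σ = 10 mmHg) and the helper name `z_test_power` are hypothetical. The point of the sketch is that a small study can easily miss a real difference.

```python
import math
from statistics import NormalDist


def z_test_power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test when the true mean is mu0 + delta.

    Assumes sigma is known (a z-test), so this only approximates
    the behaviour of a t-test.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)          # about 1.96 for alpha = 0.05
    shift = delta * math.sqrt(n) / sigma          # true mean shift in standard-error units
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


# HYPOTHETICAL scenario: true mean 3 mmHg away from mu0, sigma = 10 mmHg
for n in (10, 25, 50, 100):
    power = z_test_power(delta=3, sigma=10, n=n)
    print(f"n = {n:3d}: power = {power:.3f}, beta (Type II error) = {1 - power:.3f}")
```

When delta = 0 the "power" equals α, because then every rejection is a Type I error.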

Question #20 The mean cholesterol level of healthy …-year-old males in the US is 211 mg/dL, with an unknown standard deviation. The average cholesterol level among 16 hypertensive males was 232 mg/dL. Conduct a statistical test at the 10% level of whether hypertension is associated with higher cholesterol levels (consult the STATA output below). Assuming that the α-level of the test is 10%, what can you conclude?
                                                   Number of obs = 16
Variable |    Mean    Std. Err.     t     P>|t|    [95% Conf. Interval]
       x |     232       …          …       …         …           …
Degrees of freedom: 15
                           Ho: mean(x) = 211
     Ha: mean < 211          Ha: mean != 211          Ha: mean > 211
       t = …                   t = …                    t = …
   P < t = …             P > |t| = …                P > t = …

Question #20 This is a one-sample, one-sided t-test. Set up the hypotheses and select the relevant output:
H0: µ ≤ 211
HA: µ > 211
The alpha level is 0.10. The relevant p-value is the one-sided value (P > t) in the STATA output: p = …
Since p < 0.10, reject H0 and conclude that hypertensive males have higher cholesterol levels than the healthy population of US males.

Question #20 Assuming that the α-level of the test is 10%, what can you conclude?
A) The p-value of the test is … and thus we fail to reject the null hypothesis and conclude that hypertensives have a higher cholesterol level.
B) The p-value of the test is … and thus we fail to reject the null hypothesis and conclude that hypertensives have a lower cholesterol level.
C) The p-value of the test is … and thus we fail to reject the null hypothesis and conclude that hypertension is not associated with higher cholesterol levels.
D) The p-value of the test is … and thus we reject the null hypothesis and conclude that the mean cholesterol level for hypertensives is higher than that of healthy …-year-old US males.
Answer: D
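The t value and p-values from the STATA output for Question #20 were lost, so the Python sketch below uses a clearly HYPOTHETICAL t = 2.0 with the question's 15 df. It shows how the one-sided p-value (P > t) used for HA: µ > 211 relates to the two-sided P > |t|. The helper names are hypothetical. As before, the t tail area is approximated with Simpson's rule, because the standard library has no t distribution.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_p(t: float, df: int, steps: int = 2000) -> float:
    """Approximate the one-sided p-value P(T >= t) for HA: mu > mu0.

    Integrates the density over [0, |t|] with Simpson's rule (steps even).
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3                  # P(0 <= T <= |t|)
    return 0.5 - central if t >= 0 else 0.5 + central


df, alpha = 15, 0.10                         # 16 hypertensive males -> 15 df
t_hypothetical = 2.0                         # HYPOTHETICAL: the slide's t value was lost
p_one = t_upper_p(t_hypothetical, df)        # "P > t" in STATA's layout
p_two = 2 * min(p_one, 1 - p_one)            # matching two-sided "P > |t|"
print(f"P > t = {p_one:.4f}, P > |t| = {p_two:.4f}, reject H0: {p_one < alpha}")
```

For a positive t, the one-sided p-value is half the two-sided one. This is why the direction of HA determines which column of the output to read.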

Question #21 Ten infants were involved in a study comparing the effectiveness of two medications for the treatment of diaper rash. For each baby, two areas of approximately the same size and rash severity were selected: one area was treated with medication A and the other with medication B. The number of hours for the rash to disappear was recorded for each medication and each infant. The question of interest is whether there is sufficient evidence to conclude that a difference exists in the mean time required for elimination of the rash. The appropriate statistical methodology to use is:
A) A 2-sample (independent) t-test
B) A z-test
C) A paired t-test
D) A sensitivity analysis

Question #21 Medication A and Medication B are "matched" within the same infant, so the samples are not independent. We were not given a known standard deviation σ, so a z-test does not apply. A sensitivity analysis is typically a sub-analysis in which model parameters are changed to see whether the results remain consistent; this concept has not been covered in class and is not relevant to this question. This study calls for a paired t-test.
A) A 2-sample (independent) t-test
B) A z-test
C) A paired t-test
D) A sensitivity analysis
Answer: C) A paired t-test
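A paired t-test is a one-sample t-test on the within-pair differences. The Python sketch below shows that reduction. The slides give no data, so the rash-clearance hours are entirely HYPOTHETICAL numbers invented for illustration, and `paired_t` is a hypothetical helper rather than any library's API. With 10 infants there are 10 differences and therefore 9 degrees of freedom.

```python
import math
import statistics

# HYPOTHETICAL hours to rash clearance for 10 infants (the slides give no data)
hours_a = [30, 28, 35, 40, 25, 33, 29, 45, 31, 27]   # area treated with A
hours_b = [28, 25, 34, 36, 23, 30, 27, 40, 30, 25]   # same infant, area treated with B


def paired_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Paired t statistic: a one-sample t-test on the within-infant differences."""
    if len(a) != len(b):
        raise ValueError("paired data need one A and one B value per infant")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)      # SE of the mean difference
    return statistics.fmean(diffs) / se, n - 1


t, df = paired_t(hours_a, hours_b)
print(f"paired t = {t:.3f} on {df} df")
```

Treating the two columns as independent groups would ignore the infant-to-infant variation that pairing removes. That is why option A is not appropriate here.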

GOOD LUCK!