STAT E-102 Midterm Review March 14, 2007
Review Topics: Class 1 (Ch. 1, 2)
Populations and samples
Parameters (usually unknown) and statistics
Types of data, i.e. nominal, ordinal, discrete, continuous
Data summaries: graphs (bar charts, histograms, box plots, ...) and frequency tables
Notes:
Histogram: used with continuous or ordinal data on a horizontal axis (ex. # of car accidents per year)
Bar chart: similar to a histogram, but the horizontal axis has no order, so it is used with nominal data (ex. profits of three cereal companies in 2006)
Box plot: used with continuous data; displays the median, quartiles, 1.5 times the interquartile range, and extreme values (outliers)
Other graphs: scatter plots, line graphs, ...
Review Topics: Class 2 (Ch. 6.1-6.4)
Probability
Intersection, union, complement, null event, mutually exclusive, independent
P(A U B) = P(A) + P(B) - P(A ∩ B)
Conditional probability: P(B|A) = P(A ∩ B) / P(A)
Sensitivity, specificity, predictive values, p-value
Bayes' Theorem
Notes:
If a trial (experiment) is independently repeated a large number of times (N) with outcome A occurring n times, then the probability that A will occur on the next trial is P(A) = n/N
Mutually exclusive: P(A ∩ B) = 0
Independent: P(B|A) = P(B) and P(A ∩ B) = P(A) * P(B)
False Positive Rate = 1 - specificity, False Negative Rate = 1 - sensitivity
Positive Predictive Value = P(D+|T+) = P(D+)P(T+|D+) / [P(D+)P(T+|D+) + P(D-)P(T+|D-)]
Negative Predictive Value = P(D-|T-) = P(D-)P(T-|D-) / [P(D-)P(T-|D-) + P(D+)P(T-|D+)]
where P(D+) is the prevalence of the disease in the population
Practice Test Question #2
The probabilities that a 25- to 34-year old U.S. male's cholesterol level belongs to one of the following intervals are listed.

Cholesterol (mg/dl)   Probability
80-119                0.012
120-159               0.141
160-199               0.414
200-239               0.280
240-279               0.108
280-319               0.032
320-359               0.008
360-399               0.005

What is the probability that a male from this population has cholesterol <200 mg/dl?
A) 0.414 B) 0.567 C) 0.847 D) 0.280
Practice Test Question #2 (solution)
These categories are mutually exclusive, so their probabilities add:
P(<200) = P(80-119) + P(120-159) + P(160-199) = 0.012 + 0.141 + 0.414 = 0.567
Answer: B) 0.567
Practice Test Question #3
The probabilities that a 25- to 34-year old U.S. male's cholesterol level belongs to one of the following intervals are listed.

Cholesterol (mg/dl)   Probability
80-119                0.012
120-159               0.141
160-199               0.414
200-239               0.280
240-279               0.108
280-319               0.032
320-359               0.008
360-399               0.005

Given that a person from this population has serum cholesterol level <240 mg/dL, compute the conditional probability that he will have cholesterol <200 mg/dL.
A) 0.669 B) 0.567 C) 0.331 D) 0.280
Practice Test Question #3 (solution)
A = cholesterol <240, B = cholesterol <200
P(B|A) = P(A ∩ B) / P(A)
P(A ∩ B) = P(<240 and <200) = P(<200) = 0.567 (calculated in #2)
P(A) = P(<240) = 0.012 + 0.141 + 0.414 + 0.280 = 0.847
P(B|A) = 0.567 / 0.847 = 0.669
Answer: A) 0.669
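The arithmetic in Questions #2 and #3 can be checked with a short script. This is only a sketch: the dictionary layout and the helper name prob_below are illustrative choices, and the fifth interval is assumed to be 240-279 so that the intervals do not overlap.

```python
# Interval probabilities from the cholesterol table (mg/dl).
# Assumption: the fifth interval is 240-279 (non-overlapping intervals).
probs = {
    (80, 119): 0.012,
    (120, 159): 0.141,
    (160, 199): 0.414,
    (200, 239): 0.280,
    (240, 279): 0.108,
    (280, 319): 0.032,
    (320, 359): 0.008,
    (360, 399): 0.005,
}


def prob_below(cutoff):
    """P(cholesterol < cutoff): add the mutually exclusive intervals below it."""
    return sum(p for (low, high), p in probs.items() if high < cutoff)


p_b = prob_below(200)        # P(B); equals P(A and B) because B lies inside A
p_a = prob_below(240)        # P(A)
p_b_given_a = p_b / p_a      # P(B|A) = P(A and B) / P(A)

print(round(p_b, 3), round(p_a, 3), round(p_b_given_a, 3))
```

The key step is that "<200" is contained in "<240", so the intersection is just P(<200).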
Review Topics: Class 2 (Ch. 6.1-6.4)
Evaluating a Screening Test
Sensitivity = P(T+|D+), Specificity = P(T-|D-)
False Positive Rate = 1 - Specificity
False Negative Rate = 1 - Sensitivity
Positive Predictive Value = P(D+|T+) = P(D+)P(T+|D+) / [P(D+)P(T+|D+) + P(D-)P(T+|D-)]
Negative Predictive Value = P(D-|T-) = P(D-)P(T-|D-) / [P(D-)P(T-|D-) + P(D+)P(T-|D+)]
where P(D+) is the prevalence of the disease in the population
ROC curves: plots of sensitivity vs. 1 - specificity (the false positive rate)
Practice Test Question #4
In the following table, the performance of a diagnostic test is summarized. Given that the prevalence of this condition in the overall population is 5%, please answer the following questions.

              + Diagnosis   - Diagnosis   Total
Disease            50            10          60
No Disease        103           449         552
Total             153           459         612

What is the sensitivity and specificity of this test?
A) 0.734 and 0.918 B) 0.918 and 0.734 C) 0.833 and 0.813 D) 0.813 and 0.833
Practice Test Question #4 (solution)
Sensitivity = P(T+|D+) = 50/60 = 0.833
Specificity = P(T-|D-) = 449/552 = 0.813
Answer: C) 0.833 and 0.813
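A minimal sketch of the same calculation, reading the four cell counts from the table. The variable names tp, fn, fp, tn are illustrative labels, not from the slides.

```python
# 2x2 counts from the table: rows are true disease status, columns are test result.
tp, fn = 50, 10      # diseased: test positive, test negative
fp, tn = 103, 449    # not diseased: test positive, test negative

sensitivity = tp / (tp + fn)   # P(T+|D+)
specificity = tn / (tn + fp)   # P(T-|D-)

print(round(sensitivity, 3), round(specificity, 3))
```

Each rate conditions on a row total (true disease status), not on a column total.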
Practice Test Question #5
In the following table, the performance of a diagnostic test is summarized. Given that the prevalence of this condition in the overall population is 5%, please answer the following questions.

              + Diagnosis   - Diagnosis   Total
Disease            50            10          60
No Disease        103           449         552
Total             153           459         612

What is the probability that a subject has the disease given that the test was positive?
A) 19% B) 81% C) 32% D) 68%
Practice Test Question #5 (solution)
Prevalence = 0.05
Positive Predictive Value = P(D+|T+)
Bayes' Theorem:
P(D+|T+) = P(D+)P(T+|D+) / [P(D+)P(T+|D+) + P(D-)P(T+|D-)]
= (Prev)(Sens) / [(Prev)(Sens) + (1-Prev)(1-Spec)]
= (0.05)(0.833) / [(0.05)(0.833) + (0.95)(0.187)]
= 0.042 / (0.042 + 0.178)
= 0.191, or about 19%
Answer: A) 19%
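The Bayes' Theorem step can be written as two small functions. This is a sketch only; the function names ppv and npv are illustrative, and the inputs are the prevalence given in the question and the sensitivity and specificity computed from the table.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value P(D+|T+) via Bayes' Theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def npv(prevalence, sensitivity, specificity):
    """Negative predictive value P(D-|T-) via Bayes' Theorem."""
    true_neg = (1 - prevalence) * specificity
    false_neg = prevalence * (1 - sensitivity)
    return true_neg / (true_neg + false_neg)


sens = 50 / 60       # from the table
spec = 449 / 552     # from the table
prev = 0.05          # given in the question

print(round(ppv(prev, sens, spec), 2), round(npv(prev, sens, spec), 2))
```

Note that the prevalence comes from the question (5%), not from the table: 60/612 is the proportion diseased in the study sample, which need not match the population.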
Review Topics: Class 3 (Ch. 3.1, 3.2, 3.3, 3.5, 7.1, 7.4)
Summary Statistics
Central tendency: mean, median, mode
Variability: variance, standard deviation, range, interquartile range
Variable, random variable, continuous random variable
Probability Distribution
Describes the behavior of a random variable
Normal distribution: N(μ, σ)
Standard Normal Distribution: N(0, 1)
Z-statistic
Notes:
Median: good for skewed data (distributions with a long tail)
Mode: good for ordinal or nominal data (categorical)
Variable: a characteristic that can be measured, categorized, quantified, or qualified
Random variable: a variable whose value is determined by a random event (not by study design!)
Continuous random variable: a random variable that can be any value within a specified interval or space (ex. weight)
Probability distribution: identifies possible values of the random variable and provides information about the probability that these values (or ranges of values) will occur
Normal distribution: μ controls the location and σ controls the shape (spread)
Practice Test Question #15 A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg. What is the probability that a randomly chosen woman will have diastolic blood pressure lower than 89.6 mm Hg? A) 2.5% B) 5% C) 97.5% D) 95%
Practice Test Question #15 (solution)
Z = (X - µ) / σ = (89.6 - 70) / 10 = 1.96
P(X < 89.6) = P(Z < 1.96) = 1 - P(Z > 1.96) = 1 - 0.025 = 0.975
Or, in STATA: display normprob(1.96)
Answer: C) 97.5%
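For readers without STATA, the same lookup can be sketched with Python's standard library (statistics.NormalDist, available in Python 3.8 and later). The variable names are illustrative.

```python
from statistics import NormalDist

# Diastolic blood pressure model from the question: normal, mean 70, sd 10.
dbp = NormalDist(mu=70, sigma=10)

z = (89.6 - 70) / 10             # standardize the cutoff
p_direct = dbp.cdf(89.6)         # P(X < 89.6) on the original scale
p_via_z = NormalDist().cdf(z)    # P(Z < z) on the standard normal scale

print(round(z, 2), round(p_direct, 3), round(p_via_z, 3))
```

Both routes give the same probability, which is the point of standardizing.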
Review Topics: Class 3 (Ch. 8.1-8.3)
Central Limit Theorem
Take many samples of size n from a population
The sample means, xbar, create a sampling distribution
The standard deviation of the sample means is σxbar = σ / √n, which is called the standard error of the mean
When n is large the sampling distribution is approximately normal, with mean µxbar = µ and standard deviation σxbar = σ / √n
Remember, the mean of the sampling distribution is the same as the mean of the population
Practice Test Question #16 A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg. Suppose that you measure the diastolic blood pressure in n=25 women. What is the sampling distribution of X25, the mean diastolic blood pressure of the women in the sample? A) Normal, µxbar=70 mmHg and σxbar=10 mmHg B) Normal, µxbar=7 mmHg and σxbar=10 mmHg C) Normal, µxbar=70 mmHg and σxbar=2 mmHg D) Normal, µxbar=7 mmHg and σxbar=2 mmHg
Practice Test Question #16 (solution)
If the underlying distribution is normal, the sampling distribution of the mean is also normal, for any sample size.
µxbar = µ0 = 70 mmHg
σxbar = σ0 / √n = 10 / √25 = 2 mmHg
Answer: C) Normal, µxbar=70 mmHg and σxbar=2 mmHg
Practice Test Question #17
A large study has determined that the diastolic blood pressure among women ages 18-74 is normally distributed with mean 70 mm Hg and a standard deviation of σ=10 mm Hg.
What is the lower 5th percentile of the sampling distribution from Question #16 (i.e., the value below which the mean diastolic blood pressure of a 25-woman sample falls 5% of the time)?
A) 66.08 mmHg B) 66.71 mmHg C) 50.40 mmHg D) 27.55 mmHg
Practice Test Question #17 (solution)
From the Z table, we know that Z = -1.645 cuts off the lower 5th percentile.
Alternatively, can get this in STATA: display invnorm(.05)
Rearrange the Z equation and solve for Xbar:
Z = (Xbar - µxbar) / σxbar
Xbar = (Z)(σxbar) + µxbar = (-1.645)(2) + 70 = 66.71
Answer: B) 66.71 mmHg
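Questions #16 and #17 can be reproduced in a few lines. This is a sketch using Python's statistics.NormalDist rather than STATA; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 70, 10, 25

# Standard error of the mean: sd of the sampling distribution of Xbar.
se = sigma / sqrt(n)

# Sampling distribution of the mean of n=25 measurements.
xbar_dist = NormalDist(mu=mu, sigma=se)

z_05 = NormalDist().inv_cdf(0.05)   # about -1.645, the lower 5% cutoff for Z
p5 = xbar_dist.inv_cdf(0.05)        # same cutoff on the Xbar scale

print(se, round(z_05, 3), round(p5, 2))
```

Using σ = 10 instead of the standard error 2 would give 53.55, the 5th percentile of individual blood pressures rather than of sample means.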
Review Topics: Class 4 (Ch. 10.1-10.3)
Statistical Inference
Hypothesis Testing
Null Hypothesis (Ho): no effect or no difference
Alternative Hypothesis (H1): an effect or difference; what you are trying to show
Court Trial Example
P-values: given that Ho is true, the probability of observing a result as extreme as or more extreme than the one observed
Notes:
Statistical inference (hypothesis testing and estimation, both point and interval): inferences about the population (parameters) made from information in the sample (statistics)
Point estimation: sample mean
Interval estimation: confidence intervals
Court trial example: Ho is innocent, H1 is guilty; the defendant is assumed innocent until proven guilty. We can prove guilt (H1) but cannot prove innocence (Ho); we can only find the defendant not guilty (not H1). In other words, we fail to reject the possibility of innocence.
Hypotheses are set BEFORE the study/test is performed
The p-value is found using the sample; when the p-value is small we reject Ho. A p-value is not very informative by itself.
Practice Test Question #6 A small p-value for a hypothesis test (i.e., <0.05) signifies which of the following? A) The conditional probability that the null hypothesis is true is <0.05 B) The probability that the alternative hypothesis is true is >0.95 C) The conditional probability of the data being this extreme if the null hypothesis was true is <0.05 D) The conditional probability of the data being this extreme if the alternative hypothesis is true is >0.05
Practice Test Question #6 (solution)
The p-value is computed assuming the null hypothesis is true: it is the probability of data as extreme as or more extreme than what was observed. It is not the probability that either hypothesis is true.
Answer: C) The conditional probability of the data being this extreme if the null hypothesis was true is <0.05
Review Topics: Class 4 (Ch. 10.4, 10.5)
Type I Error (α): P(rejecting Ho | Ho is true), the probability of rejecting when we should not
Type II Error (β): P(not rejecting Ho | H1 is true), the probability of not rejecting when we should
Power = 1 - β = P(rejecting Ho | H1 is true), the probability of rejecting when we should
Hypothesis Testing Steps
1) State Ho
2) State H1
3) Determine α
4) Determine the test statistic and associated p-value
5) Determine whether to reject or fail to reject Ho
Notes:
Alpha is set before the test is run and is completely up to you (not random!)
Beta is the probability of not rejecting the null hypothesis when we should (when H1 is true)
Alpha and beta are inversely related: for a fixed sample size, lowering one raises the other
Practice Test Question #18 Consider the following: a hypothesis test was performed. The resulting p-value was higher than the pre-specified alpha and thus there was a failure to reject the null hypothesis. Which of the following are correct? A) The null hypothesis must be true B) An evaluation of power and type II error may be warranted C) It is still feasible that the alternative hypothesis is true D) Both B and C
Practice Test Question #18 We haven’t proven the null by failing to reject it. It is still possible that the alternative is actually true. Recall that power (1-β) is the probability of rejecting the null when it is false. β is the probability of making a Type II error (failing to reject the null when it is false). In this case it is possible that statistical power was too low, leading to a Type II error where we have failed to reject H0 when it is actually false. An evaluation is appropriate. A) The null hypothesis must be true B) An evaluation of power and type II error may be warranted C) It is still feasible that the alternative hypothesis is true D) Both B and C
Review Topics: Class 5 (Ch. 9.1, 9.2)
Confidence Intervals
Range of values associated with a parameter
Calculated using the sample data
Will cover the true parameter a specified percentage of the time (ex. 95%)
Understand what affects confidence interval width: sample size (larger n gives a narrower CI), confidence level (higher confidence, i.e. 99%, gives a wider CI), sample variability (larger variability gives a wider CI)
1-sided: (-∞, upper bound) or (lower bound, ∞), used when the alternative hypothesis is one-sided
2-sided: (lower bound, upper bound), used when the alternative hypothesis is two-sided
Notes:
Confidence intervals are a type of statistical inference (interval estimation)
An interval within which we expect the true parameter to be contained
Remember, the CI is random and can move around from sample to sample; the true parameter is NOT random and does not move
More informative than a p-value: provides information on the precision of the estimate (narrow CIs are more precise, wide CIs are less precise)
Provides both a range of values AND a test for the parameter of interest
For a 95% CI, the CI will not cover the true parameter 5% of the time
Review Topics: Class 5 (Ch. 9.1, 9.2, 9.3, 9.4)
Confidence Intervals (cont.)
Normal distribution (σ known): xbar ± z(1-α/2) · σ / √n
Student's t distribution (σ unknown): xbar ± t(n-1, 1-α/2) · s / √n
Student's T-test
Used when we believe we have a normal distribution but do not know σ
We use s to estimate σ
The t-statistic is calculated by t = (xbar - µ) / (s / √n), and has n-1 degrees of freedom
Notes:
For a one-sided CI, we only calculate one bound and thus do not divide alpha by 2.
REMEMBER to use the t-table to calculate the CI for the Student's t distribution
We gain a degree of freedom for each piece of data we use to estimate our statistic and we lose one degree of freedom by calculating the sample mean, and thus have n-1 df.
Practice Test Question #11 Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5 as most facilities try to maintain a slightly alkaline level. The results of one particular hour based on N=17 samples is a mean of 8.42 and a standard deviation of 0.16. Interest focuses on whether there is sufficient evidence to conclude that the mean pH level in the water differs from 8.5. The hypotheses are then: A) H0: µ = 8.5, HA: µ ≠ 8.5 B) H0: µ ≠ 8.5, HA: µ = 8.5 C) H0: µ = 0, HA: µ ≠ 0 D) H0: µ = 0, HA: µ ≠ 8.5
Practice Test Question #11 (solution)
The question asks whether the mean pH differs from the target of 8.5, in either direction, so the test is two-sided and the null hypothesis states equality.
Answer: A) H0: µ = 8.5, HA: µ ≠ 8.5
Practice Test Question #12 Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5 as most facilities try to maintain a slightly alkaline level. The results of one particular hour based on N=17 samples is a mean of 8.42 and a standard deviation of 0.16. The degrees of freedom for this test is: A) 7 B) 16 C) 8.5 D) Not applicable
Practice Test Question #12 This is a one-sample t-test. *NOTE* Do not be confused by the use of the word “samples” in the set-up…it refers to the number of water samples. df = n – 1 = 17 – 1 = 16 The degrees of freedom for this test is: A) 7 B) 16 C) 8.5 D) Not applicable
Practice Test Question #13 Most water treatment facilities monitor the quality of their drinking water on an hourly basis. One variable monitored is pH, which measures the degree of alkalinity or acidity in the water. A pH below 7.0 is acidic, one above 7.0 is alkaline, and a pH of 7.0 is neutral. One water treatment plant has a target pH of 8.5 as most facilities try to maintain a slightly alkaline level. The results of one particular hour based on N=17 samples is a mean of 8.42 and a standard deviation of 0.16. Is there sufficient evidence to conclude that the mean pH level in the water differs from 8.5 at the 0.05 level of significance? A) Yes, because the sample mean was 8.42 B) Yes, since p<0.05 C) No, since p>0.05 D) No, since the sample mean (8.42) is practically equal to 8.5
Practice Test Question #13 (solution)
Carry out the one-sample, two-sided t-test at alpha=0.05:
t = (xbar - µ) / (s / √n) = (8.42 - 8.5) / (0.16 / √17) = -2.062; 16 df
From the t-table: 2(.025) < p < 2(.05), so p > 0.05; do not reject H0
In STATA: ttesti 17 8.42 0.16 8.5
p = 0.0559; do not reject H0
Answer: C) No, since p>0.05
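The t-statistic and the matching 95% confidence interval can be sketched as follows. The critical value 2.120 for 16 degrees of freedom is an assumption taken from a standard t-table, not from the slides, and Python's standard library has no t distribution, so no p-value is computed here.

```python
from math import sqrt

n, xbar, s, mu0 = 17, 8.42, 0.16, 8.5

se = s / sqrt(n)             # estimated standard error of the mean
t_stat = (xbar - mu0) / se   # one-sample t-statistic
df = n - 1

# Assumption: two-sided 5% critical value for 16 df, read from a t-table.
t_crit = 2.120

reject_h0 = abs(t_stat) > t_crit
ci_lower = xbar - t_crit * se
ci_upper = xbar + t_crit * se

print(round(t_stat, 3), df, reject_h0)
print(round(ci_lower, 3), round(ci_upper, 3))
```

The interval just barely contains 8.5, which agrees with a p-value only slightly above 0.05 and with the decision not to reject H0.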
Practice Test Question #19
The mean cholesterol level of healthy 20-74 year old males in the US is 211 mg/dL, with an unknown standard deviation. The average cholesterol level among 16 hypertensive males was 232 mg/dL. Conduct a statistical test at the 10% level about whether hypertension is associated with higher cholesterol levels (consult the STATA output below). Assuming that the α-level of the test is 10%, what can you conclude?

Number of obs = 16
------------------------------------------------------------------------------
Variable |     Mean   Std. Err.        t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      232         12   19.3333   0.0000    206.4226    257.5774
------------------------------------------------------------------------------
Degrees of freedom: 15

                         Ho: mean(x) = 211

 Ha: mean < 211         Ha: mean ~= 211          Ha: mean > 211
   t =   1.7500           t =   1.7500             t =   1.7500
 P < t =   0.9497       P > |t| =   0.1005       P > t =   0.0503
Practice Test Question #19 (solution)
This is a one-sample, one-sided t-test.
Set up the hypotheses and select relevant output:
H0: µ ≤ 211
HA: µ > 211
The alpha level is 0.10. The p-value (under Ha: mean > 211) is P > t = 0.0503.
p < 0.10; reject H0; conclude that hypertensive males have higher cholesterol levels than the healthy population of US males 20-74.
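A short check of how the numbers in the output fit together. This is a sketch: the p-value is copied from the STATA output rather than computed, and the variable names are illustrative.

```python
sample_mean = 232.0   # Mean column of the output
std_err = 12.0        # Std. Err. column of the output
mu0 = 211.0           # hypothesized mean under Ho
alpha = 0.10

# The t reported under "Ho: mean(x) = 211" compares the mean with 211;
# the 19.3333 in the table row compares it with 0 instead.
t_vs_mu0 = (sample_mean - mu0) / std_err
t_vs_zero = sample_mean / std_err

p_upper = 0.0503      # P > t, copied from the output (Ha: mean > 211)
reject_h0 = p_upper < alpha

print(t_vs_mu0, round(t_vs_zero, 4), reject_h0)
```

The two-sided p-value (0.1005) is about twice the one-sided value, so the two-sided test would not reject at the 10% level while the one-sided test does.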
Practice Test Question #19
Assuming that the α-level of the test is 10%, what can you conclude?
A) The p-value of the test is 0.1005 and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a higher cholesterol level.
B) The p-value of the test is 0.9497 and thus we fail to reject the null hypothesis and thus conclude that hypertensives have a lower cholesterol level.
C) The p-value of the test is 0.0503 and thus we fail to reject the null hypothesis and conclude that hypertension is not associated with higher cholesterol levels.
D) The p-value of the test is 0.0503 and thus we reject the null hypothesis and conclude that the mean cholesterol level for hypertensives is higher than healthy 20-74 year old US males.
Review Topics: Class 6 (Ch. 11.2-11.3)
Two Sample Tests: Independent Samples
Necessary assumptions for the T-test:
The two samples are independent
Each population is approximately normally distributed
The variances of the two populations are not significantly different
If the assumptions hold then can use t = (xbar1 - xbar2) / (sp · √(1/n1 + 1/n2)) with n1 + n2 - 2 degrees of freedom
Where sp² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2) is the pooled variance
Notes:
Two sample tests: comparison of two means
Independent: one set of data tells us nothing about the other set of data
Independent samples do not depend on each other (ex. smokers vs. nonsmokers)
Again, we get one degree of freedom for each subject we use for our test (n1 + n2) but we lose 1 degree of freedom for each statistic we calculate (1 for each mean), and thus we lose two df.
Must use sp when calculating the CI
Practice Test Question #1 T-tests are most useful for what type of data (variables)? A) Continuous B) Ordinal C) Nominal D) Binary
Practice Test Question #1 T-tests are used for continuous data. Recall: Ordinal data have natural order without defined magnitude (ex. low, moderate, high) Nominal data have categories with no order or rank (ex. gender, race) Binary data are discrete data with only two possible outcomes (ex. cancer vs. no cancer) A) Continuous B) Ordinal C) Nominal D) Binary
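Review Topics: Class 6 (Ch. 11.2-11.3)
Two Sample Tests: Independent Samples, Unequal Variances
The other two assumptions must still hold
Now must use t = (xbar1 - xbar2) / √(s1²/n1 + s2²/n2) with v degrees of freedom
Where v = (s1²/n1 + s2²/n2)² / [ (s1²/n1)² / (n1 - 1) + (s2²/n2)² / (n2 - 1) ]; round v down to the nearest integer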
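A sketch of the unequal-variance calculation, assuming the standard Satterthwaite form of the degrees of freedom. The function name and the summary statistics in the example are hypothetical, chosen only to show the arithmetic.

```python
from math import floor, sqrt


def unequal_var_t(mean1, s1, n1, mean2, s2, n2):
    """t-statistic and rounded-down df for two independent samples, unequal variances."""
    v1 = s1 ** 2 / n1          # squared standard error of group 1 mean
    v2 = s2 ** 2 / n2          # squared standard error of group 2 mean
    t_stat = (mean1 - mean2) / sqrt(v1 + v2)
    # Satterthwaite approximation, rounded down to the nearest integer.
    v = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, floor(v)


# Hypothetical summary statistics (not from the slides).
t_stat, df = unequal_var_t(mean1=10.0, s1=3.0, n1=12, mean2=8.0, s2=1.5, n2=15)
print(round(t_stat, 3), df)
```

Here the rounded-down df (15) is well below the pooled value n1 + n2 - 2 = 25, which is the price paid for not assuming equal variances.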
Review Topics: Class 6 (Ch. 11.1, 11.3)
Two Sample Tests: Paired Samples
For each observation in group 1 there is a corresponding observation in group 2
First find the difference, d, between the corresponding observations
δ is the true difference in population means
Thus the hypotheses for a two-sided test are H0: δ = 0, HA: δ ≠ 0
The t-statistic is t = dbar / (sd / √n) with n-1 degrees of freedom, where dbar and sd are the mean and standard deviation of the differences and n is the number of pairs
Ex. Alcohol study
Notes:
Find d (the difference between observations) for each subject
Can think of a paired t-test as a one-sample t-test based on the differences, d.
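The idea that a paired t-test is a one-sample t-test on the differences can be sketched as below. The measurements are hypothetical and only illustrate the steps.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired measurements on the same five subjects.
group1 = [12, 15, 11, 14, 13]
group2 = [10, 14, 12, 11, 12]

# Step 1: one difference per pair.
d = [x1 - x2 for x1, x2 in zip(group1, group2)]

# Step 2: one-sample t-test of H0: delta = 0 on the differences.
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                  # sample sd (n-1 denominator)
t_stat = d_bar / (s_d / sqrt(n))
df = n - 1

print(d, round(t_stat, 3), df)
```

The degrees of freedom count pairs, not individual measurements: 5 pairs give 4 df, not 8.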
Practice Test Question #7
The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05.

Smokers: Number of obs = 75
Non-smokers: Number of obs = 121
------------------------------------------------------------------------------
Variable |     Mean   Std. Err.        t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Smokers |      4.1   .2309401   17.7535   0.0000    3.639842    4.560158
Non-smok |      1.3   .1818182      7.15   0.0000    .9400127    1.659987
    diff |      2.8   .2939238   9.52628   0.0000    2.220304    3.379696
------------------------------------------------------------------------------
Degrees of freedom: ???

          Ho: mean(smoker) - mean(non-smoker) = diff = 0

 Ha: diff < 0           Ha: diff ~= 0            Ha: diff > 0
   t =   9.5263           t =   9.5263             t =   9.5263
 P < t =   1.0000       P > |t| =   0.0000       P > t =   0.0000
Practice Test Question #7
What is the proper test of this hypothesis?
A) A z test B) A one-sample t test C) A paired t test D) An unpaired t test
Practice Test Question #7 (solution)
Smokers and non-smokers are two independent groups, and the population standard deviations are not known, so this calls for a two-sample (unpaired) t-test.
Answer: D) An unpaired t test
Practice Test Question #8 The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05. What are the null and alternative hypotheses for this test? µ1 is the mean carboxyhemoglobin level among smokers, and µ2 is the mean level among non-smokers. A) H0: µ1 = µ2 , HA: µ1 ≠ µ2 B) H0: µ1 ≤ µ2 , HA: µ1 ≠ µ2 C) H0: µ1 ≤ µ2 , HA: µ 1> µ2 D) H0: µ1 ≥ µ2 , HA: µ1 < µ2
Practice Test Question #8 (solution)
The belief being tested is that smokers have the higher mean, so the alternative is one-sided: HA: µ1 > µ2, and the null covers the remaining cases.
Answer: C) H0: µ1 ≤ µ2 , HA: µ1 > µ2
Practice Test Question #9 The following output comes from an experiment comparing the levels of carboxyhemoglobin for a group of cigarette smokers of size n1=75 and a group of non-smokers of size n2=121. It is believed that the mean carboxyhemoglobin level of the smokers must be higher than the mean level of non-smokers. Assume that the underlying variances are identical and use alpha=0.05. What is the number of the degrees of freedom associated with this test? A) df = n1 – 1 = 74 B) df = n2 – 1 = 121 C) df = n1 + n2 – 1 = 195 D) df = n1 + n2 – 2 = 194
Practice Test Question #9 This is a t-test of two independent samples, therefore: df = n1 + n2 – 2 =75 + 121 – 2 = 194 What is the number of the degrees of freedom associated with this test? A) df = n1 – 1 = 74 B) df = n2 – 1 = 121 C) df = n1 + n2 – 1 = 195 D) df = n1 + n2 – 2 = 194
Practice Test Question #10
Based on the output (shown in Question #7), what is the decision?
A) Reject H0; The carboxyhemoglobin levels are higher among non-smokers
B) Do not reject H0; The carboxyhemoglobin levels are equal
C) Do not reject H0; The carboxyhemoglobin levels are lower among non-smokers
D) Reject H0; The carboxyhemoglobin levels are higher among smokers
Practice Test Question #10 (solution)
To ease interpretation, rearrange the hypotheses:
H0: µ1 ≤ µ2 can be rewritten as H0: µ1 - µ2 ≤ 0
HA: µ1 > µ2 can be rewritten as HA: µ1 - µ2 > 0
Now select the matching column of the output, Ha: diff > 0, which reports t = 9.5263 and P > t = 0.0000
p < 0.05; reject H0 and conclude that smokers have higher carboxyhemoglobin levels than non-smokers
Answer: D) Reject H0; The carboxyhemoglobin levels are higher among smokers
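The "diff" row of the output can be reproduced from the two group rows with the equal-variance (pooled) formula. This is a sketch: each group's standard deviation is recovered as Std. Err. times √n, and the function name pooled_t is illustrative.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance two-sample t: returns (t, standard error of the difference, df)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df   # pooled variance
    se_diff = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, se_diff, df


n1, n2 = 75, 121
s1 = 0.2309401 * sqrt(n1)   # smokers: sd = Std. Err. * sqrt(n)
s2 = 0.1818182 * sqrt(n2)   # non-smokers

t_stat, se_diff, df = pooled_t(4.1, s1, n1, 1.3, s2, n2)
print(round(t_stat, 4), round(se_diff, 4), df)
```

Both recovered standard deviations come out very close to 2, which is consistent with the question's instruction to assume identical variances.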
Practice Test Question #20 Ten infants were involved in a study to compare the effectiveness of two medications for the treatment of diaper rash. For each baby, two areas of approximately the same size and rash severity were selected, and one area was treated with medication A and the other area was treated with medication B. The number of hours for the rash to disappear was recorded for each medication and each infant. The question of interest: is there sufficient evidence to conclude that a difference exists in the mean time required for elimination of the rash? The appropriate statistical methodology to use is: A) A 2-sample (independent) t-test B) A z-test C) A paired t-test D) A sensitivity analysis
Practice Test Question #20 Medication A and Medication B are “matched” to the same infant. The samples are not independent. We were not given a known standard deviation σ. A sensitivity analysis is typically a sub-analysis that is performed, changing model parameters to see if the results remain consistent. This concept has not been covered in class and is not relevant to this question. This study calls for a paired t-test. A) A 2-sample (independent) t-test B) A z-test C) A paired t-test D) A sensitivity analysis
Review Topics: Class 6 (Ch. 13.1-13.5)
Nonparametric
Necessary when the data are not normally distributed
We rank the data rather than using the raw data
Be able to interpret STATA output
Did not have homework on this
Notes:
When we think that no distribution fits the data well we often use nonparametric tests to analyze our data
Examples of data: skewed, bimodal, ordinal
Advantages: does not rely on a distribution
Disadvantages: may be less powerful if the data could be fit to a distribution
Often use the median
Practice Test Question #14 A clinical trial is conducted to compare two treatments for asthma. Asthmatics are split into two groups with one group receiving Treatment A and the other group receiving Treatment B. At the end of the trial, the groups are compared with respect to FEV (forced expiratory volume). Differences found at the end of the trial may be due to: A) Treatment effect differences B) Chance C) Group differences with respect to other variables (e.g. sex, age, race, severity of asthma, etc.) D) All of the above
GOOD LUCK!