Inferences Based on Two Samples

9 Inferences Based on Two Samples Copyright © Cengage Learning. All rights reserved.

9.1 z Tests and Confidence Intervals for a Difference Between Two Population Means

z Tests and Confidence Intervals for a Difference Between Two Population Means The inferences discussed in this section concern a difference μ₁ – μ₂ between the means of two different population distributions. An investigator might, for example, wish to test hypotheses about the difference between true average breaking strengths of two different types of corrugated fiberboard.

z Tests and Confidence Intervals for a Difference Between Two Population Means One such hypothesis would state that μ₁ – μ₂ = 0, that is, that μ₁ = μ₂. Alternatively, it may be appropriate to estimate μ₁ – μ₂ by computing a 95% CI. Such inferences necessitate obtaining a sample of strength observations for each type of fiberboard.

z Tests and Confidence Intervals for a Difference Between Two Population Means The use of m for the number of observations in the first sample and n for the number of observations in the second sample allows for the two sample sizes to be different. Sometimes this is because it is more difficult or expensive to sample one population than another. In other situations, equal sample sizes may initially be specified, but for reasons beyond the scope of the experiment, the actual sample sizes may differ.

Test Procedures for Normal Populations with Known Variances

Example 9.1 Analysis of a random sample consisting of m = 20 specimens of cold-rolled steel to determine yield strengths resulted in a sample average strength of x̄ = 29.8. A second random sample of n = 25 two-sided galvanized steel specimens gave a sample average strength of ȳ = 34.7.

Example 9.1 cont’d Assuming that the two yield-strength distributions are normal with σ₁ = 4.0 and σ₂ = 5.0 (suggested by a graph in the article “Zinc-Coated Sheet Steel: An Overview,” Automotive Engr., Dec. 1984: 39–43), does the data indicate that the corresponding true average yield strengths μ₁ and μ₂ are different? Let’s carry out a test at significance level α = .01.

Example 9.1 cont’d
1. The parameter of interest is μ₁ – μ₂, the difference between the true average strengths for the two types of steel.
2. The null hypothesis is H0: μ₁ – μ₂ = 0.
3. The alternative hypothesis is Ha: μ₁ – μ₂ ≠ 0; if Ha is true, then μ₁ and μ₂ are different.
4. With Δ₀ = 0, the test statistic value is z = (x̄ – ȳ)/√(σ₁²/m + σ₂²/n).

Example 9.1 cont’d 5. Substituting m = 20, x̄ = 29.8, σ₁² = 16.0, n = 25, ȳ = 34.7, and σ₂² = 25.0 into the formula for z yields z = (29.8 – 34.7)/√(16.0/20 + 25.0/25) = –4.90/1.342 = –3.65. That is, the observed value of x̄ – ȳ is more than 3 standard deviations below what would be expected were H0 true.

Example 9.1 cont’d 6. The ≠ inequality in Ha implies that a two-tailed test is appropriate. The P-value is 2[1 – Φ(3.65)] ≈ .0003, which is essentially 0.

Example 9.1 cont’d 7. Since P-value ≈ 0 ≤ .01 = α, H0 is rejected at level .01 in favor of the conclusion that μ₁ ≠ μ₂. In fact, with a P-value this small, the null hypothesis would be rejected at any sensible significance level. The sample data strongly suggests that the true average yield strength for cold-rolled steel differs from that for galvanized steel.
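The arithmetic in Example 9.1 can be checked with a few lines of Python. This is only a sketch using the standard library; the helper names `normal_cdf` and `two_sample_z` are illustrative and do not come from the slides, and the inputs are the summary values quoted in the example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sample_z(xbar, ybar, var1, var2, m, n, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 when both variances are known."""
    standard_error = math.sqrt(var1 / m + var2 / n)
    return (xbar - ybar - delta0) / standard_error

# Summary values from Example 9.1 (variances are sigma1^2 = 16.0, sigma2^2 = 25.0)
z = two_sample_z(xbar=29.8, ybar=34.7, var1=16.0, var2=25.0, m=20, n=25)

# Two-tailed P-value because Ha uses the "not equal" inequality
p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))

print(f"z = {z:.2f}, two-tailed P-value = {p_two_sided:.4f}")
```

Passing a nonzero `delta0` would test a null value other than zero, which is the only change the z statistic needs for hypotheses of the form μ₁ – μ₂ = Δ₀.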

Using a Comparison to Identify Causality

Using a Comparison to Identify Causality Investigators are often interested in comparing either the effects of two different treatments on a response or the response after treatment with the response after no treatment (treatment vs. control). If the individuals or objects to be used in the comparison are not assigned by the investigators to the two different conditions, the study is said to be observational.

Using a Comparison to Identify Causality The difficulty with drawing conclusions based on an observational study is that although statistical analysis may indicate a significant difference in response between the two groups, the difference may be due to some underlying factors that had not been controlled rather than to any difference in treatments.

Example 9.2 A letter in the Journal of the American Medical Association (May 19, 1978) reported that of 215 male physicians who were Harvard graduates and died between November 1974 and October 1977, the 125 in full-time practice lived an average of 48.9 years beyond graduation, whereas the 90 with academic affiliations lived an average of 43.2 years beyond graduation.

Example 9.2 cont’d Does the data suggest that the mean lifetime after graduation for doctors in full-time practice exceeds the mean lifetime for those who have an academic affiliation? (If so, those medical students who say that they are “dying to obtain an academic affiliation” may be closer to the truth than they realize; in other words, is “publish or perish” really “publish and perish”?)

Example 9.2 cont’d Let μ₁ denote the true average number of years lived beyond graduation for physicians in full-time practice, and let μ₂ denote the same quantity for physicians with academic affiliations. Assume the 125 and 90 physicians to be random samples from populations 1 and 2, respectively (which may not be reasonable if there is reason to believe that Harvard graduates have special characteristics that differentiate them from all other physicians; in this case inferences would be restricted just to the “Harvard populations”).

Example 9.2 cont’d The letter from which the data was taken gave no information about variances. So for illustration assume that σ₁ = 14.6 and σ₂ = 14.4. The hypotheses are H0: μ₁ – μ₂ = 0 versus Ha: μ₁ – μ₂ > 0, so Δ₀ is zero.

Example 9.2 cont’d The computed value of the test statistic is z = (48.9 – 43.2)/√((14.6)²/125 + (14.4)²/90) = 5.70/2.00 = 2.85.

Example 9.2 cont’d The P-value for an upper-tailed test is 1 – Φ(2.85) = .0022. At significance level .01, H0 is rejected (because α > P-value) in favor of the conclusion that μ₁ – μ₂ > 0 (μ₁ > μ₂). This is consistent with the information reported in the letter.
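The upper-tailed calculation in Example 9.2 can be sketched as follows. The standard deviations are the illustrative values assumed in the example, not values reported in the letter, and the helper name `upper_tail` is hypothetical. The complementary error function is used here because it stays accurate in the tail.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal Z, computed with erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

m, n = 125, 90                  # sample sizes
xbar, ybar = 48.9, 43.2         # sample mean years lived beyond graduation
sigma1, sigma2 = 14.6, 14.4     # ASSUMED for illustration, as in the example
delta0 = 0.0                    # null value of mu1 - mu2

z = (xbar - ybar - delta0) / math.sqrt(sigma1**2 / m + sigma2**2 / n)
p_value = upper_tail(z)         # Ha: mu1 - mu2 > 0, so use the upper tail

alpha = 0.01
reject_h0 = p_value <= alpha

print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject H0 at {alpha}: {reject_h0}")
```

A small P-value here says only that the two group means differ by more than chance would explain; it says nothing about why they differ, which is the point of the discussion of observational studies.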

Example 9.2 cont’d This data resulted from a retrospective observational study; the investigator did not start out by selecting a sample of doctors and assigning some to the “academic affiliation” treatment and the others to the “full-time practice” treatment, but instead identified members of the two groups by looking backward in time (through obituaries!) to past records.

Example 9.2 cont’d Can the statistically significant result here really be attributed to a difference in the type of medical practice after graduation, or is there some other underlying factor (e.g., age at graduation, exercise regimens, etc.) that might also furnish a plausible explanation for the difference? Observational studies have been used to argue for a causal link between smoking and lung cancer.

Example 9.2 cont’d There are many studies that show that the incidence of lung cancer is significantly higher among smokers than among nonsmokers. However, individuals had decided whether to become smokers long before investigators arrived on the scene, and factors in making this decision may have played a causal role in the contraction of lung cancer.

Using a Comparison to Identify Causality A randomized controlled experiment results when investigators assign subjects to the two treatments in a random fashion. When statistical significance is observed in such an experiment, the investigator and other interested parties will have more confidence in the conclusion that the difference in response has been caused by a difference in treatments.

Large-Sample Tests

Example 9.4 What impact does fast-food consumption have on various dietary and health characteristics? The article “Effects of Fast-Food Consumption on Energy Intake and Diet Quality Among Children in a National Household Study” (Pediatrics, 2004:112–118) reported the accompanying summary data on daily calorie intake both for a sample of teens who said they did not typically eat fast food and another sample of teens who said they did usually eat fast food.

Example 9.4 cont’d Does this data provide strong evidence for concluding that true average calorie intake for teens who typically eat fast food exceeds by more than 200 calories per day the true average intake for those who don’t typically eat fast food? Let’s investigate by carrying out a test of hypotheses at a significance level of approximately .05.

Example 9.4 cont’d The parameter of interest is μ₁ – μ₂, where μ₁ is the true average calorie intake for teens who don’t typically eat fast food and μ₂ is the true average intake for teens who do typically eat fast food. The hypotheses of interest are H0: μ₁ – μ₂ = –200 versus Ha: μ₁ – μ₂ < –200. The alternative hypothesis asserts that true average daily intake for those who typically eat fast food exceeds that for those who don’t by more than 200 calories.

Example 9.4 cont’d The test statistic value is z = (x̄ – ȳ – (–200))/√(s₁²/m + s₂²/n). The inequality in Ha implies that the test is lower-tailed; H0 should be rejected if z ≤ –z.05 = –1.645. The calculated test statistic value is z = –2.20.

Example 9.4 cont’d The inequality in Ha implies that P-value = Φ(–2.20) = .0139. Since –2.20 ≤ –1.645, the null hypothesis is rejected. At a significance level of .05, it does appear that true average daily calorie intake for teens who typically eat fast food exceeds by more than 200 the true average intake for those who don’t typically eat such food.

Example 9.4 cont’d However, the P-value is not small enough to justify rejecting H0 at significance level .01. Notice that if the label 1 had instead been used for the fast-food condition and 2 had been used for the no-fast-food condition, then 200 would have replaced –200 in both hypotheses and Ha would have contained the inequality >, implying an upper-tailed test. The resulting test statistic value would have been 2.20, giving the same P-value as before.

9.2 The Two-Sample t Test and Confidence Interval

The Two-Sample t Test and Confidence Interval We could, for example, assume that both population distributions are members of the Weibull family or that they are both Poisson distributions. It shouldn’t surprise you to learn that normality is typically the most reasonable assumption. Assumptions: both population distributions are normal, and the two samples are independent of one another.

Example: For the n₁ = 10 subjects who followed diet A, the mean weight loss was x̄₁ = 4.5 lb with a standard deviation of s₁ = 6.5 lb. For the n₂ = 10 subjects who followed diet B, the mean weight loss was x̄₂ = 3.2 lb with a standard deviation of s₂ = 4.5 lb. Test the claim that the mean weight loss of diet A is more than that of diet B. Assume the two populations have the same variance. Use α = 0.05.

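Example The parameters about which the claim is made are μ₁ and μ₂, the true mean weight losses for diets A and B. The hypotheses are H0: μ₁ = μ₂ versus Ha: μ₁ > μ₂. Assume equal population variances. Test statistic: t = (x̄₁ – x̄₂)/(sp √(1/n₁ + 1/n₂)), where sp² = [(n₁ – 1)s₁² + (n₂ – 1)s₂²]/(n₁ + n₂ – 2) = 31.25, so t = 1.3/2.5 = 0.52 with 18 df.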

Example P-value = 0.305 > α = 0.05. Technical conclusion: Do not reject H0. Final conclusion: There is not sufficient evidence to support the claim that the mean weight loss from diet A is more than the mean weight loss from diet B.
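The pooled (equal-variance) t test for the diet example can be reproduced from the summary statistics alone. The sketch below uses only the standard library; because Python's `math` module has no t distribution, the upper-tail area is obtained by Simpson's-rule integration of the t density, which is a convenience chosen here and not a method taken from the slides. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on [0, |t|] and symmetry about 0."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    h = t / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from the diet example
n1, xbar1, s1 = 10, 4.5, 6.5   # diet A
n2, xbar2, s2 = 10, 3.2, 4.5   # diet B

df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df        # pooled variance
t_stat = (xbar1 - xbar2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
p_value = t_upper_tail(t_stat, df)                      # Ha: mu1 > mu2

print(f"sp^2 = {sp2:.2f}, t = {t_stat:.2f}, df = {df}, P-value = {p_value:.3f}")
```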

9.3 Analysis of Paired Data

Analysis of Paired Data We considered making an inference about a difference between two means μ₁ and μ₂. This was done by utilizing the results of a random sample X1, X2, …, Xm from the distribution with mean μ₁ and a completely independent (of the X’s) sample Y1, …, Yn from the distribution with mean μ₂. That is, either m individuals were selected from population 1 and n different individuals from population 2, or m individuals (or experimental objects) were given one treatment and another set of n individuals were given the other treatment.

Analysis of Paired Data In contrast, there are a number of experimental situations in which there is only one set of n individuals or experimental objects; making two observations on each one results in a natural pairing of values.

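Analysis of Paired Data Assumptions: the data consists of n independently selected pairs (X1, Y1), …, (Xn, Yn), and the within-pair differences Di = Xi – Yi are assumed to be normally distributed with mean μD.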

The Paired t Test

Example 9.9 Musculoskeletal neck-and-shoulder disorders are all too common among office staff who perform repetitive tasks using visual display units. The article “Upper-Arm Elevation During Office Work” (Ergonomics, 1996: 1221 – 1230) reported on a study to determine whether more varied work conditions would have any impact on arm movement.

Example 9.9 cont’d The accompanying data was obtained from a sample of n = 16 subjects.

Example 9.9 cont’d Each observation is the amount of time, expressed as a proportion of total time observed, during which arm elevation was below 30°. The two measurements from each subject were obtained 18 months apart. During this period, work conditions were changed, and subjects were allowed to engage in a wider variety of work tasks. Does the data suggest that true average time during which elevation is below 30° differs after the change from what it was before the change?

Example 9.9 cont’d Figure 9.5 shows a normal probability plot of the 16 differences; the pattern in the plot is quite straight, supporting the normality assumption. Figure 9.5: A normal probability plot from Minitab of the differences in Example 9.9

Example 9.9 cont’d A boxplot of these differences appears in Figure 9.6; the boxplot is located considerably to the right of zero, suggesting that perhaps μD > 0 (note also that 13 of the 16 differences are positive and only two are negative). Figure 9.6: A boxplot of the differences in Example 9.9

Example 9.9 cont’d Let’s now test the appropriate hypotheses.
1. Let μD denote the true average difference between elevation time before the change in work conditions and time after the change.
2. H0: μD = 0 (there is no difference between true average time before the change and true average time after the change)
3. Ha: μD ≠ 0

Example 9.9 cont’d
4. t = (d̄ – 0)/(sD/√n)
5. n = 16, Σdᵢ = 108, and Σdᵢ² = 1746, from which d̄ = 6.75, sD = 8.234, and t = 6.75/(8.234/√16) = 3.28 ≈ 3.3.
6. Appendix Table A.8 shows that the area to the right of 3.3 under the t curve with 15 df is .002. The inequality in Ha implies that a two-tailed test is appropriate, so the P-value is approximately 2(.002) = .004 (Minitab gives .0051).

Example 9.9 cont’d 7. Since .004 < .01, the null hypothesis can be rejected at either significance level .05 or .01. It does appear that the true average difference between times is something other than zero; that is, true average time after the change is different from that before the change.
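The paired t statistic in Example 9.9 depends only on the number of pairs, the sum of the differences, and the sum of their squares. A minimal sketch of that calculation follows; the variable names are illustrative, and the raw data table is not reproduced here because it does not appear in the transcript.

```python
import math

# Summary quantities quoted in Example 9.9
n = 16            # number of subjects (pairs)
sum_d = 108.0     # sum of the differences d_i
sum_d2 = 1746.0   # sum of the squared differences d_i^2

d_bar = sum_d / n
# Sample standard deviation of the differences via the computing formula
s_d = math.sqrt((sum_d2 - sum_d**2 / n) / (n - 1))

# Paired t statistic for H0: mu_D = 0, with n - 1 degrees of freedom
t_stat = (d_bar - 0.0) / (s_d / math.sqrt(n))
df = n - 1

print(f"d_bar = {d_bar:.2f}, s_D = {s_d:.3f}, t = {t_stat:.2f}, df = {df}")
```

Because the analysis is carried out on the differences, the two original columns of measurements never enter the formula separately, which is what distinguishes the paired test from the two-sample t test.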

9.4 Inferences Concerning a Difference Between Population Proportions

Inferences Concerning a Difference Between Population Proportions Proposition

Example 9.11 The article “Aspirin Use and Survival After Diagnosis of Colorectal Cancer” (J. of the Amer. Med. Assoc., 2009: 649–658) reported that of 549 study participants who regularly used aspirin after being diagnosed with colorectal cancer, there were 81 colorectal cancer-specific deaths, whereas among 730 similarly diagnosed individuals who did not subsequently use aspirin, there were 141 colorectal cancer-specific deaths. Does this data suggest that the regular use of aspirin after diagnosis will decrease the incidence rate of colorectal cancer-specific deaths? Let’s test the appropriate hypotheses using a significance level of .05.

Example 9.11 cont’d The parameter of interest is the difference p1 – p2, where p1 is the true proportion of deaths for those who regularly used aspirin and p2 is the true proportion of deaths for those who did not use aspirin. The use of aspirin is beneficial if p1 < p2 which corresponds to a negative difference between the two proportions. The relevant hypotheses are therefore H0: p1 – p2 = 0 versus Ha: p1 – p2 < 0

Example 9.11 cont’d Parameter estimates are p̂₁ = 81/549 = .1475, p̂₂ = 141/730 = .1932, and p̂ = (81 + 141)/(549 + 730) = .1736. A z test is appropriate here because all of mp̂₁, mq̂₁, np̂₂, and nq̂₂ are at least 10. The resulting test statistic value is z = (.1475 – .1932)/√((.1736)(.8264)(1/549 + 1/730)) = –.0457/.0214 = –2.14. The corresponding P-value for a lower-tailed z test is Φ(–2.14) = .0162.

Example 9.11 cont’d Because .0162 ≤ .05, the null hypothesis can be rejected at significance level .05. So anyone adopting this significance level would be convinced that the use of aspirin in these circumstances is beneficial. However, someone looking for more compelling evidence might select a significance level .01 and then not be persuaded.
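The pooled two-proportion z test in Example 9.11 can be sketched directly from the four counts. The function name `two_proportion_z` is hypothetical. Carrying full precision instead of four-decimal intermediate values can shift z slightly in the second decimal place relative to the slide's –2.14, without changing the conclusion at either significance level discussed.

```python
import math

def two_proportion_z(x1, m, x2, n):
    """z statistic for H0: p1 - p2 = 0 using the pooled proportion estimate."""
    p1_hat, p2_hat = x1 / m, x2 / n
    p_pool = (x1 + x2) / (m + n)
    standard_error = math.sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))
    return (p1_hat - p2_hat) / standard_error

# Counts from Example 9.11: deaths / group size
z = two_proportion_z(x1=81, m=549, x2=141, n=730)

# Lower-tailed P-value Phi(z), since Ha: p1 - p2 < 0
p_value = 0.5 * math.erfc(-z / math.sqrt(2.0))

# Sample-size check that each expected count is at least 10
counts_ok = min(81, 549 - 81, 141, 730 - 141) >= 10

print(f"z = {z:.2f}, P-value = {p_value:.4f}, large-sample check passed: {counts_ok}")
```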

9.5 Inferences Concerning Two Population Variances

The F Distribution

The F Distribution The F probability distribution has two parameters, denoted by v1 and v2. The parameter v1 is called the number of numerator degrees of freedom, and v2 is the number of denominator degrees of freedom; here v1 and v2 are positive integers. A random variable that has an F distribution cannot assume a negative value. Since the density function is complicated and will not be used explicitly, we omit the formula. There is an important connection between an F variable and chi-squared variables.

The F Distribution If X1 and X2 are independent chi-squared rv’s with v1 and v2 df, respectively, then the rv F = (X1/v1)/(X2/v2) (9.8) (the ratio of the two chi-squared variables divided by their respective degrees of freedom) can be shown to have an F distribution.

The F Distribution Figure 9.7 illustrates the graph of a typical F density function. Figure 9.7: An F density curve and critical value

The F Test for Equality of Variances

Example 9.14 A random sample of 200 vehicles traveling on gravel roads in a county with a posted speed limit of 35 mph on such roads resulted in a sample mean speed of 37.5 mph and a sample standard deviation of 8.6 mph, whereas another random sample of 200 vehicles in a county with a posted speed limit of 55 mph resulted in a sample mean and sample standard deviation of 35.8 mph and 9.2 mph, respectively (these means and standard deviations were reported in the article “Evaluation of Criteria for Setting Speed Limits on Gravel Roads” (J. of Transp. Engr., 2011: 57–63); the actual sample sizes result in dfs that exceed the largest of those in our F table).

Example 9.14 cont’d Let’s carry out a test at significance level .10 to decide whether the two population distribution variances are identical.
1. σ₁² is the variance of the speed distribution on the 35 mph roads, and σ₂² is the variance of the speed distribution on 55 mph roads.
2. H0: σ₁² = σ₂²
3. Ha: σ₁² ≠ σ₂²
4. Test statistic value: f = s₁²/s₂²

Example 9.14 cont’d
5. Calculation: f = (8.6)²/(9.2)² = .87
6. P-value determination: .87 lies in the lower tail of the F curve with 199 numerator df and 199 denominator df. A glance at the F table shows that F.10,199,199 ≈ F.10,200,200 ≈ 1.20 (consult the v1 = 120 and v1 = 1000 columns), implying F.90,199,199 ≈ 1/1.20 = .83 (these values are confirmed by software). That is, the area under the relevant F curve to the left of .83 is .10. Thus the area under the curve to the left of .87 exceeds .10, and so P-value > 2(.10) = .2 (software gives .342).

Example 9.14 7. The P-value clearly exceeds the mandated significance level. The null hypothesis therefore cannot be rejected; it is plausible that the two speed distribution variances are identical. The sample sizes in the cited article were 2665 and 1868, respectively, and the P-value reported there was .0008. So for the actual data, the hypothesis of equal variances would be rejected not only at significance level .10—in contrast to our conclusion—but also at level .05, .01, and even .001.

Example 9.14 cont’d This illustrates again how quite large sample sizes can magnify a small difference in estimated values. Note also that the sample mean speed for the county with the lower posted speed limit was higher than for the county with the higher limit, a counterintuitive result that surprised the investigators; and because of the very large sample sizes, this difference in means is highly statistically significant.
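The P-value in step 6 of Example 9.14 can be approximated without an F table by simulating F variables from their definition as a ratio of independent chi-squared variables, each divided by its degrees of freedom. The sketch below is a Monte Carlo estimate under that definition, not the software calculation cited in the example. The seed, the number of draws, and the helper name `f_draw` are arbitrary choices made here.

```python
import random

def f_draw(rng, v1, v2):
    """One draw from F(v1, v2) as (X1/v1)/(X2/v2) with X1, X2 chi-squared."""
    # A chi-squared variable with k df is Gamma(shape=k/2, scale=2)
    x1 = rng.gammavariate(v1 / 2.0, 2.0)
    x2 = rng.gammavariate(v2 / 2.0, 2.0)
    return (x1 / v1) / (x2 / v2)

rng = random.Random(0)            # fixed seed so the run is repeatable
v1 = v2 = 199                     # df used in the example (n - 1 for each sample)
f_obs = 8.6**2 / 9.2**2           # observed ratio of sample variances

draws = 20000
lower = sum(f_draw(rng, v1, v2) <= f_obs for _ in range(draws)) / draws

# Two-tailed P-value: double the smaller tail area
p_value = 2.0 * min(lower, 1.0 - lower)

print(f"f = {f_obs:.2f}, simulated two-tailed P-value = {p_value:.3f}")
```

With this many draws the estimate should land near the .342 that the example attributes to software, with simulation error of roughly a hundredth.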