Topic 8 - Comparing two samples

Name: Topic 8 - Comparing two samples
Uploaded: 2017-09-10T12:07:18+00:00
Duration: PTM13S45
Channel: Britton Lambert
Description: Topic 8 - Comparing two samples

Topic 8 - Comparing two samples
Confidence intervals/hypothesis tests for two means
Hypothesis test for two variances

Comparing two populations
Sometimes we want to compare two populations rather than make decisions about a single population. For example, we might want to compare two population means or two population proportions to see if they are equal.
Is the expected drying time for one type of paint lower than that of another type of paint?
Is a new drug more effective? Either an increased or decreased mean versus the “established” drug, or an increased or decreased percentage vs. control.
Does the new method actually result in increased crop yields or percentages, or a decrease in tons lost to insects, etc.?

Behind the scenes. What do the distributions look like?

Comparing two population means
Suppose we have two independent samples, $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$, from two separate populations. A natural statistic for comparing the two population means, $\mu_X$ and $\mu_Y$, is $\bar{X} - \bar{Y}$. The distribution of $\bar{X} - \bar{Y}$ is also approximately Normal for $m$ and $n$ both large.

Large samples test for comparing population means
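To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
$$Z = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{s_X^2/m + s_Y^2/n}}$$
$H_A: \mu_X - \mu_Y < \Delta_0$: reject $H_0$ if $Z < -z_{\alpha}$
$H_A: \mu_X - \mu_Y > \Delta_0$: reject $H_0$ if $Z > z_{\alpha}$
$H_A: \mu_X - \mu_Y \neq \Delta_0$: reject $H_0$ if $|Z| > z_{\alpha/2}$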
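As a rough illustration of the large-sample test, here is a minimal Python sketch. It assumes the usual statistic $Z = (\bar{x} - \bar{y} - \Delta_0)/\sqrt{s_X^2/m + s_Y^2/n}$ and works from summary statistics. The function name, argument names, and the numbers are hypothetical and are not taken from the home sales data.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(xbar, ybar, var_x, var_y, m, n, delta0=0.0,
                      alternative="two-sided"):
    """Large-sample Z test of H0: mu_X - mu_Y = delta0 from summary statistics."""
    se = sqrt(var_x / m + var_y / n)          # estimated standard error
    z = (xbar - ybar - delta0) / se
    std_normal = NormalDist()                 # standard Normal, mean 0, sd 1
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1.0 - std_normal.cdf(z)
    else:                                     # two-sided alternative
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return z, p_value


# Hypothetical summary statistics (illustration only).
z, p = two_sample_z_test(xbar=105.0, ybar=100.0, var_x=100.0, var_y=144.0,
                         m=50, n=72, alternative="greater")
print(f"Z = {z:.3f}, p-value = {p:.4f}")
```

Comparing the p-value with the chosen level of significance is equivalent to comparing Z with the critical value in the rejection rule.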

Home sales data
A realtor in Albuquerque wants to argue that houses in the Northeast are more expensive on average than those in the rest of town. NE = 0 indicates a home was not in the Northeast. Test the appropriate hypotheses with $\alpha = 0.01$.

This is what the StatCrunch data looks like.

Here’s the output in StatCrunch

What does it look like?

Large samples confidence interval for the difference between two population means
A large sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$$\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{s_X^2/m + s_Y^2/n}$$
For the home sales data, what is a 99% confidence interval for the difference between sale prices in the Northeast and the rest of town?
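The interval can be sketched in a few lines of Python, assuming the standard form $\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{s_X^2/m + s_Y^2/n}$. The function name and the summary statistics below are hypothetical, not the home sales figures.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(xbar, ybar, var_x, var_y, m, n, conf_level=0.99):
    """Large-sample confidence interval for mu_X - mu_Y."""
    alpha = 1.0 - conf_level
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 point
    margin = z_crit * sqrt(var_x / m + var_y / n)
    diff = xbar - ybar
    return diff - margin, diff + margin


# Hypothetical summary statistics (illustration only).
lower, upper = large_sample_ci(105.0, 100.0, 100.0, 144.0, 50, 72,
                               conf_level=0.99)
print(f"99% CI: ({lower:.3f}, {upper:.3f})")
```

If the interval contains 0, the two-sided test at the matching level would not reject equality of the means.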

Equal population variances
Suppose we assume that the two populations have a common variance $\sigma^2$. We can then estimate this common variance using the pooled sample variance:
$$s_p^2 = \frac{(m-1)s_X^2 + (n-1)s_Y^2}{m+n-2}$$

Small samples test for comparing population means from Normal distributions with equal variances
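To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
$$T = \frac{\bar{X} - \bar{Y} - \Delta_0}{s_p\sqrt{1/m + 1/n}}$$
$H_A: \mu_X - \mu_Y < \Delta_0$: reject $H_0$ if $T < -t_{\alpha,\,n+m-2}$
$H_A: \mu_X - \mu_Y > \Delta_0$: reject $H_0$ if $T > t_{\alpha,\,n+m-2}$
$H_A: \mu_X - \mu_Y \neq \Delta_0$: reject $H_0$ if $|T| > t_{\alpha/2,\,n+m-2}$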
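A minimal sketch of the pooled calculation in Python, assuming the usual pooled variance $s_p^2 = [(m-1)s_X^2 + (n-1)s_Y^2]/(m+n-2)$ and statistic $T = (\bar{x} - \bar{y} - \Delta_0)/(s_p\sqrt{1/m + 1/n})$. The data are made up for illustration and are not the THC measurements. The critical value 3.169 is the tabled $t_{0.005,10}$, which would apply to a two-sided test at level 0.01 with 6 observations per group.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y, delta0=0.0):
    """Pooled two-sample T statistic, its degrees of freedom, and s_p^2."""
    m, n = len(x), len(y)
    # Pooled estimate of the common variance.
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    t = (mean(x) - mean(y) - delta0) / sqrt(sp2 * (1.0 / m + 1.0 / n))
    return t, m + n - 2, sp2


# Hypothetical samples (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
t, df, sp2 = pooled_t_statistic(x, y)

T_CRIT = 3.169                 # t table value for alpha/2 = 0.005, df = 10
reject = abs(t) > T_CRIT       # two-sided rejection rule
print(f"T = {t:.3f}, df = {df}, reject H0: {reject}")
```

The critical value is passed in from a table here because the Python standard library has no t distribution; statistical software would compute it directly.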

THC example with equal variances
The active component in marijuana is THC. An experiment was conducted to compare two slightly different configurations of this substance. The THC data set contains the time until the effect was perceived for 6 subjects exposed to each configuration. Is there any evidence that the mean time to perception is different between the two configurations using $\alpha = 0.01$?

Here’s what the calculations look like.
Pooled standard deviation

What does it look like? The two-sided p-value is twice the one-tail value.

Small samples confidence interval for the difference between two population means
Assuming equal variances, a small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$$\bar{X} - \bar{Y} \pm t_{\alpha/2,\,n+m-2}\; s_p\sqrt{1/m + 1/n}$$
For the THC data, what is a 99% confidence interval for the mean difference between the detection times for the two configurations?

Unequal population variances
The pooled procedures we have discussed previously are fairly robust to violations of the equal variances assumption. In other words, if the two population variances are relatively close, the procedures perform well:
The level of significance for the hypothesis test is close to what it should be.
The coverage probability for the confidence interval is close to what it should be.
If the variances are quite different, then we need a different procedure.

Small samples test for comparing population means from Normal distributions with unequal variances
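To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
$$T = \frac{\bar{X} - \bar{Y} - \Delta_0}{\sqrt{s_X^2/m + s_Y^2/n}}$$
with degrees of freedom
$$\nu = \frac{\left(s_X^2/m + s_Y^2/n\right)^2}{\dfrac{(s_X^2/m)^2}{m-1} + \dfrac{(s_Y^2/n)^2}{n-1}}$$
$H_A: \mu_X - \mu_Y < \Delta_0$: reject $H_0$ if $T < -t_{\alpha,\,\nu}$
$H_A: \mu_X - \mu_Y > \Delta_0$: reject $H_0$ if $T > t_{\alpha,\,\nu}$
$H_A: \mu_X - \mu_Y \neq \Delta_0$: reject $H_0$ if $|T| > t_{\alpha/2,\,\nu}$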
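The unequal-variance calculation can be sketched as follows, assuming the standard Welch statistic with the Satterthwaite approximation for the degrees of freedom. The function name and data are hypothetical. Some textbooks round the degrees of freedom down to an integer before using a t table; this sketch leaves them unrounded.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(x, y, delta0=0.0):
    """Unequal-variance T statistic and approximate degrees of freedom."""
    m, n = len(x), len(y)
    vx = variance(x) / m       # estimated variance of the X sample mean
    vy = variance(y) / n       # estimated variance of the Y sample mean
    t = (mean(x) - mean(y) - delta0) / sqrt(vx + vy)
    nu = (vx + vy) ** 2 / (vx ** 2 / (m - 1) + vy ** 2 / (n - 1))
    return t, nu


# Hypothetical samples with very different spreads (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 6.0, 10.0, 14.0, 18.0, 22.0]
t, nu = welch_t_statistic(x, y)
print(f"T = {t:.3f}, approximate df = {nu:.2f}")
```

The approximate degrees of freedom always fall between the smaller of m-1 and n-1 and the pooled value m+n-2, reaching the pooled value when the two samples are the same size and have equal sample variances.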

Small samples confidence interval for the difference between two population means with unequal variances
Assuming unequal variances, a small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$$\bar{X} - \bar{Y} \pm t_{\alpha/2,\,\nu}\sqrt{s_X^2/m + s_Y^2/n}$$
For the THC data, what is a 99% confidence interval for the mean difference between the detection times for the two configurations?

Comparing two population variances
Suppose two chemical companies can supply a raw material, but we suspect the variability in concentration may differ between the two. The standard deviation of concentration in a random sample of 15 batches from company 1 was found to be 4.7 g/l (variance 22.09). A sample of 21 batches from company 2 yielded a standard deviation of 5.8 g/l (variance 33.64). Is there sufficient evidence to conclude that the variability in concentration differs for the two companies?

Test for comparing population variances from Normal distributions
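To test $H_0: \sigma_X^2 = \sigma_Y^2$, use the test statistic
$$F = \frac{s_X^2}{s_Y^2}$$
$H_A: \sigma_X^2 > \sigma_Y^2$: reject $H_0$ if $F > F_{\alpha,\,m-1,\,n-1}$
$H_A: \sigma_X^2 < \sigma_Y^2$: reject $H_0$ if $F < F_{1-\alpha,\,m-1,\,n-1}$
$H_A: \sigma_X^2 \neq \sigma_Y^2$: reject $H_0$ if $F > F_{\alpha/2,\,m-1,\,n-1}$ or $F < F_{1-\alpha/2,\,m-1,\,n-1}$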

Chemical example
Is there sufficient evidence to conclude that the variability in concentration differs for the two companies with $\alpha = 0.05$? Demonstrate the F calculator.
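As a small worked illustration, the F statistic for the chemical example can be computed from the reported standard deviations (4.7 g/l from 15 batches and 5.8 g/l from 21 batches). The sketch assumes the statistic is the ratio of sample variances with company 1 in the numerator; the variable names are hypothetical. It stops short of the critical values, which still have to come from an F table or an F calculator.

```python
# Summary statistics as reported for the two companies.
sd_1, m = 4.7, 15      # company 1: standard deviation (g/l), batches sampled
sd_2, n = 5.8, 21      # company 2: standard deviation (g/l), batches sampled

var_1 = sd_1 ** 2      # 22.09
var_2 = sd_2 ** 2      # 33.64

f_stat = var_1 / var_2             # ratio of sample variances
df_num, df_den = m - 1, n - 1      # numerator and denominator df

print(f"F = {f_stat:.4f} with ({df_num}, {df_den}) degrees of freedom")
```

For the two-sided alternative, this value is compared with both the lower and upper critical values of the F distribution with 14 and 20 degrees of freedom.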

Confidence interval for the ratio of two Normal population variances
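A $(1-\alpha)100\%$ confidence interval for $\sigma_X^2/\sigma_Y^2$ is
$$\left(\frac{s_X^2/s_Y^2}{F_{\alpha/2,\,m-1,\,n-1}},\; \frac{s_X^2/s_Y^2}{F_{1-\alpha/2,\,m-1,\,n-1}}\right)$$
For the chemical example, what is a 95% confidence interval for the ratio of concentration variances?
The additional file for Topic 8 contains examples of large and small scale tests on the differences in population means and proportions.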

Paired data
Sometimes we have a third variable that connects elements from the X and Y samples. In this case, the assumption of independence between the two samples may be violated. Is there any evidence that the first twin and the second twin have different average weights among boy-boy twins? In this case, the twins are clearly connected by the mother. It might be better to base our test on the $n$ pairwise differences, $D_i = X_i - Y_i$.

Paired test for comparing population means
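To test $H_0: \mu_X - \mu_Y = \Delta_0$, use the test statistic
$$T = \frac{\bar{D} - \Delta_0}{s_D/\sqrt{n}}$$
where $\bar{D}$ and $s_D$ are the mean and standard deviation of the pairwise differences.
$H_A: \mu_X - \mu_Y < \Delta_0$: reject $H_0$ if $T < -t_{\alpha,\,n-1}$
$H_A: \mu_X - \mu_Y > \Delta_0$: reject $H_0$ if $T > t_{\alpha,\,n-1}$
$H_A: \mu_X - \mu_Y \neq \Delta_0$: reject $H_0$ if $|T| > t_{\alpha/2,\,n-1}$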
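The paired statistic reduces to a one-sample calculation on the differences. Here is a minimal Python sketch, assuming the usual form $T = (\bar{d} - \Delta_0)/(s_D/\sqrt{n})$. The function name and the paired values are hypothetical and are not the Twins data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x, y, delta0=0.0):
    """Paired T statistic and degrees of freedom from matched samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]     # pairwise differences
    n = len(d)
    t = (mean(d) - delta0) / (stdev(d) / sqrt(n))
    return t, n - 1


# Hypothetical paired measurements (illustration only).
x = [10.0, 12.0, 9.0, 11.0, 13.0]
y = [9.0, 10.0, 9.0, 9.0, 10.0]
t, df = paired_t_statistic(x, y)
print(f"T = {t:.3f}, df = {df}")
```

The degrees of freedom are n-1, where n is the number of pairs, not the total number of observations.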

Twins example
Load the Twins data from StatCrunch sample data sets.
Is there any evidence that Twin A and Twin B have different average weights among boy-boy twins with $\alpha = 0.1$?

Additional pooled vs. paired
Example: The article “Sex and Race Discrimination in the New Car Showroom: A fact or Myth” (J. Consumer Affairs, 1977, pp ) reports the results of an experiment in which individuals of different races and sexes visited 9 car dealerships to request the best possible deal on a certain car. The actual car prices obtained are shown below:

The standard deviations are relatively close, so we could consider this as a pooled test of differences, with the following results:

Two ways to look at the situation
Why did we get such poor results from our test? The assumption in a pooled test is that the data are independent. In other words, any values from the woman’s distribution of prices are independent of values from the man’s distribution. A valid comparison in that situation looks like this:

However, we know that’s not the case.
Prices from dealership 1 can be compared to each other (M to W), likewise for dealership 2, etc. There’s a relationship between the prices, a “pairing variable”. They are not independent, and when viewed correctly, the data show something completely different.
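To see why pairing matters, the following Python sketch computes both the pooled and the paired T statistics on the same made-up data. The prices are hypothetical (they are not the values from the article). They are chosen so that prices vary a lot from dealership to dealership while the within-dealership difference is small and consistent.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical quotes (in $1000s) at five dealerships; position i in each
# list is the same dealership.
buyer_a = [20.0, 25.0, 30.0, 35.0, 40.0]
buyer_b = [19.0, 24.5, 29.0, 34.0, 39.5]
m, n = len(buyer_a), len(buyer_b)

# Pooled two-sample statistic: treats the two lists as independent samples.
sp2 = ((m - 1) * variance(buyer_a) + (n - 1) * variance(buyer_b)) / (m + n - 2)
t_pooled = (mean(buyer_a) - mean(buyer_b)) / sqrt(sp2 * (1.0 / m + 1.0 / n))

# Paired statistic: works on the within-dealership differences.
d = [a - b for a, b in zip(buyer_a, buyer_b)]
t_paired = mean(d) / (stdev(d) / sqrt(len(d)))

print(f"pooled T = {t_pooled:.3f} (df = {m + n - 2})")
print(f"paired T = {t_paired:.3f} (df = {len(d) - 1})")
```

In the pooled statistic the large dealership-to-dealership variation inflates the standard error and hides the difference. Taking differences removes that variation, so the paired statistic is much larger.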

Paired confidence interval for the difference between two population means
A small sample $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ is
$$\bar{D} \pm t_{\alpha/2,\,n-1}\,\frac{s_D}{\sqrt{n}}$$
For the car price example, what is a 90% confidence interval for the mean difference between the prices quoted to the black woman vs. the white man?
Data: CarData

Comparing two population proportions
A natural statistic for comparing the two population proportions, $p_X$ and $p_Y$, is $\hat{p}_X - \hat{p}_Y$. The distribution of $\hat{p}_X - \hat{p}_Y$ is also approximately Normal for $m$ and $n$ both large.

Large samples test for comparing population proportions
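To test $H_0: p_X - p_Y = 0$, use the test statistic
$$Z = \frac{\hat{p}_X - \hat{p}_Y}{\sqrt{\hat{p}(1-\hat{p})\left(1/m + 1/n\right)}}$$
$H_A: p_X - p_Y < 0$: reject $H_0$ if $Z < -z_{\alpha}$
$H_A: p_X - p_Y > 0$: reject $H_0$ if $Z > z_{\alpha}$
$H_A: p_X - p_Y \neq 0$: reject $H_0$ if $|Z| > z_{\alpha/2}$
Please note that the common $\hat{p}$ above is calculated as the total number of successes overall in the study, divided by the total number of observations.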
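A minimal sketch of the two-proportion test in Python, assuming the usual pooled statistic $Z = (\hat{p}_X - \hat{p}_Y)/\sqrt{\hat{p}(1-\hat{p})(1/m + 1/n)}$, with $\hat{p}$ the overall proportion of successes. The function name and counts are hypothetical and are not the polio study numbers.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_x, m, successes_y, n,
                          alternative="two-sided"):
    """Large-sample Z test of H0: p_X - p_Y = 0 using the pooled proportion."""
    p_x = successes_x / m
    p_y = successes_y / n
    # Common proportion: total successes over total observations.
    p_pooled = (successes_x + successes_y) / (m + n)
    se = sqrt(p_pooled * (1.0 - p_pooled) * (1.0 / m + 1.0 / n))
    z = (p_x - p_y) / se
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1.0 - std_normal.cdf(z)
    else:
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return z, p_value


# Hypothetical counts (illustration only): 40 of 500 vs. 60 of 500.
z, p = two_proportion_z_test(40, 500, 60, 500, alternative="less")
print(f"Z = {z:.3f}, p-value = {p:.4f}")
```

The pooled proportion is used in the standard error because the null hypothesis says the two population proportions are equal.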

Polio example
The following table summarizes a study of the efficacy of the Salk vaccine. (Please note that I changed the actual percentages who got polio in this example to make the numbers MUCH more workable... don’t panic.) Was the vaccine effective? Test at $\alpha = 0.05$.

Treatment   Total Patients   Polio
Vaccine     2,000            30
Placebo                      100

Large samples confidence interval for the difference between two population proportions
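A large sample $(1-\alpha)100\%$ confidence interval for $p_X - p_Y$ is
$$\hat{p}_X - \hat{p}_Y \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_X(1-\hat{p}_X)}{m} + \frac{\hat{p}_Y(1-\hat{p}_Y)}{n}}$$
For the Polio data, what is a 95% confidence interval for the difference between the proportion who contract the disease under each treatment?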
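The interval can be sketched in Python as follows, assuming the standard unpooled standard error (each sample proportion contributes its own variance term, unlike the pooled estimate used for the test of equality). The function name and counts are hypothetical and are not the polio study numbers.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(successes_x, m, successes_y, n, conf_level=0.95):
    """Large-sample confidence interval for p_X - p_Y."""
    p_x = successes_x / m
    p_y = successes_y / n
    alpha = 1.0 - conf_level
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Unpooled standard error: no assumption that p_X equals p_Y.
    se = sqrt(p_x * (1.0 - p_x) / m + p_y * (1.0 - p_y) / n)
    diff = p_x - p_y
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical counts (illustration only): 40 of 500 vs. 60 of 500.
lower, upper = two_proportion_ci(40, 500, 60, 500, conf_level=0.95)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

An interval that lies entirely below 0 suggests the first proportion is smaller than the second at the corresponding two-sided level.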
