Comparing Populations

Proportions and means

Most studies will have more than one population.
Example: The Salk vaccine trial (1954)
A large study to determine whether the Salk vaccine was effective in reducing the incidence of polio.
Two populations:
Individuals vaccinated with the Salk vaccine
Individuals given a placebo
This was a double-blind study: neither the individuals vaccinated nor the MDs treating the cases knew who received the vaccine and who received the placebo.

When there is more than one population, one will be interested in making comparisons.
Comparisons are sometimes made through differences and sometimes through ratios.

The sampling distribution of differences of Normal random variables
An important fact: if X and Y denote two independent Normal random variables with means μX and μY and standard deviations σX and σY, then D = X – Y is Normal with
$$\mu_D = \mu_X - \mu_Y, \qquad \sigma_D = \sqrt{\sigma_X^2 + \sigma_Y^2}.$$
This fact allows us to determine the sampling distribution of differences.
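As a small illustration of this fact, the sketch below builds the distribution of D = X – Y from the two means and standard deviations. The numeric values and the function name `difference_distribution` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist


def difference_distribution(mu_x, sigma_x, mu_y, sigma_y):
    """Distribution of D = X - Y for independent Normal X and Y."""
    return NormalDist(mu_x - mu_y, sqrt(sigma_x**2 + sigma_y**2))


# Hypothetical parameter values, for illustration only.
d = difference_distribution(mu_x=10.0, sigma_x=3.0, mu_y=8.0, sigma_y=4.0)
print(d.mean, d.stdev)  # mean of D and standard deviation of D

# P(X > Y) is the same as P(D > 0).
prob_x_exceeds_y = 1 - d.cdf(0.0)
print(round(prob_x_exceeds_y, 4))
```

Note that the variances add even though the means subtract, so a difference is always more variable than either component.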

Comparing proportions

Situation: We have two populations (1 and 2).
Let p1 denote the probability (proportion) of “success” in population 1.
Let p2 denote the probability (proportion) of “success” in population 2.
The objective is to compare the two population proportions.

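Consider the statistic
$$\hat p_1 - \hat p_2, \qquad \hat p_1 = \frac{x_1}{n_1}, \quad \hat p_2 = \frac{x_2}{n_2},$$
where xi is the number of successes in a sample of size ni from population i. Using the important fact, for large samples this statistic has approximately a Normal distribution with
$$\mu_{\hat p_1 - \hat p_2} = p_1 - p_2, \qquad \sigma_{\hat p_1 - \hat p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}.$$

Thus
$$z = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$
has approximately a standard Normal distribution.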

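We want to test H0: p1 = p2 against one of the alternatives HA: p1 ≠ p2, HA: p1 > p2, or HA: p1 < p2.

If p1 = p2 (= p, say), then the test statistic
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
has approximately a standard Normal distribution, where
$$\hat p = \frac{x_1 + x_2}{n_1 + n_2}$$
is an estimate of the common value of p1 and p2.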

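Thus, for comparing two binomial probabilities p1 and p2:
The test statistic
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{where } \hat p_1 = \frac{x_1}{n_1},\ \hat p_2 = \frac{x_2}{n_2},\ \hat p = \frac{x_1 + x_2}{n_1 + n_2}.$$

The critical region for each alternative hypothesis HA:
HA: p1 ≠ p2: reject H0 if z < –zα/2 or z > zα/2
HA: p1 > p2: reject H0 if z > zα
HA: p1 < p2: reject H0 if z < –zα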

Example: In a national study to determine whether there was an increase in mortality due to pipe smoking, a random sample of n1 = 1067 male nonsmoking pensioners was observed for a five-year period. In addition, a sample of n2 = 402 male pensioners who had smoked a pipe for more than six years was observed for the same five-year period. At the end of the five-year period, x1 = 117 of the nonsmoking pensioners had died, while x2 = 54 of the pipe-smoking pensioners had died. Is the mortality rate for pipe smokers higher than that for nonsmokers?

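We want to test H0: p1 = p2 against HA: p1 < p2.
The test statistic:
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Note:
p̂1 = x1/n1 = 117/1067 = 0.1097 (nonsmokers)
p̂2 = x2/n2 = 54/402 = 0.1343 (pipe smokers)
p̂ = (x1 + x2)/(n1 + n2) = 171/1469 = 0.1164 (combined)

The test statistic:
$$z = \frac{0.1097 - 0.1343}{\sqrt{0.1164(1 - 0.1164)\left(\dfrac{1}{1067} + \dfrac{1}{402}\right)}} = -1.315$$

We reject H0 if z < –z0.05 = –1.645. This is not true, hence we accept H0.
Conclusion: There is not a significant (α = 0.05) increase in the mortality rate due to pipe smoking.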
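The arithmetic of this example can be checked with a few lines of Python. This is only a sketch using the standard library; the function name `two_proportion_z` is illustrative and not part of the original material.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled estimate of p."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Data from the pipe-smoking example: nonsmokers first, pipe smokers second.
z = two_proportion_z(x1=117, n1=1067, x2=54, n2=402)

# One-sided alternative HA: p1 < p2, so the critical region is the lower tail.
z_crit = NormalDist().inv_cdf(0.05)  # lower 5% point, about -1.645
reject_h0 = z < z_crit
print(round(z, 3), round(z_crit, 3), reject_h0)
```

Because z lies above the lower critical point, the sketch reaches the same decision as the slide: H0 is not rejected.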

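Estimating a difference in proportions using confidence intervals
Situation: We have two populations (1 and 2).
Let p1 denote the probability (proportion) of “success” in population 1.
Let p2 denote the probability (proportion) of “success” in population 2.
The objective is to estimate the difference in the two population proportions, δ = p1 – p2.

100(1 – α)% confidence interval for δ = p1 – p2:
$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$

Example: Estimating the increase in the mortality rate for pipe smokers over that for nonsmokers, δ = p2 – p1. With p̂1 = 117/1067 = 0.1097 and p̂2 = 54/402 = 0.1343, the 95% confidence interval is
$$(0.1343 - 0.1097) \pm 1.96\sqrt{\frac{0.1343(0.8657)}{402} + \frac{0.1097(0.8903)}{1067}} = 0.0247 \pm 0.0382,$$
that is, from –0.0136 to 0.0629.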
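A short sketch of the same interval in Python follows. The function name `diff_proportion_ci` is an illustrative choice; only the counts come from the example.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    z_half = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    bound = z_half * sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - bound, diff + bound


# delta = p2 - p1, so the pipe smokers are passed as the first sample.
low, high = diff_proportion_ci(x1=54, n1=402, x2=117, n2=1067)
print(round(low, 4), round(high, 4))
```

The interval contains zero, which agrees with the test above failing to reject H0 (although the test is one-sided and uses the pooled standard error, so the two procedures are not exactly equivalent).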

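Comparing Proportions

Summary
The test for a difference in proportions (the test statistic):
$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat p = \frac{x_1 + x_2}{n_1 + n_2}$$
Estimating the difference in proportions by a confidence interval:
$$(\hat p_1 - \hat p_2) \pm z_{\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$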

Comparing Means

Situation: We have two Normal populations (1 and 2).
Let μ1 and σ1 denote the mean and standard deviation of population 1.
Let μ2 and σ2 denote the mean and standard deviation of population 2.
Let x1, x2, x3, … , xn denote a sample from Normal population 1.
Let y1, y2, y3, … , ym denote a sample from Normal population 2.
The objective is to compare the two population means.

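Consider the test statistic:
$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{\sigma_1^2}{n} + \dfrac{\sigma_2^2}{m}}}$$

If μ1 = μ2, z will have a standard Normal distribution.
This will also be approximately true for
$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
(obtained by replacing σ1 by sx and σ2 by sy) if the sample sizes n and m are large (greater than 30).

The Critical Region
HA: μ1 ≠ μ2: reject H0 if z < –zα/2 or z > zα/2
HA: μ1 > μ2: reject H0 if z > zα
HA: μ1 < μ2: reject H0 if z < –zα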

Example A study was interested in determining if an exercise program had some effect on reduction of Blood Pressure in subjects with abnormally high blood pressure. For this purpose a sample of n = 500 patients with abnormally high blood pressure were required to adhere to the exercise regime. A second sample m = 400 of patients with abnormally high blood pressure were not required to adhere to the exercise regime. After a period of one year the reduction in blood pressure was measured for each patient in the study.

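We want to test:
H0: μ1 ≤ μ2 (the exercise group did not have a higher average reduction in blood pressure)
vs
HA: μ1 > μ2 (the exercise group did have a higher average reduction in blood pressure)

The test statistic:
$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$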

Suppose the data has been collected and:

The test statistic:

We reject H0 if z > z0.05 = 1.645. This is true, hence we reject H0.
Conclusion: There is a significant (α = 0.05) effect due to the exercise regime on the reduction in blood pressure.

Estimating a difference in means using confidence intervals
Situation: We have two populations (1 and 2).
Let μ1 denote the mean of population 1.
Let μ2 denote the mean of population 2.
The objective is to estimate the difference in the two population means, δ = μ1 – μ2.

100(1 – α)% confidence interval for δ = μ1 – μ2 (large samples):
$$(\bar x - \bar y) \pm z_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$

Example: Estimating the increase in the average reduction in blood pressure due to the exercise regime, δ = μ1 – μ2.
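The large-sample test and interval can be sketched from summary statistics as follows. The sample sizes n = 500 and m = 400 match the exercise example, but the means and standard deviations below are hypothetical placeholders, since the study's summary figures are not reproduced in this transcript; `two_sample_z` is likewise an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_x, s_x, n, mean_y, s_y, m, confidence=0.95):
    """Large-sample z statistic and confidence interval for mu1 - mu2."""
    se = sqrt(s_x**2 / n + s_y**2 / m)
    diff = mean_x - mean_y
    z = diff / se
    z_half = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z, (diff - z_half * se, diff + z_half * se)


# Hypothetical summary statistics (NOT the study's data).
z, (low, high) = two_sample_z(mean_x=10.0, s_x=4.0, n=500,
                              mean_y=9.0, s_y=5.0, m=400)

z_crit = NormalDist().inv_cdf(0.95)  # upper 5% point, about 1.645
print(round(z, 3), z > z_crit)
print(round(low, 3), round(high, 3))
```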

Comparing Means – small samples
The t test

Situation: We have two Normal populations (1 and 2).
Let μ1 and σ1 denote the mean and standard deviation of population 1.
Let μ2 and σ2 denote the mean and standard deviation of population 2.
Let x1, x2, x3, … , xn denote a sample from Normal population 1.
Let y1, y2, y3, … , ym denote a sample from Normal population 2.
The objective is to compare the two population means.

Consider the test statistic:
$$z = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$

If μ1 = μ2 and the sample sizes (m and n) are large, this statistic will have approximately a standard Normal distribution.
This will not be the case if the sample sizes (m and n) are small.

The t test – for comparing means – small samples (equal variances)
Situation: We have two Normal populations (1 and 2).
Let μ1 and σ denote the mean and standard deviation of population 1.
Let μ2 and σ denote the mean and standard deviation of population 2.
Note: we assume that the standard deviation for each population is the same, σ1 = σ2 = σ.

The pooled estimate of σ.
Note: both sx and sy are estimators of σ. These can be combined to form a single estimator of σ, sPooled:
$$s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}}$$

The test statistic:
$$t = \frac{\bar x - \bar y}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$
If μ1 = μ2 this statistic has a t distribution with n + m – 2 degrees of freedom.

The Critical Region
HA: μ1 ≠ μ2: reject H0 if t < –tα/2 or t > tα/2
HA: μ1 > μ2: reject H0 if t > tα
HA: μ1 < μ2: reject H0 if t < –tα
Here tα/2 and tα are critical points under the t distribution with n + m – 2 degrees of freedom.

Example A study was interested in determining if administration of a drug reduces cancerous tumor size. For this purpose n +m = 9 test animals are implanted with a cancerous tumor. n = 3 are selected at random and administered the drug. The remaining m = 6 are left untreated. Final tumour sizes are measured at the end of the test period

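We want to test:
H0: μ1 ≥ μ2 (the treated group did not have a lower average final tumour size)
vs
HA: μ1 < μ2 (the treated group did have a lower average final tumour size)

The test statistic:
$$t = \frac{\bar x - \bar y}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}$$

We reject H0 if t < –t0.05 = –1.895, with d.f. = n + m – 2 = 7. The computed statistic does not fall in this region, hence we accept H0.
Conclusion: The drug treatment does not result in a significantly (α = 0.05) smaller final tumour size.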

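Confidence intervals for the difference in two means of Normal populations (small sample sizes, equal variances)
(1 – α)100% confidence limits for μ1 – μ2:
$$(\bar x - \bar y) \pm t_{\alpha/2}\, s_{Pooled}\sqrt{\frac{1}{n} + \frac{1}{m}}$$
where
$$s_{Pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n + m - 2}}$$
and tα/2 is taken from the t distribution with n + m – 2 degrees of freedom.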
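The pooled t test and the matching confidence interval can be sketched as below. The tumour sizes are hypothetical stand-ins with n = 3 and m = 6 (the study's measurements are not reproduced in this transcript), `pooled_t` is an illustrative name, and the critical values 1.895 and 2.365 are the usual table entries for 7 degrees of freedom, entered by hand.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Equal-variance two-sample t statistic, its d.f., and the standard error."""
    n, m = len(x), len(y)
    s_pooled = sqrt(((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2))
    se = s_pooled * sqrt(1 / n + 1 / m)
    return (mean(x) - mean(y)) / se, n + m - 2, se


# Hypothetical final tumour sizes (NOT the study's data).
treated = [1.4, 1.7, 2.0]
untreated = [1.6, 2.0, 1.9, 2.3, 1.7, 2.5]

t, df, se = pooled_t(treated, untreated)

# One-sided test HA: mu1 < mu2 at alpha = 0.05; table value t_0.05(7) = 1.895.
reject_h0 = t < -1.895

# Two-sided 95% limits for mu1 - mu2; table value t_0.025(7) = 2.365.
diff = mean(treated) - mean(untreated)
limits = (diff - 2.365 * se, diff + 2.365 * se)
print(round(t, 3), df, reject_h0, [round(v, 3) for v in limits])
```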

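Tests and confidence intervals for the difference in two means of Normal populations (small sample sizes, unequal variances)

Consider the statistic
$$t = \frac{\bar x - \bar y}{\sqrt{\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}}}$$
For large sample sizes this statistic has approximately a standard Normal distribution when μ1 = μ2. For small sample sizes it has been shown to have approximately a t distribution with
$$df = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{s_x^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{s_y^2}{m}\right)^2}$$
degrees of freedom (the Welch-Satterthwaite approximation).

The approximate test for comparing two means of Normal populations (unequal variances)
Test statistic: t as defined above
Null hypothesis H0: μ1 = μ2
HA: μ1 ≠ μ2: reject H0 if t < –tα/2 or t > tα/2
HA: μ1 > μ2: reject H0 if t > tα
HA: μ1 < μ2: reject H0 if t < –tα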

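Confidence intervals for the difference in two means of Normal populations (small samples, unequal variances)
(1 – α)100% confidence limits for μ1 – μ2:
$$(\bar x - \bar y) \pm t_{\alpha/2}\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}$$
with
$$df = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{s_x^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{s_y^2}{m}\right)^2}$$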
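A sketch of the unequal-variance statistic and its approximate degrees of freedom follows, assuming the Welch-Satterthwaite form of the approximation. The data are hypothetical and `welch_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Unequal-variance t statistic and its approximate degrees of freedom."""
    n, m = len(x), len(y)
    vx = variance(x) / n  # estimated variance of the mean of x
    vy = variance(y) / m  # estimated variance of the mean of y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (n - 1) + vy**2 / (m - 1))
    return t, df


# Hypothetical samples, for illustration only.
x = [1.4, 1.7, 2.0]
y = [1.6, 2.0, 1.9, 2.3, 1.7, 2.5]

t, df = welch_t(x, y)
print(round(t, 3), round(df, 2))
```

The approximate degrees of freedom are generally not an integer and always lie between min(n, m) – 1 and n + m – 2; a common conservative practice is to round down before consulting a t table.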

Testing for the equality of variances
The F test

Situation:
Let x1, x2, x3, … , xn denote a sample from a Normal distribution with mean μx and standard deviation σx.
Let y1, y2, y3, … , ym denote a second, independent sample from a Normal distribution with mean μy and standard deviation σy.
We want to test for the equality of the two variances,

i.e.:
Test H0: σx = σy vs HA: σx ≠ σy (two-sided alternative), or
Test H0: σx = σy vs HA: σx > σy (one-sided alternative), or
Test H0: σx = σy vs HA: σx < σy (one-sided alternative)

The test statistic (F)
$$F = \frac{s_x^2}{s_y^2}$$
where sx² and sy² are the two sample variances.

The sampling distribution of the test statistic
If the null hypothesis (H0) is true, then the sampling distribution of F is called the F distribution with ν1 = n – 1 degrees of freedom in the numerator and ν2 = m – 1 degrees of freedom in the denominator.

[Figure: the F distribution with ν1 = n – 1 numerator and ν2 = m – 1 denominator degrees of freedom; the upper-tail area α lies to the right of the critical point Fα(ν1, ν2).]

Note: If F = sx²/sy² has an F distribution with ν1 = n – 1 degrees of freedom in the numerator and ν2 = m – 1 degrees of freedom in the denominator, then 1/F = sy²/sx² has an F distribution with ν1 = m – 1 degrees of freedom in the numerator and ν2 = n – 1 degrees of freedom in the denominator.

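Critical region for the test (two-tailed):
H0: σx = σy vs HA: σx ≠ σy (two-sided alternative)
Reject H0 if F > Fα/2(n – 1, m – 1) or 1/F > Fα/2(m – 1, n – 1)

Critical region for the test (one-tailed):
H0: σx = σy vs HA: σx > σy (one-sided alternative)
Reject H0 if F > Fα(n – 1, m – 1)
H0: σx = σy vs HA: σx < σy (one-sided alternative)
Reject H0 if 1/F > Fα(m – 1, n – 1)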

Example A study was interested in determining if administration of a drug reduces cancerous tumor size. For this purpose n +m = 9 test animals are implanted with a cancerous tumor. n = 3 are selected at random and administered the drug. The remaining m = 6 are left untreated. Final tumour sizes are measured at the end of the test period

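We want to test H0: σ1 = σ2 vs HA: σ1 ≠ σ2 (H0 is assumed for the t test for comparing the means).
Using α = 0.05, we will reject H0 if F > F0.025(2, 5) = 8.43 or 1/F > F0.025(5, 2) = 39.30.

Test statistic: F = sx²/sy² and 1/F = sy²/sx². Neither exceeds its critical value; therefore we accept H0: σ1 = σ2.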
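The two-sided F test can be sketched as below. The samples are hypothetical stand-ins with n = 3 and m = 6, `variance_ratio` is an illustrative name, and the critical values are table entries for α/2 = 0.025 entered by hand rather than computed.

```python
from statistics import variance


def variance_ratio(x, y):
    """F = s_x^2 / s_y^2, its reciprocal, and the two degrees of freedom."""
    f = variance(x) / variance(y)
    return f, 1 / f, len(x) - 1, len(y) - 1


# Hypothetical final tumour sizes (NOT the study's data).
treated = [1.4, 1.7, 2.0]
untreated = [1.6, 2.0, 1.9, 2.3, 1.7, 2.5]

f, f_inv, nu1, nu2 = variance_ratio(treated, untreated)

# Table values: F_0.025(2, 5) = 8.43 and F_0.025(5, 2) = 39.30.
reject_h0 = f > 8.43 or f_inv > 39.30
print(round(f, 3), round(f_inv, 3), (nu1, nu2), reject_h0)
```

Using both F and 1/F means only upper-tail critical points are needed, which is how most printed F tables are organised.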

The paired t-test
An example of improved experimental design

Often we are interested in comparing the effect of two (or more) treatments on some variable.
Examples:
The effect of two diets on weight loss.
The effect of two drugs on the drop in cholesterol levels.
The effect of two teaching methods on math proficiency.

One possible design is to randomly divide the available subjects into two groups.
The first group will receive treatment 1. The second group will receive treatment 2. We then collect data on the two groups.
Let x1, x2, x3, … , xn denote the data for treatment 1.
Let y1, y2, y3, … , ym denote the data for treatment 2.
This design is called the independent sample design. To test for the equality of treatment means we use the two-sample t test.

The test statistic:
$$t = \frac{\bar x - \bar y}{s_{Pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad \text{d.f.} = n + m - 2$$

The matched pair experimental design (the paired sample experiment)
Prior to assigning the treatments, the subjects are grouped into pairs of similar subjects. Suppose that there are n such pairs (a total of 2n = n + n subjects or cases). The two treatments are then randomly assigned within each pair: one member of a pair receives treatment 1, while the other receives treatment 2.
The data collected are as follows: (x1, y1), (x2, y2), (x3, y3), … , (xn, yn), where
xi = the measurement of the response for the subject in pair i that received treatment 1,
yi = the measurement of the response for the subject in pair i that received treatment 2.

Let di = xi – yi. Then d1, d2, d3, … , dn is a sample from a Normal distribution with mean μd = μ1 – μ2 and standard deviation
$$\sigma_d = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\rho\,\sigma_1\sigma_2}$$
where ρ is the correlation between the x and y measurements within a pair. Note: if the x and y measurements are positively correlated (this will be true if the cases in each pair are matched effectively), then σd will be small.

Testing H0: μ1 = μ2 is equivalent to testing H0: μd = 0
(we have converted the two-sample problem into a single-sample problem). The test statistic is the single-sample t test on the differences d1, d2, d3, … , dn, namely
$$t = \frac{\bar d}{s_d/\sqrt{n}}, \qquad df = n - 1$$
where d̄ and sd are the mean and standard deviation of the differences.

Example: We are interested in comparing the effectiveness of two methods for reducing high cholesterol.
The methods:
Use of a drug.
Control of diet.
The 2n = 8 subjects were paired into 4 matched pairs. In each matched pair, one subject was given the drug treatment and the other subject was given the diet control treatment. Assignment of treatments was random.

The data: reduction in cholesterol after a 6-month period

Pair                        1      2      3      4
Drug treatment (xi)        30.3   10.2   22.3   15.0
Diet control treatment (yi) 25.7    9.4   24.6    8.9
Difference di = xi – yi     4.6    0.8   –2.3    6.1

From the differences, d̄ = 2.3 and sd = 3.792, so
$$t = \frac{\bar d}{s_d/\sqrt{n}} = \frac{2.3}{3.792/\sqrt{4}} = 1.213$$
For df = n – 1 = 3, t0.025 = 3.182. Since –3.182 < 1.213 < 3.182, we accept H0.
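The paired t statistic for this example can be reproduced with the short sketch below, which uses only the four pairs given in the table. The critical value 3.182 is the table entry t0.025 for 3 degrees of freedom, entered by hand.

```python
from math import sqrt
from statistics import mean, stdev

# Reduction in cholesterol for the four matched pairs in the example.
drug = [30.3, 10.2, 22.3, 15.0]
diet = [25.7, 9.4, 24.6, 8.9]

# Convert the two-sample problem into a one-sample problem on the differences.
d = [x - y for x, y in zip(drug, diet)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation, divisor n - 1
t = d_bar / (s_d / sqrt(n))

t_crit = 3.182  # t_0.025 with n - 1 = 3 degrees of freedom (table value)
reject_h0 = abs(t) > t_crit
print(round(d_bar, 3), round(s_d, 3), round(t, 3), reject_h0)
```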

Example 2 In this example the researcher is interested in the effect of an antidepressant in reducing depression. Subjects were given a psychological test measuring depression (on a scale 0-100) at the beginning of the study (Pre-score) and after a period of one month on the anti-depressant (Post-score). Did the drug have any effect on reducing depression?

Table: Prescore (xi), Postscore (yi), difference (di)

Comments: This last example is a matched pair experiment of a kind that occurs frequently. You have two observations on the same subject: one observation under one condition or treatment (the Pre-score), the other under a second condition, after treatment (the Post-score). The subject is his or her own matched twin. This design is sometimes called a repeated measures design.

Example 3 In this example, one is interested in determining if a new method of mathematics instruction is an improvement over the current method. To determine this, 20 grade 4 students were selected. They were divided into n = 10 matched pairs. The students were matched relative to ability. One member of each matched pair was instructed using the new method, the other member using the current method. All students were tested at the end of the instruction period

The data

Summary of Tests

One Sample Tests
H0: p = p0 against HA: p ≠ p0, HA: p > p0, or HA: p < p0

Two Sample Tests

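Two Sample Tests - continued
Situation: two independent Normal samples with unknown means and variances (unequal)
Test statistic: t = (x̄ – ȳ)/sqrt(sx²/n + sy²/m)
H0: μ1 = μ2
HA: μ1 ≠ μ2: reject H0 if t < –tα/2 or t > tα/2, df = ν*
HA: μ1 > μ2: reject H0 if t > tα, df = ν*
HA: μ1 < μ2: reject H0 if t < –tα, df = ν*
Test statistic: F = sx²/sy²
H0: σ1 = σ2
HA: σ1 ≠ σ2: reject H0 if F > Fα/2(n – 1, m – 1) or 1/F > Fα/2(m – 1, n – 1)
HA: σ1 > σ2: reject H0 if F > Fα(n – 1, m – 1)
HA: σ1 < σ2: reject H0 if 1/F > Fα(m – 1, n – 1)
where
$$\nu^* = \frac{\left(\dfrac{s_x^2}{n} + \dfrac{s_y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{s_x^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{s_y^2}{m}\right)^2}$$

The paired t test
Situation: n matched pairs of subjects are treated with two treatments; di = xi – yi has mean δ = μ1 – μ2.
Test statistic: t = d̄/(sd/√n)
H0: μ1 = μ2
HA: μ1 ≠ μ2: reject H0 if t < –tα/2 or t > tα/2, df = n – 1
HA: μ1 > μ2: reject H0 if t > tα, df = n – 1
HA: μ1 < μ2: reject H0 if t < –tα, df = n – 1

[Figure: independent samples (Treat 1 and Treat 2 groups, possibly of equal numbers) contrasted with matched pairs (Pair 1, Pair 2, Pair 3, … , Pair n, each pair receiving Treat 1 and Treat 2).]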

Sample size determination
When comparing two or more populations

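Estimating a difference in proportions using confidence intervals
Confidence interval for δ = p1 – p2:
$$(\hat p_1 - \hat p_2) \pm B, \qquad B = z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Again we want to choose n1 and n2 to set the error bound B at some predetermined level with a fixed level of confidence 1 – α.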

There are many solutions for n1 and n2 that will achieve a specified error bound B with level of confidence 1 – α. You can make B small by increasing n1, n2, or a combination of both. Some useful practical solutions are the following.

Equal sample size: n1 = n2. This would be an appropriate choice if one researcher was to collect data from population 1, another was to collect data from population 2, and you wanted to equalize the workload.

Minimize total sample size: Choose n1 and n2 so that the required error bound B is achieved and the total sample size, n1 + n2, is minimized. This would be an appropriate choice if a single researcher was to collect data from both population 1 and population 2 and you wanted to minimize that researcher's workload.

Minimize total cost of the sample: Suppose that the study has a fixed cost of C0 dollars and that the cost of a single observation in populations 1 and 2 is c1 and c2 dollars respectively. Then the total cost of the study is C0 + n1c1 + n2c2. This approach chooses n1 and n2 so that the required error bound B is achieved and the total cost, C0 + n1c1 + n2c2, is minimized.

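Special solutions - case 1: n1 = n2 = n.
Then
$$n = \left(\frac{z_{\alpha/2}}{B}\right)^2\left[p_1(1-p_1) + p_2(1-p_2)\right]$$

Special solutions - case 2: Choose n1 and n2 to minimize N = n1 + n2 = total sample size.
Then
$$n_1 = \left(\frac{z_{\alpha/2}}{B}\right)^2\sqrt{p_1(1-p_1)}\left[\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}\right]$$
$$n_2 = \left(\frac{z_{\alpha/2}}{B}\right)^2\sqrt{p_2(1-p_2)}\left[\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}\right]$$

Special solutions - case 3: Choose n1 and n2 to minimize C = C0 + c1n1 + c2n2 = total cost of the study.
Note: C0 = fixed (set-up) costs, c1 = cost per unit in population 1, c2 = cost per unit in population 2.
Then
$$n_1 = \left(\frac{z_{\alpha/2}}{B}\right)^2\sqrt{\frac{p_1(1-p_1)}{c_1}}\left[\sqrt{c_1\,p_1(1-p_1)} + \sqrt{c_2\,p_2(1-p_2)}\right]$$
$$n_2 = \left(\frac{z_{\alpha/2}}{B}\right)^2\sqrt{\frac{p_2(1-p_2)}{c_2}}\left[\sqrt{c_1\,p_1(1-p_1)} + \sqrt{c_2\,p_2(1-p_2)}\right]$$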

An example Suppose we are interested in comparing two drugs (A and B) for their effectiveness in reducing blood pressure in patients with abnormally high blood pressure. We give the drug A for 1 month to n = 100 patients and out of these, x = 55 were able to reduce their BP by 10 units. Similarly a sample of m = 100 patients given drug B resulted in y =61 patients reducing BP.

The data

Drug   Sample size   Positive outcomes   Proportion
A      100           55                  0.55
B      100           61                  0.61

95% confidence interval for pA – pB:
$$(0.55 - 0.61) \pm 1.96\sqrt{\frac{0.55(0.45)}{100} + \frac{0.61(0.39)}{100}} = -0.06 \pm 0.137$$
Error bound: B = 0.137

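Suppose we wanted to determine the sample sizes that would estimate the difference with a smaller error bound, B = 0.02 (2%), with a 95% level of confidence. Determine the sample sizes that would achieve this objective, using the data from the preliminary samples (p̂1 = 0.55, p̂2 = 0.61) in the calculation.

Solutions

Equal sample sizes
$$n_1 = n_2 = \left(\frac{1.96}{0.02}\right)^2\left[0.55(0.45) + 0.61(0.39)\right] = 9604(0.4854) = 4661.8, \text{ so take } n_1 = n_2 = 4662$$

Total sample size minimized
$$n_1 = 9604\sqrt{0.2475}\left[\sqrt{0.2475} + \sqrt{0.2379}\right] = 4707.4, \qquad n_2 = 9604\sqrt{0.2379}\left[\sqrt{0.2475} + \sqrt{0.2379}\right] = 4615.2$$
so take n1 = 4708 and n2 = 4616.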

Total Cost minimized (assume
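The three allocation rules can be compared with a short sketch. The preliminary proportions and the error bound come from the example; the unit costs `c1` and `c2` are hypothetical parameters (defaulting to equal costs) because the costs assumed in the original example are not reproduced in this transcript, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def sample_sizes_proportions(p1, p2, bound, confidence=0.95, c1=1.0, c2=1.0):
    """Sample sizes achieving error bound `bound` for estimating p1 - p2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    k = (z / bound) ** 2
    a1, a2 = p1 * (1 - p1), p2 * (1 - p2)
    equal = k * (a1 + a2)                      # case 1: n1 = n2
    root_sum = sqrt(a1) + sqrt(a2)
    min_total = (k * sqrt(a1) * root_sum,      # case 2: minimize n1 + n2
                 k * sqrt(a2) * root_sum)
    w = sqrt(a1 * c1) + sqrt(a2 * c2)
    min_cost = (k * sqrt(a1 / c1) * w,         # case 3: minimize c1*n1 + c2*n2
                k * sqrt(a2 / c2) * w)
    return equal, min_total, min_cost


def achieved_bound(p1, p2, n1, n2, confidence=0.95):
    """Error bound actually obtained with sample sizes n1 and n2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Preliminary estimates from the drug A / drug B example, with B = 0.02.
equal, min_total, min_cost = sample_sizes_proportions(0.55, 0.61, bound=0.02)
print(round(equal, 1), [round(v, 1) for v in min_total])
```

The results are left unrounded; in practice each would be rounded up to the next whole observation. The values differ slightly from a hand calculation with z = 1.96 because the sketch uses the unrounded Normal quantile.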

Determination of sample size (means)
When the objective is to compare the two means of two Normal populations

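Estimating a difference in means using confidence intervals
Confidence interval for δ = μ1 – μ2:
$$(\bar x - \bar y) \pm B, \qquad B = z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Again we want to choose n1 and n2 to set the error bound B at some predetermined level with a fixed level of confidence 1 – α.

The sample sizes n1 and n2 required to estimate μ1 – μ2 within an error bound B with level of confidence 1 – α are:

Equal sample sizes
$$n_1 = n_2 = \left(\frac{z_{\alpha/2}}{B}\right)^2\left(\sigma_1^2 + \sigma_2^2\right)$$

Minimizing the total sample size N = n1 + n2
$$n_1 = \left(\frac{z_{\alpha/2}}{B}\right)^2\sigma_1(\sigma_1 + \sigma_2), \qquad n_2 = \left(\frac{z_{\alpha/2}}{B}\right)^2\sigma_2(\sigma_1 + \sigma_2)$$

Minimizing the total cost C = C0 + c1n1 + c2n2
$$n_1 = \left(\frac{z_{\alpha/2}}{B}\right)^2\frac{\sigma_1}{\sqrt{c_1}}\left(\sigma_1\sqrt{c_1} + \sigma_2\sqrt{c_2}\right), \qquad n_2 = \left(\frac{z_{\alpha/2}}{B}\right)^2\frac{\sigma_2}{\sqrt{c_2}}\left(\sigma_1\sqrt{c_1} + \sigma_2\sqrt{c_2}\right)$$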

Some general comments
If a population is more variable (σ² larger), more observations should be assigned to the sample from that population.
If it is less costly to take observations in a population, more observations should be assigned to the sample from that population.
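These comments can be checked numerically with a small sketch. The standard deviations, error bound, and unit costs below are hypothetical planning values, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def sample_sizes_means(sigma1, sigma2, bound, confidence=0.95, c1=1.0, c2=1.0):
    """Sample sizes achieving error bound `bound` for estimating mu1 - mu2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    k = (z / bound) ** 2
    equal = k * (sigma1**2 + sigma2**2)                  # n1 = n2
    min_total = (k * sigma1 * (sigma1 + sigma2),         # minimize n1 + n2
                 k * sigma2 * (sigma1 + sigma2))
    w = sigma1 * sqrt(c1) + sigma2 * sqrt(c2)
    min_cost = (k * sigma1 / sqrt(c1) * w,               # minimize c1*n1 + c2*n2
                k * sigma2 / sqrt(c2) * w)
    return equal, min_total, min_cost


def achieved_bound(sigma1, sigma2, n1, n2, confidence=0.95):
    """Error bound actually obtained with sample sizes n1 and n2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# Hypothetical planning values: population 2 is more variable and costs
# four times as much per observation as population 1.
equal, min_total, min_cost = sample_sizes_means(
    sigma1=4.0, sigma2=5.0, bound=0.5, c1=1.0, c2=4.0)
print(round(equal, 1),
      [round(v, 1) for v in min_total],
      [round(v, 1) for v in min_cost])
```

With these values the total-size solution gives the more variable population 2 the larger sample, while the cost-based solution shifts observations toward the cheaper population 1.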

Next Topic: Comparing k populations
