STA248 week 12: Bootstrap Test for Pairs of Means of a Non-Normal Population – small samples


1 Bootstrap Test for Pairs of Means of a Non-Normal Population – small samples. Suppose $X_1, \dots, X_n$ are iid from some distribution, independent of $Y_1, \dots, Y_m$, which are iid from another distribution. Further suppose that both $n$ and $m$ are small and we are interested in testing whether the two populations have the same mean. We can use the t-test (pooled or unpooled), since it is robust as long as there are no extreme outliers or strong skewness. Alternatively, we can use bootstrap hypothesis testing.

2 Bootstrap Hypothesis Testing - Introduction. Suppose $X_1, \dots, X_n$ is a random sample of size $n$, independent of another random sample $Y_1, \dots, Y_m$ of size $m$, and we wish to test $H_0$ vs. $H_a$ [hypotheses not preserved in this transcript]. As a test statistic we will use [formula not preserved]. The P-value of this test is [formula not preserved]. We want the bootstrap estimate of this P-value.

3 Bootstrap Test Procedure. To obtain the bootstrap estimate of the P-value, we need to generate samples with $H_0$ true. One way of doing this (assuming X and Y have the same distribution) is to combine the 2 samples into 1 of size $n + m$, then re-sample with replacement from this combined sample such that each re-sample has two groups … For each bootstrap sample, calculate the bootstrap estimate of the test statistic, $j = 1, \dots, B$. The bootstrap estimate of the P-value is ….
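The pooled re-sampling procedure described above can be sketched in Python as follows. The slide's own test statistic and P-value formula are not preserved, so this sketch assumes the difference in sample means as the statistic and a two-sided P-value. The function name `bootstrap_p_value`, the parameter `n_boot` (playing the role of $B$), and the data are hypothetical.

```python
import random
from statistics import mean


def bootstrap_p_value(x, y, n_boot=2000, seed=0):
    """Bootstrap P-value for H0: equal means, using pooled re-sampling.

    Assumptions (not from the slides): the test statistic is the
    difference in sample means and the test is two-sided.
    """
    rng = random.Random(seed)
    n, m = len(x), len(y)
    pooled = list(x) + list(y)      # combined sample of size n + m
    t_obs = mean(x) - mean(y)       # observed statistic
    extreme = 0
    for _ in range(n_boot):
        # Draw n + m values with replacement, then split into two groups
        # of sizes n and m, which mimics sampling with H0 true.
        resample = rng.choices(pooled, k=n + m)
        t_star = mean(resample[:n]) - mean(resample[n:])
        if abs(t_star) >= abs(t_obs):
            extreme += 1
    # Proportion of bootstrap statistics at least as extreme as observed.
    return extreme / n_boot


# Hypothetical small samples, for illustration only.
x = [12.1, 9.8, 11.4, 10.9, 13.0]
y = [8.7, 9.1, 10.2, 7.9]
print(bootstrap_p_value(x, y))
```

A small returned value suggests that a difference as large as the observed one is rare when both groups are drawn from the same pooled distribution.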

4 Example

5 Data Collection. There are three main methods for collecting data: observational studies, sample surveys, and planned / designed experiments. These methods differ in the strength of the conclusions that can be drawn.

6 Observational Studies. In observational studies we simply collect information about variables of interest without applying any intervention or controlling for any factors. In some cases, a study may be undertaken retrospectively. When factors are not controlled, we are not able to infer a cause-effect relationship. Other problems with observational studies are confounding (we can’t separate the effect of one variable from another) and lack of generalization.

7 Sample Surveys. Sample surveys are observational in nature. Surveys require the existence of a physically real population. Data are collected on a random sample from the target population. Survey design includes selecting the sample so that it is representative of the population as a whole. We use statistics to make inferences about the entire population. Confounding is still a problem, so the cause of any observed differences cannot be determined. However, the results can be generalized to the population. To allow generalization and to avoid bias, the sample must be chosen randomly, e.g., by simple random sampling (SRS).

8 Planned / Designed Experiments. There are a few key features of designed experiments that distinguish them from any other type of study. Independent variables of interest are carefully controlled by the experimenter in order to determine their effect on a response (dependent) variable. The researcher randomly assigns treatments to the subjects or experimental units. Control of the independent variables and randomization make it possible to infer a cause-and-effect relationship. Replication (multiple observations per treatment) allows measurement of variability.

9 Treatments are sometimes called predictor variables and sometimes called “factors”. The values of a factor are its “levels”. A design is balanced if each treatment has the same number of experimental units. Problem: we can’t always carry out an experiment.

10 Randomization. The use of randomization to allocate treatments to experimental units (or vice versa) is the key element of a well-designed experiment. Random allocation tends to produce subgroups that are comparable with respect to the variables known to influence the response. Randomization ensures that no bias is introduced in the allocation of treatments to experimental units. Randomization also reduces the possibility that factors not included in the design will be confounded with treatment.
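As an illustration of random allocation, here is a minimal Python sketch that assigns treatments to experimental units at random while keeping the design balanced (the same number of units per treatment). The function name `randomize_balanced` and the unit and treatment labels are hypothetical.

```python
import random


def randomize_balanced(units, treatments, seed=None):
    """Randomly allocate units to treatments, with equal group sizes."""
    units = list(units)
    if len(units) % len(treatments) != 0:
        raise ValueError("a balanced design needs len(units) divisible by len(treatments)")
    rng = random.Random(seed)
    rng.shuffle(units)              # random order, so allocation is not systematic
    per_group = len(units) // len(treatments)
    return {
        t: units[i * per_group:(i + 1) * per_group]
        for i, t in enumerate(treatments)
    }


# Hypothetical example: 12 units, 3 treatments, 4 units per treatment.
allocation = randomize_balanced(range(1, 13), ["A", "B", "C"], seed=42)
for treatment, group in allocation.items():
    print(treatment, sorted(group))
```

Because the experimenter does not choose which unit receives which treatment, any factor left out of the design is spread across the groups by chance rather than by selection.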

11 Cautions Regarding Experiments. “Effective sample size”: all the statistical techniques we have learned assume that observations are independent. If they are not but are treated as if they were, we get more power and narrower CIs than we should. “Fishing expedition”: if we do 100 tests at the $\alpha = 0.05$ significance level, we expect 5 of the 100 tests to show significant differences from $H_0$ even when $H_0$ is always true (type I errors).
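The "fishing expedition" arithmetic can be checked with a short Python sketch. The expected count of false positives is $100 \times 0.05 = 5$. The probability of at least one false positive additionally assumes the tests are independent, which the slide does not state. Both function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of type I errors when every H0 is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    """P(at least one type I error), ASSUMING the tests are independent."""
    return 1 - (1 - alpha) ** n_tests


print(expected_false_positives(100, 0.05))          # about 5
print(prob_at_least_one_false_positive(100, 0.05))  # about 0.994
```

Under these assumptions, at least one spurious "significant" result is almost guaranteed when 100 true null hypotheses are tested.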

12 Controlling for Type I Error. One widely used method for controlling type I error uses the Bonferroni inequality…. If $A_i$ is the event that the $i$th test has a type I error, and typically $P(A_i) = \alpha$, then by the Bonferroni inequality we have that $P\left(\bigcup_{i=1}^{k} A_i\right) \le \sum_{i=1}^{k} P(A_i) = k\alpha$. That is, the probability of committing at least one type I error in $k$ tests is at most $k\alpha$. Therefore, if we use a significance level of $\alpha/k$ for each individual test, then the “overall significance level” (P(at least 1 type I error)) is at most $\alpha$. The Bonferroni method is very conservative.
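A minimal Python sketch of the Bonferroni adjustment follows: each of the $k$ tests is carried out at level $\alpha/k$. The function name `bonferroni_reject` and the P-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one reject/fail-to-reject decision per test at level alpha / k."""
    k = len(p_values)
    threshold = alpha / k           # per-test significance level
    return [p <= threshold for p in p_values]


# Hypothetical P-values from k = 4 tests; the per-test level is 0.05 / 4 = 0.0125.
print(bonferroni_reject([0.001, 0.02, 0.04, 0.3]))
```

Only the first hypothetical test is rejected here, although three of the four P-values fall below the unadjusted 0.05 level. That gap is what makes the method conservative.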

13 Analysis of Variance – Introduction. Analysis of variance is a generalization of the two-sample t-procedures (with equal variances). Its objective is to determine whether there are differences in the means of more than 2 groups. The statistical methodology for comparing several means is called analysis of variance, or simply ANOVA. When studying the effect of only one factor on the response, we use one-way ANOVA to analyze the data. When studying the effect of two factors on the response, we use two-way ANOVA.

14 One-Way ANOVA Model. The response variable $Y$ is measured on each experimental unit in each treatment group. We measure $Y_{ij}$ for the $j$th subject in the $i$th group. The one-way ANOVA model is $Y_{ij} = \mu_i + \varepsilon_{ij}$ for $i = 1, 2, \dots, k$ and $j = 1, 2, \dots, n_i$, where $\mu_i$ is the unknown mean response for the $i$th group. The $\varepsilon_{ij}$ are called “random errors” and are assumed to be i.i.d. $N(0, \sigma^2)$. The parameters of the model are the population means $\mu_1, \mu_2, \dots, \mu_k$ and the common standard deviation $\sigma$. The objective of one-way ANOVA is to test whether the mean response in each treatment group is the same. The null and alternative hypotheses are….

15 Derivation of Test Statistic
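The slide's own derivation is not preserved in this transcript. As a hedged sketch, the usual one-way ANOVA test statistic is the F ratio of the between-group mean square to the within-group mean square. Under the model above and the standard null hypothesis $\mu_1 = \dots = \mu_k$, it follows an $F$ distribution with $k - 1$ and $N - k$ degrees of freedom, where $N = \sum_i n_i$. The function name `one_way_anova_f` and the data are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(
        len(g) * (gm - grand_mean) ** 2 for g, gm in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(
        (y - gm) ** 2 for g, gm in zip(groups, group_means) for y in g
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: k = 3 groups with 3 observations each.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

Large values of F indicate that the group means vary more than the within-group variability would explain. Converting F to a P-value requires the F distribution, which the Python standard library does not provide.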

