Testing Multiple Means and the Analysis of Variance (§8.1, 8.2, 8.6)


Testing Multiple Means and the Analysis of Variance (§8.1, 8.2, 8.6) Situations where comparing more than two means is important. The approach to testing equality of more than two means. Introduction to the analysis of variance table, its construction and use.

Study Designs and Analysis Approaches
1. Simple random sample from a population with known σ, continuous response: one-sample z-test.
2. Simple random sample from a population with unknown σ, continuous response: one-sample t-test.
3. Simple random samples from 2 populations with known σ's: two-sample z-test.
4. Simple random samples from 2 populations with unknown σ's: two-sample t-test.
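
As a quick refresher on the last two cases, here is a minimal R sketch; the data values are made up for illustration, and base R has no built-in z-test for the known-σ cases (those are usually computed by hand from the normal distribution).

# Hypothetical samples; sigma unknown, so t-tests apply
x <- c(5.1, 4.8, 5.3, 5.0, 4.9)
y <- c(5.6, 5.4, 5.8, 5.5, 5.7)
t.test(x, mu = 5)                 # case 2: one-sample t-test of H0: mu = 5
t.test(x, y, var.equal = TRUE)    # case 4: two-sample (pooled) t-test of H0: mu1 = mu2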

Sampling Study with t > 2 Populations. One sample is drawn independently and randomly from each of t > 2 populations. Objective: to compare the means of the t populations for statistically significant differences in response. Initially we assume all populations have a common variance; later we will test whether this is indeed true (homogeneity of variance tests).

Sampling Study diagram: independent random samples of cholesterol levels are drawn from each of three populations (Healthy Eaters, Vegetarians, Meat & Potato Eaters).

Experimental Study with t > 2 Treatments. Experimental units: samples of size n_1, n_2, ..., n_t are independently and randomly drawn from each of the t populations. Separate treatments are applied to each sample. A treatment is something done to the experimental units which would be expected to change the distribution (usually only the mean) of the response(s). Note: if all samples are drawn from the same population before application of the treatments, the homogeneity of variances assumption might be plausible.

Experimental Study diagram: male college undergraduate students are randomly sampled into three sets of experimental units; the Healthy, Vegetarian, and Meat & Potato diets are applied as treatments; cholesterol is the response, measured after 1 year.

Hypotheses. Let μ_i be the true mean of treatment group i (or population i). We are interested in whether all the groups (populations) have exactly the same true mean:
H_0: μ_1 = μ_2 = ... = μ_t.
The alternative H_a is that at least one group (population) mean differs from the others. Let μ_0 be the hypothesized common mean under H_0.

A Simple Model. Let y_ij be the response for experimental unit j in group i, i = 1, 2, ..., t, j = 1, 2, ..., n_i. Each population has normally distributed responses around its own mean, but the variances are the same across all populations. The model is
y_ij = μ_i + ε_ij,
where ε_ij is the residual, or deviation from the group mean. E(y_ij) = μ_i: we expect the group mean to be μ_i. Assuming y_ij ~ N(μ_i, σ²), then ε_ij ~ N(0, σ²). If H_0 holds, y_ij = μ_0 + ε_ij, that is, all groups have the same mean and variance.
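
To make the model concrete, here is a small R sketch that simulates data from it; the group means, σ, and sample sizes are made-up values, not from the text.

# Simulate y_ij = mu_i + eps_ij with a common sigma across groups
set.seed(1)
mu    <- c(5.9, 5.5, 5.0)                  # assumed true group means
sigma <- 0.05                              # assumed common standard deviation
n.i   <- 5                                 # 5 units per group (t = 3 groups)
group <- factor(rep(1:3, each = n.i))
y     <- mu[group] + rnorm(length(group), mean = 0, sd = sigma)
tapply(y, group, mean)                     # sample means estimate mu_1, mu_2, mu_3
tapply(y, group, var)                      # each estimates the common sigma^2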

A Naïve Testing Approach. Test each possible pair of groups by performing all pairwise t-tests (with three groups there are three such pairs). Assume each test is performed at the α = 0.05 level. For each test, the probability of not rejecting H_0 when H_0 is true is 0.95 (1 - α). The probability of not rejecting H_0 when H_0 is true for all three tests is (0.95)^3 = 0.857. Thus the true significance level (type I error) for the overall test of no difference in the means will be 1 - 0.857 = 0.143, NOT the α = 0.05 level we thought it would be. Also, each individual t-test uses only part of the information available to estimate the underlying variance. This is inefficient - WE CAN DO MUCH BETTER!
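
A short simulation sketch of this inflation, with three groups whose true means are identical. The 0.143 figure assumes independent tests; because the three pairwise tests share data, the simulated rate comes out somewhat lower, but still far above 0.05.

# Probability that at least one of the three pairwise t-tests rejects when H0 is true
set.seed(2)
reject.any <- replicate(5000, {
  g1 <- rnorm(5); g2 <- rnorm(5); g3 <- rnorm(5)     # all from the same population
  p <- c(t.test(g1, g2)$p.value,
         t.test(g1, g3)$p.value,
         t.test(g2, g3)$p.value)
  any(p < 0.05)
})
mean(reject.any)    # overall type I error well above the nominal 0.05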

Testing Approaches - Analysis of Variance. The term "analysis of variance" comes from the fact that this approach compares the variability observed between sample means to a (pooled) estimate of the variability among observations within each group (treatment): the between-sample-means variance versus the within-groups variance.

Extreme Situations. When the within-group variance is large compared to the variability between the means, there is unclear separation of the means. When the within-group variance is small compared to the variability between the means, there is clear separation of the means.

Pooled Variance. From the two-sample t-test with assumed equal variance σ², we produced a pooled (within-group) sample variance estimate
s_p² = [(n_1 - 1)s_1² + (n_2 - 1)s_2²] / (n_1 + n_2 - 2).
Extend the concept of a pooled variance to t groups as follows:
s_W² = [Σ_{i=1..t} (n_i - 1)s_i²] / [Σ_{i=1..t} (n_i - 1)] = [(n_1 - 1)s_1² + ... + (n_t - 1)s_t²] / (N - t),  where N = n_1 + ... + n_t.
If all the n_i are equal to n, this reduces to the average of the t sample variances.
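
A sketch of this pooled estimate in R, using small made-up data with unequal group sizes.

# s_W^2 = sum((n_i - 1) * s_i^2) / sum(n_i - 1)
y     <- c(5.9, 5.8, 6.1, 5.4, 5.6, 5.5, 5.3, 5.0, 5.1, 4.9, 5.2)
group <- factor(c(1,1,1, 2,2,2,2, 3,3,3,3))
n.i   <- tapply(y, group, length)
s2.i  <- tapply(y, group, var)
s2.W  <- sum((n.i - 1) * s2.i) / sum(n.i - 1)
s2.W                                 # the within-group (error) mean square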

Variance Between Group Means. Consider the variance between the group means, computed as
s² = Σ_{i=1..t} (ȳ_i - ȳ)² / (t - 1),  where ȳ = (ȳ_1 + ... + ȳ_t) / t.
If we assume each group is of the same size, say n, then under H_0, s² is an estimate of σ²/n (the variance of the sampling distribution of the sample mean). Hence n times s² is an estimate of σ². When the sample sizes are unequal, the between-group estimate of σ² is given by
s_B² = Σ_{i=1..t} n_i (ȳ_i - ȳ)² / (t - 1),
where ȳ is now the overall mean of all N observations.
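		
Using the same made-up data as in the previous sketch, the between-group estimate for unequal sample sizes:

# s_B^2 = sum(n_i * (ybar_i - ybar)^2) / (t - 1)
y      <- c(5.9, 5.8, 6.1, 5.4, 5.6, 5.5, 5.3, 5.0, 5.1, 4.9, 5.2)
group  <- factor(c(1,1,1, 2,2,2,2, 3,3,3,3))
n.i    <- tapply(y, group, length)
ybar.i <- tapply(y, group, mean)
ybar   <- mean(y)                          # overall mean of all observations
s2.B   <- sum(n.i * (ybar.i - ybar)^2) / (nlevels(group) - 1)
s2.B                                       # the between-groups mean square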

F-test. Now we have two estimates of σ², one from within groups and one from between the group means. An F-test can be used to determine if the two statistics are equal. Note that if the groups truly have different means, s_B² will tend to be greater than s_W². Hence the F-statistic is written as
F = s_B² / s_W².
If H_0 holds, the computed F-statistic should be close to 1. If H_a holds, the computed F-statistic should be much greater than 1. We use the appropriate critical value from the F table (with t - 1 and N - t degrees of freedom) to help make this decision. Hence, the F-test is really a test of equality of means under the assumption of normal populations and homogeneous variances.
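
Putting the two estimates together on the same made-up data, and checking the hand computation against R's built-in ANOVA:

# F = s_B^2 / s_W^2 compared to an F(t - 1, N - t) distribution
y     <- c(5.9, 5.8, 6.1, 5.4, 5.6, 5.5, 5.3, 5.0, 5.1, 4.9, 5.2)
group <- factor(c(1,1,1, 2,2,2,2, 3,3,3,3))
N <- length(y);  t.grp <- nlevels(group);  n.i <- tapply(y, group, length)
s2.W <- sum((n.i - 1) * tapply(y, group, var)) / (N - t.grp)
s2.B <- sum(n.i * (tapply(y, group, mean) - mean(y))^2) / (t.grp - 1)
F.stat <- s2.B / s2.W
c(F = F.stat, p.value = pf(F.stat, t.grp - 1, N - t.grp, lower.tail = FALSE))
anova(lm(y ~ group))                       # same F value and Pr(>F)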

Partition of Sums of Squares. Total Sum of Squares = Sum of Squares Within Groups + Sum of Squares Between Means, i.e.
TSS = SSW + SSB.
TSS measures variability about the overall mean.
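
The partition is easy to verify numerically (same made-up data as in the sketches above):

# TSS = SSW + SSB
y      <- c(5.9, 5.8, 6.1, 5.4, 5.6, 5.5, 5.3, 5.0, 5.1, 4.9, 5.2)
group  <- factor(c(1,1,1, 2,2,2,2, 3,3,3,3))
ybar.i <- tapply(y, group, mean)[group]    # each observation's own group mean
TSS <- sum((y - mean(y))^2)                # variability about the overall mean
SSW <- sum((y - ybar.i)^2)                 # variability within groups
SSB <- sum((ybar.i - mean(y))^2)           # variability between group means
all.equal(TSS, SSW + SSB)                  # TRUE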

The AOV (Analysis of Variance) Table. The computations needed to perform the F-test are organized into a table: the partition of the sums of squares, the partition of the degrees of freedom, the variance estimates (mean squares), and the test statistic.

Source           df      SS    MS                  F
Between groups   t - 1   SSB   MSB = SSB/(t - 1)   F = MSB/MSW
Within groups    N - t   SSW   MSW = SSW/(N - t)
Total            N - 1   TSS

Here MSB = s_B² and MSW = s_W², so F = MSB/MSW is the statistic from the previous slide.

Example - Excel. The AOV quantities are built up in a worksheet with formulas such as =AVERAGE(B6:B10), =VAR(B6:B10), =SQRT(B13), and =COUNT(B6:B10) for each group's mean, variance, standard deviation, and sample size; =(B15-1)*B13 for each group's (n_i - 1)s_i² contribution; =(SUM(B15:D15)-1)*VAR(B6:D10) for TSS; =SUM(B16:D16) for SSW; and =B18-B19 for SSB = TSS - SSW.

Excel Analysis ToolPak

SAS Example: proc anova

proc anova;
  class popn;
  model resp = popn;
  title 'Table 13.1 in Ott - Analysis of Variance';
run;

The output of the Analysis of Variance Procedure for dependent variable RESP gives the table of Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, and Pr > F, along with R-Square, C.V., Root MSE, the RESP mean, and the Anova SS line for POPN.

GLM in SAS

proc glm;
  class popn;
  model resp = popn / solution;
  title 'Table 13.1 in Ott';
run;

The General Linear Models Procedure output for dependent variable RESP repeats the ANOVA table (Model, Error, Corrected Total with DF, Sum of Squares, Mean Square, F Value, Pr > F), the R-Square, C.V., Root MSE, and RESP mean, the Type I and Type III SS for POPN, and the solution: parameter estimates for INTERCEPT and each POPN level with their standard errors and t-tests. NOTE: The X'X matrix has been found to be singular and a generalized inverse was used to solve the normal equations. Estimates followed by the letter 'B' are biased, and are not unique estimators of the parameters.

Minitab Example: Stat > ANOVA > One-Way (Unstacked). The One-way Analysis of Variance output gives the ANOVA table (Source, DF, SS, MS, F, P for Factor, Error, and Total), individual 95% CIs for each level mean based on the pooled StDev (with Level, N, Mean, and StDev for each EG group), and the pooled StDev itself.

R Example: lm( ) and anova( )

> hwages <- c(5.90, 5.92, 5.91, 5.89, 5.88, 5.51, 5.50, 5.50, 5.49, 5.50,
+             5.01, 5.00, 4.99, 4.98, 5.02)
> egroup <- factor(c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3))
> wages.fit <- lm(hwages ~ egroup)
> anova(wages.fit)

The anova table lists Df, Sum Sq, Mean Sq, F value, and Pr(>F) for the egroup factor and the Residuals; the factor is highly significant (Pr(>F) on the order of 1e-16, flagged ***).

R Example: summary( )

> summary(wages.fit)

Call: lm(formula = hwages ~ egroup)

The summary reports the residuals (Min, 1Q, Median, 3Q, Max, all on the order of a few hundredths) and the coefficient table with Estimate, Std. Error, t value, and Pr(>|t|) for (Intercept), egroup2, and egroup3; all three are highly significant (p-values around 1e-15 or smaller, flagged ***). It also gives the residual standard error on 12 degrees of freedom, the Multiple R-Squared and Adjusted R-squared, and the F-statistic: 5545 on 2 and 12 DF, p-value: < 2.2e-16.

A Nonparametric Alternative to the AOV Test: The Kruskal - Wallis Test (§8.5). What can we do if the normality assumption is rejected in the one-way AOV test? We can use the standard nonparametric alternative: the Kruskal-Wallis Test. This is an extension of the Wilcoxon Rank Sum Test to more than two samples.

Kruskal-Wallis Test. Extension of the rank-sum test for t = 2 to the t > 2 case.
H_0: The centers of the t groups are identical.
H_a: Not all centers are the same.
Test statistic:
H = [12 / (N(N + 1))] Σ_{i=1..t} T_i²/n_i - 3(N + 1),
where T_i denotes the sum of the ranks for the measurements in sample i after the combined N = n_1 + ... + n_t measurements have been ranked. Reject H_0 if H > χ²_α(t - 1), the upper-α critical value of the chi-square distribution with t - 1 degrees of freedom. With large numbers of ties in the ranks of the sample measurements, use
H' = H / [1 - Σ_j (t_j³ - t_j) / (N³ - N)],
where t_j is the number of observations in the j-th group of tied ranks.
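
A sketch of H computed directly from the rank sums in R, checked against kruskal.test(); the data are the hourly-wage values from the R example earlier in these slides.

# Kruskal-Wallis H from rank sums, with the correction for ties
hwages <- c(5.90, 5.92, 5.91, 5.89, 5.88, 5.51, 5.50, 5.50, 5.49, 5.50,
            5.01, 5.00, 4.99, 4.98, 5.02)
egroup <- factor(rep(1:3, each = 5))
N   <- length(hwages)
r   <- rank(hwages)                        # average ranks are used for ties
T.i <- tapply(r, egroup, sum)              # rank sum for each group
n.i <- tapply(r, egroup, length)
H   <- 12 / (N * (N + 1)) * sum(T.i^2 / n.i) - 3 * (N + 1)
t.j   <- table(r)                          # sizes of the tied-rank groups
H.adj <- H / (1 - sum(t.j^3 - t.j) / (N^3 - N))
c(H = H, H.adjusted = H.adj,
  p.value = pchisq(H.adj, df = 2, lower.tail = FALSE))
kruskal.test(hwages, egroup)               # matches the tie-adjusted H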

SAS program (data step and proc npar1way):

options ls=78 ps=49 nodate;
data OneWay;
  input popn resp;
  cards;
  ;
run;
proc print; run;
proc npar1way;
  class popn;
  var resp;
run;

The data lines go between cards; and the terminating semicolon; proc print then lists OBS, POPN, and RESP for each observation.

NPAR1WAY output (analysis of variance): for variable resp classified by variable popn, the procedure reports N and Mean for each popn, then a table of Source (Among, Within), DF, Sum of Squares, Mean Square, F Value, and Pr > F (< .0001 for Among). Average scores were used for ties.

NPAR1WAY output (rank-based analysis): Wilcoxon scores (rank sums) for variable resp classified by variable popn, giving for each popn the N, Sum of Scores, Expected Under H0, Std Dev Under H0, and Mean Score (average scores were used for ties), followed by the Kruskal-Wallis test: the Chi-Square statistic, DF = 2, and Pr > Chi-Square.

MINITAB: Stat > Nonparametrics > Kruskal-Wallis. Kruskal-Wallis Test: RESP versus POPN. The output gives N, Median, Ave Rank, and Z for each POPN level and overall, then the H statistic with DF = 2 and its P-value, both unadjusted and adjusted for ties.

R: kruskal.test( )

> kruskal.test(hwages, factor(egroup))

        Kruskal-Wallis rank sum test

data:  hwages and factor(egroup)

The output reports the Kruskal-Wallis chi-squared statistic, df = 2, and the p-value.

SPSS