
ANOVA example
• Polychlorinated biphenyls (PCBs), previously used in the manufacture of large electrical transformers and capacitors, are extremely hazardous contaminants when released into the environment. Samples of fish were taken from each of four rivers and analyzed for PCB concentration (in ppm).

Question
• Do the data provide sufficient evidence to indicate differences in the mean PCB concentration in fish for the four rivers?
• Hypotheses:
  – $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
  – $H_A$: the means are not all equal (at least one mean is not equal to the others).

First Step
• Examine the data. What does this mean?
• Boxplots
• Histograms
• Normal quantile plots
  – Note the command-line code for drawing a grid of four normal probability plots at once. Go to File → New → Script File, paste the code into the script window, and press F10; the four plots are produced automatically.
  – OR: just paste the code at the command line.

    par(mfrow = c(2, 2))  # 2 x 2 grid of plots
    for (i in 1:4) {
      y <- PCBfish[PCBfish[, 2] == i, 1]  # column 1 = PCB, column 2 = river (1-4)
      qqnorm(y, ylab = "Data quantiles", main = paste("River", i))
    }

Can we do an ANOVA? What are the criteria?
• Normally distributed populations
• Equal standard deviations
• Independent samples across treatments
  – What might this look like if it weren't true?
  – Rivers connected?
• Independent samples within treatments
  – What might this look like if it weren't true?
  – Clustering?

Transformations (pp. 65 and 69 of The Statistical Sleuth)
• Log transformation
  – Why try this?
  – The ratio of the largest to the smallest observation is greater than 10, the data are skewed, and the group with the larger average has the larger spread.
• When to use the reciprocal?
  – Waiting times
• When to use the square root?
  – Data are counts
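
As a rough illustration of why the log transformation helps, the R sketch below builds hypothetical data (not the PCB measurements) in which every river has the same multiplicative spread around a different centre. The names `shape`, `mu`, `pcb`, `river` and `sd_ratio` are illustrative assumptions. On the raw scale the group standard deviations grow with the group centre; after taking logs they become equal.

```r
# Hypothetical data, NOT the PCB measurements: each river's values are
# exp(mu_i + shape), i.e. the same multiplicative spread around a different centre.
shape <- qnorm(ppoints(10), sd = 0.8)    # common log-scale shape for every river
mu    <- c(0.0, 0.7, 1.4, 2.1)           # hypothetical log-scale centres
pcb   <- unlist(lapply(mu, function(m) exp(m + shape)))
river <- factor(rep(1:4, each = length(shape)))

# Ratio of the largest group SD to the smallest group SD
sd_ratio <- function(y, g) {
  s <- tapply(y, g, sd)
  max(s) / min(s)
}

range_ratio <- max(pcb) / min(pcb)         # largest / smallest observation
raw_ratio   <- sd_ratio(pcb, river)        # spread grows with the centre
log_ratio   <- sd_ratio(log(pcb), river)   # spreads equal after the log

print(c(range = range_ratio, raw = raw_ratio, log = log_ratio))
```

With real data the log-scale SDs will not be exactly equal; the idealized example only shows that a multiplicative spread becomes an additive one on the log scale.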

Better?
• Why or why not?
• On the log scale, the standard deviations are much more similar across the rivers.

Do an ANOVA
• Read the table:
  – sums of squares
  – $s_{\text{pooled}}$ and $s_{\text{pooled}}^2$
  – F-value
  – p-value
• What are your conclusions?
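
To practise reading the table, here is a minimal R sketch (assuming R's `aov`; the slides use S-PLUS, whose printout is similar but not identical). The vector `logpcb` holds hypothetical log-scale values for four rivers, not the PCB data. It pulls out each quantity listed above; the residual mean square is $s_{\text{pooled}}^2$.

```r
# Hypothetical log-scale values for four rivers (NOT the PCB data)
logpcb <- c(1.1, 1.4, 0.9, 1.6,   # river 1
            1.8, 2.2, 1.5, 2.0,   # river 2
            1.2, 1.0, 1.5, 1.3,   # river 3
            2.4, 2.1, 2.8, 2.5)   # river 4
river  <- factor(rep(1:4, each = 4))

fit <- aov(logpcb ~ river)
tab <- summary(fit)[[1]]   # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(tab)

ss_between <- tab[1, "Sum Sq"]    # "river" row
ss_within  <- tab[2, "Sum Sq"]    # "Residuals" row
s2_pooled  <- tab[2, "Mean Sq"]   # residual mean square = s_pooled^2
s_pooled   <- sqrt(s2_pooled)
f_value    <- tab[1, "F value"]
p_value    <- tab[1, "Pr(>F)"]
```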

Conclusions
• We can reject the null hypothesis of no difference in these group means.
• At least one of the means is different from the others. (Is this statement the same as accepting the alternative hypothesis?)
• "Convincing evidence exists that median PCB concentration of fish differs among these rivers (p-value of 0.002; analysis of variance F-test)."

Compare just two rivers...
• Average and 95% CI for the difference in PCB in fish between Rivers 1 and 2
• Logged data, so…
  – $\bar{Y}_1 - \bar{Y}_2$ (River 1 minus River 2, log scale) $= -0.43$
  – $e^{-0.43} = 0.65$
  – The median concentration of PCB in fish in River 1 is 0.65 times that of fish in River 2.

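Is this significant?
• Two-sided, two-sample t-test
• We must calculate the t-statistic (and p-value) by hand, because the SE has to use $s_{\text{pooled}}$ from all four rivers.
• $s_{\text{pooled}} = \sqrt{\text{residual mean square from the ANOVA table}}$
• $\mathrm{SE}(\bar{Y}_1 - \bar{Y}_2) = s_{\text{pooled}} \sqrt{1/n_1 + 1/n_2}$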
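
A minimal R sketch of these two formulas follows. The helper names `pooled_sd` and `se_diff`, and the toy vector `logpcb`, are hypothetical (the toy values are not the PCB data). The pooled SD uses every group, so its degrees of freedom are $n - I$; it should agree with the residual standard error of the one-way fit.

```r
# Pooled SD from ALL groups: sqrt(within-group SS / (n - I))
pooled_sd <- function(y, g) {
  g <- factor(g)
  ss_within <- sum(tapply(y, g, function(v) sum((v - mean(v))^2)))
  sqrt(ss_within / (length(y) - nlevels(g)))
}

# SE of a difference between two group means, using the pooled SD
se_diff <- function(s_pooled, n1, n2) s_pooled * sqrt(1 / n1 + 1 / n2)

# Hypothetical log-scale data for four rivers (NOT the PCB data)
logpcb <- c(1.1, 1.4, 0.9, 1.6, 1.8, 2.2, 1.5, 2.0,
            1.2, 1.0, 1.5, 1.3, 2.4, 2.1, 2.8, 2.5)
river  <- factor(rep(1:4, each = 4))

sp   <- pooled_sd(logpcb, river)
se12 <- se_diff(sp, sum(river == 1), sum(river == 2))
print(c(s_pooled = sp, SE = se12))
```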

Hypothesis test
• Test the hypothesis that River 1 minus River 2 = 0
  – Estimate/SE: $t = -0.43/0.28 \approx -1.54$
  – At best only suggestive of a difference (in fact, at the 0.05 level we would not reject the null), but we'll still do a CI for practice.
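
The hand calculation can be checked in R using the slides' own rounded numbers. The sketch assumes the 88 residual degrees of freedom from the confidence-interval slide belong to the four-river ANOVA; because the inputs are rounded, treat the output as approximate.

```r
# Numbers taken from the slides (rounded): River 1 minus River 2 on the log scale
est      <- -0.43   # difference in log-scale means
se       <- 0.28    # SE computed with s_pooled
df_resid <- 88      # residual df of the four-river ANOVA (n - 4), assumed

t_stat  <- est / se
p_value <- 2 * pt(-abs(t_stat), df_resid)   # two-sided p-value
print(round(c(t = t_stat, p = p_value), 3))
```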

95% CI
• 95% CI for the difference in group means
  – qt(0.975, 88) gives the critical value, about 1.99
  – $-0.43 \pm (1.99)(0.28) \rightarrow (-0.99,\ 0.13)$
  – $e^{-0.99} \approx 0.37$; $e^{0.13} \approx 1.14$
  – The median PCB in the muscle of fish from River 1 is between 0.37 and 1.14 times that of fish from River 2. (Are we surprised that this interval covers 1?)
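
The same interval in R, as a sketch using the slides' rounded estimate and SE (and assuming 88 residual degrees of freedom): build the interval on the log scale, then exponentiate both ends to get an interval for the ratio of medians, River 1 over River 2.

```r
est      <- -0.43   # River 1 minus River 2, log scale (from the slides)
se       <- 0.28    # SE based on s_pooled (from the slides)
df_resid <- 88      # residual df, assumed from the slides

t_crit   <- qt(0.975, df_resid)             # about 1.99
ci_log   <- est + c(-1, 1) * t_crit * se    # CI for the difference in log means
ci_ratio <- exp(ci_log)                     # CI for the ratio of medians, River 1 / River 2
print(round(rbind(log_scale = ci_log, ratio = ci_ratio), 2))
```

With the unrounded critical value (about 1.987) the ratio interval comes out near (0.37, 1.13) rather than (0.37, 1.14); the difference is due to rounding only.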

ANOVA Explanation
• Reduced model = equal-means model
  – Null hypothesis: all these rivers have the same mean PCB concentration in the fish.
• How wrong are we under this hypothesis?
  – The residual error measures how wrong we are.
  – Large residuals here mean the null hypothesis fits the data poorly.

[Figure: Graph of PCB in each river, equal-means model (equal-means average = 1.64). The residual for the highest point in River 1 is its distance from the equal-means average.]

ANOVA by hand (conceptual)

[Figure: Graph of PCB in each river, separate-means model. The residual for the highest point in River 1 is its distance from the River 1 mean.]

ANOVA by hand (conceptual)

Model Inaccuracy
• If the null hypothesis is correct,
  – the two models should be about equally able to explain the data,
  – AND the magnitudes of the residuals should be about the same.
• If the null hypothesis is incorrect,
  – the magnitudes of the residuals from the equal-means model will tend to be larger,
  – and their larger sizes reflect model inaccuracy.

Residual Sum of Squares
• We need a single summary of the residuals for a particular model.
  – Statisticians have chosen the sum of the squared residuals: the residual sum of squares (RSS).
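
A minimal R sketch of the two residual sums of squares, on hypothetical toy data (`logpcb` and `river` are illustrative, not the PCB data): the equal-means model predicts every fish by the grand average, and the separate-means model predicts each fish by its own river's average.

```r
# Hypothetical log-scale data for four rivers (NOT the PCB data)
logpcb <- c(1.1, 1.4, 0.9, 1.6, 1.8, 2.2, 1.5, 2.0,
            1.2, 1.0, 1.5, 1.3, 2.4, 2.1, 2.8, 2.5)
river  <- factor(rep(1:4, each = 4))

# Equal-means (reduced) model: every fish is predicted by the grand average
res_reduced <- logpcb - mean(logpcb)
# Separate-means (full) model: every fish is predicted by its own river's average
res_full <- logpcb - ave(logpcb, river)

rss_reduced <- sum(res_reduced^2)
rss_full    <- sum(res_full^2)
print(c(reduced = rss_reduced, full = rss_full))
```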

Extra Sum of Squares
• The residual sum of squares from the reduced (equal-means) model minus the residual sum of squares from the full (separate-means) model measures how much smaller the residuals become under the full model.
• This difference is called the extra sum of squares (ESS).
• Put another way, the ESS measures the amount of variability left unexplained by the reduced model that is explained by the full model.
• How much better is it to say that each river has its own mean than to say that all the rivers share a common mean?
• Thus: $\mathrm{ESS} = \mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}$
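
As a check on the definition, this R sketch (hypothetical toy data, not the PCB data) computes the ESS as a difference of residual sums of squares and compares it with the familiar between-group sum of squares, $\sum_i n_i(\bar{Y}_i - \bar{Y})^2$; for a one-way layout the two are the same quantity.

```r
# Hypothetical log-scale data for four rivers (NOT the PCB data)
logpcb <- c(1.1, 1.4, 0.9, 1.6, 1.8, 2.2, 1.5, 2.0,
            1.2, 1.0, 1.5, 1.3, 2.4, 2.1, 2.8, 2.5)
river  <- factor(rep(1:4, each = 4))

rss_reduced <- sum((logpcb - mean(logpcb))^2)            # equal-means model
rss_full    <- sum((logpcb - ave(logpcb, river))^2)      # separate-means model
ess <- rss_reduced - rss_full                            # extra sum of squares

# Same quantity written as the between-group sum of squares
n_i  <- tapply(logpcb, river, length)
ybar <- tapply(logpcb, river, mean)
ss_between <- sum(n_i * (ybar - mean(logpcb))^2)
print(c(ESS = ess, between = ss_between))
```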

F-Statistic
• How large a difference between the models is enough to call it significant (the same question we've asked with t-tests, etc.)?
• We compare these two levels of unexplained variability with an F-test.
• We take their difference (the ESS), divide by the extra degrees of freedom, and scale by our best estimate of the variance, $s_{\text{pooled}}^2$ from the full model.
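
Written out as a formula, with $I$ groups, $n$ observations, and $d$ denoting residual degrees of freedom (the $d$ notation is ours), the recipe above gives the standard extra-sum-of-squares F-statistic:

```latex
F = \frac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}) / (d_{\text{reduced}} - d_{\text{full}})}
         {\mathrm{RSS}_{\text{full}} / d_{\text{full}}}
  = \frac{\mathrm{ESS}/(I-1)}{s_{\text{pooled}}^2},
\qquad d_{\text{reduced}} = n - 1, \quad d_{\text{full}} = n - I
```

For the four rivers the numerator has $I - 1 = 3$ degrees of freedom; if the 88 degrees of freedom used for the t-interval are the ANOVA's residual degrees of freedom, the reference distribution is $F_{3,\,88}$.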

F-test (cont.)
• Large F-statistics are associated with large differences in the sizes of the residuals from the two models.
• This is evidence against the reduced model (null hypothesis) and in favor of the full model (different means).
• The test is summarized by its p-value, computed from an F-distribution with $I - 1$ and $n - I$ degrees of freedom ($I$ groups, $n$ observations).
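
In R (assumed here; the slides use S-PLUS), the reduced-versus-full comparison can be run directly with `anova()` on two nested `lm` fits. The sketch below uses hypothetical toy data and also recomputes the F-statistic and p-value by hand, to show they match.

```r
# Hypothetical log-scale data for four rivers (NOT the PCB data)
logpcb <- c(1.1, 1.4, 0.9, 1.6, 1.8, 2.2, 1.5, 2.0,
            1.2, 1.0, 1.5, 1.3, 2.4, 2.1, 2.8, 2.5)
river  <- factor(rep(1:4, each = 4))

reduced <- lm(logpcb ~ 1)        # equal-means model (H0)
full    <- lm(logpcb ~ river)    # separate-means model
cmp     <- anova(reduced, full)  # extra-sum-of-squares F-test
print(cmp)

# The same F-statistic and p-value by hand
rss0 <- deviance(reduced); d0 <- df.residual(reduced)
rss1 <- deviance(full);    d1 <- df.residual(full)
f_hand <- ((rss0 - rss1) / (d0 - d1)) / (rss1 / d1)
p_hand <- pf(f_hand, d0 - d1, d1, lower.tail = FALSE)
```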

ANOVA Table
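
A generic layout of the by-hand one-way ANOVA table, in terms of the quantities defined on the earlier slides ($I$ groups, $n$ observations in total), is sketched below with no data filled in. The Total row is the reduced (equal-means) model.

```latex
\begin{array}{l|c|c|c|c}
\text{Source} & \text{d.f.} & \text{Sum of squares} & \text{Mean square} & F \\ \hline
\text{Between groups (extra)} & I - 1 & \mathrm{ESS} = \mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}} & \mathrm{ESS}/(I-1) & \dfrac{\mathrm{ESS}/(I-1)}{s_{\text{pooled}}^2} \\
\text{Within groups (residual)} & n - I & \mathrm{RSS}_{\text{full}} & s_{\text{pooled}}^2 = \mathrm{RSS}_{\text{full}}/(n-I) & \\
\text{Total} & n - 1 & \mathrm{RSS}_{\text{reduced}} & & \\
\end{array}
```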

S+ Printout
• The printout gives the residual standard error and an ANOVA table with columns Df, Sum of Sq, Mean Sq, F Value and Pr(F), and rows river and Residuals.
• We can reject the null hypothesis of no difference in medians: at least one river has a different median PCB concentration.
• S+ does not print the reduced-model (Total) row that appears on the ANOVA table we make by hand; its Df and Sum of Sq are the sums of the river and Residuals rows.
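
The missing Total row is easy to rebuild. Here is a minimal sketch in R (assumed; S-PLUS output is laid out similarly but was not checked), using hypothetical toy data rather than the PCB data: sum the Df and sum-of-squares columns of the printed rows.

```r
# Hypothetical log-scale data for four rivers (NOT the PCB data)
logpcb <- c(1.1, 1.4, 0.9, 1.6, 1.8, 2.2, 1.5, 2.0,
            1.2, 1.0, 1.5, 1.3, 2.4, 2.1, 2.8, 2.5)
river  <- factor(rep(1:4, each = 4))

tab <- summary(aov(logpcb ~ river))[[1]]   # rows: river, Residuals

# Append the Total (reduced-model) row by summing the two printed rows
by_hand <- data.frame(
  Source = c("Between (river)", "Within (Residuals)", "Total"),
  Df     = c(tab[["Df"]], sum(tab[["Df"]])),
  SumSq  = c(tab[["Sum Sq"]], sum(tab[["Sum Sq"]]))
)
print(by_hand)
```

The Total sum of squares equals the residual sum of squares of the equal-means model, with $n - 1$ degrees of freedom.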