1 Always be mindful of the kindness and not the faults of others.

2 One-Way ANOVA: Inferences about More than Two Population Means
- Model and test for one-way ANOVA
- Assumption checking
- Nonparametric alternative

3 Analysis of Variance & One-Factor Designs (One-Way ANOVA)
Y = RESPONSE VARIABLE (of numerical type), e.g., battery lifetime
X = EXPLANATORY VARIABLE (of categorical type), a possibly influential FACTOR, e.g., brand of battery
OBJECTIVE: To determine the impact of X on Y

4 Completely Randomized Design (CRD)
Goal: to study the effect of Factor X.
The same number of observations is taken randomly and independently from the individuals at each level of Factor X, i.e., $n_1 = n_2 = \dots = n_C$ (C levels).

5 Example: Y = LIFETIME (HOURS) by BRAND, 3 replications per level [data table]

6 Analysis of Variance

7 Statistical Model
C "levels" of BRAND, R observations for each level:

            Level 1    Level 2    ...    Level C
Rep 1       Y11        Y21        ...    YC1
Rep 2       Y12        Y22        ...    YC2
...
Rep R       Y1R        Y2R        ...    YCR

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, C; \quad j = 1, \dots, R$$

8 Where
i = index for FACTOR (Brand) LEVEL
j = index for "replication"
$\mu_i$ = AVERAGE associated with the i-th level of X (Brand i)
$\mu$ = OVERALL AVERAGE, i.e., the average of the $\mu_i$'s
$\tau_i$ = differential effect associated with the i-th level of X (Brand i) $= \mu_i - \mu$
$\varepsilon_{ij}$ = "noise" or "error" due to other factors associated with the (i, j)-th data value

9 $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$. By definition, $\sum_{i=1}^{C} \tau_i = 0$.
The experiment produces R × C $Y_{ij}$ data values. The analysis produces estimates of $\mu, \tau_1, \tau_2, \dots, \tau_C$. (We can then get estimates of the $\varepsilon_{ij}$ by subtraction.)

10 Let $\bar{Y}_1, \bar{Y}_2$, etc., be the level means. Then
$$\bar{\bar{Y}} = \sum_{i=1}^{C} \bar{Y}_i \, / \, C = \text{"GRAND MEAN"}$$
(assuming the same number of data points in each column; otherwise, $\bar{\bar{Y}}$ = mean of all the data).

11 MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
$\bar{\bar{Y}}$ estimates $\mu$
$\bar{Y}_i - \bar{\bar{Y}}$ estimates $\tau_i$ $(= \mu_i - \mu)$ (for all i)
These estimates are based on Gauss' (1796) PRINCIPLE OF LEAST SQUARES and on COMMON SENSE.

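12 MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
If you insert the estimates into the MODEL,
(1) $Y_{ij} = \bar{\bar{Y}} + (\bar{Y}_i - \bar{\bar{Y}}) + \hat{\varepsilon}_{ij}$,
it follows that our estimate of $\varepsilon_{ij}$ is
(2) $\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_i$, called the residual.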
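A minimal Python sketch of these estimates (not part of the original slides). The function name `one_way_estimates`, the data layout (a dict mapping each level to its list of observations) and the numbers in the example are all hypothetical. It returns the grand mean, the estimated effects $\bar{Y}_i - \bar{\bar{Y}}$ and the residuals $Y_{ij} - \bar{Y}_i$:

```python
from typing import Dict, List, Tuple

Data = Dict[str, List[float]]


def one_way_estimates(data: Data) -> Tuple[float, Dict[str, float], Dict[str, List[float]]]:
    """Least-squares estimates for the model Y_ij = mu + tau_i + eps_ij."""
    level_means = {lvl: sum(ys) / len(ys) for lvl, ys in data.items()}
    # Grand mean = mean of all the data; it equals the average of the level
    # means when every level has the same number of observations.
    all_values = [y for ys in data.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    # tau_hat_i = level mean - grand mean
    effects = {lvl: m - grand_mean for lvl, m in level_means.items()}
    # residual eps_hat_ij = Y_ij - level mean
    residuals = {lvl: [y - level_means[lvl] for y in ys] for lvl, ys in data.items()}
    return grand_mean, effects, residuals


# Hypothetical balanced data: 3 levels, 3 replications each (made up for illustration)
example = {"A": [10.0, 12.0, 11.0], "B": [14.0, 15.0, 13.0], "C": [9.0, 8.0, 10.0]}
grand, tau_hat, resid = one_way_estimates(example)
print(grand, tau_hat, resid)
```

With equal numbers of observations per level, the estimated effects sum to zero (up to rounding), matching the constraint $\sum_i \tau_i = 0$.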

13 Then,
$$Y_{ij} = \bar{\bar{Y}} + (\bar{Y}_i - \bar{\bar{Y}}) + (Y_{ij} - \bar{Y}_i)$$
or,
$$(Y_{ij} - \bar{\bar{Y}}) = (\bar{Y}_i - \bar{\bar{Y}}) + (Y_{ij} - \bar{Y}_i) \qquad (3)$$
TOTAL VARIABILITY in Y = Variability in Y associated with X + Variability in Y associated with all other factors

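14 If you square both sides of (3) and double-sum both sides (over i and j), you get [after some unpleasant algebra, but lots of terms which "cancel"; in particular, the cross-product term $2\sum_{i=1}^{C}(\bar{Y}_i - \bar{\bar{Y}})\sum_{j=1}^{R}(Y_{ij} - \bar{Y}_i)$ vanishes because $\sum_{j=1}^{R}(Y_{ij} - \bar{Y}_i) = 0$ for every i]
$$\underbrace{\sum_{i=1}^{C}\sum_{j=1}^{R}(Y_{ij} - \bar{\bar{Y}})^2}_{\text{TSS}} = \underbrace{R\sum_{i=1}^{C}(\bar{Y}_i - \bar{\bar{Y}})^2}_{\text{SSB}} + \underbrace{\sum_{i=1}^{C}\sum_{j=1}^{R}(Y_{ij} - \bar{Y}_i)^2}_{\text{SSW (SSE)}}$$
TSS = TOTAL SUM OF SQUARES; SSB = SUM OF SQUARES BETWEEN SAMPLES; SSW (SSE) = SUM OF SQUARES WITHIN SAMPLES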
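The decomposition can be checked numerically. The Python sketch below is an illustration only: `sums_of_squares` is a hypothetical name, and the data are made up. It computes TSS, SSB and SSW directly from their definitions:

```python
from typing import Dict, List, Tuple


def sums_of_squares(data: Dict[str, List[float]]) -> Tuple[float, float, float]:
    """Return (TSS, SSB, SSW) for one-way data given as {level: observations}."""
    all_values = [y for ys in data.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    level_means = {lvl: sum(ys) / len(ys) for lvl, ys in data.items()}
    tss = sum((y - grand_mean) ** 2 for y in all_values)
    # len(ys) is R in a balanced design; weighting each level by its own size
    # keeps TSS = SSB + SSW true for unequal sizes as well.
    ssb = sum(len(ys) * (level_means[lvl] - grand_mean) ** 2 for lvl, ys in data.items())
    ssw = sum((y - level_means[lvl]) ** 2 for lvl, ys in data.items() for y in ys)
    return tss, ssb, ssw


# Hypothetical data (made up for illustration), 3 levels x 3 replications
tss, ssb, ssw = sums_of_squares(
    {"A": [10.0, 12.0, 11.0], "B": [14.0, 15.0, 13.0], "C": [9.0, 8.0, 10.0]}
)
assert abs(tss - (ssb + ssw)) < 1e-9  # the decomposition holds up to rounding
```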

15 ANOVA TABLE
Source of variability             SSQ    DF          Mean square (M.S.)
Between samples (due to brand)    SSB    C - 1       MSB = SSB / (C - 1)
Within samples (due to error)     SSW    (R - 1)C    MSW = SSW / [(R - 1)C]
TOTAL                             TSS    RC - 1
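A hedged Python sketch of the table arithmetic for a balanced design (the function name `anova_table` is hypothetical). Given SSB, SSW, C and R, it fills in the degrees of freedom, the mean squares and the ratio MSB/MSW:

```python
from typing import Dict


def anova_table(ssb: float, ssw: float, c: int, r: int) -> Dict[str, float]:
    """One-way ANOVA table entries for C levels with R replications each."""
    df_between = c - 1
    df_within = (r - 1) * c
    msb = ssb / df_between
    msw = ssw / df_within  # MSW estimates the common error variance sigma^2
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": r * c - 1,
        "TSS": ssb + ssw,
        "MSB": msb,
        "MSW": msw,
        "F": msb / msw,
    }
```

For the battery example, SSB = 69.12 with C = 8 brands and R = 3 replications gives df 7, 16 and 23, and MSB = 69.12/7 ≈ 9.87. (C = 8 is implied by the (7, 16) degrees of freedom used on slide 25.) SSW has to come from slide 17.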

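16 Example: Y = LIFETIME (HOURS) by BRAND, 3 replications per level
$$\text{SSB} = 3\left[(\bar{Y}_1 - \bar{\bar{Y}})^2 + (\bar{Y}_2 - \bar{\bar{Y}})^2 + \dots + (\bar{Y}_C - \bar{\bar{Y}})^2\right] = 3(23.04) = 69.12$$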

17 Squared deviations of each observation from its brand mean, $(Y_{ij} - \bar{Y}_i)^2$: .64, .16, 2.56, 5.76, .64, 0, 2.56, .16, ...
SSW = total of all the $(Y_{ij} - \bar{Y}_i)^2$ terms = ?

18 ANOVA TABLE
Source of variability   SSQ      df                  M.S.
BRAND                   69.12    7                   69.12/7 = 9.87
ERROR                   SSW      2(8) = 16           SSW/16
TOTAL                            (3 × 8) - 1 = 23

19 We can show:
$$E(\text{MSB}) = \sigma^2 + V_{COL}, \qquad V_{COL} = \frac{R}{C-1}\sum_{i=1}^{C}(\mu_i - \mu)^2$$
($V_{COL}$ is a MEASURE OF DIFFERENCES AMONG LEVEL MEANS)
$$E(\text{MSW}) = \sigma^2$$
(assuming the $Y_{ij}$ follow $N(\mu_i, \sigma^2)$ and are independent)
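The expected-mean-square result can be checked empirically. The sketch below is not from the slides, and its function names are hypothetical. It simulates many balanced experiments under the normal model above and averages MSB and MSW. With enough replications, the averages should land near $\sigma^2 + V_{COL}$ and $\sigma^2$:

```python
import random
from typing import Sequence, Tuple


def v_col(mus: Sequence[float], r: int) -> float:
    """V_COL = R/(C-1) * sum_i (mu_i - mu)^2."""
    c = len(mus)
    mu = sum(mus) / c
    return r / (c - 1) * sum((m - mu) ** 2 for m in mus)


def simulate_mean_squares(mus: Sequence[float], r: int, sigma: float,
                          reps: int, seed: int = 0) -> Tuple[float, float]:
    """Average MSB and MSW over `reps` simulated balanced experiments with normal errors."""
    rng = random.Random(seed)
    c = len(mus)
    total_msb = total_msw = 0.0
    for _ in range(reps):
        # R independent N(mu_i, sigma^2) observations at each level
        groups = [[rng.gauss(mu_i, sigma) for _ in range(r)] for mu_i in mus]
        means = [sum(g) / r for g in groups]
        grand = sum(means) / c
        ssb = r * sum((m - grand) ** 2 for m in means)
        ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
        total_msb += ssb / (c - 1)
        total_msw += ssw / ((r - 1) * c)
    return total_msb / reps, total_msw / reps
```

For example, with level means 0, 1, 2, R = 5 and σ = 1, $V_{COL}$ = 5. The simulated MSB average should therefore sit near 6, and the MSW average near 1.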

20 $E(\text{MSB}) = \sigma^2 + V_{COL}$ and $E(\text{MSW}) = \sigma^2$. This suggests that:
if MSB/MSW > 1, there's some evidence of nonzero $V_{COL}$, i.e., that "level of X affects Y";
if MSB/MSW < 1, there's no evidence that $V_{COL} > 0$, i.e., that "level of X affects Y".

21 With $H_0$: Level of X has no impact on Y, and $H_1$: Level of X does have an impact on Y, we need $\text{MSB}/\text{MSW} \gg 1$ to reject $H_0$.

22 More formally,
$H_0: \tau_1 = \tau_2 = \dots = \tau_C = 0$ vs. $H_1$: not all $\tau_i = 0$
OR
$H_0: \mu_1 = \mu_2 = \dots = \mu_C$ (all level means are equal) vs. $H_1$: not all $\mu_i$ are equal

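23 Assuming $H_0$ is true, the distribution of MSB/MSW = "$F_{calc}$" is the F distribution with (C - 1, (R - 1)C) degrees of freedom. Reject $H_0$ if $F_{calc}$ exceeds the critical value c, the table value that cuts off an upper-tail area of $\alpha$.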
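The slides read the critical value c from an F table. As a hedged alternative, the lookup can be reproduced with only the Python standard library. The sketch evaluates the F upper-tail probability through the regularized incomplete beta function, then bisects for the $\alpha$ cut-off. The continued-fraction evaluation follows the style of Numerical Recipes' `betacf`; it is a swapped-in numerical technique, not something the slides use. All function names are hypothetical.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even-numbered coefficient
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd-numbered coefficient
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # continued fraction directly where it converges fast, else the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(x: float, df1: float, df2: float) -> float:
    """P(F > x) for an F(df1, df2) random variable."""
    if x <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * x))


def f_critical(alpha: float, df1: float, df2: float) -> float:
    """Critical value c with P(F > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c_value = f_critical(0.05, 7, 16)  # the battery example's (7, 16) DF
print(round(c_value, 2))
```

For the battery example, `f_critical(0.05, 7, 16)` should come out near the tabled 2.66. `f_upper_tail(3.38, 7, 16)` gives the p-value of the observed $F_{calc}$.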

24 In our problem: the ANOVA TABLE (Source of Variability, SSQ, df, M.S. for BRAND and ERROR) gives $F_{calc}$ = MSB/MSW = 3.38.

25 $\alpha = .05$: c = F table value (table 8) with (7, 16) DF = 2.66

26 Since $F_{calc}$ = 3.38 > 2.66, at $\alpha = .05$ we reject $H_0$ (i.e., conclude that the level of BRAND does have an impact on battery lifetime).

27 MINITAB INPUT: columns life, brand

28 ONE-FACTOR ANOVA (MINITAB)
Analysis of Variance for life
Source   DF   SS   MS   F   P
brand
Error
Total
The Error MS is the estimate of the common variance, $\hat{\sigma}^2$.
MINITAB: STAT >> ANOVA >> ONE-WAY

30 Assumptions
MODEL: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$
1.) The $\varepsilon_{ij}$ are independent random variables (check: run order plot).
2.) Each $\varepsilon_{ij}$ is normally distributed with $E(\varepsilon_{ij}) = 0$ for all i, j (check: normality plot & test).
3.) $\sigma^2(\varepsilon_{ij})$ = constant for all i, j (check: residual plot & test).

31 Diagnosis: Normality
The points on the normal probability plot must more or less follow a straight line to claim "normally distributed". Formal statistical tests can also be used to check it. The ANOVA method we learn here is not very sensitive to the normality assumption; that is, a mild departure from the normal distribution will not change our conclusions much.
Normal probability plot & normality test of residuals
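A rough Python sketch of the normal probability plot behind this check (not from the slides; names hypothetical). It pairs each ordered residual with a normal score computed from Blom's plotting positions $(i - 3/8)/(n + 1/4)$. This is one common choice; Minitab's plot may use a different one. The sketch then reports the correlation of the points, where a value close to 1 means they follow a line. This is a descriptive check only, not a formal normality test with critical values.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def normal_scores(n: int) -> List[float]:
    """Normal scores from Blom's plotting positions (i - 3/8)/(n + 1/4), i = 1..n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_points(residuals: Sequence[float]) -> List[Tuple[float, float]]:
    """(normal score, ordered residual) pairs; roughly a straight line if residuals are normal."""
    ordered = sorted(residuals)
    return list(zip(normal_scores(len(ordered)), ordered))


def plot_correlation(residuals: Sequence[float]) -> float:
    """Pearson correlation of the plot points; values near 1 mean the points follow a line."""
    pts = probability_plot_points(residuals)
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

A single outlying residual among otherwise equal values, for instance, pulls the correlation well below 1.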

32 Minitab: Stat >> Basic Statistics >> Normality Test

33 Diagnosis: Constant Variances
The points on the residual plot must lie more or less within a horizontal band to claim "constant variances". Formal statistical tests can also be used to check it. The ANOVA method we learn here is not very sensitive to the constant-variance assumption, especially when every level has the same number of observations; that is, slightly different variances within groups will not change our conclusions much.
Tests and residual plot: fitted values vs. residuals
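One formal test for equal variances is Levene's test: run the one-way ANOVA F statistic on the absolute deviations $|Y_{ij} - \text{center}_i|$, then compare it with $F(C - 1, N - C)$. The Python sketch below uses a hypothetical function name. It is a swapped-in illustration, not necessarily the procedure behind Minitab's menu. The center can be the group mean (Levene's original) or the group median (Brown-Forsythe variant).

```python
from statistics import mean, median
from typing import List, Sequence


def levene_statistic(groups: Sequence[Sequence[float]], center: str = "median") -> float:
    """Levene-type statistic W: one-way ANOVA F computed on |Y_ij - center_i|.

    center="mean" gives Levene's original version; center="median" gives the
    Brown-Forsythe variant. Compare W with F(C - 1, N - C).
    """
    center_fn = median if center == "median" else mean
    # absolute deviations from each group's center
    z: List[List[float]] = []
    for g in groups:
        m = center_fn(g)
        z.append([abs(y - m) for y in g])
    k = len(z)
    n_total = sum(len(g) for g in z)
    z_means = [sum(g) / len(g) for g in z]
    z_grand = sum(sum(g) for g in z) / n_total
    between = sum(len(g) * (zm - z_grand) ** 2 for g, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for g, zm in zip(z, z_means) for v in g)
    return (n_total - k) / (k - 1) * between / within
```

Groups that are shifted copies of each other have identical absolute deviations, so W is 0. Groups with very different spreads give a large W.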

34 Minitab: Stat >> Anova >> One-way

35 Minitab: Stat >> Anova >> Test for Equal Variances

36 Diagnosis: Randomness/Independence
The run order plot must show no "systematic" patterns to claim "randomness". Formal statistical tests can also be used to check it. The ANOVA method is sensitive to the randomness assumption; that is, even a small amount of dependence between data points can change our conclusions a lot.
Run order plot: order vs. residuals
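As one formal check on the run order plot, a runs test counts runs of same-sign residuals taken in run order. The sketch below uses a hypothetical function name and a normal approximation to the Wald-Wolfowitz runs distribution, and it drops zero residuals. It returns the number of runs, its expected value under randomness and a z score. Far fewer runs than expected (large negative z) suggests trends or clustering. Far more than expected (large positive z) suggests systematic alternation.

```python
import math
from typing import Sequence, Tuple


def runs_test(residuals_in_run_order: Sequence[float]) -> Tuple[int, float, float]:
    """Wald-Wolfowitz runs test on residual signs (zeros dropped), normal approximation."""
    signs = [r > 0 for r in residuals_in_run_order if r != 0]
    n_pos = sum(signs)
    n_neg = len(signs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative residuals")
    # a new run starts every time the sign changes
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n_pos + n_neg
    expected = 2.0 * n_pos * n_neg / n + 1.0
    variance = 2.0 * n_pos * n_neg * (2.0 * n_pos * n_neg - n) / (n * n * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z
```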

37 Minitab: Stat >> Anova >> One-way

38 KRUSKAL-WALLIS TEST (Nonparametric Alternative)
$H_0$: The probability distributions are identical for each level of the factor
$H_1$: Not all the distributions are the same

39 BATTERY LIFETIME (hours) for Brands A, B, C (each column rank-ordered, for simplicity) [data table]
Mean: (here, irrelevant!)

40 $H_0$: No difference in distribution among the three brands with respect to battery lifetime
$H_1$: At least one of the 3 brands differs in distribution from the others with respect to lifetime

41 Ranks in ( ):
Brand A     Brand B     Brand C
32 (29)     32 (29)     28 (24)
30 (26.5)   32 (29)     21 (18)
30 (26.5)   26 (22)     15 (10.5)
29 (25)     26 (22)     15 (10.5)
26 (22)     22 (19)     14 (7)
23 (20)     20 (16.5)   14 (7)
20 (16.5)   19 (14.5)   14 (7)
19 (14.5)   16 (12)     11 (3)
18 (13)     14 (7)      9 (2)
12 (4)      14 (7)      8 (1)
T1 = 197    T2 = 178    T3 = 90
n1 = 10     n2 = 10     n3 = 10

42 TEST STATISTIC:
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{K} \frac{T_j^2}{n_j} - 3(N+1)$$
$n_j$ = # data values in column j
$N = \sum_{j=1}^{K} n_j$
K = # columns (levels)
$T_j$ = SUM OF RANKS OF THE DATA IN COLUMN j, when all DATA are COMBINED
(There is a slight adjustment in the formula as a function of the number of ties in rank.)
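A minimal Python sketch of the statistic (hypothetical function name `kruskal_wallis`; not Minitab's code). Tied values get midranks as in the ranked table above. The slides do not spell out the tie adjustment, so the sketch assumes the standard divisor $1 - \sum(t^3 - t)/(N^3 - N)$, where t is the size of each group of tied values. The brand lists are the lifetimes from slide 41.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kruskal_wallis(groups: Sequence[Sequence[float]]) -> Tuple[List[float], float, float]:
    """Return (rank sums T_j, H, tie-adjusted H) using midranks for tied values."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # midrank for each distinct value: tied values share the average of their ranks
    midrank = {}
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + j) / 2 + 1  # 0-based positions i..j -> ranks i+1..j+1
        i = j + 1
    rank_sums = [sum(midrank[v] for v in g) for g in groups]
    h = (12.0 / (n_total * (n_total + 1))
         * sum(t * t / len(g) for t, g in zip(rank_sums, groups))
         - 3.0 * (n_total + 1))
    # assumed standard tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    h_adj = h / (1.0 - tie_term / (n_total ** 3 - n_total))
    return rank_sums, h, h_adj


# Battery lifetimes (hours) from slide 41
brand_a = [32, 30, 30, 29, 26, 23, 20, 19, 18, 12]
brand_b = [32, 32, 26, 26, 22, 20, 19, 16, 14, 14]
brand_c = [28, 21, 15, 15, 14, 14, 14, 11, 9, 8]
sums, h, h_adj = kruskal_wallis([brand_a, brand_b, brand_c])
print(sums, round(h, 2), round(h_adj, 2))
```

On these data, the rank sums come out as 197, 178 and 90, and H ≈ 8.41. With the tie divisor above, the adjusted value is about 8.47.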

43 $$H = \frac{12}{30(31)}\left[\frac{197^2}{10} + \frac{178^2}{10} + \frac{90^2}{10}\right] - 3(31) = 8.41$$
(with adjustment for ties, we get 8.47)

44 We can show that, under $H_0$, H is well approximated by a $\chi^2$ distribution with df = K - 1. What do we do with H? Here, df = 2, and at $\alpha = .05$ the critical value is $\chi^2_{.05,\,2} = 5.99$. Since H = 8.41 > 5.99, reject $H_0$; conclude that the lifetime distribution is NOT the same for all 3 BRANDS.
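For K = 3 levels, the reference distribution is $\chi^2$ with 2 df, whose upper tail has the closed form $P(\chi^2_2 > x) = e^{-x/2}$. That gives the 5.99 cut-off and an approximate p-value without a table. The shortcut holds only for df = 2; a general df would need the incomplete gamma function. The function names are hypothetical.

```python
import math


def chi2_df2_critical(alpha: float) -> float:
    """Upper-alpha critical value of chi-square with 2 df: solves exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)


def chi2_df2_pvalue(h: float) -> float:
    """P(chi-square_2 > h) = exp(-h/2); exact only for df = 2."""
    return math.exp(-h / 2.0)


print(round(chi2_df2_critical(0.05), 2), round(chi2_df2_pvalue(8.41), 3))
```

`chi2_df2_pvalue(8.41)` is about 0.015, consistent with rejecting $H_0$ at $\alpha = .05$.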

45 Kruskal-Wallis Test: life versus brand
Kruskal-Wallis Test on life
brand   N   Median   Ave Rank   Z
Overall
H = ...   DF = 2   P = ...
H = ...   DF = 2   P = ...   (adjusted for ties)
Minitab: Stat >> Nonparametrics >> Kruskal-Wallis