F73DA2 INTRODUCTORY DATA ANALYSIS: ANALYSIS OF VARIANCE


regression: x is a quantitative explanatory variable

type is a qualitative variable (a factor)

Illustration: data for Company 1, Company 2 and Company 3

The explanatory variable is qualitative, i.e. categorical: a factor. Analysis of variance provides linear models for comparative experiments.

Using factor commands ► The display is different if “type” is declared as a factor.
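A minimal R sketch of declaring the explanatory variable as a factor, using the sample sizes (10, 8 and 9) given later in the slides:

```r
# Declare "type" as a factor so R treats it as categorical
# rather than as a numeric explanatory variable.
type <- factor(c(rep(1, 10), rep(2, 8), rep(3, 9)))

nlevels(type)    # 3 levels, one per company
is.factor(type)  # TRUE
```

With type declared as a factor, lm(company ~ type) fits the one-way comparative model rather than a straight line in a numeric variable.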

► We could check for significant differences between two companies using t tests. ► t.test(company1, company2) ► This calculates a 95% confidence interval for the difference between the means

The interval includes 0, so there is no significant difference
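A sketch of the pairwise comparison; the company1 and company2 vectors below are hypothetical placeholders, not the data shown on the slide:

```r
# Hypothetical placeholder data; the real values are on the slide.
company1 <- c(30, 35, 28, 33, 31, 36, 29, 34, 32, 32)
company2 <- c(27, 30, 25, 29, 28, 31, 26, 29)

tt <- t.test(company1, company2)
tt$conf.int  # 95% confidence interval for the difference between means
```

If the printed interval contains 0, the two company means are not significantly different at the 5% level.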

Instead use an analysis of variance technique

Taking all the results together

We calculate the total variation for the system, which is the total sum of squares of the individual values about the overall mean: SST = Σy² - (Σy)²/n

► We can also work out the sum of squares within each company. These within-company sums of squares add up to the residual sum of squares, SSRES.

► The total sum of squares must be made up of a contribution from variation WITHIN the companies and a contribution from variation BETWEEN the companies. ► This means that the variation between the companies equals the total sum of squares minus the within-company sum of squares.

► This can all be shown in an analysis of variance table which has the format:

Source of variation            Degrees of freedom   Sum of squares   Mean squares      F
Between treatments             k - 1                SSB              SSB/(k - 1)       MSB/MSRES
Residual (within treatments)   n - k                SSRES            SSRES/(n - k)
Total                          n - 1                SST

where MSB = SSB/(k - 1), MSRES = SSRES/(n - k) and F = MSB/MSRES.


► Using the R package, the command is similar to that for linear regression
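A sketch of that command, assuming the response vector company and the factor type are set up as in the earlier slides (the response values below are random placeholders, not the slide's data):

```r
set.seed(1)
# Placeholder responses; the real 27 observations are on the slides.
company <- c(rnorm(10, 32, 6), rnorm(8, 28, 6), rnorm(9, 37, 6))
type    <- factor(c(rep(1, 10), rep(2, 8), rep(3, 9)))

fit <- lm(company ~ type)  # same syntax as linear regression
anova(fit)                 # between-treatments and residual lines
```

Because type is a factor with 3 levels and n = 27, the ANOVA table shows 2 and 24 degrees of freedom, matching the table on the earlier slide.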

Theory. Data: y_ij is the jth observation using treatment i. Model: y_ij = µ + τ_i + ε_ij, where the errors ε_ij are i.i.d. N(0, σ²)

The response variables Y_ij are independent, with Y_ij ~ N(µ + τ_i, σ²). Constraint: Σ τ_i = 0, so that the parameters are identifiable

Derivation of least-squares estimators

The fitted values are the treatment means: ŷ_ij = ȳ_i

Partitioning the observed total variation: SST = SSB + SSRES

The following results hold:

Back to the example

Fitted values: Company 1: 320/10 = 32; Company 2: 225/8 = 28.125; Company 3: 335/9 ≈ 37.22. Residuals: ε̂_1j = y_1j - 32; ε̂_2j = y_2j - 28.125; ε̂_3j = y_3j - 37.22

SST = Σy² - 880²/27
SSB = (320²/10 + 225²/8 + 335²/9) - 880²/27 = 29037.57 - 28681.48 ≈ 356.1
SSRES = SST - SSB
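The between-treatments sum of squares can be reconstructed in R from the slide's company totals (320, 225 and 335, over samples of size 10, 8 and 9):

```r
totals <- c(320, 225, 335)
sizes  <- c(10, 8, 9)

# SSB = sum over companies of (total^2 / n_i), minus (grand total)^2 / n
ssb <- sum(totals^2 / sizes) - sum(totals)^2 / sum(sizes)
round(ssb, 1)  # 356.1
```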

ANOVA table

Source of variation    Degrees of freedom   Sum of squares   Mean squares   F
Between treatments      2                   356.1            178.0          3.83
Residual               24
Total                  26

Testing H0: τ_i = 0, i = 1, 2, 3 v H1: not H0 (i.e. τ_i ≠ 0 for at least one i). Under H0, the statistic has an F distribution on 2, 24 df; the observed value is F = 3.83. P-value = P(F_2,24 > 3.83) = 0.036, so we can reject H0 at levels of testing down to 3.6%.

Conclusion Results differ among the three companies (P-value 3.6%)
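The quoted P-value can be checked directly with R's F distribution function:

```r
# P(F_{2,24} > 3.83): upper-tail probability of the observed F statistic
pval <- 1 - pf(3.83, df1 = 2, df2 = 24)
round(pval, 3)  # 0.036
```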

The fit of the model can be investigated by examining the residuals: the residual for response y_ij is e_ij = y_ij - ȳ_i; this is just the difference between the response and its fitted value (the appropriate sample mean).

Plotting the residuals in various ways may reveal ● a pattern (e.g. lack of randomness, suggesting that an additional, uncontrolled factor is present) ● non-normality (a transformation may help) ● heteroscedasticity (the error variance differs among treatments; for example it may increase with the treatment mean, in which case a transformation, perhaps log, may be required)

► In this example, samples are small, but one might question the validity of the assumptions of normality (Company 2) and homoscedasticity (equality of variances, Company 2 v Companies 1/3).

► plot(residuals(lm(company ~ type)) ~ fitted(lm(company ~ type)), pch = 8)

► abline(h = 0, lty = 2)

► It is also possible to compare with an analysis using “type” as a quantitative explanatory variable ► type = c(rep(1, 10), rep(2, 8), rep(3, 9)) ► No “factor” command, so type is treated as numeric

The fitted equation has the form company = b0 + b1 × type. Note the low R²


Example A school is trying to grade 300 different scholarship applications. As the job is too much work for one grader, 6 are used. The scholarship committee would like to ensure that each grader is using the same grading scale, as otherwise the students aren't being treated equally. One approach to checking if the graders are using the same scale is to randomly assign each grader 50 exams and have them grade.

To illustrate, suppose we have just 27 tests and 3 graders (not 300 and 6, to simplify data entry). Furthermore, suppose the grading scale is the range 1 to 5, with 5 being the best, and the scores are reported for grader 1, grader 2 and grader 3.

The 5% cut-off for the F distribution with 2, 21 df is 3.47. The null hypothesis cannot be rejected: there is no evidence of a difference between graders.
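The critical value comes from the F quantile function:

```r
# 5% upper-tail cut-off for F on 2 and 21 degrees of freedom
crit <- qf(0.95, df1 = 2, df2 = 21)
round(crit, 2)  # 3.47
```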

Another Example: Classes I, II, III, IV, V and VI

Source of variation    df   Sum of squares   Mean squares   F
Between treatments
Residual
Total

Normality and homoscedasticity (equality of variance) assumptions both seem reasonable

We now wish to calculate a 95% confidence interval for the underlying common standard deviation σ, using SSRES/σ² as a pivotal quantity with a χ² distribution
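A sketch of that interval in R; since SSRES/σ² ~ χ² on n - k degrees of freedom, inverting the pivotal quantity gives the limits below. The values of ss_res and df_res are hypothetical placeholders, as the class data are not reproduced in this transcript.

```r
ss_res <- 1000  # hypothetical residual sum of squares
df_res <- 24    # hypothetical residual degrees of freedom, n - k

# P(chisq_{0.025} < SSRES/sigma^2 < chisq_{0.975}) = 0.95, so invert:
ci_sigma2 <- c(ss_res / qchisq(0.975, df_res),
               ss_res / qchisq(0.025, df_res))
ci_sigma  <- sqrt(ci_sigma2)
ci_sigma  # lower and upper 95% limits for sigma
```

Note that the lower limit for σ² uses the upper χ² quantile, and vice versa, because SSRES/σ² is decreasing in σ².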

It can easily be shown that Class III has the largest sample mean and that Class II has the smallest. Consider performing a t test to compare these two classes

There is no contradiction between this and the ANOVA results. It is wrong to pick out the largest and the smallest of a set of treatment means, test for significance, and then draw conclusions about the whole set. Even if H0: "µ_i all equal" is true, the sample means would differ, and the largest and smallest sample means would perhaps differ noticeably.