Factorial ANOVA: More than one categorical explanatory variable (STA305, Spring 2014)



Optional background reading: Chapter 7 of Data Analysis with SAS.

Factorial ANOVA: more than one categorical explanatory variable at a time. Categorical explanatory variables are called factors. If there are observations at all combinations of explanatory variable values, the design is called a complete factorial design (as opposed to a fractional factorial design).

The potato study
Cases are potatoes. Inoculate with bacteria and store for a fixed time period. The response variable is rotten surface area in square millimetres. Two explanatory variables, randomly assigned:
- Bacteria Type (1, 2, 3)
- Temperature (1 = Cool, 2 = Warm)

Two-factor design

                    Bacteria Type
  Temperature       1       2       3
  1 = Cool
  2 = Warm

Crossing the two factors gives six treatment conditions, one per cell.
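A quick way to list the six treatment conditions in R (a sketch; the variable names temp and bact are illustrative, not from the slides):

```r
# The 2 x 3 crossing of Temperature with Bacteria Type
conditions <- expand.grid(temp = c("Cool", "Warm"), bact = 1:3)
conditions          # six rows, one per treatment condition
nrow(conditions)    # 6
```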

Factorial experiments
- Allow more than one factor to be investigated in the same study: efficiency!
- Allow the scientist to see whether the effect of an explanatory variable depends on the value of another explanatory variable: interactions.
Thank you again, Mr. Fisher.

Normal with equal variance and conditional (cell) means

                    Bacteria Type
  Temperature       1        2        3
  1 = Cool         μ11      μ12      μ13
  2 = Warm         μ21      μ22      μ23

Tests
- Main effects: differences among marginal means.
- Interactions: differences between differences. (What is the effect of Factor A? It depends on the level of Factor B.)
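A minimal R sketch of these tests for a two-factor layout like the potato study; the data frame potato below is hypothetical (placeholder response values), and only its structure matters.

```r
# Hypothetical potato-style data: 4 potatoes per treatment condition (24 cases)
potato <- expand.grid(temp = factor(c("Cool", "Warm")), bact = factor(1:3), rep = 1:4)
potato$rot <- rnorm(nrow(potato), mean = 10, sd = 2)   # placeholder response (rotten area)

fit <- lm(rot ~ bact * temp, data = potato)   # both main effects plus the interaction
anova(fit)   # F tests for bact, temp and bact:temp (sequential SS; unambiguous here because the design is balanced)
```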

To understand the interaction, plot the means.
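One way to draw such a plot in R, assuming the hypothetical potato data frame from the sketch above, is interaction.plot:

```r
# Mean plot: one profile (line) per temperature, bacteria type on the horizontal axis
with(potato, interaction.plot(x.factor = bact, trace.factor = temp, response = rot,
                              type = "b", xlab = "Bacteria Type", ylab = "Mean rot",
                              trace.label = "Temperature"))
```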

Either way: the mean plot can be drawn with either factor on the horizontal axis.

Mean plots illustrate the possibilities:
- Non-parallel profiles = interaction
- Main effects for both variables, no interaction
- Main effect for Bacteria only
- Main effect for Temperature only
- Both main effects, and the interaction

Should you interpret the main effects?

Testing contrasts: differences between marginal means are definitely contrasts, and interactions are also sets of contrasts.

Interactions are sets of contrasts.
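For example, in the 2 × 3 potato layout the hypothesis of no interaction can be written as a set of contrasts in the cell means:

$$
H_0:\ (\mu_{11}-\mu_{21}) \;=\; (\mu_{12}-\mu_{22}) \;=\; (\mu_{13}-\mu_{23}),
$$

or equivalently the two contrasts

$$
(\mu_{11}-\mu_{21}) - (\mu_{12}-\mu_{22}) = 0
\qquad\text{and}\qquad
(\mu_{12}-\mu_{22}) - (\mu_{13}-\mu_{23}) = 0,
$$

where μij is the cell mean for temperature i and bacteria type j.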

Equivalent statements
- The effect of A depends upon B.
- The effect of B depends on A.

Three factors: A, B and C
- There are three (sets of) main effects: one each for A, B, C.
- There are three two-factor interactions:
  - A by B (averaging over C)
  - A by C (averaging over B)
  - B by C (averaging over A)
- There is one three-factor interaction: A x B x C.
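In R model-formula notation this is a one-liner; a sketch with hypothetical factors a, b, c and a placeholder response y:

```r
# Hypothetical balanced data set: three two-level factors and a placeholder response
dat <- expand.grid(a = factor(1:2), b = factor(1:2), c = factor(1:2), rep = 1:3)
dat$y <- rnorm(nrow(dat))

# a * b * c expands to a + b + c + a:b + a:c + b:c + a:b:c:
# three main effects, three two-factor interactions, one three-factor interaction.
fit3 <- lm(y ~ a * b * c, data = dat)
anova(fit3)   # one F test per term
```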

Meaning of the 3-factor interaction
- The form of the A x B interaction depends on the value of C.
- The form of the A x C interaction depends on the value of B.
- The form of the B x C interaction depends on the value of A.
These statements are equivalent. Use the one that is easiest to understand.

To graph a three-factor interaction, make a separate mean plot (showing a 2-factor interaction) for each value of the third variable. In the potato study, that would mean a separate graph for each type of potato (treating potato type as a third factor; see the sketch below).
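A minimal R sketch under that assumption, using a hypothetical data frame dat3 with factors bact, temp and variety and a placeholder response rot:

```r
# Hypothetical three-factor potato-style data
dat3 <- expand.grid(bact = factor(1:3), temp = factor(c("Cool", "Warm")),
                    variety = factor(c("A", "B")), rep = 1:4)
dat3$rot <- rnorm(nrow(dat3), mean = 10, sd = 2)   # placeholder response

op <- par(mfrow = c(1, 2))                         # one panel per potato variety
for (v in levels(dat3$variety)) {
  with(subset(dat3, variety == v),
       interaction.plot(bact, temp, rot, type = "b",
                        xlab = "Bacteria Type", ylab = "Mean rot",
                        trace.label = "Temp", main = paste("Variety", v)))
}
par(op)
```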

Four-factor design
- Four sets of main effects
- Six two-factor interactions
- Four three-factor interactions
- One four-factor interaction: the nature of the three-factor interaction depends on the value of the 4th factor
There is an F test for each one. And so on ...

As the number of factors increases
- The higher-way interactions get harder and harder to understand.
- All the tests are still tests of sets of contrasts (differences between differences of differences ...).
- But it gets harder and harder to write down the contrasts.
- Effect coding becomes easier.

Effect coding

  Bacteria    B1    B2
  1            1     0
  2            0     1
  3           -1    -1

  Temperature     T
  1 = Cool        1
  2 = Warm       -1
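In R this coding is produced by contr.sum; a sketch using the hypothetical potato data frame defined earlier (the last level of each factor is the one coded -1):

```r
# Effect (sum-to-zero) coding for both factors
contrasts(potato$bact) <- contr.sum(3)   # B1, B2: bacteria type 3 coded (-1, -1)
contrasts(potato$temp) <- contr.sum(2)   # T: Cool = 1, Warm = -1
fit_eff <- lm(rot ~ bact * temp, data = potato)
coef(fit_eff)   # intercept = grand mean of the cell means; other coefficients are deviations
```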

Interaction effects are products of dummy variables
- The A x B interaction: multiply each dummy variable for A by each dummy variable for B. Use these products as additional explanatory variables in the multiple regression.
- The A x B x C interaction: multiply each dummy variable for C by each product term from the A x B interaction.
- Test the sets of product terms simultaneously.
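Testing the set of product terms simultaneously amounts to comparing full and reduced models; a sketch, again assuming the hypothetical effect-coded potato data frame:

```r
full    <- lm(rot ~ bact * temp, data = potato)   # includes the two product terms
reduced <- lm(rot ~ bact + temp, data = potato)   # main effects only
anova(reduced, full)                              # one F test for both product terms together

head(model.matrix(full))   # the product columns appear explicitly in the design matrix
```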

Make a table

  Bact   Temp    B1    B2     T    B1T   B2T
  1      Cool     1     0     1     1     0
  1      Warm     1     0    -1    -1     0
  2      Cool     0     1     1     0     1
  2      Warm     0     1    -1     0    -1
  3      Cool    -1    -1     1    -1    -1
  3      Warm    -1    -1    -1     1     1

Cell and marginal means

                    Bacteria Type
  Temperature       1        2        3       Marginal
  1 = Cool         μ11      μ12      μ13      μ1.
  2 = Warm         μ21      μ22      μ23      μ2.
  Marginal         μ.1      μ.2      μ.3      μ..

We see
- The intercept is the grand mean.
- Regression coefficients for the dummy variables are deviations of the marginal means from the grand mean.
- What about the interactions?

A bit of algebra shows:
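A sketch of that algebra for the effect coding above: the cell mean for temperature i and bacteria type j is

$$
\mu_{ij} \;=\; \beta_0 + \beta_1 B_1 + \beta_2 B_2 + \beta_3 T + \beta_4 B_1 T + \beta_5 B_2 T ,
$$

where B1, B2 and T take the coded values for that cell. Each coded column (and each product column) averages to zero over the six cells, so averaging gives

$$
\beta_0 \;=\; \frac{1}{6}\sum_{i=1}^{2}\sum_{j=1}^{3}\mu_{ij} \;=\; \mu_{..},
$$

the grand mean. Averaging over rows or columns in the same way,

$$
\beta_1 = \mu_{.1}-\mu_{..}, \qquad
\beta_2 = \mu_{.2}-\mu_{..}, \qquad
\beta_3 = \mu_{1.}-\mu_{..},
$$

and β4 = β5 = 0 exactly when the Cool-minus-Warm difference is the same for all three bacteria types, that is, when there is no interaction.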

Factorial ANOVA with effect coding is pretty automatic
- You don’t have to make a table unless asked.
- It always works as you expect it will.
- Hypothesis tests are the same as testing sets of contrasts.

Again
- The intercept is the grand mean.
- Regression coefficients for the dummy variables are deviations of the marginal means from the grand mean.
- The test of a main effect is the test of the dummy variables for that factor.
- Interaction effects are products of dummy variables.

Balanced vs. unbalanced experimental designs
Balanced design: cell sample sizes are proportional (maybe equal).
- Explanatory variables have zero relationship to one another.
- Numerator SS in ANOVA are independent (because the contrasts are orthogonal).
- Everything is nice and simple.
- Most experimental studies are designed this way.
- As soon as somebody drops a test tube, it’s no longer balanced.

Analysis of unbalanced data
- When explanatory variables are related, there is potential ambiguity: A is related to Y, B is related to Y, and A is related to B. Who gets credit for the portion of variation in Y that could be explained by either A or B?
- With a regression approach, whether you use contrasts or dummy variables (equivalent), the answer is nobody.
- Think of full and reduced models; equivalently, the general linear test.
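A sketch of the full-versus-reduced approach in R, assuming the hypothetical potato data frame from earlier with a few cases dropped to make it unbalanced:

```r
potato_unb <- potato[-(1:3), ]   # drop a few cases: the design is no longer balanced

full    <- lm(rot ~ bact * temp, data = potato_unb)
mains   <- lm(rot ~ bact + temp, data = potato_unb)
no_bact <- lm(rot ~ temp,        data = potato_unb)
no_temp <- lm(rot ~ bact,        data = potato_unb)

anova(mains, full)      # test of the interaction
anova(no_bact, mains)   # test of Bacteria Type, adjusting for Temperature
anova(no_temp, mains)   # test of Temperature, adjusting for Bacteria Type

# By contrast, anova(full) alone gives sequential (Type I) sums of squares,
# which depend on the order of the terms when the design is unbalanced.
```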

Some software is designed for balanced data
- The special-purpose formulas are much simpler; very useful in the past.
- Since most data are at least a little unbalanced: a recipe for trouble.
- Most textbook data are balanced, so they cannot tell you what your software is really doing.
- R’s anova and aov functions are designed for balanced data, though anova applied to lm objects can give you what you want if you use it with care.
- SAS proc glm is much more convenient. SAS proc anova is for balanced data.

Copyright information
This slide show was prepared by Jerry Brunner, Department of Statistics, University of Toronto. It is licensed under a Creative Commons Attribution - ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These PowerPoint slides will be available from the course website: