Statistical Data Analysis - Lecture 16 - 09/04/03

Presentation transcript:

A case study

The following data come from an experiment designed to measure the accuracy of eleven laboratories. Each laboratory was given three samples of each of two different types of chalk and asked to take readings on the bulk density of precipitated chalk. In this experiment:
- The response is bulk density.
- The factors are CHALK and LAB.
- The factor CHALK has two levels, A and B.
- The factor LAB has eleven levels, corresponding to the different laboratories.

Here's the raw data (shown on the slide as a table of 11 laboratories by 3 replicate readings for each chalk type). How do we get it into a form that can be analysed?

Data manipulation

- Often, up to 90% of your time in any analysis is spent getting the data into a format that is convenient for the analysis you wish to do. This data set is no exception.
- There is no single way of doing this. I used Microsoft Excel (good for data manipulation, not so good for statistical analysis) because of its ease of handling columnar data.
- Our ultimate goal is to get the data into R ready for a two-way ANOVA. The format R expects (as do most stats packages) is to have the response in one column and the appropriate factor levels in adjacent columns.

Data manipulation (continued)

- Using Excel, I copied each block of chalk data and pasted the transpose of the data into my worksheet. Transposing turns the rows into columns, so for each chalk type I went from a block of 11 rows and 3 columns to 3 rows and 11 columns. This makes it easier to stack the results from the different laboratories on top of each other.
- After turning each block into a column, I stacked those columns on top of each other, leaving me with one column.
- Now all the data are in one column.
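The same reshaping could be done directly in R rather than Excel. A minimal sketch, assuming the two blocks have been read in as 11-row by 3-column tables named chalkA and chalkB (hypothetical names; reading the raw file in is not shown in these slides):

    # Each block: 11 rows (labs) by 3 columns (replicate readings).
    # Vectorising the transpose gives lab 1's three readings,
    # then lab 2's three readings, and so on, for each chalk type.
    density <- c(t(as.matrix(chalkA)), t(as.matrix(chalkB)))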

Coding the factors

- The first 33 observations are experiments done with chalk A and the last 33 observations are experiments done with chalk B.
- Therefore in R we need to make a vector with 33 A's and 33 B's (to represent the factor levels for CHALK).
- We can do this with
  chalk <- as.factor(rep(c("A", "B"), c(33, 33)))

Coding the factors (continued)

- We know that within each block of 33 experiments there are 3 observations from each lab.
- This means we need a sequence that represents the idea that the first 3 observations were done on chalk A by lab 1, the next 3 observations were done on chalk A by lab 2, and so on.
- We've taken care of the CHALK coding. To code the LAB factor we label each observation with the lab it came from. This means we need a vector of 3 "ones", 3 "twos", and so on, and it needs to be repeated for chalk B.
- Therefore, we use
  lab <- as.factor(rep(rep(1:11, rep(3, 11)), 2))
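Putting the response and the two factor codings together, an analysis-ready data frame could be built along these lines (a sketch; the object names chalk.df and density are assumptions, not from the slides, and the response column is called Density to match the output shown later):

    chalk.df <- data.frame(Density = density, chalk = chalk, lab = lab)
    str(chalk.df)   # 66 rows: one response column plus two factor columns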

Fitting the model

- We fit a standard two-way ANOVA model to the data,
  y_ijk = mu + alpha_i + beta_j + (alpha*beta)_ij + e_ijk,
  where in this case i = A, B, j = 1, ..., 11 and k = 1, ..., 3.
- This is a balanced design because n = N/(IJ) = 66/(2 x 11) = 3.
- What do we expect to see before we do any fitting?
- We know the chalk types are different, so the factor CHALK should be significant.
- We expect the labs to perform about the same, so we hope that the factor LAB is not significant – if it is, this means that the quality of some of the labs is lower.
- We hope there is no difference in the quality of the results on the basis of chalk type – i.e. we hope there is no significant interaction between CHALK and LAB.

The fitted model

- In the interests of numerical stability we multiply the responses by 1,000. This multiplies the group means by 1,000 and the group variances and sums of squares by 1,000,000.
- The results are still easy to understand, but we don't need to worry as much about rounding error.
- We need to remember to undo this change if we wish to say anything in particular about the numerical value of the results.
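In R, the rescaling and the fit might look like the following (a sketch – the exact calls are not shown in the slides, and chalk.df is the hypothetical data frame assembled above):

    chalk.df$Density <- chalk.df$Density * 1000              # rescale for numerical stability
    chalk.fit <- lm(Density ~ chalk * lab, data = chalk.df)  # two-way model with interaction
    anova(chalk.fit)                                         # gives the table on the next slide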

Analysis of Variance Table

Response: Density
              Df  Sum Sq  Mean Sq     F value     Pr(>F)
    chalk      1  503215   503215  63503.1912  < 2.2e-16 ***
    lab       10    5223      522     65.9132  < 2.2e-16 ***
    chalk:lab 10     469       47      5.9247  1.313e-05 ***
    Residuals 44     349        8
    ---

- We can see that we have some problems.
- CHALK is significant, as we predicted, BUT… so are the LAB effects, and the CHALK:LAB interaction is significant as well.
- What does this mean? Maybe a plot will help.

[Slide figure: interaction plot of the lab means for the two chalk types.]
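An interaction plot of this kind can be produced in R with interaction.plot(); a minimal sketch (not necessarily how the slide's figure was drawn):

    # x-axis: lab; one line (trace) per chalk type; y-axis: mean Density
    with(chalk.df, interaction.plot(lab, chalk, Density))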

Interpreting the interaction plot

- The interaction plot is interesting: it seems to offer findings contrary to our ANOVA table.
- Remember, if an interaction is significant, then the lines will generally cross or not be parallel.
- The lines here seem to be mostly parallel.
- In fact the plot is dominated by the difference between the chalks. This fact is key to our interpretation.
- Let's go back to the ANOVA table.

Percentage of variation explained

- When we're modelling data, our aim is to explain the data.
- In statistics, we measure how well we've explained the data by the percentage (or proportion) of variation in the data that the model accounts for.
- If the model explains only a small amount of the variation, then the model does not explain the data well, i.e. it is a poor fit. Conversely, if the model explains a large amount of the variation, then USUALLY the model does explain the data well, i.e. a good fit.
- The reason we don't automatically say the model is a good fit is that the addition of model parameters will always improve the fit.

Percentage of variation explained (continued)

- When we work out the percentage of the total sum of squares (TSS) attributed to each of the model terms, we see that 98.8% comes from the difference between the chalks.
- Because the sums of squares are a measure of total variation, we can treat this as a measure of variation explained.
- It is fairly obvious that there is little increase in the relative quality of the fit with the addition of the lab and interaction terms.
- Furthermore, our interaction plot says that we're unlikely to pick a different lab to do an analysis on the basis of the chalk type we're looking at.
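The percentages can be read straight off the sums-of-squares column of the ANOVA table; a quick sketch using the hypothetical fit from above:

    ss <- anova(chalk.fit)[["Sum Sq"]]   # 503215, 5223, 469, 349
    round(100 * ss / sum(ss), 1)         # chalk alone accounts for roughly 98.8% of the TSS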

Refitting the model

Having convinced ourselves that the additive model will explain the data well enough, we fit the reduced model.

Analysis of Variance Table

Response: Density
              Df  Sum Sq  Mean Sq    F value     Pr(>F)
    chalk      1  503215   503215  33213.399  < 2.2e-16 ***
    lab       10    5223      522     34.474  < 2.2e-16 ***
    Residuals 54     818       15
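The reduced (additive) model above could be fitted with a call like this (a sketch, continuing with the hypothetical object names used earlier):

    add.fit <- lm(Density ~ chalk + lab, data = chalk.df)  # main effects only, no interaction
    anova(add.fit)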

Additive model

- Examining the ANOVA table, we can see that the main effects are still significant even though we haven't accounted for the interaction.
- Remember that the aim of this experiment was not to show that the chalks differ (we already know they do), but to look at differences in accuracy between labs.
- A main effects plot shows us the effect due to each factor: the group means are plotted on separate plots, one for each factor.
- In our example we have one plot for the chalk means and another plot for the lab means.

Main effects plot

[Slide figure: main effects plot – the two chalk means and the eleven lab means, plotted separately for each factor.]
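A related display can be produced in R with plot.design(), which marks the mean response at each level of each factor; a minimal sketch (not necessarily how the slide's figure was made):

    plot.design(Density ~ chalk + lab, data = chalk.df)   # level means of Density for chalk and for lab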

Further considerations in two-way ANOVA (not examinable)

- Linear contrasts and confidence intervals for interaction effects: similar to those for one-way ANOVA.
- Two-way ANOVA with one replicate: we cannot fit a model with an interaction term. Since there is only one replicate, we can in fact drop the subscript k.
- Tukey's test for non-additivity: assumes our interaction is proportional to the product of the two main effects.
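For the last point, Tukey's one-degree-of-freedom test can be carried out by adding the squared fitted values from the additive fit as an extra regressor and testing that term; a rough sketch of that approach (object names are hypothetical, and with only one replicate the data frame would hold a single observation per chalk–lab combination):

    add.fit   <- lm(Density ~ chalk + lab, data = chalk.df)    # additive fit
    tukey.fit <- update(add.fit, . ~ . + I(fitted(add.fit)^2)) # add the non-additivity term
    anova(add.fit, tukey.fit)                                  # 1-df F test for non-additivity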