RESEARCH STATISTICS. Jobayer Hossain and Larry Holmes, Jr., November 20, 2008. Analysis of Variance.

Experimental Design Terminology
An Experimental Unit is the entity on which a measurement or observation is made. For example, subjects are the experimental units in most clinical studies. Homogeneous Experimental Units are units that are as uniform as possible on all characteristics that could affect the response. A Block is a group of homogeneous experimental units. For example, if an investigator had reason to believe that age might be a significant factor in the effect of a given medication, he might choose to first divide the experimental subjects into age groups, such as under 30 years old, 30 to 60 years old, and over 60 years old.

Experimental Design Terminology
A Factor is a controllable independent variable that is investigated to determine its effect on a response; e.g., treatment group is a factor. Factors can be fixed or random:
– Fixed: the factor takes on a discrete number of values, and these are the only values of interest.
– Random: the factor can take on a wide range of values, and one wants to generalize from the specific values observed to all possible values.
Each specific value of a factor is called a level.

Experimental Design Terminology
A Covariate is an independent variable that is not manipulated by the experimenter but still affects the response. E.g., in many clinical experiments, demographic variables such as race, gender, and age may influence the response variable significantly even though they are not the variables of interest in the study. Such variables are termed covariates. An Effect is the change in the average response between two factor levels. That is, factor effect = average response at one level − average response at a second level.
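The factor-effect definition is plain arithmetic on two level means. A minimal Python sketch, assuming made-up responses at two levels of a single factor (the numbers and the function name `factor_effect` are illustrative only):

```python
from statistics import mean

# Hypothetical responses at two levels of one factor (illustrative numbers only)
level_1 = [12.0, 14.0, 13.0, 15.0]
level_2 = [10.0, 11.0, 9.0, 10.0]

def factor_effect(a, b):
    """Average response at one level minus average response at a second level."""
    return mean(a) - mean(b)

effect = factor_effect(level_1, level_2)
print(effect)
```

Here the level means are 13.5 and 10.0, so the factor effect is 3.5.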

Experimental Design Terminology
Interaction is a joint factor effect in which the effect of one factor depends on the levels of the other factors.
[Figure: two panels, "No interaction effect of factor A and B" and "Interaction effect of factor A and B"]
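One way to see interaction numerically is to compute the effect of A separately at each level of B from a table of cell means; the difference between those two effects (an interaction contrast) is zero when there is no interaction. A sketch for a 2 × 2 design, with hypothetical cell means chosen to illustrate the two cases and hypothetical function names:

```python
# Hypothetical cell means for a 2 x 2 design: cell_means[a][b] is the
# mean response at level a of factor A and level b of factor B.
no_interaction = [[10.0, 14.0],
                  [12.0, 16.0]]
interaction = [[10.0, 14.0],
               [12.0, 20.0]]

def effect_of_a_at_each_b(cell_means):
    """Effect of A (level 2 minus level 1), computed separately at each level of B."""
    return [cell_means[1][b] - cell_means[0][b] for b in range(len(cell_means[0]))]

def interaction_contrast(cell_means):
    """Difference between the A effects at the two levels of B; 0 means no interaction."""
    at_b = effect_of_a_at_each_b(cell_means)
    return at_b[1] - at_b[0]

print(effect_of_a_at_each_b(no_interaction), interaction_contrast(no_interaction))
print(effect_of_a_at_each_b(interaction), interaction_contrast(interaction))
```

In the first table A raises the response by 2 at both levels of B (parallel lines); in the second the A effect is 2 at one level of B and 6 at the other, so the effect of A depends on B.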

Experimental Design Terminology
Randomization is the process of assigning experimental units randomly to different experimental groups. It is the most reliable method of creating homogeneous treatment groups without introducing potential biases or judgments.
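Random assignment can be sketched with Python's standard `random` module: shuffle the units, then deal them into groups. The function name `randomize`, the round-robin dealing, and the seed value are assumptions made for this illustration, not a procedure taken from the slides:

```python
import random

def randomize(units, n_groups, seed=None):
    """Randomly assign experimental units to n_groups groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin so group sizes differ by at most one
    return [shuffled[g::n_groups] for g in range(n_groups)]

subjects = [f"S{i}" for i in range(1, 17)]
groups = randomize(subjects, 4, seed=2008)
for name, members in zip(["G1", "G2", "G3", "G4"], groups):
    print(name, members)
```

Fixing the seed makes the allocation reproducible; omitting it gives a fresh allocation on each run.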

Experimental Design Terminology
A Replication is the repetition of an entire experiment or a portion of an experiment under two or more sets of conditions.
– Although randomization helps to ensure that treatment groups are as similar as possible, the results of a single experiment applied to a small number of experimental units will not convince anyone of the effectiveness of the treatment.
– To establish the significance of an experimental result, replication (repeating the experiment on a large group of subjects) is required.
– If a treatment is truly effective, the average effect over the replicated experimental units will reflect it.
– If it is not effective, the few experimental units that may have appeared to react to the treatment will be outweighed by the large number of units that were unaffected by it.
– Replication reduces the variability of the estimated effects and increases the confidence with which a researcher can draw conclusions about an experimental factor.
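One concrete reason replication helps: the standard error of a mean of n independent replicates is sd/√n, so quadrupling the number of replicates halves it. A small sketch, assuming an illustrative standard deviation of 10:

```python
import math

def standard_error(sd, n):
    """Standard error of a mean of n independent replicates with standard deviation sd."""
    return sd / math.sqrt(n)

sd = 10.0  # assumed within-group standard deviation (illustrative)
for n in (2, 8, 32, 128):
    print(n, round(standard_error(sd, n), 3))
```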

Experimental Design Terminology
A Design (layout) of the experiment includes the choice of factors and factor levels, the number of replications, blocking, randomization, and the assignment of factor-level combinations to experimental units.
Sum of Squares (SS): Let $x_1, \dots, x_n$ be n observations. The sum of squares of these n observations is $x_1^2 + x_2^2 + \dots + x_n^2$, or in summation notation $\sum x_i^2$. In corrected form (about the mean $\bar{x}$), this sum of squares is $\sum (x_i - \bar{x})^2$.
Degrees of freedom (df): the number of quantities of the form being summed minus the number of restrictions on them. For example, the corrected SS above involves n quantities of the form $(x_i - \bar{x})$, subject to one constraint, $\sum (x_i - \bar{x}) = 0$, so the df for this SS is $n - 1$.
Mean Sum of Squares (MSS): the SS divided by its df.
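The definitions above translate directly into code. A minimal sketch using illustrative data (the function name `sums_of_squares` is hypothetical); it also confirms that the deviations from the mean sum to zero, which is the one restriction that costs a degree of freedom:

```python
from statistics import mean

def sums_of_squares(x):
    """Return the raw SS, the corrected SS, its df, and the mean sum of squares."""
    n = len(x)
    xbar = mean(x)
    raw_ss = sum(xi ** 2 for xi in x)                  # x_1^2 + ... + x_n^2
    corrected_ss = sum((xi - xbar) ** 2 for xi in x)   # sum of (x_i - xbar)^2
    df = n - 1                                         # n deviations, one constraint
    return raw_ss, corrected_ss, df, corrected_ss / df

x = [4.0, 7.0, 6.0, 3.0]
raw, corr, df, mss = sums_of_squares(x)
print(raw, corr, df, mss)
# The deviations always sum to zero: the single restriction on them
print(sum(xi - mean(x) for xi in x))
```

For these data the mean is 5, the raw SS is 110, the corrected SS is 10 on 3 df, and the MSS is 10/3.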

Experimental Design Terminology
The analysis of variance (ANOVA) is a technique for decomposing the total variability of a response variable into:
– variability due to the experimental factor(s), and
– variability due to error (i.e., factors that are not accounted for in the experimental design).
The basic purpose of ANOVA is to test the equality of several means. A fixed effect model includes only fixed factors. A random effect model includes only random factors. A mixed effect model includes both fixed and random factors.

The basic ANOVA situation
Two types of variables: a quantitative response and categorical predictors (factors).
Main question: Does the mean of the quantitative variable depend on which group (given by the categorical variable) the individual is in?
– If there is only one categorical variable with only 2 levels (groups), use the two-sample t-test.
– ANOVA allows for 3 or more groups.
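For the two-group case the two approaches agree: the square of the pooled (equal-variance) two-sample t statistic equals the one-way ANOVA F statistic. A sketch with illustrative data and hypothetical helper names:

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic using the pooled variance (equal-variance t-test)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def one_way_f(groups):
    """One-way ANOVA F statistic, F = MSG / MSE."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand = mean(all_obs)
    ssg = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ssg / (k - 1)) / (sse / (n - k))

a = [5.1, 4.8, 6.0, 5.5]
b = [6.2, 6.9, 5.8, 7.1]
t = pooled_t(a, b)
print(t ** 2, one_way_f([a, b]))
```

The two printed values agree up to floating-point rounding, so with 2 groups the ANOVA F test and the pooled t-test give the same conclusion.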

One-way analysis of variance
One factor with k levels or groups, e.g., 3 treatment groups in a drug study. The main objective is to examine the equality of the group means. The total variation of the observations (SST) can be split into two components: variation between groups (SSG) and variation within groups (SSE). Variation between groups is due to differences among the groups, e.g., different treatment groups or different doses of the same treatment. Variation within groups is the inherent variation among the observations within each group. The completely randomized design (CRD) is an example of one-way analysis of variance.
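The split SST = SSG + SSE can be checked numerically. A minimal sketch with three small illustrative groups (the function name is hypothetical):

```python
from statistics import mean

def one_way_decomposition(groups):
    """Split the total SS into between-group (SSG) and within-group (SSE) parts."""
    all_obs = [y for g in groups for y in g]
    grand = mean(all_obs)
    sst = sum((y - grand) ** 2 for y in all_obs)                 # total
    ssg = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)   # between groups
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)     # within groups
    return sst, ssg, sse

groups = [[10, 12, 11], [14, 15, 16], [6, 8, 7]]
sst, ssg, sse = one_way_decomposition(groups)
print(sst, ssg, sse)
```

Here the group means are 11, 15 and 7 around a grand mean of 11, and SST = 102 splits into SSG = 96 and SSE = 6.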

One-way analysis of variance
Consider a layout of a study with 16 subjects intended to compare 4 treatment groups (G1–G4). Each group contains four subjects.

Group | S1 | S2 | S3 | S4
G1 | Y11 | Y12 | Y13 | Y14
G2 | Y21 | Y22 | Y23 | Y24
G3 | Y31 | Y32 | Y33 | Y34
G4 | Y41 | Y42 | Y43 | Y44
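For this layout, the one-way model is commonly written as below. This is a sketch in standard notation, not necessarily the exact form used on the original slide; μ (overall mean), α_i (effect of group i) and σ² are symbols assumed here, with α chosen to match the (αβ)_ij notation of the two-factor hypotheses:

```latex
y_{ij} = \mu + \alpha_i + e_{ij}, \qquad i = 1, \dots, 4 \ (\text{groups}), \quad j = 1, \dots, 4 \ (\text{subjects}),
\qquad e_{ij} \sim N(0, \sigma^2) \ \text{independently}
```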

One-way Analysis: Informal Investigation
– Graphical investigation: side-by-side box plots, multiple histograms.
– Whether the differences between the groups are significant depends on:
  – the difference in the means,
  – the standard deviations of each group,
  – the sample sizes.
– ANOVA determines the P-value from the F statistic.
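The three quantities listed above can be tabulated per group before any formal test. A short sketch with illustrative data (the function name `group_summary` is hypothetical):

```python
from statistics import mean, stdev

def group_summary(groups):
    """Return (n, mean, standard deviation) for each named group."""
    return {name: (len(ys), mean(ys), stdev(ys)) for name, ys in groups.items()}

data = {"G1": [10.0, 12.0, 11.0], "G2": [14.0, 15.0, 16.0], "G3": [6.0, 8.0, 7.0]}
for name, (n, m, s) in group_summary(data).items():
    print(f"{name}: n={n}, mean={m:.2f}, sd={s:.2f}")
```

Group means that are far apart relative to the within-group standard deviations, with reasonable sample sizes, are what a large F statistic reflects.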

One-way Analysis: Side by Side Boxplots

One-way analysis of variance
Model:
Assumptions:
– Observations $y_{ij}$ are independent.
– The errors $e_{ij}$ are normally distributed with mean zero and constant standard deviation.
– The second assumption implies that the response variable for each group is normal (check using a normal quantile plot, a histogram, or a test for normality) and that the standard deviations of all groups are equal (rule of thumb: the ratio of the largest to the smallest standard deviation should be no more than about 2:1).
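The rule-of-thumb check on standard deviations is easy to automate. A sketch, assuming illustrative data and a threshold of 2 (the `limit` parameter and both function names are choices made here, not a fixed standard):

```python
from statistics import stdev

def sd_ratio(groups):
    """Ratio of the largest to the smallest group standard deviation."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)

def equal_sd_rule_of_thumb(groups, limit=2.0):
    """Informal check of the equal-SD assumption: ratio no larger than about `limit`."""
    return sd_ratio(groups) <= limit

groups = [[10.0, 12.0, 11.0], [14.0, 17.0, 20.0], [6.0, 8.0, 7.0]]
print(sd_ratio(groups), equal_sd_rule_of_thumb(groups))
```

In this example the largest SD is 3 and the smallest is 1, so the ratio of 3 exceeds the rule of thumb.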

One-way analysis of variance
Hypotheses:
H0: The means of all groups are equal.
Ha: At least one group mean differs from the others.
– Ha doesn't say how or which ones differ.
– We can follow up with "multiple comparisons".
Analysis of variance (ANOVA) table for one-way classified data:

Source of Variation | Sum of Squares | df | Mean Sum of Squares | F-Ratio
Group | SSG | k − 1 | MSG = SSG/(k − 1) | F = MSG/MSE
Error | SSE | n − k | MSE = SSE/(n − k) |
Total | SST | n − 1 | |

A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.
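The ANOVA table can be filled in directly from the group data. A minimal Python sketch (illustrative data, hypothetical function name) that returns each row of the table and the F-ratio:

```python
from statistics import mean

def anova_table(groups):
    """One-way ANOVA table rows as (SS, df, MS), plus the F-ratio MSG / MSE."""
    all_obs = [y for g in groups for y in g]
    n, k = len(all_obs), len(groups)
    grand = mean(all_obs)
    ssg = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    msg, mse = ssg / (k - 1), sse / (n - k)
    return {
        "Group": (ssg, k - 1, msg),
        "Error": (sse, n - k, mse),
        "Total": (ssg + sse, n - 1, None),   # no mean square for the Total row
        "F": msg / mse,
    }

table = anova_table([[10, 12, 11], [14, 15, 16], [6, 8, 7]])
for source in ("Group", "Error", "Total"):
    print(source, table[source])
print("F =", table["F"])
```

For these data SSG = 96 on 2 df and SSE = 6 on 6 df, so F = 48/1 = 48. The P-value is the upper-tail probability of an F distribution with (k − 1, n − k) df; the sketch stops at F because the Python standard library has no F distribution, but a statistics package (for example SciPy's `scipy.stats.f.sf`) can compute it.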

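Pooled estimate for variance
The pooled variance of all groups can be estimated as the weighted average of the group variances $s_i^2$, with weights $n_i - 1$:
$s_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{\sum_{i=1}^{k} (n_i - 1)} = \frac{SSE}{n - k} = MSE$,
so MSE is the pooled estimate of the variance.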
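Numerically, the weighted average of the group variances and SSE/(n − k) coincide. A sketch with illustrative data and hypothetical function names:

```python
from statistics import mean, variance

def pooled_variance(groups):
    """Weighted average of the group variances, with weights n_i - 1."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)   # equals n - k
    return numerator / denominator

def mse(groups):
    """SSE / (n - k), the error mean square from the ANOVA table."""
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    n, k = sum(len(g) for g in groups), len(groups)
    return sse / (n - k)

groups = [[10.0, 12.0, 11.0], [14.0, 17.0, 20.0], [6.0, 8.0, 7.0]]
print(pooled_variance(groups), mse(groups))
```

The group variances are 1, 9 and 1, each with weight 2, so both functions give 22/6 ≈ 3.67.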

Multiple comparisons
If the F test in the ANOVA table is significant, we then want to find which pairs of groups are significantly different. Commonly used procedures include:
– Fisher's Least Significant Difference (LSD)
– Tukey's HSD method
– Bonferroni's method
– Scheffé's method
– Dunn's multiple-comparison procedure
– Dunnett's procedure
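Of the procedures listed, Bonferroni's is the simplest to sketch: with m pairwise comparisons, each is tested at level α/m. The p-values below are hypothetical placeholders, not results from the study, and the function name is illustrative:

```python
from itertools import combinations

def bonferroni_significant(p_values, alpha=0.05):
    """Return the pairs whose unadjusted p-value is below alpha / (number of pairs)."""
    threshold = alpha / len(p_values)
    return [pair for pair, p in p_values.items() if p < threshold]

groups = ["G1", "G2", "G3", "G4"]
pairs = list(combinations(groups, 2))   # 4 groups -> 6 pairwise comparisons
# Hypothetical unadjusted p-values, one per pair (placeholders for illustration)
hypothetical_p = dict(zip(pairs, [0.001, 0.020, 0.300, 0.004, 0.650, 0.015]))
print(len(pairs), 0.05 / len(pairs))
print(bonferroni_significant(hypothetical_p))
```

With 6 comparisons each pair is tested at 0.05/6 ≈ 0.0083, so only the pairs with p = 0.001 and p = 0.004 are declared different.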

One-way ANOVA - Demo
MS Excel:
– Put the response data (hgt) for each group (grp) in side-by-side columns (see the next slide).
– Select Tools/Data Analysis and select Anova: Single Factor from the Analysis Tools list. Click OK.
– Select the Input Range (A1:C21 for our example), choose Grouped By: Columns, and check Labels in First Row.
– Select an Output Range and then click OK.

One-way ANOVA MS-Excel Data Layout

One-way ANOVA MS-Excel output: height on treatment groups

One-way ANOVA - Demo
SPSS:
– Select Analyze > Compare Means > One-Way ANOVA.
– Select the variables: Dependent List: response (hgt); Factor: group (grp).
– Click Post Hoc and select the multiple comparisons (LSD, Tukey, Bonferroni, or Scheffe); click Options and select Homogeneity of variance test; click Continue and then OK.

One-way ANOVA SPSS output: height on treatment groups

Analysis of variance of factorial experiment (two or more factors)
Factorial experiment:
– The effects of two or more factors, including their interactions, are investigated simultaneously.
– For example, consider two factors A and B. The total variation of the response is then split into variation due to A, variation due to B, variation due to their interaction AB, and variation due to error.

Analysis of variance of factorial experiment (two or more factors)
Model with two factors (A, B) and their interaction:
Assumptions: the same as in one-way ANOVA.
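A standard way to write the two-factor model with interaction, consistent with the (αβ)_ij notation in the hypotheses that follow (a sketch; a levels of A, b levels of B, r replicates per cell, and σ² are symbols assumed here):

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk},
\qquad i = 1, \dots, a; \quad j = 1, \dots, b; \quad k = 1, \dots, r,
\qquad e_{ijk} \sim N(0, \sigma^2) \ \text{independently}
```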

Analysis of variance of factorial experiment (two or more factors)
Null hypotheses:
– H0a: The means of all groups (levels) of factor A are equal.
– H0b: The means of all groups (levels) of factor B are equal.
– H0ab: $(\alpha\beta)_{ij} = 0$ for all i, j, i.e., there is no interaction between A and B (the effect of A does not depend on the level of B).

Analysis of variance of factorial experiment (Two or more factors) ANOVA for two factors A and B with their interaction AB.
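The sums of squares in the two-factor table can be computed directly for a balanced design (the same number of replicates r in every cell, as Excel's two-factor tool also requires). A sketch with illustrative data and a hypothetical function name:

```python
from statistics import mean

def two_way_anova(cells):
    """Balanced two-factor ANOVA with replication (sketch).

    cells[i][j] holds the r replicate responses at level i of A and level j of B;
    every cell is assumed to contain the same number r of replicates.
    Returns dictionaries of SS and df (A, B, AB, Error) and F (A, B, AB).
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([y for row in cells for cell in row for y in cell])
    cell_mean = [[mean(cell) for cell in row] for row in cells]
    a_mean = [mean([y for cell in row for y in cell]) for row in cells]
    b_mean = [mean([y for row in cells for y in row[j]]) for j in range(b)]

    ss = {
        "A": b * r * sum((m - grand) ** 2 for m in a_mean),
        "B": a * r * sum((m - grand) ** 2 for m in b_mean),
        "AB": r * sum((cell_mean[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "Error": sum((y - cell_mean[i][j]) ** 2
                     for i in range(a) for j in range(b) for y in cells[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
    mse = ss["Error"] / df["Error"]
    f = {src: (ss[src] / df[src]) / mse for src in ("A", "B", "AB")}
    return ss, df, f

# Illustrative data: 2 levels of A (rows) x 2 levels of B (columns), r = 2 replicates
cells = [[[10, 12], [14, 16]],
         [[11, 13], [20, 22]]]
ss, df, f = two_way_anova(cells)
print(ss)
print(df)
print(f)
```

For the illustrative data the four sums of squares (24.5, 84.5, 12.5 and 8) add up to the total SS of 129.5, and with MSE = 2 the F-ratios are 12.25 (A), 42.25 (B) and 6.25 (AB).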

Two-factor with replication - Demo
MS Excel:
– Arrange the response data for the two factors in a layout like the one on the next slide.
– Select Tools/Data Analysis and select Anova: Two-Factor With Replication from the Analysis Tools list. Click OK.
– Select the Input Range and enter Rows per sample: the number of replications (Excel requires the same number of replications for every factor-level combination). The number of replications is 2 for the data on the next slide.
– Select an Output Range and then click OK.

Two-factor ANOVA MS-Excel Data Layout

Two-factor ANOVA MS-Excel output: height on treatment group, shades, and their interaction

Two-factor ANOVA - Demo
SPSS:
– Select Analyze > General Linear Model > Univariate.
– Select the variables, e.g., Dependent Variable: response (hgt), and Fixed Factor(s): grp and shades.
– Click Post Hoc and select the multiple comparisons (LSD, Tukey, Bonferroni, or Scheffe); click Options and select Homogeneity of variance test; click Continue and then OK.

Two-factor ANOVA SPSS output: height on treatment group, shades, and their interaction

Repeated Measures
The term repeated measures refers to data sets with multiple measurements of a response variable on the same experimental unit or subject. In repeated measurements designs, we are often concerned with two types of variability:
– Between-subjects: variability associated with different groups of subjects who are treated differently (equivalent to between-groups effects in one-way ANOVA).
– Within-subjects: variability associated with measurements made on an individual subject.

Repeated Measures
Examples of repeated measures designs:
A. Two groups of subjects are treated with different drugs, and responses are measured at six-hour increments for 24 hours. Here DRUG treatment is the between-subjects factor and TIME is the within-subjects factor.
B. Students in three different statistics classes (taught by different instructors) are given a test with five problems, and the scores on each problem are recorded separately. Here CLASS is a between-subjects factor and PROBLEM is a within-subjects factor.
C. Volunteer subjects from a linguistics class each listen to 12 sentences produced by 3 text-to-speech synthesis systems and rate the naturalness of each sentence. This is a completely within-subjects design with factors SENTENCE (1–12) and SYNTHESIZER.

Repeated Measures
When measures are made over time, as in example A, we want to assess:
– how the dependent measure changes over time independent of treatment (i.e., the main effect of time),
– how treatments differ independent of time (i.e., the main effect of treatment),
– how treatment effects differ at different times (i.e., the treatment-by-time interaction).
Repeated measures require special treatment because:
– observations made on the same subject are not independent of each other;
– adjacent observations in time are likely to be more correlated than non-adjacent observations.

Response Time Repeated Measures

Methods of repeated measures ANOVA
– Univariate: uses a single outcome measure.
– Multivariate: uses multiple outcome measures.
– Mixed model analysis: one or more factors (other than subject) are random effects.
We will discuss only the univariate approach.
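For a single within-subject factor, the univariate approach removes subject-to-subject variation from the error term before forming F. A sketch for one group of subjects measured at k time points (no between-subjects factor), with illustrative data and a hypothetical function name:

```python
from statistics import mean

def within_subjects_anova(data):
    """One-way repeated measures ANOVA, univariate approach (sketch).

    data[s][t] is the response of subject s at level t of the within-subject factor.
    Subject-to-subject variation (SS_subjects) is removed from the error term,
    so F = MS_levels / MS_error with (k - 1) and (k - 1)(n - 1) df.
    """
    n, k = len(data), len(data[0])
    grand = mean([y for row in data for y in row])
    subject_means = [mean(row) for row in data]
    level_means = [mean([row[t] for row in data]) for t in range(k)]

    ss_total = sum((y - grand) ** 2 for row in data for y in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_levels = n * sum((m - grand) ** 2 for m in level_means)
    ss_error = ss_total - ss_subjects - ss_levels   # subject-by-level residual
    df_levels, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_levels / df_levels) / (ss_error / df_error)
    return ss_levels, ss_subjects, ss_error, df_levels, df_error, f

# Illustrative data: 3 subjects (rows) measured at 3 time points (columns)
data = [[1, 3, 5],
        [2, 5, 5],
        [3, 4, 8]]
print(within_subjects_anova(data))
```

Here the total SS of 34 splits into 24 for time, 6 for subjects and 4 for error, giving F = (24/2)/(4/4) = 12 on 2 and 4 df.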

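Repeated Measures
Assumptions:
– Subjects are independent.
– The repeated observations for each subject follow a multivariate normal distribution.
– The variances of the differences between all pairs of within-subject levels are equal. This assumption is known as sphericity; it holds, for example, when all variances are equal and all pairwise correlations are equal (compound symmetry).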
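Sphericity can be inspected informally by computing the variance of the differences for each pair of levels. A sketch with illustrative data and a hypothetical function name (a formal test such as Mauchly's is not shown):

```python
from itertools import combinations
from statistics import variance

def difference_variances(data):
    """Variance of the differences between every pair of within-subject levels.

    data[s][t] is the response of subject s at level t. Under sphericity these
    variances are all equal in the population.
    """
    k = len(data[0])
    return {
        (t1, t2): variance([row[t1] - row[t2] for row in data])
        for t1, t2 in combinations(range(k), 2)
    }

# Illustrative data: 3 subjects measured at 3 time points
data = [[1, 3, 5],
        [2, 5, 5],
        [3, 4, 8]]
print(difference_variances(data))
```

The three sample variances here are 1, 1 and 4; Mauchly's test assesses whether differences like these are larger than sampling variation would explain.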

Repeated Measures
Test for sphericity:
– Mauchly's test
Violation of the sphericity assumption leads to inflated F statistics and hence an inflated Type I error rate. Three common corrections for violation of sphericity:
– Greenhouse-Geisser correction
– Huynh-Feldt correction
– Lower-bound correction
All three methods adjust the degrees of freedom using a correction factor called epsilon (ε). Epsilon lies between $1/(k - 1)$ and 1, where k is the number of levels of the within-subject factor.
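A sketch of the Greenhouse-Geisser (Box) epsilon computed from the k × k covariance matrix of the repeated measures, ε = tr(D)² / ((k − 1) Σ D_ij²), where D is the double-centered covariance matrix, together with the lower bound 1/(k − 1) and the epsilon-adjusted degrees of freedom. The covariance matrices and function names are illustrative; the Huynh-Feldt adjustment is not shown:

```python
from statistics import mean

def greenhouse_geisser_epsilon(cov):
    """Box / Greenhouse-Geisser epsilon from a k x k covariance matrix:
    eps = trace(D)^2 / ((k - 1) * sum of D_ij^2), D = double-centered cov."""
    k = len(cov)
    row = [mean(r) for r in cov]
    col = [mean([cov[i][j] for i in range(k)]) for j in range(k)]
    grand = mean(row)
    d = [[cov[i][j] - row[i] - col[j] + grand for j in range(k)] for i in range(k)]
    trace = sum(d[i][i] for i in range(k))
    sum_sq = sum(x ** 2 for r in d for x in r)
    return trace ** 2 / ((k - 1) * sum_sq)

def corrected_df(eps, k, n):
    """Multiply the within-subject df (k - 1) and error df (k - 1)(n - 1) by epsilon."""
    return eps * (k - 1), eps * (k - 1) * (n - 1)

compound_symmetric = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
unequal = [[1.0, 0.5, 0.2], [0.5, 4.0, 1.5], [0.2, 1.5, 9.0]]
for cov in (compound_symmetric, unequal):
    eps = greenhouse_geisser_epsilon(cov)
    print(round(eps, 3), "lower bound:", 1 / (len(cov) - 1))
```

A compound-symmetric matrix satisfies sphericity, so its epsilon is 1 (no correction); the unequal matrix gives an epsilon between the lower bound of 0.5 and 1, which shrinks both degrees of freedom. The lower-bound correction simply uses ε = 1/(k − 1).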

Repeated Measures - SPSS Demo
– Select Analyze > General Linear Model > Repeated Measures.
– Within-Subject Factor Name: e.g., time. Number of Levels: the number of measurements of the factor; e.g., we have two measurements, PLUC.pre and PLUC.post, so the number of levels is 2 for our example. Click Add.
– Click Define and select the Within-Subjects Variables (time): e.g., PLUC.pre (1) and PLUC.post (2).
– Select the Between-Subjects Factor(s): e.g., grp.

Repeated Measures ANOVA SPSS output:

Repeated measures ANOVA SPSS output

Questions