Biostatistics Case Studies 2010 Peter D. Christenson Biostatistician Session 3: Clustering and Experimental Replicates.


Question #1 Does this paper use individual offspring outcomes or litter means to compare treatments?


Fig 6: Fat in Males, Ctrl vs. Ab (19.0 marked on the plot), Δ = 3.5.
Strength of Treatment Effect: Signal:Noise Ratio $t = 3.5 / \left( SD \, (1/N_{Ctrl} + 1/N_{Ab})^{1/2} \right)$
Are the Ns the # of dams or # of offspring? What is the correct SD?

Question #1
Paper ignores dams. Uses #s of offspring. Simulated data gives:
[t-test table for Ctrl, Ab, and Diff (N, Mean, SD, Pr > |t|): cell values not recovered]
t Value: 6.50
$t = 6.50 = 3.14 / \left( 1.477 \, (1/33 + 1/13)^{1/2} \right)$
Analysis assumes all 46 offspring give independent information. We explore the validity/necessity of that assumption.
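The arithmetic behind that t value can be checked with a few lines. This is only a sketch: the function name `two_sample_t` is made up for illustration, and the inputs are the rounded numbers quoted on the slide, so the result matches 6.50 only approximately.

```python
import math

def two_sample_t(diff, sd, n1, n2):
    """Signal:noise ratio t = diff / (sd * sqrt(1/n1 + 1/n2))."""
    return diff / (sd * math.sqrt(1.0 / n1 + 1.0 / n2))

# Rounded values from the slide: difference 3.14, SD 1.477,
# 33 and 13 offspring (dams ignored).
t_offspring = two_sample_t(diff=3.14, sd=1.477, n1=33, n2=13)
print(round(t_offspring, 2))  # close to the slide's 6.50
```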

Question #2 Does Fig 1 express biological differences or measurement error, or both?


Question #3 From Fig 1, is it possible (likely?) that littermates from a mother may respond more similarly than offspring from different mothers (who were treated the same)?


Question #4 Suppose littermates do respond almost identically. Would an analysis, say a t-test, using individual offspring that ignores the mothers give about the same treatment difference as an analysis (again, say a t-test) using the mothers' means of their offspring?

Question #5 Would the answer to question #4 change if some litters had 3 offspring and others had up to 8?

Question #6 Continuing question #4, would the analysis using individual offspring overstate or understate the evidence about the treatment difference (i.e., p-value too low or too high)?

Question #7 Suppose now that outcomes from littermates differ about the same as offspring from different mothers. Would that justify using individual offspring, rather than mothers, in the analysis, and hence more power with the larger N? This requires assuming that equal variability, an expert opinion that may be valid, but the analysis could be faulty if the assumption is wrong. See the next question.

Question #8 Lastly, suppose that we don’t want to suppose as in questions #4-7. Can we use the data itself to measure relative intra- and inter-litter differences, and incorporate that into the treatment comparison? This is what hierarchical or mixed models accomplish. They estimate the correlations among the offspring so we do not have to make assumptions as in question #7. We now show how this is done.

Basic Issue for Using Offspring as Replicates Dams vary. Overall, offspring vary. Do offspring from a dam vary less than offspring from different dams (positive correlation)? Do offspring from a dam vary more than offspring from different dams (negative correlation)? What could cause this?

Intra-Dam Correlation Among Offspring
Example: Four dams - A,B,C,D - with 2 offspring each.
[Figure: offspring fat plotted by dam, with dam means and the overall mean, in three panels: Strong Negative Correlation, Strong Positive Correlation, No Correlation]

Intra-Dam and Inter-Dam Variation
Example: Four dams - A,B,C,D - with 2 offspring each.
[Figure: offspring fat plotted by dam around the overall mean, with $V_{Inter}$ and $V_{Intra}$ marked]
Correlation = scaled ($V_{Inter} - V_{Intra}$). Can be calculated from the data. Denote correlation by r.
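One common way to compute such a correlation from data is the one-way ANOVA estimate of the intraclass correlation. The sketch below assumes that this is what the slide's "scaled" difference refers to, which the slide does not confirm. The function name and the toy fat values are hypothetical, and litters are assumed to be of equal size.

```python
from statistics import mean

def intraclass_correlation(litters):
    """ANOVA estimate of r for litters of equal size n.

    r = (MSB - MSW) / (MSB + (n - 1) * MSW), where MSB is the
    between-dam and MSW the within-dam mean square.
    """
    k = len(litters)            # number of dams
    n = len(litters[0])         # offspring per dam (assumed equal)
    grand = mean(y for litter in litters for y in litter)
    dam_means = [mean(litter) for litter in litters]
    msb = n * sum((m - grand) ** 2 for m in dam_means) / (k - 1)
    msw = sum((y - m) ** 2
              for litter, m in zip(litters, dam_means)
              for y in litter) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

# Hypothetical values for four dams A, B, C, D with 2 offspring each.
identical_littermates = [[1, 1], [2, 2], [3, 3], [4, 4]]
opposed_littermates = [[1, 4], [2, 3], [3, 2], [4, 1]]
r_pos = intraclass_correlation(identical_littermates)
r_neg = intraclass_correlation(opposed_littermates)
```

In the first toy data set all variation is between dams, so r is 1. In the second all dam means coincide and all variation is within dams, so r takes its minimum of -1 for litters of size 2.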

Correct SD Uses Both Variations
Table 6: Fat in Males, Ctrl vs. Ab (19.0 marked on the plot), Δ = 3.5.
Strength of Treatment Effect: Signal/Noise Ratio $t = 3.5 / \left( SD \, (1/N_{Ctrl} + 1/N_{Ab})^{1/2} \right)$
Are the Ns the # of dams or # of offspring? What is the correct SD?
$SD^2 = V(1 + (n-1)r)$, where n = # offspring/dam

Correct Analysis
Ns are #s of offspring. Incorporate offspring correlation by using: $SD^2 = V(1 + (n-1)r)$, where n = # offspring/dam.
Signal/Noise Ratio $t = \Delta / \left( SD \, (1/N_{Ctrl} + 1/N_{Ab})^{1/2} \right)$
If r=0, then $SD^2 = V$ and same as t-test.
If r>0, then $SD^2 > V$, so t-test overstates effect.
If r<0, then $SD^2 < V$, so t-test understates effect.

Correct Analysis
Thus, the reasoning is that the dams are clusters of correlated outcomes (offspring).
If offspring were completely correlated (r=1), i.e., identical in a dam, then the correct analysis is the same as using dam means. [$SD^2 = nV$]
If there is no correlation (r=0), the analysis is the same as ignoring dams and using offspring results. [$SD^2 = V$]
If there is some correlation, then SD incorporates that correlation, i.e., relative intra- and inter-dam variation.
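The adjustment $SD^2 = V(1 + (n-1)r)$ and its effect on t can be sketched as follows. The function names are hypothetical, and `corrected_t` assumes that only the SD changes while Δ and the Ns stay the same, as in the formula above.

```python
import math

def corrected_sd2(v, n, r):
    """SD^2 = V * (1 + (n - 1) * r) for n offspring per dam."""
    factor = 1 + (n - 1) * r
    if factor <= 0:
        raise ValueError("1 + (n - 1) * r must be positive")
    return v * factor

def corrected_t(t_naive, n, r):
    """Rescale a t computed with r = 0 to the correlation-adjusted SD."""
    return t_naive / math.sqrt(corrected_sd2(1.0, n, r))

print(corrected_sd2(2.0, 3, 0.0))   # r = 0: SD^2 = V
print(corrected_sd2(2.0, 3, 1.0))   # r = 1: SD^2 = n * V
print(corrected_t(6.0, 4, 1.0))     # r = 1, n = 4: t is halved
```

With n = 2 and r = -0.76, the factor is 0.24 and its square root is about 0.49, so under these assumptions a t-test that ignores dams would roughly double after adjustment.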

Correct Analysis in Software If we have the same # of offspring for every dam, we can use repeated measures ANOVA. Specify the dam as a “subject” and the offspring as the repeated values. Otherwise, use Mixed Model for Repeated Measures. Both of these methods consider the dams as clusters of correlated outcomes (offspring).

Numerical Illustrations
1. All Offspring for a Dam Identical
2. All Offspring for a Dam are Unique
3. Offspring for a Dam are Negatively Correlated
We will generate data that have about the same means, but different correlations among littermates, for these 3 examples.
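The slides do not show how the simulated data were generated. One possible approach, sketched here under the assumption of normally distributed fat values and exactly 2 offspring per dam, builds the second littermate from the first so that the pair has a chosen correlation r. The function name, the SD of 1.5, and the use of 19.0 as the mean are illustrative choices only.

```python
import math
import random

def simulate_litter_pair(mu, sd, r, rng):
    """Return fat values for 2 littermates with correlation r.

    z2 = r * z1 + sqrt(1 - r^2) * e gives corr(z1, z2) = r when
    z1 and e are independent standard normal draws.
    """
    z1 = rng.gauss(0.0, 1.0)
    e = rng.gauss(0.0, 1.0)
    z2 = r * z1 + math.sqrt(1.0 - r * r) * e
    return mu + sd * z1, mu + sd * z2

rng = random.Random(0)
# Hypothetical mean 19.0 and SD 1.5, one dam per scenario.
identical = simulate_litter_pair(19.0, 1.5, 1.0, rng)    # r = 1
unique = simulate_litter_pair(19.0, 1.5, 0.0, rng)       # r = 0
opposed = simulate_litter_pair(19.0, 1.5, -0.76, rng)    # r < 0
```

Repeating the call once per dam, with the mean shifted for the treated group, would give one data set per scenario.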

1. All Offspring for a Dam Identical

Recall Paper Uses Offspring
Paper ignores dams. Uses #s of offspring. Simulated data with correlation=1 gives:
[t-test table for Ctrl, Ab, and Diff (N, Mean, SD, Pr > |t|): cell values not recovered]
t Value: 6.50
$t = 6.50 = 3.14 / \left( 1.477 \, (1/33 + 1/13)^{1/2} \right)$
Analysis assumes all 46 offspring give independent information, which is wrong here.
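A small Monte Carlo sketch can show why treating identical littermates as independent is a problem. It simulates two groups with no true treatment difference and counts how often each analysis exceeds a rough cutoff of |t| > 2. The 9 dams per group follow the dam-means analysis in these slides. The litter size of 3, the cutoff, the seed, and all function names are assumptions made for illustration.

```python
import math
import random
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def false_positive_rates(n_sims=2000, dams=9, litter=3, cutoff=2.0, seed=1):
    """Share of null simulations with |t| > cutoff for each analysis."""
    rng = random.Random(seed)
    naive_hits = means_hits = 0
    for _ in range(n_sims):
        ctrl = [rng.gauss(0, 1) for _ in range(dams)]
        ab = [rng.gauss(0, 1) for _ in range(dams)]
        # Identical littermates (r = 1): repeat each dam's value.
        ctrl_off = [v for v in ctrl for _ in range(litter)]
        ab_off = [v for v in ab for _ in range(litter)]
        naive_hits += int(abs(pooled_t(ctrl_off, ab_off)) > cutoff)
        means_hits += int(abs(pooled_t(ctrl, ab)) > cutoff)
    return naive_hits / n_sims, means_hits / n_sims

naive_rate, dam_means_rate = false_positive_rates()
print(naive_rate, dam_means_rate)
```

Under these settings the offspring-level analysis flags a difference far more often than the dam-means analysis, even though no difference exists.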

Analysis on Dam Means
Same data using dam means gives:
[t-test table for Ctrl, Ab, and Diff (N, Mean, SD, Pr > |t|): cell values not recovered]
t Value: 4.96
$t = 4.96 = 3.51 / \left( 1.477 \, (1/9 + 1/9)^{1/2} \right)$
So the previous analysis gave a signal:noise ratio t that was 6.5/4.96 = 1.3 times too large.
It doesn't matter here, but if the previous t-test gave p=0.05, then the correct p here would be 0.13.
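The slide does not say how the 0.13 was obtained. A normal approximation, assumed here purely for illustration, reproduces roughly that value: convert p=0.05 to a z value, shrink it by the ratio 6.50/4.96, and convert back. The function name `deflated_p` is hypothetical.

```python
from statistics import NormalDist

def deflated_p(p_naive, inflation):
    """Two-sided p after dividing the test statistic by `inflation`.

    Uses a standard normal in place of the t distribution.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - p_naive / 2)
    return 2 * (1 - std_normal.cdf(z / inflation))

ratio = 6.50 / 4.96          # about 1.3, as on the slide
p_corrected = deflated_p(0.05, ratio)
print(round(ratio, 2), round(p_corrected, 2))
```

A t distribution with the dam-level degrees of freedom would give a somewhat larger p, so this should be read as a rough check only.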

Analysis using Calculated Correlation
Same data using mixed model gives:
[Mixed model output: covariance parameter estimates (CS for subject id; Residual = 1.365E-6), F test for group (Num DF, Den DF, F Value, Pr > F), and group estimates for Ctrl and Ab (Estimate, Std Err, Lower, Upper); other values not recovered]
R = 1.
Square root of the F value is t = 4.96, same as analysis on means.

2. All Offspring for a Dam are Unique

Second Set of Simulated Data
Paper ignores dams. Uses #s of offspring. Simulated data with correlation≈0 gives:
[t-test table for Ctrl, Ab, and Diff (N, Mean, SD, Pr > |t|): cell values not recovered]
t Value: 7.13
Analysis assumes all 46 offspring give independent information, which is correct here; I generated them to be so.

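Analysis using Calculated Correlation
Same data using mixed model gives:
[Mixed model output: covariance parameter estimates (CS for subject id; Residual), F test for group (Num DF, Den DF, F Value, Pr > F), group estimates for Ctrl and Ab (Estimate, Std Err, Lower, Upper), and the calculated correlation R; values not recovered]
Square root of the F value is t = 7.50, close to t-test ignoring dams.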

3. Offspring for a Dam are Negatively Correlated

Third Set of Simulated Data
Use 2 offspring/dam; N=32 and 12 to be even. Simulated data with correlation=-0.76 gives:
[t-test table for Ctrl, Ab, and Diff (N, Mean, SD, Pr > |t|): cell values not recovered]
t Value: 7.06
Analysis assumes all 44 offspring give independent information, which is wrong here.

Analysis using Calculated Correlation
Same data using mixed model gives:
[Mixed model output: covariance parameter estimates (CS for subject id; Residual), F test for group (Num DF, Den DF, F Value, Pr > F), group estimates for Ctrl and Ab (Estimate, Std Err, Lower, Upper), and the calculated correlation R; values not recovered]
Square root of the F value is t = 14.8, twice the t-test.
But with negative correlation, we probably would not have a 3.5 difference.