SPH 247 Statistical Analysis of Laboratory Data 1 April 2, 2013 SPH 247 Statistical Analysis of Laboratory Data.

Slides:



Advertisements
Similar presentations
Objectives (BPS chapter 15)
Advertisements

BHS Methods in Behavioral Sciences I April 25, 2003 Chapter 6 (Ray) The Logic of Hypothesis Testing.
10-1 Introduction 10-2 Inference for a Difference in Means of Two Normal Distributions, Variances Known Figure 10-1 Two independent populations.
SADC Course in Statistics Comparing Means from Independent Samples (Session 12)
Research Study. Type Experimental study A study in which the investigator selects the levels of at least one factor Observational study A design in which.
BCOR 1020 Business Statistics
C82MCP Diploma Statistics School of Psychology University of Nottingham 1 Overview of Lecture Independent and Dependent Variables Between and Within Designs.
The Practice of Statistics
Today Concepts underlying inferential statistics
1 Experimental Design EPP 245 Statistical Analysis of Laboratory Data.
Tests of significance: The basics BPS chapter 15 © 2006 W.H. Freeman and Company.
Statistical Analysis. Purpose of Statistical Analysis Determines whether the results found in an experiment are meaningful. Answers the question: –Does.
Chapter 8 Introduction to Hypothesis Testing
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University ECON 4550 Econometrics Memorial University of Newfoundland.
Statistical Analysis Statistical Analysis
Tests of significance: The basics BPS chapter 15 © 2006 W.H. Freeman and Company.
5-1 Introduction 5-2 Inference on the Means of Two Populations, Variances Known Assumptions.
QNT 531 Advanced Problems in Statistics and Research Methods
STA291 Statistical Methods Lecture 31. Analyzing a Design in One Factor – The One-Way Analysis of Variance Consider an experiment with a single factor.
Chapter 13 Notes Observational Studies and Experimental Design
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
Let’s flip a coin. Making Data-Based Decisions We’re going to flip a coin 10 times. What results do you think we will get?
Inference for a Single Population Proportion (p).
Chapter 7 Experimental Design: Independent Groups Design.
Introduction to inference Tests of significance IPS chapter 6.2 © 2006 W.H. Freeman and Company.
Parametric tests (independent t- test and paired t-test & ANOVA) Dr. Omar Al Jadaan.
Lecture 16 Section 8.1 Objectives: Testing Statistical Hypotheses − Stating hypotheses statements − Type I and II errors − Conducting a hypothesis test.
ANALYSIS OF VARIANCE (ANOVA) BCT 2053 CHAPTER 5. CONTENT 5.1 Introduction to ANOVA 5.2 One-Way ANOVA 5.3 Two-Way ANOVA.
Essential Question:  How do scientists use statistical analyses to draw meaningful conclusions from experimental results?
Statistical Inference for the Mean Objectives: (Chapter 9, DeCoursey) -To understand the terms: Null Hypothesis, Rejection Region, and Type I and II errors.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 14 Comparing Groups: Analysis of Variance Methods Section 14.3 Two-Way ANOVA.
Introduction to inference Tests of significance IPS chapter 6.2 © 2006 W.H. Freeman and Company.
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University ECON 4550 Econometrics Memorial University of Newfoundland.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
Binomial Distribution and Applications. Binomial Probability Distribution A binomial random variable X is defined to the number of “successes” in n independent.
Tests of significance: The basics BPS chapter 14 © 2006 W.H. Freeman and Company.
SPH 247 Statistical Analysis of Laboratory Data 1April 2, 2010SPH 247 Statistical Analysis of Laboratory Data.
Learning Objectives After this section, you should be able to: The Practice of Statistics, 5 th Edition1 DESCRIBE the shape, center, and spread of the.
Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University.
BIOL 582 Lecture Set 2 Inferential Statistics, Hypotheses, and Resampling.
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
Review Statistical inference and test of significance.
SPH 247 Statistical Analysis of Laboratory Data 1 March 31, 2015 SPH 247 Statistical Analysis of Laboratory Data.
Class Six Turn In: Chapter 15: 30, 32, 38, 44, 48, 50 Chapter 17: 28, 38, 44 For Class Seven: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 Read.
Inference for a Single Population Proportion (p)
Chapter 4: Designing Studies
CHAPTER 4 Designing Studies
INF397C Introduction to Research in Information Studies Spring, Day 12
Lecture Slides Elementary Statistics Twelfth Edition
Applied Business Statistics, 7th ed. by Ken Black
Comparing Three or More Means
Observational Studies and Experiments
Tests of significance: The basics
Statistical Reasoning December 8, 2015 Chapter 6.2
CHAPTER 4 Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
Chapter 4: Designing Studies
STATISTICS INFORMED DECISIONS USING DATA
Presentation transcript:

SPH 247 Statistical Analysis of Laboratory Data 1 April 2, 2013 SPH 247 Statistical Analysis of Laboratory Data

Basic Principles of Experimental Investigation Sequential Experimentation Comparison Manipulation Randomization Blocking Simultaneous variation of factors Main effects and interactions Sources of variability April 2, 2013SPH 247 Statistical Analysis of Laboratory Data2

Sequential Experimentation No single experiment is definitive Each experimental result suggests other experiments Scientific investigation is iterative. “No experiment can do everything; every experiment should do something,” George Box. April 2, 2013SPH 247 Statistical Analysis of Laboratory Data3

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data4 Plan Experiment Perform Experiment Analyze Data from Experiment

Comparison Usually absolute data are meaningless, only comparative data are meaningful The level of mRNA in a sample of liver cells is not meaningful The comparison of the mRNA levels in samples from normal and diseased liver cells is meaningful April 2, 2013SPH 247 Statistical Analysis of Laboratory Data5

Internal vs. External Comparison Comparison of an experimental results with historical results is likely to mislead Many factors that can influence results other than the intended treatment Best to include controls or other comparisons in each experiment April 2, 2013SPH 247 Statistical Analysis of Laboratory Data6

Manipulation Different experimental conditions need to be imposed by the experimenters, not just observed, if at all possible The rate of complications in cardiac artery bypass graft surgery may depend on many factors which are not controlled (for example, characteristics of the patient), and may be hard to measure April 2, 2013SPH 247 Statistical Analysis of Laboratory Data7

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data8

Randomization Randomization limits the difference between groups that are due to irrelevant factors Such differences will still exist, but can be quantified by analyzing the randomization This is a method of controlling for unknown confounding factors April 2, 2013SPH 247 Statistical Analysis of Laboratory Data9

Suppose that 50% of a patient population is female A sample of 100 patients will not generally have exactly 50% females Numbers of females between 40 and 60 would not be surprising In two groups of 100, the disparity between the number of females in the two groups can be as big as 20% simply by chance, but not much larger This also holds for factors we don’t know about April 2, 2013SPH 247 Statistical Analysis of Laboratory Data10

Randomization does not exactly balance against any specific factor To do that one should employ blocking Instead it provides a way of quantifying possible imbalance even of unknown factors Randomization even provides an automatic method of analysis that depends on the design and randomization technique. April 2, 2013SPH 247 Statistical Analysis of Laboratory Data11

The Farmer from Whidbey Island Visited the University of Washington with a Whalebone water douser 10 Dixie cups, 5 with water, 5 empty, each covered with plywood Placed in a random order defined by generating 10 random numbers and sorting the cups by the random number If he gets all 10 right, is chance a reasonable explanation? April 2, 2013SPH 247 Statistical Analysis of Laboratory Data12

The randomness is produced by the process of randomly choosing which 5 of the 10 are to contain water There are no other assumptions April 2, 2013 SPH 247 Statistical Analysis of Laboratory Data13

If the randomization had been to flip a coin for each of the 10 cups, then the probability of getting all 10 right by chance is different There are 2 10 = 1024 ways for the randomization to come out, only one of which is corresponds to the choices, so the chance is 1/1024 =.001 The method of randomization matters If the farmer could observe condensation on the cups, then this is still evidence of non-randomness, but not of the effectiveness of dousing! April 2, 2013SPH 247 Statistical Analysis of Laboratory Data14

Randomization Inference 20 tomato plants are divided 10 groups of 2 placed next to each other in the greenhouse (to control for temperature and insolation) In each group of 2, one is chosen using a random number table to receive fertilizer A; the other receives fertilizer B The yield of each plant in pounds of tomatoesis measured The null hypothesis is that the fertilizers are equal in promoting tomato growth April 2, 2013SPH 247 Statistical Analysis of Laboratory Data15

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data A B diff Pounds of yield of tomatoes for 20 plants

The average yield for fertilizer A is pounds The average yield for fertilizer B is pounds The average difference is 4.1 Could this have happened by chance? Is it statistically significant? If A and B do not differ in their effects (null hypothesis is true), then the plants’ yields would have been the same either whether A or B is applied The difference would be the negative of what it was if the coin flip had come out the other way April 2, 2013SPH 247 Statistical Analysis of Laboratory Data17

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data lb140 lb 132 lb Fert A Fert B ActualHypothetical ∆ = 8∆ = −8

In pair 1, the yields were 132 and 140. The difference was 8, but it could have been −8 With 10 coin flips, there are 2 10 = 1024 possible outcomes of + or − on the difference These outcomes are possible outcomes from our action of randomization, and carry no assumptions The measurements don’t have to be normally distributed or have the same variance April 2, 2013SPH 247 Statistical Analysis of Laboratory Data19

Of the 1024 possible outcomes that are all equally likely under the null hypothesis, only 3 had greater values of the average difference, and only four (including the one observed) had the same value of the average difference The likelihood of this happening by chance is [3+4/2]/1024 =.005 This does not depend on any assumptions other than that the randomization was correctly done April 2, 2013SPH 247 Statistical Analysis of Laboratory Data20

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data A B diff

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data22 Paired t-test

Randomization in practice Whenever there is a choice, it should be made using a formal randomization procedure, such as Excel’s rand() function. This protects against unexpected sources of variability such as day, time of day, operator, reagent, etc. April 2, 2013SPH 247 Statistical Analysis of Laboratory Data23

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data24 Pair NumberFirst Sample Treatment 1A or B? A or B?

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data25 Pair Num First Sample Treatment random number 1A or B? A or B? A or B? A or B? A or B? A or B? A or B? A or B? A or B? A or B?

=rand() in first cell Copy down the column Highlight entire column ^c (Edit/Copy) Edit/Paste Special/Values This fixes the random numbers so they do not recompute each time =IF(C3<0.5,"A","B") goes in cell C2, then copy down the column April 2, 2013SPH 247 Statistical Analysis of Laboratory Data26

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data27 Plant Pair First Plant Treatment random number 1B B B A A B B B B A

To randomize run order, insert a column of random numbers, then sort on that column More complex randomizations require more care, but this is quite important and worth the trouble Randomization can be done in Excel, R, or anything that can generate random numbers April 2, 2013SPH 247 Statistical Analysis of Laboratory Data28

Randomization in R rand1 <- runif(10) treat <- rep("",10) rand2 <- order(rand1) < 5.5 treat[rand2] <- "Treatment A" treat[!rand2] <- "Treatment B" > rand1 [1] [7] > order(rand1) [1] > rand2 [1] TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE > treat [1] "Treatment A" "Treatment A" "Treatment B" "Treatment A" "Treatment B" [6] "Treatment B" "Treatment A" "Treatment A" "Treatment B" "Treatment B" April 2, 2013SPH 247 Statistical Analysis of Laboratory Data29

Blocking If some factor may interfere with the experimental results by introducing unwanted variability, one can block on that factor In agricultural field trials, soil and other location effects can be important, so plots of land are subdivided to test the different treatments. This is the origin of the idea April 2, 2013SPH 247 Statistical Analysis of Laboratory Data30

If we are comparing treatments, the more alike the units are to which we apply the treatment, the more sensitive the comparison. Within blocks, treatments should be randomized Paired comparisons are a simple example of randomized blocks as in the tomato plant example April 2, 2013SPH 247 Statistical Analysis of Laboratory Data31

Simultaneous Variation of Factors The simplistic idea of “science” is to hold all things constant except for one experimental factor, and then vary that one thing This misses interactions and can be statistically inefficient Multi-factor designs are often preferable April 2, 2013SPH 247 Statistical Analysis of Laboratory Data32

Interactions Sometimes (often) the effect of one variable depends on the levels of another one This cannot be detected by one-factor-at-a-time experiments These interactions are often scientifically the most important April 2, 2013SPH 247 Statistical Analysis of Laboratory Data33

Experiment 1. I compare the room before and after I drop a liter of gasoline on the desk. Result: we all leave because of the odor. April 2, 2013SPH 247 Statistical Analysis of Laboratory Data34

Experiment 1. I compare the room before and after I drop a liter of gasoline on the desk. Result: we all leave because of the odor. Experiment 2. I compare the room before and after I drop a lighted match on the desk. Result: no effect other than a small scorch mark. April 2, 2013SPH 247 Statistical Analysis of Laboratory Data35

Experiment 1. I compare the room before and after I drop a liter of gasoline on the desk. Result: we all leave because of the odor. Experiment 2. I compare the room before and after I drop a lighted match on the desk. Result: no effect other than a small scorch mark. Experiment 3. I compare all four of ±gasoline and ±match. Result: we are all killed. Large Interaction effect April 2, 2013SPH 247 Statistical Analysis of Laboratory Data36

Statistical Efficiency Suppose I compare the expression of a gene in a cell culture of either keratinocytes or fibroblasts, confluent and nonconfluent, with or without a possibly stimulating hormone, with 2 cultures in each condition, requiring 16 cultures April 2, 2013SPH 247 Statistical Analysis of Laboratory Data37

I can compare the cell types as an average of 8 cultures vs. 8 cultures I can do the same with the other two factors This is more efficient than 3 separate experiments with the same controls, using 48 cultures Can also see if cell types react differently to hormone application (interaction) April 2, 2013SPH 247 Statistical Analysis of Laboratory Data38

Fractional Factorial Designs When it is not known which of many factors may be important, fractional factorial designs can be helpful With 7 factors each at 2 levels, ordinarily this would require 2 7 = 128 experiments This can be done in 8 experiments instead! Each two factors form a replicated two-by-two Some sets of three factors form an unrelicated two-by- two-by-two April 2, 2013SPH 247 Statistical Analysis of Laboratory Data39

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data40 F1F2F3F4F5F6F7 1HHHHHHH 2HHLHLLL 3HLHLHLL 4HLLLLHH 5LHHLLHL 6LHLLHLH 7LLHHLLH 8LLLHHHL

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data41 F1F2F3F4F5F6F7 1HHHHHHH 2HHLHLLL 3HLHLHLL 4HLLLLHH 5LHHLLHL 6LHLLHLH 7LLHHLLH 8LLLHHHL

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data42 F1F2F3F4F5F6F7 1HHHHHHH 2HHLHLLL 3HLHLHLL 4HLLLLHH 5LHHLLHL 6LHLLHLH 7LLHHLLH 8LLLHHHL

April 2, 2013SPH 247 Statistical Analysis of Laboratory Data43 F1F2F3F4F5F6F7 1HHHHHHH 2HHLHLLL 3HLHLHLL 4HLLLLHH 5LHHLLHL 6LHLLHLH 7LLHHLLH 8LLLHHHL

Main Effects and Interactions Factors Cell Type (C), State (S), Hormone (H) Response is expression of a gene The main effect C of cell type is the difference in average gene expression level between cell types April 2, 2013SPH 247 Statistical Analysis of Laboratory Data44

For the interaction between cell type and state, compute the difference in average gene expression between cell types separately for confluent and nonconfluent cultures. The difference of these differences is the interaction. The three-way interaction CSH is the difference in the two way interactions with and without the hormone stimulant. April 2, 2013SPH 247 Statistical Analysis of Laboratory Data45

Sources of Variability in Laboratory Analysis Intentional sources of variability are treatments and blocks There are many other sources of variability Biological variability between organisms or within an organism Technical variability of procedures like RNA extraction, labeling, hybridization, chips, etc. April 2, 2013SPH 247 Statistical Analysis of Laboratory Data46

Replication Almost always, biological variability is larger than technical variability, so most replicates should be biologically different, not just replicate analyses of the same samples (technical replicates) However, this can depend on the cost of the experiment vs. the cost of the sample 2D gels are so variable that replication is required Expression arrays, PCR, RNA-Seq, Mass Spect and others do not usually require replication April 2, 2013SPH 247 Statistical Analysis of Laboratory Data47

Quality Control It is usually a good idea to identify factors that contribute to unwanted variability A study can be done in a given lab that examines the effects of day, time of day, operator, reagents, etc. This is almost always useful in starting with a new technology or in a new lab April 2, 2013SPH 247 Statistical Analysis of Laboratory Data48

Possible QC Design Possible factors: day, time of day, operator, reagent batch At two levels each, this is 16 experiments to be done over two days, with 4 each in morning and afternoon, with two operators and two reagent batches Analysis determines contributions to overall variability from each factor April 2, 2013SPH 247 Statistical Analysis of Laboratory Data49

References Statistics for Experimenters, Box, Hunter, and Hunter, John Wiley April 2, 2013SPH 247 Statistical Analysis of Laboratory Data50

Exercise 1 You have a clinical study in which 10 patients will either get the standard treatment or a new treatment Randomize which 5 of the 10 get the new treatment so that all possible combinations can result. Use Excel or R or another formal randomization method. Instead, randomize so that in each pair of patients entered by date, one has the standard and one the new treatment (blocked randomization). What are the advantages of each method? Why is randomization important? April 2, 2013SPH 247 Statistical Analysis of Laboratory Data51

Course Website April 2, 2013SPH 247 Statistical Analysis of Laboratory Data52