Introduction to Biostatistics and Bioinformatics: Experimental Design


Experimental Design by Christine Ambrosino

Experimental Design: overcoming the threat from chance and bias to the validity of conclusions.

Experimental Design
[Diagram: Inputs -> Process -> Outputs, with Controllable Factors and Uncontrollable Factors acting on the Process]

Experimental Design
1. Recognition and statement of the problem (e.g. testing a specific hypothesis or open-ended discovery).
2. Selecting a response variable.
3. Choosing controllable factors and their ranges.
4. Listing uncontrollable factors and estimating their effects.
5. Choosing the experimental design.
6. Performing the experiment.
7. Statistical analysis of the data.
8. Designing the next experiment based on the results.

Exploring the Parameter Space
- One factor at a time [Plots: Score vs. Factor 1, Score vs. Factor 2, Score vs. Factor 3]
- 2-factor factorial design: 4 experiments
- 3-factor factorial design: 8 experiments
- k-factor factorial design: $2^k$ experiments
- For example, 7 factors: 128 experiments; 10 factors: 1,024 experiments
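
The growth in the number of experiments can be checked by enumerating a two-level full factorial design. This is a minimal sketch: the function names and the -1/+1 coding of the "low" and "high" levels are assumptions made for illustration and do not come from the slides.

```python
from itertools import product


def full_factorial(levels_per_factor):
    """Return every combination of factor levels, one tuple per experiment."""
    return list(product(*levels_per_factor))


def two_level_design(k):
    """Full factorial design for k factors, each at a low (-1) or high (+1) level."""
    return full_factorial([(-1, +1)] * k)


if __name__ == "__main__":
    for k in (2, 3, 7, 10):
        print(f"{k} factors: {len(two_level_design(k))} experiments")
```

Each tuple is one experimental run, so the length of the list is the number of runs needed to cover all level combinations.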

Randomization
- Statistical methods require that observations are independently distributed random variables. Randomization usually makes this assumption valid.
- Randomization guards against unknown and uncontrolled factors.
- Randomize with respect to analysis order, location, material, etc.
[Figure: measurements plotted against order of measurement, with no change in sensitivity during measurement. Not randomized: p = 0.19; randomized: p = 0.32.]

Randomization
[Figure: measurements plotted against order of measurement, not randomized vs. randomized. No change in sensitivity during measurement: p = 0.19 (not randomized), p = 0.32 (randomized). Change in sensitivity during measurement: $p = 5.7 \times 10^{-6}$ and p = 0.20. Standard deviations shown: 0.8, 0.8; 0.7, 0.9; 1.8, 1.3.]
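
Why randomizing the run order helps can be illustrated with a small calculation. The sketch below assumes a hypothetical linear drift in instrument sensitivity per run and two groups of five samples labelled A and B; these names and numbers are illustrative and are not taken from the slides.

```python
import random
from statistics import mean


def randomized_run_order(sample_ids, rng):
    """Return the samples in a random measurement order."""
    order = list(sample_ids)
    rng.shuffle(order)
    return order


def drift_bias(order, drift_per_run):
    """Spurious B-minus-A difference caused by a linear drift over run position."""
    pos_a = [i for i, s in enumerate(order) if s.startswith("A")]
    pos_b = [i for i, s in enumerate(order) if s.startswith("B")]
    return drift_per_run * (mean(pos_b) - mean(pos_a))


samples = [f"A{i}" for i in range(1, 6)] + [f"B{i}" for i in range(1, 6)]

# Not randomized: all A samples are measured first, then all B samples.
grouped_bias = drift_bias(samples, drift_per_run=0.1)

# Randomized: the bias of any single order is random, and it averages out to about zero.
rng = random.Random(1)
average_randomized_bias = mean(
    drift_bias(randomized_run_order(samples, rng), drift_per_run=0.1)
    for _ in range(2000)
)

if __name__ == "__main__":
    print("bias, grouped order:", grouped_bias)
    print("mean bias over randomized orders:", round(average_randomized_bias, 3))
```

With the grouped order the drift is fully confounded with the group label; with a randomized order it becomes extra noise instead of a systematic difference.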

Blocking
- Blocking is used to control for known and controllable factors.
- Randomized Complete Block Design: minimizes the effect of variability associated with e.g. location, operator, plant, batch, time.
- Latin Square Design: minimizes the effect of variability associated with two independent factors. The rows and columns represent two restrictions on randomization.
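
Both designs can be sketched in a few lines. The helper names, the treatment labels, and the cyclic construction of the Latin square are assumptions chosen for illustration; in practice the rows, columns, and treatment labels of a Latin square are usually randomized as well.

```python
import random


def randomized_complete_block(treatments, blocks, seed=None):
    """Every block receives every treatment once, in an independently shuffled order."""
    rng = random.Random(seed)
    return {block: rng.sample(treatments, len(treatments)) for block in blocks}


def latin_square(treatments):
    """Cyclic Latin square: each treatment appears once in every row and every column."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


if __name__ == "__main__":
    design = randomized_complete_block(["T1", "T2", "T3"], ["batch 1", "batch 2"], seed=0)
    for block, order in design.items():
        print(block, order)
    for row in latin_square(["T1", "T2", "T3", "T4"]):
        print(row)
```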

Replication
- Replication is needed to estimate the variance in the measurements.
- Technical replicates (repeat measurements)
- Process replicates
- Biological replicates

Uncertainty in Determining the Mean
[Figure: distributions of the sample mean for Normal, Skewed, Long-tailed, and Complex distributions. Three panels use n = 3, 10, 100; one uses n = 10, 100, 1000.]
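
How the uncertainty of the mean shrinks with sample size can be simulated directly. The sketch below is an illustration under stated assumptions: it uses a standard normal distribution and an exponential distribution (as a stand-in for a skewed one), both with standard deviation 1. The slides do not say which distributions were used.

```python
import random
from statistics import stdev


def spread_of_sample_mean(draw, n, repeats=2000, seed=0):
    """Standard deviation of the sample mean over many repeated samples of size n."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(repeats)]
    return stdev(means)


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_skewed(rng):
    return rng.expovariate(1.0)


if __name__ == "__main__":
    for name, draw in (("normal", draw_normal), ("skewed", draw_skewed)):
        for n in (3, 10, 100):
            print(name, n, round(spread_of_sample_mean(draw, n), 3))
```

For both distributions the spread should fall roughly as $\sigma / \sqrt{n}$, which is why a handful of replicates gives only a rough estimate of the mean.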

An example of bad experimental design

Protein Identification and Quantitation by Mass Spectrometry
[Diagram: Samples -> Peptides -> Mass Spectrometry (intensity vs. m/z) -> Identity and Quantity]

A proteomics example
[Figure: panels for no replicates and three replicates; axes: Log2 Sum Spectrum Count and Log2 Spectrum Count Ratio]

Analytical Measurements: Precision and Accuracy
[Figure: Measured Concentration vs. Theoretical Concentration]

Testing multiple hypotheses
- Is the concentration of calcium/calmodulin-dependent protein kinase type II different between the two samples? $p = 2 \times 10^{-6}$
- Which protein concentrations are different between the two samples?
- The p-value needs to be corrected to take into account that we perform many tests.
- Bonferroni correction: multiply the p-value by the number of tests performed (n): $p_{corr} = p_{uncorr} \times n$
- In this case 3685 proteins are identified, so the Bonferroni-corrected p-value for calcium/calmodulin-dependent protein kinase type II is $p_{corr} = 2 \times 10^{-6} \times 3685 = 0.007$
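
The Bonferroni arithmetic on this slide can be reproduced in a couple of lines. The function name is hypothetical, and capping the corrected value at 1 is a common convention that the slide does not mention.

```python
def bonferroni(p_uncorrected, n_tests):
    """Bonferroni-corrected p-value: p * n, capped at 1 (the cap is an added convention)."""
    return min(1.0, p_uncorrected * n_tests)


p_corr = bonferroni(2e-6, 3685)

if __name__ == "__main__":
    print(f"{p_corr:.3f}")
```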

Testing multiple hypotheses
The p-value distribution is uniform when testing differences between samples from the same distribution.
[Figure: histograms of the number of tests vs. p-value (0 to 1) for 10,000, 1,000, and 100 tests; Normal distribution, sample size = 10]
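
The uniformity of p-values under the null hypothesis can be checked by simulation. To stay within the Python standard library, this sketch swaps in a two-sample z-test with known standard deviation; the slides do not state which test was used, and a t-test would be the more usual choice when the variance is unknown. Function names and the seed are illustrative.

```python
import math
import random


def two_sample_z_pvalue(x, y, sigma=1.0):
    """Two-sided p-value of a two-sample z-test with known standard deviation sigma."""
    n, m = len(x), len(y)
    z = (sum(x) / n - sum(y) / m) / (sigma * math.sqrt(1 / n + 1 / m))
    return math.erfc(abs(z) / math.sqrt(2))


def null_pvalues(n_tests, sample_size=10, seed=0):
    """p-values from tests where both samples come from the same N(0, 1) distribution."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_tests):
        x = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        y = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        pvals.append(two_sample_z_pvalue(x, y))
    return pvals


def histogram(pvals, bins=10):
    """Count p-values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    for n_tests in (100, 1000, 10000):
        print(n_tests, histogram(null_pvalues(n_tests)))
```

With 10,000 tests the bin counts should be close to equal, and about 5% of the tests fall below p = 0.05 even though no real difference exists.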

Testing multiple hypotheses
The p-value distribution is uniform when testing differences between samples from the same distribution.
[Figure: histograms of the number of tests vs. p-value (0 to 1) for 10,000, 1,000, and 100 tests; Normal distribution, with some of the tests drawn from a distribution with a different mean ($\mu_1 - \mu_2 \gg \sigma$)]

Testing multiple hypotheses
Controlling for False Discovery Rate (FDR)
[Figure: False Discovery Rate vs. p-value (0 to 1) for 10,000, 1,000, and 100 tests; Normal distribution, with some of the tests drawn from a distribution with a different mean ($\mu_1 - \mu_2 \gg \sigma$)]
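
One widely used way to control the FDR is the Benjamini-Hochberg step-up procedure. The slides do not name a specific method, so the sketch below is an illustration of that procedure, not necessarily the approach behind the figure. The function name and the example p-values are made up.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k = rank
    # Reject every hypothesis ranked at or below k.
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, q=0.05))
```

Unlike the Bonferroni correction, which bounds the chance of any false positive, this procedure bounds the expected fraction of false positives among the rejected hypotheses, so it is less conservative when many tests are performed.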

Testing multiple hypotheses
False Discovery Rate (FDR) and False Negative Rate (FNR)
[Figure: False Discovery Rate and False Negative Rate vs. p-value (0 to 1); Normal distribution, 30 tests from a distribution with a different mean; panels for $\mu_1 - \mu_2 = 2\sigma$, $\mu_1 - \mu_2 = \sigma$, and $\mu_1 - \mu_2 = \sigma/2$]

Sampling – Gaussian Peak
[Figure: Intensity vs. Retention Time]

Sampling – Gaussian Peak

Definition of a molecular signature
- The FDA calls them “in vitro diagnostic multivariate assays”.
- A molecular signature is a computational or mathematical model that links high-dimensional molecular information to a phenotype or other response variable of interest.

Uses of molecular signatures
1. Models of disease phenotype/clinical outcome
   - Diagnosis
   - Prognosis, long-term disease management
   - Personalized treatment (drug selection, titration)
2. Biomarkers for diagnosis or outcome prediction
   - Make the above tasks resource efficient and easy to use in clinical practice
3. Discovery of structure & mechanisms (regulatory/interaction networks, pathways, subtypes)
   - Leads for potential new drug candidates

Oncotype DX Breast Cancer Assay
- Developed by Genomic Health
- 21-gene signature to predict whether a woman with localized, ER+ breast cancer is at risk of relapse
- Independently validated in thousands of patients
- More than 100,000 tests performed so far
- Price of the test is $4,175
- Not FDA approved but covered by most insurers, including Medicare
- Its sales in 2010 reached $170M and, with a compound annual growth rate, are projected to hit $300M by 2015.

EF Petricoin III, AM Ardekani, BA Hitt, PJ Levine, VA Fusaro, SM Steinberg, GB Mills, C Simone, DA Fishman, EC Kohn, LA Liotta, "Use of proteomic patterns in serum to identify ovarian cancer", Lancet 359 (2002) 572–77

Check E., Proteomics and cancer: running before we can walk? Nature Jun 3;429(6991):496-7.

Example: OvaCheck
- Developed by Correlogic
- Blood test for the early detection of epithelial ovarian cancer
- Failed to obtain FDA approval
- Looks for subtle changes in patterns among the tens of thousands of proteins, protein fragments and metabolites in the blood
- Signature developed by genetic algorithm
- Significant artifacts in data collection & analysis questioned validity of the signature:
  - Results are not reproducible
  - Data collected differently for different groups of patients

Main ingredients for developing a molecular signature

Base-Line Characteristics
DF Ransohoff, "Bias as a threat to the validity of cancer molecular-marker research", Nat Rev Cancer 5 (2005)

How to Address Bias
DF Ransohoff, "Bias as a threat to the validity of cancer molecular-marker research", Nat Rev Cancer 5 (2005)

Experimental Design - Summary
- Chance and bias are threats to the conclusions drawn from experiments
- Controllable and uncontrollable factors
- Randomization to guard against unknown and uncontrolled factors
- Replication (technical, process, and biological replicates) is used to estimate error in measurement and yields a more precise estimate
- Blocking to control for known and controllable factors
- Multiple testing
- Molecular markers

Experimental Design - Summary
- Use your domain knowledge: using a designed experiment is not a substitute for thinking about the problem.
- Keep the design and analysis as simple as possible.
- Recognize the difference between practical and statistical significance.
- Design iterative experiments.