Innovative Paths to Better Medicines Design Considerations in Molecular Biomarker Discovery Studies Doris Damian and Robert McBurney June 6, 2007.

Presentation transcript:

Innovative Paths to Better Medicines
Confidential Information – Do Not Reproduce or Distribute – page 2

Outline of Presentation
Introduction:
– Mass Spectrometry Data
– Study objectives and questions
Statistical Processing of MS Data
– Sample normalization
– Removal of peak-specific batch and other temporal trends
– Filtering of noisy peaks
Design Considerations
– Power calculations for univariate biomarkers
– Power calculations for multivariate biomarkers (regression)

Mass Spectrometry Data
– Measurements: chemical compounds of different classes (proteins, lipids, polar and non-polar metabolites, amino acids, etc.).
– The variables constituting the data sets are peak intensities (peaks), identified by m/z and retention time.
– The peak intensities are proportional to the amount of analyte detected by the mass spectrometer.
– Note that p >> n (far more peaks than samples)!
[Figure panels: Total Ion Chromatogram; Selected Ion Chromatogram; MS of Individual Peaks. Legend: biological samples, QC samples. Figure modified from: (source not preserved in the transcript)]

Structure of a Molecular Biomarker Discovery Study
[Diagram stages: Objectives, Questions, Design, Experiment, Statistical Processing, Data Analysis]

Study Objectives and Questions
Biomarker: A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic response(s) to a therapeutic intervention.
Objectives:
– Diagnosis
– Elucidation of Mechanisms of Action (MoA)
Questions:
– What is a minimal set of biomarkers?
– What are all the biomarkers?
– What are the molecular pathways?

Statistical Processing
Sample normalization
– correction of baseline differences between samples
Removal of peak-specific batch and other temporal trends
– due to instrument and processing limitations, samples are acquired sequentially in batches, so peaks exhibit batch-to-batch variation;
– instrument performance may become unstable over time, and samples may undergo degradation. These are the main causes of the temporal variation observed in peak intensities.
Filtering of noisy peaks
– for each biological sample, replicate measurements are obtained;
– the estimated correlation between these replicates is used as a filter for noisy data.
Presented at IBC's Biomarkers and Molecular Diagnostic conferences, September 2006

Sample Normalization
– Correction of baseline differences between samples.
– Based on Internal Standards (IS).
– Internal Standards are known exogenous compounds, added to the biological samples in fixed amounts at the beginning of the sample preparation stage (same for all samples).
– Used to account for sample variability (e.g., pipetting errors) during sample preparation and acquisition.

[Figure: Typical Sample Profiles of IS Peaks – before Normalization]

Sample Normalization
– Normalization: the statistical procedure of multivariate scaling of samples based on (a subset of) IS peaks.
– [ANOVA model equation not preserved in the transcript]
– Y = log(intensity); i = 1,…,I indexes the IS peaks; j = 1,…,J indexes the samples.
– The sample-specific factors are estimated in this ANOVA model and removed from all peaks.
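The normalization step can be illustrated with a minimal sketch. It assumes an additive two-way model on the log scale, Y_ij = mu + alpha_i + beta_j + e_ij, where alpha_i is an IS-peak effect and beta_j is the sample-specific factor. This model form, the sum-to-zero constraints, and every function and variable name below are assumptions made for illustration, since the slide's own equation is not available. With no missing values, the least-squares estimate of beta_j is the mean of sample j across IS peaks minus the grand mean.

```python
from statistics import mean

def estimate_sample_factors(is_log_intensities):
    """Estimate sample-specific factors from IS peaks.

    is_log_intensities[i][j] is the log intensity of IS peak i in sample j.
    Assumes the additive model Y_ij = mu + alpha_i + beta_j + e_ij with
    sum-to-zero constraints and no missing values (an assumption, not the
    slide's stated model).
    """
    n_samples = len(is_log_intensities[0])
    grand_mean = mean(v for row in is_log_intensities for v in row)
    # beta_j = (mean of sample j across IS peaks) - grand mean
    return [
        mean(row[j] for row in is_log_intensities) - grand_mean
        for j in range(n_samples)
    ]

def normalize(log_intensities, sample_factors):
    """Subtract each sample's factor from every peak (rows are peaks)."""
    return [
        [value - sample_factors[j] for j, value in enumerate(row)]
        for row in log_intensities
    ]

# Hypothetical data: 2 IS peaks x 3 samples; the second sample reads
# 0.5 log units high on both peaks.
is_peaks = [[10.0, 10.5, 10.0], [12.0, 12.5, 12.0]]
factors = estimate_sample_factors(is_peaks)
normalized_is = normalize(is_peaks, factors)
```

In this toy example the second sample receives a positive factor, and after subtraction each IS peak has the same value in all three samples. The same factors would then be subtracted from the non-IS peaks.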

[Figure: Typical Sample Profiles of IS Peaks – after Normalization]
Through normalization, temporal trends common to all peaks are removed.

[Figure: Typical Temporal Profiles of IS Peaks – before Normalization]

[Figure: Typical Temporal Profiles of IS Peaks – after Normalization]

[Figure: Peak-Specific Temporal Trends – after Normalization]

The Need for Batch Corrections
– The within-batch and between-batch patterns cause visible batch separations.
– If one does not account for these intrinsic experimental trends, important biological effects may be obscured.
[Figure: PCA plot of the data set after normalization, colored by batch. Axes: first principal component, second principal component]

Removal of Peak-Specific Temporal Trends
Based on QC samples (ideally)
– QC samples: a pool of material from the biological samples in a study, aliquoted into a set of identical samples that are acquired at specific intervals in each batch of samples.

Removal of Peak-Specific Temporal Trends
– Temporal trend within batch b (b = 1,…,B batches): [trend equation not preserved in the transcript]
– The trend is estimated from the QC samples within batch b.
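One way to picture the batch correction is the sketch below. It assumes a straight-line trend in injection order, fitted by ordinary least squares to the QC samples of a single batch and a single peak, then subtracted from every sample in that batch. The linear form, the re-centering on a common target level, and all names are assumptions for illustration; the slides do not show which trend model was actually used.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

def remove_batch_trend(order, values, is_qc, target):
    """Correct one peak within one batch.

    order  : injection order of each sample in the batch
    values : log intensities of the peak
    is_qc  : True where the sample is a QC sample
    target : reference level the QC trend is shifted to (for example the
             mean QC level over all batches), so batches share one scale
    Needs at least two QC samples at different injection positions.
    """
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(values, is_qc) if q]
    intercept, slope = fit_line(qc_x, qc_y)
    # Subtract the fitted QC trend and re-center on the target level.
    return [v - (intercept + slope * o) + target for o, v in zip(order, values)]

# Hypothetical batch: QC samples at positions 0, 3 and 6 drift upward
# by 0.1 log units per injection.
order = [0, 1, 2, 3, 4, 5, 6]
is_qc = [True, False, False, True, False, False, True]
values = [5.0, 7.1, 6.2, 5.3, 7.4, 6.5, 5.6]
corrected = remove_batch_trend(order, values, is_qc, target=5.0)
```

After correction the three QC samples sit at the target level and the biological samples lose the same drift. Applying the function batch by batch with a shared target would also remove between-batch offsets in the QC level.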

Correlations between Biological Replicates
– When the same sample is measured several times, we require the measurements to correlate well.
– The correlation between replicates can be expressed as a tradeoff between the biological variance and the measurement error variance. [Formula and variance symbols not preserved in the transcript]
– Ideal case: no measurement error.
– The estimated correlation can be used to filter noisy peaks.
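A small sketch of the replicate filter follows. Under the usual assumption that each measurement is a biological value plus independent measurement error, the correlation between two replicates equals sigma_b^2 / (sigma_b^2 + sigma_e^2), which reaches 1 when there is no measurement error. The symbols sigma_b^2 and sigma_e^2, the cutoff of 0.8, and all names in the code are assumptions for illustration, not values taken from the slides.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def filter_noisy_peaks(replicate_1, replicate_2, min_correlation=0.8):
    """Keep peaks whose two replicate measurements correlate well.

    replicate_1[peak] and replicate_2[peak] are lists of log intensities,
    one entry per biological sample. min_correlation is a hypothetical
    cutoff, not a value from the slides.
    """
    return [
        peak for peak in replicate_1
        if pearson(replicate_1[peak], replicate_2[peak]) >= min_correlation
    ]

# Hypothetical peaks measured twice on four biological samples.
rep1 = {"peak_a": [1.0, 2.0, 3.0, 4.0], "peak_b": [1.0, 2.0, 3.0, 4.0]}
rep2 = {"peak_a": [1.1, 1.9, 3.2, 3.9], "peak_b": [3.0, 1.0, 4.0, 2.0]}
kept = filter_noisy_peaks(rep1, rep2)
```

Here peak_a reproduces closely between replicates and is kept, while the replicates of peak_b are uncorrelated and the peak is dropped.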

[Figure: Examples of Correlations (two extremes)]

Power Calculations
Statistical power = probability of detecting biomarkers
The power in biomarker discovery studies is a function of:
– The sample size
– The separation between the groups (e.g., MFC)
– The proportion of biomarkers in the data set
– The false discovery rate (FDR) allowed
– The platform variability
– The within-group variability
– Other factors (e.g., other covariates in the model)
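How these factors interact can be explored with a simulation along the following lines. This is a sketch under several assumptions that are not in the slides: MFC is read as a mean fold change between two groups, intensities are analyzed on the log2 scale, a single known standard deviation `sd` stands in for platform plus within-group variability, peaks are independent, a z-test replaces whatever test the authors used, and the FDR is controlled with the Benjamini-Hochberg step-up rule. All function and parameter names are hypothetical.

```python
import random
from math import log2, sqrt
from statistics import NormalDist

def benjamini_hochberg(p_values, fdr):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg rule."""
    m = len(p_values)
    ranked = sorted(range(m), key=lambda k: p_values[k])
    n_reject = 0
    for rank, k in enumerate(ranked, start=1):
        if p_values[k] <= fdr * rank / m:
            n_reject = rank  # largest rank satisfying the bound
    return set(ranked[:n_reject])

def simulate_power(n_per_group, mfc, prop_biomarkers, fdr,
                   n_peaks=1000, sd=1.0, seed=0):
    """Fraction of true biomarkers discovered in one simulated study."""
    rng = random.Random(seed)
    n_true = int(n_peaks * prop_biomarkers)
    se = sd * sqrt(2.0 / n_per_group)  # standard error of a difference in means
    std_normal = NormalDist()
    p_values = []
    for peak in range(n_peaks):
        # The first n_true peaks are true biomarkers shifted by log2(MFC).
        shift = log2(mfc) if peak < n_true else 0.0
        # Draw the observed difference in group means directly.
        diff = rng.gauss(shift, se)
        z = abs(diff) / se
        p_values.append(2.0 * (1.0 - std_normal.cdf(z)))  # two-sided p-value
    discovered = benjamini_hochberg(p_values, fdr)
    return sum(1 for k in discovered if k < n_true) / n_true

power_small = simulate_power(n_per_group=5, mfc=2.0, prop_biomarkers=0.1, fdr=0.1)
power_large = simulate_power(n_per_group=40, mfc=2.0, prop_biomarkers=0.1, fdr=0.1)
```

Averaging `simulate_power` over many seeds and sweeping `n_per_group`, `mfc`, and `fdr` would trace out power curves of the kind the presentation illustrates.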

Illustration I: Power Curves
[Figure: power curves for MFC = 1.7, MFC = 2.0, and MFC = 3.0 (curve markers not preserved in the transcript). Solid lines: FDR 0.1; dashed lines: FDR 0.2]

Power Curves Not Accounting for the FDR
[Figure: power curves for MFC = 1.7, MFC = 2.0, and MFC = 3.0 (curve markers not preserved in the transcript). Dotted lines: estimated FDR]
There is no loss in power (the proportion of biomarkers discovered), BUT the FDR may be undesirable.

Power Calculation for Multivariate Biomarkers (Regression)
Classical Setting
– n > p
– Linear regression model
– Parametric (F) test of model significance
– Computationally inexpensive
Biomarker Discovery Setting
– n << p
– Regression with constraints on parameters (elastic net)
– Dimensionality reduction needed (through cross-validation)
– Non-parametric (label permutations) test of model significance
– Computationally very expensive

Illustration: Power for Regression Model
– Multivariate biomarker: [definition not preserved in the transcript]
– Parameter of interest: [definition not preserved in the transcript]
– Test: ρ = 0
– Power = proportion of times that this hypothesis is rejected
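The idea of power as a rejection proportion can be sketched with a label-permutation test. To stay short, the sketch swaps the elastic net with cross-validation for a single precomputed biomarker score and tests whether its correlation with the outcome is zero. That substitution, the one-sided alternative, the simulated Gaussian data, and all names are assumptions for illustration. In a faithful version the whole model fit would presumably be repeated inside every permutation, which is what makes the calculation expensive.

```python
import random
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(score, outcome, n_permutations=200, seed=0):
    """One-sided permutation p-value for the null hypothesis of zero correlation."""
    rng = random.Random(seed)
    observed = pearson(score, outcome)
    shuffled = list(outcome)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between score and outcome
        if pearson(score, shuffled) >= observed:
            n_extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (n_extreme + 1) / (n_permutations + 1)

def simulate_power(n_samples, rho, n_studies=50, alpha=0.05, seed=0):
    """Proportion of simulated studies in which the null is rejected."""
    rng = random.Random(seed)
    n_rejected = 0
    for study in range(n_studies):
        score = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        # Outcome constructed to have population correlation rho with the score.
        outcome = [rho * s + sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
                   for s in score]
        if permutation_p_value(score, outcome, seed=study) < alpha:
            n_rejected += 1
    return n_rejected / n_studies

power = simulate_power(n_samples=30, rho=0.8)
```

Repeating `simulate_power` over a grid of `n_samples` and `rho` values gives a power surface. The nested loops (studies, permutations, and in a full version cross-validated model fits) multiply quickly.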

Power Calculation – Regression
[Figure: power as a function of the number of samples and rho]
– Biomarker with 10 components (known in advance): …10 minutes to calculate
– Biomarker with 10 components (buried among 90 other analytes): …days to calculate

Thank you!