Detection and elimination of systematic error in the high-throughput screening process. Vladimir Makarenkov, Université du Québec à Montréal (UQAM)

Presentation transcript:

1 Detection and elimination of systematic error in the high-throughput screening process. Vladimir Makarenkov, Université du Québec à Montréal (UQAM), Montréal, Canada

2 Talk outline: What is HTS? Hit selection. Random and systematic error. Statistical methods to correct HTS measurements: removal of the evaluated background; well correction. Comparison of methods for HTS data correction and hit selection. Conclusions.

3 What is HTS? High-throughput screening (HTS) is a large-scale process to screen thousands of chemical compounds to identify potential drug candidates (i.e. hits). Malo, N. et al., Nature Biotechnol. 24, no. 2, (2006).

4 A typical HTS/HCS procedure. Blue colored wells contain negative controls, red colored wells contain positive controls and yellow colored wells contain the library compounds.

5 What is HTS? An HTS procedure consists of running chemical compounds arranged in 2-dimensional plates through an automated screening system that makes experimental measurements.

6 HTS technology Automated plate handling system and plates of SSI Robotics

7 HTS – main features: Samples are located in wells. The plates are processed in sequence. Screened samples can be divided into active (i.e. hits) and inactive ones. Most of the samples are inactive. The measured values for the active samples are substantially different from those of the inactive ones.

8 Classical hit selection: Compute the mean value μ and standard deviation σ of the observed plate (or of the whole assay). Identify samples whose values differ from the mean μ by at least cσ (where c is a constant chosen in advance). For example, in the case of an inhibition assay, by choosing c = 3, we select samples with values lower than μ − 3σ.
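The thresholding rule on this slide can be sketched in a few lines of Python. This is only an illustration for an inhibition assay: the function name `select_hits`, the example plate, and the use of the population standard deviation are assumptions, not details taken from the talk.

```python
from statistics import mean, pstdev

def select_hits(plate, c=3.0):
    """Return (row, col) positions whose value is below mean - c * SD."""
    values = [v for row in plate for v in row]
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; the sample SD is also common
    threshold = mu - c * sigma
    return [(i, j)
            for i, row in enumerate(plate)
            for j, v in enumerate(row)
            if v < threshold]

# Hypothetical 3x4 plate with one strongly inhibited well at (1, 2).
plate = [
    [1.00, 1.02, 0.98, 1.01],
    [0.99, 1.01, 0.20, 1.00],
    [1.02, 0.98, 1.00, 0.99],
]
hits = select_hits(plate, c=2.0)
```

On very small plates a single extreme value inflates the standard deviation noticeably, which is one reason the choice between plate-wise and assay-wise parameters matters.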

9 Random and systematic error. Random error: variability that causes measured values to deviate randomly from their “true” levels. It results from random influences in each particular well on a particular plate (~ random noise). Usually, it cannot be detected or compensated for. Systematic error: a systematic variability of the measured values along all plates of the assay. It can be detected and its effect can be removed from the data.

10 Batch effect. Batch positional effect appearing in the McMaster Test assay screened during the McMaster Data Mining and Docking Competition. The first 24 plates of the assay are shown (12 original and 12 replicated plates; the plate number is indicated on each plate; the replicated plates are indicated by the letter R). Each original plate is followed by its replicate. Hits are shown in blue. The batch effect occurs in well (8,9) on each plate (i.e., well H10, according to the McMaster annotation); see Caraus et al. (2015).

11 Systematic error in HTS data – estimation using t-test. Proportion of rows and columns affected by systematic bias in 41 experimental HTS assays (735 plates in total) targeting the inhibition of E. coli. Experimental data were extracted from ChemBank. The following hit selection thresholds were used to identify hits and establish hit distribution surfaces of the assays: µ-2.5σ (◊), µ-3σ (∆) and µ-3.5σ (○), where µ and σ are, respectively, the mean and standard deviation of the plate’s measurements; see Caraus et al. (2015).

12 Sources of systematic error Various sources of systematic errors can affect experimental HTS data, and thus introduce a bias into the hit selection process (Heuer et al. 2003), including: Systematic errors caused by ageing, reagent evaporation or cell decay. They can be recognized as smooth trends in the plate means/medians. Errors in liquid handling and malfunction of pipettes can also generate localized deviations from expected values. Variation in incubation time, time drift in measuring different wells or different plates, and reader effects.

13 Random and systematic error. Examples of systematic error (computed and plotted by HTS Corrector): HTS assay, Chemistry Department, Princeton University: inhibition of the glycosyltransferase MurG function of E. coli, 164 plates, 22 rows and 16 columns. Hit distribution by rows and columns of Assay 1 computed for 164 plates. Hits were selected with the threshold μ − 1σ.

14 Hit distribution surface of Assay 1 (a) raw data (b) approximated data

15 Raw and evaluated background: an example Computed and plotted by HTS Corrector

16 Data normalization (called z-score normalization). Applying the following formula, we can normalize the elements of the input data: $x'_i = \frac{x_i - \mu}{\sigma}, \; i = 1, \dots, n$, where $x_i$ is the input element, $x'_i$ the normalized output element, $\mu$ the mean value, $\sigma$ the standard deviation, and $n$ the total number of elements in the plate. The output data will satisfy $\mu_{x'} = 0$ and $\sigma_{x'} = 1$.
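As a quick illustration of plate-wise z-score normalization (subtract the plate mean, divide by the plate standard deviation), here is a minimal Python sketch. The function name `zscore_plate` and the choice of the population standard deviation are assumptions; the slide does not say which estimator is used.

```python
from statistics import mean, pstdev

def zscore_plate(plate):
    """Z-score every well of one plate using the plate's own mean and SD."""
    values = [v for row in plate for v in row]
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD (divide by n)
    if sigma == 0:
        raise ValueError("constant plate: z-scores are undefined")
    return [[(v - mu) / sigma for v in row] for row in plate]

plate = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical 2x2 plate
normalized = zscore_plate(plate)
```

With the population standard deviation, the normalized plate has mean 0 and standard deviation 1 up to rounding, matching the output conditions stated on the slide.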

17 Evaluated background. An assay background can be defined as the mean of normalized plate measurements, i.e. $z_i = \frac{1}{N} \sum_{j=1}^{N} x'_{i,j}$, where $x'_{i,j}$ is the normalized value in well $i$ of plate $j$, $z_i$ the background value in well $i$, and $N$ the total number of plates.

18 Removing the evaluated background: main steps. (1) Normalization of the HTS data by plate. (2) Elimination of the outliers. (3) Computation of the evaluated background. (4) Elimination of systematic errors by subtracting the evaluated background from the normalized data. (5) Selection of hits in the corrected data.
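The background-removal steps listed on this slide could be sketched as follows. This is a simplified illustration, not the HTS Corrector implementation: the outlier rule (ignore normalized values with |z| above a cutoff when averaging a well) is an assumption, as are the names `remove_background` and `outlier_z` and the toy data. Hit selection on the corrected values would follow as a separate step.

```python
from statistics import mean, pstdev

def zscore(values):
    """Z-score a flat list of well values (population SD)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def remove_background(assay, outlier_z=3.0):
    """assay: list of plates, each a flat list of well values of equal length."""
    normalized = [zscore(plate) for plate in assay]            # normalize by plate
    n_wells = len(normalized[0])
    background = []
    for i in range(n_wells):
        # Assumed outlier rule: skip extreme values when averaging a well.
        kept = [p[i] for p in normalized if abs(p[i]) <= outlier_z]
        background.append(mean(kept) if kept else 0.0)         # evaluated background
    corrected = [[p[i] - background[i] for i in range(n_wells)]
                 for p in normalized]                          # subtract background
    return corrected, background

# Hypothetical assay of 3 plates x 6 wells; well 0 reads systematically high.
assay = [
    [1.5, 1.0, 0.9, 1.1, 1.0, 0.95],
    [1.6, 1.1, 1.0, 0.9, 1.05, 1.0],
    [1.4, 0.9, 1.1, 1.0, 0.95, 1.05],
]
corrected, background = remove_background(assay)
```

In this toy example the largest background value sits at well 0, and after subtraction every well averages to zero across plates.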

19 Removing the evaluated background: hit distribution surface before and after the correction

20 Hit distribution after the correction. HTS assay, Chemistry Department, Princeton University: inhibition of the glycosyltransferase MurG function of E. coli, 164 plates, 22 rows and 16 columns. Hit distribution by rows in Assay 1 (164 plates): (a) hits selected with the threshold μ − 1σ; (b) hits selected with the threshold μ − 2σ.

21 Well correction method: main ideas. Once the data are plate normalized, it is possible to analyze their values in each particular well along the entire assay. The distribution of inactive measurements along a fixed well should be zero-mean centered if there is no systematic error and the compounds are randomly distributed along this well. Values along each well may have ascending and descending trends that can be detected via a polynomial approximation.

22 Example of systematic error in the McMaster dataset. Hit distribution surface for a McMaster dataset (1250 plates; a screen of compounds that inhibit the Escherichia coli dihydrofolate reductase). Values deviating from the plate means by more than 1 standard deviation (SD) were taken into account during the computation. Well, row and column positional effects are shown.

23 Well correction method: main steps. (1) Normalize the data within each plate (plate normalization). (2) Compute the trend along each well using polynomial approximation. (3) Subtract the obtained trend from the plate normalized values. (4) Normalize the data along each well (well normalization). (5) Reexamine the hit distribution surface.
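A compact sketch of the first four well correction steps is given below. It is an illustration under stated assumptions, not the authors' code: the polynomial approximation is reduced to a degree-1 (straight-line) least-squares fit for brevity, plates are assumed to be listed in screening order, and the names `linear_trend` and `well_correction` and the simulated data are hypothetical.

```python
import random
from statistics import mean, pstdev

def zscore(values):
    """Z-score a list; a constant list maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values] if sigma > 0 else [0.0] * len(values)

def linear_trend(series):
    """Least-squares straight line through (0, y0), (1, y1), ... (needs >= 2 points)."""
    n = len(series)
    x_bar, y_bar = (n - 1) / 2, mean(series)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(series))
    slope = sxy / sxx
    return [y_bar + slope * (x - x_bar) for x in range(n)]

def well_correction(assay):
    """assay: list of plates in screening order, each a flat list of wells."""
    normalized = [zscore(plate) for plate in assay]           # step 1: plate normalization
    n_wells = len(normalized[0])
    corrected = [[0.0] * n_wells for _ in assay]
    for i in range(n_wells):
        series = [plate[i] for plate in normalized]
        trend = linear_trend(series)                          # step 2: trend along the well
        detrended = [y - t for y, t in zip(series, trend)]    # step 3: subtract the trend
        for p, z in enumerate(zscore(detrended)):             # step 4: well normalization
            corrected[p][i] = z
    return corrected

# Simulated assay: 20 plates x 8 wells, with an upward drift in well 0.
rng = random.Random(0)
assay = [[rng.gauss(0, 1) + (0.1 * p if i == 0 else 0.0) for i in range(8)]
         for p in range(20)]
corrected = well_correction(assay)
```

After correction, each well's values across plates have zero mean, unit standard deviation and no residual linear trend; a higher-degree fit would be needed to capture curved drifts.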

24 Well correction method: an example. McMaster University HTS laboratory screen of compounds to inhibit the Escherichia coli dihydrofolate reductase: 1250 plates with 8 rows and 10 columns.

25 Hit distribution for different thresholds for the raw and corrected McMaster data. Hit distributions for the raw (a, c, and e) and well-corrected (b, d, and f) McMaster datasets (a screen of compounds that inhibit the Escherichia coli dihydrofolate reductase; 1250 plates of size 8x10) obtained for the thresholds: (a and b) μ − σ; (c and d) μ − 1.5σ; (e and f) μ − 2σ.

26 Simulations with random data: constant noise. Comparison of the hit selection methods. [Figure panels: hit detection rate; false positives. Legend: □ classical hit selection, O background subtraction method, ∆ well correction method.]

27 Comparison of methods for data correction in HTS (1). Method 1. Classical hit selection using the value μ − cσ as a hit selection threshold, where the mean value μ and standard deviation σ are computed separately for each plate and c is a constant chosen in advance (usually c equals 3). Method 2. Classical hit selection using the value μ − cσ as a hit selection threshold, where the mean value μ and standard deviation σ are computed over all the assay values, and c is a constant chosen in advance. This method can be chosen when we are certain that all plates of the given assay were processed under the “same testing conditions”.

28 Comparison of methods for data correction in HTS (2). Method 3. The median polish procedure (Tukey 1977) can be used to remove the impact of systematic error. Median polish works by alternately removing the row and column medians. In our study, Method 2 was applied to the values of the matrix of the obtained residuals in order to select hits. Method 4 (designed at Merck Frosst). The B score normalization procedure (Brideau et al., 2003) is designed to remove row/column biases in HTS (Malo et al., 2006). The residual $r_{ijp}$ of the measurement for row $i$ and column $j$ on the $p$-th plate is obtained by median polish. The B score is calculated as follows: $B\,\text{score} = \frac{r_{ijp}}{MAD_p}$, where $MAD_p = \text{median}\{|r_{ijp} - \text{median}(r_{ijp})|\}$. To select hits, this computation was followed by Method 2, applied to the B score matrix.
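Median polish and the B score (residual divided by the plate's median absolute deviation) can be illustrated with the short sketch below. It is a simplified version under stated assumptions: a fixed number of sweeps replaces a convergence test, no scaling constant is applied to the MAD (some implementations multiply it by 1.4826), and the function names and test plate are hypothetical.

```python
import random
from statistics import median

def median_polish_residuals(plate, n_iter=10):
    """Alternately subtract row and column medians; return the residual matrix."""
    r = [row[:] for row in plate]
    for _ in range(n_iter):
        for row in r:                           # remove row medians
            m = median(row)
            for j in range(len(row)):
                row[j] -= m
        for j in range(len(r[0])):              # remove column medians
            m = median(row[j] for row in r)
            for row in r:
                row[j] -= m
    return r

def b_scores(plate):
    """B score = residual / MAD of the plate's residuals (unscaled MAD)."""
    r = median_polish_residuals(plate)
    flat = [v for row in r for v in row]
    med = median(flat)
    mad = median(abs(v - med) for v in flat)    # assumes mad > 0 (non-degenerate plate)
    return [[v / mad for v in row] for row in r]

# Hypothetical 4x5 plate: additive row/column bias, small noise, one hit at (2, 3).
rng = random.Random(1)
row_eff = [0.0, 0.5, 1.0, 1.5]
col_eff = [0.0, 0.1, 0.2, 0.3, 0.4]
plate = [[r + c + rng.gauss(0, 0.05) for c in col_eff] for r in row_eff]
plate[2][3] -= 3.0
scores = b_scores(plate)
```

Because medians are insensitive to a single extreme value, the hit at (2, 3) keeps a strongly negative score while the row and column biases are absorbed by the polish.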

29 Comparison of methods for data correction in HTS (3) Method 5. Well correction procedure followed by Method 2 applied to the well-corrected data. The well correction method consists of two main steps: 1. Least-squares approximation of the data carried out separately for each well of the assay. 2. Z score normalization of the data within each well across all plates of the assay

30 Simulation design. Random standard normal datasets N(0, 1) were generated. Different percentages of hits (1% to 5%) were added to them. Systematic error following a normal distribution N(0, c), where c equals 0, 0.6σ, 1.2σ, 1.8σ, 2.4σ, or 3σ, was added to them. Two types of systematic error were considered: A. Systematic errors stemming from row x column interactions: Different constant values were applied to each row and each column of the first plate. The same constants were added to the corresponding rows and columns of all other plates. B. Systematic error stemming from changing row x column interactions: As in (A) but with the values of the row and column constants varying across plates.
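A data generator along the lines of this simulation design might look like the sketch below. Several details are assumptions made for illustration only: c is treated as a standard deviation, hits are drawn from a normal distribution centered at a hypothetical `hit_mean` of −5, and the function and parameter names do not come from the talk.

```python
import random

def simulate_assay(n_plates, n_rows, n_cols, hit_rate, error_sd,
                   varying=False, hit_mean=-5.0, seed=0):
    """Return (assay, hits): N(0, 1) plates with hits and row/column bias added."""
    rng = random.Random(seed)

    def draw_effects():
        # One constant per row and per column, drawn from N(0, error_sd).
        rows = [rng.gauss(0, error_sd) for _ in range(n_rows)]
        cols = [rng.gauss(0, error_sd) for _ in range(n_cols)]
        return rows, cols

    row_eff, col_eff = draw_effects()            # type A: reused on every plate
    assay, hits = [], set()
    for p in range(n_plates):
        if varying and p > 0:
            row_eff, col_eff = draw_effects()    # type B: redrawn for each plate
        plate = []
        for i in range(n_rows):
            row = []
            for j in range(n_cols):
                if rng.random() < hit_rate:
                    value = rng.gauss(hit_mean, 1)   # assumed hit distribution
                    hits.add((p, i, j))
                else:
                    value = rng.gauss(0, 1)          # inactive compound
                row.append(value + row_eff[i] + col_eff[j])
            plate.append(row)
        assay.append(plate)
    return assay, hits

# Type A error with c = 0.6 (sigma = 1 for N(0, 1) data) and 1% of hits.
assay, hits = simulate_assay(n_plates=10, n_rows=8, n_cols=12,
                             hit_rate=0.01, error_sd=0.6, varying=False)
```

Keeping the set of true hit positions alongside the data makes it possible to compute hit detection rates and false positives for any of the compared methods.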

31 Systematic error stemming from constant row x column interactions A. Systematic errors stemming from row x column interactions: Different constant values were applied to each row and each column of the first plate. The same constants were added to the corresponding rows and columns of all other plates.

32 Results for systematic error stemming from row x column interactions which are constant across plates (SD = σ). The results were obtained with the methods using plates’ parameters (i.e., Method 1, ◊), assay parameters (i.e., Method 2, □), median polish (x), B score (○), and well correction (∆). The abscissa axis indicates the noise factor (a and c, with fixed hit percentage of 1%) and the percentage of added hits (b and d, with fixed error rate of 1.2SD).

33 Systematic error stemming from varying row x column interactions B. Systematic error stemming from changing row x column interactions: As in (A) but with the values of the row and column constants varying across plates.

34 Results for systematic error stemming from row x column interactions which are varying across plates (SD = σ). The results were obtained with the methods using plates’ parameters (i.e., Method 1, ◊), assay parameters (i.e., Method 2, □), median polish (x), B score (○), and well correction (∆). The abscissa axis indicates the noise factor (a and c, with fixed hit percentage of 1%) and the percentage of added hits (b and d, with fixed error rate of 1.2SD).

35 Recommended data pre-processing and correction protocol to be performed prior to the hit identification step in high-throughput and high-content screening (Caraus et al. 2015).

36 HTS Corrector Software. Main features: Visualization of HTS assays. Data partitioning using k-means. Evaluation of the background surface. Correction of experimental datasets. Hit selection using different methods. Chi-square analysis of hit distribution.

37 Conclusions. The well correction method rectifies the distribution of assay measurements by normalizing data within each considered well across all assay plates. Simulations suggest that the well correction procedure is a robust method that should be used prior to the hit selection process. When neither hits nor systematic errors were present in the data, the well correction method performed similarly to the traditional methods of hit selection. Well correction generally outperformed the median polish and B score methods as well as the classical hit selection procedure. The simulation study confirmed that Method 2, based on the assay parameters, was more accurate than Method 1, based on the plates’ parameters. Therefore, in the case of identical testing conditions for all plates of the given assay, all assay measurements should be treated as a single batch.

38 References
Brideau C., Gunter B., Pikounis W., Pajni N. and Liaw A. (2003) Improved statistical methods for hit selection in HTS. J. Biomol. Screen., 8.
Caraus I., Alsuwailem A., Nadon R. and Makarenkov V. (2015) Detecting and overcoming systematic bias in high-throughput screening technologies: a comprehensive review of practical issues and methodological solutions. Briefings in Bioinformatics, 16 (6).
Dragiev P., Nadon R. and Makarenkov V. (2011) Systematic error detection in experimental high-throughput screening. BMC Bioinformatics, 12:25.
Heyse S. (2002) Comprehensive analysis of high-throughput screening data. In Proc. of SPIE 2002, 4626.
Kevorkov D. and Makarenkov V. (2005) Statistical analysis of systematic errors in HTS. J. Biomol. Screen., 10.
Makarenkov V., Kevorkov D., Zentilli P., Gagarin A., Malo N. and Nadon R. (2006) HTS-Corrector: new application for statistical analysis and correction of experimental data. Bioinformatics, 22.
Makarenkov V., Zentilli P., Kevorkov D., Gagarin A., Malo N. and Nadon R. (2007) An efficient method for the detection and elimination of systematic error in HTS. Bioinformatics, 23.
Malo N., Hanley J.A., Cerquozzi S., Pelletier J. and Nadon R. (2006) Statistical practice in HTS data analysis. Nature Biotechnol., 24.