How much and how many? Guidance on the extent of validation/verification studies. S L R Ellison. Science for a safer world.


Introduction
Performance characteristics
– How many need to be examined?
Experiment size
– How many samples and replicates are needed?
Minimising the workload:
– Multiple characteristics from single studies
– Maximising information with efficient experiments

How many performance characteristics need to be examined?

A validation puzzle: Uncertainty, Precision, Bias/Trueness, Detection limits, Linearity, Ruggedness, Selectivity, Working range, Statistics

Typical guidance on characteristics for study (ICH)
ICH Q2(R1) (1994)

Typical guidance on characteristics for study (Eurachem)

Typical guidance on characteristics for study (IUPAC)*
Performance characteristics, by previous validation: Full (1); Full (1), new matrix; Basic (literature)
Bias
Repeatability
Reproducibility
Linearity: ? ?
Ruggedness: - -
Detection limit: Not mentioned – depends on use
* IUPAC Harmonised guidelines on single-laboratory validation. Selected examples for quantitative analysis shown.
Note 1. “Full” validation includes a collaborative study.

Performance characteristic requirements
Broadly similar across sectors
Bias/trueness, precision and linearity always required for quantitative methods
Detection capability usually examined
Ruggedness requirements depend on sector
– All agree that ruggedness can be useful in development
– Some require ruggedness as part of a standardised validation suite

Experiment size: How many observations?

Selected guidance on experiment size
Performance characteristic | ICH Q2 | IUPAC SLV | Eurachem
Bias/Trueness | 3 levels in triplicate | - | 10 replicates**
Repeatability | 3 levels in triplicate | - | 6–15 replicates**
Reproducibility | - | - | 6–15 in duplicate**
Linearity | 5 levels | 6 levels in duplicate | 6–10 levels, 2–3 times each
Detection limit: - 10 replicates
Ruggedness*: - † ǂ - ǂ
‘-’ No numerical guidance given
* ‘Robustness’ in ICH guidance
† Example conditions suggested
ǂ Experimental designs suggested
** Per concentration/material studied

Test power for sample size calculation

Some nomenclature
Type I error: Incorrect rejection of the null hypothesis
– Concluding there is an effect when there is none; a false positive.
Type II error: Incorrect acceptance of the null hypothesis
– Failing to find a real effect; a false negative.
Power (of a test): The probability of correctly rejecting the null hypothesis when it is false.
– Equal to 1 minus the Type II error probability.
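The three terms can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: it assumes a two-sided z-test with known standard deviation, and the sample size, effect size and trial count are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def rejection_rate(true_effect, n=16, sigma=1.0, alpha=0.05, trials=20000, seed=1):
    """Fraction of simulated two-sided z-tests (known sigma) that reject H0: mean = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_effect, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 true: every rejection is a false positive (Type I error).
type_i_rate = rejection_rate(true_effect=0.0)
# H0 false (assumed true effect of 0.75 sigma): every rejection is correct.
power = rejection_rate(true_effect=0.75)
# Missed real effects are false negatives (Type II errors).
type_ii_rate = 1 - power

print(f"Type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, Type II rate ~ {type_ii_rate:.3f}")
```

The Type I rate should sit close to the chosen alpha, while the power depends on the effect size, the precision and the number of observations.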

The concept of test power
[Figure: zero effect and effect size of interest, separated by the critical value. Error distribution around zero (α); error distribution around effect of interest (β).]
Test power = 1 − β

Power for some t tests (n = 4)

Power for some t tests (n = 16)
It takes 16 observations to find a bias as small as 1 standard deviation (with 95% power)

Calculating test power: Required information
A calculation of minimum sample size for a given test power requires:
a) The type of test (t-test, F-test etc.) and the details (one- or two-tailed? etc.);
b) The size of the effect that is of interest;
c) The typical standard deviation s;
d) The required level of significance for the test (the Type I error probability α); and
e) The desired test power, usually expressed in terms of the probability β of a Type II error.
– Power is typically 80% or 95%
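The five inputs map directly onto the arguments of a power calculation. The sketch below is an illustration only, assuming a one-sample t-test for bias and using plain numerical integration from the standard library. The function names are hypothetical, and a statistics package would normally be used instead.

```python
import math
from statistics import NormalDist

def simpson(f, a, b, steps=1000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_cdf(x, df):
    """Student t CDF for x >= 0, by integrating the density from 0 to x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))
    return 0.5 + simpson(pdf, 0.0, x)

def t_critical(df, alpha, two_tailed=True):
    """Critical t value, found by bracketing and bisection on the CDF."""
    target = 1 - (alpha / 2 if two_tailed else alpha)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_test_power(n, effect, sd, alpha=0.05, two_tailed=True):
    """Power of a one-sample t-test to detect a mean shift `effect` with n observations."""
    df = n - 1
    ncp = effect / sd * math.sqrt(n)          # noncentrality parameter
    crit = t_critical(df, alpha, two_tailed)
    phi = NormalDist().cdf
    log_norm = (df / 2 - 1) * math.log(2) + math.lgamma(df / 2)

    def integrand(x):
        # x is a chi-distributed value (sqrt of chi-square with df degrees of freedom).
        if x == 0.0:
            dens = math.exp(-log_norm) if df == 1 else 0.0
        else:
            dens = math.exp((df - 1) * math.log(x) - x * x / 2 - log_norm)
        scaled = crit * x / math.sqrt(df)
        reject = phi(ncp - scaled)            # upper-tail rejection
        if two_tailed:
            reject += phi(-ncp - scaled)      # lower-tail rejection
        return dens * reject

    return simpson(integrand, 0.0, math.sqrt(df) + 10.0)

def min_sample_size(effect, sd, alpha=0.05, power=0.95, two_tailed=True, n_max=1000):
    """Smallest n whose power reaches the requested level."""
    for n in range(2, n_max + 1):
        if t_test_power(n, effect, sd, alpha, two_tailed) >= power:
            return n
    raise ValueError("required power not reached within n_max observations")

# (a) two-tailed one-sample t-test, (b) effect = 1, (c) s = 1, (d) alpha = 0.05, (e) power = 0.95
required_n = min_sample_size(effect=1.0, sd=1.0)
print(required_n)
```

Under these assumptions the search should land at or near the 16 observations quoted for a bias of one standard deviation. A one-tailed test or a lower power target gives a smaller number.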

Test power basis for bias experiments
[Table of δ/s against n: number of observations for 95% power at 95% confidence]
Source: NIST Special Publication 829, Use of NIST Standard Reference Materials for decisions on performance of analytical chemical methods and laboratories
δ = Size of bias to be detected
s = Available precision
n = Number of observations required
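For a quick feel for how n grows as the bias to be detected shrinks relative to the precision, a normal approximation can be tabulated. This sketch is an assumption-laden shortcut, not the NIST calculation: it treats s as known, so for small n it gives somewhat fewer observations than a t-based calculation would.

```python
import math
from statistics import NormalDist

def approx_n(ratio, alpha=0.05, power=0.95):
    """Normal-approximation sample size for a two-sided test; ratio = bias / precision."""
    z = NormalDist().inv_cdf
    return math.ceil(((z(1 - alpha / 2) + z(power)) / ratio) ** 2)

for ratio in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"bias/precision = {ratio:3.1f}  ->  n >= {approx_n(ratio)}")
```

Halving the ratio roughly quadruples the number of observations, which is why small biases are expensive to detect.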

Power for precision experiments
Can be calculated where a hypothesis test is intended
– Chi-squared test for significantly exceeding required precision
– F test for different precision in two groups
Typical experiments are not very powerful for detecting excess dispersion
– Detecting 40% excess dispersion* requires 7 replicates at 80% power and 18 at 95% power
* If the required precision is σ, true precision of 1.41σ will give a positive chi-squared test result 80% of the time with 7 replicates

Caveats
Power calculations rely on assumptions
– Likely effect size
– Available precision
– Distribution under the ‘alternate’ hypothesis
These assumptions may be quite poor
Power analysis is very useful for comparing designs under similar assumptions
– ... but don’t over-interpret

Future directions
Draft IUPAC guidance: Experiments for Single Laboratory Validation of Methods of Analysis: Harmonized Guidelines
Sets 3 levels of ‘stringency’
– Verification, validation, stringent validation
Provides ‘model experiments’
Permits any other experiment that gives the same test power
Gives guidance on number of materials, replication level, size of experiment and ‘stringency’ of validation

Draft IUPAC guidance – experiment size
Look out for IUPAC consultation

Part 2: Getting more for less

Strategies for reducing validation effort
Get more than one performance characteristic from a single experiment
Get more information from one experiment
Use efficient designs to minimise experiment size

Example 1: Bias from a precision experiment
UK MCERTS soil testing standards set limits for bias (±10%) and precision (5%) of test methods
‘11 x 2’ design recommended
– 11 days/runs, in duplicate
3 soil types, ideally using CRMs
ANOVA used to determine repeatability and intermediate precision
Bias checked using a modified t test
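The ANOVA step can be sketched as follows for a balanced runs-by-replicates design. This is a minimal illustration using standard one-way ANOVA formulas, not the MCERTS procedure itself; the function name and the duplicate results are hypothetical.

```python
from statistics import fmean

def precision_from_runs(runs):
    """One-way ANOVA on a balanced design; `runs` is a list of equal-length replicate lists."""
    p = len(runs)                          # number of runs (days)
    n = len(runs[0])                       # replicates per run
    all_values = [x for run in runs for x in run]
    grand_mean = fmean(all_values)
    run_means = [fmean(run) for run in runs]
    ss_within = sum((x - m) ** 2 for run, m in zip(runs, run_means) for x in run)
    ms_within = ss_within / (p * (n - 1))
    ms_between = n * sum((m - grand_mean) ** 2 for m in run_means) / (p - 1)
    # Between-run variance component; negative estimates are set to zero.
    var_between = max(0.0, (ms_between - ms_within) / n)
    return {
        "mean": grand_mean,
        "ms_within": ms_within,
        "ms_between": ms_between,
        "s_repeatability": ms_within ** 0.5,
        "s_intermediate": (ms_within + var_between) ** 0.5,
    }

# Hypothetical duplicate results from 11 runs (illustrative values only).
runs = [
    [10.2, 10.5], [9.8, 10.1], [10.9, 10.6], [10.0, 9.7], [10.4, 10.8],
    [9.9, 10.3], [10.7, 10.2], [10.1, 10.0], [9.6, 10.0], [10.5, 10.9], [10.3, 10.1],
]
result = precision_from_runs(runs)
print(result)
```

The same grand mean feeds the bias check against the reference value, so precision and bias come out of one experiment.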

Example 1 cont.
[Data table with mean not reproduced]

Example 1 interpretation
Initial inspection
– Mean Cd: (more than 10% bias)
Significance test: is bias significantly greater than 10%?
%-10%
One-sided test
P-value (one-sided, 11* df): 0.31
* Welch-Satterthwaite calculation on ANOVA MS
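The effective degrees of freedom for a variance built from several ANOVA mean squares can be sketched with the general Welch-Satterthwaite formula. How the slide's mean squares were actually combined is not shown, so the combination and the numbers below are assumptions for illustration only.

```python
def welch_satterthwaite(terms):
    """Combined variance and effective df for sum(coef * MS).

    `terms` is a list of (coefficient, mean_square, df) tuples.
    """
    combined = sum(c * ms for c, ms, _ in terms)
    denom = sum((c * ms) ** 2 / df for c, ms, df in terms)
    return combined, combined ** 2 / denom

# Hypothetical mean squares from an 11 x 2 design (p = 11 runs, n = 2 replicates).
p, n = 11, 2
ms_between, ms_within = 0.20, 0.06
# Assumed combination: intermediate-precision variance = MS_between/n + (1 - 1/n) * MS_within.
variance, eff_df = welch_satterthwaite([
    (1 / n, ms_between, p - 1),
    (1 - 1 / n, ms_within, p * (n - 1)),
])
print(f"variance = {variance:.4f}, effective df = {eff_df:.1f}")
```

The effective df is generally non-integer and lies between the smallest component df and the sum of the component dfs.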

Example 2: SANCO precision and detection capability
3 runs of 7 observations
3 concentrations
Precision at 3 levels
Bias at 3 levels
Linearity review
Detection capability using ISO 11843

Efficient experiments: Experimental design

Efficient ruggedness designs
Ruggedness typically requires examination of multiple effects
Single-effect study needs n observations at each of at least 2 levels
– 6 effects -> 12n observations
Factorial designs can be better for small studies
– But 2^6 = 64 …

AOAC recommended ruggedness design
Up to 7 effects in 8 runs
Equivalent to n = 4 for 7 parameters
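A design with these properties can be sketched by taking a full factorial in three factors and using its interaction columns for four more. This is an illustrative construction of a saturated two-level design; the sign arrangement in the published AOAC table may differ, and the responses below are hypothetical.

```python
from itertools import product

def eight_run_design():
    """8 runs x 7 two-level factors: a 2^3 full factorial plus its interaction columns."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs

def main_effects(design, responses):
    """Effect of each factor = mean response at +1 minus mean response at -1."""
    effects = []
    for j in range(len(design[0])):
        high = [y for row, y in zip(design, responses) if row[j] == 1]
        low = [y for row, y in zip(design, responses) if row[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

design = eight_run_design()
# Hypothetical responses: baseline 100, factor 1 shifts the result by +/-2, factor 4 by +/-1.
responses = [100 + 2 * row[0] + 1 * row[3] for row in design]
print(main_effects(design, responses))
# -> [4.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
```

Each effect is a difference between two means of four results, which is the sense in which 8 runs are equivalent to n = 4 for every parameter. The same precision from one-factor-at-a-time studies would need 8 observations per factor.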

Example problem
HPLC analysis of tartrate for monitoring
Method based on aqueous extraction, SPE cleanup and HPLC
Factors of interest:
– Sample size
– SPE flow rate
– Additional SPE cleanup stage (is it useful?)
– LC flow rate
– LC column temperature
– LC buffer pH

Practical problems
The basic AOAC design leaves no degrees of freedom
– and the tartrate design only one
LC temperature and buffer pH cannot be changed randomly during a run
– These four combinations must be in different runs
“Quick” answer:
– Four runs allow replication of SPE experiments and leave a degree of freedom for the LC factors after allowing for run effects

Responses
Primary interest: Measured tartrate or tartrate recovery
Also of interest:
– Is the chromatography likely to be stable?
– Can we ‘measure’ chromatographic quality at the same time as tartrate?
Solution: Monitor LC retention time and LC resolution (theoretical plate count) in the same experiment
– We get the information essentially free

Results - Recovery

Results – Retention time

Results – LC Resolution

An unexpected bonus

Implications for the tartrate method
DO use additional SPE cleanup
DON’T increase the sample size greatly past 2 g
DO keep the LC flow low to keep resolution high
DO consider checking LC resolution on each sample
– if the resolution slips, the result may slip with it

Ruggedness test conclusions
Ruggedness testing isn’t as simple as AOAC make it look
Monitoring more than one ‘response’ is often simple
... and can add a lot to the information available

Conclusions
Extent of validation is still not harmonised across sectors
Different guidance still leaves some experiment sizes unclear
Draft IUPAC Guidance may assist – watch that space!
Power calculations can help, especially in comparing experiments
It is possible to ‘work smarter’ for method validation
– More characteristics per experiment
– Careful design
– More ‘responses’ studied
