Interlaboratory comparisons Ivo Leito Materials:
Ljubljana Interlaboratory comparisons Interlaboratory comparison (ILC) Organization, performance and statistical evaluation of the results of tests on the same or similar test items by two or more laboratories in accordance with predetermined conditions. JCGM 200:2012 International vocabulary of metrology — Basic and general concepts and associated terms (VIM 3)
Types of ILCs according to their objectives
1. ILC for assessing the competence of laboratories (proficiency testing, PT)
–For us this is the most important type of interlaboratory comparison
2. ILC for establishing the certified values of reference materials
3. ILC for validating or standardizing an analysis procedure
4. ILC for establishing traceability
Proficiency testing (PT)
Proficiency testing
Proficiency testing (PT): interlaboratory comparisons to assess the competence of laboratories
The main advantage of PT compared to, e.g., analysis of a CRM: it is independent!
The most important standards on proficiency testing:
–ISO/IEC 17043:2010 Conformity assessment -- General requirements for proficiency testing, ISO, 2010
–ISO 13528:2005 Statistical methods for use in proficiency testing by interlaboratory comparisons, ISO, 2005
Ljubljana Proficiency testing Proficiency testing (PT) is the most common type of intercomparison Ordinary laboratories participate primarily in PT –and rarely in any other ILC-s This course focuses its attention mainly on PT Other types of ILC will be mostly discussed in comparison to PT
Ljubljana Proficiency testing schemes Participation of a laboratory in PT-s should be regular Regularly organized proficiency tests are called Proficiency Testing Schemes (PTS) An individual intercomparison in a PT scheme is called a round
Ljubljana Proficiency testing schemes There are many benefits to organizing proficiency testing as regular schemes: –Permits laboratories to assess their competence on a regular basis –Also, the organizers of such schemes will obtain long term experience and extensive statistical data, the usefulness of which will be discussed below
Organizing proficiency tests
1. Preparation (organizer)
a) Choosing the organizer. Organizing proficiency tests demands a fairly high level of competence and administrative capacity
b) Establishing objectives
c) Defining the matrix and the analyte. An intercomparison can, for example, cover the determination of heavy metals in soil, pesticide residues in foodstuffs, etc.
d) Preparing the test material
e) Preparing sufficiently stable and homogeneous samples from the material
Ljubljana Organizing proficiency tests f)Determining the stability and homogeneity of the samples. g)Establishing reference values for the samples (if possible) There are three ways: –Based on formulation –Based on measurements performed by independent laboratories in the preparatory stage. –Based on the results of the participants: at the data analysis stage of the intercomparison. h)Choosing the participants. In some cases all willing laboratories may participate, in other cases the number of participants is limited.
Ljubljana Organizing proficiency tests 2.Execution (organizers and participants) a)Sample distribution to the participants –Usually the samples are sent by post –If the analysis equipment is portable and/or the analysis can be performed on site, the organizers may invite the participants there for the intercomparison. b)Sample analysis by the participants –participants are usually given from several weeks to a few months to complete the analysis c)Participants report their results to the organizers
Organizing proficiency tests
3. Analyzing the data and drawing up the results (by the organizer)
a) Data analysis
–Often the reference value of the intercomparison is established at this stage (if it is based on the consensus value of the participants)
b) Drawing conclusions, summarizing the results and reporting back to the participants
4. Application of corrective measures by the participants
Ljubljana Sample ILC final report Sample ILC final reports can be downloaded from: – Examples of documents: –
Ljubljana Organizing proficiency tests For established and regularly held proficiency testing schemes some of the stages are skipped: –New organizer is not chosen every time –The objectives are not redefined every time, etc. Below we shall look more closely at some of the stages in organizing a proficiency test.
The organizer
Requirements:
–Competence
–Administrative ability
–Organizers with long-term experience are preferred
–Preferably the organizer should have laboratory facilities for preparing samples and for analysing their stability and homogeneity
But this is not an absolute requirement: an organizer that has only the competence and organisational capacity can still run very good PT schemes
Ljubljana Sample preparation Preparing the test material and samples is the critical stage in organizing a proficiency test The individual laboratories must have samples that ideally have identical analyte contents and identical matrices –This ideal can only be approached but never achieved –Similarity to preparation of CRM-s Maximum possible homogeneity of the test material is very important in achieving this Samples should be prepared in excess
Ljubljana Sample preparation To achieve homogeneity all of the test material is made at the same time (in one batch) and is then divided into samples In reality the analyte content will always vary to some degree between samples, but this variance must be significantly lower than the expected uncertainties of the measurement results of the participants
Ljubljana Obtaining the material to be analyzed Two main ways: Formulation –The material to be analyzed is made by mixing precisely measured amounts of substances and is then divided into samples Possible with: Alloys, water, polymers, … Using “real” materials –Some of the “real” material (soil, meat,...) is homogenized and divided into samples
Ljubljana Determining the homogeneity and stability of samples The variance of analyte contents between different samples must always be determined –This is as serious with PT-s as with CRM-s The stability of the samples must be determined if there is a reason to suspect that they are unstable –This is less serious with PT-s than with CRM-s
Determining the homogeneity and stability of samples
For these determinations a number of samples are analyzed and the results are processed using statistical methods
The resulting data are included in the estimate of the uncertainty of the reference value as the respective uncertainty components: u_var and u_stab
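One possible way to process such homogeneity data is one-way analysis of variance on replicate measurements of several samples. The sketch below is only an illustration under stated assumptions: every sample is measured the same number of times, the function name `between_sample_sd` and the numbers are hypothetical, and taking the resulting between-sample standard deviation as the homogeneity component is one common choice, not necessarily the one used by a given PT organizer.

```python
from math import sqrt
from statistics import mean, variance


def between_sample_sd(replicates):
    """Estimate the between-sample standard deviation by one-way ANOVA.

    replicates: list of lists, one inner list of repeated results per sample.
    Assumption of this sketch: all inner lists have the same length n >= 2.
    """
    n = len(replicates[0])
    if n < 2 or any(len(r) != n for r in replicates):
        raise ValueError("each sample needs the same number (>= 2) of replicates")

    sample_means = [mean(r) for r in replicates]
    # Mean square within samples: pooled variance of the replicates
    ms_within = mean([variance(r) for r in replicates])
    # Mean square between samples: n times the variance of the sample means
    ms_between = n * variance(sample_means)
    # Between-sample variance, clipped at zero when it is masked by
    # the repeatability of the measurement
    var_between = max(0.0, (ms_between - ms_within) / n)
    return sqrt(var_between)


# Hypothetical duplicate results for three samples (same unit throughout)
data = [[10.0, 10.2], [10.4, 10.6], [9.8, 10.0]]
u_var_estimate = between_sample_sd(data)
print(f"between-sample standard deviation: {u_var_estimate:.3f}")
```

The estimate can then be compared with the expected uncertainties of the participants' results: the samples are fit for purpose only if the between-sample spread is clearly smaller.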
Ljubljana The reference values of intercomparison samples In chemical intercomparisons the reference value is generally the best possible estimate for the actual analyte content in the sample –The best estimate of the “true value” If there are several analytes then there are also several reference values
Types of reference values
Independent reference value:
–Determined from data of expert labs
–High reliability
–Has uncertainty and traceability
–Usually expensive
Consensus reference value:
–Determined from the results of the participants
–Lower reliability
–Usually no uncertainty or traceability established
–Inexpensive
Ljubljana Independent vs consensus value
Establishing reference values
Independent reference values:
–Formulated
–Expert labs
Consensus reference values:
–Participant results
Ljubljana Establishing reference values when the material was formulated Usually weighing data are used This enables highly reliable reference values to be established quite simply and cheaply The quality of the reference values is high As quite a sizeable amount of material is prepared, homogenization is critical This approach is only suitable for relatively simple matrices (water and beverages, oils,...)
Uncertainty of reference values when the material was formulated
Calculating the uncertainty:
$$U_{ref} = k \sqrt{u_{formul}^2 + u_{var}^2 + u_{stab}^2}$$
–U_ref is the expanded uncertainty of the reference value
–u_formul is the uncertainty contribution of the formulation procedure
–u_var is the homogeneity contribution to uncertainty
–u_stab is the instability contribution to uncertainty
–k is the coverage factor
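As a rough numerical illustration, the components listed above can be combined as a root sum of squares and multiplied by the coverage factor. This is a minimal sketch that assumes the components are independent standard uncertainties in the same unit; the function name, the default k = 2 and the numbers are hypothetical.

```python
from math import sqrt


def expanded_uncertainty(u_main, u_var, u_stab, k=2.0):
    """Return k * sqrt(u_main^2 + u_var^2 + u_stab^2).

    u_main is the contribution of the formulation (or of the analysis
    procedure), u_var the homogeneity contribution and u_stab the
    instability contribution. k = 2 is an assumed default.
    """
    return k * sqrt(u_main**2 + u_var**2 + u_stab**2)


# Hypothetical standard uncertainty components, all in ppm
U_ref = expanded_uncertainty(u_main=3.0, u_var=4.0, u_stab=0.0)
print(f"expanded uncertainty of the reference value: {U_ref:.1f} ppm")
```

The same helper applies when the reference value comes from expert laboratories, with the contribution of the analysis procedure in place of the formulation contribution.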
Ljubljana Establishing reference values in expert laboratories Reference values are determined by highly competent expert laboratories independent of the participants Highly reliable reference values can be obtained Can be applied to samples of all types Drawbacks: –Takes time –Is expensive –Expert laboratories must be available
Uncertainty of reference values in expert laboratories
Calculating the uncertainty:
$$U_{ref} = k \sqrt{u_{anal}^2 + u_{var}^2 + u_{stab}^2}$$
–U_ref is the expanded uncertainty of the reference value
–u_anal is the uncertainty contribution of the analysis procedure
–u_var is the homogeneity contribution to uncertainty
–u_stab is the instability contribution to uncertainty
–k is the coverage factor
Uncertainty of reference values in expert laboratories
Evaluating the uncertainty component u_anal is somewhat complicated because:
–Usually several laboratories participate
–Their results always differ to some extent
–Different methods are used, so the results have different uncertainties
In measurements of this type usually no expert laboratory is excluded based on statistical criteria
–A reason is always sought to explain the deviations
Case 1 (successful)
These laboratories were found to have deficiencies in their measurement procedures (interference)
The reference value: 345 ± 22 ppm
Case 2 (not very successful)
Deficiencies in the procedures of these laboratories were not found
The reference value: 366 ± 49 ppm
Establishing the consensus value based on the results of the participants
The most common and widespread approach, and the cheapest
It is generally impossible to obtain high-quality reference values from the results of the participants
–Rigorous uncertainty estimates are also usually impossible
We will discuss this case below, in the section dealing with data processing
–and there will be a practical example on this
Ljubljana Inviting participants Generally proficiency tests are open to all who wish to participate –Participation is usually not free and can be quite expensive Sometimes the number of participants is limited –For example it can be open only to the members of the organization organizing the proficiency test
Ljubljana Inviting participants Potential participants invited to the test are given as much information about it as possible –The identity of the organizer –The type and objective of the proficiency test –Information on the samples Approximate composition, storage and handling requirements, sample preparation –It is especially important to specify to what extent the results of the participants will be made publicly available –The preliminary schedule of the intercomparison Registration date, approximate time until sample delivery, the deadline for submitting results and the expected date when the final report will be published
Ljubljana Handling samples (by the participants) This is often critical The participants must follow the instructions of the organizers as closely as possible when handling the samples If the sample is intrinsically unstable then a certain “reference day” (or range of days) may be set – the day when the analysis must be performed –This approach is based on the assumption that all of the samples given to the participants will “age” at about the same rate.
Performing the analysis
It is normally a requirement in PT schemes that the participants use the same procedures for the analysis as they do in their routine work
–Only then is it possible to evaluate the normal day-to-day performance of a laboratory
–Ideally the analysis is performed without the participants' knowledge that the samples have been prepared for proficiency testing
In some cases the organizer may specify a recommended (or even mandatory) sample preparation procedure
Ljubljana Data analysis The primary goal of data analysis is to assess the competence of the participants in performing the type of analysis or measurement addressed by the proficiency test The following are the most important ways of assessing competence: –Using the zeta-score as a criterion –Using the E n number as a criterion –Comparison with legal documentation or with the requirements in standards –Using the z-score as a criterion It is up to the organizer to choose, which of these to use
The zeta score
Applicable if an independent reference value is available for the measured parameter
Calculating the zeta score of participant i:
$$\zeta_i = \frac{X_{lab\_i} - X_{ref}}{\sqrt{u_{C\_lab\_i}^2 + u_{C\_ref}^2}}$$
–X_lab_i is the laboratory result
–X_ref is the independent reference value
–u_C_lab_i is the combined standard uncertainty of the laboratory result
–u_C_ref is the combined standard uncertainty of the reference value
ISO 13528:2005. Statistical methods for use in proficiency testing by interlaboratory comparisons, ISO, 2005
Interpreting zeta-scores
Assessing the quality of a laboratory result:
zeta-score        Result         Measures to be taken by the lab
|zeta| ≤ 2        Good           -
2 < |zeta| < 3    Warning        Preventative action
|zeta| ≥ 3        Unacceptable   Corrective action
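The zeta score and its interpretation can be sketched in a few lines. The sketch assumes the usual definition: the difference between the laboratory result and the reference value, divided by the root sum of squares of the two combined standard uncertainties. The function names and the numbers are hypothetical.

```python
from math import sqrt


def zeta_score(x_lab, x_ref, u_lab, u_ref):
    """Zeta score from a result, a reference value and their
    combined standard uncertainties (all in the same unit)."""
    return (x_lab - x_ref) / sqrt(u_lab**2 + u_ref**2)


def interpret_zeta(zeta):
    """Map a zeta score to the three-level verdict used above."""
    magnitude = abs(zeta)
    if magnitude <= 2:
        return "good"
    if magnitude < 3:
        return "warning"
    return "unacceptable"


# Hypothetical participant: 360 ppm with u = 6 ppm,
# against a reference value of 345 ppm with u = 8 ppm
zeta = zeta_score(360.0, 345.0, 6.0, 8.0)
print(f"zeta = {zeta:.2f}: {interpret_zeta(zeta)}")
```

Note that a laboratory can lower its zeta score simply by reporting a larger uncertainty, which is why the reported uncertainty itself also deserves scrutiny.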
The E_n number
Applicable if an independent reference value is available for the measured parameter
Calculating the E_n number of lab i:
$$E_{n,i} = \frac{X_{lab\_i} - X_{ref}}{\sqrt{U_{lab\_i}^2 + U_{ref}^2}}$$
–X_lab_i is the participant result
–X_ref is the independent reference value
–U_lab_i is the expanded uncertainty of the participant result
–U_ref is the expanded uncertainty of the reference value
ISO 13528:2005. Statistical methods for use in proficiency testing by interlaboratory comparisons, ISO, 2005
Interpreting E_n numbers
Assessing the quality of the participant result:
E_n number    Result         Measures to be taken by the lab
|E_n| ≤ 1     Acceptable     -
|E_n| > 1     Unacceptable   Corrective action
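The E_n number differs from the zeta score only in that expanded uncertainties are used and the limit is 1. A minimal sketch, with hypothetical function names and numbers, and assuming both expanded uncertainties refer to the same coverage level:

```python
from math import sqrt


def en_number(x_lab, x_ref, U_lab, U_ref):
    """E_n number from a result, a reference value and their
    expanded uncertainties (all in the same unit)."""
    return (x_lab - x_ref) / sqrt(U_lab**2 + U_ref**2)


def interpret_en(en):
    """|E_n| <= 1 is acceptable, anything larger is not."""
    return "acceptable" if abs(en) <= 1 else "unacceptable"


# Hypothetical participant: 360 ppm with U = 12 ppm,
# against a reference value of 345 ppm with U = 16 ppm
en = en_number(360.0, 345.0, 12.0, 16.0)
print(f"E_n = {en:.2f}: {interpret_en(en)}")
```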
Ljubljana Example
Pros and cons of the E_n number and zeta score
Pros
–The uncertainties of both the laboratory results and the reference value are taken into account
–Thus it is a metrologically sound approach
Cons
–An independent reference value is a prerequisite
–Gives no direct information on how the laboratories' results measure up to other criteria (legislation, standards, etc.)
–E_n: the level of U is not specified
Comparison to legal requirements or to requirements in standards
A large number of analyses are carried out to assess compliance with the requirements of legislation or standards
Thus, it is natural to view the results from this aspect
Prerequisites:
–Legislation or standards that deal with the procedure in question must exist and must contain criteria for assessing the quality of an analysis result
–The intercomparison must be organized so that the results are comparable to the requirements in legislation or standards
–Usually an independent reference value is necessary
The z-score
The most widely used approach
If there is no independent reference value and a consensus value is used instead, the z-score is the most common way of assessing performance
Calculating the z-score of a participant:
$$z = \frac{X_{lab} - X_{cons}}{s_{target}}$$
–X_lab is the participant result
–X_cons is the consensus value
–s_target is the target standard deviation
The target standard deviation is a quantity that describes the expected variability of the results of the participants
Interpreting z-scores
Assessing the quality of a laboratory result:
z-score        Result         Measures to be taken by the lab
|z| ≤ 2        Good           -
2 < |z| < 3    Warning        Preventative action
|z| ≥ 3        Unacceptable   Corrective action
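For a whole round, z-scores are usually computed for all participants at once. The following sketch assumes that the consensus value and the target standard deviation have already been fixed; the laboratory codes, the numbers and the function names are hypothetical.

```python
def z_score(x_lab, x_cons, s_target):
    """z-score of one participant result."""
    return (x_lab - x_cons) / s_target


def interpret_z(z):
    """Map a z-score to the three-level verdict used above."""
    magnitude = abs(z)
    if magnitude <= 2:
        return "good"
    if magnitude < 3:
        return "warning"
    return "unacceptable"


# Hypothetical round: results in mg/kg, consensus 10.0, target SD 0.25
round_results = {"Lab A": 10.1, "Lab B": 10.6, "Lab C": 9.2}
verdicts = {
    lab: interpret_z(z_score(x, x_cons=10.0, s_target=0.25))
    for lab, x in round_results.items()
}
for lab, verdict in verdicts.items():
    print(lab, verdict)
```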
Pros and cons of the z-score
Pros
–The cheapest and simplest approach
–Almost universally applicable, no independent reference value needed
Cons
–The uncertainties of the participant results are not taken into account
–Performs poorly if there are few participants
–Participants can be biased
–Gives no direct information on how the laboratories' results measure up to other criteria (legislation, standards, etc.)
Determining the consensus value and the target standard deviation
To use z-scores, the consensus value and the target standard deviation must be evaluated
If the intercomparison has no independent reference value, the consensus value must be evaluated from the participant data
There is more flexibility in choosing s_target (see below)
Unlike independent reference values, the consensus value can only be evaluated at the data analysis stage
Determining the consensus value
The consensus value is re-evaluated in every round of an intercomparison:
1. First, outliers are eliminated
2. Then the consensus value is found as the mean or median of the remaining participant results
–If the median is used, eliminating outliers is less important
Ljubljana Eliminating outliers Very important: –It is not uncommon in proficiency tests to discover results that deviate by dozens of times from the mean that are caused by errors in the calculations (for example, forgetting to account for diluting the sample) –It is also not a rare sight to see errors in the usage of units. This traditionally leads to results off by three orders of magnitude (!) –Including such a result can easily influence the mean by enough to completely falsify the intercomparison results
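The effect of a single unit error on the consensus value is easy to demonstrate. The data below are invented for illustration: four plausible results and one that is three orders of magnitude too high.

```python
from statistics import mean, median

# Hypothetical round: five participants report the same analyte in mg/kg.
# The last laboratory mixed up its units, so its value is
# three orders of magnitude too high.
results = [4.9, 5.0, 5.1, 5.2, 5000.0]

consensus_mean = mean(results)      # pulled far away by the single outlier
consensus_median = median(results)  # stays among the plausible results

print(f"mean   = {consensus_mean:.2f}")
print(f"median = {consensus_median:.2f}")
```

The mean ends up above 1000 mg/kg, so every correct laboratory would be scored against a meaningless value, while the median remains at 5.1 mg/kg.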
Ljubljana Eliminating outliers Eliminating outliers is not a trivial task –In the case of proficiency tests only statistical means are usually applicable since it would be too expensive and time consuming to investigate the actual reasons that caused the deviation of the participant data –The elimination of an outlier must be based on correct statistical tests, not on intuition Unfortunately, currently there is no one single universally acknowledged way of determining outliers –Grubbs and Cochran tests –Algorithm A
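Algorithm A does not delete outliers but limits their influence. The sketch below follows the iteration as it is commonly described for ISO 13528 (start from the median and the scaled median absolute deviation, pull extreme results in to 1.5 robust standard deviations, recompute, repeat). The constants 1.483, 1.5 and 1.134 and the stopping rule are stated here as assumptions and should be checked against the standard before use; the function name and the data are hypothetical.

```python
from statistics import mean, median, stdev


def algorithm_a(results, tol=1e-6, max_iter=1000):
    """Robust mean and standard deviation by iterative winsorization.

    Needs at least two results. Returns (robust_mean, robust_sd).
    """
    # Initial estimates: median and scaled median absolute deviation
    x = median(results)
    s = 1.483 * median([abs(r - x) for r in results])

    for _ in range(max_iter):
        delta = 1.5 * s
        # Pull results outside x +/- delta back to the limits
        clipped = [min(max(r, x - delta), x + delta) for r in results]
        new_x = mean(clipped)
        new_s = 1.134 * stdev(clipped)
        converged = abs(new_x - x) <= tol and abs(new_s - s) <= tol
        x, s = new_x, new_s
        if converged:
            break
    return x, s


# Hypothetical round with one gross outlier (unit error)
robust_mean, robust_sd = algorithm_a([4.9, 5.0, 5.1, 5.2, 5000.0])
print(f"robust mean = {robust_mean:.2f}, robust SD = {robust_sd:.2f}")
```

With so few participants the outlier still inflates the robust standard deviation somewhat, but the robust mean stays close to the bulk of the results instead of moving to the thousands.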
Ljubljana Determining the target standard deviation In addition to the consensus value, the target standard deviation s target must be determined It may or may not be the true standard deviation of the participant results There are several approaches to finding the target standard deviation: –On the next slides –Approach 1 dominates
Approaches: advantages and disadvantages
1. A new target standard deviation for each round, calculated directly from participant data (after elimination of outliers)
Advantages:
–Simple to use
–Applicable to any measurement
Disadvantages:
–Yields a good estimate of the target standard deviation only if there are numerous participants in the round
–Makes the results of the rounds incomparable with regard to the general performance of the laboratories
2. The organizer's experience is used to evaluate target standard deviations for all rounds
Advantages:
–Provides an opportunity to compare all rounds in a scheme
–Its success is independent of the number of participants in a round
Disadvantages:
–The organizer must have long-term experience, so it is not applicable to new types of intercomparison
–The assumption is made that the samples in all rounds are similarly complex to analyse
3. Legislation or standards are used to evaluate the target standard deviation
Advantages:
–Provides an opportunity to compare all rounds in a scheme
–The quality of the results a laboratory achieves is comparable to criteria in legislation or standards
Disadvantages:
–Legislation: usually not applicable, since either there is no legislation or it isn't comprehensive enough
–Not all participants necessarily use the same standards or work according to the same legal acts
4. Using Horwitz's equation, $s_{target}(\%) = 2^{(1 - 0.5 \log_{10} C)}$, where s_target is the relative target standard deviation (%) and C is the analyte content in the sample (mass/mass)
Advantages:
–Completely universal, the analyte content is the only variable
Disadvantages:
–Doesn't account for the more specific nature of an analysis, so it is only applicable as a first estimate
–Tends to overestimate the target standard deviation
Ljubljana Horwitz plot
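The Horwitz relation is simple enough to evaluate directly. The sketch assumes the widely quoted form, relative standard deviation (%) = 2^(1 - 0.5 log10 C), with C as a dimensionless mass fraction; the function name and the example concentrations are hypothetical.

```python
from math import log10


def horwitz_rsd_percent(mass_fraction):
    """Predicted relative standard deviation in percent for an analyte
    present at the given mass fraction (e.g. 1e-6 for 1 mg/kg)."""
    if not 0 < mass_fraction <= 1:
        raise ValueError("mass fraction must be in the interval (0, 1]")
    return 2 ** (1 - 0.5 * log10(mass_fraction))


# Illustrative analyte contents: 1 %, 1 mg/kg and 1 ug/kg
for c in (1e-2, 1e-6, 1e-9):
    print(f"C = {c:g}: s_target = {horwitz_rsd_percent(c):.1f} %")
```

To use the value as s_target in a z-score, the percentage has to be converted to the unit of the results by multiplying it by the consensus value and dividing by 100.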
Ljubljana Reporting the results For the intercomparison to be correct, the results must be reported correctly All participants will receive a copy It is important to stick to the confidentiality clauses stipulated when the intercomparison was announced
Ljubljana How to find a PTS: The EPTIS database Finding a suitable PTS is not always easy The best information source is EPTIS (European Proficiency Testing Information System) It is available online: It is maintained by BAM (Bundesanstalt für Materialforschung und Prüfung)
Other types of intercomparisons
We will review:
- ILC for establishing reference values for Certified Reference Materials
- ILC for validating and standardising analysis procedures
ILC to establish the certified values of reference materials
Intercomparisons are among the best ways of establishing certified values for reference materials
Reviewed in the RM lecture
This type of intercomparison is similar to a proficiency test where only expert laboratories are invited to participate and a consensus value is calculated based on their results
Differences from proficiency tests
Intercomparisons held for this purpose differ from proficiency tests in the following aspects:
–The reference value is the focus
–Only expert laboratories are invited to participate
–The participants are chosen so that they will use procedures based on different operating principles
For example, if the analytes in the reference material are polycyclic aromatic hydrocarbons, it would be desirable to have some of the participants use GC-MS for the analysis and others HPLC with a fluorescence detector. Organizing the intercomparison in this way makes it easier to eliminate any systematic bias intrinsic to the methods.
–Participants are invited to a technical discussion after the intercomparison
–After detailed analysis the reference value is established for the reference material
–The results are usually published, i.e. the certificate of a reference material often states explicitly what result a given participant obtained
Ljubljana Establishing reference values Either for ILCs or for CRMs Evaluating the uncertainty of the reference value can be complicated because: –Usually several laboratories participate –Their results always differ to some extent –Different methods are used, so the results have different uncertainties In measurements of this type no laboratory is excluded based on statistical criteria –A reason is always sought to explain the deviations
Case 1 (successful)
These laboratories were found to have deficiencies in their measurement procedures (interference)
The reference value: 345 ± 22 ppm
Case 2 (not very successful)
Deficiencies in the procedures of these laboratories were not found
The reference value: 366 ± 49 ppm
ILC for validation or standardisation of analysis procedures
For an analysis procedure to become a standard procedure that can be used by numerous laboratories, it must be investigated and tested as thoroughly as possible
The procedure must be validated with extreme care
The best way to do this is to organize an intercomparison and have the participants analyse the same samples using the procedure being validated
Ljubljana Differences from proficiency tests Only invited laboratories get to participate and expert laboratories are preferred –The list of participants need not be quite as restricted as in the previous case, when certified reference values were sought. The participants must use the procedure specified and adhere to protocol as closely as possible –If possible a metrological reference value is also established by a group of independent expert laboratories using other methods
Ljubljana Differences from proficiency tests Once the intercomparison is over a meeting is usually held for the participants, where their results will be discussed in detail –The focus will be on the procedure’s suitability for analysing the type of samples used and also on comparing the results of the participants to the independent reference value –Serious consideration is given to any observations and recommendations made by the participants to modify the procedure Outliers are not eliminated based on statistical considerations
Ljubljana Thank you for your participation! The materials are available from: You are always welcome to contact me:
Excellence in Analytical Chemistry (EACH)
Erasmus Mundus joint master's programme with excellent scholarship scheme
Students study the first year in Tartu, and the second in one of three outstanding universities
–Fundamentals of analytical chemistry, metrology in chemistry, quality assurance, socio-economic aspects
–Organic and bioorganic analysis, advanced separation methods, mass spectrometry
–Industrial analysis, process control and monitoring
–Advanced analytical devices, sensors, miniaturization, electrochemistry