Confidence in Metrology: At the National Lab & On the Shop Floor
Alan Steele, Barry Wood & Rob Douglas
National Research Council, Ottawa, CANADA
National Research Council Canada / Conseil national de recherches Canada
Slide 2 Steele Wood and Douglas: Confidence NCSL Canada, September 2001
Outline
Measurements
  – Communications, Comparisons
  – Fluctuations, Predictions
Confidence
  – Comparisons, Proficiency Tests, on the Shop Floor
Probability Calculus
  – confidence intervals
  – confidence levels
A Toolkit for Excel
  – some Visual Basic Code
A Worked Example
  – with real comparison data
Conclusions
Slide 3: Measurement means Communication
The sole purpose of measurement is to communicate an aspect of physical reality from one person*, place and time to another person*, place and time.
* or autonomous system for which a person is responsible
The two people must have in common:
  – an understanding of the measurand
  – a system of numbers and units of measurement
  – a means for describing measurement accuracy
“Alas, my work is all in vain
If it doesn’t get to Roundhead’s brain”
Slide 4: Measurement means Comparison
Any useful measurement is a comparison.
The world uses the SI to provide a network that can inter-relate most of these comparisons.
The implied inter-relationships are checked by special Comparisons for Quality Assurance:
  – Shop floor
  – Calibrations (with NMIs)
  – Proficiency Demonstrations (with NMIs)
  – Bilateral Comparisons (between NMIs)
  – Regional Comparisons (among NMIs)
  – CIPM Key Comparisons (among NMIs)
At NMIs, special definition-based comparisons are also required for the kelvin, second, kilogram, etc.
Slide 5: Measurement means Fluctuations
Usually, fluctuations can be observed in a measurement even when we try to keep everything as constant as possible - WE INCLUDE THIS
Usually, larger fluctuations are observed as temperature, pressure, humidity… are allowed to vary - WE INCLUDE THIS
Usually, we anticipate an even larger range of fluctuations if the measurement were to be made by other reasonable means - WE INCLUDE THIS IN THE STANDARD UNCERTAINTY
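The ISO Guide’s usual way of folding separately estimated fluctuations into a single standard uncertainty is to add the components in quadrature. This Python sketch assumes uncorrelated components with unit sensitivity coefficients; the function name, budget entries and values are hypothetical.

```python
import math

def combined_standard_uncertainty(components):
    """Root-sum-square of uncorrelated standard-uncertainty components.

    Assumes every sensitivity coefficient is 1 and no correlations,
    as in the simplest ISO Guide (GUM) uncertainty budget.
    """
    return math.sqrt(sum(u * u for u in components))

# Hypothetical budget, standard uncertainties in ppm (illustrative only)
budget = {
    "repeatability with everything held constant": 0.3,
    "temperature / pressure / humidity allowed to vary": 0.4,
    "other reasonable measurement methods": 1.2,
}
u_c = combined_standard_uncertainty(budget.values())
print(f"combined standard uncertainty: {u_c:.2f} ppm")
```

Note how the largest component dominates: quadrature addition rewards attention to the biggest term in the budget.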
Slide 6: Measurement means Prediction
The most useful aspect of a measurement is its predictive ability, either explicit or implicit.
The results of past comparisons are used to infer results for future comparisons.
The challenge is to relate environmental conditions, history and aging to the accuracy of a future comparison.
Slide 7: Confidence as a Commodity
Measurement Confidence starts with the CIPM, the BIPM, the CCs (Consultative Committees), and definition-based standards and realizations.
The MRA, the JCRB, Key Comparisons, and Regional and Bilateral Comparisons demonstrate confidence.
Shared research and visits help develop Confidence in equivalence to the SI.
This system builds confidence at the National Lab level.
Slide 8: Confidence as a Commodity
You can buy Confidence from your NMI (NRC, NIST…) as calibration reports and round-robin proficiency tests.
You can multiply Confidence in a well-run lab (CLAS) or on your factory floor.
You can sell Confidence as a commodity within your organization, as well as to your organization’s clients.
To market Confidence, it should be technically rigorous and accessible to non-statisticians.
Slide 9: False Confidence
Any technically unjustified confidence claim is potentially very harmful to a calibration or testing laboratory’s reputation.
Overly strong or technically wrong confidence claims are potentially lethal or actionable.
Sometimes clients need protection from themselves:
“Why do you have to measure it? I just want a calibration certificate for it!!”
Rigour and careful wording can avoid false confidence.
Slide 10: Overly Complicated Confidence
“The equivalence study of eleven 10 Volt zeners showed a difference of Lab A - Lab B = 1.91 ppm with 230 degrees of freedom, where the 1.91 is the expanded uncertainty corresponding to approximately 95% confidence for a Student-t distribution with 230 degrees of freedom, k=1.97 times the pair standard uncertainty, 0.97 ppm, of the pair difference determined from the internal standard uncertainty statements of the measurements from the two laboratories ( 1.06 ppm for Lab A and 1.49 ppm for Lab B), with a correlation coefficient of accounting for a covariance of +1.2x The external standard deviation was also evaluated with 21 degrees of freedom and gave a Birge ratio of 1.2.”
There is a very limited market for this type of Confidence Statement, which still requires the user to deal with the 2.7 ppm bias it reveals...
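The arithmetic buried in this quote can be checked with a few lines of Python (a sketch, not the Toolkit’s Visual Basic). It uses the standard expression for the uncertainty of a difference of two correlated results, u²(A − B) = u_A² + u_B² − 2 cov(A, B). The covariance is an assumption on our part: the quoted “+1.2x…” is truncated, and reading it as +1.2 ppm² reproduces the quoted 0.97 ppm pair uncertainty and the 1.91 ppm expanded uncertainty. The function name pair_uncertainty is hypothetical.

```python
import math

def pair_uncertainty(u_a, u_b, covariance=0.0):
    """Standard uncertainty of the difference A - B.

    u(A - B)^2 = u_A^2 + u_B^2 - 2 cov(A, B); a positive covariance
    (shared systematic effects) shrinks the pair uncertainty.
    """
    return math.sqrt(u_a**2 + u_b**2 - 2.0 * covariance)

u_a, u_b = 1.06, 1.49    # ppm, internal standard uncertainties from the quote
cov_ab = 1.2             # ppm^2, assumed reading of the truncated "+1.2x..."
u_pair = pair_uncertainty(u_a, u_b, cov_ab)
r = cov_ab / (u_a * u_b)          # implied correlation coefficient
U_pair = 1.97 * u_pair            # expanded uncertainty, k = 1.97 as quoted
print(f"u_pair = {u_pair:.2f} ppm, r = {r:.2f}, U = {U_pair:.2f} ppm")
# u_pair = 0.97 ppm, r = 0.76, U = 1.91 ppm
```

Without the correlation, the same two labs would have a pair uncertainty of about 1.83 ppm, so the covariance term matters a great deal here.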
Slide 11: Simple Confidence Statements
Not rigorous: “Lab A and Lab B are equivalent.”
Has potential: “10 V measurements from Lab A and Lab B can be expected to agree with each other within 4.3 ppm, 19 times out of 20.”
Slide 12: Improved Confidence Statements
The Mutual Recognition Arrangement formalizes the Key Comparison differences as the preferred means for generating confidence about equivalence.
New methods are being used to transform comparisons into statements of confidence like:
“On the basis of this Comparison, similar measurements made by Lab A and Lab B can be expected to agree with each other to within 4.3 ppm, with 95% confidence [or 19 times out of 20].”
This clearer Confidence Statement has a wider market.
Slide 13: Communicating with your Clients
Clarity is important to:
  – users of your measurements
  – your users’ management and QA managers
  – your users’ clients
  – your management
Your NMI can help you to communicate confidence clearly.
Slide 14: Confidence from NMIs
The methods used to create statements of confidence for Key Comparisons can be used for proficiency testing done by your NMI.
Some calibration reports can also be used to generate this type of confidence statement, provided that the travel uncertainty of the artefact is under proper statistical control.
Slide 15: Confidence for your Clients
The methods used to create statements of confidence for Key Comparisons can be used for proficiency testing done by you on your factory floor.
These statements are the simplest quantitative expressions of the equivalence of two measurement stations.
Slide 16: Proficiency Testing
Accreditation bodies routinely specify that “proficiency testing” on a regularly scheduled basis is a requirement for maintaining accreditation.
Usually the Pilot Laboratory for the comparison is the National Metrology Institute.
Usually the Pilot Laboratory result is taken as the comparison reference value, and the participants’ results are initially evaluated against this “truth”.
This is a time-consuming and expensive exercise!
Slide 17: Proficiency Demonstrations
A pilot lab measures and sends one or more artefacts around to be measured at other labs.
The pilot re-measures the artefact.
The pilot receives the other labs’ measurements, analyzes them in escrow as comparisons, assigns travel uncertainty and prepares a report.
Slide 18: Proficiency Demonstrations
The CIPM organizes them for NMIs.
NMIs (such as NRC) organize them for you.
Do you organize them for yourself?
Do you organize them for your clients?
Slide 19: Proficiency Demonstrations: NMIs
A pilot NMI measures and sends one or more artefacts around to be measured by other NMIs.
The pilot NMI re-measures the artefact.
The pilot NMI receives the other NMIs’ measurements, analyzes them in escrow as comparisons, assigns travel uncertainty and prepares a report; the CC and the CIPM approve the report, and the results are posted on the Internet.
Slide 20: Proficiency Demonstrations: CLAS labs
NRC measures and sends one or more artefacts around to be measured by CLAS labs.
NRC re-measures the artefact.
NRC receives the CLAS labs’ measurements, analyzes them in escrow as comparisons, assigns travel uncertainty and prepares a report.
Slide 21: Proficiency Demonstrations: Shop-Floor
You measure and send one or more artefacts around to be measured by the instruments you normally calibrate.
You re-measure the artefact.
You receive the other workstations’ measurements, analyze them in escrow as comparisons, assign travel uncertainty and prepare a report.
Slide 22: Proficiency Demonstrations vs Calibrations
Proficiency Demonstrations:
  – evaluate travel uncertainty better
  – evaluate everything affecting the best capabilities, including environment and operator...
  – can establish tighter equivalence
  – require more artefacts and more organization
  – have new statistical tools and toolkits available for evaluating comparisons
Slide 23: Comparisons
Measurement comparisons provide the main experimental evidence for “equivalence”.
In general, all participants measure a common artefact and their various results are analyzed from a single common perspective.
The participants may be different laboratories, or different measurement stations on your shop floor.
Slide 24: Key Comparisons and NMIs
National Metrology Institutes have recently signed a “Mutual Recognition Arrangement” in which the validity of their Calibration and Measurement Capabilities is expressed.
The scientific underpinning for this arrangement is a series of “Key Comparisons” which are conducted at the very highest levels of metrology.
In practice, they are not much different from the proficiency tests already in general use among accredited laboratories around the world.
Slide 25: Reporting Results
A metrologist reports a result in two parts:
  – the mean value: $m_L$
  – the uncertainty: $u_L$
The results are plotted as data points with error bars.
Slide 26: Uncertainty Budgets
The ISO Guide to the Expression of Uncertainty in Measurement is widely used as the basis for formulating and publishing laboratory uncertainty statements regarding measurement capabilities.
“Error bars” are an intrinsically probabilistic description of our belief in “what will happen next time”, based on what we have done in the past.
Slide 27: Uncertainty Budgets
“Error bars” are intrinsically probabilistic.
For a normal distribution, the standard uncertainty interval contains ~68% of the events: 68% of the histogrammed events, or 68% of the area under the “probability density function” (in the physical sciences often referred to as the probability distribution).
Slide 28: Probability Distributions
An ISO Guide-compliant uncertainty statement means that the error bars represent the most expert opinion about the underlying normal (Gaussian) probability distribution.
The fancy name for working with these distributions is Probability Calculus.
In general, we are interested in integrals of the probability distribution.
Integration is only “fancy addition”.
Slide 29: Confidence Levels
A confidence level is what we get upon integrating a probability distribution over a given range [a, b].
The fractional probability of observing a value between a and b is the integral of the normalized probability density function over the range [a, b].
This is just the addition of all the ‘bits’ of the function between a and b.
(Figure: ±1σ contains 68%; ±2σ contains 95%)
Slide 30: Confidence Intervals
Remember: a confidence level is what we get by integrating the distribution over a given range [a, b].
The confidence interval is the fancy name for the range associated with the confidence level.
For a normal distribution:
  – the range [−1σ, +1σ] is the 68% confidence interval
  – the range [−2σ, +2σ] is the (approximately) 95% confidence interval; strictly it contains 95.45%, and the exact 95% interval is [−1.96σ, +1.96σ]
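The confidence levels and intervals of Slides 29 and 30 can be reproduced with Python’s standard `statistics.NormalDist` (Python 3.8 or later). This is a sketch; the helper names confidence_level and confidence_interval are ours, not the Toolkit’s.

```python
from statistics import NormalDist

def confidence_level(a, b, mean=0.0, sigma=1.0):
    """Probability that a normally distributed value lies in [a, b]."""
    dist = NormalDist(mean, sigma)
    return dist.cdf(b) - dist.cdf(a)

def confidence_interval(level, mean=0.0, sigma=1.0):
    """Symmetric interval about the mean holding `level` of the probability."""
    half_width = sigma * NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - half_width, mean + half_width

print(f"[-1 sigma, +1 sigma]: {confidence_level(-1, 1):.4f}")   # 0.6827
print(f"[-2 sigma, +2 sigma]: {confidence_level(-2, 2):.4f}")   # 0.9545
print(confidence_interval(0.95))   # about (-1.96, +1.96)
```

The level is the “fancy addition” (integration) of the curve between the limits; the interval simply runs that calculation backwards.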
Slide 31: Why would you want to do this?
Lots of time and energy (and expense!) are invested in creating a laboratory result in a comparison.
Getting the maximum amount of information from a measurement comparison is desirable.
You’d like to show off your “confidence” to colleagues (and auditors!).
Quantifying things is what we do as metrologists.
Your clients may want specific quantified answers to questions of Demonstrated Equivalence based on your Proficiency Testing results.
Slide 32: How hard is it to do this?
With normal distributions, the arithmetic is pretty easy.
You can try this for yourself and really see how it works…
…or you can let us do it for you!
We have generated simple expressions to help evaluate normal confidence levels and normal confidence intervals, using well-known statistical methods developed over the last hundred years or so.
We have put these expressions into a Toolkit for Excel.
Slide 33: A Toolkit for Excel
At NRC, we have written a Quantified Demonstrated Equivalence Toolkit for Microsoft Excel®.
The Toolkit is freely available by contacting us at
We’ll add you to our mailing list and send you a copy of the sample spreadsheet with the Toolkit, plus a “User’s Guide” in .pdf format.
Slide 34: Toolkit Functions and Macros
The Toolkit contains Functions to:
  – calculate pair uncertainties (including correlations)
  – calculate weighted averages
  – calculate confidence levels
  – calculate confidence intervals
The Toolkit contains Macros to:
  – generate bilateral “tables of equivalence”
  – generate bilateral “tables of confidence intervals”
  – generate bilateral “tables of confidence levels”
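As an illustration of one of these functions, here is a Python analogue (not the Toolkit’s Visual Basic) of an inverse-variance weighted average. It assumes the results are uncorrelated; the function name and the input values are hypothetical.

```python
import math

def weighted_average(values, uncertainties):
    """Inverse-variance weighted mean and its standard uncertainty.

    Assumes uncorrelated results; each weight is w_i = 1 / u_i^2.
    """
    weights = [1.0 / u**2 for u in uncertainties]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)

# Hypothetical deviations from nominal and standard uncertainties, in ppm
mean, u_mean = weighted_average([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])
print(f"weighted mean = {mean:.3f} ppm, u = {u_mean:.3f} ppm")
```

The third result, with twice the uncertainty, carries only a quarter of the weight of each of the others.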
Slide 35: Toolkit Philosophy and Operation
Functions and Macros are built right into the spreadsheet, and work just like “regular” Excel components.
Slide 36: Toolkit Philosophy and Operation
The code is written in Visual Basic.
You can examine the code to see how it works.
Long variableNames help to “self-document” the programs.
You don’t have to look at the code or write your own functions to use the QDE Toolkit from NRC.
Slide 37: A Worked Example
13 laboratories participated in a Proficiency Test at 10 k
Slide 38: Comparison to the NMI: E_n
One common measure of success in Proficiency Tests is the “Normalized Error”, E_n.
This is the ratio of the laboratory’s deviation from the reference value to the combined expanded (k = 2) uncertainty of that deviation, where $U_{\mathrm{Lab}}$ and $U_{\mathrm{Ref}}$ are the expanded uncertainties of the two results:
$$E_n = \frac{\lvert m_{\mathrm{Lab}} - m_{\mathrm{Ref}} \rvert}{\sqrt{U_{\mathrm{Lab}}^2 + U_{\mathrm{Ref}}^2}}$$
Generally, the Laboratory “passes” when E_n < 1.
E_n is a dimensionless quantity.
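The normalized error is a one-line calculation. A Python sketch follows; the function name and the values (in ppm) are hypothetical.

```python
import math

def normalized_error(m_lab, m_ref, U_lab, U_ref):
    """E_n = |m_lab - m_ref| / sqrt(U_lab^2 + U_ref^2), with U at k = 2."""
    return abs(m_lab - m_ref) / math.sqrt(U_lab**2 + U_ref**2)

# Hypothetical results in ppm: lab deviates by 3.0, U_lab = 4.0, U_ref = 3.0
e_n = normalized_error(3.0, 0.0, 4.0, 3.0)
print(f"E_n = {e_n:.2f} -> {'pass' if e_n < 1 else 'fail'}")
```

Because both numerator and denominator carry the same units, the units cancel, which is why E_n is dimensionless.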
Slide 39: Comparison to the NMI: QDC
A quantified approach to Proficiency Tests is to ask the following question:
What is the probability that a repeat comparison would yield results such that Lab 1’s 95% uncertainty interval encompasses the Pilot Lab value?
We call this “Quantified Demonstrated Confidence” (QDC).
QDC is a dimensionless quantity expressed in %.
Slide 40: Comparison to the NMI: E_n vs QDC
E_n and QDC are both dimensionless quantities.
E_n and its interpretation as an acceptance criterion are difficult to explain to non-metrologists.
QDC and its numerical value are easily explained to non-metrologists.
Note that when E_n = 1 (and U_Ref << U_Lab), QDC ≈ 50%.
(Figure: Normalized Error vs Quantified Demonstrated Confidence)
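One way to compute QDC, sketched here under stated assumptions: a repeat difference D = m_Lab − m_Ref is taken to be normal, centred on the observed difference, with the pair standard uncertainty as its standard deviation, and the lab’s 95% interval is taken as ±U_Lab with k = 2. QDC is then P(|D| ≤ U_Lab). The function name qdc is hypothetical. The example checks the rule of thumb above and gives about 50%.

```python
import math
from statistics import NormalDist

def qdc(d, u_pair, U_lab):
    """Probability that |D| <= U_lab when D ~ N(d, u_pair).

    d      : observed difference m_lab - m_ref
    u_pair : standard uncertainty of that difference
    U_lab  : half-width of the lab's 95 % uncertainty interval
    """
    dist = NormalDist(d, u_pair)
    return dist.cdf(U_lab) - dist.cdf(-U_lab)

# Rule-of-thumb check: E_n = 1 with U_ref << U_lab (hypothetical values)
u_lab, u_ref = 1.0, 0.005             # standard uncertainties (k = 1)
U_lab, U_ref = 2 * u_lab, 2 * u_ref   # expanded uncertainties (k = 2)
d = math.sqrt(U_lab**2 + U_ref**2)    # deviation that makes E_n exactly 1
print(f"QDC = {qdc(d, math.hypot(u_lab, u_ref), U_lab):.1%}")
```

The lab’s result sits right at the edge of its own interval, so a repeat comparison is about equally likely to land inside or outside it.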
Slide 41: Comparison to the NMI: QDE_0.95
A different quantified approach to Proficiency Tests is to ask the following question:
Within what confidence interval can I expect the Lab 1 value and the Pilot Lab value to agree, with a 95% confidence level?
We call this “Quantified Demonstrated Equivalence” (QDE).
QDE_0.95 is a dimensioned quantity, expressed in the same units as the measurement results.
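A minimal sketch of how QDE_0.95 can be evaluated, assuming that a repeat difference D is normal, centred on the observed bias d, with the pair standard uncertainty as its standard deviation. QDE_0.95 is then taken as the half-width Q of the interval [−Q, +Q], centred on zero, that holds 95% of that distribution. The function name qde and the bisection solver are our own choices. With the 2.7 ppm bias and 0.97 ppm pair uncertainty of the 10 V example on Slide 10, it gives the 4.3 ppm of the simple statement on Slide 11.

```python
from statistics import NormalDist

def qde(d, u_pair, level=0.95, tol=1e-9):
    """Half-width Q of [-Q, +Q] holding `level` of D ~ N(d, u_pair).

    Solved by bisection, since P(|D| <= Q) increases monotonically with Q.
    """
    dist = NormalDist(d, u_pair)

    def coverage(q):
        # probability that the repeat difference falls inside [-q, +q]
        return dist.cdf(q) - dist.cdf(-q)

    lo, hi = 0.0, abs(d) + 10.0 * u_pair
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = qde(2.7, 0.97)   # 2.7 ppm bias, 0.97 ppm pair standard uncertainty
print(f"QDE_0.95 = {q:.1f} ppm")   # 4.3 ppm
```

For zero bias this reduces to the familiar 1.96 times the pair uncertainty; a bias widens the interval because the distribution is no longer centred on zero.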
Slide 42: Comparison between Labs: Agreement
We can ask similar questions about agreement between any two participants in the experiment:
Within what confidence interval (in ppm) can I expect the Lab 1 value and the Lab 2 value to agree, with a 95% confidence level?
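Asking this question for every pair of participants gives a bilateral table of equivalence, like the ones the Toolkit’s macros generate. The sketch is a Python stand-in, not the Toolkit code: it uses the same assumed normal model and bisection as a QDE_0.95 calculation, the three labs and their values are invented purely for illustration, and the labs are assumed uncorrelated.

```python
import math
from statistics import NormalDist

def qde(d, u_pair, level=0.95, tol=1e-9):
    """Half-width Q of [-Q, +Q] holding `level` of D ~ N(d, u_pair)."""
    dist = NormalDist(d, u_pair)
    lo, hi = 0.0, abs(d) + 10.0 * u_pair
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dist.cdf(mid) - dist.cdf(-mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical results: (deviation from nominal, standard uncertainty), ppm
labs = {"Lab 1": (0.0, 0.5), "Lab 2": (1.5, 0.8), "Lab 3": (-1.0, 1.0)}

table = {}
for a, (x_a, u_a) in labs.items():
    for b, (x_b, u_b) in labs.items():
        if a != b:
            # uncorrelated labs assumed: u_pair = sqrt(u_a^2 + u_b^2)
            table[a, b] = qde(x_a - x_b, math.hypot(u_a, u_b))

names = list(labs)
print("QDE_0.95 (ppm)".ljust(16) + "".join(n.rjust(8) for n in names))
for a in names:
    cells = ("--" if a == b else f"{table[a, b]:.2f}" for b in names)
    print(a.ljust(16) + "".join(c.rjust(8) for c in cells))
```

Each entry is always larger than the observed difference between the two labs, and the table is symmetric, since the agreement interval does not depend on which lab is subtracted from which.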
Slide 43: Comparison between Labs: Confidence
What if we ask:
What is the probability that a repeat comparison would yield results such that Lab 1’s 95% uncertainty interval encompasses Lab 2’s value?
Or how about:
What is the probability that a repeat comparison would yield results such that Lab 2’s 95% uncertainty interval encompasses Lab 1’s value?
Slide 44: Comparison between Labs: Confidence
The answers to these questions of Quantified Demonstrated Confidence are shown here.
Slide 45: Quantifying Equivalence
What is the probability that a repeat comparison would have a Lab 2 value within Lab 1’s 95% uncertainty interval?
Probability Calculus tells us the answer: QDC = 47%.
This is exactly the type of “awkward question” that a client might ask!
Slide 46: Quantifying Equivalence
What is the probability that a repeat comparison would have a Lab 1 value within Lab 2’s 95% uncertainty interval?
Probability Calculus tells us the answer: QDC = 22%.
These subtly different “awkward” questions have very different “straightforward” answers!
Slide 47: Tricky things about Equivalence
Equivalence is not transitive
  – Lab 1 and Lab 2 may both be “equivalent” to the Pilot, but not to each other!
Equivalence is not commutative
  – we are asking two very different questions here!
(Figure: two 95% intervals, with QDC = 47% and QDC = 22%)
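The lack of commutativity can be seen numerically. In this sketch (hypothetical values, uncorrelated labs, 95% intervals taken as k = 2, and the same assumed normal model for a repeat difference), one observed difference gives two different QDC values depending on whose interval is used, roughly 67% versus 31%, simply because the two labs quote different uncertainties. The function name qdc is hypothetical.

```python
import math
from statistics import NormalDist

def qdc(d, u_pair, U_interval):
    """P(|D| <= U_interval) for a repeat difference D ~ N(d, u_pair)."""
    dist = NormalDist(d, u_pair)
    return dist.cdf(U_interval) - dist.cdf(-U_interval)

# Hypothetical labs: Lab 1 quotes a wider 95 % interval than Lab 2
u1, u2 = 1.0, 0.5              # standard uncertainties, ppm
U1, U2 = 2 * u1, 2 * u2        # 95 % (k = 2) half-widths, ppm
d = 1.5                        # observed Lab 1 - Lab 2 difference, ppm
u_pair = math.hypot(u1, u2)    # uncorrelated labs assumed

qdc_2_in_1 = qdc(d, u_pair, U1)   # Lab 2's value inside Lab 1's interval
qdc_1_in_2 = qdc(d, u_pair, U2)   # Lab 1's value inside Lab 2's interval
print(f"Lab 2 within Lab 1's interval: {qdc_2_in_1:.0%}")
print(f"Lab 1 within Lab 2's interval: {qdc_1_in_2:.0%}")
```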
Slide 48: Conclusions
You are already doing quite a bit of Probability Calculus when you present your results.
The arithmetic for quantified calculations is very straightforward when we have Normal Distributions.
Adding Statistical Confidence explicitly into your Lab’s results helps you to explain them to non-metrologists, and to present precisely what Proficiency Testing has demonstrated for:
  – equivalence from different National Laboratories
  – accreditation assessment
  – your clients
  – your factory floor