Published by Quentin Scott; modified over 9 years ago
1
Statistics & graphics for the laboratory
Applications: Reference interval & Biological variation
Kristian Linnet, MD, PhD (Linnet@post7.tele.dk)
Per Hyltoft Petersen, MSc (Per.hyltoft.petersen@ouh.fyns-amt.dk)
Sverre Sandberg, MD, PhD (Sverre.sandberg@isf.uib.no)
Linda Thienpont (Linda.thienpont@ugent.be)
Dietmar Stöckl (Dietmar@stt-consulting.com)
In cooperation with AQML: D Stöckl, L Thienpont
2
Prof Dr Linda M Thienpont
University of Gent, Institute for Pharmaceutical Sciences, Laboratory for Analytical Chemistry
Harelbekestraat 72, B-9000 Gent, Belgium
e-mail: linda.thienpont@ugent.be

STT Consulting
Dietmar Stöckl, PhD
Abraham Hansstraat 11, B-9667 Horebeke, Belgium
e-mail: dietmar@stt-consulting.com
Tel + FAX: +32/5549 8671

Copyright: STT Consulting 2007
3
Content overview
Reference interval
- Introduction
- Data presentation: histogram; normal probability plot & rankit transformation; graphical interpretation of rankit plots
- Partitioning
- Statistical estimation: parametric and nonparametric
Biological variation
- Introduction
- Estimation (ANOVA application)
- Index of individuality
- Comparison of a result with a reference interval ("grey zone")
- Reference change value (RCV)
4
Estimation of reference intervals: overview (flowchart)
REFERENCE INDIVIDUALS comprise a REFERENCE POPULATION
→ from which is selected a REFERENCE SAMPLE GROUP
→ on which are determined REFERENCE VALUES
→ on which is observed a REFERENCE DISTRIBUTION
→ from which are calculated REFERENCE LIMITS
→ that may define REFERENCE INTERVALS
→ that help with the interpretation of an OBSERVED VALUE
5
Inclusion criteria (NORIP, Malmø 27/4-2004)
The reference individual should:
- be feeling subjectively well
- have reached the age of 18
- not be pregnant or breast-feeding
- not have been an in-patient in a hospital nor been subjectively dangerously ill during the last month
- not have had more than 2 measures of alcohol (24 g) in the last 24 hours
- not have given blood as a donor in the last five months
- not have taken prescribed drugs other than the P-pill or estrogens (female sex hormones) during the last two weeks
- not have smoked in the last hour prior to blood sampling
Preanalytical conditions (NORIP, Malmø 27/4-2004)
- Reference individual: sitting for at least 15 min before sampling
- Sample collection: Li-heparin plasma or serum, EDTA blood for haematology; standard procedure; minimal stasis
- Sample handling (plasma and serum): stored in the dark; storage at room temperature before centrifugation: serum 0.5-1.5 h, plasma max. 15 min; centrifugation: 10 min at min. 1500 g; distributed to secondary tubes within 2 h; stored at -80 °C within 4 h
Outliers
Gross or slight deviation → check records → check results → re-analyse? include? omit?
6
Data presentation & inspection
Tools for presentation and inspection of distributions: histogram; normal probability plot and "rankit transformation".
Rankit transformation (reference population: Gaussian distribution; Hyltoft Petersen P, Hørder M. Influence of analytical quality on test results. Scand J Clin Lab Invest 1992;52 Suppl 208:65-87): the frequency distribution is transformed into the cumulative frequency distribution and then into the rankit plot (normal probability plot).
In the normal probability plot, the values are plotted on the x-axis and their normalized deviation from the mean (z-value, or rankit) on the y-axis. In the figure, a second axis has been added on which the probabilities corresponding to the z-values can be read. Note that this second axis is non-linear and has to be inserted as a picture; it cannot be created in EXCEL. The tick marks, however, can be programmed into an EXCEL chart (see NormalRankitPlot.xls).
Use: visual test for normal distribution; the data should fall on a straight line.
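The rankit transformation can be sketched in a few lines of Python as an alternative to the EXCEL workbook. This is a minimal sketch, assuming that the rankit (expected normal order statistic) is approximated by Blom's plotting position (i - 3/8)/(n + 1/4); other plotting positions exist, and the function name `rankits` and the data are hypothetical. Plotting the returned pairs (value on x, rankit on y) gives the normal probability plot.

```python
from statistics import NormalDist


def rankits(values):
    """Return (sorted value, approximate rankit) pairs for a normal probability plot.

    Assumption: rankits are approximated with Blom's plotting position
    (i - 3/8) / (n + 1/4), where i is the 1-based rank.
    """
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, SD 1
    return [(x, std_normal.inv_cdf((i - 0.375) / (n + 0.25)))
            for i, x in enumerate(xs, start=1)]


if __name__ == "__main__":
    data = [0.6, 0.8, 0.9, 1.0, 1.1, 1.3, 1.6]  # hypothetical values
    for x, z in rankits(data):
        print(f"{x:5.2f}  {z:+.3f}")
```

If the data are normally distributed, the printed z-values increase roughly linearly with the values.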
7
The rankit plot: triacylglyceride example
Effect of imprecision (left figure) and bias (right figure) on the normal probability plot:
An increase in imprecision (here 1.5 x) rotates the line clockwise and changes the probability at z = 1.65 from 95% to 84%.
The introduction of a bias (here = 1) moves the line to the right and changes the probability at z = 1.65 from 95% to 74%.
(Workbook: NormalRankitPlot)
8
The rankit plot: bimodal situation (left population healthy, right population diseased)
When the plot is applied to the bimodal situation, the false negatives (FN) and the false positives (FP) can be read directly. Note that the healthy are cumulated from right to left. Under the conditions chosen (diseased population at a distance of +2 SD, cutoff = 1.28 SD), FN = 24% and FP = 10%.
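The FN and FP values read from the plot can be checked with the normal cumulative distribution function. A minimal sketch, assuming both populations are Gaussian with SD = 1 (in SD units of the healthy population) and that values above the cutoff are called positive; `fp_fn` is a hypothetical helper name.

```python
from statistics import NormalDist


def fp_fn(cutoff_sd, distance_sd):
    """False-positive and false-negative fractions for two unit-SD Gaussians.

    Healthy ~ N(0, 1), diseased ~ N(distance_sd, 1); values above the cutoff
    are called positive. Equal SDs in both groups are an assumption.
    """
    healthy = NormalDist(0.0, 1.0)
    diseased = NormalDist(distance_sd, 1.0)
    fp = 1.0 - healthy.cdf(cutoff_sd)  # healthy individuals above the cutoff
    fn = diseased.cdf(cutoff_sd)       # diseased individuals below the cutoff
    return fp, fn


fp, fn = fp_fn(cutoff_sd=1.28, distance_sd=2.0)
print(f"FP = {fp:.0%}, FN = {fn:.0%}")  # FP = 10%, FN = 24%
```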
9
Data inspection: examples
Uric acid (µmol/l), simulation (distributions moved apart!)
            Female   Male
Mean        250      370
SD          40       40
n           1000     1000
Depending on the bin size, bimodal distributions may be hidden in histograms!

Uric acid, close to reality but normally distributed
            Female   Male
Mean        250      330
SD          55       65
n           1000     1000
Graphical techniques are too weak to uncover bimodal situations where the population means are close together!

Test for normal distribution    P
Chi-square                      0.836
Kolmogorov-Smirnov              0.249
Anderson-Darling                0.02
D'Agostino-Pearson              0.016
Statistical tests may reveal that "something is wrong" with the distribution (it is not normal). This may prompt a search for subgroups. However, different tests may have enormously different power!
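The effect of bin size can be reproduced with the simulated uric acid populations from the first table (Female: mean 250, SD 40; Male: mean 370, SD 40; n = 1000 each). A minimal sketch, assuming Gaussian data drawn with Python's `random` module and a hypothetical `histogram` helper; narrow bins should show two peaks, while wide bins can merge them into a flat or single-peaked shape.

```python
import random


def histogram(values, bin_width, start):
    """Count values per bin [start + k*width, start + (k+1)*width)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return [(start + k * bin_width, counts[k]) for k in sorted(counts)]


rng = random.Random(1)  # arbitrary seed for reproducibility
female = [rng.gauss(250, 40) for _ in range(1000)]
male = [rng.gauss(370, 40) for _ in range(1000)]
mixed = female + male

for width in (20, 100):
    print(f"bin width {width} µmol/L")
    for left, count in histogram(mixed, width, start=0):
        # one '#' per 20 observations gives a crude text histogram
        print(f"  {left:5.0f}-{left + width:<5.0f} {count:5d} " + "#" * (count // 20))
```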
10
Data transformation: logarithms
When the data are not normally distributed, a transformation can be tried. Because biological data are often log-normally distributed, a logarithmic transformation can make them normally distributed.
Test for normality: triglycerides (see Datasets.xls)
n = 282; lowest value: 0.3 mmol/L; highest value: 3.2 mmol/L; median: 0.92 mmol/L.
CBstat Anderson-Darling test                   P          Conclusion
Original data                                  P < 0.01   data not normally distributed
After logarithmic (natural) transformation     P = 0.13   data consistent with a log-normal distribution
Normal probability plot (ln-transformed data): the data are "on a line", i.e. the data are ln-normally distributed.
11
Working with logarithms: calculating the reference interval of a log-normal distribution (triglycerides)
1. Transform the original data to ln.
2. Calculate the mean of the ln(x_i) values.
3. Take the anti-ln of the mean of ln(x_i).
This equals the geometric mean of the original distribution, which is close to its median: $e^{-0.0689} = 0.93$ mmol/L, where the geometric mean is given by $(x_1 \cdot x_2 \cdots x_n)^{1/n}$. The anti-ln of the SD is meaningless (it is not the SD of the original data).
Calculation of the 2.5th and 97.5th percentiles
Mean (ln-transformed): -0.0689
SD (ln-transformed): 0.395
2.5th percentile: -0.0689 - 1.96 × 0.395 = -0.843
97.5th percentile: -0.0689 + 1.96 × 0.395 = 0.7053
Anti-ln of the 2.5th and 97.5th percentiles: 0.43 - 2.02 mmol/L
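The three steps and the percentile calculation can be sketched as follows. A minimal sketch, assuming the ln-transformed data are normally distributed; the function names `ln_summary` and `lognormal_reference_interval` are hypothetical. With the slide's ln-scale mean and SD it reproduces the 0.43 - 2.02 mmol/L interval.

```python
import math
import statistics


def ln_summary(values):
    """Steps 1-2: mean and SD of the ln-transformed values."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)


def lognormal_reference_interval(mean_ln, sd_ln, z=1.96):
    """Back-transformed 2.5th and 97.5th percentiles (anti-ln of mean -/+ z*SD)."""
    lower = math.exp(mean_ln - z * sd_ln)
    upper = math.exp(mean_ln + z * sd_ln)
    return lower, upper


# ln-scale summary statistics of the triglyceride data from the slide
low, high = lognormal_reference_interval(-0.0689, 0.395)
geo_mean = math.exp(-0.0689)  # step 3: geometric mean of the original data
print(f"geometric mean = {geo_mean:.2f} mmol/L")               # 0.93
print(f"reference interval = {low:.2f} - {high:.2f} mmol/L")   # 0.43 - 2.02
```

For raw data, `lognormal_reference_interval(*ln_summary(data))` performs all steps in one call.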
12
Partitioning of reference intervals
Visual, on the basis of suspected differences (sex, race, age, …): frequency polygon; rankit plot.
13
Example: partitioning, visual
Comparison of orosomucoid values: Caucasians and Indians in Leeds (Johnson et al. CCLM 2004;42:792-9).
Statistical criteria for partitioning (Lahti et al. Clin Chem 2002;48:338-52), based on the difference D between the two upper (or the two lower) limits:
D < 0.25·s: no partitioning
D = 0.25-0.75·s: variable
D > 0.75·s: partitioning
or based on percentages (Pb, Pa): thresholds 0.9 % and 4.1 %.
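The D-based rule can be written as a small decision function. A minimal sketch: the thresholds 0.25·s and 0.75·s come from the slide, while s must be taken as defined by Lahti et al. (it is not restated here); the function name and the returned labels are hypothetical.

```python
def partitioning_advice(d, s):
    """Classify the difference D between subgroup limits relative to s.

    Thresholds follow the slide (Lahti et al. 2002); the meaning of s is as
    defined in that paper and is simply passed in by the caller.
    """
    ratio = abs(d) / s
    if ratio < 0.25:
        return "no partitioning"
    if ratio <= 0.75:
        return "variable (judgement needed)"
    return "partitioning"


print(partitioning_advice(d=12.0, s=10.0))  # D = 1.2*s -> partitioning
```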
14
Statistical model for estimating a reference interval
The statistical procedures assume random sampling from the target population. Traditionally, the 2.5th and 97.5th percentiles are estimated, so that on average 95% of the population is included. In some contexts, one-sided limits (95th, 97.5th or 99th percentiles) are used.
Statistical estimation procedures
- Parametric: assumes a normal distribution, or a distribution that can be transformed to the normal distribution.
- Nonparametric: model-free estimation of percentiles.
- Partitioning: subdivision according to gender, age, race, etc. should be considered where relevant.
Reference interval & type of distribution
Normal distributions can be expected for analytes with a relatively narrow biological distribution, e.g. electrolytes. For normal distributions, the reference interval ranges from the 2.5th to the 97.5th percentile (= mean ± 1.96 SD).
Figure: 95% reference interval between the lower and the upper reference limit.
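For the parametric case, the interval mean ± 1.96 SD is a one-liner. A minimal sketch, assuming the values are (close to) normally distributed and that the sample SD (n - 1 denominator) is used; `parametric_reference_interval` is a hypothetical name.

```python
import statistics


def parametric_reference_interval(values, z=1.96):
    """Mean -/+ z*SD; only valid if the values are (close to) normally distributed."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return m - z * sd, m + z * sd


lo, hi = parametric_reference_interval([138, 140, 141, 139, 142, 140, 137, 143])  # hypothetical sodium values
print(f"{lo:.1f} - {hi:.1f} mmol/L")
```

Applied to skewed data (e.g. triglycerides), this direct approach gives biased limits; a transformation or a nonparametric procedure is then needed.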
15
Skewed distributions
Biological variation is very often skewed to the right, i.e. there is a tail of high values. The theoretical background is that many factors have a multiplicative impact (an additive impact of many independent factors yields a normal distribution). Skewed distributions can often be modeled by the log-normal distribution. The log-normal distribution is actually a family of distributions with a spectrum of degrees of skewness determined by the parameter values (ratio between standard deviation and mean).
Coefficient of skewness (x_m = mean): $C_{skew} = \frac{\sum (x_i - x_m)^3 / N}{SD^3}$
Zero: symmetric distribution; positive: skewed to the right; negative: skewed to the left.
Coefficient of kurtosis: $C_{kurt} = \frac{\sum (x_i - x_m)^4 / N}{SD^4} - 3$
Zero: normal distribution; positive: peaked distribution; negative: flat distribution.
Nonparametric procedure
- Applicable to all types of distributions.
- Simple procedures: based on ordering (ranking) of values according to size.
- Refined procedures: weighted percentile estimation, smoothing techniques, resampling principle (bootstrap).
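Both coefficients translate directly into code. A minimal sketch; the slide does not say whether SD uses N or N - 1 in the denominator, so this sketch assumes N (consistent with the /N in the moment sums). The function name is hypothetical.

```python
import math


def skewness_kurtosis(values):
    """Coefficients of skewness and kurtosis (excess, i.e. minus 3) as on the slide.

    Assumption: SD is computed with the N denominator.
    """
    n = len(values)
    xm = sum(values) / n
    sd = math.sqrt(sum((x - xm) ** 2 for x in values) / n)
    c_skew = sum((x - xm) ** 3 for x in values) / n / sd ** 3
    c_kurt = sum((x - xm) ** 4 for x in values) / n / sd ** 4 - 3
    return c_skew, c_kurt


print(skewness_kurtosis([1, 2, 3, 4, 5]))   # symmetric and flat: skew 0, kurtosis < 0
print(skewness_kurtosis([1, 1, 1, 10]))     # tail of high values: positive skew
```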
16
Simple nonparametric procedure(s)
Approach
- Sort the N reference values in increasing numerical order.
- Assign rank numbers: lowest = 1; highest = N.
- Rank number of the 2.5th percentile = 0.025 × (N+1) or 0.025 × N + 0.5
- Rank number of the 97.5th percentile = 0.975 × (N+1) or 0.975 × N + 0.5
- Lower reference limit = reference value corresponding to the rank number of the 2.5th percentile
- Upper reference limit = reference value corresponding to the rank number of the 97.5th percentile
Remark: estimation of the 2.5th and 97.5th percentiles
Procedure recommended by the IFCC and CLSI: 2.5th percentile = value of rank number 0.025 × (N+1); 97.5th percentile = value of rank number 0.975 × (N+1).
Optimal procedure (slightly different from the above): 2.5th percentile = value of rank number 0.025 × N + 0.5; 97.5th percentile = value of rank number 0.975 × N + 0.5 (Linnet K. Clin Chem 2000;46:867-9).
Triglycerides, n = 282:
0.025 × (282 + 1) = 7.1 → rank 7 = 0.42 mmol/L
0.975 × (282 + 1) = 275.9 → rank 276 = 2.12 mmol/L
Reference interval = 0.42 - 2.12 mmol/L
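Both rank formulas can be implemented in one function. A minimal sketch: the slide takes the value at the nearest rank (7.1 → rank 7), whereas this sketch linearly interpolates between neighbouring ranks for non-integer rank numbers, which is an assumption; `nonparametric_percentile` and its `method` labels are hypothetical.

```python
def nonparametric_percentile(values, p, method="n+1"):
    """Rank-based percentile estimate.

    method="n+1":   rank = p * (N + 1)   (IFCC/CLSI)
    method="n+0.5": rank = p * N + 0.5   (Linnet 2000)
    Non-integer ranks are linearly interpolated (assumption); ranks are
    clamped to the observed range 1..N.
    """
    xs = sorted(values)
    n = len(xs)
    if method == "n+1":
        rank = p * (n + 1)
    elif method == "n+0.5":
        rank = p * n + 0.5
    else:
        raise ValueError(f"unknown method: {method}")
    rank = min(max(rank, 1.0), float(n))  # keep within the observed ranks
    lower = int(rank)                      # 1-based rank at or below
    frac = rank - lower
    if lower >= n:
        return xs[-1]
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


data = list(range(1, 283))  # hypothetical: N = 282 values equal to their ranks
print(nonparametric_percentile(data, 0.025))            # rank 7.075
print(nonparametric_percentile(data, 0.025, "n+0.5"))   # rank 7.55
```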
17
Sample size and precision of estimates
Precision of percentiles of a normal distribution
Can be expressed as the ratio between the 90% confidence interval (90% CI) of a reference limit and the width of the 95% reference interval (e.g. ratios 0.3, 0.2 or 0.1). The necessary sample sizes, comparing parametric and nonparametric procedures:
Ratio (90% CI / 95% RI)   Parametric N   Nonparametric N
0.3                       23             56
0.2                       50             125
0.1                       205            500
18
Sample size and precision of estimates: skewed distributions
Coefficient of skewness: 0.75
Ratio (90% CI / 95% RI)   Parametric N   Nonparametric N
0.3                       90             140
0.2                       200            315
0.1                       800            1250
Coefficient of skewness: 1.5
Ratio (90% CI / 95% RI)   Parametric N   Nonparametric N
0.3                       200            315
0.2                       440            695
0.1                       1750           2740
19
Bootstrap principle
Repeated random resampling of the observations with replacement. For a set of N observations, each observation has the probability 1/N of being drawn at each resampling step. A resampled set of N observations (a so-called pseudo-set of observations) may contain several copies of one observation and lack others.
Origin of the name
The term bootstrap refers to the phrase "to pull oneself up by one's bootstraps", originating from the tale The Adventures of Baron Munchausen (by Rudolph Erich Raspe (1737-94)), in which "The Baron had fallen to the bottom of a deep lake. Just when it looked like all was lost, he thought to pick himself up by his own bootstraps".
Calculation of estimates
For each pseudo-set of N observations, the percentiles are computed by the simple nonparametric procedure. By repeating this on a computer, e.g. 100 or more times, a distribution of estimated percentiles is obtained that mimics the real sampling variation. The bootstrap estimates are the means of the pseudo-estimates. The bootstrap procedure is slightly (5-15%) more efficient than simple nonparametric estimation, and it also provides standard errors of the estimates.
Limitations
- Too low coverage* at small sample sizes (N < 40); modified versions with smoothing might improve coverage at small sample sizes.
- Some bias problems with the bootstrap estimates at small sample sizes (N < 40).
*Coverage: the expected percentage of times an estimated confidence interval includes the true value, i.e. ideally 90% for a nominal 90% CI.
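The bootstrap steps above can be sketched with the standard `random` module. A minimal sketch, assuming the simple nonparametric procedure takes the value at the nearest rank to p × (N+1); the function names, the number of pseudo-sets and the example data are hypothetical. The bootstrap estimate is the mean of the pseudo-estimates, and their SD serves as the standard error.

```python
import random
import statistics


def rank_percentile(sorted_values, p):
    """Simple nonparametric percentile: value at the nearest rank to p*(N+1), clamped to 1..N."""
    n = len(sorted_values)
    rank = min(max(round(p * (n + 1)), 1), n)
    return sorted_values[rank - 1]


def bootstrap_percentile(values, p, n_boot=1000, seed=None):
    """Bootstrap estimate (mean of pseudo-estimates) and its standard error."""
    rng = random.Random(seed)
    n = len(values)
    pseudo_estimates = []
    for _ in range(n_boot):
        # N draws with replacement; each observation has probability 1/N per draw
        pseudo_set = sorted(rng.choices(values, k=n))
        pseudo_estimates.append(rank_percentile(pseudo_set, p))
    return statistics.mean(pseudo_estimates), statistics.stdev(pseudo_estimates)


rng = random.Random(42)
sample = [rng.lognormvariate(0.0, 0.4) for _ in range(240)]  # hypothetical data
estimate, se = bootstrap_percentile(sample, 0.975, n_boot=500, seed=1)
print(f"97.5th percentile = {estimate:.2f} (SE {se:.2f})")
```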
20
Comparison of statistical procedures
Can be studied theoretically and/or by simulation on the basis of specified model distributions, e.g. normal and log-normal types. In simulation, the procedure is repeated a large number of times to study the bias (= difference between the average of the percentile estimates and the true value) and the precision (standard error: SE) of the estimation procedure (small SE: efficient procedure). The SEs should reflect the real uncertainty so that estimated confidence intervals are correct.
Tool: root mean squared error (RMSE)
$\mathrm{RMSE} = \left[\sum (x_{obs} - x_{true})^2 / N_{run}\right]^{0.5} = \left[\mathrm{Bias}^2 + SE^2\right]^{0.5}$ (N_run: number of simulation runs)
A combined error measure that takes both systematic deviation and random error into account. It is often used in statistics as an overall error measure that allows ranking of various statistical estimation procedures studied theoretically or by simulation.
Model example
With a theoretical model distribution, e.g. a chi-square distribution, the true percentile values are known. By simulation, the performance of parametric and nonparametric procedures can be compared, and the RMSE of the percentile estimates can be related to the sample size.
Outcome
The larger the sample size, the more likely it is that the nonparametric procedure is the optimal approach (lowest RMSE at a given sample size). This rests on the general fact that a bias associated with parametric estimation is independent of sample size and will tend to dominate the RMSE at large sample sizes, where the random error vanishes.
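A small simulation illustrates the RMSE comparison. This sketch swaps in a log-normal model distribution (ln-scale mean 0, SD 0.5) instead of the chi-square distribution, because its true percentile exp(z·σ) is easy to compute with the standard library; the direct parametric procedure (mean + z·SD on the raw scale) is deliberately biased for this skewed model. Function names and defaults are hypothetical. SE is computed with the N_run denominator so that RMSE² = Bias² + SE² holds exactly.

```python
import math
import random
import statistics
from statistics import NormalDist


def simulate_rmse(n, n_runs=200, sigma=0.5, p=0.975, seed=0):
    """Bias, SE and RMSE of the upper limit: direct parametric vs simple nonparametric."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(p)
    true_value = math.exp(z * sigma)  # true percentile of the log-normal model (mu = 0)
    par, nonpar = [], []
    for _ in range(n_runs):
        xs = sorted(rng.lognormvariate(0.0, sigma) for _ in range(n))
        par.append(statistics.mean(xs) + z * statistics.stdev(xs))  # direct parametric
        rank = min(max(p * (n + 1), 1), n)                          # (N+1)*p rank
        nonpar.append(xs[round(rank) - 1])                          # nearest rank

    def summarize(estimates):
        bias = statistics.mean(estimates) - true_value
        se = statistics.pstdev(estimates)  # N_run denominator
        rmse = math.sqrt(statistics.mean((e - true_value) ** 2 for e in estimates))
        return bias, se, rmse

    return summarize(par), summarize(nonpar)


for n in (40, 120, 500):
    (pb, ps, pr), (nb, ns, nr) = simulate_rmse(n)
    print(f"n={n:4d}  parametric RMSE={pr:.3f} (bias {pb:+.3f})  "
          f"nonparametric RMSE={nr:.3f} (bias {nb:+.3f})")
```

As the sample size grows, the SE of both procedures shrinks, but the bias of the direct parametric procedure does not, so its RMSE levels off while the nonparametric RMSE keeps decreasing.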
21
Statistical procedures: summary
Ranking of procedures according to efficiency: 1. parametric procedure; 2. bootstrap nonparametric; 3. simple nonparametric.
Nonparametric vs parametric
- About half as efficient, i.e. about twice the sample size is required to attain the same SE of the percentiles.
- The difference in efficiency is larger the more extreme the percentiles are (e.g. 99th vs 97.5th percentile).
Simple nonparametric procedure
- Rank N·p + 0.5 is slightly better than (N+1)·p for both normal and skewed distributions.
Bootstrap nonparametric vs simple nonparametric
- Slightly more efficient (5-15% savings in sample size).
- Confidence intervals can be estimated for smaller sample sizes (the simple nonparametric procedure requires N ≥ 120 for a 90% CI).
22
Example: triglycerides with CBstat
Procedure                        CI lower limit   CI upper limit
Parametric direct                0.08 - 0.23      1.79 - 1.94
Nonparametric                    0.34 - 0.52      1.92 - 2.60
Nonparametric bootstrap          0.37 - 0.52      1.88 - 2.33
Parametric after log-transform   0.40 - 0.46      1.90 - 2.16
Note: direct parametric estimation is not correct here, because the data are not normally distributed!
Simulation of triacylglyceride data
We simulate data that correspond to the triacylglyceride data (skewness ~1.64) with worksheet LnNormal 3 (mean = 0; SD = 0.48; n = 1000). Copy the data into the file RefInt.xls and set the number of decimals to 2 (precision as displayed). Sample 20 values from these data (Tools > Data Analysis > Sampling). Compare the 90% confidence intervals for n = 20 with the respective ones for n = 1000.
(Workbook: DataGeneration)
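The EXCEL exercise can be mimicked in Python. A minimal sketch, assuming that "LnNormal 3" means ln-scale mean 0 and SD 0.48; this sketch draws the 20 values without replacement (EXCEL's Sampling tool may behave differently), and all names are hypothetical. It also shows that the theoretical skewness of such a log-normal distribution, (w + 2)·√(w − 1) with w = exp(σ²), is about 1.66, close to the ~1.64 of the triglyceride data, and that the true reference interval is exp(±1.96 × 0.48) ≈ 0.39 - 2.56.

```python
import math
import random
import statistics


def lognormal_skewness(sigma):
    """Theoretical skewness of a log-normal distribution with ln-scale SD sigma."""
    w = math.exp(sigma ** 2)
    return (w + 2) * math.sqrt(w - 1)


def log_parametric_interval(values, z=1.96):
    """Parametric reference interval after ln transformation."""
    logs = [math.log(v) for v in values]
    m, s = statistics.mean(logs), statistics.stdev(logs)
    return math.exp(m - z * s), math.exp(m + z * s)


rng = random.Random(2007)  # arbitrary seed
population = [round(rng.lognormvariate(0.0, 0.48), 2) for _ in range(1000)]  # 2 decimals
subsample = rng.sample(population, 20)  # without replacement

print(f"theoretical skewness: {lognormal_skewness(0.48):.2f}")
for label, data in (("n = 1000", population), ("n = 20", subsample)):
    lo, hi = log_parametric_interval(data)
    print(f"{label}: {lo:.2f} - {hi:.2f}")
```

Rerunning with different seeds shows how much more the n = 20 limits scatter than the n = 1000 limits.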
23
Software & references
CBstat: a Windows program distributed by K. Linnet (via aaccdirect.org). Offers general statistical methods and procedures dedicated to clinical biochemistry.
Estimation of reference intervals:
- Simple nonparametric and bootstrap procedures
- Parametric direct
- Parametric after transformations
  One-stage: log-, 3-parameter log-, Box & Cox- and Manly-transformation
  Two-stage: 1) correction of skewness; 2) correction of kurtosis
- Normality testing with appropriate corrections after transformations
- Appropriate confidence intervals of percentiles after transformation
Further information: www.cbstat.com
References
Linnet K. Nonparametric estimation of reference intervals by simple and bootstrap-based procedures. Clin Chem 2000;46:867-9.
Linnet K. Two-stage transformation systems for normalization of reference distributions evaluated. Clin Chem 1987;33:381-6.
IFCC. J Clin Chem Clin Biochem 1987;25:645-56.
Linnet K. Testing normality of transformed distributions. Appl Statist 1988;37:180-6.
24
Notes