Limitations of Statistical Measures of Error in Assessing the Accuracy of Glucose Sensors
Craig Kollman (1), Darrell Wilson (2), Tim Wysocki (3), Rosanna Fiallo-Scharer (4), Eva Tsalikian (5), William Tamborlane (6), Roy Beck (1), Katrina Ruedy (1), and the Diabetes Research In Children Network (DirecNet) Study Group.
  (1) Jaeb Center for Health Research, Tampa, FL
  (2) Division of Pediatric Endocrinology and Diabetes, Stanford University, Stanford, CA
  (3) Nemours Children’s Clinic, Jacksonville, FL
  (4) Barbara Davis Center for Childhood Diabetes, University of Colorado, Denver, CO
  (5) Department of Pediatrics, University of Iowa, Carver College of Medicine, Iowa City, IA
  (6) Department of Pediatrics, Yale University School of Medicine, New Haven, CT
Abstract

The appropriate set of accuracy measures for evaluating near-continuous glucose monitoring remains to be developed. Traditional methods applied to glucose meters do not adequately capture the time dimension of glucose sensor data. Moreover, some of these methods have substantial limitations that should be understood in order to place analysis results in proper context. We highlight these limitations using data from an inpatient study conducted by the DirecNet Study Group.

Error grid analyses and the area under the curve (AUC) for the detection of hypoglycemia are commonly cited statistics. The percentage of values within error grid zones A+B is usually quite high, even for inaccurate sensors, potentially giving a false sense of accuracy. When we simulated artificially inaccurate sensors by randomly shuffling the pairing of sensor readings with laboratory reference glucose values, 76% and 78% of pairs still fell within zones A+B for the Clarke and modified error grids, respectively. The mean AUC value was 62%.

Correlation is also frequently used to quantify the accuracy of glucose sensors. This measure, however, is sensitive to the variation in true glucose levels. Simulations were run distributing the “true” glucose levels uniformly over the ranges indicated below (N=10,000 per sensor) and adding a normally distributed error (standard deviation 25 mg/dL in each case) to obtain the sensor value. These simulated sensors all have identical levels of accuracy, but their correlation values vary considerably.

In summary, the zone A+B percentage in error grid analysis and the AUC statistic can give misleading notions of sensor accuracy. The correlation coefficient is not a consistent measure of sensor accuracy. Novel statistical approaches are needed to better characterize the near-continuous nature of these sensors.
Statistical Methods for Assessing Glucose Accuracy
  Originally developed for glucose meters.
  Do not capture the near-continuous nature of glucose sensors.
  Difficult to assess trends.
  How well do sensors characterize acute changes in glucose?
Traditional Measures of Accuracy
  Error Grid Analysis
  Receiver Operating Characteristics (ROC)
    Area Under the Curve (AUC)
  Correlation
  Differences between Reference and Sensor Glucoses
    Difference
    Absolute difference
    Relative difference
    Relative absolute difference (RAD)
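The four difference measures listed above can be sketched in a few lines. This is only an illustration: the poster does not spell out the formulas, so the sign convention (sensor minus reference) and the use of the reference value as the denominator are assumptions, and the function name `difference_measures` and the example values are hypothetical.

```python
import numpy as np

def difference_measures(reference, sensor):
    """Per-pair difference measures between reference and sensor glucose (mg/dL).

    Assumed conventions (not stated in the source):
      difference is sensor minus reference;
      relative measures divide by the reference value.
    """
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    diff = sensor - reference
    abs_diff = np.abs(diff)
    return {
        "difference": diff,
        "absolute_difference": abs_diff,
        "relative_difference": diff / reference,
        "relative_absolute_difference": abs_diff / reference,  # RAD
    }

# Hypothetical example pairs, for illustration only.
reference = [100.0, 200.0, 50.0]
sensor = [110.0, 180.0, 60.0]
measures = difference_measures(reference, sensor)
```

Note that the same 10 mg/dL error is a 10% RAD at a reference of 100 mg/dL but a 20% RAD at 50 mg/dL, which is why the relative measures weight errors in the low range more heavily.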
Goals of Error Grid Analysis
  Want to distinguish clinically meaningful vs. less important errors in glucose measurements.
  When would an erroneous value lead to an incorrect treatment decision?
  Sensors not approved for real-time treatment decisions.
  Divide measurement errors into zones to distinguish increasing clinical significance of errors.
Potential Problems with Error Grid Analysis
  Zones A and B are narrower from 50-70 mg/dL.
  Many studies include few or no glucose values in this range.
  Sensor accuracy often measured by the percentage of points falling in zones A+B.
  This percentage can appear high, even for very inaccurate sensors.
  Can give a misleading notion of sensor accuracy through chance agreement.
Area Under the Curve (AUC)
  Often used to measure how accurately hypoglycemia (or hyperglycemia) is detected.
  Receiver Operating Characteristics (ROC) analysis.
  Look at different alarm levels that could be used and assess the sensitivity/specificity trade-off.
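As a rough illustration of the ROC idea above, the sketch below evaluates an alarm of the form "sensor value at or below the alarm level" against hypoglycemia defined on the reference value. The 70 mg/dL hypoglycemia threshold, the function names, and the example data are assumptions for illustration; the source does not state the definitions it used. The AUC is computed with the rank-based (Mann-Whitney) formulation rather than by integrating the curve.

```python
import numpy as np

def roc_points(reference, sensor, hypo_threshold=70.0, alarm_levels=None):
    """Sensitivity and specificity of a 'sensor <= alarm level' alarm at each level.

    Both hypoglycemic and non-hypoglycemic reference values must be present.
    """
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    is_hypo = reference <= hypo_threshold          # assumed definition of hypoglycemia
    if alarm_levels is None:
        alarm_levels = np.unique(sensor)           # every observed sensor value
    points = []
    for level in alarm_levels:
        alarm = sensor <= level
        sensitivity = np.mean(alarm[is_hypo])      # hypoglycemic pairs that alarm
        specificity = np.mean(~alarm[~is_hypo])    # other pairs that stay silent
        points.append((float(level), float(sensitivity), float(specificity)))
    return points

def auc(reference, sensor, hypo_threshold=70.0):
    """Probability that a hypoglycemic pair has a lower sensor value than a
    non-hypoglycemic pair, counting ties as one half."""
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    is_hypo = reference <= hypo_threshold
    hypo = sensor[is_hypo][:, None]
    other = sensor[~is_hypo][None, :]
    return float(np.mean((hypo < other) + 0.5 * (hypo == other)))

# Hypothetical example pairs, for illustration only.
reference = [55.0, 65.0, 90.0, 150.0, 200.0]
sensor = [60.0, 85.0, 80.0, 140.0, 210.0]
points = roc_points(reference, sensor, alarm_levels=[80.0])
area = auc(reference, sensor)
```

Because the AUC averages over every possible alarm level, including levels far above any setting a user would choose, a single AUC figure says little about performance at a clinically realistic alarm level; inspecting `roc_points` at specific levels is more direct.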
Simulation Experiment
  Use data from the DirecNet Inpatient Accuracy Study.
  Make sensors artificially inaccurate by randomly shuffling the pairings with the reference glucose.
  Resulting “sensors” still have a realistic glucose distribution.
  Look at error grid and AUC analyses on the resulting simulated data set.
  Repeat 10,000 times.
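The shuffling procedure can be sketched as follows. This is not the study's code, and it does not use the DirecNet data: the data here are synthetic placeholders, the shuffle is applied across all pairs at once (the source does not say whether shuffling was done within or across subjects), and the statistic is a simplified stand-in, namely the commonly stated Clarke zone A criterion (sensor within 20% of the reference, or both values below 70 mg/dL) rather than the full zone A+B classification. All function names are hypothetical.

```python
import numpy as np

def zone_a_fraction(reference, sensor):
    """Fraction of pairs within 20% of the reference, or with both values
    below 70 mg/dL (a simplified stand-in for a full error grid)."""
    reference = np.asarray(reference, dtype=float)
    sensor = np.asarray(sensor, dtype=float)
    within_20_percent = np.abs(sensor - reference) <= 0.2 * reference
    both_low = (reference < 70.0) & (sensor < 70.0)
    return float(np.mean(within_20_percent | both_low))

def shuffled_statistic(reference, sensor, statistic, n_repeats=1000, seed=0):
    """Recompute `statistic` after randomly re-pairing sensor and reference values."""
    rng = np.random.default_rng(seed)
    sensor = np.asarray(sensor, dtype=float)
    values = np.empty(n_repeats)
    for i in range(n_repeats):
        shuffled = rng.permutation(sensor)   # breaks the pairing, keeps the distribution
        values[i] = statistic(reference, shuffled)
    return values

# Synthetic placeholder data (NOT the DirecNet data).
rng = np.random.default_rng(1)
reference = rng.uniform(60.0, 300.0, size=500)
sensor = reference + rng.normal(0.0, 25.0, size=500)

paired = zone_a_fraction(reference, sensor)
baseline = shuffled_statistic(reference, sensor, zone_a_fraction, n_repeats=200)
print(f"paired: {paired:.2f}  shuffled mean: {baseline.mean():.2f}")
```

The mean of `baseline` is the value the statistic takes through chance agreement alone. Any reported value is more informative when read against that baseline than against zero.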
Results for Shuffled Sensors
  Statistic                          Value
  Clarke Error Grid, Zone A+B        76%
  Modified Error Grid, Zone A+B      78%
  AUC (mean value)                   62%
Remarks
  Zones A and B on error grids are large enough that even inaccurate sensors will hit them the majority of the time.
  Much of the ROC curve involves alarm levels that would not realistically be used in practice.
  Resulting AUC value puts too much weight on high alarm settings.
Correlation
  Statistical measure of association.
  Number between -1 and +1.
  Often used as a measure of sensor accuracy.
  Sensitive to the amount of variation in the true glucose.
Another Simulation
  Create 4 simulated sensors so that each has identical accuracy.
  Do this by taking the sensor value to be the “true” value plus a normally distributed error with standard deviation = 25 mg/dL.
  Average value of the “true” glucose is 200 mg/dL for all 4 simulated sensors.
  Vary the range of true glucose values for each sensor.
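A short derivation shows why this setup makes the correlation depend on the glucose range even though the error is fixed. It is a sketch under the additive model described above, with the added assumptions that the error is independent of the true value and has mean zero; the symbols S, T, and w are introduced here for illustration.

```latex
% Assumed model: S = T + \varepsilon, with \varepsilon independent of T,
% E[\varepsilon] = 0, Var(\varepsilon) = \sigma_\varepsilon^2, Var(T) = \sigma_T^2.
\begin{align*}
\operatorname{Cov}(S, T) &= \sigma_T^2, \qquad
\operatorname{Var}(S) = \sigma_T^2 + \sigma_\varepsilon^2, \\
\rho(S, T) &= \frac{\sigma_T^2}{\sigma_T \sqrt{\sigma_T^2 + \sigma_\varepsilon^2}}
            = \frac{1}{\sqrt{1 + \sigma_\varepsilon^2 / \sigma_T^2}}.
\end{align*}
% For T uniform on a range of width w, \sigma_T^2 = w^2 / 12.
% With \sigma_\varepsilon = 25 mg/dL:
%   w = 50  (175-225): \sigma_T^2 = 208.3, \rho = \sqrt{208.3 / 833.3} = 0.50
%   w = 300 (50-350):  \sigma_T^2 = 7500,  \rho = \sqrt{7500 / 8125} \approx 0.96
```

The error variance is the same in both cases; only the spread of the true glucose changes, and the Pearson correlation rises from 0.50 to about 0.96 with it.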
Simulated Sensors with Identical Accuracy (N = 10,000 data pairs per sensor)
  Sensor #   Range of True Glucose (mg/dL)   Pearson Correlation   Intraclass Correlation
  1          175-225                         0.50                  0.40
  2          150-250                         0.76                  0.73
  3          100-300                         0.92                  0.91
  4          50-350                          0.96                  0.96
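A simulation along these lines can be sketched as follows. The uniform ranges, the 25 mg/dL error standard deviation, and N = 10,000 come from the poster; the random seeds, function names, and the specific intraclass correlation formula are assumptions. The poster does not say which form of the intraclass correlation was used, so the one-way random-effects version for pairs is used here as one plausible choice.

```python
import numpy as np

def simulate_sensor(low, high, n=10_000, error_sd=25.0, seed=0):
    """True glucose uniform on [low, high]; sensor = true + normal error."""
    rng = np.random.default_rng(seed)
    true_glucose = rng.uniform(low, high, size=n)
    sensor = true_glucose + rng.normal(0.0, error_sd, size=n)
    return true_glucose, sensor

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def intraclass(x, y):
    """One-way random-effects ICC for pairs (an assumed choice of ICC form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    pair_mean = (x + y) / 2.0
    # Between-pair and within-pair mean squares from a one-way ANOVA with k = 2.
    ms_between = 2.0 * np.sum((pair_mean - pair_mean.mean()) ** 2) / (n - 1)
    ms_within = np.sum((x - y) ** 2) / (2.0 * n)
    return float((ms_between - ms_within) / (ms_between + ms_within))

ranges = [(175, 225), (150, 250), (100, 300), (50, 350)]
results = {}
for i, (low, high) in enumerate(ranges):
    true_glucose, sensor = simulate_sensor(low, high, seed=i)
    results[(low, high)] = (pearson(true_glucose, sensor),
                            intraclass(true_glucose, sensor))
    r, icc = results[(low, high)]
    print(f"{low}-{high} mg/dL: Pearson {r:.2f}, intraclass {icc:.2f}")
```

Every simulated sensor has the same error distribution, so any accuracy measure that depends only on the errors (for example the mean absolute difference) would be essentially the same across the four rows, while both correlation coefficients climb as the range widens.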
Summary
  Error grid analysis and AUC values can give inflated notions of sensor accuracy.
  Important to understand that “baseline” values of these statistics from inaccurate sensors are already high.
  Correlation is not a consistent measure of sensor accuracy.
Further Research
  Develop statistical measures that can incorporate the near-continuous nature of sensor values.
  Assess sensors’ ability to detect acute changes in glucose.
  Measure any time lag in glucose readings.
Stanford University: Bruce Buckingham, Darrell Wilson, Jennifer Block, Paula Clinton
Yale University: William Tamborlane, Stuart Weinzimer, Elizabeth Boland, Kristen Sikes, Amy Steffen
Jaeb Center for Health Research: Roy Beck, Katrina Ruedy, Craig Kollman, Dongyuan Xing, Cynthia Silvester
Barbara Davis Center: H. Peter Chase, Rosanna Fiallo-Scharer, Jennifer Fisher, Barbara Tallant
University of Iowa: Eva Tsalikian, Michael Tansey, Linda Larson, Julie Coffey, Amy Sheehan
Nemours Children’s Clinic: Tim Wysocki, Nelly Mauras, Keisha Bird, Kelly Lofton