國立雲林科技大學 National Yunlin University of Science and Technology
Predicting adequacy of vancomycin regimens: A learning-based classification approach to improving clinical decision making
Paul Jen-Hwa Hu, Chih-Ping Wei, Tsang-Hsiang Cheng, Jian-Xun Chen, Decision Support Systems (article in press).
Presenter: Wei-Shen Tai
Advisor: Professor Chung-Chian Hsu
2006/5/10
N.Y.U.S.T. I. M.
Outline
- Introduction
- Data and automated classification techniques
- Classification system and evaluation design
- Evaluation results and discussion
- Conclusion
- Comments
Motivation
- Clinicians' drug regimen decision making: problems surrounding sub- or overtherapeutic doses of high-alert medications are particularly alarming.
- Managing the clinical use of vancomycin: this is challenging because of the drug's narrow therapeutic index (the decision problem) and its significant, lasting adverse effects on patients (the derived problem).
Objective
- A decision support system that predicts the adequacy of a vancomycin regimen is desirable.
- It should enhance the efficacy of the initial regimen estimates made with the nomogram and complement the pharmacokinetic analysis.
Classification method
- Supervised learning techniques from artificial intelligence.
- Decision-tree induction (C4.5) and a backpropagation neural network.
- Both are extended with bagging.
Bagging
- Bootstrap sampling: generate multiple training data sets from the original training data set. Each is a distinct data set, drawn with replacement, that contains the same number of training instances as the original data set.
- Construct base classifiers: train one classifier on each bootstrap data set with a machine learning technique (e.g., C4.5 or the backpropagation neural network). The process terminates after a specified number of iterations.
- Majority-voting scheme: combines the predictions of all constructed base classifiers when a new (unseen) instance is classified.
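The three bagging steps can be sketched in a few lines of Python. This is a minimal illustration, not the system described in the paper: the base learner `train_stump` is a simple one-feature threshold rule standing in for C4.5 or the neural network, and `toy_data`, the function names, and the choice of 25 iterations are all assumptions made for the example.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) instances from data, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


def train_stump(data):
    """Fit a one-feature threshold rule (hypothetical stand-in for C4.5 or an NN).

    data is a list of (x, label) pairs with numeric x. The rule with the
    fewest training errors is returned as a function x -> label.
    """
    labels = sorted({y for _, y in data})
    best = None
    for threshold in sorted({x for x, _ in data}):
        for low in labels:
            for high in labels:
                errors = sum(
                    (low if x <= threshold else high) != y for x, y in data
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: low if x <= threshold else high


def bagging(data, n_iterations, seed=0):
    """Train one base classifier per bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_iterations)]


def predict(classifiers, x):
    """Classify x by majority vote over all base classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Made-up toy data: one numeric feature and a two-class label.
toy_data = [(x, "inappropriate" if x <= 5 else "appropriate") for x in range(1, 11)]
ensemble = bagging(toy_data, n_iterations=25)
```

Using an odd number of iterations avoids tied votes in a two-class problem such as appropriate vs. inappropriate.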
Evaluation and results
- Overall accuracy: significantly higher than that of the benchmark one-compartment pharmacokinetic model.
- Bagging can significantly improve the performance of each system.
- Insensitivity: performance is fairly insensitive to the size of the training data set.
- C4.5 vs. NN: computation and performance.
Vancomycin and its clinical use
- Existing problem: vancomycin has consistently been identified as one of the top three adverse effect-producing pharmaceutical drugs in Taiwan between 1998 and
- General solution: clinicians often must supplement the nomogram with their experience and their assessment of the patient's condition to adjust the regimen recommendations properly.
Related prior research
- A pharmacokinetic model is a mathematical scheme that is crucial for estimating the elimination (discharge) rate of an administered drug.
- In general, an administered drug is initially distributed into a central compartment before diffusing into the peripheral compartment.
- The model's prediction accuracy for peak and trough concentrations is limited.
Bootstrap: a re-sampling method
- Fundamental idea: compute measures of our inference uncertainty from the estimated sampling distribution of f.
- Re-sampling: use some form of re-sampling with replacement from the actual data, x, to generate B bootstrap samples, x*. Often the data (sample) consist of n independent units, and it then suffices to take a simple random sample of size n with replacement.
- Goal: from the set of B bootstrap results, we measure our inference uncertainty from sample to (conceptual) population (see figure).
- Caution: the bootstrap can work well for large sample sizes (n), but it may not be reliable for small n (say 5, 10, or even 20), regardless of how many bootstrap samples, B, are used.
Case tutorial for bootstrap
- Sample data: consider a sample of weights of 27 rats (n = 27); the data are . The sample mean of these data = , standard deviation = , with cv = . For illustration, suppose we want an estimate of the standard error of the cv.
- Process:
  First, we draw a random subsample of size 27 with replacement. Thus, while a weight of 63 appears in the actual sample, it might not appear in the subsample, or it could appear more than once.
  Second, the whole process is repeated B times (we let B = 1,000 reps for this example). Thus, we generate 1,000 resampled data sets (b = 1, 2, 3, ..., 1,000), and from each of these we compute the cv and store the value.
  Third, we obtain the standard error of the cv by taking the standard deviation of the 1,000 cv values (corresponding to the 1,000 bootstrap samples).
- The process is simple. In this case, the standard error is
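The three steps of the tutorial can be sketched in Python as follows. The rat weights did not survive on the slide, so `hypothetical_weights` below is made-up data (27 values, including a 63) used only to make the sketch runnable; the function names and the fixed seed are likewise assumptions, and the resulting number is not the value reported in the original tutorial.

```python
import random
import statistics


def coefficient_of_variation(values):
    """cv = sample standard deviation divided by sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


def bootstrap_se(values, statistic, n_reps=1000, seed=0):
    """Estimate the standard error of statistic by bootstrap re-sampling.

    Returns the standard deviation of the bootstrap replicates together
    with the replicates themselves.
    """
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_reps):
        # Step 1: draw a subsample of size n with replacement.
        resample = rng.choices(values, k=n)
        # Step 2: compute and store the statistic for this subsample.
        replicates.append(statistic(resample))
    # Step 3: the standard deviation of the stored values is the standard error.
    return statistics.stdev(replicates), replicates


# Made-up weights for illustration only (n = 27); NOT the data from the slide.
hypothetical_weights = [
    57, 60, 52, 49, 56, 46, 51, 63, 49,
    57, 59, 54, 56, 59, 57, 52, 52, 61,
    59, 53, 59, 51, 51, 56, 58, 46, 53,
]

se_cv, cv_replicates = bootstrap_se(hypothetical_weights, coefficient_of_variation)
```

Passing the statistic as a function keeps the re-sampling loop reusable for other quantities, such as the mean or the median.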
Experiment design
- A DSS based on Weka, an open-source machine learning software package.
- Analysis items:
  Regimen adequacy, 2 output nodes (i.e., appropriate and inappropriate).
  Peak concentrations, 3 output nodes (i.e., low, on-target, and high).
  Trough concentrations, 2 output nodes (i.e., on-target and high).
Conclusions
- A solution for vancomycin usage: a decision support system based on promising learning-based classification techniques from AI.
- Performance improvement: superior to the benchmark one-compartment pharmacokinetic model in predicting the adequacy of vancomycin regimens.
Comments
- Bagging magic: when the sample data set is small, bagging can (perhaps) improve classification accuracy.
- Insensitivity test problem: perhaps the 40%, 60%, and 80% subsets of the cases are highly consistent with one another, or the bagging effect itself causes this result.
- Parameter (optimization) finding cost: the number of bagging iterations, the optimal number of hidden nodes, and adequate parameter tuning for the NN.
- 80/20 training/testing strategy: does it give better results than 10-fold cross-validation? A single split may make the distribution of the training sample inconsistent with that of the original data set.
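To make the last comment concrete, the sketch below contrasts how the two strategies partition the cases. It only illustrates the splitting logic; `n_cases = 100`, the function names, and the seed are assumptions for the example and do not reflect the paper's data set or its evaluation code.

```python
import random


def holdout_split(n_cases, train_fraction=0.8, seed=0):
    """Single random split, e.g. 80% training and 20% testing."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_cases * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold_splits(n_cases, k=10, seed=0):
    """k (train, test) splits in which each case is tested exactly once."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


n_cases = 100  # hypothetical number of cases
train_idx, test_idx = holdout_split(n_cases)
cv_splits = k_fold_splits(n_cases, k=10)
```

With the hold-out split only one fifth of the cases are ever used for testing, whereas 10-fold cross-validation tests every case once, so its accuracy estimate depends less on one particular random partition.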