Classifying Supplement Use Status in Clinical Notes
Yadan Fan, BS1; Lu He, BS2; Serguei V.S. Pakhomov, PhD1,3; Genevieve B. Melton, MD, PhD1,4; Rui Zhang, PhD1,4
1Institute for Health Informatics, 2Department of Computer Science, 3College of Pharmacy, 4Department of Surgery, University of Minnesota, Minneapolis, MN
Introduction
Approximately 68% of Americans take dietary supplements1
Adverse reactions2
Reported from in-vivo or in-vitro studies, case reports, and post-market surveillance
Under-reported
Electronic Health Records (EHRs)
Reliable patient information
Supplement term coverage
A great amount of information about supplement use is embedded in clinical notes
1. Geller AI, et al. N Engl J Med (2015). 2. Zhang R, et al. AMIA Annual Symposium Proceedings.
Objective
To automatically classify the use status of dietary supplements by applying text mining methods
25 supplements: Alfalfa, Ginkgo, Bilberry, Dandelion, Kava, Echinacea, Ginseng, Biotin, Flax seed, Lecithin, Fish oil, Melatonin, Black cohosh, Folic acid, Milk thistle, Garlic, St. John's Wort, Coenzyme Q10, Glucosamine, Saw palmetto, Ginger, Vitamin E, Cranberry, Glutamine, Turmeric
Overview of Methods
Clinical data repository: notes retrieved and 1,300 sentences randomly selected
Training set (1,000 sentences, ~77%): 10 supplements (alfalfa, echinacea, fish oil, garlic, ginger, ginkgo, ginseng, melatonin, St. John's Wort, Vitamin E), 100 sentences each; preprocessing; 7 feature sets compared with 5 classification algorithms to select the model
Test set (300 sentences, ~23%): 15 supplements (bilberry, biotin, black cohosh, coenzyme Q10, cranberry, dandelion, flax seed, folic acid, glucosamine, glutamine, kava, lecithin, milk thistle, saw palmetto, and turmeric), 20 sentences each; preprocessing; evaluate the selected model
Annotation produced a gold standard with four use-status classes: Continuing, Discontinued, Started, Unclassified
Performance evaluation on the test set
Data Collection
Notes retrieval: keyword searching with lexical variations, e.g., "ginkgo", "gingko", "ginko", "ginkoba"
Data sets
Training set: compare 7 feature sets with 5 classification algorithms
Test set: evaluate the model selected as optimal on the training data
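As an illustration, here is a minimal sketch of keyword-based sentence retrieval that tolerates lexical variants like those above; the variant lists, note text, and sentence splitter are simplified assumptions, not the study's actual retrieval query.

```python
import re

# Hypothetical variant lists; the actual study searched a clinical data repository.
SUPPLEMENT_VARIANTS = {
    "ginkgo": ["ginkgo", "gingko", "ginko", "ginkoba"],
    "st. john's wort": ["st. john's wort", "st johns wort", "saint john's wort"],
}

def find_supplement_sentences(note_text, variants=SUPPLEMENT_VARIANTS):
    """Return (supplement, sentence) pairs for sentences mentioning any variant."""
    # Naive sentence split; real clinical notes need a proper sentence segmenter.
    sentences = re.split(r"(?<=[.!?])\s+", note_text)
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        for supplement, forms in variants.items():
            if any(form in lowered for form in forms):
                hits.append((supplement, sentence.strip()))
    return hits

if __name__ == "__main__":
    note = "Pt started gingko biloba for memory. Denies using st johns wort."
    for supp, sent in find_supplement_sentences(note):
        print(supp, "->", sent)
```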
Development of Gold Standard
Annotation guideline
Adapted from a previous study* investigating drug use status, with minor changes
Applied to 20 randomly selected sentences
Disagreements resolved by discussion
*Pakhomov SV, et al. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2002.
Annotation Guideline
Continuing (C): Patient continues on current supplements. Examples: "She continued on herbal supplements including echinacea." "Increase the dose of garlic."
Discontinued (D): Discontinuation of the supplements. Examples: "Stopped taking her garlic two weeks ago." "Pt will hold taking ginseng."
Started (S): Initiation of new supplements or restarting of supplements. Examples: "Start ginkgo to help memory." "Begin melatonin 10mg 1 hour before bedtime."
Unclassified (U): Does not offer ample information about use status, such as recommendation, education, or negation. Examples: "Advised over-the-counter melatonin." "Denies using St. John's Wort."
Development of Gold Standard
Inter-annotator agreement assessed on 100 randomly selected sentences
Cohen's kappa: 0.93; percentage agreement: 95%
The remaining dataset was split equally and annotated by the two reviewers
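A minimal sketch of how such agreement figures can be computed, assuming the two annotators' labels are available as parallel lists; the labels below are invented and do not reproduce the reported values.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two annotators over the same sentences (C/D/S/U).
annotator_a = ["C", "D", "S", "U", "C", "S", "D", "U", "C", "C"]
annotator_b = ["C", "D", "S", "U", "C", "S", "D", "C", "C", "C"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
percent_agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)

print(f"Cohen's kappa: {kappa:.2f}")
print(f"Percentage agreement: {percent_agreement:.0%}")
```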
Gold Standard
Feature Set
Type 0 – raw unigrams: bag-of-words representation
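A minimal sketch of the Type 0 representation using scikit-learn's CountVectorizer; the example sentences are invented, and this is only an illustration of the bag-of-words idea, not the study's implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "She continued on herbal supplements including echinacea.",
    "Stopped taking her garlic two weeks ago.",
]

# Type 0: raw unigram counts (bag of words), no normalization.
vectorizer = CountVectorizer(ngram_range=(1, 1), lowercase=True)
X = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())
print(X.toarray())
```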
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams: normalized with the lexical variant generation (LVG) tool, e.g., "takes", "taken", "taking", "took" → "take"
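The study used the UMLS LVG tool for normalization; as a rough stand-in, the sketch below collapses inflected verb forms with NLTK's WordNet lemmatizer, which only approximates LVG's behavior.

```python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # WordNet data needed once for the lemmatizer

lemmatizer = WordNetLemmatizer()

def normalize(sentence):
    """Map inflected word forms to a base form, e.g. takes/taking/took -> take."""
    tokens = sentence.lower().split()
    return [lemmatizer.lemmatize(tok, pos="v") for tok in tokens]

print(normalize("She takes melatonin"))   # ['she', 'take', 'melatonin']
print(normalize("Patient took garlic"))   # ['patient', 'take', 'garlic']
```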
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
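Type 2 only changes the n-gram range of the vectorizer; a one-line variation on the earlier sketch, applied here to an already-normalized example sentence (an assumption for illustration).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Type 2: unigrams plus bigrams over normalized tokens, e.g. "stop take", "take garlic".
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(["stop take her garlic two week ago"])
print(vectorizer.get_feature_names_out())
```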
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Semantic cues signal the use status:
"She has increased alfalfa tablets." – Continuing
"Stopped taking her garlic two weeks ago." – Discontinued
"Pt started taking ginkgo biloba." – Started
"Melatonin is recommended for sleep aid." – Unclassified
A list of indicator words was adapted from Pakhomov SV, et al. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2002.
Indicator Keywords
start: start, starts, started, starting
restart: restart, restarts, restarted, restarting
resume: resume, resumes, resumed, resuming
initiate: initiate, initiates, initiated, initiating
increase: increase, increases, increased, increasing
decrease: decrease, decreases, decreased, decreasing
reduce: reduce, reduces, reduced, reducing
lower: lower, lowers, lowered, lowering
take: take, takes, took, taking, taken
consume: consume, consumes, consumed, consuming
stop: stop, stops, stopped, stopping
hold: hold, holds, held, holding
advise: advise, advises, advised, advising
avoid: avoid, avoids, avoided, avoiding
deny: deny, denies, denied, denying
decline: decline, declines, declined, declining
refuse: refuse, refuses, refused, refusing
neg: no, not, never
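A minimal sketch of turning the indicator table into binary features (Type 3 uses only these features); the dictionary below mirrors just a few rows of the table above and is not the complete resource.

```python
# A few rows of the indicator table (lemma -> surface forms); the study used all rows above.
INDICATORS = {
    "start": {"start", "starts", "started", "starting"},
    "stop": {"stop", "stops", "stopped", "stopping"},
    "take": {"take", "takes", "took", "taking", "taken"},
    "deny": {"deny", "denies", "denied", "denying"},
    "neg": {"no", "not", "never"},
}

def indicator_features(sentence):
    """Binary feature per indicator lemma: 1 if any of its surface forms appears."""
    tokens = set(sentence.lower().replace(".", " ").split())
    return {lemma: int(bool(forms & tokens)) for lemma, forms in INDICATORS.items()}

print(indicator_features("Stopped taking her garlic two weeks ago."))
# {'start': 0, 'stop': 1, 'take': 1, 'deny': 0, 'neg': 0}
```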
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Type 4 – normalized unigrams + indicators with distance
An indicator counts only when it is close to the supplement mention; the optimal window size is 4.
Example (labeled Started, S): "He continues on Coumadin and also has recently started ginseng as he is concerned about the fatigue he will have during chemotherapy."
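A minimal sketch of the distance constraint in Type 4: an indicator fires only if it falls within a token window around the supplement mention (window size 4 was found optimal). The tokenization and the small indicator set are simplified assumptions.

```python
INDICATOR_LEMMAS = {"start", "started", "stop", "stopped", "hold", "continues"}

def indicators_near_supplement(sentence, supplement, window=4):
    """Return indicator tokens within `window` tokens of the supplement mention."""
    tokens = sentence.lower().replace(".", " ").split()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == supplement.lower():
            lo, hi = max(0, i - window), i + window + 1
            hits.extend(t for t in tokens[lo:hi] if t in INDICATOR_LEMMAS)
    return hits

sentence = ("He continues on Coumadin and also has recently started ginseng as he is "
            "concerned about the fatigue he will have during chemotherapy")
print(indicators_near_supplement(sentence, "ginseng"))  # ['started'] -> supports label S
```

Note how "continues" (which refers to Coumadin) falls outside the 4-token window and is ignored, while "started" is captured for ginseng.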
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Type 4 – normalized unigrams + indicators with distance
Type 5 – normalized unigrams + bigrams + indicators with distance
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Type 4 – normalized unigrams + indicators with distance
Type 5 – normalized unigrams + bigrams + indicators with distance
Type 6 – nouns + verbs + adverbs
Verbs hold more of the indicator information
Part-of-speech tags from the Stanford parser: nouns (NN/NNS/NNP/NNPS), verbs (VB/VBG/VBP/VBZ/VBD/VBN), and selected adverbs (RB): "no", "not", "never"
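The study used the Stanford parser for part-of-speech tagging; as a stand-in, this sketch uses NLTK's tagger to keep only nouns, verbs, and the negation adverbs. It approximates the Type 6 token filter rather than reproducing the original tooling.

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

KEEP_NOUNS = {"NN", "NNS", "NNP", "NNPS"}
KEEP_VERBS = {"VB", "VBG", "VBP", "VBZ", "VBD", "VBN"}
KEEP_NEGATIONS = {"no", "not", "never"}  # only these adverb tokens are retained

def pos_filter(sentence):
    """Keep nouns, verbs, and negation adverbs as the Type 6 token set."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return [tok for tok, tag in tagged
            if tag in KEEP_NOUNS or tag in KEEP_VERBS
            or (tag == "RB" and tok.lower() in KEEP_NEGATIONS)]

print(pos_filter("Denies using st johns wort."))
```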
Feature Set (summary)
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Type 4 – normalized unigrams + indicators with distance
Type 5 – normalized unigrams + bigrams + indicators with distance
Type 6 – nouns + verbs + adverbs
Training and Evaluation
Algorithms: Support Vector Machine (SVM), Maximum Entropy, Naïve Bayes, Decision Tree, Random Forest
Evaluation: 10-fold cross-validation; precision, recall, and F-measure
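A minimal sketch of this comparison, assuming a feature matrix X and label vector y have already been built from one of the feature sets; the scikit-learn estimators and their default hyperparameters are illustrative stand-ins, not the study's settings.

```python
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression   # maximum entropy classifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

def compare_classifiers(X, y):
    """Print 10-fold CV macro precision, recall, and F-measure for each algorithm."""
    classifiers = {
        "SVM": LinearSVC(),
        "Maximum Entropy": LogisticRegression(max_iter=1000),
        "Naive Bayes": MultinomialNB(),
        "Decision Tree": DecisionTreeClassifier(),
        "Random Forest": RandomForestClassifier(),
    }
    scoring = ["precision_macro", "recall_macro", "f1_macro"]
    for name, clf in classifiers.items():
        scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
        print(name,
              round(scores["test_precision_macro"].mean(), 3),
              round(scores["test_recall_macro"].mean(), 3),
              round(scores["test_f1_macro"].mean(), 3))
```

Calling compare_classifiers(X, y) on a vectorized training set would print one precision/recall/F line per algorithm, mirroring the structure of the training performance table that follows.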
Training Data Performance
[Table: precision (P), recall (R), and F-measure (F) from 10-fold cross-validation for feature set Types 0–6 under the SVM, Maximum Entropy, Naïve Bayes, Decision Tree, and Random Forest classifiers. SVM with Type 5 features (normalized unigrams + bigrams + indicators with distance) achieved the best performance.]
Test Data Performance: SVM with Type 5 features
Use Status          Precision   Recall   F-measure
Continuing (C)        0.869      0.946     0.906
Discontinued (D)      0.933      0.894     0.913
Started (S)           0.896      0.932     0.914
Unclassified (U)      0.786      0.657     0.715
Discussion
SVM outperformed the other algorithms
Classifier performance improved with lexical normalization, bigrams, indicator words, and the distance (window size) constraint
Best model: SVM with normalized unigrams + bigrams + indicator words with distance (Type 5)
Limitations & Future Work
Small corpus: incorporate more data to enlarge the dataset
Abbreviations, acronyms, and typos: incorporate existing abbreviation disambiguation methods
Conclusions
The model trained on 10 supplements performed well on a test set of 15 different supplements
Applying text mining methods to clinical notes can extract supplement use information
Knowledge of supplement use among patients can further support clinical research
Acknowledgements Advisor: Rui Zhang, PhD
NIH/National Center for Complementary and Integrative Health (NCCIH) grant (R01AT009457) (Zhang) University of Minnesota Grant-In-Aid award (Zhang) Agency for Healthcare Research & Quality grant (R01HS022085) (Melton) National Center for Advancing Translational Sciences of the National Institutes of Health (UL1TR000114) (Blazar)
Thank you!