Classifying Supplement Use Status in Clinical Notes


1 Classifying Supplement Use Status in Clinical Notes
Yadan Fan, BS1; Lu He, BS2; Serguei V.S. Pakhomov, PhD1,3; Genevieve B. Melton, MD, PhD1,4; Rui Zhang, PhD1,4
1Institute for Health Informatics; 2Department of Computer Science; 3College of Pharmacy; 4Department of Surgery
University of Minnesota, Minneapolis, MN

2 Introduction
Approximately 68% of Americans take dietary supplements [1]
Adverse reactions [2]
  Evidence from in-vivo or in-vitro studies, case reports, and post-market surveillance
  Under-reported
Electronic Health Records (EHRs)
  Reliable patient information
  Supplement term coverage
A great amount of information about supplement use is embedded in clinical notes
1. 2. Geller, Andrew I., et al. N Engl J Med (2015): Zhang, Rui, et al. AMIA Annual Symposium Proceedings. Vol

3 Objective
To automatically classify the use status of dietary supplements by applying text mining methods
25 supplements: alfalfa, bilberry, biotin, black cohosh, coenzyme Q10, cranberry, dandelion, echinacea, fish oil, flax seed, folic acid, garlic, ginger, ginkgo, ginseng, glucosamine, glutamine, kava, lecithin, melatonin, milk thistle, saw palmetto, St. John’s Wort, turmeric, vitamin E

4 Overview of Method
Source: notes from the clinical data repository; 1300 sentences chosen by random selection and annotated to form the gold standard
Training set (1000 sentences, ~77%): 10 supplements, 100 sentences for each supplement (alfalfa, echinacea, fish oil, garlic, ginger, ginkgo, ginseng, melatonin, St. John’s Wort, vitamin E)
  Preprocessing, then model selection: 7 feature sets with 5 classification algorithms
Test set (300 sentences, ~23%): 15 supplements, 20 sentences for each supplement (bilberry, biotin, black cohosh, coenzyme Q10, cranberry, dandelion, flax seed, folic acid, glucosamine, glutamine, kava, lecithin, milk thistle, saw palmetto, and turmeric)
  Preprocessing, then evaluation of the selected model
Gold standard classes: Continuing, Discontinued, Started, Unclassified
Performance evaluation

5 Data Collection
Notes retrieval
  Keyword search
  Lexical variations included, e.g. “ginkgo”, “gingko”, “ginko”, “ginkoba”
Data sets
  Training set: compare 7 feature sets with 5 classification algorithms
  Test set: evaluate the model that performed best on the training data
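The keyword search with lexical variations might look something like the sketch below. It is only an illustration: the slides do not say how retrieval was implemented, and the `VARIANTS` table, `build_pattern`, and `find_mentions` are hypothetical names. Only the four ginkgo spellings come from the slide.

```python
import re

# Hypothetical variant table; only the ginkgo spellings are taken from the slide.
VARIANTS = {
    "ginkgo": ["ginkgo", "gingko", "ginko", "ginkoba"],
}

def build_pattern(variants):
    """Compile a case-insensitive whole-word pattern for a list of spellings."""
    # Longest spellings first so the most specific alternative is tried first.
    alternatives = sorted(variants, key=len, reverse=True)
    body = "|".join(re.escape(v) for v in alternatives)
    return re.compile(r"\b(" + body + r")\b", re.IGNORECASE)

def find_mentions(sentence, variants_by_supplement=VARIANTS):
    """Return the canonical supplement names mentioned in a sentence."""
    hits = []
    for supplement, variants in variants_by_supplement.items():
        if build_pattern(variants).search(sentence):
            hits.append(supplement)
    return hits

print(find_mentions("Pt takes Gingko biloba daily."))
```

Mapping every spelling back to one canonical name keeps the per-supplement sentence counts consistent regardless of how the note's author spelled the term.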

6 Development of Gold Standard
Annotation guideline
  Adapted from a previous study* investigating drug use status
  Minor changes
  Applied to 20 randomly selected sentences
  Disagreements resolved by discussion
*Pakhomov, Serguei V. et al. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2002.

7 Annotation Guideline
Continuing (C)
  Definition: Patient continues on current supplements
  Examples: She continued on herbal supplements including echinacea. Increase the dose of garlic.
Discontinued (D)
  Definition: Discontinuation of supplements
  Examples: Stopped taking her garlic two weeks ago. Pt will hold taking ginseng.
Started (S)
  Definition: Initiation of new supplements or restarting of supplements
  Examples: Start ginkgo to help memory. Begin melatonin 10mg 1 hour before bedtime.
Unclassified (U)
  Definition: Does not offer enough information about the use status, such as recommendation, education, negation
  Examples: Advised over-the-counter melatonin. Denies using st johns wort.

8 Development of Gold Standard
Annotation guideline
Inter-annotator agreement
  100 randomly selected sentences
  Cohen’s Kappa score: 0.93
  Percentage agreement: 95%
Dataset split equally between two reviewers and annotated
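For readers unfamiliar with the agreement statistic, here is a minimal sketch of Cohen's kappa for two annotators. The function name `cohens_kappa` and the six toy labels are illustrative assumptions, not the study's data; kappa is observed agreement corrected for the agreement expected by chance from each annotator's label frequencies.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Toy labels (C, D, S, U as in the annotation guideline); not the study's data.
annotator_1 = ["C", "C", "D", "S", "U", "U"]
annotator_2 = ["C", "C", "D", "S", "U", "C"]
print(cohens_kappa(annotator_1, annotator_2))
```

Because kappa discounts chance agreement, it is normally lower than raw percentage agreement, which is consistent with the 0.93 kappa and 95% agreement reported on the slide.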

9 Gold Standard

10 Feature Set Type 0 – raw unigrams Bag-of-words representation method

11 Feature Set Type 0 – raw unigrams Type 1 – normalized unigrams
lexical variation generation (LVG) tool E.g.: “takes”, “taken”, “taking”, “took”: “take”

12 Feature Set Type 0 – raw unigrams Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
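A rough sketch of how Type 2 features (normalized unigrams plus bigrams) could be built as a bag of words. The study used the LVG tool for normalization; the small `BASE_FORMS` lookup here is only a toy stand-in for it, and `tokenize` and `unigram_bigram_features` are hypothetical helper names.

```python
import re
from collections import Counter

# Toy stand-in for lexical normalization; the slides name the LVG tool instead.
BASE_FORMS = {"takes": "take", "taken": "take", "taking": "take", "took": "take"}

def tokenize(sentence):
    """Lowercase the sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def unigram_bigram_features(sentence):
    """Count normalized unigrams and adjacent-token bigrams."""
    tokens = [BASE_FORMS.get(t, t) for t in tokenize(sentence)]
    features = Counter(tokens)
    # Bigrams are joined with an underscore so they cannot collide with unigrams.
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features

print(unigram_bigram_features("Stopped taking her garlic"))
```

Dropping the bigram update gives the Type 1 representation, and skipping the `BASE_FORMS` lookup as well gives Type 0.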

13 Feature Set Type 0 – raw unigrams Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams Type 3 – indicator words only
Semantic cues:
  She has increased alfalfa tablets: Continuing
  Stopped taking her garlic two weeks ago: Discontinued
  Pt started taking ginkgo biloba: Started
  Melatonin is recommended for sleep aid: Unclassified
A list of indicator words
Pakhomov, Serguei V. et al. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2002.

14 Indicator Keywords
start: start, starts, started, starting
restart: restart, restarts, restarted, restarting
resume: resume, resumed, resumes, resuming
initiate: initiate, initiates, initiated, initiating
increase: increase, increases, increased, increasing
decrease: decrease, decreases, decreased, decreasing
reduce: reduce, reduces, reduced, reducing
lower: lower, lowers, lowered, lowering
take: take, takes, took, taking, taken
consume: consume, consumes, consumed, consuming
stop: stop, stops, stopped, stopping
hold: hold, holds, held, holding
advise: advise, advises, advised, advising
avoid: avoid, avoids, avoided, avoiding
deny: deny, denies, denied, denying
decline: decline, declines, declined, declining
refuse: refuse, refuses, refused, refusing
neg: no, not, never
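One way the keyword table could be turned into Type 3 features (indicator words only) is to invert it into a lookup from surface form to indicator class. This is a sketch under that assumption; the slides do not show the actual implementation, and `FORM_TO_INDICATOR` and `indicator_features` are hypothetical names.

```python
# Indicator classes and their surface forms, as listed on the slide.
INDICATORS = {
    "start": ["start", "starts", "started", "starting"],
    "restart": ["restart", "restarts", "restarted", "restarting"],
    "resume": ["resume", "resumed", "resumes", "resuming"],
    "initiate": ["initiate", "initiates", "initiated", "initiating"],
    "increase": ["increase", "increases", "increased", "increasing"],
    "decrease": ["decrease", "decreases", "decreased", "decreasing"],
    "reduce": ["reduce", "reduces", "reduced", "reducing"],
    "lower": ["lower", "lowers", "lowered", "lowering"],
    "take": ["take", "takes", "took", "taking", "taken"],
    "consume": ["consume", "consumes", "consumed", "consuming"],
    "stop": ["stop", "stops", "stopped", "stopping"],
    "hold": ["hold", "holds", "held", "holding"],
    "advise": ["advise", "advises", "advised", "advising"],
    "avoid": ["avoid", "avoids", "avoided", "avoiding"],
    "deny": ["deny", "denies", "denied", "denying"],
    "decline": ["decline", "declines", "declined", "declining"],
    "refuse": ["refuse", "refuses", "refused", "refusing"],
    "neg": ["no", "not", "never"],
}

# Inverted lookup: surface form -> indicator class.
FORM_TO_INDICATOR = {
    form: base for base, forms in INDICATORS.items() for form in forms
}

def indicator_features(tokens):
    """Keep only tokens that are indicator words, mapped to their class."""
    lowered = (t.lower() for t in tokens)
    return [FORM_TO_INDICATOR[t] for t in lowered if t in FORM_TO_INDICATOR]

print(indicator_features("She has not started ginkgo".split()))
```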

15 Feature Set Type 0 – raw unigrams Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams Type 3 – indicator words only Type 4 – normalized unigrams + indicators with distance
Indicator is close to the supplement mention
Example: He continues on Coumadin and also has recently started ginseng as he is concerned about the fatigue he will have during chemotherapy
The optimal window size is 4
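The distance restriction can be illustrated with a small sketch. It assumes the window is measured in tokens on either side of the supplement mention, which the slides do not spell out, and it uses a two-word indicator set chosen only to fit the example sentence; `indicators_near` is a hypothetical name.

```python
def indicators_near(tokens, supplement, indicators, window=4):
    """Return (indicator, distance) pairs within `window` tokens of a mention."""
    lowered = [t.lower() for t in tokens]
    positions = [i for i, t in enumerate(lowered) if t == supplement]
    near = []
    for i, token in enumerate(lowered):
        if token not in indicators or not positions:
            continue
        # Distance to the closest mention of the supplement.
        distance = min(abs(i - p) for p in positions)
        if distance <= window:
            near.append((token, distance))
    return near

# Illustrative indicator set for this one sentence (an assumption).
sentence = "He continues on Coumadin and also has recently started ginseng"
cues = {"continues", "started"}
print(indicators_near(sentence.split(), "ginseng", cues, window=4))
```

With a window of 4, "continues" (which refers to Coumadin, eight tokens away) is ignored and only "started" is attached to ginseng, which is the behaviour the slide's example is meant to motivate.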

16 Feature Set Type 0 – raw unigrams Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams Type 3 – indicator words only Type 4 – normalized unigrams + indicators with distance Type 5 – normalized unigrams + bigrams + indicator with distance

17 Feature Set Type 0 – raw unigrams Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams Type 3 – indicator words only Type 4 – normalized unigrams + indicators with distance Type 5 – normalized unigrams + bigrams + indicator with distance Type 6 – nouns + verbs + adverbs
Verbs hold more information (indicators)
Part-of-speech tags from the Stanford parser:
  Nouns (NN/NNS/NNP/NNPS)
  Verbs (VB/VBG/VBP/VBZ/VBD/VBN)
  Some adverbs (RB): “no”, “not”, “never”
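A sketch of the Type 6 filter, assuming the sentence has already been tagged elsewhere (the slides name the Stanford parser; no parser is called here). The tag sets follow the slide, while `pos_filtered_features` and the hand-tagged example are illustrative assumptions.

```python
# Tag sets as listed on the slide.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBG", "VBP", "VBZ", "VBD", "VBN"}
NEGATION_ADVERBS = {"no", "not", "never"}

def pos_filtered_features(tagged_tokens):
    """Keep nouns, verbs, and the three negation adverbs from (token, tag) pairs."""
    kept = []
    for token, tag in tagged_tokens:
        word = token.lower()
        if tag in NOUN_TAGS or tag in VERB_TAGS:
            kept.append(word)
        elif tag == "RB" and word in NEGATION_ADVERBS:
            kept.append(word)
    return kept

# Hand-tagged example; in practice the tags would come from a tagger or parser.
tagged = [("She", "PRP"), ("never", "RB"), ("started", "VBD"),
          ("ginkgo", "NN"), ("regularly", "RB")]
print(pos_filtered_features(tagged))
```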

18 Feature Set Type 0 – raw unigrams Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams Type 3 – indicator words only Type 4 – normalized unigrams + indicators with distance Type 5 – normalized unigrams + bigrams + indicator with distance Type 6 – nouns + verbs + adverbs

19 Training and Evaluation
Algorithms
  Support Vector Machine (SVM)
  Maximum Entropy
  Naive Bayes
  Decision Tree
  Random Forest
Evaluation
  10-fold cross validation
  Precision, Recall, and F-measure
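To make the evaluation protocol concrete, here is a minimal sketch of how 10-fold cross-validation partitions a data set. It assumes a plain random split; whether the study stratified folds by class or by supplement is not stated, and `k_fold_indices` is a hypothetical helper.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test

# With the 1000 training sentences, each fold holds out 100 and trains on 900.
for train, test in k_fold_indices(1000, k=10):
    print(len(train), len(test))
```

Each sentence is held out exactly once, so the precision, recall, and F-measure reported for a feature set and classifier are aggregated over predictions on every training sentence.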

20 Training Data Performance
Classifier columns: SVM, Maximum Entropy, Naïve Bayes, Decision Tree, Random Forest, each reporting P, R, F*
Type 0: 0.771 0.751 0.748 0.778 0.762 0.760 0.757 0.726 0.721 0.738 0.718 0.717 0.789 0.763 0.753
Type 1: 0.799 0.772 0.735 0.734 0.659 0.639 0.596 0.792 0.759 0.743 0.791 0.767 0.756
Type 2: 0.839 0.838 0.813 0.794 0.786 0.635 0.579 0.497 0.804 0.790 0.785 0.747
Type 3: 0.784 0.783 0.750 0.729 0.711 0.788 0.818 0.815 0.812
Type 4: 0.798 0.793 0.761 0.678 0.612 0.541 0.745 0.816
Type 5: 0.845 0.844 0.823 0.806 0.800 0.653 0.584 0.499 0.810 0.808
Type 6: 0.829 0.828 0.749 0.681 0.647 0.613 0.787
*P: precision, R: recall, F: F-measure


24 Test Data Performance
SVM with Type 5
Use Status          Precision   Recall   F-measure
Continuing (C)      0.869       0.946    0.906
Discontinued (D)    0.933       0.894    0.913
Started (S)         0.896       0.932    0.914
Unclassified (U)    0.786       0.657    0.715
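As a quick consistency check, the F-measure column can be recomputed from the precision and recall columns. This assumes the balanced F-measure (the harmonic mean of precision and recall), which the slides do not state explicitly; `f_measure` and `REPORTED` are illustrative names.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F-measure) per use status, from the slide.
REPORTED = {
    "Continuing": (0.869, 0.946, 0.906),
    "Discontinued": (0.933, 0.894, 0.913),
    "Started": (0.896, 0.932, 0.914),
    "Unclassified": (0.786, 0.657, 0.715),
}

for status, (p, r, f) in REPORTED.items():
    print(status, round(f_measure(p, r), 4), f)
```

The recomputed values agree with the reported ones to within 0.001; the small differences are what one would expect when precision and recall have themselves been rounded to three decimals.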

25 Discussion
SVM outperformed other algorithms
Performance of classifiers
  Lexical normalization
  Bigrams
  Indicator words
  Distance (window size)
Best model
  SVM
  Normalized unigrams + bigrams + indicator words with distance

26 Limitation & Future Work
Small corpus
  Incorporating more data to enlarge the dataset
Abbreviations, acronyms and typos
  Incorporating existing abbreviation disambiguation methods

27 Conclusions
The model trained on 10 supplements performed well on test data covering 15 other supplements
Applying text mining methods to clinical notes can extract supplement use information
Knowledge of supplement use among patients can be further applied in clinical research

28 Acknowledgements
Advisor: Rui Zhang, PhD
NIH/National Center for Complementary and Integrative Health (NCCIH) grant (R01AT009457) (Zhang)
University of Minnesota Grant-In-Aid award (Zhang)
Agency for Healthcare Research & Quality grant (R01HS022085) (Melton)
National Center for Advancing Translational Sciences of the National Institutes of Health (UL1TR000114) (Blazar)

29 Thank you!

