Classifying Supplement Use Status in Clinical Notes
Yadan Fan, BS1; Lu He, BS2; Serguei V.S. Pakhomov, PhD1,3; Genevieve B. Melton, MD, PhD1,4; Rui Zhang, PhD1,4
1Institute for Health Informatics, 2Department of Computer Science, 3College of Pharmacy, 4Department of Surgery, University of Minnesota, Minneapolis, MN
Introduction
Approximately 68% of Americans take dietary supplements1
Adverse reactions2
Reported from in-vivo or in-vitro studies, case reports, and post-market surveillance
Under-reported
Electronic Health Records (EHRs)
Reliable patient information
Supplement term coverage
A great amount of information about supplement use is embedded in clinical notes
1. Geller AI, et al. N Engl J Med (2015). 2. Zhang R, et al. AMIA Annual Symposium Proceedings.
Objective
To automatically classify the use status of dietary supplements by applying text mining methods
25 supplements: Alfalfa, Ginkgo, Bilberry, Dandelion, Kava, Echinacea, Ginseng, Biotin, Flax seed, Lecithin, Fish oil, Melatonin, Black cohosh, Folic acid, Milk thistle, Garlic, St. John's Wort, Coenzyme Q10, Glucosamine, Saw palmetto, Ginger, Vitamin E, Cranberry, Glutamine, Turmeric
Overview of Methods
Clinical data repository: notes retrieved and 1,300 sentences randomly selected
Training set (1,000 sentences, ~77%): 10 supplements (alfalfa, echinacea, fish oil, garlic, ginger, ginkgo, ginseng, melatonin, St. John's Wort, Vitamin E), 100 sentences each; preprocessing; 7 feature sets compared with 5 classification algorithms to select the model
Test set (300 sentences, ~23%): 15 supplements (bilberry, biotin, black cohosh, coenzyme Q10, cranberry, dandelion, flax seed, folic acid, glucosamine, glutamine, kava, lecithin, milk thistle, saw palmetto, and turmeric), 20 sentences each; preprocessing; evaluate the selected model
Annotation produced a gold standard with four use-status classes: Continuing, Discontinued, Started, Unclassified
Performance evaluation on the test set
Data Collection
Notes retrieval: keyword searching with lexical variations, e.g., "ginkgo", "gingko", "ginko", "ginkoba"
Data sets
Training set: compare 7 feature sets with 5 classification algorithms
Test set: evaluate the model selected as optimal on the training data
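As an illustration, here is a minimal sketch of keyword-based sentence retrieval that tolerates lexical variants like those above; the variant lists, note text, and sentence splitter are simplified assumptions, not the study's actual retrieval query.

```python
import re

# Hypothetical variant lists; the actual study searched a clinical data repository.
SUPPLEMENT_VARIANTS = {
    "ginkgo": ["ginkgo", "gingko", "ginko", "ginkoba"],
    "st. john's wort": ["st. john's wort", "st johns wort", "saint john's wort"],
}

def find_supplement_sentences(note_text, variants=SUPPLEMENT_VARIANTS):
    """Return (supplement, sentence) pairs for sentences mentioning any variant."""
    # Naive sentence split; real clinical notes need a proper sentence segmenter.
    sentences = re.split(r"(?<=[.!?])\s+", note_text)
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        for supplement, forms in variants.items():
            if any(form in lowered for form in forms):
                hits.append((supplement, sentence.strip()))
    return hits

if __name__ == "__main__":
    note = "Pt started gingko biloba for memory. Denies using st johns wort."
    for supp, sent in find_supplement_sentences(note):
        print(supp, "->", sent)
```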
Development of Gold Standard
Annotation guideline
Adapted from a previous study* investigating drug use status, with minor changes
Applied to 20 randomly selected sentences
Disagreements resolved by discussion
*Pakhomov SV, et al. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2002.
Annotation Guideline
Continuing (C): Patient continues on current supplements. Examples: "She continued on herbal supplements including echinacea." "Increase the dose of garlic."
Discontinued (D): Discontinuation of the supplements. Examples: "Stopped taking her garlic two weeks ago." "Pt will hold taking ginseng."
Started (S): Initiation of new supplements or restarting of supplements. Examples: "Start ginkgo to help memory." "Begin melatonin 10mg 1 hour before bedtime."
Unclassified (U): Does not offer ample information about use status, such as recommendation, education, or negation. Examples: "Advised over-the-counter melatonin." "Denies using St. John's Wort."
Development of Gold Standard
Inter-annotator agreement assessed on 100 randomly selected sentences
Cohen's kappa: 0.93; percentage agreement: 95%
The remaining dataset was split equally and annotated by the two reviewers
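A minimal sketch of how such agreement figures can be computed, assuming the two annotators' labels are available as parallel lists; the labels below are invented and do not reproduce the reported values.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two annotators over the same sentences (C/D/S/U).
annotator_a = ["C", "D", "S", "U", "C", "S", "D", "U", "C", "C"]
annotator_b = ["C", "D", "S", "U", "C", "S", "D", "C", "C", "C"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
percent_agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)

print(f"Cohen's kappa: {kappa:.2f}")
print(f"Percentage agreement: {percent_agreement:.0%}")
```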
Gold Standard
Feature Set
Type 0 – raw unigrams: bag-of-words representation
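A minimal sketch of the Type 0 representation using scikit-learn's CountVectorizer; the example sentences are invented, and this is only an illustration of the bag-of-words idea, not the study's implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "She continued on herbal supplements including echinacea.",
    "Stopped taking her garlic two weeks ago.",
]

# Type 0: raw unigram counts (bag of words), no normalization.
vectorizer = CountVectorizer(ngram_range=(1, 1), lowercase=True)
X = vectorizer.fit_transform(sentences)

print(vectorizer.get_feature_names_out())
print(X.toarray())
```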
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams: normalized with the lexical variant generation (LVG) tool, e.g., "takes", "taken", "taking", "took" → "take"
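The study used the UMLS LVG tool for normalization; as a rough stand-in, the sketch below collapses inflected verb forms with NLTK's WordNet lemmatizer, which only approximates LVG's behavior.

```python
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)  # WordNet data needed once for the lemmatizer

lemmatizer = WordNetLemmatizer()

def normalize(sentence):
    """Map inflected word forms to a base form, e.g. takes/taking/took -> take."""
    tokens = sentence.lower().split()
    return [lemmatizer.lemmatize(tok, pos="v") for tok in tokens]

print(normalize("She takes melatonin"))   # ['she', 'take', 'melatonin']
print(normalize("Patient took garlic"))   # ['patient', 'take', 'garlic']
```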
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
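Type 2 only changes the n-gram range of the vectorizer; a one-line variation on the earlier sketch, applied here to an already-normalized example sentence (an assumption for illustration).

```python
from sklearn.feature_extraction.text import CountVectorizer

# Type 2: unigrams plus bigrams over normalized tokens, e.g. "stop take", "take garlic".
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(["stop take her garlic two week ago"])
print(vectorizer.get_feature_names_out())
```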
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Semantic cues signal the use status:
"She has increased alfalfa tablets." – Continuing
"Stopped taking her garlic two weeks ago." – Discontinued
"Pt started taking ginkgo biloba." – Started
"Melatonin is recommended for sleep aid." – Unclassified
A list of indicator words was adapted from Pakhomov SV, et al. Proceedings of the AMIA Symposium. American Medical Informatics Association, 2002.
Indicator Keywords
start: start, starts, started, starting
restart: restart, restarts, restarted, restarting
resume: resume, resumes, resumed, resuming
initiate: initiate, initiates, initiated, initiating
increase: increase, increases, increased, increasing
decrease: decrease, decreases, decreased, decreasing
reduce: reduce, reduces, reduced, reducing
lower: lower, lowers, lowered, lowering
take: take, takes, took, taking, taken
consume: consume, consumes, consumed, consuming
stop: stop, stops, stopped, stopping
hold: hold, holds, held, holding
advise: advise, advises, advised, advising
avoid: avoid, avoids, avoided, avoiding
deny: deny, denies, denied, denying
decline: decline, declines, declined, declining
refuse: refuse, refuses, refused, refusing
neg: no, not, never
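A minimal sketch of turning the indicator table into binary features (Type 3 uses only these features); the dictionary below mirrors just a few rows of the table above and is not the complete resource.

```python
# A few rows of the indicator table (lemma -> surface forms); the study used all rows above.
INDICATORS = {
    "start": {"start", "starts", "started", "starting"},
    "stop": {"stop", "stops", "stopped", "stopping"},
    "take": {"take", "takes", "took", "taking", "taken"},
    "deny": {"deny", "denies", "denied", "denying"},
    "neg": {"no", "not", "never"},
}

def indicator_features(sentence):
    """Binary feature per indicator lemma: 1 if any of its surface forms appears."""
    tokens = set(sentence.lower().replace(".", " ").split())
    return {lemma: int(bool(forms & tokens)) for lemma, forms in INDICATORS.items()}

print(indicator_features("Stopped taking her garlic two weeks ago."))
# {'start': 0, 'stop': 1, 'take': 1, 'deny': 0, 'neg': 0}
```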
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Type 4 – normalized unigrams + indicators with distance
An indicator counts only when it is close to the supplement mention; the optimal window size is 4.
Example (labeled Started, S): "He continues on Coumadin and also has recently started ginseng as he is concerned about the fatigue he will have during chemotherapy."
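A minimal sketch of the distance constraint in Type 4: an indicator fires only if it falls within a token window around the supplement mention (window size 4 was found optimal). The tokenization and the small indicator set are simplified assumptions.

```python
INDICATOR_LEMMAS = {"start", "started", "stop", "stopped", "hold", "continues"}

def indicators_near_supplement(sentence, supplement, window=4):
    """Return indicator tokens within `window` tokens of the supplement mention."""
    tokens = sentence.lower().replace(".", " ").split()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == supplement.lower():
            lo, hi = max(0, i - window), i + window + 1
            hits.extend(t for t in tokens[lo:hi] if t in INDICATOR_LEMMAS)
    return hits

sentence = ("He continues on Coumadin and also has recently started ginseng as he is "
            "concerned about the fatigue he will have during chemotherapy")
print(indicators_near_supplement(sentence, "ginseng"))  # ['started'] -> supports label S
```

Note how "continues" (which refers to Coumadin) falls outside the 4-token window and is ignored, while "started" is captured for ginseng.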
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Type 4 – normalized unigrams + indicators with distance
Type 5 – normalized unigrams + bigrams + indicators with distance
Feature Set
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Type 4 – normalized unigrams + indicators with distance
Type 5 – normalized unigrams + bigrams + indicators with distance
Type 6 – nouns + verbs + adverbs
Verbs hold more of the indicator information
Part-of-speech tags from the Stanford parser: nouns (NN/NNS/NNP/NNPS), verbs (VB/VBG/VBP/VBZ/VBD/VBN), and selected adverbs (RB): "no", "not", "never"
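The study used the Stanford parser for part-of-speech tagging; as a stand-in, this sketch uses NLTK's tagger to keep only nouns, verbs, and the negation adverbs. It approximates the Type 6 token filter rather than reproducing the original tooling.

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

KEEP_NOUNS = {"NN", "NNS", "NNP", "NNPS"}
KEEP_VERBS = {"VB", "VBG", "VBP", "VBZ", "VBD", "VBN"}
KEEP_NEGATIONS = {"no", "not", "never"}  # only these adverb tokens are retained

def pos_filter(sentence):
    """Keep nouns, verbs, and negation adverbs as the Type 6 token set."""
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return [tok for tok, tag in tagged
            if tag in KEEP_NOUNS or tag in KEEP_VERBS
            or (tag == "RB" and tok.lower() in KEEP_NEGATIONS)]

print(pos_filter("Denies using st johns wort."))
```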
Feature Set (summary)
Type 0 – raw unigrams
Type 1 – normalized unigrams
Type 2 – normalized unigrams + bigrams
Type 3 – indicator words only
Type 4 – normalized unigrams + indicators with distance
Type 5 – normalized unigrams + bigrams + indicators with distance
Type 6 – nouns + verbs + adverbs
Training and Evaluation
Algorithms: Support Vector Machine (SVM), Maximum Entropy, Naïve Bayes, Decision Tree, Random Forest
Evaluation: 10-fold cross-validation; precision, recall, and F-measure
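A minimal sketch of this comparison, assuming a feature matrix X and label vector y have already been built from one of the feature sets; the scikit-learn estimators and their default hyperparameters are illustrative stand-ins, not the study's settings.

```python
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression   # maximum entropy classifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

def compare_classifiers(X, y):
    """Print 10-fold CV macro precision, recall, and F-measure for each algorithm."""
    classifiers = {
        "SVM": LinearSVC(),
        "Maximum Entropy": LogisticRegression(max_iter=1000),
        "Naive Bayes": MultinomialNB(),
        "Decision Tree": DecisionTreeClassifier(),
        "Random Forest": RandomForestClassifier(),
    }
    scoring = ["precision_macro", "recall_macro", "f1_macro"]
    for name, clf in classifiers.items():
        scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
        print(name,
              round(scores["test_precision_macro"].mean(), 3),
              round(scores["test_recall_macro"].mean(), 3),
              round(scores["test_f1_macro"].mean(), 3))
```

Calling compare_classifiers(X, y) on a vectorized training set would print one precision/recall/F line per algorithm, mirroring the structure of the training performance table that follows.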
Training Data Performance
[Table: precision (P), recall (R), and F-measure (F) from 10-fold cross-validation for feature set Types 0–6 under the SVM, Maximum Entropy, Naïve Bayes, Decision Tree, and Random Forest classifiers. SVM with Type 5 features (normalized unigrams + bigrams + indicators with distance) achieved the best performance.]
Test Data Performance: SVM with Type 5 features
Use Status          Precision   Recall   F-measure
Continuing (C)        0.869      0.946     0.906
Discontinued (D)      0.933      0.894     0.913
Started (S)           0.896      0.932     0.914
Unclassified (U)      0.786      0.657     0.715
Discussion
SVM outperformed the other algorithms
Classifier performance improved with lexical normalization, bigrams, indicator words, and the distance (window size) constraint
Best model: SVM with normalized unigrams + bigrams + indicator words with distance (Type 5)
Limitations & Future Work
Small corpus: incorporate more data to enlarge the dataset
Abbreviations, acronyms, and typos: incorporate existing abbreviation disambiguation methods
Conclusions
The model trained on 10 supplements performed well on a test set of 15 different supplements
Applying text mining methods to clinical notes can extract supplement use information
Knowledge of supplement use among patients can further support clinical research
Acknowledgements Advisor: Rui Zhang, PhD
NIH/National Center for Complementary and Integrative Health (NCCIH) grant (R01AT009457) (Zhang) University of Minnesota Grant-In-Aid award (Zhang) Agency for Healthcare Research & Quality grant (R01HS022085) (Melton) National Center for Advancing Translational Sciences of the National Institutes of Health (UL1TR000114) (Blazar)
Thank you!