Using text mining methods to detect a clinical infection

Milena Gianfrancesco, PhD MPH
Postdoctoral Researcher, Division of Rheumatology, UCSF School of Medicine
Suzanne Tamang, PhD
Assistant Faculty Director, Data Science, Stanford Center for Population Health Sciences
05/18/2018 4/3/2019

Zoster infection a.k.a. “shingles”
Reactivation of the virus that causes chickenpox: varicella zoster virus (VZV).
1 out of every 3 people will develop zoster in their lifetime.
Anyone with a history of chickenpox can get zoster, but the risk generally increases with age.
Patients on an immunosuppressive drug have a higher risk of developing zoster, and it can be more severe.
Limited knowledge is available to determine which patients are at highest risk for zoster, information that is critical for implementing preventive strategies such as vaccination or antiviral prophylaxis.
*If we knew that a certain medication, together with a person’s characteristics, was associated with infection, we could vaccinate those patients first or consider switching medications. More severe cases can involve hospitalization and central nervous system manifestations with potentially fatal or long-term disabling outcomes. Lastly, there is a new (killed) vaccine that is more effective than the live vaccine currently in use; it can be allocated to those at high risk. Incidence: 4 per 1,000 U.S. population annually, age adjusted; 10 per 1,000 above 60 years.
Performance of machine learning methods using electronic medical records to detect and predict a clinical infection 4/3/2019

Clinical reporting of zoster
Zoster infection is often treated outside of the specialty clinic, in an outpatient setting or by an outside PCP.
It is sometimes entered as a diagnosis (i.e. ICD code), but is more often mentioned in a clinical note.
ICD codes may underestimate prevalence.
Potential bias towards more severe cases.

Goals of project
Apply and validate a text mining system to extract incident zoster infection from clinical notes.
Do ICD codes truly associate with more severe cases of zoster?

Study population: UCSF EHR
Data from June 1, 2012 – November 5, 2016 for 800,000+ individuals
Structured tables
Unstructured data (e.g. clinical notes)

Study sample
Individuals prescribed an immunosuppressant medication (IM), with > 2 encounters in the EHR 30 days apart.
31 immunosuppressant medications included
N = 36,042 IM orders
N = 16,344 unique individuals
N = 259 cases identified via ICD code
EHR (N ~ 800,000): zoster prevalence in general population ~ 0.5%
IM (N ~ 16,000): zoster prevalence in IM population ~ 1.6%
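The ~1.6% prevalence in the IM population can be checked directly from the counts on this slide (259 ICD-coded cases among 16,344 individuals). The short sketch below only redoes that arithmetic; the variable names are illustrative and not part of the study code.

```python
# Counts taken from the study-sample slide.
icd_cases = 259          # cases identified via ICD code
im_patients = 16_344     # unique individuals with an immunosuppressant order

# ICD-coded prevalence in the immunosuppressed (IM) population.
prevalence = icd_cases / im_patients
prevalence_text = f"{prevalence:.1%}"
print("ICD-coded zoster prevalence in IM population:", prevalence_text)  # 1.6%

# Ratio to the ~0.5% general-population figure quoted on the same slide.
ratio_vs_general = prevalence / 0.005
print(f"Roughly {ratio_vs_general:.1f}x the general-population prevalence")
```

Because this figure counts ICD-coded cases only, it is a lower bound if zoster is more often recorded in clinical notes than as a diagnosis code.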

Demographics of participants (n=16,344)
Characteristic              N (%) or Mean (SD)
Female                      8,506 (52%)
Age                         50.31 (19.60)
Race
  White                     8,304 (51%)
  Asian                     2,206 (13%)
  Black                     1,111 (7%)
  Other                     3,810 (23%)
  Unknown/Declined          859 (5%)
Ethnicity
  Non-Hispanic or Latino    12,589 (77%)
  Hispanic or Latino        2,867 (18%)
  (unlabeled)               888 (5%)
Diverse sample, but a majority were non-Hispanic white.
43% of the sample was under 50 years old; zoster is traditionally associated with age > 50 and vaccination is currently recommended for that age group, with no recommendations for those under that age.
61% of the sample was under 60 years old.

[CL]inical [EVE]nt [R]ecognition
Tricks-based: a less formal representation of text, more aligned with informal text, and aimed at efficiency, domain knowledge, common sense, and neat little tricks.
CLEVER rules.

Challenges of Clinical Text Analysis
Clinical notes are not SOAP notes
EMR > text (IE, not NLP!)
Boundary detection is challenging (context window)
End to end? Maybe not…
CLEVER: assume there are important local contexts
  Synonyms, lexical variants (seed terms)
  Highly ambiguous (task-specific lexicon)
  Semantic modifiers (base classes)
  Subgrammar, sublanguage (word embeddings)
  Acronyms
  Colloquial terms
  Out of UMLS vocabulary

CLEVER Pipeline
[Figure: CLEVER pipeline diagram]
Stages: 1. Terminology Construction; 2. Pre-processing; 3. Extraction; 4. Patient-level Reporting
Inputs: Unstructured EHR Data (Clinical Text); Structured Encounter Data (CPT codes, ICD codes, patient id, gender, age, …); Eligible Patients; Qualifying Criteria
Components: Tokenizer; Section Detector; Class Sequencer; N-gram Ranker; Concept Recognizer*; Rule-based Extractor; Statistical Extractor
Candidate Event Matrix fields: target sequence, class sequence, candidate id, note section, time offset, target term, patient id, note type, ...
Outputs: Event-level Labels; Structured and Unstructured Events; Combined Events; Patient Labels
* We do not use a distinct concept extraction step in this work, but files for the purpose are produced by CLEVER.

Example of zoster dictionary
Explain classes(?); i.e. VCV; VCV rx; VCV low

Labeled Output: “Negative”
SNIPPET: . He was admitted [DATE] for evaluation and management of likely varicella zoster infection. His symptoms began as L-sided mouth pain ~6-7 days PTA which has become progressively worse. He was initially seen at the…
CLEVER ANNOTATION: NEGATIVE|SCREEN|DOT_SCREEN_#VCV#_DOT_DOT|zoster infection|PID|NID|Consults|DATETIME|7|VCV|807|1059|UK|NULL|period:DOT:1:986:73,evaluation:SCREEN:686:1013:46,period:DOT:1:1075:16,period:DOT:1:1168:109
Example 1: “negative” / “risk” = tagged as negative because the infection is in the past (“she had.. Infection”).
Example 2: two snippets from the same person, but each mention is tagged on its own. From this we would have to decide how to label each person (again, # positive mentions / all mentions?).
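The annotation above is a pipe-delimited record whose last field is a comma-separated list of context terms. The sketch below shows one way such a line could be split for downstream use. The field names (`label`, `class_sequence`, `target_term`, `note_type`) are assumptions inferred from the position of values in this one example, not CLEVER's documented schema; the remaining fields are kept raw because the slide does not explain them.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    label: str            # first field, e.g. NEGATIVE or POSITIVE (assumed meaning)
    class_sequence: str   # third field, e.g. DOT_SCREEN_#VCV#_DOT_DOT (assumed meaning)
    target_term: str      # fourth field, the matched term (assumed meaning)
    note_type: str        # seventh field, e.g. Consults (assumed meaning)
    context: list         # (term, class) pairs from the last field
    raw_fields: list      # every field, untouched apart from whitespace trimming

def parse_annotation(line: str) -> Mention:
    """Split one pipe-delimited annotation line; tolerant of stray spaces."""
    fields = [f.strip() for f in line.split("|")]
    context = []
    for entry in fields[-1].split(","):
        parts = entry.strip().split(":")
        # Only the first two parts (term, class) are interpreted here;
        # the trailing numbers are left uninterpreted.
        context.append((parts[0], parts[1]))
    return Mention(fields[0], fields[2], fields[3], fields[6], context, fields)

EXAMPLE = (
    "NEGATIVE|SCREEN|DOT_SCREEN_#VCV#_DOT_DOT|zoster infection|PID|NID|Consults|"
    "DATETIME|7|VCV|807|1059|UK|NULL|period:DOT:1:986:73,"
    "evaluation:SCREEN:686:1013:46,period:DOT:1:1075:16,period:DOT:1:1168:109"
)

mention = parse_annotation(EXAMPLE)
print(mention.label, mention.target_term, mention.context)
```

Keeping `raw_fields` alongside the named fields means nothing is lost if the positional guesses turn out to be wrong.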

Labeled Output: Positive
SNIPPET: Would continue supportive care and refrain from using nephrotoxic agents at this time until pt demonstrates renal recovery. Zoster in immunocompormised: would recommend decrease dose of Acyclovir to 350mg Q8 and treat for the minimal treatment time.
CLEVER ANNOTATION: POSITIVE|VCV|PT_DOT_#VCV#_PUNCT_DOT|zoster|PID|NID|Ambulatory Progress Notes|DATETIME|3|VCV|745|3960|ct|3326|pt:PT:8:3928:32,period:DOT:1:3958:2,colon:PUNCT:4:3987:27,period:DOT:1:4084:124
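The class sequences in the two annotated examples (`DOT_SCREEN_#VCV#_DOT_DOT` and `PT_DOT_#VCV#_PUNCT_DOT`) appear to list the classes of dictionary terms found around the target mention. The sketch below reproduces that idea in a simplified form. It is not CLEVER's implementation: the tiny dictionary is read off the two annotations, the two-terms-per-side window is an assumption, and the target is matched as the single token "zoster".

```python
import re

# Term -> class pairs read off the two annotated examples (illustrative only;
# a real task dictionary would be far larger).
TERM_CLASSES = {
    "zoster": "VCV",
    "evaluation": "SCREEN",
    "pt": "PT",
    ".": "DOT",
    ":": "PUNCT",
}
TARGET_CLASS = "VCV"

def class_sequences(text, window=2):
    """Return one class sequence per target mention found in the text."""
    # Split into word tokens and single punctuation characters.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    # Keep only tokens that appear in the dictionary, as their classes.
    tagged = [TERM_CLASSES[t] for t in tokens if t in TERM_CLASSES]
    sequences = []
    for i, cls in enumerate(tagged):
        if cls != TARGET_CLASS:
            continue
        left = tagged[max(0, i - window):i]
        right = tagged[i + 1:i + 1 + window]
        # The target class is wrapped in '#', as in the annotations.
        sequences.append("_".join(left + [f"#{cls}#"] + right))
    return sequences

POSITIVE_SNIPPET = (
    "Would continue supportive care and refrain from using nephrotoxic agents "
    "at this time until pt demonstrates renal recovery. Zoster in "
    "immunocompormised: would recommend decrease dose of Acyclovir to 350mg Q8 "
    "and treat for the minimal treatment time."
)
NEGATIVE_SNIPPET = (
    ". He was admitted [DATE] for evaluation and management of likely varicella "
    "zoster infection. His symptoms began as L-sided mouth pain ~6-7 days PTA "
    "which has become progressively worse. He was initially seen at the"
)

positive_sequences = class_sequences(POSITIVE_SNIPPET)
negative_sequences = class_sequences(NEGATIVE_SNIPPET)
print(positive_sequences)  # ['PT_DOT_#VCV#_PUNCT_DOT']
print(negative_sequences)  # ['DOT_SCREEN_#VCV#_DOT_DOT']
```

A sequence such as `DOT_SCREEN_#VCV#_DOT_DOT` can then be mapped to a label by rule, for example treating a screening term immediately before the target as evidence against a confirmed infection.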

Zoster case detection using CLEVER
Generated a ‘dictionary’ of terms associated with zoster to assist in labeling notes
Ran CLEVER on all notes
Compiled files for each patient:
  All positive mentions
  All negative mentions
Will join with structured data (ICD codes, meds, labs, age, sex, race, etc.) to help identify case status
Need to determine a heuristic (e.g. # positive mentions / all mentions) to guide labeling as a “case”
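The "# positive mentions / all mentions" heuristic mentioned above could be sketched as follows. The slide leaves the rule undecided, so the 0.5 cutoff, the function name, and the toy patient ids are all placeholders for illustration rather than the project's chosen values.

```python
from collections import Counter

def label_patients(mentions, min_positive_fraction=0.5):
    """Label each patient as a case when the share of positive mentions
    reaches the cutoff. `mentions` is an iterable of (patient_id, label)
    pairs with label "POSITIVE" or "NEGATIVE". The cutoff is a placeholder.
    """
    positives = Counter()
    totals = Counter()
    for patient_id, label in mentions:
        totals[patient_id] += 1
        if label == "POSITIVE":
            positives[patient_id] += 1
    return {
        pid: positives[pid] / totals[pid] >= min_positive_fraction
        for pid in totals
    }

# Toy mention-level output (hypothetical patients, not study data).
toy_mentions = [
    ("p0", "POSITIVE"), ("p0", "NEGATIVE"), ("p0", "POSITIVE"),
    ("p1", "NEGATIVE"), ("p1", "NEGATIVE"),
]
case_status = label_patients(toy_mentions)
print(case_status)  # {'p0': True, 'p1': False}
```

In practice the cutoff would need to be tuned against chart-reviewed cases, and the resulting labels could be combined with the structured data (ICD codes, medications, labs) before assigning final case status.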

Conclusion
Further refinement of CLEVER to detect all types of infections will assist in developing a highly accurate pipeline for adverse event detection.
Better phenotyping of outcomes will assist future studies in identifying risk factors to prevent occurrences of adverse events.

Acknowledgements
Funding:
AHRQ: R01 HS024412 (PI: Yazdany)
NIAMS: F32 AR (PI: Gianfrancesco)
Rheumatology Quality and Informatics Laboratory (QUIL)
Jinoos Yazdany, Gabriela Schmajuk, Dana Ludwig, Steve Shiboski, Laura Trupin, Michael Evans, Julia Kay, Zara Izadi, Jing Li
