
Presentation transcript:

Recent Efforts in Clinical NLP: Clinical Text Analysis and Knowledge Extraction System (cTAKES) Guergana K. Savova, PhD Children’s Hospital Boston and Harvard Medical School

Acknowledgements
Software developers and contributors at different times (in no specific order)
- James Masanz, Mayo Clinic
- Patrick Duffy, Mayo Clinic
- Philip Ogren, University of Colorado
- Sean Murphy, Mayo Clinic
- Vinod Kaggal, Mayo Clinic
- Jiaping Zheng, Children’s Hospital Boston
- Pei Chen, Children’s Hospital Boston
- Jinho Choi, University of Colorado
Investigators (in no specific order)
- Christopher Chute, MD, DrPH, Mayo Clinic
- James Buntrock, MS, Mayo Clinic
- Guergana Savova, PhD, Children’s Hospital Boston

Overview
- Background
- Clinical Text Analysis and Knowledge Extraction System (cTAKES)
- cTAKES for developers
  - Download and install of cTAKES
  - How to build the dictionary
- cTAKES: graphical user interface

Definitions
- Information Extraction (IE): extracting existing facts from unstructured or loosely structured text into a structured form
- Information Retrieval (IR): finding documents relevant to a user query
- Named Entity Recognition (NER): discovery of groups of textual mentions that belong to a certain semantic class
- Natural Language Processing (NLP): computational methods for text processing based on linguistically sound principles
  - Clinical NLP: NLP for the clinical narrative
  - Biomedical NLP: NLP for the clinical narrative and biomedical literature

Problem Space
- Structured information
  - Relational databases
  - Easy to extract information from them
- Semi-structured information
  - Loosely formatted XML, CSV tables
  - Not challenging to extract information
- Unstructured information
  - Scholarly literature, clinical notes, research reports, webpages
  - Majority of information is unstructured!
  - Real challenge to extract the information

Overarching Goal
Open-source, general-purpose clinical NLP toolkit
- Phenotype extraction from unstructured data
- Library of modules
- Cohesive with other initiatives
- Cutting edge methodologies
- Best software development practices
Our principles
- Open source
- Scalable and robust
- Modular and expandable
- Based on existing standards and conventions
- Scalable, adaptable methodologies through open collaboration in the open-source development

Processing Clinical Notes

A 43-year-old woman was diagnosed with type 2 diabetes mellitus by her family physician 3 months before this presentation. Her initial blood glucose was 340 mg/dL. Glyburide 2.5 mg once daily was prescribed. Since then, self-monitoring of blood glucose (SMBG) showed blood glucose levels of mg/dL. She was referred to an endocrinologist for further evaluation. On examination, she was normotensive and not acutely ill. Her body mass index (BMI) was 18.7 kg/m2 following a recent 10 lb weight loss. Her thyroid was symmetrically enlarged and ankle reflexes absent. Her blood glucose was 272 mg/dL, and her hemoglobin A1c (HbA1c) was 10.3%. A lipid profile showed a total cholesterol of 261 mg/dL, triglyceride level of 321 mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid function was normal. Urinalysis showed trace ketones. She adhered to a regular exercise program and vitamin regimen, smoked 2 packs of cigarettes daily for the past 25 years, and limited her alcohol intake to 1 drink daily. Her mother's brother was diabetic.

Clinical Element Model

Disorder CEM
  text: diabetes mellitus
  code:
  subject: patient
  relative temporal context: 3 months ago
  negation indicator: not negated

Disorder CEM
  text: diabetes mellitus
  code:
  subject: family member
  relative temporal context:
  negation indicator: not negated

Tobacco Use CEM
  text: smoking
  code:
  subject: patient
  relative temporal context: 25 years
  negation indicator: not negated

Medication CEM
  text: Glyburide
  code:
  subject: patient
  frequency: once daily
  negation indicator: not negated
  strength: 2.5 mg
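
As a rough illustration of the Clinical Element Model slide, its four CEM instances can be pictured as plain records. The sketch below is an assumption made for illustration only: the class name `CemRecord` and its field names are hypothetical and are not the actual CEM schema or the cTAKES type system. Fields that are blank on the slide are kept as `None`.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CemRecord:
    """Hypothetical flat record mirroring the fields shown on the slide."""
    cem_type: str
    text: str
    subject: str
    negated: bool
    code: Optional[str] = None                       # blank on the slide
    relative_temporal_context: Optional[str] = None
    frequency: Optional[str] = None                  # Medication CEM only
    strength: Optional[str] = None                   # Medication CEM only


records = [
    CemRecord("Disorder", "diabetes mellitus", "patient", False,
              relative_temporal_context="3 months ago"),
    CemRecord("Disorder", "diabetes mellitus", "family member", False),
    CemRecord("Tobacco Use", "smoking", "patient", False,
              relative_temporal_context="25 years"),
    CemRecord("Medication", "Glyburide", "patient", False,
              frequency="once daily", strength="2.5 mg"),
]

# Example query: non-negated findings about the patient, not a relative.
patient_findings = [r for r in records
                    if r.subject == "patient" and not r.negated]

for record in patient_findings:
    print(asdict(record))
```

Keeping `subject` and `negated` as explicit fields is what lets a downstream query separate the patient's own diabetes from the family history mention.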

Comparative Effectiveness

Disorder CEM
  text: diabetes mellitus
  code:
  subject: patient
  relative temporal context: 3 months ago
  negation indicator: not negated

Disorder CEM
  text: diabetes mellitus
  code:
  subject: family member
  relative temporal context:
  negation indicator: not negated

Tobacco Use CEM
  text: smoking
  code:
  subject: patient
  relative temporal context: 25 years
  negation indicator: not negated

Medication CEM
  text: Glyburide
  code:
  subject: patient
  frequency: once daily
  negation indicator: not negated
  strength: 2.5 mg

- Compare the effectiveness of different treatment strategies (e.g., modifying target levels for glucose, lipid, or blood pressure) in reducing cardiovascular complications in newly diagnosed adolescents and adults with type 2 diabetes.
- Compare the effectiveness of traditional behavioral interventions versus economic incentives in motivating behavior changes (e.g., weight loss, smoking cessation, avoiding alcohol and substance abuse) in children and adults.

Meaningful Use

Disorder CEM
  text: diabetes mellitus
  code:
  subject: patient
  relative temporal context: 3 months ago
  negation indicator: not negated

Disorder CEM
  text: diabetes mellitus
  code:
  subject: family member
  relative temporal context:
  negation indicator: not negated

Tobacco Use CEM
  text: smoking
  code:
  subject: patient
  relative temporal context: 25 years
  negation indicator: not negated

Medication CEM
  text: Glyburide
  code:
  subject: patient
  frequency: once daily
  negation indicator: not negated
  strength: 2.5 mg

- Maintain problem list
- Maintain active med list
- Record smoking status
- Provide clinical summaries for each office visit
- Generate patient lists for specific conditions
- Submit syndromic surveillance data

Clinical Practice

Disorder CEM
  text: diabetes mellitus
  code:
  subject: patient
  relative temporal context: 3 months ago
  negation indicator: not negated

Medication CEM
  text: Glyburide
  code:
  subject: patient
  frequency: once daily
  negation indicator: not negated
  strength: 2.5 mg

- Provide problem list and meds from the visit

Applications
- Meaningful use of the EMR
- Comparative effectiveness
- Clinical investigation
  - Patient cohort identification
  - Phenotype extraction
- Epidemiology
- Clinical practice
- and many more...
With deep semantic processing, the sky is the limit for applications

Partnerships
- NCBC-funded initiatives
  - Integrating Data for Analysis, Anonymization and Sharing (iDASH)
  - Ontology Development and Information Extraction (ODIE)
- Veterans Administration
- Strategic Health Advanced Research Projects (SHARP)
  - SHARP 3: SMaRT app
  - SHARP 4
- R01s
  - Shared annotated lexical resource
  - Temporal relation discovery for the clinical domain
  - Multi-source integrated platform for answering clinical questions
- eMERGE, PGRN (Pharmacogenomics Research Network)
- Linguistic Data Consortium and Penn Treebank
- MITRE Corporation

Integrating cTAKES within i2b2
- Querying encrypted clinical notes stored in the i2b2 database
- Processing the result notes through cTAKES
- Persisting extracted concepts into the i2b2 database
  - Thus, the concepts are now searchable by the researcher
- Enabling the training and running of classifiers directly from the i2b2 workbench

"...a scalable informatics framework that will enable clinical researchers to use existing clinical data for discovery research and, when combined with IRB-approved genomic data, facilitate the design of targeted therapies for individual patients with diseases having genetic origins."

clinical Text Analysis and Knowledge Extraction System (cTAKES)


cTAKES Adoption
- May 2011
  - 2306 downloads*
- eMERGE (SGH, NW)
- PGRN (HMS, NW)
- Extensions: Yale (YATEX), MITRE
* Source:

cTAKES Technical Details
- Open source
  - Apache v2.0 license
  - Java 1.5
  - Dependency on UMLS, which requires a UMLS license (free)
- Framework
  - IBM’s Unstructured Information Management Architecture (UIMA), an open-source framework and Apache project
- Methods
  - Natural Language Processing (NLP) methods
  - Based on standards and conventions to foster interoperability
- Application
  - High-throughput system

cTAKES: Components
- Sentence boundary detection (OpenNLP technology)
- Tokenization (rule-based)
- Morphologic normalization (NLM’s LVG)
- POS tagging (OpenNLP technology)
- Shallow parsing (OpenNLP technology)
- Named Entity Recognition
  - Dictionary mapping (lookup algorithm)
  - Machine learning (MAWUI)
  - Types: diseases/disorders, signs/symptoms, anatomical sites, procedures, medications
- Negation and context identification (NegEx)
- Dependency parser
- Drug Profile module
- Smoking status classifier
- CEM normalization module (soon to be released)
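
To make the dictionary-mapping step of Named Entity Recognition more concrete, here is a deliberately simplified sketch of greedy longest-match lookup over tokens. It is not the cTAKES lookup algorithm or its API: the tiny `DICTIONARY`, its semantic-type labels, and the functions `tokenize` and `lookup` are all assumptions made up for this example, and the tokenizer is a plain regular expression.

```python
import re

# Hypothetical toy dictionary: lower-cased token tuples -> semantic type.
DICTIONARY = {
    ("diabetes", "mellitus"): "disease/disorder",
    ("type", "2", "diabetes", "mellitus"): "disease/disorder",
    ("glyburide",): "medication",
    ("thyroid",): "anatomical site",
    ("weight", "loss"): "sign/symptom",
}
MAX_LEN = max(len(key) for key in DICTIONARY)


def tokenize(text):
    """Very rough rule-based tokenizer: runs of letters or digits."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z0-9]+", text)]


def lookup(text):
    """Return (surface text, semantic type) pairs via greedy longest match."""
    tokens = tokenize(text)
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in DICTIONARY:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                mentions.append((text[start:end], DICTIONARY[key]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return mentions


note = ("A 43-year-old woman was diagnosed with type 2 diabetes mellitus. "
        "Glyburide 2.5 mg once daily was prescribed.")
mentions = lookup(note)
print(mentions)
```

Trying the longest span first is why "type 2 diabetes mellitus" is reported as one mention rather than the shorter "diabetes mellitus" entry nested inside it.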

Output Example: Drug Object
“Tamoxifen 20 mg po daily started on March 1, 2005.”
Drug
  Text: Tamoxifen
  Associated code: C
  Strength: 20 mg
  Start date: March 1, 2005
  End date: null
  Dosage: 1.0
  Frequency: 1.0
  Frequency unit: daily
  Duration: null
  Route: Enteral Oral
  Form: null
  Status: current
  Change Status: no change
  Certainty: null
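
The mapping from the Tamoxifen sentence to a structured drug object can be sketched with a single regular expression. This is only a toy under stated assumptions, not the Drug Profile module: the pattern, the `ROUTES` abbreviation table, and the function `parse_drug_mention` are hypothetical, and the pattern covers only sentences shaped like the slide's example. Fields it cannot fill are left as `None`, mirroring the `null` values in the drug object.

```python
import re
from datetime import datetime

ROUTES = {"po": "Enteral Oral"}   # hypothetical abbreviation table
FREQUENCY_UNITS = {"daily"}       # hypothetical list of known units

# drug name, strength in mg, route, frequency word, optional start date
PATTERN = re.compile(
    r"(?P<drug>[A-Z][a-z]+)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s*mg)\s+"
    r"(?P<route>\w+)\s+"
    r"(?P<freq>\w+)"
    r"(?:\s+started on\s+(?P<start>[A-Z][a-z]+ \d{1,2}, \d{4}))?"
)


def parse_drug_mention(sentence):
    """Return a dict of drug attributes, or None if the pattern fails."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    start = m.group("start")
    freq = m.group("freq")
    return {
        "text": m.group("drug"),
        "strength": m.group("strength"),
        "route": ROUTES.get(m.group("route")),
        "frequency_unit": freq if freq in FREQUENCY_UNITS else None,
        # Month names are parsed with %B, which assumes an English locale.
        "start_date": (datetime.strptime(start, "%B %d, %Y").date()
                       if start else None),
        "end_date": None,
    }


drug = parse_drug_mention("Tamoxifen 20 mg po daily started on March 1, 2005.")
print(drug)
```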

Output Example: Disorder Object
“No evidence of cholangiocarcinoma.”
Disorder
  Text: cholangiocarcinoma
  Associated code: SNOMED
  Certainty: 1
  Context: current
  Relatedness to patient: true
  Status: negated
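
The "negated" status in the disorder example comes from trigger-based negation detection in the style of NegEx. The sketch below shows the general idea only and is not the NegEx implementation or the cTAKES negation annotator: the trigger list is a small illustrative subset, and the `WINDOW` size, the function `is_negated`, and the "affirmed" label are assumptions for this example.

```python
import re

# Small, illustrative subset of pre-negation trigger phrases.
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without", "negative for"]
WINDOW = 5  # assumed maximum number of tokens between trigger and concept


def is_negated(sentence, concept):
    """Return True if a trigger precedes the concept within WINDOW tokens."""
    lowered = sentence.lower()
    pos = lowered.find(concept.lower())
    if pos < 0:
        raise ValueError("concept not found in sentence")
    preceding = lowered[:pos]
    for trigger in NEGATION_TRIGGERS:
        # Word boundaries keep "no" from matching inside "normal".
        for m in re.finditer(r"\b" + re.escape(trigger) + r"\b", preceding):
            tokens_between = preceding[m.end():].split()
            if len(tokens_between) <= WINDOW:
                return True
    return False


sentence = "No evidence of cholangiocarcinoma."
disorder = {
    "text": "cholangiocarcinoma",
    "status": "negated" if is_negated(sentence, "cholangiocarcinoma") else "affirmed",
}
print(disorder)
```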

(1) cTAKES for developers
- Download and install of cTAKES
- Building the dictionary
Jiaping Zheng
Children’s Hospital Boston

Introduction
See separate PDF for the slides

Graphical User Interface (GUI) to cTAKES: a Prototype
Pei J. Chen
Children’s Hospital Boston

cTAKES as a Service
Objectives
1. Demo cTAKES prototype web application
   - Empower end users to leverage cTAKES
2. Gather feedback for future cTAKES GUI
3. Potential system integrations with other applications (e.g., i2b2, ARC, Web Annotator)
Developed within i2b2 to integrate cTAKES in the i2b2 NLP cell

cTAKES Web Application: a Prototype

Single clinical note

Technologies
- Front-End Web GUI
  - ExtJS
  - JavaScript
- Back-End cTAKES
  - Java
  - UIMA
- Middleware Web Services
  - Java
  - Apache CXF
  - JSON

Deployment Considerations
- Deployment Model
- Security
- Performance
- Licensing (UMLS, Apache, GPL v.3)