Citation Biomedical Informatics Data ➜ Information ➜ Knowledge BMI Biomedical Named Entity Recognition Ramakanth Kavuluru NLP Seminar – 8/21/2012.

Presentation transcript:


What are named entities?
– The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes.
– Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells.
Entity types shown for single terms: Biologically Active Substance, Drug, Disorder, Organic Chemical, Enzyme, Cell
Labels shown for longer phrases: Cholesterol lowering drugs, Drug, Biological Function

Why do we need to extract them?
To provide effective semantic search
– Find all discharge summaries of patients who have a history of diabetes and obesity and have taken statins as part of their treatment.
– Find all biomedical articles that discuss the dopamine neurotransmitter in the context of depressive disorders.
Use cases: Clinical Trial Recruitment, Literature Review

Why do we need to extract them?
To use as features in machine learning for effective text classification
To build semantic clusters of textual documents to understand evolving themes
To reduce noise by avoiding keywords that are not indicative of the classes or clusters
Recently, as a first step in relation extraction and hence in knowledge discovery

A major task in text mining
Extract information from textual data
Use this information to solve problems
What type of information?
– Relevant concepts: a medical condition or finding, a drug, a gene or protein, an emotion (hope, love, …)
– Relevant (binary) relations: drug TREATS a condition, protein CAUSES a disease
What are the typical questions?
– Does a pathology report indicate a reportable case?
– Which patients satisfy the criteria for a clinical trial?

Knowledge Discovery
Known relations
– VIP Peptide, increases, Catecholamine Biosynthesis
– Catecholamines, induce, β-adrenergic receptor activity
– β-adrenergic receptors, are involved in, fear conditioning
Hypothesis
– VIP Peptide, affects, fear conditioning (?)
Evidence sources shown on the slide: in cattle, in rats, in humans
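The chain of relations above can be followed mechanically once the entities have been extracted. The snippet below is a minimal sketch, not the speaker's method: it stores the three relations as triples and searches for a path from one concept to another. The concept names are simplified so that the object of one triple matches the subject of the next (for example, "Catecholamine Biosynthesis" and "Catecholamines" are collapsed into one node), which is an assumption made only for illustration.

```python
from collections import defaultdict

# Toy triples based on the slide. Names are simplified so that the object of
# one relation matches the subject of the next (an illustrative assumption).
TRIPLES = [
    ("VIP peptide", "increases", "catecholamines"),
    ("catecholamines", "induce", "beta-adrenergic receptors"),
    ("beta-adrenergic receptors", "are involved in", "fear conditioning"),
]


def build_graph(triples):
    """Index triples by subject: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
    return graph


def find_chains(graph, start, goal, path=None, seen=None):
    """Return every relation chain leading from start to goal (depth-first)."""
    path = path or []
    seen = seen or {start}
    if start == goal:
        return [path]
    chains = []
    for pred, obj in graph.get(start, []):
        if obj not in seen:  # avoid cycles
            step = (start, pred, obj)
            chains.extend(find_chains(graph, obj, goal, path + [step], seen | {obj}))
    return chains


graph = build_graph(TRIPLES)
chains = find_chains(graph, "VIP peptide", "fear conditioning")
for chain in chains:
    print(" -> ".join(f"{s} [{p}] {o}" for s, p, o in chain))
```

A chain found this way is only a candidate hypothesis; as the slide notes, the individual relations may come from different species, so each link still needs expert review.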

Clinical NER
Concept types: Disorder/Symptom, Medication, Procedures
Attributes: Present/historical/absent, Acute? Uncertain?; Present/historical/future

Why is NER Hard?

Linguistic Variation
Derivational variation: cranial, cranium
Inflectional variation: coughed, coughing
Synonymy
– neurofibromin 2, merlin, NF2 protein, and schwannomin
– Addison’s disease, adrenal insufficiency, hypocortisolism, bronzed disease
– Feeding problems in newborn
– The mother said she was having trouble feeding the baby.

Polysemy
Merlin: both a bird and a protein in UMLS
Discharge
– Patient was prescribed codeine upon discharge
– The discharge was yellow and purulent
Abbreviations
– APC: Activated protein C, Adenomatosis polyposis coli, antigen presenting cell, aerobic plate count, advanced pancreatic cancer, age period cohort, antibody producing cells, atrial premature complex

Negation
Nearly half of all clinical concepts in dictated narratives are negated
– There is no maxillary sinus tenderness
Implied absence without negation
– Lungs are clear upon auscultation
So,
– Rales: Absent
– Rhonchi: Absent
– Wheezing: Absent
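Both cases on this slide can be illustrated with a small rule-based check. The snippet below is a hedged sketch in the spirit of trigger-word negation detectors, not a specific tool's algorithm: the trigger list, the five-token window, and the `IMPLIED_ABSENT` rule table are all hypothetical values chosen to cover the slide's two examples.

```python
import re

# Illustrative trigger words; a real system would use a much larger list.
NEGATION_TRIGGERS = {"no", "not", "without", "denies", "negative"}
WINDOW = 5  # number of tokens before a concept searched for a trigger (assumption)

# Hypothetical rule for the slide's auscultation example: a cue phrase that
# implies the absence of findings that are never mentioned in the text.
IMPLIED_ABSENT = {
    "lungs are clear": ["rales", "rhonchi", "wheezing"],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def is_negated(sentence, concept):
    """True if a trigger word occurs within WINDOW tokens before the concept."""
    tokens = tokenize(sentence)
    concept_tokens = tokenize(concept)
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            preceding = tokens[max(0, i - WINDOW):i]
            if any(tok in NEGATION_TRIGGERS for tok in preceding):
                return True
    return False


def implied_absent(sentence):
    """Return findings implied to be absent by a cue phrase in the sentence."""
    text = " ".join(tokenize(sentence))
    findings = []
    for cue, concepts in IMPLIED_ABSENT.items():
        if cue in text:
            findings.extend(concepts)
    return findings


print(is_negated("There is no maxillary sinus tenderness", "maxillary sinus tenderness"))
print(implied_absent("Lungs are clear upon auscultation"))
```

The second function shows why implied absence is harder than explicit negation: the absent findings never appear in the sentence, so they have to come from domain knowledge encoded as rules.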

Controlled Terminologies
Controlled vocabularies or taxonomies
– Gene Ontology (gene products): most cited, 450 per year in PubMed; total of terms
– SNOMED CT (about 300K+ concepts)
– NCI Thesaurus, ICD-9/10, ICD-O-3, LOINC, MedlinePlus
– UMLS Metathesaurus (integration of 140+ vocabularies): 2.3 million concepts

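More on the Metathesaurus
CUIs (concept unique identifiers), LUIs (lexical unique identifiers), SUIs (string unique identifiers), AUIs (atom unique identifiers)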

Semantic Types and Relations
NLM Semantic Network, the type system behind the UMLS Metathesaurus
– Semantic Types (135), Semantic Groups (15)
– Semantic Relations (54)
SPECIALIST Lexicon
– Malaria, malarial
– Hyperplasia, hyperplastic
How do we extract named entities?

MetaMap from NLM
Identify phrases: use the SPECIALIST parser
Map to CUIs: use the SPECIALIST Lexicon, Metathesaurus, and Semantic Network

Output of syntactic analysis
Syntactic analysis of “ocular complications of myasthenia gravis”
– Ocular (adj), complications (noun), of (prep), myasthenia (noun), gravis (noun)
– Gives noun phrases (NP): “Ocular complications” and “Myasthenia gravis”
– Prepositions are ignored
– In a given NP, you have a head and modifiers: Ocular (mod) and complications (head)
How about “male pattern baldness”?
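The phrase splitting and head/modifier labelling described above can be sketched in a few lines. This is a toy illustration that takes already-tagged tokens as input; it does not reproduce the SPECIALIST parser. The function names are hypothetical, and the rule "the last word of the phrase is the head" is a simplifying assumption that works for "ocular complications" and "male pattern baldness" but would need lexicon support for multiword terms such as "myasthenia gravis".

```python
def split_noun_phrases(tagged):
    """Split (word, tag) pairs into noun phrases, dropping prepositions."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag == "prep":
            # A preposition closes the current phrase and is itself ignored.
            if current:
                phrases.append(current)
            current = []
        else:
            current.append((word, tag))
    if current:
        phrases.append(current)
    return phrases


def head_and_modifiers(phrase):
    """Treat the last word as the head and the rest as modifiers (simplification)."""
    words = [word for word, _ in phrase]
    return words[-1], words[:-1]


tagged = [
    ("ocular", "adj"), ("complications", "noun"), ("of", "prep"),
    ("myasthenia", "noun"), ("gravis", "noun"),
]
phrases = split_noun_phrases(tagged)
print([[word for word, _ in phrase] for phrase in phrases])
print(head_and_modifiers(phrases[0]))

baldness = [("male", "noun"), ("pattern", "noun"), ("baldness", "noun")]
print(head_and_modifiers(baldness))
```

For "male pattern baldness" the sketch yields one head with two modifiers, which is the kind of case the slide's closing question points at.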

Variant Generation

Candidate identification
Look for all variants in Metathesaurus strings and identify those candidate concepts (CUIs) whose strings contain at least one variant as a substring
Example: for “ocular complication”, obtain all Metathesaurus strings that contain any of the following as substrings
– Optic complication
– Eyes complication
– Ophthalmic complicated
– …
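The substring lookup can be sketched as follows. This is a minimal illustration under stated assumptions: the `METATHESAURUS` dictionary is a toy stand-in with placeholder identifiers (not real CUIs), the variants are given word by word rather than as full phrases, and a real system would use an index instead of scanning every string.

```python
# Toy stand-in for Metathesaurus strings; identifiers are placeholders, not real CUIs.
METATHESAURUS = {
    "CUI_A": ["eye complication", "complication of eye"],
    "CUI_B": ["optic nerve disorder"],
    "CUI_C": ["myasthenia gravis"],
}

# Word-level variants of "ocular complication" (illustrative, not generated output).
VARIANTS = ["ocular", "optic", "eye", "ophthalmic", "complication", "complicated"]


def find_candidates(variants, metathesaurus):
    """Map each concept to the variants found as substrings of its strings."""
    candidates = {}
    for cui, strings in metathesaurus.items():
        matched = {v for v in variants for s in strings if v in s.lower()}
        if matched:  # keep only concepts that contain at least one variant
            candidates[cui] = sorted(matched)
    return candidates


candidates = find_candidates(VARIANTS, METATHESAURUS)
for cui, matched in candidates.items():
    print(cui, matched)
```

Note that the second placeholder concept is retrieved on the strength of a single variant; weeding out such weak candidates is the job of the ranking step that follows.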

Mapping and Evaluation
So now we have a set of candidate CUIs based on the presence of variants of the given phrase in Metathesaurus strings. How do we select the best candidate?
Use several measures to compute a rank
– Centrality (involvement of the head)
– Variation (average of inverse distance scores)
– Coverage
– Cohesiveness

Final Score
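One way the four measures could be combined into a final score is a weighted average. The sketch below assumes the weighting described in the MetaMap literature (coverage and cohesiveness counted twice, result scaled to 0 to 1000) and the inverse-distance form 4 / (d + 4) for variation. Both are recalled from that literature rather than stated in this transcript, so treat them as assumptions; the function names are hypothetical.

```python
def variation_score(distances):
    """Average of inverse-distance scores, one per matched word.

    Uses 4 / (d + 4), where d is the variant distance (0 for an exact match).
    The exact form is an assumption, not taken from the slides.
    """
    return sum(4 / (d + 4) for d in distances) / len(distances)


def final_score(centrality, variation, coverage, cohesiveness):
    """Weighted average of the four measures (each in [0, 1]), scaled to 0..1000.

    Coverage and cohesiveness are weighted twice as heavily as centrality
    and variation (assumed weighting).
    """
    weighted = (centrality + variation + 2 * coverage + 2 * cohesiveness) / 6
    return round(1000 * weighted)


# Hypothetical candidate: head matched, one exact word and one distant variant,
# half of the phrase covered.
variation = variation_score([0, 4])
print(final_score(centrality=1, variation=variation, coverage=0.5, cohesiveness=0.5))
```

Under this weighting a candidate that matches every word exactly scores 1000, and candidates that cover only part of the phrase are penalized more heavily than those reached through distant variants.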

MetaMap Options
Types of variants: include or exclude derivational variants
Word sense disambiguation
– Discharge (bodily secretion vs. releasing the patient)
Concept gaps
– “Obstructive apnea” mapping to “obstructive sleep apnea” or “obstructive neonatal apnea”
Term processing
– Process the input string as a single concept, that is, don’t split it into noun phrases

Output options
Human-readable format
XML format
Restrictions based on certain vocabularies: consider only ICD-9
Restrictions based on certain types: consider only pharmacological substances (i.e., drugs)
DEMO TIME: Daniel Harris
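The two kinds of restriction can be pictured as filters over the list of mappings. The snippet below is a hedged sketch of that idea applied after the fact to hypothetical result records; it does not use MetaMap's own options or output format, and the `Mapping` class, the sample records, and the source abbreviations attached to them are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Hypothetical record for one phrase-to-concept mapping."""
    phrase: str
    concept: str
    semantic_type: str
    sources: tuple  # vocabularies that contain the concept


# Illustrative records, not real tool output.
MAPPINGS = [
    Mapping("statins", "Statins", "Pharmacologic Substance", ("MSH",)),
    Mapping("diabetes", "Diabetes Mellitus", "Disease or Syndrome", ("ICD9CM", "MSH")),
]


def restrict(mappings, semantic_types=None, sources=None):
    """Keep mappings that match the allowed semantic types and source vocabularies."""
    kept = []
    for m in mappings:
        if semantic_types and m.semantic_type not in semantic_types:
            continue
        if sources and not set(m.sources) & set(sources):
            continue
        kept.append(m)
    return kept


drugs_only = restrict(MAPPINGS, semantic_types={"Pharmacologic Substance"})
icd9_only = restrict(MAPPINGS, sources={"ICD9CM"})
print([m.phrase for m in drugs_only])
print([m.phrase for m in icd9_only])
```

Leaving both arguments unset returns every mapping, so the same function covers the unrestricted case.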

References
– An Overview of MetaMap: Historical Perspectives and Recent Advances, Alan Aronson and Francois Lang
– Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program, Alan Aronson
– Comparison of LVG and MetaMap Functionality, Alan Aronson
– Lexical, Terminological, and Ontological Resources for Biological Text Mining, Olivier Bodenreider