Published by Godwin Casey; modified over 9 years ago
1
Biomedical Informatics (BMI): Data ➜ Information ➜ Knowledge
Biomedical Named Entity Recognition
Ramakanth Kavuluru
NLP Seminar – 8/21/2012
2
What are named entities?
– The benefits of taking cholesterol lowering statin drugs outweigh the risks even among people who are likely to develop diabetes.
– Acute exposure to resveratrol inhibits AMPK activity in human skeletal muscle cells.
3
What are named entities? (entity types)
Entity types marked in the two example sentences: Biologically Active Substance, Drug, Disorder, Organic Chemical, Enzyme, Cell
4
What are named entities? (further annotations)
Further annotations on the same two sentences: Cholesterol lowering drugs (Drug), Biological Function
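The simplest way to find entities like these is exact dictionary lookup. The Python sketch below tags the two example sentences with a greedy longest-match scan; the tiny `LEXICON` and its type assignments are hypothetical and chosen only for illustration, and real tools such as MetaMap add the variant handling described in later slides.

```python
# Hypothetical mini-lexicon: lower-cased token tuples -> entity type.
# Type assignments are chosen for this sketch, not taken from UMLS output.
LEXICON = {
    ("cholesterol",): "Biologically Active Substance",
    ("statin", "drugs"): "Drug",
    ("diabetes",): "Disorder",
    ("resveratrol",): "Organic Chemical",
    ("ampk",): "Enzyme",
    ("skeletal", "muscle", "cells"): "Cell",
}
MAX_LEN = max(len(key) for key in LEXICON)


def tag(sentence):
    """Greedy longest-match dictionary lookup; returns (text, type) pairs."""
    tokens = sentence.replace(".", "").split()
    lowered = [t.lower() for t in tokens]
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so multi-word entries win.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in LEXICON:
                entities.append((" ".join(tokens[i:i + n]), LEXICON[key]))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts at this token; move on
    return entities


print(tag("Acute exposure to resveratrol inhibits AMPK activity "
          "in human skeletal muscle cells"))
# [('resveratrol', 'Organic Chemical'), ('AMPK', 'Enzyme'), ('skeletal muscle cells', 'Cell')]
```

Exact matching fails as soon as the text uses a synonym, an inflection, or an abbreviation, which is why the following slides on linguistic variation matter.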
5
Why do we need to extract them?
To provide effective semantic search:
– Clinical trial recruitment: find all discharge summaries of patients who have a history of diabetes and obesity and have taken statins as part of their treatment.
– Literature review: find all biomedical articles that discuss the dopamine neurotransmitter in the context of depressive disorders.
6
Why do we need to extract them?
– To use as features in machine learning for effective text classification
– To build semantic clusters of textual documents to understand evolving themes
– To reduce noise by avoiding keywords that are not indicative of the classes or clusters
– More recently, as a first step in relation extraction and hence in knowledge discovery
7
A major task in text mining
– Extract information from textual data
– Use this information to solve problems
What type of information?
– Relevant concepts: a medical condition or finding, a drug, a gene or protein, an emotion (hope, love, …)
– Relevant (binary) relations: a drug TREATS a condition, a protein CAUSES a disease
What are the typical questions?
– Does a pathology report indicate a reportable case?
– Which patients satisfy the criteria for a clinical trial?
8
Knowledge Discovery
VIP peptide – increases – catecholamine biosynthesis
Catecholamines – induce – β-adrenergic receptor activity
β-adrenergic receptors – are involved in – fear conditioning
VIP peptide – affects – fear conditioning ?????
Contexts: in cattle, in rats, in humans
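The chain above follows the ABC pattern of literature-based discovery: if A relates to B and B relates to C, an unreported A-to-C link becomes a hypothesis. The sketch below finds such indirect links with a breadth-first search over extracted triples. It assumes the concept names have already been normalized by hand so that, for example, "catecholamine biosynthesis" and "catecholamines" share one node; supplying that normalization is exactly the job of NER and concept mapping.

```python
from collections import defaultdict, deque

# Relations from the slide, with concept names normalized by hand
# (assumption) so that the intermediate concepts line up.
TRIPLES = [
    ("VIP peptide", "increases", "catecholamines"),
    ("catecholamines", "induce", "β-adrenergic receptors"),
    ("β-adrenergic receptors", "are involved in", "fear conditioning"),
]


def hidden_links(triples, source):
    """Map each concept reachable from `source` only through intermediates
    to one shortest path (breadth-first search)."""
    graph = defaultdict(list)
    for subj, _relation, obj in triples:
        graph[subj].append(obj)
    direct = set(graph[source])
    paths = {source: [source]}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in paths:
                paths[nxt] = paths[node] + [nxt]
                queue.append(nxt)
    return {c: p for c, p in paths.items() if c != source and c not in direct}


for concept, path in hidden_links(TRIPLES, "VIP peptide").items():
    print(concept, ":", " -> ".join(path))
```

Such paths are only candidate hypotheses; as the organism annotations on the slide suggest, links reported in different species may not compose.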
9
Clinical NER
Concept types: Disorder/Symptom, Medication, Procedures
Attributes: Present/historical/absent, Acute?, Uncertain?, Present/historical/future
10
Why is NER hard?
11
Linguistic Variation
– Derivational variation: cranial, cranium
– Inflectional variation: coughed, coughing
– Synonymy:
  – neurofibromin 2, merlin, NF2 protein, and schwannomin
  – Addison’s disease, adrenal insufficiency, hypocortisolism, bronzed disease
  – "Feeding problems in newborn" vs. "The mother said she was having trouble feeding the baby."
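One common way to handle synonymy is to map every surface form to a shared concept identifier after light normalization. A minimal sketch follows, assuming a hand-built synonym table; the `C-NF2` and `C-ADDISON` keys are made-up placeholders, not UMLS CUIs. Paraphrases such as the feeding example are beyond a lookup table, and derivational pairs like cranial/cranium would need lexicon support.

```python
import re

# Hypothetical synonym table: normalized string -> placeholder concept key.
SYNONYMS = {
    "neurofibromin 2": "C-NF2",
    "merlin": "C-NF2",
    "nf2 protein": "C-NF2",
    "schwannomin": "C-NF2",
    "addisons disease": "C-ADDISON",
    "adrenal insufficiency": "C-ADDISON",
    "hypocortisolism": "C-ADDISON",
    "bronzed disease": "C-ADDISON",
}


def normalize(term):
    """Lower-case, drop apostrophes, replace other punctuation with spaces,
    and collapse whitespace."""
    term = term.lower().replace("'", "").replace("\u2019", "")
    term = re.sub(r"[^a-z0-9 ]+", " ", term)
    return " ".join(term.split())


def concept_of(term):
    """Return the placeholder concept key for a surface form, or None."""
    return SYNONYMS.get(normalize(term))


print(concept_of("Addison’s disease"), concept_of("Schwannomin"))
```

Note that this table maps "merlin" unconditionally to the protein, which is exactly the polysemy problem raised next.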
12
Polysemy
– Merlin: both a bird and a protein in UMLS
– Discharge:
  – "Patient was prescribed codeine upon discharge."
  – "The discharge was yellow and purulent."
– Abbreviations: APC can stand for activated protein C, adenomatosis polyposis coli, antigen presenting cell, aerobic plate count, advanced pancreatic cancer, age period cohort, antibody producing cells, or atrial premature complex
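Abbreviations like APC are often resolved from context. The sketch below uses a simplified Lesk-style heuristic: it picks the sense whose cue words overlap most with the sentence. The cue-word sets in `SENSES` are hypothetical and cover only four of the expansions listed above; a real system would learn such cues from annotated text.

```python
import re

# Hypothetical cue words for four of the APC senses.
SENSES = {
    "activated protein C": {"coagulation", "thrombosis", "anticoagulant", "sepsis"},
    "adenomatosis polyposis coli": {"gene", "germline", "mutation", "colorectal", "polyps"},
    "antigen presenting cell": {"antigen", "dendritic", "immune", "lymphocytes", "mhc"},
    "atrial premature complex": {"ecg", "atrial", "arrhythmia", "beats", "palpitations"},
}


def expand_apc(sentence):
    """Pick the sense whose cue words overlap most with the sentence;
    return None when nothing overlaps (ties go to the first listed sense)."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    best = max(SENSES, key=lambda sense: len(SENSES[sense] & words))
    return best if SENSES[best] & words else None


print(expand_apc("A germline APC mutation causes hundreds of colorectal polyps."))
```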
13
Negation
– Nearly half of all clinical concepts in dictated narratives are negated: "There is no maxillary sinus tenderness."
– Absence can be implied without negation: "Lungs are clear upon auscultation" implies
  – Rales: absent
  – Rhonchi: absent
  – Wheezing: absent
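Negation is commonly detected with trigger phrases and a scope window, the approach popularized by NegEx (Chapman et al.). The sketch below is a heavily simplified version: the `TRIGGERS` list and the five-token `WINDOW` are assumptions, the real NegEx trigger list is much longer and also covers post-negation and pseudo-negation phrases, and implied absence ("lungs are clear") is missed entirely.

```python
import re

# A few pre-negation triggers; WINDOW is an assumed scope size in tokens.
TRIGGERS = ["no", "denies", "without", "negative for", "absence of"]
WINDOW = 5


def find(tokens, phrase):
    """Start indices where the token list `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def is_negated(sentence, concept):
    tokens = re.findall(r"[a-z]+", sentence.lower())
    starts = find(tokens, concept.lower().split())
    for trigger in TRIGGERS:
        trig = trigger.split()
        for j in find(tokens, trig):
            scope_start = j + len(trig)
            # Negated if the concept begins within WINDOW tokens after the trigger.
            if any(scope_start <= s < scope_start + WINDOW for s in starts):
                return True
    return False


print(is_negated("There is no maxillary sinus tenderness.", "maxillary sinus tenderness"))
print(is_negated("Lungs are clear upon auscultation.", "rales"))
```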
14
Controlled Terminologies
Controlled vocabularies or taxonomies:
– Gene Ontology (gene products): most cited (450 per year in PubMed); 33,000+ terms in total
– SNOMED CT (300K+ concepts)
– NCI Thesaurus, ICD-9/10, ICD-O-3, LOINC, MedlinePlus
– UMLS Metathesaurus (integration of 140+ vocabularies): 2.3 million concepts
15
More on the Metathesaurus: CUIs, LUIs, SUIs, AUIs (concept, lexical/term, string, and atom unique identifiers)
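The four identifier levels nest: a concept (CUI) groups synonymous terms (LUI), a term groups strings (SUI) that are lexical variants of each other, and each string occurs as one or more atoms (AUI), one per occurrence in a source vocabulary. The sketch below models that hierarchy with dataclasses; all identifiers and source names in it are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:      # AUI: a string as it occurs in one source vocabulary
    aui: str
    source: str


@dataclass
class String:    # SUI: one exact character string
    sui: str
    text: str
    atoms: list = field(default_factory=list)


@dataclass
class Term:      # LUI: strings that are lexical variants of each other
    lui: str
    strings: list = field(default_factory=list)


@dataclass
class Concept:   # CUI: synonymous terms sharing one meaning
    cui: str
    terms: list = field(default_factory=list)

    def strings(self):
        return [s.text for t in self.terms for s in t.strings]

    def sources(self):
        return sorted({a.source for t in self.terms for s in t.strings for a in s.atoms})


# Hypothetical identifiers and source names, for illustration only.
addison = Concept("C-1", [
    Term("L-1", [
        String("S-1", "Addison's disease", [Atom("A-1", "SRC_A"), Atom("A-2", "SRC_B")]),
        String("S-2", "Addison disease", [Atom("A-3", "SRC_A")]),
    ]),
    Term("L-2", [String("S-3", "Hypocortisolism", [Atom("A-4", "SRC_B")])]),
])
print(addison.strings())  # ["Addison's disease", 'Addison disease', 'Hypocortisolism']
print(addison.sources())  # ['SRC_A', 'SRC_B']
```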
16
Semantic Types and Relations
NLM Semantic Network, the type system behind the UMLS Metathesaurus:
– Semantic Types (135)
– Semantic Groups (15)
– Semantic Relations (54)
SPECIALIST Lexicon:
– malaria, malarial
– hyperplasia, hyperplastic
How do we extract named entities?
17
MetaMap from NLM
– Identify phrases: use the SPECIALIST parser
– Map to CUIs: use the SPECIALIST Lexicon, the Metathesaurus, and the Semantic Network
18
Output of syntactic analysis
– Input: "ocular complications of myasthenia gravis"
– Tags: ocular (adj), complications (noun), of (prep), myasthenia (noun), gravis (noun)
– Noun phrases (NPs): "ocular complications" and "myasthenia gravis"
– Prepositions are ignored
– Each NP has a head and modifiers: ocular (mod), complications (head)
How about "male pattern baldness"?
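The noun-phrase step can be imitated on the tagged example above: split the words at prepositions and take the last word of each chunk as its head. This head-final rule is an assumption of the sketch, a rough stand-in for what the SPECIALIST parser actually does.

```python
def noun_phrases(tagged):
    """Split (word, tag) pairs at prepositions and treat the last word of
    each chunk as its head (head-final heuristic, assumed here)."""
    phrases, chunk = [], []
    for word, tag in tagged + [(None, "prep")]:   # sentinel flushes the last chunk
        if tag == "prep":                          # prepositions are dropped
            if chunk:
                phrases.append({"text": " ".join(chunk),
                                "head": chunk[-1],
                                "modifiers": chunk[:-1]})
            chunk = []
        else:
            chunk.append(word)
    return phrases


tagged = [("ocular", "adj"), ("complications", "noun"), ("of", "prep"),
          ("myasthenia", "noun"), ("gravis", "noun")]
for phrase in noun_phrases(tagged):
    print(phrase)
```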
19
Variant Generation
20
Variant Generation (continued)
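Variant generation expands the words of a phrase into spelling variants, inflections, synonyms, abbreviations, and derivations, each with a distance from the original. As a hedged illustration, the sketch below combines per-word variant lists into phrase variants. The distance values (spelling 0, inflection 1, synonym or abbreviation 2, derivation 3) are our recollection of the MetaMap papers and should be checked there; the `VARIANTS` table is hypothetical, and summing word distances is a simplification chosen for the sketch.

```python
from itertools import product

# Assumed variant distances; verify against the MetaMap papers.
DISTANCE = {"self": 0, "spelling": 0, "inflection": 1,
            "synonym": 2, "abbreviation": 2, "derivation": 3}

# Hypothetical per-word variant table (MetaMap draws on the SPECIALIST
# Lexicon and related synonym resources instead).
VARIANTS = {
    "ocular": [("ocular", "self"), ("eye", "synonym"),
               ("optic", "synonym"), ("ophthalmic", "synonym")],
    "complication": [("complication", "self"),
                     ("complications", "inflection"),
                     ("complicated", "derivation")],
}


def phrase_variants(phrase):
    """All combinations of per-word variants, sorted by summed distance."""
    per_word = [VARIANTS.get(w, [(w, "self")]) for w in phrase.lower().split()]
    results = []
    for combo in product(*per_word):
        text = " ".join(word for word, _ in combo)
        distance = sum(DISTANCE[kind] for _, kind in combo)
        results.append((text, distance))
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


for text, distance in phrase_variants("ocular complication")[:5]:
    print(distance, text)
```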
21
Candidate identification
Look up all variants in the Metathesaurus strings and identify the candidate concepts (CUIs) that have at least one string containing a variant as a substring.
Example: for "ocular complication", obtain all Metathesaurus strings that contain any of the following as substrings:
– optic complication
– eyes complication
– ophthalmic complicated
– ….
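Candidate retrieval by substring can be sketched as follows, using a hypothetical three-concept stand-in for the Metathesaurus (keys such as `C-A` are made up). Scanning every string like this is only for illustration: it is slow and can match inside unrelated words, so a word-based index is the usual alternative.

```python
# Hypothetical mini-Metathesaurus: made-up concept keys -> strings.
METATHESAURUS = {
    "C-A": ["Ocular complication", "Eye complication"],
    "C-B": ["Ophthalmic complication of diabetes"],
    "C-C": ["Pregnancy complication"],
}


def candidates(variants):
    """Return {concept: variants found} for every concept having at least
    one string that contains a variant as a case-insensitive substring."""
    found = {}
    for cui, strings in METATHESAURUS.items():
        for s in strings:
            for v in variants:
                if v.lower() in s.lower():
                    found.setdefault(cui, set()).add(v)
    return found


variants = ["ocular complication", "eye complication",
            "optic complication", "ophthalmic complication"]
print(sorted(candidates(variants)))  # ['C-A', 'C-B']
```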
22
Mapping and Evaluation
We now have a set of candidate CUIs, based on the presence of variants of the given phrase in Metathesaurus strings. How do we select the best candidate?
Combine several measures into a rank:
– Centrality (involvement of the head)
– Variation (average of inverse distance scores)
– Coverage
– Cohesiveness
23
Final Score
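A sketch of combining the four measures into one number follows. It assumes an inverse distance score of 4 / (d + 4) for the variation measure and default weights of 1, 1, 2, 2 for centrality, variation, coverage, and cohesiveness, scaled to 0 to 1000; these values come from our reading of Aronson's MetaMap papers and should be verified there. Coverage and cohesiveness are taken as precomputed inputs in [0, 1].

```python
def variation(distances):
    """Average inverse distance score over matched words
    (4 / (d + 4) is an assumed form)."""
    return sum(4 / (d + 4) for d in distances) / len(distances)


def final_score(centrality, variation_value, coverage, cohesiveness,
                weights=(1, 1, 2, 2)):
    """Weighted average of the four measures, scaled to 0..1000.
    The default weights are an assumption."""
    values = (centrality, variation_value, coverage, cohesiveness)
    average = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return round(1000 * average)


# Head involved (centrality 1), one exact word and one synonym (d = 2),
# full coverage and cohesiveness.
print(final_score(1, variation([0, 2]), 1.0, 1.0))  # 972
```

Keeping the weights as a parameter makes it easy to compare rankings under different weightings.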
24
MetaMap Options
– Types of variants: include or exclude derivational variants
– Word sense disambiguation: discharge (bodily secretion vs. releasing the patient)
– Concept gaps: "obstructive apnea" mapping to "obstructive sleep apnea" or "obstructive neonatal apnea"
– Term processing: process the input string as a single concept, that is, don't split it into noun phrases
25
Output options
– Human-readable format
– XML format
– Restrictions based on certain vocabularies: consider only ICD-9
– Restrictions based on certain semantic types: consider only pharmacological substances (i.e., drugs)
DEMO TIME: Daniel Harris
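Restricting output amounts to filtering candidate mappings by semantic type or source vocabulary. The sketch below uses a hypothetical `Mapping` record rather than MetaMap's actual output format; the identifiers, scores, and `VOCAB_*` labels are made up, while "Pharmacologic Substance" is the UMLS semantic type name closest to "pharmacological substances".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical record for one candidate mapping."""
    cui: str
    name: str
    semantic_types: frozenset
    sources: frozenset
    score: int


def restrict(mappings, semantic_types=None, sources=None):
    """Keep mappings sharing at least one allowed semantic type and/or
    source vocabulary (None means no restriction); best score first."""
    kept = [
        m for m in mappings
        if (semantic_types is None or m.semantic_types & semantic_types)
        and (sources is None or m.sources & sources)
    ]
    return sorted(kept, key=lambda m: m.score, reverse=True)


# Made-up identifiers, scores, and source labels.
mappings = [
    Mapping("C-1", "Codeine", frozenset({"Pharmacologic Substance"}), frozenset({"VOCAB_A"}), 1000),
    Mapping("C-2", "Patient discharge", frozenset({"Health Care Activity"}), frozenset({"VOCAB_B"}), 861),
    Mapping("C-3", "Body secretion", frozenset({"Body Substance"}), frozenset({"VOCAB_A", "VOCAB_B"}), 790),
]
drugs = restrict(mappings, semantic_types=frozenset({"Pharmacologic Substance"}))
print([m.name for m in drugs])  # ['Codeine']
```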
26
References
– Alan Aronson and Francois Lang, "An overview of Metamap: Historical Perspectives and Recent Advances"
– Alan Aronson, "Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program"
– Alan Aronson, "Comparison of LVG and Metamap Functionality"
– Olivier Bodenreider, "Lexical, Terminological, and Ontological Resources for Biological Text Mining"