An Unsupervised Approach to Biomedical Term Disambiguation: Integrating UMLS and Medline
Bridget T McInnes, University of Minnesota Twin Cities

Background and Introduction

Word Sense Disambiguation is the problem of determining the appropriate sense of a word that has multiple senses. This is a problem for biomedical applications such as medical coding and indexing. We explore the question of whether biomedical knowledge sources, such as the Unified Medical Language System (UMLS) and Medline, can be used to help identify the appropriate sense of a word. To do this, we introduce an unsupervised vector approach to disambiguate words in biomedical text using contextual information from the UMLS, and we compare our results to Humphrey, et al. (JAMIA, 2006) and SenseClusters (Pedersen, et al.).

Algorithm

[Figure: algorithm flow. Panel labels: UMLS; Extract Context for Possible Concepts; Medline (Training Data); Test Data]

[Figure: NLM-WSD Results]

Conclusion

• The CUI->ST definition obtains the highest accuracy when compared to the other context definitions
• Our approach makes disambiguation distinctions for words that have the same ST, unlike Humphrey et al.
• Our approach can be used to perform all-words disambiguation, unlike SenseClusters

Extract Possible Concepts

C : Mole, unit of measurement
It is the amount of substance that contains as many elementary units as there are atoms in kg of carbon-12.

C : Melanocytic nevus
A benign growth on the skin that contains a cluster of melanocytes and surrounding supportive tissue.

Three vectors

Feature        C vector (mole, unit)   C vector (melanocytic nevus)   Target word vector
amount                  4                          0                          0
substance               4                          0                          4
elementary              8                          0                          0
units                  12                          0                          0
atoms                  32                          0                          0
carbon-12               3                          0                          0
benign                  0                         10                          0
growth                  0                         12                          0
skin                    0                         34                          0
cluster                 0                         11                          0
melanocytes             0                          5                          0
tissue                  0                          6                          0

EXAMPLE: Disambiguating mole

Instance: He calculated three moles of the substance in the first sample and five in the second.
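The cosine comparison for the mole example can be worked through with the counts from the three vectors above. The following is a minimal sketch, not the CuiTools implementation: the sense labels `mole_unit` and `melanocytic_nevus` and the helper `cosine` are names assumed here for illustration, and only the counts come from the poster.

```python
import math

# Feature order and counts copied from the three vectors in the poster.
FEATURES = ["amount", "substance", "elementary", "units", "atoms", "carbon-12",
            "benign", "growth", "skin", "cluster", "melanocytes", "tissue"]

# Sense labels are hypothetical; the poster identifies senses by CUI.
concept_vectors = {
    "mole_unit":         [4, 4, 8, 12, 32, 3, 0, 0, 0, 0, 0, 0],
    "melanocytic_nevus": [0, 0, 0, 0, 0, 0, 10, 12, 34, 11, 5, 6],
}
target_vector = [0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector shares no direction with anything
    return dot / (norm_u * norm_v)


# Score every possible concept and keep the one closest to the target.
scores = {sense: cosine(target_vector, vec)
          for sense, vec in concept_vectors.items()}
best_sense = max(scores, key=scores.get)

for sense, score in scores.items():
    print(f"{sense}: {score:.4f}")
print("assigned sense:", best_sense)
```

With these counts the target vector overlaps only with the unit-of-measurement vector (on "substance"), giving a cosine of about 0.112 against 0 for melanocytic nevus, so the unit sense is assigned.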
Data and Resources

• National Library of Medicine WSD dataset
• Conflate Dataset
    • actin - antigens (a_a)
    • angiotensin II - oligomycin (a_o)
    • endogenous - extracellular matrix (e_e)
    • allogenic - arginine - ischemic (a_a_i)
    • X chromosome - peptide - plasmid (x_p_p)
    • diacetate - apamin - meatus - enterocyte (d_a_m_e)
• CuiTools Software Package version 0.13

[Figure: algorithm flow, continued. Panel labels: Possible Concepts and their context; Create Vectors; Vectors of Possible Concepts; Vector of Target Word; Calculate Cosine; Concept of Target Word]

[Figure: Conflate Results]

Training Data
... was around 1 mole ...
... mole dose of angiotensin ...
... large mole with brown ...

Test Data
He calculated three mole of the substance in the first sample and five in the second.

Calculate the Cosine
[Figure: angles Θ1 and Θ2 between the target word vector and the vectors of the possible concepts]

Assign Sense
He calculated three mole of the substance in the first sample and five in the second.
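The Conflate dataset listed under Data and Resources groups unambiguous terms under a single label such as a_a. The poster does not say how the data were built; one common way to create such pseudo-ambiguous data, assumed here, is to replace every occurrence of the member terms with the label and keep the original term as the gold sense. The function `conflate` below is a hypothetical helper for illustration, not part of CuiTools.

```python
import re


def conflate(sentence, terms, pseudoword):
    """Replace each occurrence of any term with the pseudoword.

    Returns the rewritten sentence and the list of original terms
    (lowercased, in order of appearance), which serve as gold senses.
    Assumes `terms` are given in lowercase.
    """
    # Try longer terms first so multiword terms are not split by shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    gold = []

    def swap(match):
        gold.append(match.group(1).lower())
        return pseudoword

    return pattern.sub(swap, sentence), gold


text, gold = conflate(
    "Actin binds antigens near actin filaments.",
    ["actin", "antigens"],
    "a_a",
)
print(text)
print(gold)
```

Each conflated instance can then be disambiguated like a naturally ambiguous word, and the assigned sense checked against the recorded original term.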
Create Vectors

Context

• Context of Possible Concepts:
    • Definition of the possible concept's Concept Unique Identifier (CUI)
    • Definition of the possible concept's Semantic Types (ST)
    • Definition of the possible concept's CUI, unless one does not exist, in which case the definition of its ST is used (CUI->ST)
    • Definitions of the possible concept's CUI and ST (CUI+ST)

Acknowledgements

• Ted Pedersen, University of Minnesota Duluth
• John Carlis, University of Minnesota Twin Cities
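The four context definitions listed under Context (CUI, ST, CUI->ST, CUI+ST) amount to a choice of which definition text feeds the concept vector. A minimal sketch of that choice follows, assuming the definitions are already available as plain strings; `select_context` and `bag_of_words` are hypothetical names, and the simple word counts stand in for whatever weighting the actual system applies.

```python
from collections import Counter


def select_context(cui_definition, st_definition, mode):
    """Pick the definition text used as context for a possible concept.

    cui_definition may be None or empty when the concept has no definition.
    mode is one of "CUI", "ST", "CUI->ST", "CUI+ST".
    """
    cui_text = cui_definition or ""
    st_text = st_definition or ""
    if mode == "CUI":
        return cui_text
    if mode == "ST":
        return st_text
    if mode == "CUI->ST":
        # Fall back to the semantic type definition only when the
        # concept itself has no definition.
        return cui_text if cui_text else st_text
    if mode == "CUI+ST":
        return (cui_text + " " + st_text).strip()
    raise ValueError(f"unknown context mode: {mode}")


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (a simplification)."""
    return Counter(text.lower().split())


# Illustrative strings only; not actual UMLS definitions.
context = select_context(None, "a quantity used in measurement", "CUI->ST")
print(context)
print(bag_of_words(context))
```

The fallback in CUI->ST matters because a concept without a definition would otherwise yield an empty vector and could never be selected by the cosine comparison.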