K NOWLEDGE - BASED M ETHOD FOR D ETERMINING THE M EANING OF A MBIGUOUS B IOMEDICAL T ERMS U SING I NFORMATION C ONTENT M EASURES OF S IMILARITY Bridget.

Slides:



Advertisements
Similar presentations
A Robust Approach to Aligning Heterogeneous Lexical Resources Mohammad Taher Pilehvar Roberto Navigli MultiJEDI ERC
Advertisements

Determining the Syntactic Structure of Medical Terms in Clinical Notes Bridget T. McInnes¹ Ted Pedersen² and Serguei V. Pakhomov¹ University of Minnesota¹.
Improved TF-IDF Ranker
Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle
Using Semantic Similarity Measures in the Biomedical Domain for Computing Similarity between Genes based on Gene Ontology By : Elham Khabiri Adviser :
U. S. National Library of Medicine NLM Indexing Initiative Tools for NLP: MetaMap and the Medical Text Indexer Natural Language Processing: State of the.
Measures of Text Similarity
Codifying Semantic Information in Medical Questions Using Lexical Sources Paul E. Pancoast Arthur B. Smith Chi-Ren Shyu.
LEDIR : An Unsupervised Algorithm for Learning Directionality of Inference Rules Advisor: Hsin-His Chen Reporter: Chi-Hsin Yu Date: From EMNLP.
Determining the Syntactic Structure of Medical Terms in Clinical Notes Bridget T. McInnes Ted Pedersen Serguei V. Pakhomov
1 Representing Meaning in Unsupervised Word Sense Disambiguation Bridget T. McInnes 5 September 2008 University of Minnesota Twin Cities.
An Unsupervised Approach to Biomedical Term Disambiguation: Integrating UMLS and Medline Bridget T McInnes University of Minnesota Twin Cities Background.
June 19-21, 2006WMS'06, Chania, Crete1 Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies.
1 Complementarity of Lexical and Simple Syntactic Features: The SyntaLex Approach to S ENSEVAL -3 Saif Mohammad Ted Pedersen University of Toronto, Toronto.
CSE 730 Information Retrieval of Biomedical Data The use of medical lexicon in biomedical IR.
Using Information Content to Evaluate Semantic Similarity in a Taxonomy Presenter: Cosmin Adrian Bejan Philip Resnik Sun Microsystems Laboratories.
A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora Benjamin Arai Computer Science and Engineering Department.
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Automated Classification of Medical Questions Using Semantic Parsing Techniques Paul E. Pancoast, MD Arthur B. Smith, MS Chi-Ren Shyu, PhD University of.
Indexing 1/2 BDK12-3 Information Retrieval William Hersh, MD Department of Medical Informatics & Clinical Epidemiology Oregon Health & Science University.
Personalisation Seminar on Unlocking the Secrets of the Past: Text Mining for Historical Documents Sven Steudter.
Learning Information Extraction Patterns Using WordNet Mark Stevenson and Mark A. Greenwood Natural Language Processing Group University of Sheffield,
Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Institute for System Programming of RAS.
C OLLECTIVE ANNOTATION OF WIKIPEDIA ENTITIES IN WEB TEXT - Presented by Avinash S Bharadwaj ( )
Intelligent Database Systems Lab Presenter : BEI-YI JIANG Authors : UNIVERSIT´E CATHOLIQUE DE LOUVAIN, BELGIUM ASSOCIATION FOR COMPUTING MACHINERY.
1 Wikification CSE 6339 (Section 002) Abhijit Tendulkar.
Unified Medical Language System® (UMLS®) NLM Presentation Theater MLA 2005 May 16 & 17, 2005 Rachel Kleinsorge.
Session II: Scientific Publishing and Semantic Web W3C Semantic Web for Life Sciences Workshop October 27, 2004 Moderator: Alan R. Aronson.
1 st June 2006 St. George’s University of LondonSlide 1 Using UMLS to map from a Library to a Clinical Classification: Improving the Functionality of a.
Survey of Medical Informatics CS 493 – Fall 2004 September 27, 2004.
1 Statistical NLP: Lecture 9 Word Sense Disambiguation.
Paper Review by Utsav Sinha August, 2015 Part of assignment in CS 671: Natural Language Processing, IIT Kanpur.
W ORD S ENSE D ISAMBIGUATION By Mahmood Soltani Tehran University 2009/12/24 1.
Combining terminology resources and statistical methods for entity recognition: an evaluation Angus Roberts, Robert Gaizauskas, Mark Hepple, Yikun Guo.
SYMPOSIUM ON SEMANTICS IN SYSTEMS FOR TEXT PROCESSING September 22-24, Venice, Italy Combining Knowledge-based Methods and Supervised Learning for.
10/22/2015ACM WIDM'20051 Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web Giannis Varelas Epimenidis Voutsakis.
A S URVEY ON I NFORMATION E XTRACTION FROM D OCUMENTS U SING S TRUCTURES OF S ENTENCES Chikayama Taura Lab. M1 Mitsuharu Kurita 1.
UMLS Unified Medical Language System. What is UMLS? A Unified knowledge representation system Project of NLM Large scale Distributed First launched in.
CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.
Knowledge-Based Semantic Interpretation for Summarizing Biomedical Text Thomas C. Rindflesch, Ph.D. Marcelo Fiszman, M.D., Ph.D. Halil Kilicoglu, M.S.
Semantics-Based News Recommendation with SF-IDF+ International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013) June 13, 2013 Marnix Moerland.
Retrieval of Highly Related Biomedical References by Key Passages of Citations Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan.
Using Semantic Relatedness for Word Sense Disambiguation
Distribution of information in biomedical abstracts and full- text publications M. J. Schuemie et al. Dept. of Medical Informatics, Erasmus University.
Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.
Presented By- Shahina Ferdous, Student ID – , Spring 2010.
Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab)‏ Sabine Buchholz (Toshiba CRL)‏
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and.
Multi-level Bootstrapping for Extracting Parallel Sentence from a Quasi-Comparable Corpus Pascale Fung and Percy Cheung Human Language Technology Center,
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
1 Measuring the Semantic Similarity of Texts Author : Courtney Corley and Rada Mihalcea Source : ACL-2005 Reporter : Yong-Xiang Chen.
1 Gloss-based Semantic Similarity Metrics for Predominant Sense Acquisition Ryu Iida Nara Institute of Science and Technology Diana McCarthy and Rob Koeling.
Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation Bioinformatics, July 2003 P.W.Load,
Semantics-Based News Recommendation International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012) June 14, 2012 Michel Capelle
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
2/10/2016Semantic Similarity1 Semantic Similarity Methods in WordNet and Their Application to Information Retrieval on the Web Giannis Varelas Epimenidis.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Semantic Evaluation of Machine Translation Billy Wong, City University of Hong Kong 21 st May 2010.
Sentiment Analysis Using Common- Sense and Context Information Basant Agarwal 1,2, Namita Mittal 2, Pooja Bansal 2, and Sonal Garg 2 1 Department of Computer.
Metaphrase EVS API Presented to the National Cancer Institute (NCI) Office of Informatics (OI) Noel Ramathal 06/06/2000.
Medical Semantic Similarity with a Neural Language Model Dongfang Xu School of Information Using Skip-gram Model for word embedding.
UNIFIED MEDICAL LANGUAGE SYSTEMS (UMLS)
Wei Wei, PhD, Zhanglong Ji, PhD, Lucila Ohno-Machado, MD, PhD
Bridget McInnes Ted Pedersen Serguei Pakhomov
Using UMLS CUIs for WSD in the Biomedical Domain
Category-Based Pseudowords
Statistical NLP: Lecture 9
Giannis Varelas Epimenidis Voutsakis Paraskevi Raftopoulou
Unsupervised Word Sense Disambiguation Using Lesk algorithm
Statistical NLP : Lecture 9 Word Sense Disambiguation
Presentation transcript:

K NOWLEDGE - BASED M ETHOD FOR D ETERMINING THE M EANING OF A MBIGUOUS B IOMEDICAL T ERMS U SING I NFORMATION C ONTENT M EASURES OF S IMILARITY Bridget McInnes Ted Pedersen Ying Liu Genevieve B. Melton Serguei Pakhomov 1

O BJECTIVE OF THIS WORK Develop and evaluate a method than can disambiguate terms in biomedical text by exploiting similarity information extrapolated from the Unified Medical Language System Evaluate the efficacy of Information Content-based similarity measures over path-based similarity measures for Word Sense Disambiguation, WSD 2

W ORD S ENSE D ISAMBIGUATION Word sense disambiguation is the task of determining the appropriate sense of a term given context in which it is used. TERM: tolerance Drug Tolerance Immune Tolerance 3

W ORD S ENSE D ISAMBIGUATION Word sense disambiguation is the task of determining the appropriate sense of a term given context in which it is used. Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance Immune Tolerance 4

S ENSE INVENTORY : U NIFIED M EDICAL L ANGUAGE S YSTEM Unified Medical Language Sources (UMLS) Semantic Network Metathesaurus ~1.7 million biomedical and clinical concepts; integrated semi-automatically CUIs (Concept Unique Identifiers), linked: Hierarchical: PAR/CHD and RB/RN Non-hierarchical: SIB, RO Sources viewed together or independently Medical Subject Heading (MSH) SPECIALIST Lexicon Biomedical and clinical terms, including variants 5

W ORD S ENSE D ISAMBIGUATION Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance: C Immune Tolerance: C Concept Unique Identifiers: CUIs 6

S ENSE R ELATE ALGORITHM Each possible sense of a target word is assigned a score [sum similarity between it and its surrounding terms] Assign target word the sense with highest score Proposed by Patwardhan and Pedersen 2003 using WordNet UMLS::SenseRelate is a modification of this algorithm using information from the UMLS NEXT UP: an example 7

S ENSE R ELATE E XAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer 8

S ENSE R ELATE E XAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance: C Immune Tolerance: C

S ENSE R ELATE E XAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer Drug Tolerance: C Immune Tolerance: C Busprione: C Morphine: C Mice: C Skin cancer: C

S ENSE R ELATE E XAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer 0.09 Drug Tolerance: C Immune Tolerance: C Busprione: C Morphine: C Mice: C Skin cancer: C

S ENSE R ELATE E XAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer 0.09 Drug Tolerance: C Immune Tolerance: C Busprione: C Morphine: C Mice: C Skin cancer: C Drug Tolerance Score = =

S ENSE R ELATE E XAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer 0.09 Drug Tolerance: C Immune Tolerance: C Busprione: C Morphine: C Mice: C Skin cancer: C Drug Tolerance Score = =

S ENSE R ELATE E XAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer 0.09 Drug Tolerance: C Immune Tolerance: C Busprione: C Morphine: C Mice: C Skin cancer: C Drug Tolerance Score = = 0.45 Immune Tolerance Score = =

S ENSE R ELATE E XAMPLE Busprione attenuates tolerance to morphine in mice with skin cancer 0.09 Drug Tolerance: C Immune Tolerance: C Busprione: C Morphine: C Mice: C Skin cancer: C Drug Tolerance Score = = 0.45 Immune Tolerance Score = =

S ENSE R ELATE A SSUMPTION An ambiguous word is often used in the sense that is most similar to the sense of the terms that surround it 16

S ENSE R ELATE C OMPONENTS Identifying the concepts of surrounding terms Calculating semantic similarity 17

I DENTIFYING THE CONCEPTS OF THE SURROUNDING TERMS Use the SPECIALIST LEXICON to identify the terms and map the terms doing a string match to the MRCONSO table in the UMLS 18

I DENTIFYING THE CONCEPTS OF THE SURROUNDING TERMS Use the SPECIALIST LEXICON to identify the terms and map the terms doing a string match to the MRCONSO table in the UMLS Busprione attenuates tolerance to morphine in mice with skin cancer 19

I DENTIFYING THE CONCEPTS OF THE SURROUNDING TERMS Use the SPECIALIST LEXICON to identify the terms and map the terms doing a string match to the MRCONSO table in the UMLS... skin cancer skin grafting skin disease... SPECIALIST LEXICON Busprione attenuates tolerance to morphine in mice with skin cancer 20

I DENTIFYING THE CONCEPTS OF THE SURROUNDING TERMS Use the SPECIALIST LEXICON to identify the terms and map the terms doing a string match to the MRCONSO table in the UMLS... skin cancer skin grafting skin disease... skin cancer C skin grafting C skin disease C SPECIALIST LEXICON MRCONSO Busprione attenuates tolerance to morphine in mice with skin cancer 21

S EMANTIC S IMILARITY M EASURES Path-based measures Path Wu and Palmer Leacock and Chodorow Ngyuen and Al-Mubaid Information content (IC)-based measures Resnik Lin Jiang and Conrath 22

P ATH - BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy 23

P ATH - BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy Path measure sim(c1,c2) = 1 / minpath(c2,c2) where minpath is the shortest path between the two concepts 24

P ATH - BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy Path measure sim(c1,c2) = 1/minpath(c2,c2) where minpath is the shortest path between the two concepts Wu and Palmer, 1994 sim(c1,c2) = (2*depth(LCS(c2,c2))) / (depth(c1)+depth(c2)) where LCS is the least common subsumer of the two concepts 25

P ATH - BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy Path measure sim(c1,c2) = 1/ minpath(c2,c2) where minpath is the shortest path between the two concepts Wu and Palmer, 1994 sim(c1,c2) = (2*depth(LCS(c2,c2))) / (depth(c1)+depth(c2)) where LCS is the least common subsumer of the two concepts Leacock and Chodorow, 1998 sim(c1,c2) = -log( minpath(c1,c2) / (2D) ) where D is the total depth of the taxonomy 26

P ATH - BASED SIMILARITY MEASURES Use only the path information obtained from a taxonomy Path measure sim(c1,c2) = 1/ minpath(c2,c2) where minpath is the shortest path between the two concepts Leacock and Chodorow, 1998 sim(c1,c2) = -log( minpath(c1,c2) / (2D) ) where D is the total depth of the taxonomy Wu and Palmer, 1994 sim(c1,c2) = (2*depth(LCS(c2,c2))) / (depth(c1)+depth(c2)) where LCS is the least common subsumer of the two concepts Nyguen and Al-Mubaid, 2006 sim(c1,c2) = log ( (2 + minpath(c1,c2) - 1) * (D - depth(LCS(c1,c2))) ) 27

P ATH - BASED S IMILARITY M EASURES USE ONLY THE PATH INFORMATION OBTAINED FROM A TAXONOMY Disease: C Drug Related Disorder: C Drug Tolerance: C Neoplasm: C Neoplastic Disease: C Malignant Neoplasm: C Skin cancer: C

I NFORMATION CONTENT - BASED M EASURES Incorporate the probability of the concepts IC = -log(P(concept)) 29

I NFORMATION CONTENT - BASED M EASURES Incorporate the probability of the concepts IC = -log(P(concept)) P(concept) Calculated by summing the probability of the concept and the probability of its descendants Probabilities are obtained from an external corpus 30

I NFORMATION CONTENT - BASED M EASURES Incorporate the probability of the concepts IC = -log(P(concept) Resnik, 1995 sim(c1,c2) = IC(LCS(c1,c2)) 31

I NFORMATION CONTENT - BASED M EASURES Incorporate the probability of the concepts IC = -log(P(concept) Resnik, 1995 sim(c1,c2) = IC(LCS(c2,c2)) Jiang and Conrath, 1997 sim(c1,c2) = 1 / (IC(c1)+IC(c2) – 2* IC(LCS(c1,c2)) 32

I NFORMATION CONTENT - BASED M EASURES Incorporate the probability of the concepts IC = -log(P(concept) Resnik, 1995 sim(c1,c2) = IC(LCS(c2,c2)) Jiang and Conrath, 1997 sim(c1,c2) = 1 ÷ (IC(c1)+IC(c2) – 2* IC(LCS(c1,c2)) Lin, 1998 sim(c1,c2) = (2*IC(LCS(c2,c2))) / (IC(c1)+IC(c2)) 33

IC- BASED SIMILARITY MEASURES Disease: C Drug Related Disorder: C Drug Tolerance: C Neoplasm: C Neoplastic Disease: C Malignant Neoplasm: C Skin cancer: C PATH INFORMATION PROBABILITY OF CONCEPTS EXTERNAL CORPUS 34

E XPERIMENTAL F RAMEWORK Use open-source UMLS::Similarity package to obtain the similarity between the terms and possible senses in the SenseRelate algorithm Path information: parent/child relations in MSH source Information content: calculated using the UMLSonMedline dataset created by NLM Consists of concepts from 2009AB UMLS and the frequency they occurred in Medline using the Essie Search Engine (Ide et al 2007) Medline: database of citations of biomedical/clinical articles 35

E VALUATION D ATA : MSH WSD MSH-WSD dataset (Jimeno-Yepes, et al 2011) 203 target words (ambiguous word) from Medline  106 termse.g. tolerance  88 acronymse.g. CA (calcium, california)  9 mixturese.g. bat (brown adipose tissue) Each target word contains ~187 instances (Medline abstracts) abstract = ~ 500 words Each target word in the instances assigned a concept from MSH by exploiting the manually assigned MSH concepts assigned to the abstract Average of 2.08 possible senses per target word Majority sense over all the target words is 54.5% 36

R ESULTS baseline path lchwup nam resjcn accuracyaccuracy Path-basedIC-based lin 37

C OMPARISON ACROSS SUBSETS OF MSH - WSD accuracyaccuracy 38

C OMPARISON ACROSS SUBSETS OF MSH - WSD accuracyaccuracy 39

C OMPARISON ACROSS SUBSETS OF MSH - WSD accuracyaccuracy 40

C OMPARISON ACROSS SUBSETS OF MSH - WSD accuracyaccuracy 41

C OMPARISON ACROSS SUBSETS OF MSH - WSD accuracyaccuracy 42

W INDOW SIZES Use the terms surrounding the target word within a specified window: 1, 2, 5, 10, 25, 50, 60, 70 Busprione attenuates tolerance to morphine in mice with skin_cancer WINDOW SIZE = 2 43

C OMPARISON OF WINDOW SIZES FOR LIN accuracyaccuracy window size 44

S URROUNDING TERMS Not all terms have a concept in the UMLS therefore Not all surrounding terms in the window mapped to CUIs 45

W INDOW SIZES VERSUS MAPPED TERMS numberofmappingsnumberofmappings window size 46

F UTURE WORK : MAPPING T ERMS Currently looking at mapping the terms to CUIs using information from the concept mapping system MetaMap Obtain the terms from MetaMap and do a dictionary look up in MRCONSO Hypothesis – the terms obtained by MetaMap are more accurate than using the SPECIALIST Lexicon Obtain the CUIs from MetaMap Hypothesis – the CUIs obtained by MetaMap will be more accurate than the dictionary look-up 47

O BJECTIVE #1 Develop and evaluate a method than can disambiguate terms in biomedical text by exploiting similarity information extrapolated from the UMLS UMLS::SenseRelate statistically significantly higher disambiguation accuracy than the baseline On par with previous unsupervised methods for terms 48

O BJECTIVE #2 Evaluate the efficacy of IC-based similarity measures over path- based measures on a secondary task There is no statistically significant difference between the accuracies obtained by the IC-based measures There is a statistically significant difference between the IC- based measures and the path-based measures 49

T AKE HOME MESSAGE : An ambiguous word is often used in the sense that is most similar to the sense of the concepts of the terms that surround it 50

R ESOURCES Software: UMLS::SenseRelate  UMLS::Similarity  Data MSH-WSD 

R ESOURCES Software: UMLS::SenseRelate  UMLS::Similarity  Data MSH-WSD  THANK YOU 52

R ESOURCES Software: UMLS::SenseRelate  UMLS::Similarity  Data MSH-WSD  QUESTIONS? 53