Finding High-frequent Synonyms of a Domain-specific Verb in English Sub-language of MEDLINE Abstracts Using WordNet Chun Xiao and Dietmar Rösner

Presentation transcript:

Finding High-frequent Synonyms of a Domain-specific Verb in English Sub-language of MEDLINE Abstracts Using WordNet
Chun Xiao and Dietmar Rösner
Institut für Wissens- und Sprachverarbeitung (IWS), Faculty of Computer Science, University of Magdeburg, Magdeburg, Germany

Introduction: MEDLINE Abstract
MEDLINE®:
–Domain: clinical medicine, biomedicine, biological and physical sciences;
–Source: articles from over 4,600 journals published throughout the world;
–Coverage: abstracts are included for about 52% of the articles.
PubMed®, an application of UMLS (Unified Medical Language System), provides links within MEDLINE® to the full text of 15 clinical medical journals.
–Available at:

Available Resources in the Experiment
The test corpus consists of 800 MEDLINE abstracts extracted from the GENIA Corpus V3.0p and V
Available at:
WordNet 1.7.1

Extraction of a Specific Relation
Inhibitory relation
–Example: Secreted from activated T cells and macrophages, bone marrow-derived MIP-1 alpha/GOS19 inhibits primitive hematopoietic stem cells and appears to be involved in the homeostatic control of stem cell proliferation.
Semantic annotations in the GENIA corpus:
–protein_molecule
–cell_type

High-frequent Verbs in the Test Corpus

Synonym Sets (Synsets) of Verb inhibit
Synset in WordNet
–Sense 1: suppress, stamp down, inhibit, subdue, conquer, curb => control, hold in, hold, contain, check, curb, moderate
–Sense 2: inhibit => restrict, restrain, trammel, limit, bound, confine, throttle
Synset in test corpus of MEDLINE abstracts
–inhibit, block, prevent, etc.

Problem
Occurrences of verbs in the two synsets in the test corpus of MEDLINE abstracts
–WN-synonyms: suppress (69), limit (16), restrict (5)
–non-WN-synonyms: block (124), reduce (119), prevent (53)
How can WordNet synsets and information from the corpus be combined to create domain-specific verb synsets?

Three Definitions
Language unit: a text segment (a sentence, several sentences, a paragraph, etc.) that expresses one semantic topic.
Core word: the verb whose synset in the test corpus is to be found. E.g., in this test inhibit is the core word.
Keyword: a word whose corresponding verb base form is the core word. E.g., in this test inhibitor, inhibiting, and so on are keywords.

Example
We performed an analysis of the mechanisms by which two PKC inhibitors, Calphostin C and Staurosporine, prevent the FN-induced IL-1beta response. Both inhibitors blocked the secretion of IL-1beta protein into the media of peripheral blood mononuclear cells exposed to FN.
Language unit: two sentences
Core word: inhibit
Keyword: inhibitor (2 times)
Local context: searching window size >=3
Verbs around the first keyword: perform, prevent, block, expose
Verbs around the second keyword: prevent, perform, block, expose
In the following test, the language unit is selected to be the whole abstract.
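
The collection of verbs around each keyword can be sketched in Python as follows. This is only an illustration under stated assumptions: the two sentences are annotated by hand instead of being run through a POS tagger, the annotation scheme (`word|lemma` for a verb, `word|KEY` for a keyword) and the names `parse` and `verbs_around` are hypothetical, and the window is read as "up to `window` verbs on each side of the keyword, nearest first", which is one interpretation that happens to match the two verb lists on the slide.

```python
from typing import List, Tuple

Token = Tuple[str, str, str]  # (surface form, tag, lemma)

# Hand-annotated copy of the two example sentences (an assumption standing in
# for a real POS tagger): "word|lemma" marks a verb, "word|KEY" marks a keyword.
ANNOTATED = (
    "We performed|perform an analysis of the mechanisms by which two PKC "
    "inhibitors|KEY , Calphostin C and Staurosporine , prevent|prevent the "
    "FN-induced IL-1beta response . Both inhibitors|KEY blocked|block the "
    "secretion of IL-1beta protein into the media of peripheral blood "
    "mononuclear cells exposed|expose to FN ."
)


def parse(annotated: str) -> List[Token]:
    """Turn the annotated string into (word, tag, lemma) triples."""
    tokens: List[Token] = []
    for item in annotated.split():
        word, _, mark = item.partition("|")
        if mark == "KEY":
            tokens.append((word, "KEY", word))   # keyword of the core word
        elif mark:
            tokens.append((word, "V", mark))     # verb with its base form
        else:
            tokens.append((word, "O", word))     # any other token
    return tokens


def verbs_around(tokens: List[Token], keyword_index: int, window: int) -> List[str]:
    """Collect up to `window` verb lemmas on each side, nearest first."""
    before = [lemma for _, tag, lemma in reversed(tokens[:keyword_index]) if tag == "V"]
    after = [lemma for _, tag, lemma in tokens[keyword_index + 1:] if tag == "V"]
    return before[:window] + after[:window]


tokens = parse(ANNOTATED)
keyword_positions = [i for i, (_, tag, _) in enumerate(tokens) if tag == "KEY"]
context_verbs = [verbs_around(tokens, i, window=3) for i in keyword_positions]
print(context_verbs)
```

Because the language unit here spans both sentences, the search does not stop at the sentence boundary, so verbs from the first sentence also appear in the context of the second keyword.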

Idea Description
Assumption: The synonyms of a verb co-occur with the keywords of that verb much more frequently than with other words in the language unit.
Method: The verb chunks around the keywords are therefore collected, and the synonyms of the core word are selected from them and filtered using WordNet synset information.
–One resource: WordNet synset information
–The other resource: local context information in the test corpus

Distribution of Keywords of inhibit in the Test Corpus

Verbs around the Keywords in the Test Corpus

Method Description I
Expansion of WordNet Synsets (S_i)
–S_1: the verb collection of synonyms of all synonyms of the core word;
–S_2: the verb collection of synonyms of all verbs in S_1;
–…
Expansion of Stoplist (STOP_k)
–STOP_0: 15 stop-verbs manually selected from the high-frequent verbs in the test corpus (e.g., suggest, indicate, including the high-frequent antonyms of the core word);
–STOP_1: the verb collection of synonyms of all verbs in STOP_0;
–…
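
The level-by-level expansion can be sketched as below. Several things here are assumptions rather than part of the slides: the synonym table `TOY_SYNONYMS` is a small invented map, not the content of WordNet 1.7.1; level 0 is taken to be the direct synonyms of the core word; and each level is made cumulative (it keeps the verbs of the previous level). The function name `expand_levels` is hypothetical.

```python
from typing import Dict, List, Set

# Toy synonym table for illustration only (NOT actual WordNet 1.7.1 data).
TOY_SYNONYMS: Dict[str, Set[str]] = {
    "inhibit": {"suppress", "curb"},
    "suppress": {"inhibit", "repress"},
    "curb": {"inhibit", "restrain"},
    "repress": {"suppress"},
    "restrain": {"curb", "limit"},
    "limit": {"restrain"},
}


def expand_levels(core: str, synonyms: Dict[str, Set[str]], max_level: int) -> List[Set[str]]:
    """Return [S_0, S_1, ..., S_max_level] for the given core word.

    S_0 holds the direct synonyms of the core word (assumption); each later
    level adds the synonyms of every verb in the previous level.
    """
    levels = [set(synonyms.get(core, set()))]
    for _ in range(max_level):
        previous = levels[-1]
        expanded = set(previous)  # cumulative: keep what was already found
        for verb in previous:
            expanded |= synonyms.get(verb, set())
        levels.append(expanded)
    return levels


levels = expand_levels("inhibit", TOY_SYNONYMS, max_level=2)
for i, level in enumerate(levels):
    print(f"S_{i}:", sorted(level))
```

The same routine could be reused for the stoplist by starting from the manually chosen stop-verbs instead of the synonyms of one core word. Each extra level makes the set grow, which is why the depth of the expansion has to be limited.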

Method Description II
Verb list from the corpus (V_j)
–Verbs around the keywords in a local context with a searching window size of j are collected.
Synonym candidate list (S_g)
–If a verb is in V_j and also in S_i, but not in STOP_k, then it is added to S_g.
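
The selection rule for the candidate list amounts to a set filter, sketched below. The three input collections are invented toy values, not data from the experiment, and the function name `select_candidates` is hypothetical.

```python
from typing import Iterable, List, Set


def select_candidates(v_j: Iterable[str], s_i: Set[str], stop_k: Set[str]) -> List[str]:
    """Keep verbs that occur in V_j and S_i but not in STOP_k (order preserved)."""
    selected: List[str] = []
    for verb in v_j:
        if verb in s_i and verb not in stop_k and verb not in selected:
            selected.append(verb)
    return selected


# Toy inputs for illustration only.
v_j = ["perform", "prevent", "block", "expose", "suggest", "suppress", "block"]
s_i = {"suppress", "block", "prevent", "limit", "suggest"}
stop_k = {"suggest", "indicate"}

s_g = select_candidates(v_j, s_i, stop_k)
print(s_g)
```

In this toy run "suggest" is dropped even though it is in the expanded synonym set, because the stoplist takes priority, and "limit" is dropped because it never occurs near a keyword.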

Evaluation
Gold standard list (S_G)
–A manually created synonym list, extracted from the test corpus.
–Consists of the 10 verbs with the most frequent occurrences: 3 verbs come directly from the WordNet synset of “inhibit”, and the remaining 7 come from its hypernym set or the expanded list of its synonyms.
Recall & Precision
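
Recall and precision against the gold standard list can be computed as in the sketch below. It assumes the usual set-based definitions (recall is the share of gold verbs that were found, precision is the share of found verbs that are in the gold list); the slides do not spell the formulas out. The two verb sets are toy values, not the lists used in the experiment.

```python
from typing import Set, Tuple


def recall_precision(candidates: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Return (recall, precision) of a candidate set against a gold set."""
    hits = len(candidates & gold)
    recall = hits / len(gold) if gold else 0.0
    precision = hits / len(candidates) if candidates else 0.0
    return recall, precision


# Toy sets for illustration only.
gold = {"block", "reduce", "prevent", "suppress"}
found = {"block", "prevent", "bind"}

recall, precision = recall_precision(found, gold)
print(f"recall={recall:.2f} precision={precision:.2f}")
```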

Result
–60% recall of S_G
–93.05% occurrences in the test corpus

Conclusions and Future Work
Conclusions
–English sublanguage of MEDLINE abstracts;
–The core word and its keywords were high-frequent;
–Multiword verb structures were not considered yet;
–Balance between recall and precision: expansion of S_i and STOP_k should be limited.
Future work
–Consideration of other WordNet information besides synsets;
–Automatic creation of stoplists;
–Extraction of multiword verb structures;
–Utilization of syntactic information.

Thanks!

Looking forward to your questions!

Possible Errors
–POS tagging errors between adjectives and past participles
–Errors in manual work when selecting stop-verbs

Question or Hope
Can WordNet provide a way to access multiword expressions?