
1 Finding High-frequent Synonyms of a Domain- specific Verb in English Sub-language of MEDLINE Abstracts Using WordNet Chun Xiao and Dietmar Rösner Institut für Wissens- und Sprachverarbeitung (IWS), Faculty of Computer Science, University of Magdeburg, 39016 Magdeburg, Germany

2 Introduction: MEDLINE Abstracts
MEDLINE®:
– Domain: clinical medicine, biomedicine, biological and physical sciences;
– Source: articles from over 4,600 journals published throughout the world;
– Coverage: abstracts are included for about 52% of the articles.
PubMed®, an application of UMLS (Unified Medical Language System), provides links within MEDLINE® to the full text of 15 clinical medical journals.
– Available at: http://www.ncbi.nlm.nih.gov/PubMed/

3 Available Resources in the Experiment
The test corpus consists of 800 MEDLINE abstracts extracted from the GENIA Corpus V3.0p and V3.01.
– Available at: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/
WordNet 1.7.1

4 Extraction of a Specific Relation
Inhibitory relation
– Example: Secreted from activated T cells and macrophages, bone marrow-derived MIP-1 alpha/GOS19 inhibits primitive hematopoietic stem cells and appears to be involved in the homeostatic control of stem cell proliferation.
Semantic annotations in the GENIA corpus: protein_molecule, cell_type

5 High-frequency Verbs in the Test Corpus

6 Synonym Sets (Synsets) of the Verb inhibit
Synsets in WordNet:
– Sense 1: suppress, stamp down, inhibit, subdue, conquer, curb => control, hold in, hold, contain, check, curb, moderate
– Sense 2: inhibit => restrict, restrain, trammel, limit, bound, confine, throttle
Synset in the test corpus of MEDLINE abstracts: inhibit, block, prevent, etc.

7 Problem
Occurrences of verbs from the two synsets in the test corpus of MEDLINE abstracts:
– WN-synonyms: suppress (69), limit (16), restrict (5)
– non-WN-synonyms: block (124), reduce (119), prevent (53)
How can WordNet synsets and information from the corpus be combined to create domain-specific verb synsets?

8 Three Definitions
Language unit: a text segment (a sentence, several sentences, a paragraph, etc.) that expresses one semantic topic.
Core word: the verb whose synset in the test corpus is to be found. E.g., in this test, inhibit is the core word.
Keyword: a word whose corresponding verb base form is the core word. E.g., in this test, inhibitor, inhibiting, and so on are keywords.

9 Example
We performed an analysis of the mechanisms by which two PKC inhibitors, Calphostin C and Staurosporine, prevent the FN-induced IL-1beta response. Both inhibitors blocked the secretion of IL-1beta protein into the media of peripheral blood mononuclear cells exposed to FN.
– Language unit: two sentences
– Core word: inhibit
– Keyword: inhibitor (2 times)
– Local context: searching window size >= 3
– Verbs around the first keyword: perform, prevent, block, expose
– Verbs around the second keyword: prevent, perform, block, expose
Note: In the following test, the language unit is selected to be the whole abstract.
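The collection step in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how the searching window is measured, so the sketch assumes it counts verbs on each side of a keyword. The input is a hand-tagged, abbreviated version of the two example sentences, and the function name `verbs_around_keywords` is hypothetical.

```python
# Hand-tagged toy input: (token, coarse POS, lemma). "V" marks a verb.
# Tags and lemmas are written by hand for illustration; a real run would
# take them from a POS tagger and a lemmatizer.
TAGGED = [
    ("We", "O", "we"),
    ("performed", "V", "perform"),
    ("analysis", "N", "analysis"),
    ("inhibitors", "N", "inhibitor"),
    ("prevent", "V", "prevent"),
    ("response", "N", "response"),
    ("inhibitors", "N", "inhibitor"),
    ("blocked", "V", "block"),
    ("secretion", "N", "secretion"),
    ("exposed", "V", "expose"),
    ("FN", "N", "FN"),
]


def verbs_around_keywords(tagged, keyword_lemmas, window):
    """For each keyword occurrence, list the lemmas of up to `window`
    verbs before it and up to `window` verbs after it.

    ASSUMPTION: the window is counted in verbs, not in tokens.
    """
    verb_positions = [i for i, (_, pos, _) in enumerate(tagged) if pos == "V"]
    contexts = []
    for i, (_, _, lemma) in enumerate(tagged):
        if lemma not in keyword_lemmas:
            continue
        earlier = [p for p in verb_positions if p < i]
        later = [p for p in verb_positions if p > i]
        # Keep the `window` closest verbs on each side, in text order.
        nearby = earlier[max(0, len(earlier) - window):] + later[:window]
        contexts.append([tagged[p][2] for p in nearby])
    return contexts


contexts = verbs_around_keywords(TAGGED, {"inhibitor"}, window=3)
for verbs in contexts:
    print(verbs)
```

With a window of 3, both keyword occurrences yield the same four verbs as listed on the slide (perform, prevent, block, expose), although the sketch returns them in text order rather than in the order shown there.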

10 Idea Description
Assumption: Within a language unit, the synonyms of a verb co-occur with the keywords of that verb much more frequently than with other words.
Method: The verb chunks around the keywords are therefore collected; the synonyms of the core word are then selected from them and filtered using WordNet synset information.
- One resource: WordNet synset information
- The other resource: local context information in the test corpus

11 Distribution of Keywords of inhibit in the Test Corpus

12 Verbs around the Keywords in the Test Corpus

13 Method Description I
Expansion of WordNet synsets (S_i)
– S_1: the verb collection of synonyms of all synonyms of the core word;
– S_2: the verb collection of synonyms of all verbs in S_1;
– …
Expansion of the stoplist (STOP_k)
– STOP_0: 15 stop-verbs selected manually from the high-frequency verbs in the test corpus (e.g., suggest, indicate), including the high-frequency antonyms of the core word;
– STOP_1: the verb collection of synonyms of all verbs in STOP_0;
– …
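The level-by-level expansion can be sketched as repeated lookups in a synonym table. This is a minimal illustration, not the authors' implementation: the table `SYN` is a toy stand-in for WordNet, only the entry for inhibit follows the Sense 1 synset quoted earlier in the slides, and all other entries are invented for the example. The sketch also assumes that level 0 is the set of direct synonyms of the core word, since the slide starts its numbering at S_1.

```python
# Toy synonym table standing in for WordNet lookups.
SYN = {
    # From the Sense 1 synset of "inhibit" quoted in the slides.
    "inhibit": {"suppress", "stamp down", "subdue", "conquer", "curb"},
    # HYPOTHETICAL entries, invented for illustration only.
    "suppress": {"repress", "quash"},
    "curb": {"restrain", "check"},
    "repress": {"hold back"},
}


def expand(level0, synonyms, rounds):
    """Apply `rounds` expansion steps: each step adds the synonyms of
    every verb collected so far. Works for S_i and STOP_k alike."""
    current = set(level0)
    for _ in range(rounds):
        expanded = set(current)
        for verb in current:
            expanded |= synonyms.get(verb, set())
        current = expanded
    return current


# ASSUMPTION: S_0 is the core word plus its direct synonyms.
S_0 = {"inhibit"} | SYN["inhibit"]
S_1 = expand(S_0, SYN, rounds=1)
S_2 = expand(S_0, SYN, rounds=2)
print(sorted(S_1 - S_0))
print(sorted(S_2 - S_1))
```

Each round can only grow the set, which is why the number of rounds has to be capped: more rounds pull in verbs that are further and further from the core word.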

14 Method Description II
Verb list from the corpus (V_j)
– Verbs around the keywords in a local context with a searching window size of j are collected.
Synonym candidate list (S_g)
– If a verb is in V_j and also in S_i, but not in STOP_k, then add it to S_g.
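The selection rule for the candidate list amounts to an intersection followed by a set difference. The sketch below shows it with Python sets; the three input sets are made-up toy values, not data from the experiment, and `select_candidates` is a hypothetical name.

```python
def select_candidates(corpus_verbs, expanded_synset, stoplist):
    """Keep verbs found near the keywords (V_j) that are also in the
    expanded WordNet synset (S_i) and not in the stoplist (STOP_k)."""
    return (set(corpus_verbs) & set(expanded_synset)) - set(stoplist)


# HYPOTHETICAL toy inputs, chosen only to exercise the rule.
V_j = {"perform", "prevent", "block", "expose", "suppress", "suggest"}
S_i = {"inhibit", "suppress", "block", "prevent", "suggest", "restrain"}
STOP_k = {"suggest", "indicate", "perform"}

S_g = select_candidates(V_j, S_i, STOP_k)
print(sorted(S_g))  # ['block', 'prevent', 'suppress']
```

Here "suggest" passes the synset test but is removed by the stoplist, while "expose" is dropped because it is not in the expanded synset.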

15 Evaluation
Gold standard list (S_G)
– A manually created synonym list extracted from the test corpus.
– Consists of the 10 most frequently occurring verbs: 3 come directly from the WordNet synset of “inhibit”, and the remaining 7 come from its hypernym set or the expanded list of its synonyms.
Recall & Precision
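The slide names recall and precision without preserving their formulas. A minimal sketch, assuming the usual set-based definitions (recall is the share of gold-standard verbs that were found, precision is the share of candidates that are in the gold standard), could look as follows. The verb sets are hypothetical and do not reproduce the actual gold standard list.

```python
def recall_precision(candidates, gold):
    """Set-based recall and precision of a candidate list against a
    gold standard list. ASSUMPTION: standard definitions apply."""
    candidates, gold = set(candidates), set(gold)
    hits = candidates & gold
    recall = len(hits) / len(gold) if gold else 0.0
    precision = len(hits) / len(candidates) if candidates else 0.0
    return recall, precision


# HYPOTHETICAL lists for illustration only.
gold = {"suppress", "block", "prevent", "reduce", "limit"}
found = {"suppress", "block", "prevent", "restrain"}

recall, precision = recall_precision(found, gold)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case three of the five gold verbs are found, and three of the four candidates are correct.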

16 Result
60% recall of S_G; 93.05% of occurrences in the test corpus

17 Conclusions and Future Work
Conclusions
– The test was carried out on the English sublanguage of MEDLINE abstracts;
– The core word and its keywords occurred with high frequency;
– Multiword verb structures have not been considered yet;
– Balance between recall and precision: the expansion of S_i and STOP_k should be limited.
Future work
– Consideration of other WordNet information besides synsets;
– Automatic creation of stoplists;
– Extraction of multiword verb structures;
– Utilization of syntactic information.

18 Thanks!

19 Looking forward to your questions!

21 Possible Errors
– POS-tagging errors between adjectives and past participles
– Manual errors when selecting stop-verbs

22 Question or Hope
Can WordNet provide access to multiword expressions?

