
1 Paradigmatic Modifiability Statistics for the Extraction of Complex Multi-Word Terms
Joachim Wermter, Udo Hahn

2 Introduction
Increasing proliferation of (domain-specific) biomedical texts.
Available dictionaries and terminological vocabularies are incomplete: new terms are constantly introduced.
Hence the need for automatic term recognition (ATR) and extraction.
Challenge: the majority (85%) of terms are multi-word, and these are more difficult to recognize than singletons.
Any subfield of human research or expertise needs high-performance terminology identification methods.

3 What is a Term?
1: J Hepatol Aug 5; [Epub ahead of print]
Activation of dendritic cells by local ablation of hepatocellular carcinoma.
Ali M, Grimm CF, Ritter M, Mohr L, Weth R, Bocher WO, Endrulat K, Blum HE, Geissler M
Department of Medicine II, University Hospital Freiburg, D Freiburg, Germany.
BACKGROUND/AIMS: Local ablation methods are an effective treatment for hepatocellular carcinoma (HCC). The rate of recurrence or development of intra-hepatic metastasis may be lowered by immune responses. Since HCCs are in general only weakly immunogenic, cell injury induced by local tumor ablation (PEI/RFTA) may increase HCC immunogenicity and may release endogenous adjuvants that activate dendritic cells (DC). The aim of the study, therefore, was the analysis whether PEI or RFTA induced injury results in an adjuvant effect for immune responses to HCCs.
METHODS: Eight HCC patients were treated with PEI or RFTA and serially analyzed for 4 weeks. Plasmocytoid dendritic cells (PDC) and myeloid dendritic cells (MDC) were analyzed directly ex vivo and in vitro using FACS and proliferation assays.
RESULTS: HCC ablation induced a functional transient activation of MDC but not of PDC associated with increased serum levels of TNF-alpha and IL-1beta.
CONCLUSIONS: These findings suggest the combination of PEI or RFTA and active antigen specific immunotherapeutic approaches using DCs as a promising approach for the induction of sustained antitumoral immune responses aiming at the reduction of tumor recurrence and metastases in HCC patients.

4 These are Terms …
(The abstract from slide 3, shown with the terms highlighted; the highlighting is not preserved in this transcript.)

5 These are not Terms …
(The abstract from slide 3, shown with the non-terms highlighted; the highlighting is not preserved in this transcript.)

6 Related Work
Term mining: (orthomorphological) normalization, term variation, term context, term clustering, …
Grading termhood. Typical procedures to obtain and grade terms:
(shallow) linguistic filtering (POS, chunking, parsing, …)
non-linguistic frequency-based or statistically based association measures to grade termhood (e.g. t-test, MI, C-value, …)
ranked output
More linguistically oriented work (e.g. Jacquemin 2001, Daille 1996) requires deep (morpho)syntactic analysis, which is difficult to port across domains.
Limited Paradigmatic Modifiability: a cross-domain linguistic property for ATR based on shallow syntactic analysis.

7 Text Corpus
513,000 MEDLINE abstracts from the domain of molecular biology (104 million words).
Shallow syntactic annotation with the GENIA POS tagger and the YamCha chunker.
Most terms are inside NPs (Justeson & Katz 1995), hence the focus on NP recognition.
Stop words are filtered out.
NPs of length 2 (word bigrams), length 3 (word trigrams) and length 4 (word quadgrams) are kept.
The rightmost noun is morphologically normalized via the full-form UMLS Specialist Lexicon.
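The candidate-building steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word set and the lemma table below are tiny hypothetical stand-ins (the slides use a real stop-word filter and the UMLS Specialist Lexicon), and the NP chunks are assumed to arrive already tokenized from a chunker.

```python
from collections import Counter

# Hypothetical stand-ins: a tiny stop-word list and a full-form -> lemma table.
STOP_WORDS = {"the", "a", "an", "of", "this", "these"}
LEMMA_OF = {"repeats": "repeat", "responses": "response", "cells": "cell"}


def candidate_from_np(np_tokens):
    """Return a normalized n-gram tuple (n = 2..4) for one NP chunk, or None."""
    tokens = [t.lower() for t in np_tokens if t.lower() not in STOP_WORDS]
    if not 2 <= len(tokens) <= 4:
        return None
    # Normalize only the rightmost token, as the slide describes.
    tokens[-1] = LEMMA_OF.get(tokens[-1], tokens[-1])
    return tuple(tokens)


def count_candidates(np_chunks):
    """Count how often each normalized candidate occurs across all NP chunks."""
    counts = Counter()
    for chunk in np_chunks:
        candidate = candidate_from_np(chunk)
        if candidate is not None:
            counts[candidate] += 1
    return counts


# Invented example chunks, not taken from the MEDLINE corpus.
chunks = [
    ["the", "long", "terminal", "repeats"],
    ["long", "terminal", "repeat"],
    ["T", "cell", "responses"],
    ["cells"],
]
candidate_counts = count_candidates(chunks)
```

Lowercasing is a simplification made here for the toy data; whether the original pipeline case-folds is not stated on the slide.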

8 Term Candidates

9 Evaluating ATR Quality
Typical: domain experts identify (ad hoc) the true positives among ranked output candidates.
Often only one expert → no inter-annotator agreement.
Labor-intensive → small candidate sets.
What constitutes a relevant term? Difficult to decide in the absence of context.
Alternative: use existing terminological resources that have evolved and been curated through community-wide expert consensus.
The biomedical domain is an ideal test bed for evaluating ATR algorithms: UMLS Metathesaurus.
Existence in UMLS is the decision criterion for whether a candidate in the candidate sets is a term or not.
UMLS source vocabularies not relevant for biology (e.g. Nursing, Health Care Billing) are excluded.

10 True Terms in Candidate Sets
A term candidate is a true positive (TP) if it is found in UMLS 2004.
Word bigram types: %
Word trigram types: %
Word quadgram types: %

11 Limited Paradigmatic Modifiability
Frequency may be misleading with respect to termhood.
Term “long terminal repeat”: 434 occurrences.
Non-term “t cell response”: 2410 occurrences.
A multi-word unit (MWU) contains n slots, e.g. [long]slot-1 [terminal]slot-2 [repeat]slot-3.
Limited P-Mod: the probability with which one or more slots cannot be filled by other tokens, i.e. the likelihood of precluding the appearance of alternative tokens in particular slot positions.
Basic assumption: terms are linguistically more fixed and show less distributional variation.

12 Limited Paradigmatic Modifiability
Select k slots of an n-gram (unordered selection without replacement).
Selections sel for n = 3, k = 1:
kslot-1 terminal repeat
long kslot-2 repeat
long terminal kslot-3

13 Limited Paradigmatic Modifiability
Select k slots of an n-gram (unordered selection without replacement).
Selections sel for n = 3, k = 2:
kslot-1 kslot-2 repeat
kslot-1 terminal kslot-3
long kslot-2 kslot-3

14 Limited Paradigmatic Modifiability
Select k slots of an n-gram (unordered selection without replacement).
kslot-x is a placeholder for any possible word type (and its frequency).
Selections sel for n = 3, k = 3:
kslot-1 kslot-2 kslot-3
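The slot selections shown on slides 12 to 14 are the k-element subsets of the n slot positions. A minimal Python sketch that enumerates them for the example trigram follows; the function name `selections` and the `kslot-i` string placeholders are illustrative choices, not taken from the authors' implementation.

```python
from itertools import combinations


def selections(ngram, k):
    """All patterns obtained by replacing k of the n slots with a placeholder."""
    patterns = []
    for positions in combinations(range(len(ngram)), k):
        pattern = tuple(
            f"kslot-{i + 1}" if i in positions else token
            for i, token in enumerate(ngram)
        )
        patterns.append(pattern)
    return patterns


trigram = ("long", "terminal", "repeat")
for k in range(1, len(trigram) + 1):
    for pattern in selections(trigram, k):
        print(k, " ".join(pattern))
```

For a trigram this yields three patterns for k = 1, three for k = 2 and one for k = 3, matching the binomial coefficients of 3 over k.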

15 Limited Paradigmatic Modifiability
For a particular k (1 ≤ k ≤ n; n = length of the n-gram):
Determine the frequency of the n-gram.
Scale it by the frequency of each possible selection sel.
Take the product over all selections to obtain the k-modifiability for that k.
Lower frequency of sel → more limited paradigmatic modifiability for that k.
The paradigmatic modifiability (P-Mod) of an n-gram combines the k-modifiabilities over all k.
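The formulas themselves did not survive in this transcript. One reconstruction that is consistent with the verbal description on this slide is given below. The notation is an assumption: f(·) denotes corpus frequency, sel_i(k, n-gram) is the i-th of the possible selections of k slots, and combining the k-modifiabilities by a product over k is likewise assumed here rather than stated on the slide.

```latex
\begin{align*}
\mathrm{mod}_k(\text{n-gram}) &:= \prod_{i=1}^{\binom{n}{k}}
  \frac{f(\text{n-gram})}{f\bigl(\mathrm{sel}_i(k,\ \text{n-gram})\bigr)} \\[4pt]
\text{P-Mod}(\text{n-gram}) &:= \prod_{k=1}^{n} \mathrm{mod}_k(\text{n-gram})
\end{align*}
```

Under this reading every factor lies between 0 and 1, because a selection with placeholder slots matches at least the n-gram itself. A factor equal to 1 means that no alternative token ever fills the selected slots, which is the "limited modifiability" the slide refers to.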

16 Limited Paradigmatic Modifiability
Term Non-Term
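To make the term versus non-term contrast concrete, here is a small Python sketch with invented trigram counts (they are not corpus figures from the talk). It assumes that the score of an n-gram is the product, over every choice of k slots for k = 1..n, of the n-gram's frequency divided by the summed frequency of all n-grams that agree with it outside the chosen slots; this is one reading of the description on slide 15, not a confirmed reproduction of the authors' code.

```python
from collections import Counter
from itertools import combinations


def selection_frequency(counts, ngram, positions):
    """Summed frequency of all n-grams that agree with `ngram` outside `positions`."""
    return sum(
        freq
        for other, freq in counts.items()
        if len(other) == len(ngram)
        and all(o == t for i, (o, t) in enumerate(zip(other, ngram))
                if i not in positions)
    )


def p_mod(counts, ngram):
    """Product of f(ngram) / f(selection) over all slot selections, k = 1..n."""
    n = len(ngram)
    f_ngram = counts[ngram]
    score = 1.0
    for k in range(1, n + 1):
        for positions in combinations(range(n), k):
            score *= f_ngram / selection_frequency(counts, ngram, set(positions))
    return score


# Invented toy counts: the non-term is more frequent but far more modifiable.
toy_counts = Counter({
    ("long", "terminal", "repeat"): 8,
    ("long", "terminal", "domain"): 2,
    ("t", "cell", "response"): 20,
    ("b", "cell", "response"): 15,
    ("immune", "cell", "response"): 10,
    ("t", "cell", "line"): 10,
    ("t", "cell", "clone"): 5,
})

term_score = p_mod(toy_counts, ("long", "terminal", "repeat"))
non_term_score = p_mod(toy_counts, ("t", "cell", "response"))
print(term_score > non_term_score)
```

In this toy setting "t cell response" occurs more often than "long terminal repeat", yet it receives the lower score, because its slots are filled by many alternative tokens. The linear scan in `selection_frequency` is chosen for clarity; a real corpus would need precomputed counts per selection pattern.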

17 Methods of Evaluation
UMLS, as a large and consensual terminology, identifies the TPs in our candidate sets.
Dynamic examination of the m highest-ranked output samples.
Standard precision/recall (P/R) graphs.
This gives a more reliable evaluation setting for ATR measures.
P-Mod is compared against standard algorithms:
t-test (best among standard association measures for general-language collocation extraction; cf. Evert & Krenn 2001)
C-value (widely used for ATR; cf. Frantzi et al. 2000)
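The evaluation over the m highest-ranked candidates can be sketched as follows. The function name, the representation of the gold standard as a Python set and the example candidates are all hypothetical; recall is computed relative to the true terms present in the candidate set, which is an assumption about how the P/R graphs were drawn.

```python
def precision_recall_at(ranked_candidates, gold_terms, portion):
    """Precision and recall over the top `portion` (0 < portion <= 1) of a ranking."""
    m = max(1, round(len(ranked_candidates) * portion))
    top = ranked_candidates[:m]
    true_positives = sum(1 for cand in top if cand in gold_terms)
    # Recall is relative to the true terms contained in the candidate set.
    total_terms = sum(1 for cand in ranked_candidates if cand in gold_terms)
    precision = true_positives / m
    recall = true_positives / total_terms if total_terms else 0.0
    return precision, recall


# Hypothetical ranking (best first) and hypothetical gold standard.
ranked = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
gold = {"c1", "c2", "c4", "c7"}

for portion in (0.25, 0.5, 1.0):
    precision, recall = precision_recall_at(ranked, gold, portion)
    print(f"top {portion:.0%}: precision={precision:.2f} recall={recall:.2f}")
```

Sweeping `portion` from small values up to 1.0 for each ranking measure gives the points of a precision/recall curve, so the measures can be compared at equal portions of the ranked output.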

18 Precision/Recall for Bigrams
Steady advantage for P-Mod

19 Precision/Recall for Trigrams
80% recall at 50% output; 80% recall at 63% output; 80% recall at 66% output

20 Precision/Recall for Quadgrams
80% recall at 45% output; 80% recall at 62% output; 80% recall at 65% output

21 Significance Testing of Differences

22 Significance Testing of Differences

23 Significance Testing of Differences

24 Limited Paradigmatic Modifiability
Frequency may be misleading with respect to termhood.
Term “long terminal repeat”: 434 occurrences.
Non-term “t cell response”: 2410 occurrences.
“long terminal repeat”: P-Mod: Rank 24; t-test: Rank 242
“t cell response”: P-Mod: Rank 1249; t-test: Rank

25 Corpus Size and Domain Independence
Is the performance a result of the quite large corpus size (104 million words)?
Some (sub-)domains lack a plethora of free-text material, e.g. engineering, plant biology, clinical texts.
Experiment: drastic reduction of corpus size down to million words.
Assessing terminology extraction methods.

26 Precision/Recall for Trigrams
statistically significant

27 Conclusions
A linguistically motivated terminology extraction method.
P-Mod (Limited Paradigmatic Modifiability) incorporates the distributional fixedness of terms.
It significantly outperforms standard (non-linguistic) measures used for ATR.
The technical merit of P-Mod is likely to be domain-independent and independent of corpus size, and it only requires shallow syntactic analysis.


29 Appendix
A high-performance ATR system is also essential for updating existing terminologies.
Concrete example:
MeSH contains the term “cell cycle”.
MeSH/GenBank suppl. contains the term “cell cycle arrest protein BUB2”.
The term “cell cycle arrest” is not contained in MeSH, is ranked in the top 10% of P-Mod, and is contained in GO stand-alone.
Hence: it can provide a missing semantic link in UMLS.

30 Significance Testing of Differences

31 Significance Testing of Differences

32 Jena University Language and Information Engineering (JULIE) Lab
Germany
