Published by Asher Waters. Modified over 7 years ago.
1
Paradigmatic Modifiability Statistics for the Extraction of Complex Multi-Word Terms
Joachim Wermter, Udo Hahn
2
Introduction
Increasing proliferation of (domain-specific) biomedical texts
Incompleteness of available dictionaries and terminological vocabularies
New terms are constantly introduced
Need for automatic term recognition (ATR) and extraction
Challenge: the majority (85%) of terms are multi-word
More difficult to recognize than singletons
Any subfield of human research or expertise needs high-performance terminology identification methods
3
What is a Term?
1: J Hepatol Aug 5; [Epub ahead of print]
Activation of dendritic cells by local ablation of hepatocellular carcinoma.
Ali M, Grimm CF, Ritter M, Mohr L, Weth R, Bocher WO, Endrulat K, Blum HE, Geissler M
Department of Medicine II, University Hospital Freiburg, D Freiburg, Germany.
BACKGROUND/AIMS: Local ablation methods are an effective treatment for hepatocellular carcinoma (HCC). The rate of recurrence or development of intra-hepatic metastasis may be lowered by immune responses. Since HCCs are in general only weakly immunogenic, cell injury induced by local tumor ablation (PEI/RFTA) may increase HCC immunogenicity and may release endogenous adjuvants that activate dendritic cells (DC). The aim of the study, therefore, was the analysis whether PEI or RFTA induced injury results in an adjuvant effect for immune responses to HCCs.
METHODS: Eight HCC patients were treated with PEI or RFTA and serially analyzed for 4 weeks. Plasmocytoid dendritic cells (PDC) and myeloid dendritic cells (MDC) were analyzed directly ex vivo and in vitro using FACS and proliferation assays.
RESULTS: HCC ablation induced a functional transient activation of MDC but not of PDC associated with increased serum levels of TNF-alpha and IL-1beta.
CONCLUSIONS: These findings suggest the combination of PEI or RFTA and active antigen specific immunotherapeutic approaches using DCs as a promising approach for the induction of sustained antitumoral immune responses aiming at the reduction of tumor recurrence and metastases in HCC patients.
4
These are Terms … (same abstract as on slide 3; the markup that singled out the terms on the original slide is not preserved)
5
These are not Terms … (same abstract as on slide 3; the markup that singled out the non-terms on the original slide is not preserved)
6
Related Work
Term mining: (orthomorphological) normalization, term variation, term context, term clustering, …
Grading termhood
Typical procedures to obtain and grade terms:
  (shallow) linguistic filtering (POS, chunking, parsing, …)
  non-linguistic frequency- or statistically-based association measures to grade termhood (e.g. t-test, MI, C-value, …)
  ranked output
More linguistically oriented work (e.g. Jacquemin 2001, Daille 1996) requires deep (morpho)syntactic analysis
  Difficult to port across domains!
Limited Paradigmatic Modifiability: a cross-domain linguistic property for ATR based on shallow syntactic analysis
7
Text Corpus
513,000 MEDLINE abstracts from the domain of molecular biology (104 million words)
Shallow syntactic annotation: GENIA POS tagger and YamCha chunker
Most terms are inside NPs (Justeson & Katz 95), hence the focus on NP recognition
Filtering of stop words
NPs of length 2 (word bigrams), length 3 (word trigrams) and length 4 (word quadgrams)
Morphological normalization of the rightmost noun via the full-form UMLS Specialist Lexicon
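The candidate-building steps on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is taken to be already POS-tagged and chunked into NPs (the tagger and chunker are not called here), `STOP_WORDS` and `LEMMAS` are tiny hypothetical stand-ins for a real stop-word list and for the UMLS Specialist Lexicon, stop words are assumed to be stripped from the NP rather than the NP being discarded, and lowercasing is an added assumption.

```python
from collections import Counter

# Hypothetical stand-ins; a real setup would use a full stop-word list and
# a full-form lexicon lookup for the normalization step.
STOP_WORDS = {"the", "a", "an", "of", "this", "these"}
LEMMAS = {"repeats": "repeat", "responses": "response"}


def candidate_from_np(tokens):
    """Turn one NP (a list of tokens) into an n-gram candidate, or None."""
    # Assumption: stop words are removed from the NP and tokens are lowercased.
    words = [t.lower() for t in tokens if t.lower() not in STOP_WORDS]
    # Keep only bigrams, trigrams and quadgrams.
    if not 2 <= len(words) <= 4:
        return None
    # Normalize only the rightmost noun (the head of the NP).
    words[-1] = LEMMAS.get(words[-1], words[-1])
    return tuple(words)


def count_candidates(noun_phrases):
    """Count how often each n-gram candidate occurs over all NPs."""
    counts = Counter()
    for np_tokens in noun_phrases:
        candidate = candidate_from_np(np_tokens)
        if candidate is not None:
            counts[candidate] += 1
    return counts


noun_phrases = [
    ["the", "long", "terminal", "repeats"],
    ["long", "terminal", "repeat"],
    ["a", "cell"],
    ["T", "cell", "responses"],
]
counts = count_candidates(noun_phrases)
```

In this toy input, the two variants of "long terminal repeat" collapse into one candidate type, and the NP "a cell" is dropped because only one word remains after stop-word filtering.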
8
Term Candidates
9
Evaluating ATR Quality
Typical: domain experts (ad hoc) identify true positives among ranked output candidates
  Often only one expert: no interannotator agreement
  Labor-intensive: small size of candidate set
  What constitutes a relevant term? Difficult to decide in the absence of context
Alternative: take already existing terminological resources that have evolved and been curated through community-wide expert consensus
  Biomedical domain as ideal test bed for evaluating ATR algorithms: UMLS Metathesaurus
  Existence in UMLS is the decision criterion for whether a candidate in the candidate sets is a term or not
  Excluding UMLS source vocabularies not relevant for biology (e.g. Nursing, Health Care Billing, etc.)
10
True Terms in Candidate Sets
A term candidate is a true positive (TP) if it is found in UMLS 2004
word bigram types: %
word trigram types: %
word quadgram types: %
11
Limited Paradigmatic Modifiability
Frequency may be misleading wrt termhood
  Term “long terminal repeat”: 434 occurrences
  Non-term “t cell response”: 2410 occurrences
A multi-word unit (MWU) contains n slots, e.g.: [long]slot-1 [terminal]slot-2 [repeat]slot-3
Limited P-Mod: probability with which one or more slots cannot be filled by other tokens
Or: likelihood of precluding the appearance of alternative tokens in particular slot positions
Basic assumption: terms are linguistically more fixed and show less distributional variation
12
Limited Paradigmatic Modifiability
Select k slots of an n-gram (unordered selection without replacement):
n = 3, k = 1
Selections sel (k = 1):
  kslot-1 terminal repeat
  long kslot-2 repeat
  long terminal kslot-3
13
Limited Paradigmatic Modifiability
Select k slots of an n-gram (unordered selection without replacement):
n = 3, k = 2
Selections sel (k = 2):
  kslot-1 kslot-2 repeat
  kslot-1 terminal kslot-3
  long kslot-2 kslot-3
14
Limited Paradigmatic Modifiability
Select k slots of an n-gram (unordered selection without replacement):
kslot-x is a placeholder for any possible word type (and its frequency)
n = 3, k = 3
Selections sel (k = 3):
  kslot-1 kslot-2 kslot-3
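The slot selections for k = 1, 2, 3 can be enumerated mechanically. The sketch below is an illustration only; the function name `selections` and the string form of the placeholders are choices made here, not taken from the slides.

```python
from itertools import combinations


def selections(ngram, k):
    """List all ways of replacing k of the n slots by a kslot placeholder.

    Choosing k slot positions out of n is an unordered selection without
    replacement, so there are "n choose k" selections.
    """
    result = []
    for slots in combinations(range(len(ngram)), k):
        pattern = tuple(
            f"kslot-{i + 1}" if i in slots else word
            for i, word in enumerate(ngram)
        )
        result.append(pattern)
    return result


trigram = ("long", "terminal", "repeat")
for k in (1, 2, 3):
    for pattern in selections(trigram, k):
        print(k, " ".join(pattern))
```

For a trigram this yields three selections for k = 1, three for k = 2 and a single selection for k = 3, in the same order as on the slides.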
15
Limited Paradigmatic Modifiability
For a particular k (1 ≤ k ≤ n; n = length of the n-gram):
  Determine the frequency of the n-gram
  Scale it by the frequency of each possible selection sel
  The product over all selections sel gives the particular k-modifiability
  Lower frequency of sel: more limited paradigmatic modifiability for the particular k
Paradigmatic modifiability (P-Mod) of an n-gram: combines the k-modifiabilities over all k
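The formulas of this slide did not survive in the transcript. A plausible reconstruction from the verbal description is given below. The symbols are assumptions introduced here: f(·) is corpus frequency and sel_i(k, n-gram) is the i-th selection of k slots. The first line follows the slide text directly; the second line, which takes P-Mod as the product of the k-modifiabilities over all k, is an assumption about the lost formula and should be checked against the published paper.

```latex
\begin{align*}
mod_k(\text{n-gram}) &:= \prod_{i=1}^{\binom{n}{k}}
    \frac{f(\text{n-gram})}{f\big(sel_i(k,\ \text{n-gram})\big)} \\
P\text{-}Mod(\text{n-gram}) &:= \prod_{k=1}^{n} mod_k(\text{n-gram})
\end{align*}
```

Each factor is at most 1, and it equals 1 exactly when the selected slots are never filled by anything other than the tokens of the n-gram itself.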
16
Limited Paradigmatic Modifiability
Term Non-Term
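A term/non-term contrast of this kind can be reproduced on toy data. The sketch below is not the authors' implementation: it assumes that the k-modifiability is the product, over all selections of k slots, of the n-gram frequency divided by the selection frequency, and that P-Mod is the product of the k-modifiabilities over k = 1..n (the latter is an assumption, since the slide formula is lost). The counts in `freqs` are invented for illustration and are not corpus figures.

```python
from collections import Counter
from itertools import combinations
from math import prod


def selection_frequency(ngram, slots, freqs):
    """Frequency of all n-grams that agree with `ngram` outside `slots`."""
    return sum(
        f
        for cand, f in freqs.items()
        if len(cand) == len(ngram)
        and all(c == w for i, (c, w) in enumerate(zip(cand, ngram)) if i not in slots)
    )


def k_modifiability(ngram, k, freqs):
    """Product over all k-slot selections of f(n-gram) / f(selection)."""
    f = freqs[ngram]
    return prod(
        f / selection_frequency(ngram, set(slots), freqs)
        for slots in combinations(range(len(ngram)), k)
    )


def p_mod(ngram, freqs):
    """Assumed aggregation: product of the k-modifiabilities for k = 1..n."""
    return prod(k_modifiability(ngram, k, freqs) for k in range(1, len(ngram) + 1))


# Invented toy counts: the non-term is more frequent but more modifiable.
freqs = Counter({
    ("long", "terminal", "repeat"): 4,
    ("t", "cell", "response"): 6,
    ("b", "cell", "response"): 3,
    ("t", "cell", "line"): 3,
})
term_score = p_mod(("long", "terminal", "repeat"), freqs)
non_term_score = p_mod(("t", "cell", "response"), freqs)
```

With these toy counts the less frequent "long terminal repeat" scores 0.25, while the more frequent "t cell response" scores 1/27, because its slots are also filled by other tokens ("b", "line").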
17
Methods of Evaluation
UMLS as a large and consensual terminology identifies TPs in our candidate sets
Dynamic examination of m-highest ranked output samples
Standard P/R graphs
More reliable evaluation setting for ATR measures
Compare P-Mod against standard algorithms:
  t-test (best among standard association measures for general-language collocation extraction; cf. Evert & Krenn 2001)
  C-value (widely used for ATR; cf. Frantzi et al. 2000)
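The evaluation described here, precision and recall over the m highest-ranked candidates with membership in a reference terminology as the TP criterion, can be sketched as below. The function name, the rounding used to turn an output portion into m, and the toy data are assumptions made for illustration; `gold_terms` merely stands in for a UMLS lookup.

```python
def precision_recall_at(ranked, gold_terms, portion):
    """Precision and recall for the top `portion` (0 < portion <= 1) of a ranking.

    Recall is measured against the true positives present in the candidate
    set, not against the whole reference terminology.
    """
    m = max(1, round(len(ranked) * portion))
    true_positives_in_top = sum(1 for cand in ranked[:m] if cand in gold_terms)
    true_positives_total = sum(1 for cand in ranked if cand in gold_terms)
    precision = true_positives_in_top / m
    recall = (
        true_positives_in_top / true_positives_total if true_positives_total else 0.0
    )
    return precision, recall


# Toy ranking produced by some termhood measure, best candidate first.
ranked = ["a", "b", "c", "d"]
gold_terms = {"a", "c", "not-in-candidate-set"}

half = precision_recall_at(ranked, gold_terms, 0.5)
full = precision_recall_at(ranked, gold_terms, 1.0)
```

Evaluating the same function at a series of portions (for example 1%, 10%, 20%, and so on) gives the points of a P/R graph for one measure.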
18
Precision/Recall for Bigrams
Steady advantage for P-Mod
19
Precision/Recall for Trigrams
80% recall at 50% output
80% recall at 63% output
80% recall at 66% output
20
Precision/Recall for Quadgrams
80% recall at 45% output
80% recall at 62% output
80% recall at 65% output
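Figures of the form "80% recall at x% output" can be read as the share of the ranked list that has to be inspected before 80% of the true terms in the candidate set have been seen. A minimal sketch of that computation follows; the function name and the toy ranking are assumptions for illustration, not data from the slides.

```python
def output_share_for_recall(ranked, gold_terms, target_recall):
    """Smallest share of the ranked output that reaches `target_recall`.

    Returns None if the candidate set contains no true positives.
    """
    total = sum(1 for cand in ranked if cand in gold_terms)
    if total == 0:
        return None
    found = 0
    for m, cand in enumerate(ranked, start=1):
        if cand in gold_terms:
            found += 1
        if found / total >= target_recall:
            return m / len(ranked)
    return None


# Toy example: ten candidates, five of them true terms.
ranked = list("abcdefghij")
gold_terms = {"a", "b", "d", "e", "j"}
share = output_share_for_recall(ranked, gold_terms, 0.8)
```

A measure that ranks true terms higher reaches the target recall at a smaller output share, which is how the per-measure figures on the two slides compare.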
21
Significance Testing of Differences
22
Significance Testing of Differences
23
Significance Testing of Differences
24
Limited Paradigmatic Modifiability
Frequency may be misleading wrt termhood
  Term “long terminal repeat”: 434 occurrences
  Non-term “t cell response”: 2410 occurrences
“long terminal repeat”: P-Mod: Rank 24; t-test: Rank 242
“t cell response”: P-Mod: Rank 1249; t-test: Rank
25
Corpus Size and Domain Independence
Is the performance a result of the quite large corpus size (104 million words)?
Some (sub)domains lack a plethora of free-text material, e.g.: engineering, plant biology, clinical texts
Experiment: drastic reduction of corpus size down to million words
Assessing terminology extraction methods
26
Precision/Recall for Trigrams
statistically significant
27
Conclusions
Linguistically motivated terminology extraction method
P-Mod: Limited Paradigmatic Modifiability incorporates the distributional fixedness of terms
Significantly outperforms standard (non-linguistic) measures used for ATR
Technical merit of P-Mod:
  likely to be domain-independent
  independent of corpus size
  only requires shallow syntactic analysis
28
Paradigmatic Modifiability Statistics for the Extraction of Complex Multi-Word Terms
Joachim Wermter, Udo Hahn
29
Appendix
A high-performance ATR system is also essential for updating existing terminologies
Concrete example:
  MeSH contains the term “cell cycle”
  MeSH/GenBank suppl. contains the term “cell cycle arrest protein BUB2”
  Term “cell cycle arrest”:
    not contained in MeSH
    ranked in the top 10% of P-Mod
    contained in GO stand-alone
  Hence: provides the missing semantic link in UMLS
30
Significance Testing of Differences
31
Significance Testing of Differences
32
Jena University Language and Information Engineering (JULIE) Lab
Germany