Published by Morgan Terry. Modified over 9 years ago.
Slide 1: Semi-Automated Extension of a Specialized Medical Lexicon for French
Bruno Cartoni & Pierre Zweigenbaum
LIMSI-CNRS, France
Slide 2: Outline
- Context: UMLF for French
  - The desired coverage
  - The target lexical information
  - The organisation of a specialised lexicon
- Acquiring lexical information
  - Initial coverage
  - Obtaining lexical entries from a general lexicon
  - Guessing technique
- Results
  - Consensus guessing
  - Acquisition of the full paradigm
  - General improvement
- Conclusion and further work
Slide 3: Context: the InterSTIS project
- InterSTIS: development of a terminology server for French medical terminologies
- Sub-project: improving the lexical coverage of a French medical lexicon, UMLF (Unified Medical Lexicon for French)
- Use: supporting the indexing of medical texts
- Issues: What lexical knowledge is needed? How can it be acquired?
Slide 4: The desired coverage
- Reference: "Term-Union", the union of 10 terminologies (CIM-10, SNOMED, MeSH, CISMeF, …) covering French medical domains, organised around UMLS concept unique identifiers (CUIs)
- 311,518 terms
- 203,300 unique concepts (CUIs)
- 94,964 word forms
Slide 5: Term-Union: example
CUI | Source | … | Term
C0000936 | MSHFRE | … | Accommodation de l'oeil
C0000936 | MSHFRE | … | Accommodation des yeux
C0000936 | MSHFRE | … | Accommodation oculaire
C0000936 | SNMIGIPFRE | … | accommodation visuelle
…
C00001558 | MSHF | … | Voie cutanée
C00001558 | MSHF | … | Voie intradermique
C00001558 | MSHF | … | Voie percutanée
C00001558 | MSHF | … | Voie transcutanée
Observation: term variation (several terms name the same concept)
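The term variation observed on this slide can be made explicit by grouping rows by CUI. The sketch below is a minimal illustration, assuming each Term-Union row has been reduced to a (CUI, source, term) triple; the tokenizer and the helper names `group_by_cui` and `word_forms` are hypothetical, and the real count of 94,964 word forms may rest on different tokenization rules.

```python
import re
from collections import defaultdict

# Rows copied from the Term-Union example; the elided columns ("…") are dropped.
ROWS = [
    ("C0000936", "MSHFRE", "Accommodation de l'oeil"),
    ("C0000936", "MSHFRE", "Accommodation des yeux"),
    ("C0000936", "MSHFRE", "Accommodation oculaire"),
    ("C0000936", "SNMIGIPFRE", "accommodation visuelle"),
    ("C00001558", "MSHF", "Voie cutanée"),
    ("C00001558", "MSHF", "Voie intradermique"),
    ("C00001558", "MSHF", "Voie percutanée"),
    ("C00001558", "MSHF", "Voie transcutanée"),
]

# Hypothetical tokenizer: runs of letters (accented letters included),
# optionally joined by hyphens; the apostrophe splits "l'oeil" into "l" + "oeil".
TOKEN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def group_by_cui(rows):
    """Map each CUI to the list of terms that name it."""
    groups = defaultdict(list)
    for cui, _source, term in rows:
        groups[cui].append(term)
    return dict(groups)


def word_forms(rows):
    """Distinct lowercase word forms across all terms."""
    return {tok for _cui, _source, term in rows for tok in TOKEN.findall(term.lower())}


groups = group_by_cui(ROWS)
for cui, terms in groups.items():
    if len(terms) > 1:  # several terms for one concept = term variation
        print(cui, terms)
print(len(ROWS), "terms,", len(groups), "concepts,", len(word_forms(ROWS)), "word forms")
```

On these eight rows the last line reports 8 terms, 2 concepts and 13 word forms.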
Slide 6: Target lexical information
Term variation within Term-Union:
- Graphemic: équilibre acido-basique / équilibre acidobasique [EN: acid-base balance]
- Morphosyntactic: adaptation de l'oeil / adaptation des yeux [EN: eye adaptation]
- Morphosemantic: intoxication à l'alcool / intoxication alcoolique [EN: alcohol intoxication]
- Others…
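Graphemic variants are the simplest of these types to detect mechanically. A minimal sketch, assuming that hyphenation, letter case and apostrophe style are the only graphemic differences of interest (an assumption; the slide only shows the hyphen case). The morphosyntactic and morphosemantic types cannot be caught this way: they need the inflectional and derivational knowledge that the lexicon is meant to supply. The function names are hypothetical.

```python
import unicodedata


def graphemic_key(term: str) -> str:
    """Hypothetical normalisation: NFC, lowercase, typographic apostrophe
    mapped to ASCII, hyphens removed."""
    t = unicodedata.normalize("NFC", term).lower()
    return t.replace("\u2019", "'").replace("-", "")


def are_graphemic_variants(a: str, b: str) -> bool:
    """True when two distinct spellings collapse to the same key."""
    return a != b and graphemic_key(a) == graphemic_key(b)


print(are_graphemic_variants("équilibre acido-basique", "équilibre acidobasique"))  # True
print(are_graphemic_variants("adaptation de l'oeil", "adaptation des yeux"))  # False
```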
Slide 7: Organisation of the specialised lexicon
- 3 types of relational tables for the 3 levels of representation (graphemic, inflection, derivation)
- A full-entry lexicon (LMF-compliant) that gathers all lexical information

Graphemic variants (variant | normalised form):
…
inter-maxillaire | intermaxillaire
insulino-sécrétantes | insulinosécrétantes
scléro-cornéenne | sclérocornéenne
…

Derivation (derived word | base):
…
abdominal | abdomen
aplasique | aplasie
arachnoïdien | arachnoïde
argentique | argent
…

Inflection (form | lemma | tag):
sérofibrineux | sérofibrineux | Afpms
sérofibrineuse | sérofibrineux | Afpfs
sérofibrineux | sérofibrineux | Afpmp
sérofibrineuses | sérofibrineux | Afpfp
…
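One way to see how the three relational tables combine is to look a form up graphemically, then inflectionally, then derivationally. A minimal sketch, assuming in-memory dictionaries. The table entries and tags are copied from the slide, while the function `analyse` and the chaining order are hypothetical and not the LMF-compliant lexicon itself.

```python
# Table contents copied from the slide; the lookup logic is a hypothetical
# illustration of how the three levels could be chained.
GRAPHEMIC = {  # variant -> normalised spelling
    "inter-maxillaire": "intermaxillaire",
    "insulino-sécrétantes": "insulinosécrétantes",
    "scléro-cornéenne": "sclérocornéenne",
}
DERIVATION = {  # derived word -> base
    "abdominal": "abdomen",
    "aplasique": "aplasie",
    "arachnoïdien": "arachnoïde",
    "argentique": "argent",
}
INFLECTION = {  # inflected form -> [(lemma, tag), ...]
    "sérofibrineux": [("sérofibrineux", "Afpms"), ("sérofibrineux", "Afpmp")],
    "sérofibrineuse": [("sérofibrineux", "Afpfs")],
    "sérofibrineuses": [("sérofibrineux", "Afpfp")],
}


def analyse(form):
    """Chain the three levels: graphemic -> inflection -> derivation."""
    canonical = GRAPHEMIC.get(form, form)
    readings = INFLECTION.get(canonical, [])
    # Without an inflectional reading, treat the canonical form as its own lemma.
    lemmas = {lemma for lemma, _tag in readings} or {canonical}
    bases = {DERIVATION[lemma] for lemma in lemmas if lemma in DERIVATION}
    return canonical, readings, bases


print(analyse("sérofibrineuses"))   # ('sérofibrineuses', [('sérofibrineux', 'Afpfp')], set())
print(analyse("abdominal"))         # ('abdominal', [], {'abdomen'})
print(analyse("scléro-cornéenne"))  # ('sclérocornéenne', [], set())
```

Note that *sérofibrineux* keeps two readings (masculine singular and plural), as in the inflection table.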
Slide 9: Acquiring the lexical information
Initial coverage of UMLF (previous UMLF project, based on Baud et al. 1998):
- 17,192 lexical units
  - 5,353 adjectives
  - 11,799 nouns
- 36,211 word forms
Slide 10: Acquiring the lexical information
- From a general lexicon: an existing general French lexicon (Morphalou)
- With a guessing technique
Slide 11: Acquiring the lexical information
Guessing technique (Tanguy & Hathout 2007), in 3 steps:
1. Learning phase: compute the most frequent tag for each word-ending string in 2 existing lexicons
2. Guessing phase: assign possible tag(s) to unknown forms
3. Cross-validation: compare the outputs of the 2 guessers, each trained on one of the 2 lexicons
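The learning and guessing phases can be illustrated with a longest-ending lookup. This is a minimal sketch of ending-based tag guessing in the spirit of the technique cited above, not Tanguy & Hathout's actual implementation. The maximum ending length, the single-tag output and the coarse `ADJ`/`NOUN` tags are all assumptions. Training one such model per lexicon gives the two guessers whose outputs are then cross-validated.

```python
from collections import Counter, defaultdict


def learn(lexicon, max_len=5):
    """Learning phase: for every ending of length 1..max_len seen in the
    lexicon, keep the most frequent tag."""
    counts = defaultdict(Counter)
    for form, tag in lexicon:
        for k in range(1, min(max_len, len(form)) + 1):
            counts[form[-k:]][tag] += 1
    return {ending: c.most_common(1)[0][0] for ending, c in counts.items()}


def guess(model, form, max_len=5):
    """Guessing phase: use the longest known ending of an unknown form."""
    for k in range(min(max_len, len(form)), 0, -1):
        tag = model.get(form[-k:])
        if tag is not None:
            return tag
    return None


# Toy training lexicon with hypothetical coarse tags.
LEXICON = [
    ("aplasique", "ADJ"), ("argentique", "ADJ"), ("abdominal", "ADJ"),
    ("aplasie", "NOUN"), ("abdomen", "NOUN"), ("argent", "NOUN"),
]
model = learn(LEXICON)
print(guess(model, "pathologique"))  # ADJ  (via the ending "ique")
print(guess(model, "anémie"))        # NOUN (via the ending "ie")
```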
Slide 12: Acquiring the lexical information
Acquiring the full paradigm:
- All the inflected forms, plus the lemma
- Based on "productive" inflectional paradigms: 9 for adjectives, 3 for nouns
- Algorithm based on lexical tries that clusters forms belonging to the same paradigm
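A minimal sketch of trie-based clustering, assuming each paradigm can be written as a set of endings attached to a shared stem. The two paradigms, their names and all helper functions are hypothetical (the slide gives the number of paradigms but does not list them). The trie lets us enumerate every stored word below a candidate stem. Forms are grouped when those continuations match a paradigm's endings, and the endings still missing show which forms an extension to the full paradigm would have to generate.

```python
# Hypothetical paradigms: ending -> tags.
PARADIGMS = {
    "ADJ-eux": {"eux": ["Afpms", "Afpmp"], "euse": ["Afpfs"], "euses": ["Afpfp"]},
    "ADJ-al": {"al": ["Afpms"], "ale": ["Afpfs"], "aux": ["Afpmp"], "ales": ["Afpfp"]},
}


class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def find(root, prefix):
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def endings_below(node, suffix=""):
    """Yield every suffix s such that (path to node) + s is a stored word."""
    if node.is_word:
        yield suffix
    for ch, child in node.children.items():
        yield from endings_below(child, suffix + ch)


def cluster_paradigms(forms, paradigms):
    """Group forms sharing a stem whose continuations match one paradigm."""
    root = build_trie(forms)
    clusters = {}
    for form in forms:
        for name, endings in paradigms.items():
            for ending in endings:
                if len(form) <= len(ending) or not form.endswith(ending):
                    continue
                stem = form[: -len(ending)]
                if (stem, name) in clusters:
                    continue
                node = find(root, stem)  # never None: stem is a prefix of form
                found = {e for e in endings_below(node) if e in endings}
                clusters[(stem, name)] = {stem + e for e in found}
    return clusters


FORMS = ["sérofibrineux", "sérofibrineuse", "sérofibrineuses",
         "abdominal", "abdominale", "abdominaux"]
clusters = cluster_paradigms(FORMS, PARADIGMS)
for (stem, name), members in clusters.items():
    missing = {stem + e for e in PARADIGMS[name]} - members
    print(name, sorted(members), "missing:", sorted(missing))
```

Here the *-eux* cluster is complete, while the *-al* cluster lacks *abdominales*.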
Slide 14: Acquisition from the general lexicon: results
Term-Union: 94,964 word forms
Source | Known word entries | Remaining words to describe
Initial UMLF | 19,599 | 81,595
Morphalou | 6,617 | 74,978
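The table reads as a cascade: Term-Union forms are first looked up in the initial UMLF, and the remainder in Morphalou (81,595 - 74,978 = 6,617, the Morphalou figure). A minimal sketch of that cascade with set operations. The function name `staged_lookup` and the toy forms are hypothetical.

```python
def staged_lookup(target_forms, lexicons):
    """Look target forms up in each lexicon in turn; a form found at one
    stage is not looked up again. Returns (per-stage report, leftovers)."""
    remaining = set(target_forms)
    report = []
    for name, lexicon_forms in lexicons:
        known = remaining & lexicon_forms
        remaining -= known
        report.append((name, len(known), len(remaining)))
    return report, remaining


# Toy data; real inputs would be the 94,964 Term-Union forms.
TARGET = {"abdominal", "aplasique", "argent", "maison", "sérofibrineux"}
STAGES = [
    ("Initial UMLF", {"abdominal", "aplasique"}),
    ("Morphalou", {"argent", "maison", "abdominal"}),
]
report, unknown = staged_lookup(TARGET, STAGES)
print(report)   # [('Initial UMLF', 2, 3), ('Morphalou', 2, 1)]
print(unknown)  # {'sérofibrineux'}
```

The leftover forms are what the guessing technique then has to describe.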
Slide 15: Acquisition with guessing techniques: results
- 74,978 unknown forms
- 44,515 analyses from the Morphalou-based program
- 35,438 analyses from the UMLF-based program
- Cross-validation: 30,137 analyses in common
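The cross-validation figure can be read as the intersection of the two guessers' outputs. This reading is consistent with 30,137 being smaller than both 44,515 and 35,438. A minimal sketch, assuming "in common" means the same form with the same tag. The function and the toy data are hypothetical.

```python
def consensus(analyses_a, analyses_b):
    """Keep only the (form, tag) pairs proposed by both guessers.

    Each argument maps an unknown form to the set of tags a guesser assigned.
    """
    shared_forms = analyses_a.keys() & analyses_b.keys()
    return {(form, tag) for form in shared_forms
            for tag in analyses_a[form] & analyses_b[form]}


# Toy outputs with hypothetical coarse tags.
morphalou_based = {"pathologique": {"ADJ"}, "anémie": {"NOUN"}, "lupus": {"ADJ"}}
umlf_based = {"pathologique": {"ADJ"}, "anémie": {"NOUN", "ADJ"}, "lupus": {"NOUN"}}
print(sorted(consensus(morphalou_based, umlf_based)))
# [('anémie', 'NOUN'), ('pathologique', 'ADJ')]
```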
Slide 16: Acquisition with guessing techniques: evaluation
Errors: 82 out of 1,000 (8.2%)
Error type | Count
Wrong label | 12
Proper names | 49
Latin words | 5
English words | 1
Spelling/segmentation | 10
Other | 5
Total | 82
Slide 17: Acquisition of the full paradigm: results
- 4,453 paradigms captured (complete or incomplete), grouping 9,352 word forms
  - 3,308 adjectives
  - 514 nouns
- Automatic extension to the full paradigm (canonical forms only)
- Manual check for the others
Slide 18: General improvement
Source | Forms added | Still unknown in Term-Union | Coverage
UMLF-v1 | 36,211 | 81,595 | 14.1%
Morphalou | 17,828 | 74,978 | 21.0%
Acquisition | 8,088 | 70,602 | 25.7%
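The Coverage column can be reproduced from the other figures. It appears to be the share of Term-Union's 94,964 word forms that are no longer unknown after each cumulative step. This is an interpretation, not stated on the slide. The short check below uses only numbers from this table and the Term-Union slide.

```python
TERM_UNION_FORMS = 94_964

# "Still unknown in Term-Union" after each cumulative stage (from the table).
STILL_UNKNOWN = {"UMLF-v1": 81_595, "Morphalou": 74_978, "Acquisition": 70_602}


def coverage(still_unknown, total=TERM_UNION_FORMS):
    """Percentage of Term-Union word forms that are no longer unknown."""
    return 100 * (total - still_unknown) / total


for source, unknown in STILL_UNKNOWN.items():
    print(f"{source}: {coverage(unknown):.1f}%")
# UMLF-v1: 14.1%
# Morphalou: 21.0%
# Acquisition: 25.7%
```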
Slide 20: Discussion and conclusion
- The acquisition and evaluation of specialised lexical resources require a specific reference: Term-Union
- Extract (full) lexical information
- Assess lexical needs and targets
- Other acquisition techniques (CRF for inflectional information, rule-based techniques for derivational information)
Slide 21: Acknowledgment
This work was partially funded by the InterSTIS project (ANR-07-TECSAN-010).
InterSTIS project: www.interstis.org