Semi-Automated Extension of a Specialized Medical Lexicon for French
Bruno Cartoni & Pierre Zweigenbaum
LIMSI-CNRS, France

2 Outline
- Context: UMLF for French
  - The desired coverage
  - The target lexical information
  - The organisation of a specialised lexicon
- Acquiring lexical information
  - Initial coverage
  - Obtaining lexical entries from a general lexicon
  - Guessing technique
- Results
  - Consensus guessing
  - Acquisition of the full paradigm
  - General improvement
- Conclusion and further work

3 Context: the InterSTIS project
- InterSTIS: development of a terminology server for French medical terminologies
- Sub-project: improving the lexical coverage of a French medical lexicon (UMLF: Unified Medical Lexicon for French)
- Use: supporting the indexing of medical texts
- Issues:
  - What lexical knowledge is needed?
  - How can it be acquired?

4 The desired coverage
- Reference: "Term-Union", the union of 10 terminologies of French medical domains (CIM-10, SNOMED, MeSH, CISMeF, …), organised around UMLS concept unique identifiers (CUIs)
  - 311,518 terms
  - 203,300 unique concepts (CUIs)
  - 94,964 word forms

5 Term-Union: example
C0000936 | MSHFRE | … | Accommodation de l'oeil
C0000936 | MSHFRE | … | Accommodation des yeux
C0000936 | MSHFRE | … | Accommodation oculaire
C0000936 | SNMIGIPFRE | … | accommodation visuelle
…
C00001558 | MSHF | … | Voie cutanée
C00001558 | MSHF | … | Voie intradermique
C00001558 | MSHF | … | Voie percutanée
C00001558 | MSHF | … | Voie transcutanée
- Observation: terms sharing the same CUI exhibit term variation
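Because Term-Union is organised around CUIs, term variation can be surfaced by grouping terms by concept identifier. The sketch below assumes rows of the form (CUI, source, term), with the other columns dropped; the function name `variants_by_concept` is hypothetical, and the rows are copied from the example above.

```python
from collections import defaultdict

# Rows copied from the example: (CUI, source, term); other columns omitted.
ROWS = [
    ("C0000936", "MSHFRE", "Accommodation des yeux"),
    ("C0000936", "MSHFRE", "Accommodation oculaire"),
    ("C0000936", "SNMIGIPFRE", "accommodation visuelle"),
    ("C00001558", "MSHF", "Voie cutanée"),
    ("C00001558", "MSHF", "Voie intradermique"),
    ("C00001558", "MSHF", "Voie percutanée"),
    ("C00001558", "MSHF", "Voie transcutanée"),
]

def variants_by_concept(rows):
    """Group term strings by CUI; keep only concepts with several terms (i.e. variation)."""
    by_cui = defaultdict(list)
    for cui, _source, term in rows:
        by_cui[cui].append(term)
    return {cui: terms for cui, terms in by_cui.items() if len(terms) > 1}
```

Calling `variants_by_concept(ROWS)` returns one list of 3 terms for C0000936 and one list of 4 terms for C00001558; concepts with a single term would be filtered out.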

6 Target lexical information
- Term variation within Term-Union:
  - Graphemic: équilibre acido-basique vs. équilibre acidobasique [EN: acid-base balance]
  - Morphosyntactic: adaptation de l'oeil vs. adaptation des yeux [EN: eye adaptation]
  - Morphosemantic: intoxication à l'alcool vs. intoxication alcoolique [EN: alcohol intoxication]
  - Others...
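Graphemic variants such as acido-basique / acidobasique can be detected by grouping forms under a normalised key. A minimal sketch, assuming that removing hyphens and spelling out the "œ" ligature as "oe" are the only normalisations of interest; the function name `graphemic_variants` is hypothetical.

```python
from collections import defaultdict

def graphemic_variants(forms):
    """Group forms whose spellings differ only by hyphens or by the œ/oe ligature."""
    groups = defaultdict(set)
    for form in forms:
        key = form.replace("-", "").replace("œ", "oe")  # normalisation key
        groups[key].add(form)
    # Only keys covering more than one spelling are actual variant groups.
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}
```

For example, `graphemic_variants(["acido-basique", "acidobasique", "inter-maxillaire"])` returns `{"acidobasique": ["acido-basique", "acidobasique"]}`. Morphosyntactic and morphosemantic variation need inflectional and derivational knowledge, which a string key like this cannot capture.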

7 Organisation of the specialised lexicon
- Three types of relational tables, one for each of the 3 levels of representation (graphemic, inflectional, derivational)
- A full-entry lexicon (LMF-compliant) that gathers all the lexical information

Graphemic table:
…
inter-maxillaire | intermaxillaire
insulino-sécrétantes | insulinosécrétantes
scléro-cornéenne | sclérocornéenne
…
Derivational table:
abdominal | abdomen
aplasique | aplasie
arachnoïdien | arachnoïde
argentique | argent
…
Inflectional table:
sérofibrineux | sérofibrineux | Afpms
sérofibrineuse | sérofibrineux | Afpfs
sérofibrineux | sérofibrineux | Afpmp
sérofibrineuses | sérofibrineux | Afpfp
…
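To illustrate how a full entry can gather information from the three relational tables, here is a sketch using in-memory copies of the rows shown above. The function names are hypothetical, and the tag decoding assumes Afpms-style adjective tags in which the fourth character is gender (m/f) and the fifth is number (s/p), which is consistent with the sérofibrineux rows.

```python
# Hypothetical in-memory copies of the three relational tables (rows from the slide).
GRAPHEMIC = [  # pairs of spelling variants
    ("inter-maxillaire", "intermaxillaire"),
    ("insulino-sécrétantes", "insulinosécrétantes"),
    ("scléro-cornéenne", "sclérocornéenne"),
]
DERIVATION = [  # (derived word, base)
    ("abdominal", "abdomen"),
    ("aplasique", "aplasie"),
    ("arachnoïdien", "arachnoïde"),
    ("argentique", "argent"),
]
INFLECTION = [  # (form, lemma, tag)
    ("sérofibrineux", "sérofibrineux", "Afpms"),
    ("sérofibrineuse", "sérofibrineux", "Afpfs"),
    ("sérofibrineux", "sérofibrineux", "Afpmp"),
    ("sérofibrineuses", "sérofibrineux", "Afpfp"),
]

GENDER = {"m": "masculine", "f": "feminine"}
NUMBER = {"s": "singular", "p": "plural"}

def decode_adjective_tag(tag):
    """Assumption: in tags like 'Afpms', index 3 is gender and index 4 is number."""
    return GENDER[tag[3]], NUMBER[tag[4]]

def full_entry(form):
    """Merge what the graphemic, inflectional and derivational tables say about a form."""
    variants = [b if form == a else a for a, b in GRAPHEMIC if form in (a, b)]
    analyses = [(lemma, *decode_adjective_tag(tag))
                for f, lemma, tag in INFLECTION if f == form]
    # Derivational links are stated on lemmas, so look them up via the form's lemma(s).
    lemmas = {lemma for lemma, _, _ in analyses} | {form}
    bases = [base for derived, base in DERIVATION if derived in lemmas]
    return {"form": form, "variants": variants, "analyses": analyses, "bases": bases}
```

For instance, `full_entry("sérofibrineux")` yields two analyses (masculine singular and masculine plural), and `full_entry("abdominal")` links the adjective to its base "abdomen".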

8 Outline (same as slide 2)

9 Acquiring the lexical information
- Initial coverage of UMLF (from the earlier UMLF project, based on Baud et al. 1998):
  - 17,192 lexical units
    - 5,353 adjectives
    - 11,799 nouns
  - 36,211 word forms

10 Acquiring the lexical information
- From a general lexicon: an existing French general lexicon (Morphalou)
- With a guessing technique

11 Acquiring the lexical information
- With a guessing technique (Tanguy & Hathout 2007), in 3 steps:
  - Learning phase: computing the most frequent tag for each word ending in 2 existing lexicons
  - Guessing phase: assigning possible tag(s) to unknown words
  - Cross-validation: comparing the guesses obtained from the 2 lexicons
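The three steps can be sketched as follows. This is not Tanguy & Hathout's implementation: it is a simplified illustration assuming one tag per ending (the slide mentions possibly several tags), a maximum ending length of 5 characters, and longest-known-ending lookup. The two toy lexicons, their noun tags (Ncms, Ncfs) and all function names are hypothetical.

```python
from collections import Counter, defaultdict

MAX_ENDING = 5  # hypothetical: longest word ending considered

def learn(lexicon):
    """Learning phase: most frequent tag for each word ending in a (form, tag) lexicon."""
    counts = defaultdict(Counter)
    for form, tag in lexicon:
        for n in range(1, min(MAX_ENDING, len(form)) + 1):
            counts[form[-n:]][tag] += 1
    return {ending: c.most_common(1)[0][0] for ending, c in counts.items()}

def guess(model, word):
    """Guessing phase: tag of the longest known ending of `word`, or None."""
    for n in range(min(MAX_ENDING, len(word)), 0, -1):
        tag = model.get(word[-n:])
        if tag is not None:
            return tag
    return None

def consensus(words, model_a, model_b):
    """Cross-validation: keep only the guesses on which both models agree."""
    agreed = {}
    for word in words:
        tag_a, tag_b = guess(model_a, word), guess(model_b, word)
        if tag_a is not None and tag_a == tag_b:
            agreed[word] = tag_a
    return agreed

# Toy stand-ins for the two lexicons (hypothetical entries and tags).
LEXICON_A = [("abdominal", "Afpms"), ("dorsal", "Afpms"), ("abdomen", "Ncms"), ("aplasie", "Ncfs")]
LEXICON_B = [("normal", "Afpms"), ("cheval", "Ncms"), ("génial", "Afpms"), ("anémie", "Ncfs")]
```

With these toy data, `consensus(["rénal", "leucémie", "carnaval"], learn(LEXICON_A), learn(LEXICON_B))` returns `{"rénal": "Afpms", "leucémie": "Ncfs"}`: "carnaval" is dropped because the first model guesses Afpms (from -al) while the second guesses Ncms (from -val).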

12 Acquiring the lexical information
- Acquiring the full paradigm:
  - all the inflectional forms
  - the lemma
- Based on "productive" inflectional paradigms: 9 for adjectives, 3 for nouns
- Algorithm based on lexical tries to cluster the forms of the same paradigm
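A sketch of trie-based paradigm clustering, under stated assumptions: the three adjective paradigms below are illustrative examples, not the 9 adjective and 3 noun paradigms used in the project; each paradigm is a list of (ending, tag) cells whose first cell gives the lemma; and all names (`Trie`, `Cluster`, `cluster_paradigms`) are hypothetical. For each form that carries a lemma ending, the trie returns every completion of the candidate stem, which tells which forms of the paradigm are attested and which are missing.

```python
from dataclasses import dataclass

# Hypothetical paradigms: (ending, tag) cells; the first cell is the lemma.
PARADIGMS = {
    "adj-eux": [("eux", "Afpms"), ("euse", "Afpfs"), ("eux", "Afpmp"), ("euses", "Afpfp")],
    "adj-al": [("al", "Afpms"), ("ale", "Afpfs"), ("aux", "Afpmp"), ("ales", "Afpfp")],
    "adj-ique": [("ique", "Afpms"), ("ique", "Afpfs"), ("iques", "Afpmp"), ("iques", "Afpfp")],
}

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self, words):
        self.root = TrieNode()
        for word in words:
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

    def completions(self, prefix):
        """Return every string s such that prefix + s is a stored word."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        found, stack = set(), [(node, "")]
        while stack:  # depth-first walk below the prefix node
            current, suffix = stack.pop()
            if current.is_word:
                found.add(suffix)
            for ch, child in current.children.items():
                stack.append((child, suffix + ch))
        return found

@dataclass
class Cluster:
    lemma: str
    paradigm: str
    attested: set  # paradigm forms present in the input
    missing: set   # paradigm forms predicted but absent from the input
    cells: list    # full paradigm as (form, tag) pairs

    @property
    def complete(self):
        return not self.missing

def cluster_paradigms(forms, paradigms=PARADIGMS):
    trie = Trie(forms)
    clusters = []
    for form in sorted(set(forms)):
        for name, cells in paradigms.items():
            lemma_ending = cells[0][0]
            if not form.endswith(lemma_ending):
                continue
            stem = form[: len(form) - len(lemma_ending)]
            present = trie.completions(stem)
            endings = {ending for ending, _ in cells}
            clusters.append(Cluster(
                lemma=form,
                paradigm=name,
                attested={stem + e for e in endings & present},
                missing={stem + e for e in endings - present},
                cells=[(stem + e, tag) for e, tag in cells],
            ))
    return clusters
```

For the input `["sérofibrineux", "sérofibrineuse", "sérofibrineuses", "abdominal", "abdominaux", "aplasique"]`, the sérofibrineux cluster is complete and regenerates the four rows of the inflectional table on slide 7, while the abdominal cluster lacks "abdominale" and "abdominales". A real system would also have to resolve forms matching several paradigms' lemma endings.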

13 Outline (same as slide 2)

14 Acquisition from a general lexicon: results
Source | Known words (entries) | Remaining words to describe
Term-Union | | 94,964
Initial UMLF | 19,599 | 81,595
Morphalou | 6,617 | 74,978
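A sketch of the lookup step, assuming the lexicons are consulted in turn (UMLF first, then Morphalou) and that a form found in one lexicon is no longer counted as remaining for the next. The Morphalou row is consistent with this reading: 81,595 - 6,617 = 74,978. The function name and the toy word lists are hypothetical.

```python
def sequential_coverage(reference_forms, lexicons):
    """For each (name, forms) lexicon, in order: forms newly known, and forms still remaining."""
    remaining = set(reference_forms)
    report = []
    for name, lexicon_forms in lexicons:
        known = remaining & set(lexicon_forms)  # only forms not already covered
        remaining -= known
        report.append((name, len(known), len(remaining)))
    return report
```

With a toy reference `["abdomen", "abdominal", "aplasie", "sérofibrineux", "voie"]` and lexicons `[("Initial UMLF", {"abdomen", "abdominal"}), ("Morphalou", {"abdomen", "voie"})]`, the report is `[("Initial UMLF", 2, 3), ("Morphalou", 1, 2)]`: "abdomen" is credited only to the first lexicon.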

15 Acquisition with guessing techniques: results
- 74,978 unknown forms
- 44,515 analyses from the Morphalou-based program
- 35,438 analyses from the UMLF-based program
- Cross-validation: 30,137 analyses in common

16 Acquisition with guessing techniques: evaluation
- Errors: 82 out of 1,000 (8.2%)
Error type | Count
Wrong label | 12
Proper names | 49
Latin words | 5
English words | 1
Spelling/segmentation | 10
Other | 5
Total | 82

17 Acquisition of the full paradigm: results
- 4,453 paradigms captured (complete or incomplete), grouping 9,352 word forms:
  - 3,308 adjectives
  - 514 nouns
- Automatic extension of the full paradigms (with canonical forms only)
- Manual checking for the others

18 General improvement
Source | Forms added | Still unknown in Term-Union | Coverage
UMLF-v1 | 36,211 | 81,595 | 14.1%
Morphalou | 17,828 | 74,978 | 21.0%
Acquisition | 8,088 | 70,602 | 25.7%
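Assuming coverage means the share of Term-Union's 94,964 word forms that are no longer unknown, the coverage column can be recomputed from the "still unknown" column; this reproduces the 14.1%, 21.0% and 25.7% figures. The names below are illustrative.

```python
TERM_UNION_FORMS = 94_964  # word forms in Term-Union (slide 4)

# "Still unknown in Term-Union" column of the general-improvement table.
STILL_UNKNOWN = {"UMLF-v1": 81_595, "Morphalou": 74_978, "Acquisition": 70_602}

def coverage(still_unknown, total=TERM_UNION_FORMS):
    """Percentage of Term-Union word forms that are no longer unknown."""
    return {src: 100 * (total - unknown) / total for src, unknown in still_unknown.items()}

for source, pct in coverage(STILL_UNKNOWN).items():
    print(f"{source}: {pct:.1f}%")  # UMLF-v1: 14.1%, Morphalou: 21.0%, Acquisition: 25.7%
```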

19 Outline (same as slide 2)

20 Discussion and conclusion
- Acquiring and evaluating specialised lexical resources requires a domain-specific reference, here Term-Union, in order to:
  - extract (full) lexical information
  - assess lexical needs and targets
- Further work: other acquisition techniques (CRFs for inflectional information, rule-based techniques for derivational information)

21 Acknowledgment
- This work was partially funded by the InterSTIS project (ANR-07-TECSAN-010)
- InterSTIS project: www.interstis.org

