Architecture of a Medical Information Extraction System
Dalila Bekhouche (dalila.bekhouche@loria.fr)
Yann Pollet (pollet@cnam.fr)
Bruno Grilheres (bruno.grilheres@sysde.eads.net)
Xavier Denis (xavier.denis@tiscali.fr)
PSI Rouen (Perception System Information)
9th International Conference on the Application of Natural Language to Information Systems
University of Salford, UK, 06/24/2004
Index
Introduction
Information extraction
The architecture of the IE System
Extraction of lexical and medical terms
Evaluation of ICD-10 and CCMA results
Limits of this approach and future work
1- Introduction
[Figure: database]
Problem: it is difficult to access and exploit this amount of information.
Variety of content
Specific terminology
Practitioners use uncertain expressions and expressions that modify meaning
Most NLP tools have difficulty understanding these texts
2- Information extraction
Aim: identify and extract relevant information from medical documents (examination reports such as colonoscopy reports).
How to identify the relevant information?
Relevant information: events and entities described in the texts that concern the patient (signs, diagnoses, acts, results).
[Diagram labels: Documents (free text), Domain knowledge, Lexical resource, Extraction, Relevant information]
3- The architecture of the IE System
[Diagram labels: Documents; Thesauri (ICD-10 / Vidal / CCMA); dictionary; Database; validation; Extraction; Generation (resources and rules)]
1- Lexical level: named entities (name, medical terms), date of examination, document type
Signs, diagnosis, acts, results
2- Sub-sentence level: signs, symptoms
4- Extraction of the lexical terms
Named entities (locations, companies, organizations, dates)
Level 1: REGEX(words) or dictionary
Level 2: REGEX(words) and level 1
Example: "Mr Smith was addressed for a checkup by McGann"
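The two levels on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: the title dictionary, the capitalised-word pattern, the tag names and the functions `level1` and `level2` are all assumptions made for illustration. Level 1 tags single words with a dictionary or a regular expression; level 2 combines those tags into a named entity.

```python
import re

# Hypothetical level-1 resources (not from the slides): a small
# dictionary of titles and a regex for capitalised words.
TITLES = {"Mr", "Mrs", "Dr"}
CAPITALISED = re.compile(r"[A-Z][a-zA-Z]+")


def level1(text):
    """Level 1: tag each word by dictionary lookup or by a regex on the word."""
    tagged = []
    for word in text.split():
        token = word.strip(".,;:")
        if token in TITLES:
            tagged.append((token, "TITLE"))
        elif CAPITALISED.fullmatch(token):
            tagged.append((token, "CAP"))
        else:
            tagged.append((token, "WORD"))
    return tagged


def level2(tagged):
    """Level 2: combine level-1 tags; TITLE followed by CAP is read as a person."""
    persons = []
    i = 0
    while i < len(tagged) - 1:
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        if t1 == "TITLE" and t2 == "CAP":
            persons.append(f"{w1} {w2}")
            i += 2
        else:
            i += 1
    return persons


tags = level1("Mr Smith was addressed for a checkup by McGann")
persons = level2(tags)
```

On the slide's example sentence, level 1 marks "Mr" as a title and "Smith" and "McGann" as capitalised words, and level 2 groups "Mr Smith" into one entity. Deciding that "McGann" is also a person would need a further rule, which this sketch leaves out.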
5- Extraction of the ICD-10 and CCMA terms
Goal: identify the various occurrences of these thesauri in the text.
ICD-10: International Classification of Diseases
CCMA: Common Classification of Medical Acts
1- Preprocessing step: reduce the text and the thesauri (standardisation of words, removal of irrelevant words)
2- Recognition of the discriminant terms
3- Evaluation of the similarity (cosine measure) between the neighbouring terms in the text and each candidate ICD-10 entry related to the indexing term
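Steps 1 and 3 can be sketched as follows, assuming a plain bag-of-words representation. The stop-word list, the entry identifiers `ENTRY_A` and `ENTRY_B`, their labels and the function names are hypothetical; the slides do not say how words are weighted or how candidate entries are selected in step 2, so the candidates are simply given here.

```python
import math
import re
from collections import Counter

# Hypothetical list of "irrelevant words"; the real list is not given.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "with", "was", "is"}


def preprocess(text):
    """Step 1: standardise words (lower-case) and drop irrelevant words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def cosine(a, b):
    """Step 3: cosine measure between two bags of words."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_entry(context, entries):
    """Score each candidate thesaurus entry against the text around a term."""
    ctx = preprocess(context)
    scores = {code: cosine(ctx, preprocess(label)) for code, label in entries.items()}
    return max(scores, key=scores.get), scores


# Placeholder candidates standing in for thesaurus entries.
candidates = {
    "ENTRY_A": "polyp of colon",
    "ENTRY_B": "diverticular disease of colon",
}
best, scores = best_entry("a polyp of the sigmoid colon was removed", candidates)
```

Here the context shares two words with the first candidate and one with the second, so the first candidate gets the higher cosine score and is selected.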
6- Evaluation of ICD-10 and CCMA results
                          N. doc   Pr      Re
Before adding knowledge   313      0.500   0.710
After adding knowledge    370      0.877   0.798
Precision = (valid annotations found by the system) / (all annotations found by the system)
Recall = (valid annotations found by the system) / (valid annotations found by the practitioner)
Before adding knowledge, 50% of the annotations found by the system are correct.
After adding knowledge, precision increases to 87.7%.
Recall stays approximately the same (0.710 to 0.798); the remaining misses are due to ambiguous words.
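The two measures can be computed from sets of annotations, as in this minimal sketch. The function name `precision_recall` and the annotation labels are illustrative assumptions and are not data from the evaluation.

```python
def precision_recall(system, reference):
    """Compare system annotations with the practitioner's valid annotations."""
    system, reference = set(system), set(reference)
    valid = system & reference  # valid annotations found by the system
    precision = len(valid) / len(system) if system else 0.0
    recall = len(valid) / len(reference) if reference else 0.0
    return precision, recall


# Illustrative labels only.
system_annotations = {"A", "B", "C", "D"}
practitioner_annotations = {"A", "B", "E"}
precision, recall = precision_recall(system_annotations, practitioner_annotations)
```

With these made-up sets, two of the four system annotations are valid, giving a precision of 0.5, and two of the three practitioner annotations are found, giving a recall of 2/3.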
7- Limits of this approach and future work
Limits:
French medical texts only, in specific domains (colonoscopy and oncology records).
Handles simple sentences, as found in medical records, but may have difficulty analysing complex sentences that need deep syntactic analysis.
Future work:
We will focus on the generation and acquisition steps.
Taking into account synonyms and user feedback.
Thank you!
dalila.bekhouche@loria.fr
PSI (Perception, system, information)
Insa Rouen, Place E. Blondel, 76130 Mont St Aignan, France