
De-identification of Medical Narrative Data

Introduction

De-identification
In Natural Language Processing (NLP), de-identification is the process of masking or removing a set of identifiers, i.e. pieces of information that could, directly or in combination, reveal the identity of the subject. The identifiers to remove are commonly chosen from the list of 18 identifiers defined by the US Health Insurance Portability and Accountability Act (HIPAA).

Why?
- Privacy protection: decreases the risk of re-identification
- Promotes research: no need for consent or waiver of consent

Previous work
De-identification has been performed on medical free text for over 20 years, with a growing capacity to handle various types of medical documents and larger amounts of information. Initially performed by hand, it has since been automated with rule-based systems and, more recently, machine-learning (or mixed) methods.

De-identified information ≠ anonymous information

Methods

Rule-based method
- Named Entity Recognition (NER) task
- De-identification of protected health information (PHI) via finite-state automata
- Replacement of de-identified PHI by surrogate information
- Tags on de-identified PHI: <patient>, <doctor>, <location>, <date>

Examples of de-identified text

Original text: Madame Foufi sera suivie à la consultation de diabétologie aux HUG du 08.2 au 09/02/2019.
(English: Mrs Foufi will be followed up at the diabetology clinic at HUG from 08.2 to 09/02/2019.)
De-identified text: Madame Cartier <patient> sera suivie à la consultation de diabétologie aux Hôpitaux <location> du au 30/02/2019 <date>.

Original text: Mme Foufi a été transférée à la clinique de Joli-Mont le 06 janvier 2014.
(English: Mrs Foufi was transferred to the Joli-Mont clinic on 6 January 2014.)
De-identified text: Mme Cartier <patient> a été transférée à la Clinique <location> le 30 février 2014 <date>.
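The rule-based approach described under Methods can be illustrated with a minimal sketch. It is not the authors' tool: Python regular expressions stand in for the finite-state automata, and the rule set, the `deidentify` function, and the fixed surrogates (borrowed from the examples above) are all assumptions made for illustration.

```python
import re

# Hypothetical rule set; the patterns and surrogate values are illustrative only.
FRENCH_MONTHS = (
    "janvier|février|mars|avril|mai|juin|juillet|"
    "août|septembre|octobre|novembre|décembre"
)

RULES = [
    # Patient name introduced by a title, e.g. "Madame Foufi" or "Mme Foufi".
    (re.compile(r"\b(Madame|Mme|Monsieur)\s+[A-Z][\w-]+"),
     lambda m: f"{m.group(1)} Cartier <patient>"),
    # Clinic name, e.g. "clinique de Joli-Mont".
    (re.compile(r"\b[Cc]linique\s+de\s+[A-Z][\w-]+"),
     lambda m: "Clinique <location>"),
    # Numeric date, e.g. "09/02/2019"; the year is kept, day and month are replaced.
    (re.compile(r"\b\d{1,2}/\d{1,2}/(\d{4})\b"),
     lambda m: f"30/02/{m.group(1)} <date>"),
    # Spelled-out French date, e.g. "06 janvier 2014".
    (re.compile(rf"\b\d{{1,2}}\s+(?:{FRENCH_MONTHS})\s+(\d{{4}})\b"),
     lambda m: f"30 février {m.group(1)} <date>"),
]


def deidentify(text: str) -> str:
    """Apply each rule once, replacing matched PHI by a surrogate plus a tag."""
    for pattern, surrogate in RULES:
        text = pattern.sub(surrogate, text)
    return text


print(deidentify(
    "Mme Foufi a été transférée à la clinique de Joli-Mont le 06 janvier 2014."
))
# Mme Cartier <patient> a été transférée à la Clinique <location> le 30 février 2014 <date>.
```

Each rule pairs a pattern with a surrogate builder, so recognition (the NER step) and replacement stay together. A real system would need far broader rules and dictionaries than these four patterns.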
Results

De-identified PHIs in 11’000 discharge summaries

PHI category                             Count
Dates                                    29,221
Patients’ names                          27,357
Doctors’ names                           4,685
Locations                                7,907
Nurses’ names & nursing care services    483
Telephone & fax numbers                  83
Addresses                                5
Health insurance companies               3

Evaluation of results on 1’000 discharge summaries

             Dates     Patients’ names   Doctors’ names   Locations   Overall performance
Precision    0.9889    0.9970            1                0.9628      0.9907
Recall       0.9228    0.9916            0.9876           0.7872      0.9342

Conclusion

Next steps
- Processing of time intervals
- De-identification, manual validation and evaluation on larger corpora
- Adaptation of the de-identification rules to other languages (Italian, German)

Remarks
- De-identification is a challenging task
- Our de-identification tool achieved good performance
- Errors in discharge summaries affect the de-identification process

Vasiliki FOUFI, PhD
Christophe GAUDET-BLAVIGNAC, Bmed, Mmed, BSc CS
Raphaël CHEVRIER, Bmed, Mmed
Christian LOVIS, MD, MPH, FACMI

