
1 Panel: Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me? Hercules Dalianis Clinical Text Mining Group Department of Computer and Systems Sciences (DSV) hercules@dsv.su.se

2 Background: Starting 2007. Karolinska University Hospital, Stockholm. Greater Stockholm (City Council): 2 million inhabitants, 1800 beds/inpatients, 550 clinical units. Hercules Dalianis, MEDINFO 2013

3 TakeCare EPR system: Swedish electronic patient record system, now owned by CompuGroup Medical. Centralized, text-file based. Built on the APL programming language. Data transferred to a MySQL database to make it manageable (Intelligence).

4 Ethical permission: What type of research will be carried out. How it will be carried out. No social security numbers. No personal names. Safeguarding of data.

5 Encryption and safeguards: Encrypted server. Password protected. Locked in an alarmed room. Server locked to a rack. No Internet connection. Few people have access to this server, and they have to sign a security paper. => Probably safer than at the hospital.

6 Trust, trust and more trust: Good contacts with hospital management. They decide for the whole hospital/all clinical units. No psychiatric or venereal diseases, no undocumented refugees.

7 Stockholm EPR Corpus: We obtained 1 million patient records from 550 clinical units from the years 2006-2010, in several extracts that are still continuing. Each patient has a unique social security number, from birth to death. It is replaced by a serial number. All patient names are removed. The rest, including sensitive text, is present.
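The replacement of social security numbers by serial numbers described on this slide can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used for the corpus: the field names `ssn`, `patient_name` and `text`, and the function `pseudonymise`, are hypothetical, and the identifiers are placeholders.

```python
from itertools import count


def pseudonymise(records, id_field="ssn", name_field="patient_name"):
    """Replace each distinct identifier by a stable serial number and drop names.

    Field names are hypothetical; the free-text field is left untouched.
    """
    serials = {}           # identifier -> serial number
    next_serial = count(1)
    cleaned = []
    for record in records:
        identifier = record[id_field]
        if identifier not in serials:
            serials[identifier] = next(next_serial)
        # Keep every field except the identifier and the structured name field.
        kept = {k: v for k, v in record.items() if k not in (id_field, name_field)}
        kept["serial"] = serials[identifier]
        cleaned.append(kept)
    return cleaned


# Placeholder data, not real identifiers.
records = [
    {"ssn": "ID-A", "patient_name": "Name A", "text": "first note"},
    {"ssn": "ID-B", "patient_name": "Name B", "text": "second note"},
    {"ssn": "ID-A", "patient_name": "Name A", "text": "third note"},
]
cleaned = pseudonymise(records)
```

Records from the same patient keep the same serial number, so they can still be linked. Names or numbers mentioned inside the free text are not touched by this step, which matches the slide's remark that the sensitive text is still present.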

8 DEID work: Yes, we did it, also to obtain an overview of what problems may occur. We followed HIPAA *) but adapted it for Swedish conditions. *) Health Insurance Portability and Accountability Act

9 The Stockholm EPR PHI *) corpus: 100 electronic patient records (EPRs) in Swedish. Five clinics: Neurology, Orthopaedics, Infection, Dental Surgery and Nutrition. 20 patients from each clinic, 50% men, 50% women. 380 000 tokens. Three annotators annotated the whole corpus. *) Protected Health Information

10 28 PHI classes: Account_Number, Age, Age_Over_89, Biometric_Identifier, Date_Part, Full_Date, Year, First_Name, Last_Name, Patient_First_Name, Patient_Last_Name, Relative_First_Name, Relative_Last_Name, Clinician_First_Name, Clinician_Last_Name, Location, Country, Municipality, Organization, Street_Address, Town, Health_Care_Unit, Device_Identifier_and_Serial_Number, Ethnicity, Fax_Number, Phone_Number, Relation, Uncertain


12 Consensus, eight annotation classes: Age, Date_Part, Full_Date, First_Name, Last_Name, Health_Care_Unit, Location, Phone_Number

13 Annotation classes and instances
Age: 56
Full date: 710
Date part: 500
First name: 923
Last name: 928
Location: 1 021
Health care unit: 148
Phone number: 135
Sum: 4 421

14 380 000 tokens, 4 421 sensitive instances: ~1 percent sensitive information
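The "~1 percent" figure follows from the class counts and the token count given on the slides. A short check, using only those numbers (the variable names are illustrative):

```python
# Instance counts per annotation class, as listed on the slide.
instances = {
    "Age": 56,
    "Full date": 710,
    "Date part": 500,
    "First name": 923,
    "Last name": 928,
    "Location": 1021,
    "Health care unit": 148,
    "Phone number": 135,
}
total_tokens = 380_000

total_instances = sum(instances.values())
share = total_instances / total_tokens
print(f"{total_instances} instances, {share:.2%} of tokens")
# prints "4421 instances, 1.16% of tokens"
```

The ratio divides annotated instances by tokens, so it is only approximate: an instance that spans several tokens is counted once.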

15 Eight annotation classes: training and test using Stanford NER-CRF
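A CRF tagger of this kind typically looks at each token through features taken from a window of neighbouring tokens. The sketch below shows what such window features could look like. It is an illustration only, not the feature set of Stanford NER: the function `window_features`, the feature names and the `<PAD>` marker are all assumptions.

```python
def window_features(tokens, i, width=2):
    """Build a feature dict for tokens[i] from a symmetric window of `width` tokens.

    Hypothetical feature set for illustration; not Stanford NER's actual features.
    """
    token = tokens[i]
    features = {
        "word": token.lower(),
        "is_capitalised": token[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
    }
    for offset in range(-width, width + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the sentence get a padding marker.
        neighbour = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
        features[f"word[{offset:+d}]"] = neighbour
    return features


# Tokens taken from the example record later in the deck ("Sonen Johan är med").
tokens = ["Sonen", "Johan", "är", "med"]
feats = window_features(tokens, 1, width=1)
```

With `width=1`, the name "Johan" is described by its own form plus the words immediately before and after it. A larger `width` gives the model more context at the cost of more features.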

16 Conditional Random Fields à la Stanford NER: 0.95-0.74 precision, 0.83-0.36 recall, 0.90-0.49 F-score. The 8 annotation classes and the words. The rest is a black box: window breadth, distance between words, etc.
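For readers who want to relate the three measures on this slide, here is a minimal sketch of how precision, recall and F-score are computed from counts of correct and incorrect predictions. The counts in the usage lines are made up for illustration and are not results from the study.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) computed from raw counts."""
    predicted = true_positives + false_positives   # everything the tagger marked
    actual = true_positives + false_negatives      # everything in the gold standard
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up counts, not taken from the study.
p, r, f = precision_recall_f1(true_positives=80, false_positives=20, false_negatives=120)
```

For de-identification, recall is the measure to watch: every missed instance is a piece of identifying information left in the text.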

17 Research on Stockholm EPR Corpus: DEID and resynthesis. Factuality level detection of diagnoses. Negation detection. Detecting the amount of hospital-acquired infections (HAI). Detection of adverse drug events. Comorbidities.

18 Conclusion: Preferable to work on original data. Too costly and difficult to de-identify data. Not safe enough. De-identification makes the data too noisy.

19 References
Velupillai, S., H. Dalianis, M. Hassel and G. H. Nilsson. 2009. Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial. International Journal of Medical Informatics, doi:10.1016/j.ijmedinf.2009.04.005.
Dalianis, H. and S. Velupillai. 2010. De-identifying Swedish Clinical Text - Refinement of a Gold Standard and Experiments with Conditional Random Fields. Journal of Biomedical Semantics 1:6 (12 April 2010).

20 Alfalahi, A., S. Brissman and H. Dalianis. 2012. Pseudonymisation of person names and other PHIs in an annotated clinical Swedish corpus. In Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012), held in conjunction with LREC 2012, May 26, Istanbul, pp. 49-54.

21 Comorbidities in Comorbidity-view: Which ICD-10 codes co-occur with which other ones
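Counting which ICD-10 codes co-occur can be sketched as pair counting over each patient's set of codes. This is a minimal illustration under assumptions, not the Comorbidity-view implementation: the function `cooccurrence_counts`, the `serial-N` keys and the toy data are hypothetical, and the two codes are borrowed from the example record in this deck.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(patient_codes):
    """Count, for every unordered pair of codes, how many patients have both."""
    counts = Counter()
    for codes in patient_codes.values():
        # Deduplicate per patient and sort so each pair has one canonical order.
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return counts


# Toy data keyed by hypothetical serial numbers.
patients = {
    "serial-1": ["I110", "I509"],
    "serial-2": ["I509", "I110", "I110"],
    "serial-3": ["I509"],
}
pair_counts = cooccurrence_counts(patients)
```

Here the pair `("I110", "I509")` is counted for two of the three patients. Repeated codes within one patient do not inflate the count.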

22 Comorbidity View


25 Example record (anonymized manually): 123 H - IVA 322916614D 2007-08-21 9:12 1944 Kvinna Anamnesis Kvinna med hjrtsvikt, förmaksflimmer, angina pectoris. Ensamstående änka. Tidigare CVL med sequelae högersidig hemipares och afasi. Tidigare vårdad för krampanfall misstänkt apoplektisk. Inkommer nu efter att ha blivit hittad på en stol och sannolikt suttit så över natten. Inkommer nu för utredning. Sonen Johan är med.

26 123 H - IVA 322916614D 2008-08-21 10:54 1944 Kvinna Bedömning Grav hjärtsvikt efter hjärtinfarkt x 2 inklusive eoisod med asystoli och HLR. EF 20-25%. Neurologisk påverkan med hösidig svaghet. Blodprov. Odlingar tas i blod och urin. Remiss skickas pulm-rtg enl dr Svenssons anteckning. Atelektaser. Pneumoni, I110. Hjärtinsufficiens, ospecificerad, I509

27 (English translation) 123 H - IVA 322916614D 2008-08-21 9:12 1944 Woman Anamnesis Woman with hert failures, atrial fibrillation, and angina pectoris. Single widow. Former CVL with sequele, rght hemiparesis and aphasia. Prior hospital care for seizures, suspected to be apoepeleptic. Arrive to hospital after being found in a chair and probably been sitting there over night. Arrive for further investigation and care. Accompanied by her son Johan.

28 123 H - IVA 322916614D 2008-08-21 10:54 1944 Woman Assessment/Plan Severe heart failure after heart infarction x 2, including episode with heart arrest and acute heart arrest treatment. Ejection fraction (EF) 20-25%. Neurological symptoms with right sided hemiparesis. Blood samples. Culture for blood and urine. Referral for pulmonary x-ray according to dr Svensson’s notes. Atelectases. Pneumonia, I110. Heart failure, unspecified, I509.

