Panel: Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me?
Hercules Dalianis
Clinical Text Mining Group, Department of Computer and Systems Sciences (DSV)
Background
  Starting 2007
  Karolinska University Hospital, Stockholm
  Greater Stockholm (City Council), 2 million inhabitants
  1800 beds/inpatients
  550 clinical units
Hercules Dalianis, MEDINFO
TakeCare EPR system
  Swedish electronic patient record system, now owned by CompuGroup Medical
  Centralized, text-file based
  Built on the APL programming language
  Data transferred to a MySQL database to make it manageable (Intelligence)
Ethical permission
  What type of research will be carried out
  How it will be carried out
  No social security numbers
  No personal names
  Safeguarding of data
Encryption and safeguards
  Encrypted server
  Password protected
  Locked in an alarmed room
  Server locked to a rack
  No Internet connection
  Few people have access to this server (and they have to sign a security agreement)
  => Probably safer than at the hospital
Trust, Trust and more Trust
  Good contacts with hospital management
  They decide for the whole hospital/all clinical units
  No psychiatric or venereal diseases, no undocumented refugees
Stockholm EPR Corpus
  We obtained 1 million patient records from 550 clinical units from the year
  In several extracts, which are still continuing
  Each patient has a unique social security number, from birth to death
  Replaced by a serial number
  All patient names removed
  The rest, including sensitive text, is present
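The replacement of social security numbers by serial numbers, together with the removal of patient names, could be sketched roughly as follows. This is a minimal illustration, not the procedure actually used for the corpus: the field names `ssn`, `name` and `text`, the function `pseudonymize`, and the sample records are all hypothetical.

```python
def pseudonymize(records):
    """Replace each social security number with a stable serial number
    and drop the patient name. The free text is left untouched, so it
    may still contain sensitive information."""
    serials = {}  # social security number -> serial number
    cleaned_records = []
    for record in records:
        # The same person always receives the same serial number.
        serial = serials.setdefault(record["ssn"], len(serials) + 1)
        cleaned = {k: v for k, v in record.items() if k not in ("ssn", "name")}
        cleaned["serial"] = serial
        cleaned_records.append(cleaned)
    return cleaned_records


# Hypothetical sample data; the identifiers are made up.
sample = [
    {"ssn": "ID-A", "name": "Patient One", "text": "First note."},
    {"ssn": "ID-B", "name": "Patient Two", "text": "Another note."},
    {"ssn": "ID-A", "name": "Patient One", "text": "Follow-up note."},
]
result = pseudonymize(sample)
```

Keeping one serial number per person preserves the link between a patient's records across extracts, but the mapping table itself then becomes sensitive and has to be guarded as strictly as the original identifiers.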
DEID work
  Yes, we did it, also to obtain an overview of what problems may occur
  We followed HIPAA (Health Insurance Portability and Accountability Act) but adapted it for Swedish conditions
The Stockholm EPR PHI corpus (PHI: Protected Health Information)
  100 electronic patient records (EPRs) in Swedish
  Five clinics: Neurology, Orthopaedics, Infection, Dental Surgery and Nutrition
  20 patients from each clinic, 50% men, 50% women
  tokens
  Three annotators annotated the whole corpus
PHI classes
  Account_Number, Age, Age_Over_89, Biometric_Identifier, Date_Part, Full_Date, Year, First_Name, Last_Name, Patient_First_Name, Patient_Last_Name, Relative_First_Name, Relative_Last_Name, Clinician_First_Name, Clinician_Last_Name, Location, Country, Municipality, Organization, Street_Address, Town, Health_Care_Unit, Device_Identifier_and_Serial_Number, Ethnicity, Fax_Number, Phone_Number, Relation, Uncertain
Consensus: eight annotation classes
  Age, Date_Part, Full_Date, First_Name, Last_Name, Health_Care_Unit, Location, Phone_Number
Annotation classes and instances
  Age: 56
  Full date: 710
  Date part: 500
  First name: 923
  Last name: 928
  Location:
  Health care unit: 148
  Phone number: 135
  Sum:
tokens sensitive instances ~ 1 percent sensitive information
Eight annotation classes: training and testing using Stanford NER-CRF
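A CRF tagger labels each token using features of the token itself and of its neighbors within a window. The sketch below shows what such window features might look like. It is an illustration only: the function `token_features`, the feature names and the default window size are assumptions, and it does not reproduce Stanford NER's actual feature set.

```python
def token_features(tokens, i, window=2):
    """Build a feature dictionary for tokens[i] from the token itself
    and from its neighbors up to `window` positions away."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),   # capitalized words are often names
        "word.isdigit": word.isdigit(),   # digits hint at ages, dates, phone numbers
        "suffix3": word[-3:].lower(),
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        # Positions outside the sentence get a padding symbol.
        neighbor = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
        features[f"{offset:+d}:word.lower"] = neighbor
    return features


# Tokens taken from the example record shown later in the slides.
tokens = ["Sonen", "Johan", "är", "med"]
features = token_features(tokens, 1, window=1)
```

Here the preceding word "sonen" ("the son") is the kind of context cue that can help a tagger decide that "Johan" is a first name.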
Conditional Random Fields à la Stanford NER
  Precision, recall, F-score
  The 8 annotation classes and the words
  The rest is a black box: window breadth, distance between words, etc.
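Precision, recall and F-score for a tagger can be computed by comparing the predicted PHI instances with the gold annotations. The sketch below assumes exact-match scoring, where an instance counts as correct only if both its span and its class agree. The function name, the `(start, end, class)` tuple representation and the sample data are hypothetical, and the slides do not state which matching criterion was used.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match scores over sets of (start, end, class) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up annotations for illustration.
gold = {(0, 1, "First_Name"), (5, 6, "Age"), (9, 10, "Location")}
predicted = {(0, 1, "First_Name"), (5, 6, "Age"),
             (7, 8, "Phone_Number"), (12, 13, "Age")}
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

For de-identification, recall is usually the more critical figure, since every missed instance is sensitive information left in the text.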
Research on Stockholm EPR Corpus
  DEID and resynthesis
  Factuality level detection of diagnoses
  Negation detection
  Detecting the amount of hospital-acquired infections (HAI)
  Detection of adverse drug events
  Comorbidities
Conclusion
  Preferable to work on original data
  Too costly and difficult to de-identify data
  Not safe enough
  De-identification makes the data too noisy
Comorbidities in Comorbidity-view
  Which ICD-10 codes co-occur with which other ones
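Counting which ICD-10 codes co-occur can be sketched as a pair count over each patient's set of codes. This is a minimal illustration under assumed inputs, not the implementation behind Comorbidity-view: the function name, the patient-to-codes dictionary and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(patient_codes):
    """Count, for every unordered pair of ICD-10 codes, the number of
    patients who have been assigned both codes."""
    counts = Counter()
    for codes in patient_codes.values():
        # Deduplicate and sort so each pair is counted once per patient
        # and always appears in the same order.
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return counts


# Made-up patients keyed by serial number.
patients = {
    1: ["I509", "I110"],
    2: ["I110", "I509", "J189"],
    3: ["J189"],
}
counts = cooccurrence_counts(patients)
```

Counting per patient rather than per record avoids inflating a pair just because the same patient has many notes carrying the same two codes.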
Comorbidity View
Example record (Anonymized manually)
  H - IVA D : Kvinna
  Anamnesis
  Kvinna med hjrtsvikt, förmaksflimmer, angina pectoris. Ensamstående änka. Tidigare CVL med sequelae högersidig hemipares och afasi. Tidigare vårdad för krampanfall misstänkt apoplektisk. Inkommer nu efter att ha blivit hittad på en stol och sannolikt suttit så över natten. Inkommer nu för utredning. Sonen Johan är med.
  23 H - IVA D : Kvinna
  Bedömning
  Grav hjärtsvikt efter hjärtinfarkt x 2 inklusive eoisod med asystoli och HLR. EF 20-25%. Neurologisk påverkan med hösidig svaghet. Blodprov. Odlingar tas i blod och urin. Remiss skickas pulm-rtg enl dr Svenssons anteckning. Atelektaser. Pneumoni, I110. Hjärtinsufficiens, ospecificerad, I509
(English translation)
  123 H - IVA D : Woman
  Anamnesis
  Woman with hert failures, atrial fibrillation, and angina pectoris. Single widow. Former CVL with sequele, rght hemiparesis and aphasia. Prior hospital care for seizures, suspected to be apoepeleptic. Arrive to hospital after being found in a chair and probably been sitting there over night. Arrive for further investigation and care. Accompanied by her son Johan.
  H - IVA D : Woman
  Assessment/Plan
  Severe heart failure after heart infarction x 2, including episode with heart arrest and acute heart arrest treatment. Ejection fraction (EF) 20-25%. Neurological symptoms with right sided hemiparesis. Blood samples. Culture for blood and urine. Referral for pulmonary x-ray according to dr Svensson's notes. Atelectases. Pneumonia, I110. Heart failure, unspecified, I509.