Best-of-Breed Hybrid Methods for Text De-identification Yang H, Garibaldi JM. Automatic detection of protected health information from clinical narratives.

Slides:



Advertisements
Similar presentations
Visit the ccScan Website Scan, Import, and Automatically File documents to the Cloud SCAN, IMPORT, AND AUTOMATICALLY FILE DOCUMENTS TO SALESFORCE ® Introduction.
Advertisements

HIPAA and Public Health 2007 Epi Rapid Response Team Conference.
HIPAA – Privacy Rule and Research USCRF Research Educational Series March 19, 2003.
Increasing public concern about loss of privacy Broad availability of information stored and exchanged in electronic format Concerns about genetic information.
NAU HIPAA Awareness Training
Informed Consent.
Are you ready for HIPPO??? Welcome to HIPAA
HIPAA Training Presentation for New Employees How did we get here? HIPAA Police 1.
Health Insurance Portability Accountability Act of 1996 HIPAA for Researchers: IRB Related Issues HSC USC IRB.
Electronic Health Records Danielle P. Berthelot, RHIA Director, Health Information Management and Cancer Registry Privacy Officer Woman’s Hospital.
University of Miami1 HIPAA Survival Skills An Introduction to HIPAA and Research University of Miami Human Subjects Research Office October 31, 2006 Evelyne.
Sunita Sarawagi.  Enables richer forms of queries  Facilitates source integration and queries spanning sources “Information Extraction refers to the.
1 HIPAA, Researchers and the IRB: Part Two Alan Homans, IRB Chair and Nancy Stalnaker, IRB Administrator.
HIPAA, Researchers and the IRB Alan Homans, IRB Chair and Nancy Stalnaker, IRB Administrator.
Parkwood Access & Flow Project – November 2011 INSTRUCTIONS This PowerPoint presentation has an audio track built into it To take full advantage of the.
Health Insurance Portability and Accountability Act (HIPAA)
Towards a semantic extraction of named entities Diana Maynard, Kalina Bontcheva, Hamish Cunningham University of Sheffield, UK.
APPLICATION : DIAGNOSTIC CODING 1 SIEMENS  Coding is the translation of diagnosis terms describing patients diagnosis or treatment into a coded number.
Extraction of Adverse Drug Effects from Clinical Records E. ARAMAKI* Ph.D., Y. MIURA **, M. TONOIKE ** Ph.D., T. OHKUMA ** Ph.D., H. MASHUICHI ** Ph.D.,K.WAKI.
Srihari-CSE730-Spring 2003 CSE 730 Information Retrieval of Biomedical Text and Data Inroduction.
Protected Health Information (PHI). Privileged Communication An exchange of information between two individuals in a confidential relationship. (Examples:
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
HIPAA Business Associates Leadership Group Meeting June 28, 2001.
Example of Medical Record Elements
14 May Privacy Requirements Phoenix Ambulatory Blood Pressure Monitoring System © 2006 Christopher J. Adams Copying and distribution of this document.
Health Research & Information Division, ESRI, Dublin, July 2008 The Audit Process.
De-identifying Pathology Reports for Pathology Informatics
Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be.
1 EviCare (NLP) Aid clinical work in hospitals using NLP: “Summarize” health records Select and rank recommendations from clinical practice guidelines.
Systematic Reviews.
Using Text Mining and Natural Language Processing for Health Care Claims Processing Cihan ÜNAL
Open Health Natural Language Processing Consortium (OHNLP)
University Health Care Computer Systems Fellows, Residents, & Interns.
Automatic Detection of Tags for Political Blogs Khairun-nisa Hassanali Vasileios Hatzivassiloglou The University.
Medical Law and Ethics, Third Edition Bonnie F. Fremgen Copyright ©2009 by Pearson Education, Inc. Upper Saddle River, New Jersey All rights reserved.
The potential to bring huge benefits to Patients..
De-identification: A Critical Success Factor in Clinical and Population Research Steven Merahn MD Dee Lang, RHIT Prepared for 2007 APIII Pittsburgh, PA.
HIPAA THE PRIVACY RULE. 2 HISTORY In 2000, many patients that were newly diagnosed with depression received free samples of anti- depressant medications.
Using Semantic Relations to Improve Passage Retrieval for Question Answering Tom Morton.
Health Insurance portability and Accountability Act (HIPAA)‏
MedKAT Medical Knowledge Analysis Tool December 2009.
Detection of Spelling Errors in Swedish Clinical Text Nizamuddin Uddin and Hercules Dalianis Department of Computer and Systems Sciences, (DSV)
TWC Illuminate Knowledge Elements in Geoscience Literature Xiaogang (Marshall) Ma, Jin Guang Zheng, Han Wang, Peter Fox Tetherless World Constellation.
Electronic Medical Record for Patient John Doe InfoBot: What the user would see.
Configuring Electronic Health Records Privacy and Security in the US Lecture b This material (Comp11_Unit7b) was developed by Oregon Health & Science University.
Open Health Natural Language Processing Consortium
Using Semantic Relations to Improve Information Retrieval
Ioana Barbantan and Rodica Potolea. Lots of technology to capture health information.
“Translational research includes two areas of translation. One (T1) is the process of applying discoveries generated during research in the laboratory,
HIPAA Training. What information is considered PHI (Protected Health Information)  Dates- Birthdays, Dates of Admission and Discharge, Date of Death.
Amber Stubbs, Christopher Kotfila, Ozlem Uzuner Journal of Biomedical Informatics DOI: /j.jbi
Privacy: HIPAA Emerson Murphy-Hill. Rosie Callender, RHIA, web.msm.edu/hipaa/An%20Introduction%20to%20HIPAA.ppt What is HIPAA? A Federal Law Created in.
HIPAA and RESEARCH 5 th Thursday May 31, Page 2.
Showcasing work by Jonnageddala, Liaw, Ray, Kumar, Chang, and Dai on
HEALTH INSURANCE PORTABILITY AND ACCOUNTABILITY ACT (HIPAA)
Sentiment analysis algorithms and applications: A survey
Medication Information Extraction
Guangbing Yang Presentation for Xerox Docushare Symposium in 2011
CRF &SVM in Medication Extraction
New Texas EMS/Trauma Registry System Training – Web Data Entry of Submersion Records August 2012.
Fenglong Ma1, Jing Gao1, Qiuling Suo1
Health Insurance Portability and Accountability Act
HIPAA Overview.
CS639: Data Management for Data Science
De-identification of Medical Narrative Data
DSHS, Environmental & Injury Epidemiology and Toxicology
SIDE: The Summarization IDE
Yingze Wang and Shi-Kuo Chang University of Pittsburgh
Case Study Template Kerecis Aurora Awards
The Health Insurance Portability and Accountability Act
Presentation transcript:

Best-of-Breed Hybrid Methods for Text De-identification Yang H, Garibaldi JM. Automatic detection of protected health information from clinical narratives. JBI. 2015; Dehghan A, Kovacevic A, Karystianis G, et al. Combining knowledge- and data-driven methods for de-identification of clinical narratives. JBI. 2015;

Text De-identification  Remove Protected Health Information (PHI) from clinical notes  HIPPA PHI Categories (18 specific identifiers) NAME LOCATION (all geographical identifiers smaller than a state) DATE (e.g., birth date, admission date, discharge date) AGE* ID (SSN, MRN, Insurance number, Account/license number) CONTACT (Phone, Fax, , URL, IP address) Introduction Date of Admission: 11/11/2011. Assessment: John Smith is a 3 yo male with concern of hypoglycemia......

Descriptive Statistics of i2b2 data

Text De-identification  Growing adoption of Electronic Health Records (EHRs)  Increased emphasis on the re-use/sharing of clinical data  A large amount of data are in unstructured format  Health Insurance Portability and Accountability Act (HIPPA) Introduction

Problem Statement Date of Admission: 11/11/2011. Assessment: John Smith is a 3 yo male with concern of hypoglycemia Date of Admission: 01/01/2001. Assessment: Jackie Chen is a 12 yo male with concern of hypoglycemia......

Literature

Conceptual Model

Text Pre-processing Tokenization (special): MARY -> MRN+Hospital 3yo -> 3 yo Doc-level feature: Only process text from certain sections (e.g., delete the “Medication” section in discharge summary) Both tasks are rule-based and require manual design of the regular expressions.

PHI Term Identification

ML-based Tagger Conditional Random Fields (McCallum, 2002)

Rule-based Tagger

Dictionary-based Tagger

Feature Extraction  Feature Definition

Example Features Feature Extraction  Example Features

Post-processing  Token-level entity extraction: MARY -> MRN+Hospital  Co-reference: “John Smith” ↔ “J. Smith”. Co-reference relations were predicted by the adapted co-reference resolution system  Name permutation: “John Smith” → {“John”, “Smith”, “Smith, John”, “J. Smith”, “John Smith”} (also applied to the hospital category)  Longitudinal search: if “Dr. Hutton” appeared in one patient record, it was likely to appear in the patient’s other records

Performance (U. Nottingham) Performance  Achieved the best performances on most of the categories

Error Analysis

 Machine learning-based tagger  Supplemented with other techniques Regular expressions External dictionaries Post-processing methods (e.g., co-reference)  Opportunities & Challenges Improved performance Domain specific (ML and regular expressions) Leveraging structured data(?) Conclusions

Thank you! Questions? Contact: