De-identification: A Critical Success Factor in Clinical and Population Research Steven Merahn MD Dee Lang, RHIT Prepared for 2007 APIII Pittsburgh, PA.

Presentation transcript:

De-identification: A Critical Success Factor in Clinical and Population Research Steven Merahn MD Dee Lang, RHIT Prepared for 2007 APIII Pittsburgh, PA September 10, 2007

Major gaps exist today between patient care, clinical research, and evidence-based medicine.

Sharing Data is the Key: “Amassing large quantities of anonymized clinical and non-clinical information from medical records and reports and analyzing that data for patterns and other observations (is the best way) to support continuous quality improvement, shape best practices and inform clinical and population-based decision making.” A Rapid Learning Health System, Health Affairs 26(2), January 2007

Processing Predicated on Protecting Patient Privacy: Clinical records can be an important source of information…most of the information in these records is in the form of free text and extracting useful information from them requires automatic processing (e.g., index, semantically interpret, and search). A prerequisite to the distribution of clinical records outside of hospitals, be it for Natural Language Processing (NLP) or medical research, is de-identification. J Am Med Inform Assoc. 2007;14: DOI /jamia.M2444.

Problems to Solve: Sources of data; protecting patient privacy; creating and maintaining a corpus of HIPAA-compliant, searchable data; building collaborations and creating networks of institutions sharing data; emerging patient “data rights” issues.

Sources of Data: EMR/CIS systems (large amounts of free text; not all data is parsed or field-limited). Transcribed records and reports (even in systems without a CIS, most transcriptions are delivered as electronic files): pathology reports (cf. caTIES), surgical notes, radiology reports, discharge summaries. No need to wait for an EMR to create a Rapid Learning Health System (RLHS).

Protecting Patient Privacy: De-identification is a well-defined, but limited, step in a broader research workflow or protocol. The defined nature of the step includes managing individually identifiable information in records and reports. Such schemas include redaction, elimination, categorical replacement (e.g., place, age range), replacement with proxies (Dr X), and offsets (day 1). The process must be constantly “tuned” in response to dynamic input variables and patterns of documentation.
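The replacement schemas named on this slide can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the regular expressions, the `[ID]` and `Dr X1` placeholder formats, the decade-wide age bands, and the function names are hypothetical and are not the presenters' method. A production system would need far broader patterns and the local tuning the slide calls for.

```python
import re
from datetime import date

# Hypothetical patterns for illustration only; real reports need many more.
MRN_RE = re.compile(r"\bMRN:? ?\d+\b")
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
AGE_RE = re.compile(r"\b(\d{1,3})-year-old\b")
DOCTOR_RE = re.compile(r"\bDr\.? ([A-Z][a-z]+)\b")


def age_range(age):
    """Categorical replacement: map an exact age to a decade-wide band."""
    if age >= 90:
        return "90+"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def deidentify(text, index_date):
    """Apply redaction, categorical replacement, proxies, and date offsets."""
    proxies = {}

    def to_proxy(match):
        # The same surname always maps to the same proxy within one record.
        surname = match.group(1)
        if surname not in proxies:
            proxies[surname] = f"Dr X{len(proxies) + 1}"
        return proxies[surname]

    def to_offset(match):
        # Replace a calendar date with its offset from the index date (day 1).
        month, day, year = (int(g) for g in match.groups())
        try:
            delta = (date(year, month, day) - index_date).days
        except ValueError:
            return "[DATE]"  # not a valid calendar date: redact instead
        return f"day {delta + 1}"

    text = MRN_RE.sub("[ID]", text)       # redaction
    text = DATE_RE.sub(to_offset, text)   # offsets
    text = AGE_RE.sub(lambda m: f"{age_range(int(m.group(1)))}-year-old", text)
    text = DOCTOR_RE.sub(to_proxy, text)  # proxies
    return text


note = ("Dr. Smith saw the 47-year-old patient (MRN: 123456) on 9/10/2007. "
        "Dr Jones reviewed on 9/12/2007; Dr. Smith agreed.")
print(deidentify(note, date(2007, 9, 10)))
```

Keeping the proxy table per record preserves who did what within a report without exposing the clinician's name, and offsets preserve the intervals between events without exposing calendar dates.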

[Diagram labels: CIS; Transcribed Reports; De-identification Methodology; QA; De-identified Data; De-identified Database; Query Interface; FIREWALL; Trusted Proxy; RE-ID Method; Admin; NLP; Other processes]

Considerations: When choosing a de-identification methodology, four things need consideration. What is the reliability and validity of the methodology? Can the method maintain its specificity and sensitivity in local use? What are the limitations of the methodology? Can files be re-identified?
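On the last question, one way re-identification can be kept possible but controlled is a lookup table held by a trusted party outside the research database. The Python sketch below is an assumption about how such a proxy might work, not a description of the presenters' system; the class name `TrustedProxy`, its methods, and the `authorized` flag are hypothetical.

```python
import secrets


class TrustedProxy:
    """Holds the only mapping between real identifiers and study pseudonyms."""

    def __init__(self):
        self._pseudonym_by_id = {}
        self._id_by_pseudonym = {}

    def pseudonym_for(self, patient_id):
        """Return a stable random pseudonym for a real patient identifier."""
        if patient_id not in self._pseudonym_by_id:
            pseudonym = secrets.token_hex(8)
            while pseudonym in self._id_by_pseudonym:  # avoid rare collisions
                pseudonym = secrets.token_hex(8)
            self._pseudonym_by_id[patient_id] = pseudonym
            self._id_by_pseudonym[pseudonym] = patient_id
        return self._pseudonym_by_id[patient_id]

    def reidentify(self, pseudonym, authorized):
        """Reverse the mapping, but only for an authorized request."""
        if not authorized:
            raise PermissionError("re-identification requires authorization")
        return self._id_by_pseudonym[pseudonym]


proxy = TrustedProxy()
study_id = proxy.pseudonym_for("MRN-123456")
print(study_id != "MRN-123456")
```

Because the pseudonym is random and not derived from the identifier, the de-identified database alone carries no path back to the patient; only the proxy's table does.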

Consistency, Reliability and Validity: The fundamental problems are inter-record reliability, manpower resources, and time constraints. The issue then becomes the quality of the quality: over-marking (which lowers specificity) and under-marking (which lowers sensitivity). What are acceptable levels of sensitivity and specificity? 100% sensitivity for names. What is the benchmark? What is the value of consistency?
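For reference, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where a "positive" is a token that truly is identifying information. The short Python sketch below computes both from token-level labels; the function name and the example labels are made up for illustration, and it assumes a manual review supplies the reference ("gold") labels.

```python
def sensitivity_specificity(gold, predicted):
    """Compare token-level PHI marks against a manually reviewed reference.

    gold[i] is True when token i really is identifying information;
    predicted[i] is True when the method marked token i.
    Returns (sensitivity, specificity); a value is None when undefined.
    """
    if len(gold) != len(predicted):
        raise ValueError("label sequences must have the same length")
    tp = sum(g and p for g, p in zip(gold, predicted))
    fn = sum(g and not p for g, p in zip(gold, predicted))  # under-marking
    fp = sum(p and not g for g, p in zip(gold, predicted))  # over-marking
    tn = sum(not g and not p for g, p in zip(gold, predicted))
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


gold = [True, True, False, False, False, True, False, False]
pred = [True, False, False, True, False, True, False, False]
print(sensitivity_specificity(gold, pred))
```

In this made-up example one identifier is missed (under-marking) and one ordinary token is marked (over-marking), giving a sensitivity of 2/3 and a specificity of 4/5.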

Automated Methodologies: As Good As? Better? Classification of tokens; sequence tracking problem (using Hidden Markov Models or Conditional Random Fields); rule-based systems utilizing global features (sentence position), local features (lexical cues, special characters, and format patterns), and syntactic features; hybrid systems of rules, pattern-matching algorithms, heuristics, and dictionaries.
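To make the hybrid idea concrete, here is a minimal Python sketch of a token classifier that combines format patterns, a dictionary, and one lexical cue (a title before a capitalized word). It is an illustration under stated assumptions, not any published system and not a Hidden Markov Model or Conditional Random Field: the dictionary contents, the patterns, the label names, and the function `classify_tokens` are all hypothetical.

```python
import re

# Hypothetical resources; a real system would load much larger local lists.
NAME_DICTIONARY = {"smith", "jones"}
TITLE_CUES = {"dr", "dr.", "mr", "mr.", "mrs", "mrs.", "ms", "ms."}
FORMAT_PATTERNS = {
    "DATE": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$"),
    "PHONE": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
}


def classify_tokens(tokens):
    """Label each token as DATE, PHONE, NAME, or O (not identifying)."""
    labels = []
    for i, token in enumerate(tokens):
        core = token.strip(".,;:()")
        label = "O"
        for name, pattern in FORMAT_PATTERNS.items():
            if pattern.match(core):  # format-pattern rule
                label = name
                break
        else:
            previous = tokens[i - 1].lower() if i > 0 else ""
            if core.lower() in NAME_DICTIONARY:  # dictionary lookup
                label = "NAME"
            elif previous in TITLE_CUES and core[:1].isupper():  # lexical cue
                label = "NAME"
        labels.append(label)
    return labels


tokens = "Dr. Patel called Smith at 412-555-0100 on 9/10/2007 .".split()
print(list(zip(tokens, classify_tokens(tokens))))
```

Note how "Patel" is caught by the title cue even though it is absent from the dictionary, while "Smith" is caught by the dictionary without any cue; each component covers gaps left by the others.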

Local Use: Can your methodology be customized to meet local needs? While some methods may have good ‘numbers’, will they hold up in local use? Every community has its own acronyms, place names, and other local vocabulary. What is the protocol to manage local quality? Regular checks against manual review; formal evaluation research.
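A regular check against manual review could be as simple as drawing a random sample of de-identified records for reviewers and flagging the method for re-tuning when too many identifiers were missed. The Python sketch below shows one such protocol; the 5% sampling fraction, the 99% threshold, and both function names are assumptions chosen for illustration, not recommended values.

```python
import random


def select_review_sample(record_ids, fraction=0.05, seed=0):
    """Pick a reproducible random sample of records for manual review."""
    if not record_ids:
        return []
    rng = random.Random(seed)
    k = min(len(record_ids), max(1, round(len(record_ids) * fraction)))
    return sorted(rng.sample(list(record_ids), k))


def needs_retuning(phi_found_by_reviewers, phi_missed_by_system,
                   min_sensitivity=0.99):
    """Return True when observed sensitivity in the sample is too low."""
    if phi_found_by_reviewers == 0:
        return False  # nothing to miss in this sample
    caught = phi_found_by_reviewers - phi_missed_by_system
    return caught / phi_found_by_reviewers < min_sensitivity


sample = select_review_sample(list(range(100)), fraction=0.05, seed=1)
print(len(sample), needs_retuning(200, 5))
```

Fixing the seed makes the sample auditable: a second reviewer can regenerate exactly the same list of records.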

“Data Rights” Issues: Legal models exist. Make “de-identified” data sharing part of informed consent. Offer different tiers of consent: publicly funded research, academic research, commercial research. Make the general public aware of the level of existing data sharing: claims data are already widely shared and sold.

[Diagram: Building Collaboration. Labels: De-identified Database; Query Interface; QA; FIREWALL]

Call to Action: Pathology Informatics Community. caBIG and caTIES are models for cross-institutional data sharing. Major institutions are establishing data repositories of pathology reports. Help facilitate data aggregation among other departments: Radiology (radiology reports), Surgery (surgical notes), Medicine (discharge summaries). Establish cross-departmental “Rapid Learning” teams.