Privacy Preserving Record Linkage


Diptendu Kar, Ibrahim Lazrig, Indrakshi Ray, Indrajit Ray
Colorado State University

Motivation
- Clinical data must be shared for research, data analytics, and similar uses.
- Data must be anonymized as required by federal and state policies.
- Data is often sanitized to remove the identifying information from patient records.
- Problems with sanitization:
  - Records belonging to the same patient cannot be linked across multiple institutions.
  - Research and data analytics results cannot be propagated back to the patient.

Our Progress
- We propose a privacy-preserving record linkage protocol that uses a semi-trusted third party.
  - The third party does not learn the identities of patients.
  - The sources and the third party do not share keys.
  - The protocol allows traceability: research results can be propagated back to the patients.
  - Linking is deterministic: records with exactly the same attributes can be linked.
- Future work
  - Address approximate matching.
  - Minimize the involvement of the third party.

Problem Overview
Four institutions (A, B, C, D) each hold a record for the same patient:

NAME      SSN    ADDRESS  DIAGNOSIS  TREATMENT  …
John Doe  12345  Fairfax  DIAG555    TRT444
John Doe  12345  Fairfax  DIAG100    TRT100
John Doe  12345  Fairfax  DIAG100    TRT9
John Doe  12345  Fairfax  DIAG100    TRT100

Problem Overview
Desired linked record. NAME, SSN, and ADDRESS are the identifiers; DIAGNOSIS and TREATMENT are the sensitive information.

NAME      SSN    ADDRESS  DIAGNOSIS  TREATMENT  SOURCE
John Doe  12345  Fairfax  DIAG100    TRT100     A
                          DIAG555    TRT444     B
                          DIAG100    TRT9       C
                          DIAG100    TRT100     D

Traditional Sanitization
Each institution replaces the identifiers (NAME, SSN, ADDRESS) of its record with its own random ID (Random ID #1 through Random ID #4). The four records for John Doe now appear to belong to different patients!

Independent Encryption
Each institution encrypts the identifiers independently. The ciphertexts differ, so the records still appear to belong to different patients!

NAME    SSN     ADDRESS  DIAGNOSIS  TREATMENT  …
404040  5050    606060   DIAG555    TRT444
101010  202020  303030   DIAG100    TRT100
707070  808080  909090   DIAG100    TRT9
5555    6666    7777     DIAG100    TRT100
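The effect on this slide can be reproduced with a minimal Python sketch. It assumes a keyed hash (HMAC-SHA256) as a stand-in for whatever per-institution encryption is actually used, which the slides do not specify. The function name `tokenize` and the `|`-joined identifier string are illustrative only.

```python
import hashlib
import hmac
import secrets

def tokenize(key: bytes, identifier: str) -> str:
    """Replace an identifier with a keyed pseudonym (HMAC stands in for encryption)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Each institution picks its own key and shares it with nobody (assumption).
keys = {name: secrets.token_bytes(32) for name in "ABCD"}

patient = "John Doe|12345|Fairfax"
tokens = {name: tokenize(key, patient) for name, key in keys.items()}

# The same patient yields four unrelated pseudonyms, so the records cannot be linked.
for name, token in tokens.items():
    print(name, token[:16])
```

Because the keys are independent, equality of the underlying identifiers is invisible to anyone comparing the pseudonyms, which is the gap the protocol on the following slides is meant to close.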

Privacy Preserving Record Linkage
- Goal: link records across multiple institutions without revealing individual data records to each other.
- Input: two or more records with common primary keys or common identifiers.
- Output: a single record that aggregates the other fields based on the common identifier.
- Constraint: no party should learn anything about the commonality of the identifiers.
- Our protocol provides a viable solution to this problem.

System Assumptions
- N competing health publishers that hold health records.
- M researchers who are interested in aggregated, anonymized results.
- A semi-trusted broker that:
  - blindly links the records and hides the source of the information;
  - cannot learn the sensitive identifiers on which the linkage is performed;
  - follows the protocol honestly and does not misbehave;
  - is limited in its capability to keep a secret confidential.

Our Protocol
The institutions (A, B, C, D) send their independently encrypted records to the semi-trusted broker, which performs an additional computation on them:

NAME    SSN     ADDRESS  DIAGNOSIS  TREATMENT  …
404040  5050    606060   DIAG555    TRT444
101010  202020  303030   DIAG100    TRT100
707070  808080  909090   DIAG100    TRT9
5555    6666    7777     DIAG100    TRT100

Our Protocol
After the broker's computation, the identifiers from all four institutions map to the same values (α, ϴ, δ), so the broker can compare them and recognize the same patient!

NAME  SSN  ADDRESS  DIAGNOSIS  TREATMENT  SOURCE
α     ϴ    δ        DIAG555    TRT444     B
α     ϴ    δ        DIAG100    TRT100     D
α     ϴ    δ        DIAG100    TRT9       C
α     ϴ    δ        DIAG100    TRT100     A

Linked result:

NAME  SSN  ADDRESS  DIAGNOSIS  TREATMENT
α     ϴ    δ        DIAG100    TRT100
                    DIAG555    TRT444
                    DIAG100    TRT9

Our Protocol
- Major components
  - Institutions / data sources: publishers
  - The third party: broker
  - Researchers / consumers: subscribers
- Protocol phases
  - Setup phase
  - Encryption of query results phase
  - Secure record matching phase
- The ElGamal cryptosystem is used because it has a multiplicative homomorphic property.
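The multiplicative homomorphic property mentioned above can be illustrated with a toy textbook ElGamal in Python. This is a sketch under stated assumptions, not the authors' Java cryptosystem: the modulus 1019 and base 2 are illustrative toy parameters, far too small for real use, and the function names are hypothetical.

```python
import secrets

Q = 1019  # toy prime modulus (assumption; real use needs a large prime)
G = 2     # a primitive root modulo 1019

def keygen():
    x = secrets.randbelow(Q - 1) + 1           # private key in [1, Q-1]
    return x, pow(G, x, Q)                     # (private key, public key beta)

def encrypt(beta, m):
    k = secrets.randbelow(Q - 1) + 1           # fresh randomness per ciphertext
    return pow(G, k, Q), (m * pow(beta, k, Q)) % Q

def decrypt(x, c):
    c1, c2 = c
    return (c2 * pow(c1, Q - 1 - x, Q)) % Q    # c1^(Q-1-x) equals c1^(-x) mod Q

def multiply(a, b):
    """Component-wise product of two ciphertexts."""
    return (a[0] * b[0]) % Q, (a[1] * b[1]) % Q

x, beta = keygen()
product = multiply(encrypt(beta, 6), encrypt(beta, 7))
print(decrypt(x, product))  # 42: the product of the two plaintexts
```

Multiplying ciphertexts multiplies the hidden plaintexts, which is what lets the publishers combine their secrets in the setup phase without seeing each other's values.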

1. Setup Phase
- The broker shares its primitives with all publishers: $p$ (prime), $g$ (generator), $\beta$ (public key).
- Each publisher $i$ chooses a secret key $SK_i$ and a random number $r_i$.
- Publisher 1 starts by encrypting $r_1^{-1}$ with the broker's public key: $E_\beta(r_1^{-1})$.
- Publisher 2 encrypts its own secret key with $\beta$ and multiplies the result with the incoming value: $E_\beta(SK_2) \cdot E_\beta(r_1^{-1})$.
- Publisher $n$ performs the same operation: $E_\beta(SK_n) \cdots E_\beta(SK_2) \cdot E_\beta(r_1^{-1})$.

1. Setup Phase
- Key-converter: a re-encryption key for encrypted records from other publishers.
- Key-converter generation:
  - Publisher 1 receives the final result of the setup phase: $E_\beta(SK_n) \cdots E_\beta(SK_2) \cdot E_\beta(r_1^{-1}) = E_\beta(SK_n \cdot SK_{n-1} \cdots SK_2 \cdot r_1^{-1})$.
  - This is the key-converter for publisher 1.
- Other key-converters:
  - Publisher 2: $E_\beta(SK_n \cdots SK_3 \cdot SK_1 \cdot r_2^{-1})$
  - ...
  - Publisher $n$: $E_\beta(SK_{n-1} \cdots SK_2 \cdot SK_1 \cdot r_n^{-1})$
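The setup round can be sketched in Python as follows. The sketch makes several assumptions the slides do not state: toy ElGamal over the prime modulus 1019, secrets and inverses taken modulo that prime, and publishers arranged in a ring indexed from 0. The names `key_converter`, `sks`, and `rs` are hypothetical.

```python
import secrets

Q = 1019  # toy prime modulus (assumption; real use needs a large prime)
G = 2     # a primitive root modulo 1019

def rand_unit():
    return secrets.randbelow(Q - 1) + 1        # random value in [1, Q-1]

def keygen():
    x = rand_unit()
    return x, pow(G, x, Q)                     # broker's (private key, beta)

def encrypt(beta, m):
    k = rand_unit()
    return pow(G, k, Q), (m * pow(beta, k, Q)) % Q

def decrypt(x, c):
    c1, c2 = c
    return (c2 * pow(c1, Q - 1 - x, Q)) % Q    # c1^(Q-1-x) equals c1^(-x) mod Q

def multiply(a, b):
    return (a[0] * b[0]) % Q, (a[1] * b[1]) % Q

def key_converter(beta, sks, rs, i):
    """Publisher i encrypts r_i^-1; every other publisher multiplies in E(SK_j)."""
    n = len(sks)
    acc = encrypt(beta, pow(rs[i], Q - 2, Q))  # r_i^-1 mod Q via Fermat's little theorem
    for step in range(1, n):
        j = (i + step) % n                     # next publisher around the ring
        acc = multiply(acc, encrypt(beta, sks[j]))
    return acc                                 # still encrypted under beta

x, beta = keygen()                             # held by the broker
sks = [rand_unit() for _ in range(4)]          # SK_i, one per publisher
rs = [rand_unit() for _ in range(4)]           # r_i, one per publisher
converters = [key_converter(beta, sks, rs, i) for i in range(4)]
```

No publisher ever sees another publisher's secret key in the clear; each one only multiplies ciphertexts, and only the holder of the broker's private key can open the result.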

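2. Encryption of query results
- Each publisher $i$ holds $SK_i$ and $r_i$ and encrypts its query result $M_i$ under the key $SK_i \cdot r_i$:
  - Publisher 1: $E_{SK_1 \cdot r_1}(M_1)$
  - Publisher 2: $E_{SK_2 \cdot r_2}(M_2)$
  - ...
  - Publisher $n$: $E_{SK_n \cdot r_n}(M_n)$
- The corresponding key-converters are:
  - Publisher 1: $E_\beta(SK_n \cdot SK_{n-1} \cdots SK_2 \cdot r_1^{-1})$
  - Publisher 2: $E_\beta(SK_n \cdots SK_3 \cdot SK_1 \cdot r_2^{-1})$
  - ...
  - Publisher $n$: $E_\beta(SK_{n-1} \cdots SK_2 \cdot SK_1 \cdot r_n^{-1})$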

3. Secure record matching phase
- The broker decrypts the key-converter from each publisher: $D(E_\beta(SK_n \cdot SK_{n-1} \cdots SK_2 \cdot r_1^{-1})) = SK_n \cdot SK_{n-1} \cdots SK_2 \cdot r_1^{-1}$.
- The broker re-encrypts the data coming from each publisher with the associated key-converter. E.g., given $E_{SK_1 \cdot r_1}(M_1)$ and $SK_n \cdot SK_{n-1} \cdots SK_2 \cdot r_1^{-1}$: $E_{SK_n \cdot SK_{n-1} \cdots SK_2 \cdot r_1^{-1}}(E_{SK_1 \cdot r_1}(M_1)) = E_{SK_n \cdots SK_2 \cdot SK_1}(M_1)$.
- Similarly:
  - Publisher 2: $E_{SK_n \cdots SK_2 \cdot SK_1}(M_2)$
  - ...
  - Publisher $n$: $E_{SK_n \cdots SK_2 \cdot SK_1}(M_n)$
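The cancellation of $r_i$ and the convergence to a common key can be checked with a small Python sketch. The slides do not define the data encryption $E_k(M)$, so the sketch assumes exponentiation, $E_k(M) = M^k \bmod p$, in a subgroup of prime order $q$ with exponents reduced modulo $q$. It also computes each key-converter directly rather than through the encrypted setup round. All names and the toy parameters 2039 and 1019 are illustrative.

```python
import hashlib
import secrets

P = 2039  # toy safe prime, P = 2*Q + 1 (assumption; far too small for real use)
Q = 1019  # prime order of the subgroup of squares modulo P

def encode(identifier: str) -> int:
    """Map an identifier to a subgroup element by hashing and squaring."""
    digest = int.from_bytes(hashlib.sha256(identifier.encode("utf-8")).digest(), "big")
    return pow(digest % (P - 1) + 1, 2, P)

def enc(key: int, m: int) -> int:
    """Assumed data encryption: E_key(m) = m^key mod P."""
    return pow(m, key, P)

def rand_exp() -> int:
    return secrets.randbelow(Q - 1) + 1        # exponent in [1, Q-1]

def make_publishers(n):
    return [(rand_exp(), rand_exp()) for _ in range(n)]  # (SK_i, r_i) pairs

def key_converter(pubs, i):
    """Plain value the broker would obtain after decrypting publisher i's converter."""
    k = pow(pubs[i][1], Q - 2, Q)              # r_i^-1 mod Q
    for j, (sk_j, _) in enumerate(pubs):
        if j != i:
            k = (k * sk_j) % Q
    return k

def publish(pubs, i, m):
    sk, r = pubs[i]
    return enc((sk * r) % Q, m)                # E_{SK_i * r_i}(M)

def broker_link(pubs, i, c):
    return enc(key_converter(pubs, i), c)      # E_{SK_n * ... * SK_1}(M)

pubs = make_publishers(4)
m = encode("John Doe|12345|Fairfax")
linked = [broker_link(pubs, i, publish(pubs, i, m)) for i in range(4)]
print(len(set(linked)) == 1)  # True: all four publishers converge to one value
```

The broker sees only the converted values, which are equal exactly when the underlying identifiers are equal, so it can link records deterministically without learning the identifiers.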

Implementation

Implementation
- Oracle's Java™ technology: Java Development Kit (JDK) 1.8.0_131, using java.io, java.math, java.net, java.sql, and java.util.
- The MySQL database management system acts as the data source.
- Components coordinate by passing messages (event driven).
- The cryptosystem is written to include the homomorphic property.
- Assumptions: there is one instance of the broker, and the databases used by the publishers and the subscribers have the same schema.

Architecture
[Figure: architecture diagram]

Alternate Architecture
- Each publisher $i$ sends $E_\beta(r_i^{-1})$ and $E_\beta(SK_i)$ to a keystore.
- The keystore computes the "key-converters".

Workflow
- Connecting parties provide an IP address and port to connect to the broker.
- The broker keeps a list of all active publishers.
- The broker creates a network overlay containing each publisher and its neighbor's details.

Workflow - Setup
Messages and actions shown in the diagram:
- Begin setup phase
- Setup Initiate Request
- Setup Forward Request (passed along between publishers)
- Setup Completed Response
- Collect all responses and notify

Workflow: phases 2 and 3
- Subscriber: fetch records.
- Broker: send a Data Fetch Request to those publishers requested by the subscriber, and maintain a list of such publishers.
- Publishers: send data along with a data fetch completed message.
- Broker: check whether completed messages have been received from all publishers on the list.
- Broker: re-encrypt with the key-converter and send the result.
- Subscriber: display the result.
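The broker's bookkeeping for this workflow could look like the following Python sketch. The class and method names are hypothetical and do not come from the authors' Java implementation. The sketch only covers tracking which publishers have finished sending data.

```python
class FetchTracker:
    """Tracks one subscriber request until every requested publisher reports completion."""

    def __init__(self, requested_publishers):
        self.pending = set(requested_publishers)  # publishers that still owe a completed message
        self.records = []                         # (publisher, encrypted record) pairs

    def on_data(self, publisher, encrypted_record):
        self.records.append((publisher, encrypted_record))

    def on_completed(self, publisher):
        """Return True once all requested publishers have sent their completed message."""
        self.pending.discard(publisher)
        return not self.pending

tracker = FetchTracker(["A", "B"])
tracker.on_data("A", "ciphertext-1")
first_done = tracker.on_completed("A")   # False: B is still pending
tracker.on_data("B", "ciphertext-2")
all_done = tracker.on_completed("B")     # True: re-encryption can start
```

Only when `on_completed` returns True would the broker move on to re-encrypting the collected records with the key-converters.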

Workflow: retrospective query
- The subscriber requests more information based on the ID provided by the broker.
- The broker keeps a mapping table of incoming records, converted records, and source.
- The broker finds the source and the incoming record ID from its mapping table.
- The publisher keeps a mapping table of plaintext and encrypted records.
- The publisher finds the exact record from its mapping table.
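The two mapping tables can be pictured with a short Python sketch that uses dictionaries in place of database tables. The class names, field names, and sample IDs are all hypothetical.

```python
class Broker:
    def __init__(self):
        self.mapping = {}  # converted record ID -> (source publisher, incoming record ID)

    def register(self, converted_id, source, incoming_id):
        self.mapping[converted_id] = (source, incoming_id)

    def trace(self, converted_id):
        return self.mapping[converted_id]

class Publisher:
    def __init__(self, name):
        self.name = name
        self.mapping = {}  # encrypted record ID -> plaintext record

    def remember(self, encrypted_id, record):
        self.mapping[encrypted_id] = record

    def lookup(self, encrypted_id):
        return self.mapping[encrypted_id]

publisher_b = Publisher("B")
publisher_b.remember("enc-17", {"name": "John Doe", "diagnosis": "DIAG555"})

broker = Broker()
broker.register("conv-3", "B", "enc-17")

# Retrospective query: the subscriber only knows the broker-provided ID "conv-3".
source, incoming_id = broker.trace("conv-3")
record = publisher_b.lookup(incoming_id)
```

The broker's table never contains plaintext; only the originating publisher can resolve the incoming ID back to the patient.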

Conclusion
- Performs privacy-preserving record linkage without using manually participating honest brokers.
- Can be ported easily to any architecture and OS supporting the Java technology.
- The different components of the application have been implemented as standalone multi-threaded processes.

Future Work
- Include fault tolerance in the system: continue and recover when one or more publishers are unreachable.
- Extend this work to "probabilistic linkage", e.g. Jon Smith vs. John Smith.
- Make the protocol collusion resistant.
- Obviate the need for third parties.

Thank You !