Microdata Sharing Via Pseudonymization
David Galindo (Computer Science Department, University of Malaga) and Eric R. Verheul (PWC Netherlands & University of Nijmegen)


1 Microdata Sharing Via Pseudonymization
David Galindo, Computer Science Department, University of Malaga
Eric R. Verheul, PWC Netherlands & University of Nijmegen
UNECE Work session on statistical data confidentiality, Manchester, December 18th, 2007

2 Motivation (slide footer: 20-06-2006)
- Microdata on individuals is essential for empirical research
- Releasing it directly compromises the privacy of those individuals
- Goal: to build privacy-preserving microdata sharing systems through pseudonymization

3 Problem statement
- Suppliers own confidential microdata on individuals: (id_1, D(id_1)), …, (id_n, D(id_n))
- Researchers want to correlate microdata from different Suppliers
- Example: a Researcher wants to find the correlation between drug prescriptions (Chemists) and traffic accidents (Insurers)
- Question: how can Researchers correlate microdata without having access to sensitive information?

4 Framework
[Diagram: the Chemists hold the table (id_1, DataChm(id_1)), …, (id_n, DataChm(id_n)); the Insurers hold the table (id_m, DataIns(id_m)), …, (id_t, DataIns(id_t)). Researcher: "I want to correlate". Suppliers: "Maybe de-identified data?"]

5 Supplying de-identified data
[Diagram: the Researcher receives DataChm(id_1), …, DataChm(id_n) and DataIns(id_m), …, DataIns(id_t), without identifiers.]
If Suppliers de-identify the data by
- removing the identifier field
- applying Statistical Disclosure Control (SDC) mechanisms
no sensitive information is leaked, but matching is not possible!

6 Pseudonymizing data via TTPs
- Solution 1: a Trusted Third Party (TTP) replaces real identifiers by random identifiers (pseudonyms)
[Diagram: the TTP holds the table id_1 → P(id_1), …, id_l → P(id_l), where P(id) is random. This table is known only to the TTP. The Researcher receives (P(id_1), DataChm(id_1)), …, (P(id_n), DataChm(id_n)) and (P(id_m), DataIns(id_m)), …, (P(id_t), DataIns(id_t)). Matching!]
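Solution 1 can be sketched in a few lines of Python. This is only an illustration under assumptions: the class `TablePseudonymizer`, the helper `pseudonymize`, and the sample records are hypothetical names and data that do not appear in the slides.

```python
import secrets

class TablePseudonymizer:
    """Hypothetical TTP that keeps a secret id -> random pseudonym table."""

    def __init__(self, nbytes=16):
        self._table = {}        # secret: real identifier -> pseudonym
        self._nbytes = nbytes

    def pseudonym(self, identifier):
        # Draw a fresh random pseudonym the first time an identifier is seen,
        # then always return the same one so that records can be matched.
        if identifier not in self._table:
            self._table[identifier] = secrets.token_hex(self._nbytes)
        return self._table[identifier]

def pseudonymize(ttp, records):
    """Replace the identifier of each (id, data) pair by its pseudonym."""
    return {ttp.pseudonym(ident): data for ident, data in records}

ttp = TablePseudonymizer()
chemists = pseudonymize(ttp, [("alice", "drug A"), ("bob", "drug B")])
insurers = pseudonymize(ttp, [("bob", "accident"), ("carol", "no claim")])

# The Researcher matches on pseudonyms without ever seeing "bob".
matched = {p: (chemists[p], insurers[p]) for p in chemists.keys() & insurers.keys()}
```

The table grows with the number of identifiers and must stay secret, which is the storage drawback that motivates Solution 2.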

7 Pseudonymizing data via TTPs (II)
- Advantages:
  - Unconditional security (with respect to pseudonymization)
  - Matching is possible
- Drawback: the TTP must store a huge table secretly
- Solution 2: use a block cipher (Enc(K,·), Dec(K,·)) and set P(id) = Enc(K, id)
- Advantage: only the key K must be stored secretly
- Drawbacks:
  - Security is not unconditional
  - Different Researchers might not have the same access rights

8 Pseudonymizing data via TTPs (III)
[Diagram: one Researcher holds (P(id_1), DataChm(id_1)), …, (P(id*), DataChm(id*)), …, (P(id_n), DataChm(id_n)); another holds (P(id_m), DataIns(id_m)), …, (P(id*), DataIns(id*)), …, (P(id_t), DataIns(id_t)). They are not allowed to match Chemists' and Insurers' data, but both see the same pseudonym P(id*): "We share and win!"]

9 Pseudonymizing data via TTPs (IV)
- Solution 3: allocate a different key K_i for every Researcher R_i
- Pseudonyms are destination-dependent: P(id, R_i) = Enc(K_i, id)
[Diagram: R_1 holds (P(id_1, R_1), DataChm(id_1)), …, (P(id*, R_1), DataChm(id*)), …, (P(id_n, R_1), DataChm(id_n)); R_2 holds (P(id_m, R_2), DataIns(id_m)), …, (P(id*, R_2), DataIns(id*)), …, (P(id_t, R_2), DataIns(id_t)). P(id*, R_1) and P(id*, R_2) look unrelated.]

10 Pseudonymizing data via TTPs (V)
- Advantage: disallowed matching among malicious Researchers is prevented
- Drawback: the TTP must be on-line to perform sensitive operations (pseudonymization and matching)
Let's see why…

11 Pseudonymization with symmetric encryption
- Supplying pseudonymized data:
  - Supplier S_j sends the data blocks D(id_1), …, D(id_l) to Researcher R_i
  - S_j sends the identities id_1, …, id_l in the same order to the TTP
  - The TTP sends the list P(id, R_i) = Enc(K_i, id) to R_i
  - R_i forms the pseudonymized database (P(id_1, R_i), D(id_1)), …, (P(id_l, R_i), D(id_l))

12 Pseudonymization with symmetric encryption (II)
- Matching the pseudonymized databases of R_i and R_d:
  - R_i sends the data D(id_1, i), …, D(id_l, i) to R_d
  - R_i sends P(id_1, R_i), …, P(id_l, R_i) to the TTP
  - The TTP decrypts Dec(K_i, P(id, R_i)) = id and encrypts P(id, R_d) = Enc(K_d, id). The result is sent to R_d
  - R_d matches the pseudonymized databases (P(id_1, R_d), D(id_1, i)), …, (P(id_l, R_d), D(id_l, i)) and (P(id_1, R_d), D(id_1, d)), …, (P(id_m, R_d), D(id_m, d))
- As a result, the TTP is a bottleneck in the system
[Diagram: R_i holds (P(id_1, R_i), D(id_1, R_i)), …, (P(id*, R_i), D(id*, R_i)), …, (P(id_n, R_i), D(id_n, R_i)); R_d holds (P(id_m, R_d), D(id_m, R_d)), …, (P(id*, R_d), D(id*, R_d)), …, (P(id_t, R_d), D(id_t, R_d)).]
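The supplying and matching flows with per-Researcher keys can be sketched as follows. The slides do not name a concrete block cipher, so this sketch substitutes a toy 4-round Feistel network built from HMAC-SHA256; it is a stand-in chosen only so that the example runs with the standard library, not a vetted cipher. The names `enc`, `dec`, `ttp_supply`, `ttp_convert`, the key strings, and the sample records are all hypothetical.

```python
import hashlib
import hmac

BLOCK, HALF, ROUNDS = 16, 8, 4   # toy parameters: 16-byte blocks, 4 Feistel rounds

def _round(key, i, half):
    # Round function: HMAC-SHA256 truncated to half a block.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def enc(key, ident):
    """Toy Enc(K, id): identifiers are zero-padded to one block (max 16 bytes)."""
    block = ident.encode().ljust(BLOCK, b"\0")
    assert len(block) == BLOCK, "identifier too long for this toy cipher"
    left, right = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def dec(key, pnym):
    """Toy Dec(K, P): undo the Feistel rounds in reverse order."""
    left, right = pnym[:HALF], pnym[HALF:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return (left + right).rstrip(b"\0").decode()

# Per-Researcher keys K_i and K_d, known only to the TTP.
keys = {"R_i": b"key of researcher i", "R_d": b"key of researcher d"}

def ttp_supply(researcher, ids):
    # TTP side of supplying: P(id, R) = Enc(K_R, id), in the order received.
    return [enc(keys[researcher], ident) for ident in ids]

def ttp_convert(src, dst, pnyms):
    # TTP side of matching: decrypt under K_src, re-encrypt under K_dst.
    return [enc(keys[dst], dec(keys[src], p)) for p in pnyms]

# Supplying: data goes to the Researcher, identities go to the TTP, same order.
db_i = dict(zip(ttp_supply("R_i", ["alice", "bob"]), ["drug A", "drug B"]))
db_d = dict(zip(ttp_supply("R_d", ["bob", "carol"]), ["accident", "no claim"]))

# Matching: R_i sends its data to R_d and its pseudonyms to the TTP.
pnyms_i = list(db_i)
converted = dict(zip(ttp_convert("R_i", "R_d", pnyms_i), (db_i[p] for p in pnyms_i)))
matched = {p: (converted[p], db_d[p]) for p in converted.keys() & db_d.keys()}
```

Every call to `ttp_supply` and `ttp_convert` needs the TTP's secret keys, which illustrates why the TTP has to be on-line for both operations.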

13 Pseudonymization using public key crypto
- Let G be a group of prime order p. Let H: {0,1}* → G be a hash function
- The TTP assigns a secret key x_i ∈ Z_p to Researcher R_i
- P(id, R_i) = H(id)^{x_i}
- Supplying pseudonymized data from S_j to R_i:
  - Supplier S_j and Researcher R_i jointly compute the pseudonymized database {P(id, R_i), D(id)}
  - The TTP allocates pseudonymizing keys (μ, ν) ∈ Z_p × Z_p such that μ · ν = x_i; μ is sent to S_j, ν is sent to R_i
  - S_j computes and sends H(id_1)^μ, …, H(id_l)^μ to R_i
  - R_i computes (H(id)^μ)^ν = H(id)^{x_i} = P(id, R_i)
  - R_i forms the pseudonymized database (P(id_1, R_i), D(id_1)), …, (P(id_l, R_i), D(id_l))
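The split-exponent computation (H(id)^μ)^ν = H(id)^{x_i} can be checked with a small Python sketch. The slides do not fix a concrete group or hash, so the choices here are assumptions: G is taken to be the subgroup of squares modulo the toy safe prime 2039 (order 1019, far too small for real use), and `H` hashes with SHA-256 and then squares. The helper `ttp_split` and the sample records are hypothetical.

```python
import hashlib
import secrets

Q = 2039   # toy safe prime, Q = 2*P + 1 (illustration only)
P = 1019   # prime order of the subgroup of squares modulo Q

def H(identifier):
    """Hash an identifier into the order-P subgroup (assumed construction)."""
    h = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big")
    return pow(1 + h % (Q - 1), 2, Q)   # nonzero residue, squared

def ttp_split(x_i):
    """Split x_i into (mu, nu) with mu * nu = x_i (mod P)."""
    mu = secrets.randbelow(P - 1) + 1    # invertible modulo P
    nu = x_i * pow(mu, -1, P) % P
    return mu, nu

x_i = secrets.randbelow(P - 1) + 1   # R_i's pseudonymizing key, held by the TTP
mu, nu = ttp_split(x_i)              # mu goes to Supplier S_j, nu to Researcher R_i

records = [("alice", "drug A"), ("bob", "drug B")]
blinded = [(pow(H(ident), mu, Q), data) for ident, data in records]   # done by S_j
db_i = [(pow(b, nu, Q), data) for b, data in blinded]                 # done by R_i
```

In this sketch neither party holds x_i on its own: S_j sees only μ and R_i sees only ν.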

14 Pseudonymization with public key crypto (II)
- Matching the pseudonymized databases of R_i and R_d:
  - This can be done by R_i and R_d with a 1-round interactive protocol, provided certain keys are obtained off-line from the TTP
  - Neither R_i nor R_d learns the pseudonymizing keys x_i, x_d, even if they collude
  - R_d only learns D(id, R_i) for the ids in the intersection
  - Security is based on the Decision Diffie-Hellman assumption
[Diagram: R_i holds (H(id_1)^{x_i}, D(id_1, R_i)), …, (H(id*)^{x_i}, D(id*, R_i)), …, (H(id_n)^{x_i}, D(id_n, R_i)); R_d holds (H(id_m)^{x_d}, D(id_m, R_d)), …, (H(id*)^{x_d}, D(id*, R_d)), …, (H(id_t)^{x_d}, D(id_t, R_d)).]
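The slides do not spell out the matching protocol, so the following Python sketch is only one plausible shape for a one-round exchange, not the authors' construction, and it makes no security claim. It assumes the TTP hands out, off-line, two exponents α (to R_i) and β (to R_d) with α · β = x_i · x_d^{-1} mod p, so that R_d's pseudonyms can be converted into R_i's without either party seeing x_i or x_d. The toy group, `H`, and the sample records are hypothetical.

```python
import hashlib
import secrets

Q, P = 2039, 1019   # toy safe prime Q = 2*P + 1; squares mod Q have order P

def H(identifier):
    """Hash an identifier into the order-P subgroup (assumed construction)."""
    h = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big")
    return pow(1 + h % (Q - 1), 2, Q)

def rand_exp():
    return secrets.randbelow(P - 1) + 1   # invertible exponent modulo P

# Off-line, at the TTP: the Researchers' keys and a split conversion exponent.
x_i, x_d = rand_exp(), rand_exp()
alpha = rand_exp()                                      # given to R_i
beta = x_i * pow(x_d, -1, P) * pow(alpha, -1, P) % P    # given to R_d

db_i = {pow(H(i), x_i, Q): d for i, d in [("alice", "drug A"), ("bob", "drug B")]}
db_d = {pow(H(i), x_d, Q): d for i, d in [("carol", "no claim"), ("bob", "accident")]}

# Message 1 (R_d -> R_i): R_d raises its pseudonyms to beta.
msg1 = [pow(p, beta, Q) for p in db_d]

# Message 2 (R_i -> R_d): R_i raises each value to alpha, which yields
# H(id)^{x_i}, and answers position by position with its data on a match.
msg2 = [db_i.get(pow(b, alpha, Q)) for b in msg1]

# R_d pairs the answers with its own data; non-matches carry no data.
matched = [(d_i, d_d) for d_i, d_d in zip(msg2, db_d.values()) if d_i is not None]
```

In this sketch R_d receives R_i's data only for positions where the converted pseudonym matches, in the spirit of the intersection property stated on the slide.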

15 Pseudonymization with public key crypto (III)
- Advantages:
  - Matching is possible
  - Disallowed matching among malicious Researchers is prevented
  - The TTP is not a bottleneck (it only delivers off-line crypto keys)
- Drawbacks:
  - Suppliers must collaborate for every pseudonymization
  - Interactive protocols (on-line communication)

16 Advanced setting

17 Properties
- Suppliers and Accumulators are assumed Honest-But-Curious
- Researchers are assumed Malicious
- Accumulators' intersection and union operations are non-interactive
- Two levels of pseudonymization, corresponding to the different levels of trust
- It uses 'composite bilinear groups'

18 Governance
- From a functional perspective, permission to run these protocols is governed by a Regulatory Privacy Body (RPB). A strict licensing infrastructure will be enforced by the RPB, describing:
  - Which parties are allowed to perform which protocols with each other
  - What kind of data can be exchanged
  - Which subsets of identities or pseudonyms are allowed as input to the protocols

19 Thanks!

