Published by Helen Gibson. Modified over 9 years ago.
1
Microdata Sharing Via Pseudonymization
David Galindo, Computer Science Department, University of Malaga
Eric R. Verheul, PWC Netherlands & University of Nijmegen
UNECE Work session on statistical data confidentiality, Manchester, 2007 December 18th
2
20-06-2006
Motivation
- Individuals' microdata is essential for empirical research
- Releasing it directly compromises the privacy of the individuals
- Goal: to build privacy-preserving microdata sharing systems through pseudonymization
3
Problem statement
- Suppliers own confidential microdata on individuals: (id_1, D(id_1)), …, (id_n, D(id_n))
- Researchers want to correlate microdata from different Suppliers
- Example: a Researcher wants to find out the correlation between drug prescription (Chemists) and traffic accidents (Insurers)
- Question: how can Researchers correlate microdata without having access to sensitive information?
4
Framework
[Diagram: the Chemists hold the records (id_1, DataChm(id_1)), …, (id_n, DataChm(id_n)) and the Insurers hold the records (id_m, DataIns(id_m)), …, (id_t, DataIns(id_t)). Speech bubbles: "I want to correlate" and "Maybe de-identified data?"]
5
Supplying de-identified data
[Diagram: the supplied records DataChm(id_1), …, DataChm(id_n) and DataIns(id_m), …, DataIns(id_t) carry no identifiers]
If Suppliers de-identify the data by
- removing the identifier field
- applying Statistical Disclosure Control (SDC) mechanisms
no sensitive information is leaked, but matching is not possible!
6
Pseudonymizing data via TTPs
Solution 1: a Trusted Third Party (TTP) replaces real identifiers by random identifiers (pseudonyms)
TTP table (P(id) is random; this table is only known to the TTP):
  id_1      P(id_1)
  ...       ...
  id_l      P(id_l)
Pseudonymized Insurers data:
  P(id_m)   DataIns(id_m)
  ...       ...
  P(id_t)   DataIns(id_t)
Pseudonymized Chemists data:
  P(id_1)   DataChm(id_1)
  ...       ...
  P(id_n)   DataChm(id_n)
Matching is possible!
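Solution 1 can be illustrated with a minimal sketch. The class name `TablePseudonymizer`, the sample identifiers and the sample data are hypothetical and only serve the illustration; the slides do not prescribe how the random pseudonyms are generated.

```python
import secrets


class TablePseudonymizer:
    """Toy TTP: maps every identifier to a fresh random pseudonym."""

    def __init__(self):
        # id -> P(id); in the scheme this table is known only to the TTP.
        self._table = {}

    def pseudonym(self, identifier: str) -> str:
        # Draw a random pseudonym the first time an identifier is seen,
        # and reuse it afterwards so that matching remains possible.
        if identifier not in self._table:
            self._table[identifier] = secrets.token_hex(16)
        return self._table[identifier]


ttp = TablePseudonymizer()

# Hypothetical microdata held by the two Suppliers.
chemists = {"alice": "drug A", "bob": "drug B"}
insurers = {"bob": "accident 2006", "carol": "no claims"}

# The Researcher only ever sees pseudonyms, never the real identifiers.
pn_chemists = {ttp.pseudonym(i): d for i, d in chemists.items()}
pn_insurers = {ttp.pseudonym(i): d for i, d in insurers.items()}

# Matching: join the two pseudonymized tables on the shared pseudonym.
matched = {
    p: (pn_chemists[p], pn_insurers[p])
    for p in pn_chemists.keys() & pn_insurers.keys()
}
```

The only secret state is the table itself, which grows with the number of identifiers; this is the storage drawback that motivates the block-cipher variant.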
7
Pseudonymizing data via TTPs (II)
- Advantages: unconditional security (w.r.t. pseudonymization); matching is possible
- Drawback: the TTP must store a huge table secretly
Solution 2: use a block cipher (Enc(K,·), Dec(K,·)) and set P(id) = Enc(K, id)
- Advantage: only the key K must be stored secretly
- Drawbacks: security is not unconditional; different Researchers might not have the same access rights
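A sketch of Solution 2 follows. The slides do not name a concrete block cipher, so this example substitutes a toy 4-round Feistel construction built from HMAC-SHA256 (standard library only). It is an assumption made for illustration and not a vetted cipher; the function names `enc` and `dec`, the 16-byte block size and the zero padding are likewise hypothetical.

```python
import hashlib
import hmac

BLOCK = 16          # toy block size in bytes (assumption)
HALF = BLOCK // 2
ROUNDS = 4


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Keyed round function: HMAC over the round index and one half-block.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def enc(key: bytes, identifier: str) -> bytes:
    """P(id) = Enc(K, id): deterministic, so equal ids give equal pseudonyms."""
    block = identifier.encode().ljust(BLOCK, b"\0")
    if len(block) != BLOCK:
        raise ValueError("identifier too long for the toy block size")
    left, right = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def dec(key: bytes, pseudonym: bytes) -> str:
    """Dec(K, P(id)) = id: undo the Feistel rounds in reverse order."""
    left, right = pseudonym[:HALF], pseudonym[HALF:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return (left + right).rstrip(b"\0").decode()


K = b"k" * 32                 # the only secret the TTP has to store
p_alice = enc(K, "alice")
recovered = dec(K, p_alice)   # only the key holder can invert a pseudonym
```

Because `enc` is deterministic under a single key K, every Researcher receives the same pseudonym for the same individual, which is why access rights cannot be separated per Researcher in this variant.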
8
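Pseudonymizing data via TTPs (III)
[Diagram: one Researcher holds the Insurers table (P(id_m), DataIns(id_m)), …, (P(id*), DataIns(id*)), …, (P(id_t), DataIns(id_t)); another holds the Chemists table (P(id_1), DataChm(id_1)), …, (P(id*), DataChm(id*)), …, (P(id_n), DataChm(id_n)). Caption: "Not allowed to match Chemists and Insurers data". Speech bubble: "We share and win!" Both tables carry the same pseudonym P(id*), so colluding Researchers can match anyway.]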
9
Pseudonymizing data via TTPs (IV)
Solution 3: allocate a different key K_i to every Researcher R_i
Pseudonyms are destination-dependent: P(id, R_i) = Enc(K_i, id)
[Diagram: R_2 holds (P(id_m, R_2), DataIns(id_m)), …, (P(id*, R_2), DataIns(id*)), …, (P(id_t, R_2), DataIns(id_t)); R_1 holds (P(id_1, R_1), DataChm(id_1)), …, (P(id*, R_1), DataChm(id*)), …, (P(id_n, R_1), DataChm(id_n))]
P(id*, R_1) and P(id*, R_2) look unrelated
10
Pseudonymizing data via TTPs (V)
- Advantage: disallowed matching among malicious Researchers is prevented
- Drawback: the TTP must be on-line to perform sensitive operations (pseudonymization and matching)
Let’s see why…
11
Pseudonymization with symmetric encryption
Supplying pseudonymized data:
- Supplier S_j sends the data blocks D(id_1), …, D(id_l) to Researcher R_i
- S_j sends the identities id_1, …, id_l in the same order to the TTP
- The TTP sends the list P(id, R_i) = Enc(K_i, id) to R_i
- R_i forms the pseudonymized database (P(id_1, R_i), D(id_1)), …, (P(id_l, R_i), D(id_l))
12
Pseudonymization with symmetric encryption
Matching the pseudonymized databases of R_i and R_d:
- R_i sends to R_d the data D(id_1, i), …, D(id_l, i)
- R_i sends to the TTP the pseudonyms P(id_1, R_i), …, P(id_l, R_i)
- The TTP decrypts Dec(K_i, P(id, R_i)) = id and re-encrypts P(id, R_d) = Enc(K_d, id); the result is sent to R_d
- R_d matches the pseudonymized databases
  (P(id_1, R_d), D(id_1, i)), …, (P(id_l, R_d), D(id_l, i))
  (P(id_1, R_d), D(id_1, d)), …, (P(id_m, R_d), D(id_m, d))
As a result, the TTP is a bottleneck for the system
[Diagram: R_d holds (P(id_m, R_d), D(id_m, R_d)), …, (P(id*, R_d), D(id*, R_d)), …, (P(id_t, R_d), D(id_t, R_d)); R_i holds (P(id_1, R_i), D(id_1, R_i)), …, (P(id*, R_i), D(id*, R_i)), …, (P(id_n, R_i), D(id_n, R_i))]
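The supply and matching steps with per-Researcher keys can be sketched as follows. As before, the block cipher is replaced by a toy HMAC-based Feistel construction (an assumption, not a vetted cipher), and the class `TTP`, its method names and the sample data are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK, HALF, ROUNDS = 16, 8, 4   # toy parameters (assumption)


def _f(key, i, half):
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:HALF]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def enc(key, identifier):
    block = identifier.encode().ljust(BLOCK, b"\0")
    if len(block) != BLOCK:
        raise ValueError("identifier too long for the toy block size")
    l, r = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        l, r = r, _xor(l, _f(key, i, r))
    return l + r


def dec(key, pseudonym):
    l, r = pseudonym[:HALF], pseudonym[HALF:]
    for i in reversed(range(ROUNDS)):
        l, r = _xor(r, _f(key, i, l)), l
    return (l + r).rstrip(b"\0").decode()


class TTP:
    """Holds one secret key K_i per Researcher R_i."""

    def __init__(self, researchers):
        self._keys = {r: secrets.token_bytes(32) for r in researchers}

    def pseudonymize(self, ids, researcher):
        # Supply step: P(id, R_i) = Enc(K_i, id), returned in the same order.
        return [enc(self._keys[researcher], i) for i in ids]

    def translate(self, pseudonyms, source, target):
        # Matching step: decrypt under K_source, re-encrypt under K_target.
        k_src, k_dst = self._keys[source], self._keys[target]
        return [enc(k_dst, dec(k_src, p)) for p in pseudonyms]


ttp = TTP(["R_i", "R_d"])

# Supplying: data goes to the Researcher, identities go to the TTP.
db_i = dict(zip(ttp.pseudonymize(["alice", "bob"], "R_i"), ["drug A", "drug B"]))
db_d = dict(zip(ttp.pseudonymize(["bob", "carol"], "R_d"), ["accident", "no claims"]))

# Without the TTP the two databases share no pseudonym.
direct_overlap = db_i.keys() & db_d.keys()

# Matching: R_i's pseudonyms are translated by the TTP into R_d's domain.
pnyms_i = list(db_i)
translated = dict(zip(ttp.translate(pnyms_i, "R_i", "R_d"),
                      (db_i[p] for p in pnyms_i)))
matched = {p: (translated[p], db_d[p]) for p in translated.keys() & db_d.keys()}
```

Every call to `pseudonymize` and `translate` uses keys that only the TTP holds, so the TTP has to take part in each supply and each matching operation.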
13
Pseudonymization using public key crypto
- Let G be a group of prime order p and H: {0,1}* → G a hash function
- The TTP assigns a secret key x_i ∈ Z_p to Researcher R_i
- P(id, R_i) = H(id)^{x_i}
Supplying pseudonymized data from S_j to R_i:
- Supplier S_j and Researcher R_i jointly compute the pseudonymized database {P(id, R_i), D(id)}
- The TTP allocates pseudonymizing keys (μ, ν) ∈ Z_p × Z_p such that μ · ν = x_i; μ is sent to S_j, ν is sent to R_i
- S_j computes and sends H(id_1)^μ, …, H(id_l)^μ to R_i
- R_i computes (H(id)^μ)^ν = H(id)^{x_i} = P(id, R_i)
- R_i forms the pseudonymized database (P(id_1, R_i), D(id_1)), …, (P(id_l, R_i), D(id_l))
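The split-key computation can be checked numerically with a small sketch. All concrete choices here are assumptions for illustration: the group is the subgroup of squares modulo a small safe prime q = 2p + 1 found by trial division (far too small for real use), and `hash_to_group` is a hypothetical stand-in for H built from SHA-256.

```python
import hashlib
import secrets


def is_prime(n):
    # Trial division; adequate only for the toy parameter size used here.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def toy_safe_prime(start):
    # Smallest q >= start with q and (q - 1) / 2 both prime.
    q = start | 1
    while not (is_prime(q) and is_prime((q - 1) // 2)):
        q += 2
    return q


q = toy_safe_prime(10**9)   # modulus (toy size, assumption)
p = (q - 1) // 2            # prime order of the subgroup of squares mod q


def hash_to_group(identifier):
    # Stand-in for H: hash into [1, q-1], then square to land in the subgroup.
    digest = hashlib.sha256(identifier.encode()).digest()
    h = 1 + int.from_bytes(digest, "big") % (q - 1)
    return pow(h, 2, q)


# TTP: secret key x_i for Researcher R_i, split as mu * nu = x_i (mod p).
x_i = 1 + secrets.randbelow(p - 1)
mu = 1 + secrets.randbelow(p - 1)
nu = x_i * pow(mu, -1, p) % p     # modular inverse needs Python 3.8+

ids = ["alice", "bob"]            # hypothetical identifiers held by S_j
data = ["drug A", "drug B"]

# Supplier S_j (knows mu only) sends H(id)^mu to R_i.
blinded = [pow(hash_to_group(i), mu, q) for i in ids]

# Researcher R_i (knows nu only) finishes: (H(id)^mu)^nu = H(id)^{x_i}.
pseudonyms = [pow(b, nu, q) for b in blinded]
database = list(zip(pseudonyms, data))

# Reference value computed directly with x_i, which neither party holds.
expected = [pow(hash_to_group(i), x_i, q) for i in ids]
```

The equality of `pseudonyms` and `expected` relies on every group element having order dividing p, so exponents can be reduced modulo p.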
14
Pseudonymization with public key crypto (II)
Matching the pseudonymized databases of R_i and R_d:
- This can be done by R_i and R_d with a 1-round interactive protocol, provided certain keys are obtained off-line from the TTP
- Neither R_i nor R_d learns its pseudonymizing key (x_i, x_d), even if they collude
- R_d only learns D(id, R_i) for ids in the intersection
- Security is based on the Decision Diffie-Hellman assumption
[Diagram: R_d holds (H(id_m)^{x_d}, D(id_m, R_d)), …, (H(id*)^{x_d}, D(id*, R_d)), …, (H(id_t)^{x_d}, D(id_t, R_d)); R_i holds (H(id_1)^{x_i}, D(id_1, R_i)), …, (H(id*)^{x_i}, D(id*, R_i)), …, (H(id_n)^{x_i}, D(id_n, R_i))]
15
Pseudonymization with public key crypto (III)
Advantages:
- Matching is possible
- Disallowed matching among malicious Researchers is prevented
- The TTP is not a bottleneck (it only delivers off-line crypto keys)
Drawbacks:
- Suppliers must collaborate for every pseudonymization
- Interactive protocols (on-line communication)
16
Advanced setting
17
Properties
- Suppliers and Accumulators are assumed Honest-But-Curious
- Researchers are assumed Malicious
- Accumulators’ intersection and union operations are non-interactive
- Two levels of pseudonymization, corresponding to the different levels of trust
- It uses ‘composite bilinear groups’
18
Governance
From a functional perspective, the use of these protocols is governed by a Regulatory Privacy Body (RPB).
A strict licensing infrastructure will be enforced by the RPB, describing:
- Which parties are allowed to perform which protocols with each other
- What kind of data can be exchanged
- Which subsets of identities or pseudonyms are allowed as input to the protocols
19
Thanks!