Published by Charlie Holliman. Modified over 9 years ago.
1
On method-specific record linkage for risk assessment
Jordi Nin, Javier Herranz, Vicenç Torra
2
Contents
Disclosure Risk Scenario: How an intruder re-identifies an individual
Preliminaries: Protection methods and record linkage
Location record linkage: A new way to compute the disclosure risk
Conclusions and future work
3
Disclosure Risk Scenario | Preliminaries | Location Record Linkage | Conclusions and future work
4
Disclosure Risk Scenario
Data set X: n records (rows), a attributes (columns)
Attribute classification
Identifiers: Passport number
Quasi-identifiers: Age, postal code
Confidential: Income
Example records:
id | Sex | Marital status | Income
1 | Male | Single | 13.500
2 | ... | … | 11.000
... | ... | … | …
5
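Disclosure Risk Scenario
Re-identification scenario
$X = id \,\|\, X_{nc} \,\|\, X_c$ (original file: identifiers, non-confidential quasi-identifiers, confidential attributes)
$X' = X'_{nc} \,\|\, X_c$ (released file: protected quasi-identifiers, unchanged confidential attributes)
Privacy is ensured: the quasi-identifiers are anonymized.
Data quality is preserved: the confidential attributes are released unchanged.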
6
Disclosure Risk Scenario
Record linkage
Data set 1: $X_1, X_2, X_3, X_4$
Data set 2: $X'_1, X'_2, X'_3, X'_4$
Problem: find a correct mapping between the records of data file 1 and data file 2.
7
Disclosure Risk Scenario
Distance-based record linkage
The nearest pairs of records are considered linked pairs
Very easy to tune
Results depend strongly on the parameters
Moderate time cost
Probabilistic record linkage
Linked pairs are computed using conditional probabilities
Tuning is difficult
Few parameters
High time cost
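The distance-based variant described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the standardization step, and the use of squared Euclidean distance are assumptions, and the re-identification rate assumes that row i of the protected file was derived from row i of the original file.

```python
import numpy as np


def distance_based_linkage(original, protected):
    """Link each protected record to its nearest original record.

    Both inputs are 2-D arrays (one record per row, same columns).
    Returns, for every protected row, the index of the nearest original row.
    """
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    # Assumption: standardize with the original file's statistics so that
    # no single attribute dominates the distance.
    mean = original.mean(axis=0)
    std = original.std(axis=0)
    std[std == 0] = 1.0
    a = (original - mean) / std
    b = (protected - mean) / std
    # Pairwise squared Euclidean distances, shape (n_protected, n_original).
    d2 = ((b[:, None, :] - a[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def reidentification_rate(links):
    """Fraction of protected records linked to the original row with the
    same index (assumed to be the true match)."""
    links = np.asarray(links)
    return float(np.mean(links == np.arange(len(links))))
```

Every protected record is compared against every original record, which is where the time cost comes from.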
8
Disclosure Risk Scenario | Preliminaries | Location Record Linkage | Conclusions and future work
9
Preliminaries
Rank swapping with parameter p
Algorithm
For each attribute j, where 1 ≤ j ≤ n:
  Sort attribute j
  Swap each value x_ij with a value x_il, where i < l ≤ i + p
  Undo the sorting of attribute j
End for
End algorithm
Properties: simple; preserves µ and σ; all combinations disappear
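A minimal sketch of the rank swapping loop above, under stated assumptions: `p` is read as a percentage of the number of records (as in the "p = 20%" example), each value is swapped at most once, and the swap partner is drawn uniformly at random from the not-yet-swapped values inside the window. The name `rank_swap` and the `seed` parameter are hypothetical.

```python
import numpy as np


def rank_swap(data, p, seed=None):
    """Rank-swap every column of `data` independently.

    p is a percentage: a value at rank i may only be swapped with a value
    at rank l, where i < l <= i + window and window = p% of the records.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n, n_attrs = data.shape
    window = max(1, int(n * p / 100))
    protected = data.copy()
    for j in range(n_attrs):
        order = np.argsort(data[:, j], kind="stable")  # sort attribute j
        values = data[order, j]
        swapped = np.zeros(n, dtype=bool)
        for i in range(n):
            if swapped[i]:
                continue
            hi = min(n - 1, i + window)
            candidates = [l for l in range(i + 1, hi + 1) if not swapped[l]]
            if not candidates:
                continue  # no free partner left inside the window
            l = rng.choice(candidates)
            values[i], values[l] = values[l], values[i]
            swapped[i] = swapped[l] = True
        protected[order, j] = values  # undo the sorting
    return protected
```

Because each column is only permuted, its mean and variance are unchanged, while the link between the values of different attributes in a record is broken.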
10
Preliminaries
Rank swapping example, p = 20%
Original values: 8 6 10 7 9 2 1 4 5 3
Sorted values: 1 2 3 4 5 6 7 8 9 10
11
Preliminaries
Microaggregation with parameter k
a = 1 (one attribute): optimal
a > 1 (several attributes): NP-hard, heuristic
Example: k = 3
12
Preliminaries
Optimal univariate microaggregation
Result 1. When the elements are sorted according to an attribute, the elements in each cluster of any optimal partition are contiguous (non-overlapping clusters exist).
Result 2. All clusters of any optimal partition have between k and 2k-1 elements.
Example: $x_1, x_2, x_3, x_4$ with k = 2
Clusters are built using the nodes of the shortest path algorithm.
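The two results above make the optimal partition computable: only contiguous clusters of k to 2k-1 sorted elements need to be considered. The sketch below is one way to do it, assuming the cost of a cluster is its within-cluster sum of squared errors. It runs a dynamic program over the sorted values, which amounts to a shortest path on a graph whose nodes are the cut positions 0..n; the function name is hypothetical and the slides do not give the authors' exact formulation.

```python
import numpy as np


def optimal_univariate_microaggregation(values, k):
    """Replace each value by the mean of its cluster in the partition that
    minimizes the total within-cluster squared error, using contiguous
    clusters of size k..2k-1 over the sorted values."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    order = np.argsort(values, kind="stable")
    x = values[order]
    # Prefix sums give the squared error of any slice x[i:j] in O(1).
    s1 = np.concatenate([[0.0], np.cumsum(x)])
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])

    def sse(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = np.full(n + 1, np.inf)   # best[j]: cheapest way to cover x[:j]
    prev = np.full(n + 1, -1, dtype=int)
    best[0] = 0.0
    for j in range(k, n + 1):
        for i in range(max(0, j - 2 * k + 1), j - k + 1):
            cost = best[i] + sse(i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk back along the cheapest path and fill in the cluster means.
    protected_sorted = np.empty(n)
    j = n
    while j > 0:
        i = prev[j]
        protected_sorted[i:j] = x[i:j].mean()
        j = i
    protected = np.empty(n)
    protected[order] = protected_sorted
    return protected
```

The result is returned in the original record order, so it can be compared row by row with the input.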
13
Preliminaries
MDAV microaggregation
Example with k = 2: X and X'
MDAV is a multivariate heuristic microaggregation method.
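The slide does not spell out the MDAV steps, so the following is a simplified sketch of the heuristic as it is commonly described (repeatedly build a cluster of k records around the record farthest from the centroid, and another around the record farthest from that one). It is not necessarily the exact variant used by the authors; the function name and the return value are assumptions.

```python
import numpy as np


def mdav(data, k):
    """Simplified MDAV sketch: returns (protected_data, clusters)."""
    data = np.asarray(data, dtype=float)
    if len(data) < k:
        raise ValueError("need at least k records")
    remaining = np.arange(len(data))
    clusters = []

    def farthest_from(point, idx):
        d = ((data[idx] - point) ** 2).sum(axis=1)
        return idx[d.argmax()]

    def take_cluster(center, idx):
        # The k remaining records closest to `center` form one cluster.
        d = ((data[idx] - data[center]) ** 2).sum(axis=1)
        chosen = idx[np.argsort(d, kind="stable")[:k]]
        return chosen, np.setdiff1d(idx, chosen)

    while len(remaining) >= 3 * k:
        centroid = data[remaining].mean(axis=0)
        r = farthest_from(centroid, remaining)   # most distant from centroid
        s = farthest_from(data[r], remaining)    # most distant from r
        for center in (r, s):
            cluster, remaining = take_cluster(center, remaining)
            clusters.append(cluster)
    if len(remaining) >= 2 * k:
        centroid = data[remaining].mean(axis=0)
        r = farthest_from(centroid, remaining)
        cluster, remaining = take_cluster(r, remaining)
        clusters.append(cluster)
    if len(remaining) > 0:
        clusters.append(remaining)               # last k..2k-1 records
    protected = data.copy()
    for cluster in clusters:
        protected[cluster] = data[cluster].mean(axis=0)
    return protected, clusters
```

Unlike the univariate case, each cluster is built from whole records, so all attributes of a record are replaced by the same cluster centroid.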
14
Preliminaries
Score: protection method evaluation
$\mathrm{Score} = 0.5\,IL + 0.5\,DR$
$IL = 100\,(0.2\,IL_1 + 0.2\,IL_2 + 0.2\,IL_3 + 0.2\,IL_4 + 0.2\,IL_5)$
$IL_1$ = mean absolute error
$IL_2$ = mean variation of the average
$IL_3$ = mean variation of the variance
$IL_4$ = mean variation of the covariance
$IL_5$ = mean variation of the correlation
$DR = 0.25\,DLD + 0.25\,PLD + 0.5\,ID$
DLD = number of links using distance-based record linkage (DBRL)
PLD = number of links using probabilistic record linkage (PRL)
ID = protected values near the original ones
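The score arithmetic above is easy to get wrong by a factor of 100, so here is a small worked sketch. The function names are hypothetical, and the code assumes that DLD, PLD and ID are already expressed on a 0 to 100 scale while the five IL components are fractions; the slides do not state the units.

```python
def information_loss(il_components):
    """IL = 100 * (0.2*IL1 + 0.2*IL2 + 0.2*IL3 + 0.2*IL4 + 0.2*IL5)."""
    if len(il_components) != 5:
        raise ValueError("expected the five components IL1..IL5")
    return 100 * sum(0.2 * c for c in il_components)


def disclosure_risk(dld, pld, interval_disclosure):
    """DR = 0.25*DLD + 0.25*PLD + 0.5*ID."""
    return 0.25 * dld + 0.25 * pld + 0.5 * interval_disclosure


def score(il, dr):
    """Score = 0.5*IL + 0.5*DR (lower means a better trade-off)."""
    return 0.5 * il + 0.5 * dr
```

For instance, with made-up inputs IL components (0.1, 0.2, 0.3, 0.1, 0.3) and DLD = 40, PLD = 20, ID = 60, the information loss is about 20, the disclosure risk is 45, and the score is about 32.5.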
15
Disclosure Risk Scenario | Preliminaries | Location Record Linkage | Conclusions and future work
16
Location Problem Description
L-RL: Location record linkage
Standard record linkage compares all records.
Rank swapping, univariate microaggregation, and other methods use only some of the original records to create each protected record.
It is therefore unnecessary to compare all the records.
17
Location record linkage
Method description
X ext X'
18
Location record linkage
Example: rank swapping, p = 20%
Figure values: 17 6 13 14 16 19 12 5 16 (distance)
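For rank swapping, the location idea can be sketched as follows: a protected value can only have come from an original record whose rank in that attribute lies within the swap window, so the candidates for a protected record are the originals that pass this test for every attribute. The code is a minimal illustration under assumptions, not the authors' procedure: `p` is a percentage of the number of records, the final choice among candidates uses plain squared Euclidean distance, and both function names are hypothetical.

```python
import numpy as np


def rank_swap_candidates(original, protected_record, p):
    """Indices of original records that could have produced
    `protected_record` under rank swapping with parameter p (percent)."""
    original = np.asarray(original, dtype=float)
    protected_record = np.asarray(protected_record, dtype=float)
    n, n_attrs = original.shape
    window = max(1, int(n * p / 100))
    possible = np.ones(n, dtype=bool)
    for j in range(n_attrs):
        column = original[:, j]
        sorted_col = np.sort(column)
        ranks = np.argsort(np.argsort(column, kind="stable"), kind="stable")
        # Rank range occupied by the protected value in the sorted column.
        lo = np.searchsorted(sorted_col, protected_record[j], side="left")
        hi = np.searchsorted(sorted_col, protected_record[j], side="right") - 1
        possible &= (ranks >= lo - window) & (ranks <= hi + window)
    return np.flatnonzero(possible)


def location_record_linkage(original, protected, p):
    """Link each protected record to the nearest original record among its
    candidates only; -1 means no candidate survived the intersection."""
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    links = np.full(len(protected), -1)
    for i, record in enumerate(protected):
        candidates = rank_swap_candidates(original, record, p)
        if len(candidates) == 0:
            continue
        d = ((original[candidates] - record) ** 2).sum(axis=1)
        links[i] = candidates[d.argmin()]
    return links
```

When the intersection leaves a single candidate, the record is linked without computing any distance against the rest of the file, which is the saving the location approach aims for.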
19
Location record linkage
Rank swapping experiments
Data sets: Census (1080 records, 13 attributes); EIA (4092 records, 10 attributes)
Rank swapping configurations: p = 2 … 20
Score modification: $DR = 0.166\,LLD + 0.166\,DLD + 0.166\,PLD + 0.5\,ID$
20
Location record linkage
L-RL: Rank swapping linkage results
21
Location record linkage
L-RL: Rank swapping score results
22
Location record linkage
Univariate microaggregation experiments
Data sets: Census (1080 records, 13 attributes); EIA (4092 records, 10 attributes)
Univariate microaggregation configurations: k = 10 … 50
Score modification: $DR = 0.166\,LLD + 0.166\,DLD + 0.166\,PLD + 0.5\,ID$
23
Location record linkage
L-RL: Univariate microaggregation linkage results
24
Location record linkage
L-RL: Univariate microaggregation score results
25
Location record linkage
MDAV experiments
Data sets: Census (1080 records, 13 attributes); EIA (4092 records, 10 attributes)
MDAV configurations: k = 10 … 50
Score modification: $DR = 0.166\,LLD + 0.166\,DLD + 0.166\,PLD + 0.5\,ID$
26
Location record linkage
L-RL: MDAV linkage results
27
Location record linkage
L-RL: MDAV score results
28
Disclosure Risk Scenario | Preliminaries | Location Problem Description | Location Record Linkage | Conclusions and future work
29
Conclusions and future work
Conclusions
We have presented a new type of record linkage designed to exploit the limitations of some protection methods.
The L-RL method obtains a more accurate DR evaluation for rank swapping and univariate microaggregation.
MDAV is immune to the location problem.
Future work
We plan to study the DR of MDAV and other protection methods using other ad hoc methods.