On method-specific record linkage for risk assessment. Jordi Nin, Javier Herranz, Vicenç Torra.


1 On method-specific record linkage for risk assessment. Jordi Nin, Javier Herranz, Vicenç Torra

2 Contents
- Disclosure Risk Scenario: how an intruder re-identifies an individual
- Preliminaries: protection methods and record linkage
- Location record linkage: a new way to compute the disclosure risk
- Conclusions and future work

3 Outline: Disclosure Risk Scenario; Preliminaries; Location Record Linkage; Conclusions and future work

4 Disclosure Risk Scenario: attribute classification
Data set X (the figure labels its dimensions n and a).
- Identifiers: passport number
- Quasi-identifiers: age, postal code
- Confidential: income
Example table:
id  | Sex  | Marital status | Income
1   | Male | Single         | 13.500
2   | ...  | ...            | 11.000
... | ...  | ...            | ...

5 Disclosure Risk Scenario: re-identification scenario
X = id || X_nc || X_c
X' = X'_nc || X_c
- Privacy is ensured: quasi-identifiers are anonymized
- Data quality is preserved: confidential attributes are preserved

6 Disclosure Risk Scenario: record linkage
Data set 1: X_1, X_2, X_3, X_4
Data set 2: X'_1, X'_2, X'_3, X'_4
Problem: find a correct mapping between data set 1 and data set 2.

7 Disclosure Risk Scenario: record linkage methods
Distance-based record linkage:
- The nearest pairs of records are considered linked pairs
- Very easy to tune
- Results depend heavily on the parameters
- Moderate time cost
Probabilistic record linkage:
- Linked pairs are computed using conditional probabilities
- Tuning is difficult
- Few parameters
- High time cost
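As an illustration of the distance-based variant, the following is a minimal sketch rather than the authors' implementation. It assumes numeric attributes, standardization by the original data's mean and standard deviation, squared Euclidean distance, and that record i of the protected file corresponds to record i of the original file. The function names are hypothetical.

```python
import numpy as np

def distance_based_linkage(original, protected):
    """For each protected record, return the index of the nearest original record."""
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    # Standardize both files with the original file's statistics
    # so that no single attribute dominates the distance (an assumption).
    mu = original.mean(axis=0)
    sigma = original.std(axis=0)
    sigma[sigma == 0] = 1.0
    o = (original - mu) / sigma
    p = (protected - mu) / sigma
    # Pairwise squared Euclidean distances, shape (n_protected, n_original).
    d = ((p[:, None, :] - o[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def linkage_rate(links):
    """Fraction of protected records linked to their true original record.

    Assumes protected record i was derived from original record i.
    """
    links = np.asarray(links)
    return float(np.mean(links == np.arange(len(links))))

original = [[1, 10], [2, 20], [3, 30]]
protected = [[1.1, 11], [2.1, 19], [2.9, 31]]
links = distance_based_linkage(original, protected)
rate = linkage_rate(links)
```

The pairwise distance matrix makes the cost quadratic in the number of records, which is consistent with the "moderate time cost" noted on the slide.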

8 Outline: Disclosure Risk Scenario; Preliminaries; Location Record Linkage; Conclusions and future work

9 Preliminaries: rank swapping with parameter p
Algorithm
  For each attribute j, where 1 ≤ j ≤ n:
    Sort attribute j
    Swap each value x_ij with a value x_lj, where i < l ≤ i + p
    Undo the sorting of attribute j
  End for
End algorithm
Properties:
- Simple
- Preserves µ and σ
- The original combinations of values across attributes disappear
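The steps above can be sketched for a single attribute as follows. This is a hedged sketch, not the authors' code: it assumes p is a percentage of the number of records, that the swap partner is drawn uniformly at random among the not-yet-swapped ranks in the window, and that a value with no available partner is left unchanged. The name `rank_swap_column` is hypothetical.

```python
import random

def rank_swap_column(values, p, rng=None):
    """Rank-swap one attribute; p is the window size as a percentage of n."""
    rng = rng or random.Random()
    n = len(values)
    window = max(1, int(n * p / 100))
    # order[r] is the index of the record whose value has rank r.
    order = sorted(range(n), key=lambda i: values[i])
    ranked = [values[i] for i in order]
    swapped = [False] * n
    for r in range(n):
        if swapped[r]:
            continue
        # Candidate partners: unswapped ranks l with r < l <= r + window.
        candidates = [l for l in range(r + 1, min(n, r + window + 1))
                      if not swapped[l]]
        if not candidates:
            continue  # no partner available; this value stays as it is
        l = rng.choice(candidates)
        ranked[r], ranked[l] = ranked[l], ranked[r]
        swapped[r] = swapped[l] = True
    # Undo the sorting: put each swapped value back at its record position.
    out = [None] * n
    for r, i in enumerate(order):
        out[i] = ranked[r]
    return out

values = [8, 6, 10, 7, 9, 2, 1, 4, 5, 3]
protected = rank_swap_column(values, p=20, rng=random.Random(0))
```

Because the swap only permutes the values of an attribute, the multiset of values (and therefore the mean and standard deviation) is unchanged, and no value moves more than the window size in rank.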

10 Preliminaries: rank swapping example, p = 20%
Values: 8 6 10 7 9 2 1 4 5 3
Sorted: 1 2 3 4 5 6 7 8 9 10

11 Preliminaries: microaggregation with parameter k (figure: example with k = 3)
- a = 1 → optimal
- a > 1, NP-hard → heuristic

12 Preliminaries: optimal univariate microaggregation
Result 1. When the elements are sorted according to an attribute, the elements in each cluster of any optimal partition are contiguous (clusters do not overlap).
Result 2. All clusters of any optimal partition have between k and 2k-1 elements.
Figure: graph over x_1, x_2, x_3, x_4 with k = 2. Clusters are built from the nodes on the shortest path in this graph.
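Results 1 and 2 together mean that an optimal partition can be found by searching only over contiguous groups of k to 2k-1 sorted values. The sketch below does this with dynamic programming over prefixes, which is equivalent to a shortest path in a graph whose edges are admissible groups. It assumes the objective is the within-group sum of squared errors, which the slide does not state; the function name is hypothetical.

```python
def optimal_univariate_microaggregation(values, k):
    """Return the groups of an SSE-optimal partition of the sorted values."""
    xs = sorted(values)
    n = len(xs)
    if n < k:
        raise ValueError("need at least k values")
    # Prefix sums of x and x^2 give the SSE of any contiguous group in O(1).
    s, s2 = [0.0], [0.0]
    for x in xs:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Sum of squared errors of the group xs[i:j] around its mean."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimal cost of partitioning xs[:j]
    prev = [-1] * (n + 1)    # prev[j]: start of the last group in that partition
    best[0] = 0.0
    for j in range(k, n + 1):
        # The last group xs[i:j] must have between k and 2k-1 elements.
        for i in range(max(0, j - (2 * k - 1)), j - k + 1):
            cost = best[i] + sse(i, j)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    # Walk back along the shortest path to recover the groups.
    groups, j = [], n
    while j > 0:
        groups.append(xs[prev[j]:j])
        j = prev[j]
    groups.reverse()
    return groups

groups_k3 = optimal_univariate_microaggregation([12, 1, 10, 3, 2, 11], k=3)
groups_k2 = optimal_univariate_microaggregation([1, 2, 10, 11, 12], k=2)
```

Each protected value would then be the mean of its group; that final replacement step is omitted here to keep the sketch short.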

13 Preliminaries: MDAV microaggregation (figure: X and X' for k = 2)
MDAV is a multivariate heuristic microaggregation method.
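The slide only names MDAV, so the sketch below follows the procedure as it is commonly described elsewhere, not anything stated in this deck: repeatedly take the record farthest from the centroid and the record farthest from that one, build a group of k records around each, and treat the leftovers as one or two final groups. Squared Euclidean distance on unscaled data and the function name `mdav` are assumptions.

```python
import numpy as np

def mdav(data, k):
    """Return (protected data, groups) for an MDAV-style microaggregation."""
    x = np.asarray(data, dtype=float)
    if len(x) < k:
        raise ValueError("need at least k records")
    remaining = list(range(len(x)))
    groups = []

    def farthest_from(point, idxs):
        d = ((x[idxs] - point) ** 2).sum(axis=1)
        return idxs[int(d.argmax())]

    def nearest_k(seed, idxs):
        d = ((x[idxs] - x[seed]) ** 2).sum(axis=1)
        return [idxs[i] for i in np.argsort(d, kind="stable")[:k]]

    def extract(seed):
        nonlocal remaining
        group = nearest_k(seed, remaining)
        groups.append(group)
        remaining = [i for i in remaining if i not in group]

    # Main loop: two groups per iteration while at least 3k records remain.
    while len(remaining) >= 3 * k:
        centroid = x[remaining].mean(axis=0)
        r = farthest_from(centroid, remaining)
        s = farthest_from(x[r], remaining)
        extract(r)
        extract(s)
    # Between 2k and 3k-1 records left: one more group around the farthest record.
    if len(remaining) >= 2 * k:
        centroid = x[remaining].mean(axis=0)
        extract(farthest_from(centroid, remaining))
    # Whatever is left (between k and 2k-1 records) forms the last group.
    if remaining:
        groups.append(remaining)

    protected = x.copy()
    for g in groups:
        protected[g] = x[g].mean(axis=0)  # replace records by the group centroid
    return protected, groups

data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
protected, groups = mdav(data, k=3)
```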

14 Preliminaries: Score, protection method evaluation
Score = 0.5 IL + 0.5 DR
IL = 100 (0.2 IL_1 + 0.2 IL_2 + 0.2 IL_3 + 0.2 IL_4 + 0.2 IL_5)
- IL_1 = mean absolute error
- IL_2 = mean variation of the averages
- IL_3 = mean variation of the variances
- IL_4 = mean variation of the covariances
- IL_5 = mean variation of the correlations
DR = 0.25 DLD + 0.25 PLD + 0.5 ID
- DLD = number of links using DBRL (distance-based record linkage)
- PLD = number of links using PRL (probabilistic record linkage)
- ID = protected values near the original ones
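The weights above combine as in the following small sketch. The function names are hypothetical and the input numbers are made up purely to show the arithmetic; it assumes the IL components are fractions and that DLD, PLD and ID are expressed on a 0 to 100 scale.

```python
def information_loss(il_components):
    """IL = 100 * (0.2 * IL_1 + ... + 0.2 * IL_5)."""
    if len(il_components) != 5:
        raise ValueError("expected the five components IL_1 .. IL_5")
    return 100 * sum(0.2 * c for c in il_components)

def disclosure_risk(dld, pld, interval_disclosure):
    """DR = 0.25 DLD + 0.25 PLD + 0.5 ID."""
    return 0.25 * dld + 0.25 * pld + 0.5 * interval_disclosure

def score(il, dr):
    """Score = 0.5 IL + 0.5 DR; lower means a better trade-off."""
    return 0.5 * il + 0.5 * dr

# Illustrative values only, not results from the presentation.
il = information_loss([0.1, 0.2, 0.3, 0.4, 0.5])
dr = disclosure_risk(dld=20, pld=40, interval_disclosure=30)
total = score(il, dr)
```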

15 Outline: Disclosure Risk Scenario; Preliminaries; Location Record Linkage; Conclusions and future work

16 Location Problem Description
L-RL: Location Record Linkage
- Standard record linkage compares all records
- Rank swapping, univariate microaggregation and other methods only use some original records to create the protected data set
- It is therefore unnecessary to compare all the records
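For rank swapping, one way to read this idea is that a protected value can only have come from an original value at most p% of the ranks away, so the candidate originals for a protected record are those that fall inside that window on every attribute. The sketch below implements this reading; it is an interpretation, not the authors' exact L-RL procedure. The function names, the fallback to all records when no candidate survives, and the unscaled squared Euclidean distance are assumptions.

```python
import numpy as np

def candidate_mask(original, protected_record, p):
    """Mark original records that could have produced the protected record."""
    original = np.asarray(original, dtype=float)
    n, m = original.shape
    window = max(1, int(n * p / 100))
    mask = np.ones(n, dtype=bool)
    for j in range(m):
        col = np.sort(original[:, j])
        # Rank range occupied by the protected value in the sorted original column.
        lo = np.searchsorted(col, protected_record[j], side="left")
        hi = np.searchsorted(col, protected_record[j], side="right") - 1
        # Original values within `window` ranks on either side are compatible.
        lo_val = col[max(0, lo - window)]
        hi_val = col[min(n - 1, hi + window)]
        mask &= (original[:, j] >= lo_val) & (original[:, j] <= hi_val)
    return mask

def location_record_linkage(original, protected, p):
    """Link each protected record to its nearest compatible original record."""
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    links = np.empty(len(protected), dtype=int)
    for r, rec in enumerate(protected):
        idx = np.flatnonzero(candidate_mask(original, rec, p))
        if idx.size == 0:
            idx = np.arange(len(original))  # fallback: compare against everything
        d = ((original[idx] - rec) ** 2).sum(axis=1)
        links[r] = idx[d.argmin()]
    return links

original = [[i, 10 * i] for i in range(1, 11)]
# Record 0 after its first attribute was swapped with record 1 (hypothetical example).
mask = candidate_mask(original, [2, 10], p=20)
links = location_record_linkage(original, [[2, 10]], p=20)
```

In this toy example only three of the ten original records remain candidates, which is the point of the slide: most comparisons made by standard record linkage are unnecessary.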

17 Location record linkage: method description (figure labels: X ext X')

18 Location record linkage example: rank swapping, p = 20% (figure values: 17 6 13 14 16 19 12 5 16; figure label: Distance)

19 Location record linkage: rank swapping experiments
Data sets:
- Census (1080 records, 13 attributes)
- EIA (4092 records, 10 attributes)
Rank swapping configurations: p = 2 … 20
Score modification: DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID
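The modified disclosure risk gives the three linkage measures equal weight. A minimal sketch of the arithmetic follows, using the rounded weight 0.166 exactly as written on the slide (so the weights sum to 0.998 rather than 1). LLD is taken here to be the number of links found by location record linkage; the function name and the input values are hypothetical.

```python
def disclosure_risk_with_lld(lld, dld, pld, interval_disclosure, w=0.166):
    """DR = w * LLD + w * DLD + w * PLD + 0.5 * ID, with w = 0.166 as on the slide."""
    return w * lld + w * dld + w * pld + 0.5 * interval_disclosure

# Illustrative values only, not results from the presentation.
dr_modified = disclosure_risk_with_lld(lld=30, dld=20, pld=40, interval_disclosure=30)
```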

20 Location record linkage, L-RL: rank swapping linkage results (figure)

21 Location record linkage, L-RL: rank swapping score results (figure)

22 Location record linkage: univariate microaggregation experiments
Data sets:
- Census (1080 records, 13 attributes)
- EIA (4092 records, 10 attributes)
Univariate microaggregation configurations: k = 10 … 50
Score modification: DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID

23 Location record linkage, L-RL: univariate microaggregation linkage results (figure)

24 Location record linkage, L-RL: univariate microaggregation score results (figure)

25 Location record linkage: MDAV experiments
Data sets:
- Census (1080 records, 13 attributes)
- EIA (4092 records, 10 attributes)
MDAV configurations: k = 10 … 50
Score modification: DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID

26 Location record linkage, L-RL: MDAV linkage results (figure)

27 Location record linkage, L-RL: MDAV score results (figure)

28 Outline: Disclosure Risk Scenario; Preliminaries; Location Problem Description; Location Record Linkage; Conclusions and future work

29 Conclusions and future work
Conclusions:
- We have presented a new type of record linkage designed to exploit the limitations of some protection methods
- The L-RL method obtains a more accurate DR evaluation for rank swapping and univariate microaggregation
- MDAV is immune to the location problem
Future work:
- We plan to study the DR of MDAV and other protection methods using other ad hoc methods

30 On method-specific record linkage for risk assessment. Jordi Nin, Javier Herranz, Vicenç Torra

