Combinations of SDC methods for continuous microdata Anna Oganian National Institute of Statistical Sciences.

1 Combinations of SDC methods for continuous microdata Anna Oganian National Institute of Statistical Sciences

2 Introduction
SDC methods have two goals:
- Minimize Disclosure Risk
- Maximize Data Utility
Methods for continuous microdata:
- Rankswapping
- Additive noise
- Resampling
- Microaggregation: based on one variable, or based on several variables

3 Why combinations?
Methods have very different properties, so by combining them we can improve utility.
Example: microaggregation with z-scores projection. [Figure: red points are microaggregated data, green points are original data.]
For data close to normal we can add normal noise: Mic(O) + N(0, Cov(O) - Cov(Mic(O)))
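The combination Mic(O) + N(0, Cov(O) - Cov(Mic(O))) can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes that "z-scores projection" means sorting records by the sum of their standardized values and averaging consecutive groups of k records, and the function names `microaggregate_zscores` and `add_restoring_noise` are hypothetical.

```python
import numpy as np


def microaggregate_zscores(data, k):
    """Replace each record by the mean of its group of about k records.

    Assumption: records are ordered by the sum of their z-scores and
    grouped consecutively; the last group absorbs any remainder.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    order = np.argsort(z.sum(axis=1), kind="stable")
    masked = np.empty_like(data)
    n_groups = max(n // k, 1)
    for g in range(n_groups):
        start = g * k
        stop = (g + 1) * k if g < n_groups - 1 else n
        idx = order[start:stop]
        masked[idx] = data[idx].mean(axis=0)
    return masked


def add_restoring_noise(original, microaggregated, rng):
    """Add N(0, Cov(O) - Cov(Mic(O))) noise to the microaggregated data."""
    diff = np.cov(original, rowvar=False) - np.cov(microaggregated, rowvar=False)
    diff = (diff + diff.T) / 2.0
    # The difference is the within-group covariance, so it should be
    # positive semidefinite; clip tiny negative eigenvalues from rounding.
    vals, vecs = np.linalg.eigh(diff)
    factor = vecs * np.sqrt(np.clip(vals, 0.0, None))
    noise = rng.standard_normal(microaggregated.shape) @ factor.T
    return microaggregated + noise


rng = np.random.default_rng(0)
cov = [[1.0, 0.6, 0.3], [0.6, 1.0, 0.4], [0.3, 0.4, 1.0]]
original = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=2000)
mic = microaggregate_zscores(original, k=3)
released = add_restoring_noise(original, mic, rng)
```

Because the noise is independent of Mic(O), the covariance of the released data is approximately Cov(Mic(O)) + (Cov(O) - Cov(Mic(O))) = Cov(O), which is the utility gain the slide refers to.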

4 Performance measures
- Propensity score utility measure (Mi-Ja work)
- Two kinds of disclosure risk (DR):
  - identification disclosure
  - attribute disclosure
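The slide names the propensity score utility measure without giving its formula. One common formulation, assumed here, stacks the original and masked records, fits a logistic model that predicts which file each record came from, and averages the squared distance between the fitted propensities and the share of masked records. The sketch below uses a main-effects model fitted by Newton steps; the model choice and the function name `propensity_score_utility` are assumptions, and the scale will not necessarily match the values reported later in the deck.

```python
import numpy as np


def propensity_score_utility(original, masked, n_iter=25, ridge=1e-3):
    """Mean squared deviation of fitted propensities from the masked share.

    Values near 0 mean the two files are hard to tell apart (high utility).
    Assumption: main-effects logistic regression on standardized variables.
    """
    original = np.asarray(original, dtype=float)
    masked = np.asarray(masked, dtype=float)
    stacked = np.vstack([original, masked])
    labels = np.r_[np.zeros(len(original)), np.ones(len(masked))]
    standardized = (stacked - stacked.mean(axis=0)) / stacked.std(axis=0)
    design = np.column_stack([np.ones(len(stacked)), standardized])

    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        # Clip the linear predictor so exp() cannot overflow.
        p = 1.0 / (1.0 + np.exp(-np.clip(design @ beta, -30.0, 30.0)))
        weights = p * (1.0 - p)
        grad = design.T @ (labels - p) - ridge * beta
        hess = (design * weights[:, None]).T @ design + ridge * np.eye(len(beta))
        beta = beta + np.linalg.solve(hess, grad)

    p = 1.0 / (1.0 + np.exp(-np.clip(design @ beta, -30.0, 30.0)))
    share_masked = len(masked) / len(stacked)
    return float(np.mean((p - share_masked) ** 2))


rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))
u_identical = propensity_score_utility(data, data.copy())
u_shifted = propensity_score_utility(data, data + 1.0)
```

With identical files the fitted propensities all equal the masked share, so the measure is essentially zero; a masking that shifts the data makes the files distinguishable and the measure grows.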

5 DR
- Identification disclosure: disclosure is considered to occur when the intruder can correctly identify a record in the released data file, that is, relate it to a particular individual.
- Attribute disclosure: the intruder's target is the original value of a particular attribute, for example the salary of a particular individual. Attribute disclosure therefore measures the gain in information about some attribute that the intruder achieves once the masked data are released. More precisely, it measures how tight the bounds on the original values can be made given the masked data.

6 Examples of attribute disclosure for several methods
Assumption: the SDC method and its parameters are released together with the data set.
Rankswapping: upper and lower bounds for every value in the masked data are: [formula not preserved in this transcript]. If the rankswapping algorithm is known, the intruder can find the distribution of the values within these intervals by running rankswapping a large number of times on a vector of length N.

7 Rankswapping example
Suppose data set X has 1000 records and variable j in X is lognormal, with range [0.04, 25.57]. Rankswapping with p=5% was applied to this data set.
- Choose a value in the dense area of the masked data, x_j = 0.50: the lower and upper bounds for the corresponding original value are ]0.42, 0.58[.
- Consider the largest value in the masked data, x = 25.57: using the distribution for the highest rank, we can find the 95% confidence interval [5.20, 6.92].
- Consider the smallest value in the masked data, x = 0.04: the 95% confidence interval for the corresponding original value is [0.07, 0.019].
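The bounds in this kind of example can be reproduced with a small simulation. The sketch below assumes one common variant of rankswapping, in which each value is swapped with a randomly chosen, not yet swapped value at most p% of N ranks above it; under that assumption the original value behind a masked value of rank r lies between the values of rank r - pN/100 and r + pN/100. The function names `rankswap` and `attribute_bounds` are hypothetical, and the numbers will differ from those on the slide.

```python
import numpy as np


def swap_window(n, p):
    """Maximum rank distance of a swap for n records and parameter p (in %)."""
    return max(int(n * p / 100), 1)


def rankswap(values, p, rng):
    """Swap each value with a random unswapped value within the rank window."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    window = swap_window(n, p)
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    swapped = sorted_vals.copy()
    used = np.zeros(n, dtype=bool)
    for i in range(n):
        if used[i]:
            continue
        candidates = [j for j in range(i + 1, min(i + window, n - 1) + 1) if not used[j]]
        if not candidates:
            continue  # no partner left: the value keeps its own rank
        j = rng.choice(candidates)
        swapped[i], swapped[j] = sorted_vals[j], sorted_vals[i]
        used[i] = used[j] = True
    masked = np.empty_like(values)
    masked[order] = swapped
    return masked


def attribute_bounds(masked, p):
    """Bounds an intruder who knows p can place on each original value.

    Rankswapping only permutes values, so the sorted masked values equal the
    sorted original values and the bounds need nothing but the masked file.
    """
    masked = np.asarray(masked, dtype=float)
    n = len(masked)
    window = swap_window(n, p)
    sorted_vals = np.sort(masked)
    ranks = np.argsort(np.argsort(masked, kind="stable"), kind="stable")
    lower = sorted_vals[np.clip(ranks - window, 0, n - 1)]
    upper = sorted_vals[np.clip(ranks + window, 0, n - 1)]
    return lower, upper


rng = np.random.default_rng(0)
original = rng.lognormal(size=1000)
masked = rankswap(original, p=5, rng=rng)
lower, upper = attribute_bounds(masked, p=5)
```

Repeating `rankswap` many times on the sorted vector would give the intruder the distribution of original values inside each interval, from which confidence intervals like those in the example can be read off.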

8 Noise addition
The variance of the added noise is [symbol not preserved in this transcript] and its mean is 0, so 100(1-α)% confidence regions around masked records x_m can be computed from the multivariate normal distribution: [formula not preserved in this transcript]
Example: [not preserved in this transcript]
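The region formula itself did not survive in this transcript. Assuming the noise is N(0, Σ) with Σ known to the intruder, the standard construction is the ellipsoid of points x whose squared Mahalanobis distance from x_m is at most the 1-α quantile of a chi-square distribution with d degrees of freedom. The sketch below checks membership in that region; the function name `in_confidence_region` and the example covariance are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def in_confidence_region(candidates, x_masked, noise_cov, alpha=0.05):
    """Flag candidate originals inside the 100(1-alpha)% region around x_masked.

    Uses (x - x_m)^T Sigma^{-1} (x - x_m) <= chi2 quantile with d degrees
    of freedom, where d is the number of variables.
    """
    x_masked = np.asarray(x_masked, dtype=float)
    diff = np.atleast_2d(np.asarray(candidates, dtype=float)) - x_masked
    solved = np.linalg.solve(np.asarray(noise_cov, dtype=float), diff.T).T
    squared_distance = np.einsum("ij,ij->i", diff, solved)
    return squared_distance <= chi2.ppf(1.0 - alpha, df=len(x_masked))


# Hypothetical example: one masked record and two candidate original records.
noise_cov = np.array([[0.5, 0.1], [0.1, 0.3]])
x_m = np.array([10.0, 4.0])
candidates = np.array([[10.2, 4.1], [14.0, 1.0]])
inside = in_confidence_region(candidates, x_m, noise_cov)
```

The smaller the noise covariance, the smaller this ellipsoid, and the tighter the bounds the intruder obtains on the original record.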

9 Several stages of masking
Ideally the security of an SDC method should be guaranteed by the masking algorithm itself and should not depend on keeping the parameters or details of the algorithm secret.
In cryptography: Data Encryption Standard (DES)

10 Combinations of the methods
- Decrease the risk
- We can even increase the utility of the resulting data if we combine the methods properly!
- For example: combine microaggregation with noise
Original data -> [Masking M1] -> M1(Original) -> [Masking M2] -> M2(M1(Original))

11 Several stages of masking
Or, in the general case: [formula not preserved in this transcript] where [definitions not preserved in this transcript]
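Staged masking of the form M2(M1(Original)) is function composition, so a chain of any length can be sketched as below. The helper `compose` and the two example stages are hypothetical illustrations, not the methods evaluated in the deck; the microaggregation stage here sorts on a single column and lets the last group be smaller than k.

```python
from functools import reduce

import numpy as np


def compose(*stages):
    """Return a masking that applies the given stages from left to right."""
    def pipeline(data):
        return reduce(lambda current, stage: stage(current), stages, data)
    return pipeline


def univariate_microaggregation(k, column=0):
    """Stage: average consecutive groups of k records sorted on one column."""
    def stage(data):
        data = np.asarray(data, dtype=float)
        order = np.argsort(data[:, column], kind="stable")
        out = np.empty_like(data)
        for start in range(0, len(data), k):
            idx = order[start:start + k]
            out[idx] = data[idx].mean(axis=0)
        return out
    return stage


def additive_noise(scale, rng):
    """Stage: add independent normal noise scaled by each column's std."""
    def stage(data):
        data = np.asarray(data, dtype=float)
        return data + rng.normal(0.0, scale * data.std(axis=0), size=data.shape)
    return stage


rng = np.random.default_rng(0)
original = rng.normal(size=(300, 2))
m1 = univariate_microaggregation(k=3)
m2 = additive_noise(scale=0.1, rng=rng)
masked = compose(m1, m2)(original)  # M2(M1(Original))
```

Keeping each masking as a plain function makes it easy to swap stages in and out when comparing combinations.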

12 Combinations of methods
- Microaggregation using z-scores projection, p=3, followed by microaggregation using z-scores projection, p=3 (Micz03_Micz03)
- Microaggregation using z-scores projection, p=3, followed by microaggregation using principal component projection, p=3 (Micz03_Micpcp03)
- Microaggregation using z-scores projection, p=3, followed by multivariate microaggregation, p=10 (Micz03_Micmul10)
- Microaggregation using z-scores projection, p=3, followed by rankswapping, p=1% (Micz03_Rank1)
- Single microaggregation using z-scores projection, p=3 (Micz03)

13 Propensity score utility
                    sym                             nonsym
                    high cor        low cor         high cor        low cor
                    pos     neg     pos     neg     pos     neg     pos     neg
micz03__micz03      23.5    28.3    37.3    33.9    45.1    180.1   40      94.1
micz03__micpcp03    12.8    9.3     7.9     9.3     5       3.4     8.6     5.8
micz03__micmul10    18.4    16.5    14.8    13      5.5     9.2     11.2    8.7
micz03__rank1       28.6    34.8    27.3    29.4    14.8    42      39.5    26.5
micz03              128.1   281.5   132.1   233.4   592.1   639.4   463.8   639

14 Identification DR
                    sym                             nonsym
                    high cor        low cor         high cor        low cor
                    pos     neg     pos     neg     pos     neg     pos     neg
micz03__micz03      0.0275  0.0015  0.0035  0.0023  0.0033  0.0004  0.0033  0.0009
micz03__micpcp03    0.0198  0.2516  0.0133  0.0029  0.0806  0.2926  0.0477  0.18
micz03__micmul10    0.0077  0.0046  0.0203  0.0025  0.1122  0.0947  0.0071  0.1265
micz03__rank1       0.0092  0.0119  0.0087  0.0067  0.034   0.0079  0.0096  0.0091
micz03              0.0036  0.0025  0.0024  0.0019  0.0043  0.0011  0.0012  0.0011

