1
Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy Risk
M. Scannapieco, G. Lebanon, M. R. Fouad and E. Bertino
2
Introduction
Release of data
– Private organizations can benefit from sharing data with others
– Public organizations see data as a value for society
Privacy preservation
– Data disclosure can lead to economic damage, threats to national security, etc.
– Regulated by law in both the private and public sectors
3
Two Facets of Data Privacy
Identity disclosure
– Uncontrolled data release: identifiers may even be present
– Anonymous data release: identifiers suppressed, but no control over possible linking with other sources
4
Linkage of Anonymous Data

T1:
PrivateID | SSN | DOB      | ZIP   | Health_Problem
a         |     | 11/20/67 | 00198 | Shortness of breath
b         |     | 02/07/81 | 00159 | Headache
c         |     | 02/07/81 | 00156 | Obesity
d         |     | 08/07/76 | 00198 | Shortness of breath

T2:
PrivateID | SSN | DOB      | ZIP   | Employment       | Marital Status
1         | A   | 11/20/67 | 00198 | Researcher       | Married
5         | E   | 08/07/76 | 00114 | Private Employee | Married
3         | C   | 02/07/81 | 00156 | Public Employee  | Widow

QUASI-IDENTIFIER: (DOB, ZIP)
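To make the linkage attack concrete, here is a minimal Python sketch (not part of the original presentation) that joins the two tables above on the quasi-identifier (DOB, ZIP); the table contents are copied from the slide, and the list-of-dicts representation is just an illustrative choice.

```python
# Minimal sketch of quasi-identifier linkage between the two tables above.
T1 = [  # anonymized health data (SSN suppressed)
    {"PrivateID": "a", "DOB": "11/20/67", "ZIP": "00198", "Health_Problem": "Shortness of breath"},
    {"PrivateID": "b", "DOB": "02/07/81", "ZIP": "00159", "Health_Problem": "Headache"},
    {"PrivateID": "c", "DOB": "02/07/81", "ZIP": "00156", "Health_Problem": "Obesity"},
    {"PrivateID": "d", "DOB": "08/07/76", "ZIP": "00198", "Health_Problem": "Shortness of breath"},
]
T2 = [  # externally available data
    {"PrivateID": "1", "SSN": "A", "DOB": "11/20/67", "ZIP": "00198", "Employment": "Researcher"},
    {"PrivateID": "5", "SSN": "E", "DOB": "08/07/76", "ZIP": "00114", "Employment": "Private Employee"},
    {"PrivateID": "3", "SSN": "C", "DOB": "02/07/81", "ZIP": "00156", "Employment": "Public Employee"},
]

QUASI_IDENTIFIER = ("DOB", "ZIP")

def link(t1, t2, qi=QUASI_IDENTIFIER):
    """Return pairs of records from t1 and t2 that agree on every quasi-identifier attribute."""
    return [(r1, r2) for r1 in t1 for r2 in t2
            if all(r1[a] == r2[a] for a in qi)]

for r1, r2 in link(T1, T2):
    print(r1["Health_Problem"], "<->", r2["SSN"], r2["Employment"])
```

On these tables the join links record a of T1 to record 1 of T2 and record c to record 3, re-identifying their health problems even though the SSNs were suppressed.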
5
Two Facets of Data Privacy (cont.)
Sensitive information disclosure
– Once identity disclosure occurs, the loss due to such disclosure depends on how sensitive the related data are
– Data sensitivity is subjective. E.g.: age is in general more sensitive for women than for men
6
Our proposal
A framework for assessing privacy risk that takes into account both facets of privacy
– based on statistical decision theory
Definition and analysis of:
– disclosure policies modelled by disclosure rules
– several privacy risk functions
Estimated risk as an upper bound of the true risk, and related complexity analysis
Algorithm for finding the disclosure rule that minimizes the privacy risk
7
Disclosure rules
A disclosure rule δ is a function that maps a record x to a new record z = δ(x) in which some attributes may have been suppressed:
– z_j = ⊥ if the j-th attribute is suppressed
– z_j = x_j otherwise
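A minimal sketch of a disclosure rule in Python, assuming records are tuples and using None to stand in for the suppression symbol ⊥ (these representation choices are mine, not the slide's):

```python
# Sketch of a disclosure rule: suppress the attributes whose indices are given.
def disclose(record, suppressed_indices):
    """Return z = delta(record): z_j is None if attribute j is suppressed, record[j] otherwise."""
    return tuple(None if j in suppressed_indices else v
                 for j, v in enumerate(record))

x = ("John", "Smith", "555-0100")
print(disclose(x, {2}))   # ('John', 'Smith', None) -- phone number suppressed
```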
8
Loss function
Let θ be the side information used by the attacker in the identification attempt
The loss function ℓ(δ(x), θ):
– measures the loss incurred by disclosing the data z = δ(x), due to possible identification based on θ
– Empirical distribution p associated with the records x_1, …, x_n
9
Risk Definition
The risk R(δ, θ) of the disclosure rule δ in the presence of the side information θ is the average loss of disclosing x_1, …, x_n:

R(δ, θ) = (1/n) Σ_{i=1..n} ℓ(δ(x_i), θ)
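A short sketch of this definition as code, with the concrete loss left as a parameter since the slides only specify it later; the function names are illustrative:

```python
# Sketch of R(delta, theta) as the empirical average loss over x_1, ..., x_n.
# `disclose` plays the role of the rule delta, `loss` the role of l(z, theta).
def risk(records, disclose, theta, loss):
    return sum(loss(disclose(x), theta) for x in records) / len(records)
```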
10
Putting the pieces together so far…
A hypothetical attacker performs an identification attempt on a disclosed record y = δ(x) on the basis of some side information θ, which can be a dictionary
The dictionary is used to link y with some entry present in the dictionary
Example:
– y has the form (name, surname, phone#); θ is a phone book
– if all attributes are revealed, it is likely that y is linked with one entry
– if phone# is suppressed (or missing), y may or may not be linked to a single entity, depending on the popularity of (name, surname)
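The phone-book example can be sketched by counting the dictionary entries consistent with a disclosed record; the entries and names below are made up, and None again stands for a suppressed or missing value:

```python
# Illustrative phone-book example. An entry of the dictionary theta is consistent
# with a disclosed record y when it agrees with y on every non-suppressed attribute.
phone_book = [
    ("John", "Smith", "555-0100"),
    ("John", "Smith", "555-0199"),
    ("Mary", "Jones", "555-0123"),
]

def consistent_entries(y, theta):
    return [e for e in theta
            if all(yj is None or yj == ej for yj, ej in zip(y, e))]

print(len(consistent_entries(("John", "Smith", "555-0100"), phone_book)))  # 1: unique link
print(len(consistent_entries(("John", "Smith", None), phone_book)))        # 2: ambiguous
```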
11
Risk formulation
Let's decompose the loss function into an identification part and a sensitivity part
Identification part: formalized by the random variable Z
– Z = 1 if the disclosed record δ(x) is identified, i.e. linked to its entry in the dictionary θ
– Z = 0 otherwise
12
Risk formulation (cont.)
Sensitivity part: formalized by a sensitivity function Φ(x), where higher values indicate higher sensitivity
Therefore the loss combines the two parts:

ℓ(δ(x), θ) = Φ(x) · E[Z]
13
Risk formulation (cont.)
Risk:

R(δ, θ) = (1/n) Σ_{i=1..n} Φ(x_i) · E[Z_i]

where Z_i is the identification variable associated with the disclosed record δ(x_i)
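Turning the decomposition into code requires a model of E[Z]; the sketch below assumes the attacker guesses uniformly among the dictionary entries consistent with δ(x), so E[Z] = 1 / (number of consistent entries). This is an assumption of mine, since the slide does not show the formula for Z explicitly.

```python
# Hedged sketch of the decomposed risk R(delta, theta) ~ (1/n) * sum_i Phi(x_i) * E[Z_i],
# assuming E[Z] = 1 / #(entries of theta consistent with the disclosed record).
def consistent_entries(z, theta):
    return [e for e in theta
            if all(zj is None or zj == ej for zj, ej in zip(z, e))]

def expected_identification(z, theta):
    cands = consistent_entries(z, theta)
    return 1.0 / len(cands) if cands else 0.0

def decomposed_risk(records, disclose, theta, sensitivity):
    return sum(sensitivity(x) * expected_identification(disclose(x), theta)
               for x in records) / len(records)
```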
14
Disclosure Rule vs. Privacy Risk
Suppose that θ_true is the true attacker's dictionary, which is publicly available, and that θ* is the actual database from which the data will be published
Under the following assumptions:
– θ_true contains more records than θ* (θ* ≤ θ_true)
– the non-⊥ (non-suppressed) values in θ_true will be more limited than the non-⊥ values in θ*
Theorem: If θ* contains records that correspond to x_1, …, x_n and θ* ≤ θ_true, then: R(δ, θ_true) ≤ R(δ, θ*)
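The slide does not reproduce the proof; the LaTeX fragment below only sketches why the bound is plausible, under the additional assumption (mine, not the slide's) that the loss has the form Φ(x)/|ρ(δ(x), θ)|, where ρ(y, θ) is the set of entries of θ consistent with y:

```latex
% Sketch under the assumption \ell(\delta(x),\theta) = \Phi(x) / |\rho(\delta(x),\theta)|.
% If \theta^* \le \theta_{\mathrm{true}} (more entries, no additional non-suppressed
% values), each disclosed record is consistent with at least as many entries of
% \theta_{\mathrm{true}} as of \theta^*, i.e.
% |\rho(\delta(x_i),\theta_{\mathrm{true}})| \ge |\rho(\delta(x_i),\theta^*)|, hence
R(\delta,\theta_{\mathrm{true}})
  = \frac{1}{n}\sum_{i=1}^{n}\frac{\Phi(x_i)}{\lvert\rho(\delta(x_i),\theta_{\mathrm{true}})\rvert}
  \;\le\;
  \frac{1}{n}\sum_{i=1}^{n}\frac{\Phi(x_i)}{\lvert\rho(\delta(x_i),\theta^{*})\rvert}
  = R(\delta,\theta^{*}).
```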
15
Disclosure Rule vs. Privacy Risk (cont.)
The theorem proves that the true risk is bounded by R(δ, θ*)
Under the hypothesis that the underlying distribution factorizes into a product form:
Theorem: The rule that minimizes the risk, δ* = arg min_δ R(δ, θ), can be found in O(nNm) computation
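The O(nNm) algorithm itself is not shown on the slides. Purely to illustrate the search space, here is a naive brute-force sketch: it is exponential in the number of attributes, it uses an illustrative suppression budget as a stand-in for a utility constraint the slide does not detail, and it is not the paper's efficient algorithm.

```python
from itertools import combinations

def disclose(x, suppressed):
    # Suppress the attributes at the given indices (None stands for the symbol ⊥).
    return tuple(None if j in suppressed else v for j, v in enumerate(x))

def identification_prob(z, theta):
    # Assumed model: uniform guess among the dictionary entries consistent with z.
    cands = [e for e in theta
             if all(zj is None or zj == ej for zj, ej in zip(z, e))]
    return 1.0 / len(cands) if cands else 0.0

def naive_best_rules(records, theta, sensitivity, budget):
    """For each record, try every suppression set of size <= budget and keep the
    one minimizing the per-record loss sensitivity(x) * P(identification)."""
    m = len(records[0])
    subsets = [set(c) for k in range(budget + 1)
               for c in combinations(range(m), k)]
    best = []
    for x in records:
        best.append(min(subsets,
                        key=lambda s: sensitivity(x) * identification_prob(disclose(x, s), theta)))
    return best
```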
16
k-Anonymity
k-Anonymity is simply a special case of our framework in which:
– θ_true = T
– the sensitivity Φ is a constant
– the loss is left underspecified
Our framework makes explicit some questionable hypotheses underlying k-anonymity!
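For reference, the standard k-anonymity condition that this framework generalizes can be checked in a few lines; the helper below reflects the usual textbook definition, not anything specific to these slides.

```python
from collections import Counter

# Standard k-anonymity check: every combination of quasi-identifier values
# must appear in at least k records of the released table.
def is_k_anonymous(table, quasi_identifier, k):
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in table)
    return all(c >= k for c in counts.values())

rows = [{"DOB": "11/20/67", "ZIP": "00198"},
        {"DOB": "02/07/81", "ZIP": "00159"}]
print(is_k_anonymous(rows, ("DOB", "ZIP"), 2))  # False: each QI value occurs only once
```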
17
Conclusions
– New framework for privacy risk that takes sensitivity into account
– Risk estimation as an upper bound for the true privacy risk
– Efficient algorithm for risk computation
– Generalization of k-anonymity