1
Beyond k-Anonymity: A Decision Theoretic Framework for Assessing Privacy Risk
M. Scannapieco, G. Lebanon, M. R. Fouad and E. Bertino
2
Introduction
Release of data
– Private organizations can benefit from sharing data with others
– Public organizations see data as a value for society
Privacy preservation
– Data disclosure can lead to economic damage, threats to national security, etc.
– Regulated by law in both the private and public sectors
3
Two Facets of Data Privacy
Identity disclosure
– Uncontrolled data release: identifiers may even be present
– Anonymous data release: identifiers suppressed, but no control over possible linking with other sources
4
Linkage of Anonymous Data

T1:
PrivateID | SSN | DOB      | ZIP   | Health_Problem
a         |     | 11/20/67 | 00198 | Shortness of breath
b         |     | 02/07/81 | 00159 | Headache
c         |     | 02/07/81 | 00156 | Obesity
d         |     | 08/07/76 | 00198 | Shortness of breath

T2:
PrivateID | SSN | DOB      | ZIP   | Employment       | Marital Status
1         | A   | 11/20/67 | 00198 | Researcher       | Married
5         | E   | 08/07/76 | 00114 | Private Employee | Married
3         | C   | 02/07/81 | 00156 | Public Employee  | Widow

QUASI-IDENTIFIER: (DOB, ZIP)
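To make the linkage attack concrete, here is a minimal Python sketch (not part of the original presentation) that joins the two tables above on the quasi-identifier (DOB, ZIP); the table contents are copied from the slide, and the list-of-dicts representation is just an illustrative choice.

```python
# Minimal sketch of quasi-identifier linkage between the two tables above.
T1 = [  # anonymized health data (SSN suppressed)
    {"PrivateID": "a", "DOB": "11/20/67", "ZIP": "00198", "Health_Problem": "Shortness of breath"},
    {"PrivateID": "b", "DOB": "02/07/81", "ZIP": "00159", "Health_Problem": "Headache"},
    {"PrivateID": "c", "DOB": "02/07/81", "ZIP": "00156", "Health_Problem": "Obesity"},
    {"PrivateID": "d", "DOB": "08/07/76", "ZIP": "00198", "Health_Problem": "Shortness of breath"},
]
T2 = [  # externally available data
    {"PrivateID": "1", "SSN": "A", "DOB": "11/20/67", "ZIP": "00198", "Employment": "Researcher"},
    {"PrivateID": "5", "SSN": "E", "DOB": "08/07/76", "ZIP": "00114", "Employment": "Private Employee"},
    {"PrivateID": "3", "SSN": "C", "DOB": "02/07/81", "ZIP": "00156", "Employment": "Public Employee"},
]

QUASI_IDENTIFIER = ("DOB", "ZIP")

def link(t1, t2, qi=QUASI_IDENTIFIER):
    """Return pairs of records from t1 and t2 that agree on every quasi-identifier attribute."""
    return [(r1, r2) for r1 in t1 for r2 in t2
            if all(r1[a] == r2[a] for a in qi)]

for r1, r2 in link(T1, T2):
    print(r1["Health_Problem"], "<->", r2["SSN"], r2["Employment"])
```

On these tables the join links record a of T1 to record 1 of T2 and record c to record 3, re-identifying their health problems even though the SSNs were suppressed.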
5
Two Facets of Data Privacy (cont.)
Sensitive information disclosure
– Once identity disclosure occurs, the loss due to such disclosure depends on how sensitive the related data are
– Data sensitivity is subjective. E.g.: age is in general more sensitive for women than for men
6
Our proposal
A framework for assessing privacy risk that takes into account both facets of privacy
– based on statistical decision theory
Definition and analysis of:
– disclosure policies modelled by disclosure rules
– several privacy risk functions
Estimated risk as an upper bound of the true risk, and related complexity analysis
Algorithm for finding the disclosure rule that minimizes the privacy risk
7
Disclosure rules
A disclosure rule δ is a function that maps a record x to a new record z = δ(x) in which some attributes may have been suppressed:
– z_j = ⊥ if the j-th attribute is suppressed
– z_j = x_j otherwise
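A minimal sketch of a disclosure rule in Python, assuming records are tuples and using None to stand in for the suppression symbol ⊥ (these representation choices are mine, not the slide's):

```python
# Sketch of a disclosure rule: suppress the attributes whose indices are given.
def disclose(record, suppressed_indices):
    """Return z = delta(record): z_j is None if attribute j is suppressed, record[j] otherwise."""
    return tuple(None if j in suppressed_indices else v
                 for j, v in enumerate(record))

x = ("John", "Smith", "555-0100")
print(disclose(x, {2}))   # ('John', 'Smith', None) -- phone number suppressed
```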
8
Loss function
Let θ be the side information used by the attacker in the identification attempt
The loss function ℓ(δ(x), θ):
– measures the loss incurred by disclosing the data z = δ(x), due to possible identification based on θ
– Empirical distribution p associated with the records x_1, …, x_n
9
Risk Definition
The risk R(δ, θ) of the disclosure rule δ in the presence of the side information θ is the average loss of disclosing x_1, …, x_n:

R(δ, θ) = (1/n) Σ_{i=1..n} ℓ(δ(x_i), θ)
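A short sketch of this definition as code, with the concrete loss left as a parameter since the slides only specify it later; the function names are illustrative:

```python
# Sketch of R(delta, theta) as the empirical average loss over x_1, ..., x_n.
# `disclose` plays the role of the rule delta, `loss` the role of l(z, theta).
def risk(records, disclose, theta, loss):
    return sum(loss(disclose(x), theta) for x in records) / len(records)
```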
10
Putting the pieces together so far…
A hypothetical attacker performs an identification attempt on a disclosed record y = δ(x) on the basis of some side information θ, which can be a dictionary
The dictionary is used to link y with some entry present in the dictionary
Example:
– y has the form (name, surname, phone#); θ is a phone book
– if all attributes are revealed, it is likely that y is linked with one entry
– if phone# is suppressed (or missing), y may or may not be linked to a single entity, depending on the popularity of (name, surname)
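The phone-book example can be sketched by counting the dictionary entries consistent with a disclosed record; the entries and names below are made up, and None again stands for a suppressed or missing value:

```python
# Illustrative phone-book example. An entry of the dictionary theta is consistent
# with a disclosed record y when it agrees with y on every non-suppressed attribute.
phone_book = [
    ("John", "Smith", "555-0100"),
    ("John", "Smith", "555-0199"),
    ("Mary", "Jones", "555-0123"),
]

def consistent_entries(y, theta):
    return [e for e in theta
            if all(yj is None or yj == ej for yj, ej in zip(y, e))]

print(len(consistent_entries(("John", "Smith", "555-0100"), phone_book)))  # 1: unique link
print(len(consistent_entries(("John", "Smith", None), phone_book)))        # 2: ambiguous
```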
11
Risk formulation
Let's decompose the loss function into an identification part and a sensitivity part
Identification part: formalized by the random variable Z
– Z = 1 if the disclosed record δ(x) is identified, i.e. linked to its entry in the dictionary θ
– Z = 0 otherwise
12
Risk formulation (cont.)
Sensitivity part: formalized by a sensitivity function Φ(x), where higher values indicate higher sensitivity
Therefore the loss combines the two parts:

ℓ(δ(x), θ) = Φ(x) · E[Z]
13
Risk formulation (cont.)
Risk:

R(δ, θ) = (1/n) Σ_{i=1..n} Φ(x_i) · E[Z_i]

where Z_i is the identification variable associated with the disclosed record δ(x_i)
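Turning the decomposition into code requires a model of E[Z]; the sketch below assumes the attacker guesses uniformly among the dictionary entries consistent with δ(x), so E[Z] = 1 / (number of consistent entries). This is an assumption of mine, since the slide does not show the formula for Z explicitly.

```python
# Hedged sketch of the decomposed risk R(delta, theta) ~ (1/n) * sum_i Phi(x_i) * E[Z_i],
# assuming E[Z] = 1 / #(entries of theta consistent with the disclosed record).
def consistent_entries(z, theta):
    return [e for e in theta
            if all(zj is None or zj == ej for zj, ej in zip(z, e))]

def expected_identification(z, theta):
    cands = consistent_entries(z, theta)
    return 1.0 / len(cands) if cands else 0.0

def decomposed_risk(records, disclose, theta, sensitivity):
    return sum(sensitivity(x) * expected_identification(disclose(x), theta)
               for x in records) / len(records)
```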
14
Disclosure Rule vs. Privacy Risk
Suppose that θ_true is the true attacker's dictionary, which is publicly available, and that θ* is the actual database from which the data will be published
Under the following assumptions:
– θ_true contains more records than θ* (θ* ≤ θ_true)
– the non-⊥ (non-suppressed) values in θ_true will be more limited than the non-⊥ values in θ*
Theorem: If θ* contains records that correspond to x_1, …, x_n and θ* ≤ θ_true, then: R(δ, θ_true) ≤ R(δ, θ*)
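The slide does not reproduce the proof; the LaTeX fragment below only sketches why the bound is plausible, under the additional assumption (mine, not the slide's) that the loss has the form Φ(x)/|ρ(δ(x), θ)|, where ρ(y, θ) is the set of entries of θ consistent with y:

```latex
% Sketch under the assumption \ell(\delta(x),\theta) = \Phi(x) / |\rho(\delta(x),\theta)|.
% If \theta^* \le \theta_{\mathrm{true}} (more entries, no additional non-suppressed
% values), each disclosed record is consistent with at least as many entries of
% \theta_{\mathrm{true}} as of \theta^*, i.e.
% |\rho(\delta(x_i),\theta_{\mathrm{true}})| \ge |\rho(\delta(x_i),\theta^*)|, hence
R(\delta,\theta_{\mathrm{true}})
  = \frac{1}{n}\sum_{i=1}^{n}\frac{\Phi(x_i)}{\lvert\rho(\delta(x_i),\theta_{\mathrm{true}})\rvert}
  \;\le\;
  \frac{1}{n}\sum_{i=1}^{n}\frac{\Phi(x_i)}{\lvert\rho(\delta(x_i),\theta^{*})\rvert}
  = R(\delta,\theta^{*}).
```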
15
Disclosure Rule vs. Privacy Risk (cont.)
The theorem proves that the true risk is bounded by R(δ, θ*)
Under the hypothesis that the underlying distribution factorizes into a product form:
Theorem: The rule that minimizes the risk, δ* = arg min_δ R(δ, θ), can be found in O(nNm) computation
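The O(nNm) algorithm itself is not shown on the slides. Purely to illustrate the search space, here is a naive brute-force sketch: it is exponential in the number of attributes, it uses an illustrative suppression budget as a stand-in for a utility constraint the slide does not detail, and it is not the paper's efficient algorithm.

```python
from itertools import combinations

def disclose(x, suppressed):
    # Suppress the attributes at the given indices (None stands for the symbol ⊥).
    return tuple(None if j in suppressed else v for j, v in enumerate(x))

def identification_prob(z, theta):
    # Assumed model: uniform guess among the dictionary entries consistent with z.
    cands = [e for e in theta
             if all(zj is None or zj == ej for zj, ej in zip(z, e))]
    return 1.0 / len(cands) if cands else 0.0

def naive_best_rules(records, theta, sensitivity, budget):
    """For each record, try every suppression set of size <= budget and keep the
    one minimizing the per-record loss sensitivity(x) * P(identification)."""
    m = len(records[0])
    subsets = [set(c) for k in range(budget + 1)
               for c in combinations(range(m), k)]
    best = []
    for x in records:
        best.append(min(subsets,
                        key=lambda s: sensitivity(x) * identification_prob(disclose(x, s), theta)))
    return best
```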
16
k-Anonymity
k-Anonymity is simply a special case of our framework in which:
– θ_true = T
– the sensitivity Φ is a constant
– the loss is left underspecified
Our framework makes explicit some questionable hypotheses underlying k-anonymity!
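For reference, the standard k-anonymity condition that this framework generalizes can be checked in a few lines; the helper below reflects the usual textbook definition, not anything specific to these slides.

```python
from collections import Counter

# Standard k-anonymity check: every combination of quasi-identifier values
# must appear in at least k records of the released table.
def is_k_anonymous(table, quasi_identifier, k):
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in table)
    return all(c >= k for c in counts.values())

rows = [{"DOB": "11/20/67", "ZIP": "00198"},
        {"DOB": "02/07/81", "ZIP": "00159"}]
print(is_k_anonymous(rows, ("DOB", "ZIP"), 2))  # False: each QI value occurs only once
```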
17
Conclusions
– New framework for privacy risk that takes sensitivity into account
– Risk estimation as an upper bound for the true privacy risk
– Efficient algorithm for risk computation
– Generalization of k-anonymity