SFU Publishing Sensitive Transactions for Itemset Utility (IEEE ICDM 2008) Presenter: Yabo Xu Authors: Yabo Xu, Benjamin C.M. Fung, Ke Wang, Ada W.C. Fu, Jian Pei


1 SFU: Publishing Sensitive Transactions for Itemset Utility (IEEE ICDM 2008)
Presenter: Yabo Xu
Authors: Yabo Xu, Benjamin C.M. Fung, Ke Wang, Ada W.C. Fu, Jian Pei
Affiliations: Simon Fraser University, Canada; Concordia University, Canada; Chinese University of Hong Kong

2 Outline
- Motivation: real privacy outcry on transactions
- The problem
  - Privacy attacks on transactions
  - Research challenges
- Our approach: (h,k,p)-coherence
  - Attack model
  - Privacy, utility and anonymization method
  - A border-based algorithm
- Related works

3 Real Privacy Outcry on Transactions
Fact 1: AOL search data scandal, Aug 4, 2006
- Transactions of query terms
- 20 million queries from 650k users within three months
- The famous searcher with ID 4417749, Thelma Arnold, was identified soon after.
Fact 2: Netflix movie rating dataset for a movie recommendation contest, 2006
- Transactions of movie ratings
- 100 million movie ratings made by 500,000 subscribers
- A researcher de-anonymized the data two weeks after its release.
Fact 3: Google was ordered by a federal judge to hand over YouTube user data (all user viewing logs) to Viacom over copyright issues, July 2008
- Transactions of user viewing logs
- Google said it would anonymize the data before giving it to Viacom.

4 Challenges on Anonymizing Transactions
- High-dimensional and sparse data
  - Relational data: only tens, or at most hundreds, of attributes
  - Transaction data: 10,000 distinct items or more, and each transaction contains only a small portion of them
- Hard to model the attacker's prior knowledge
  - Relational data: a small set of public (identifying) attributes
  - Transaction data: a large number of items are potentially identifying, and treating all of them as identifying would render the data useless.

5 Re-identification Attack: An Example

ID  Name    Activities   Medical History
T1  Jane    a c d f g    Diabetes
T2  Sam     a b c f      Hepatitis
T3  Albert  b d f x      Hepatitis
T4  Grace   b c g y z    HIV
T5  Tim     b c f g      HIV

- Activities: public (identifying) items, possibly known by an attacker
- Medical History: private (sensitive) items, which the attacker wants to find out
- Attack 1: {a,b} → T2 → Hepatitis
- Attack 2: {b,g} → {T4,T5} → HIV
- Examples of private items: financial information, health information, sexual orientation, religion and political beliefs.
- In specialized industries, well-defined guidelines for public/private items often exist, e.g. HIPAA.
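The two attacks in the example can be replayed with a few lines of Python. This is a minimal sketch: the dictionary layout and the helper name `attack` are illustrative assumptions, and only the five rows and the two attacks come from the slide.

```python
# Toy database from the slide: public items (activities) plus one private value.
transactions = {
    "T1": ({"a", "c", "d", "f", "g"}, "Diabetes"),   # Jane
    "T2": ({"a", "b", "c", "f"}, "Hepatitis"),       # Sam
    "T3": ({"b", "d", "f", "x"}, "Hepatitis"),       # Albert
    "T4": ({"b", "c", "g", "y", "z"}, "HIV"),        # Grace
    "T5": ({"b", "c", "f", "g"}, "HIV"),             # Tim
}


def attack(known_items):
    """Return the matching transaction ids and the private values they carry.

    `attack` is a hypothetical helper name, not something defined in the talk.
    """
    known = set(known_items)
    matches = sorted(
        tid for tid, (public, _) in transactions.items() if known <= public
    )
    diagnoses = sorted({transactions[tid][1] for tid in matches})
    return matches, diagnoses


print(attack({"a", "b"}))  # (['T2'], ['Hepatitis'])
print(attack({"b", "g"}))  # (['T4', 'T5'], ['HIV'])
```

Attack 1 pins down a single transaction, while Attack 2 matches two transactions that share the same private value, so the attacker learns the diagnosis without knowing which row is the target.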

6 Attack Model
- Attacker's knowledge: a subset of public items, β
- Attacker's goal: infer a private item e
- Attacker's power p: the maximum number of public items the attacker can obtain, i.e. |β| ≤ p (example: p = 2)
- An attack succeeds when
  - fewer than k transactions contain β, i.e. support(β) < k, or
  - most of the transactions containing β have some private item e, i.e. P(β → e) > h,
  where k and h are two privacy parameters.
- We call such β with |β| ≤ p moles.
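The attack model can be turned into a brute-force mole finder. The sketch below makes several assumptions that are not on the slide: the function name `find_moles`, the use of β for the attacker's itemset, the `(public_items, private_items)` row layout, the parameter values h = 0.8 and k = 2, and the choice to ignore itemsets that occur in no transaction. It enumerates every candidate itemset, which is only feasible on toy data.

```python
from itertools import combinations


def find_moles(db, h, k, p):
    """Return every public itemset beta with |beta| <= p that is a mole.

    db is a list of (public_items, private_items) pairs, both sets.
    """
    public_universe = sorted(set().union(*(pub for pub, _ in db)))
    moles = []
    for size in range(1, p + 1):
        for beta in combinations(public_universe, size):
            matching = [priv for pub, priv in db if set(beta) <= pub]
            support = len(matching)
            if support == 0:
                continue  # assumption: itemsets that never occur are ignored
            private_seen = set().union(*matching)
            # Highest P(beta -> e) over all private items e seen together with beta.
            breach = max(
                (sum(e in priv for priv in matching) / support for e in private_seen),
                default=0.0,
            )
            if support < k or breach > h:
                moles.append(beta)
    return moles


db = [
    ({"a", "c", "d", "f", "g"}, {"Diabetes"}),
    ({"a", "b", "c", "f"}, {"Hepatitis"}),
    ({"b", "d", "f", "x"}, {"Hepatitis"}),
    ({"b", "c", "g", "y", "z"}, {"HIV"}),
    ({"b", "c", "f", "g"}, {"HIV"}),
]
moles = find_moles(db, h=0.8, k=2, p=2)
print(("a", "b") in moles)  # True: only one transaction contains {a, b}
print(("b", "g") in moles)  # True: both matching transactions carry HIV
print(("b",) in moles)      # False: support 4, no private item above 50%
```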

7 Privacy, Utility and Anonymization Method
- Privacy notion: (h,k,p)-coherence
  - A transaction database is coherent if it contains no moles, i.e. for every β with |β| ≤ p, support(β) ≥ k and P(β → e) ≤ h
- Utility measure: loss of nuggets
  - Frequent itemsets are the information nuggets of a transaction database, and are important for many data mining applications
  - A high-dimensional utility measure, preserving associations rather than individual items
- Choice of anonymization method
  - Perturbation: loses the truthfulness of the data. NO
  - Item generalization: requires an item hierarchy, which may not exist in many applications. NO
  - Item suppression: preserves itemset support, which is critical for many data mining applications. YES
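The claim that item suppression preserves itemset support can be checked on a toy example. The sketch below assumes global suppression (the item is deleted from every transaction) and uses the hypothetical helper names `support` and `suppress`; the transactions hold public items only.

```python
def support(db, itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if set(itemset) <= t)


def suppress(db, item):
    """Global suppression (assumed here): delete `item` from every transaction."""
    return [t - {item} for t in db]


db = [
    {"a", "c", "d", "f", "g"},
    {"a", "b", "c", "f"},
    {"b", "d", "f", "x"},
    {"b", "c", "g", "y", "z"},
    {"b", "c", "f", "g"},
]
after = suppress(db, "a")

# Itemsets that do not contain the suppressed item keep their exact support.
print(support(db, "bc"), support(after, "bc"))  # 3 3
# Itemsets that contain it disappear entirely.
print(support(db, "ac"), support(after, "ac"))  # 2 0
```

Every surviving itemset still reports its true support, which is the sense in which suppression stays truthful, whereas perturbation would change counts.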

8 The Problem and a Greedy Framework
- Optimal coherence (NP-hard): suppress a set of items so that all moles are eliminated while preserving as many nuggets as possible.
- A greedy approach: in each round, suppress the one item v with the maximal score [formula not preserved in the transcript], until coherence is achieved. The score balances two goals:
  - maximize the moles suppressed
  - minimize the loss of nuggets
- Challenges
  - The numbers of moles and nuggets are both exponential: 10,000 distinct query terms with p = 3 → potentially 10^12 moles!
  - Suppressing an item affects other moles/nuggets, so the moles and nuggets have to be maintained and updated efficiently.
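The greedy framework can be sketched on toy data as follows. The exact scoring formula is not preserved in the transcript, so the ratio used here, moles eliminated divided by (1 + nuggets lost), is an assumption chosen only to reflect the two stated goals. The helper names, the `min_sup` threshold and the `max_size` cap on nugget length are also assumptions. The sketch recomputes moles and nuggets by brute force in every round, which is exactly the cost the border approach is meant to avoid.

```python
from itertools import combinations


def itemsets(items, max_size):
    """All itemsets over `items` with 1..max_size items, as sorted tuples."""
    for size in range(1, max_size + 1):
        yield from combinations(sorted(items), size)


def moles(db, h, k, p):
    """Public itemsets with support < k or breach probability > h."""
    public = set().union(*(pub for pub, _ in db))
    found = set()
    for beta in itemsets(public, p):
        rows = [priv for pub, priv in db if set(beta) <= pub]
        if not rows:
            continue  # assumption: itemsets that never occur are ignored
        worst = max(
            (sum(e in priv for priv in rows) for e in set().union(*rows)),
            default=0,
        )
        if len(rows) < k or worst / len(rows) > h:
            found.add(beta)
    return found


def nuggets(db, min_sup, max_size):
    """Frequent public itemsets (support >= min_sup), capped at max_size items."""
    public = set().union(*(pub for pub, _ in db))
    return {
        x for x in itemsets(public, max_size)
        if sum(set(x) <= pub for pub, _ in db) >= min_sup
    }


def greedy_suppress(db, h, k, p, min_sup, max_size=3):
    """Suppress one item per round until no mole remains."""
    suppressed = []
    current_moles = moles(db, h, k, p)
    while current_moles:
        current_nuggets = nuggets(db, min_sup, max_size)

        def score(v):
            gone = sum(v in m for m in current_moles)    # moles eliminated
            lost = sum(v in n for n in current_nuggets)  # nuggets lost
            return gone / (1 + lost)                     # assumed scoring ratio

        candidates = sorted({v for m in current_moles for v in m})
        best = max(candidates, key=score)
        db = [(pub - {best}, priv) for pub, priv in db]
        suppressed.append(best)
        current_moles = moles(db, h, k, p)
    return suppressed, db


db = [
    ({"a", "c", "d", "f", "g"}, {"Diabetes"}),
    ({"a", "b", "c", "f"}, {"Hepatitis"}),
    ({"b", "d", "f", "x"}, {"Hepatitis"}),
    ({"b", "c", "g", "y", "z"}, {"HIV"}),
    ({"b", "c", "f", "g"}, {"HIV"}),
]
suppressed, coherent_db = greedy_suppress(db, h=0.8, k=2, p=2, min_sup=2)
print(suppressed)
print(moles(coherent_db, h=0.8, k=2, p=2))  # set(): no moles remain
```

The loop always terminates because each round removes one item that still occurs in the data, and a database with no public items has no moles.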

9 Solution: Border Approach
- Key contribution: an efficient border-based counting algorithm that never enumerates all the moles and nuggets
- A highly compact structure is needed to deal with the exponential growth of both moles and nuggets
[Figure: moles/nuggets and the mole/nugget border]
- Moles/nuggets: ae, abe, aef, abef; af, abf, afg, abfg; ag, abg; be, bef
- Mole/nugget border: U = { ae, af, ag, be } (minimal itemsets), L = { abef, abfg } (maximal itemsets)
- Figure labels: edge, minimal itemset, maximal itemset
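The border on this slide can be read as a compact stand-in for the twelve listed itemsets. The sketch below assumes the usual border semantics, namely that an itemset X is represented when some minimal itemset u in U and some maximal itemset l in L satisfy u ⊆ X ⊆ l. The function names `in_border` and `expand` are hypothetical, and this is not the paper's counting algorithm, only a membership check.

```python
from itertools import combinations

# Border from the slide: minimal itemsets U and maximal itemsets L.
U = [frozenset("ae"), frozenset("af"), frozenset("ag"), frozenset("be")]
L = [frozenset("abef"), frozenset("abfg")]


def in_border(x, upper=U, lower=L):
    """True if some u in U and some l in L satisfy u <= x <= l (assumed semantics)."""
    x = frozenset(x)
    return any(u <= x for u in upper) and any(x <= l for l in lower)


def expand(upper=U, lower=L):
    """Enumerate every itemset the border represents (for checking only)."""
    out = set()
    for l in lower:
        for size in range(1, len(l) + 1):
            for c in combinations(sorted(l), size):
                if in_border(c, upper, lower):
                    out.add("".join(c))
    return out


print(in_border("abf"))       # True: af <= abf <= abfg
print(in_border("ab"))        # False: contains no minimal itemset
print(len(expand()))          # 12, the itemsets listed on the slide
```

Six border itemsets stand in for twelve here; the gap widens quickly as the maximal itemsets get longer, which is the motivation for counting on the border instead of enumerating.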

10 Related Works
- G. Ghinita et al., ICDE 2008
  - l-diversity, with public/private items
  - Bucketization approach; vulnerable to background knowledge attacks because it preserves the original items without generalization
- M. Terrovitis et al., VLDB 2008
  - Also assumes the attacker's prior knowledge is a subset of items
  - k-anonymity; no protection against the homogeneity attack
  - Generalization
- Y. Xu et al., KDD 2008
  - Assumes the attacker's prior knowledge is a subset of items
  - k-anonymity and l-diversity
  - Suppression
  - Single-dimensional item utility

11 Questions? Please write to yxu@cs.sfu.ca

