SECURED OUTSOURCING OF FREQUENT ITEMSET MINING Hana Chih-Hua Tai Dept. of CSIE, National Taipei University.

OUTLINE
- Preliminary: Frequent Itemset Mining
- Motivation
- Privacy Model: K-Support Anonymity
- Algorithm
- Performance Studies
- Conclusion

FREQUENT ITEMSET MINING (FIM)
Discover what happens frequently.

Trans. ID   Items
1           wine
2           cigar, wine
3           cigar, tea
4           beer, cigar, wine
5           beer, tea, wine

- With the threshold set to 3 (= 60%), {wine} and {cigar} are frequent.
- With the threshold set to 2 (= 40%), {wine}, {cigar}, {tea}, {beer}, {wine, cigar}, and {wine, beer} are frequent.
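The two thresholds on this slide can be checked with a small brute-force miner. This is only an illustrative sketch, not the mining algorithm a service provider would use; the names `TRANSACTIONS` and `frequent_itemsets` are made up for the example.

```python
from itertools import combinations

# Example database from the slide (transaction ID -> items).
TRANSACTIONS = {
    1: {"wine"},
    2: {"cigar", "wine"},
    3: {"cigar", "tea"},
    4: {"beer", "cigar", "wine"},
    5: {"beer", "tea", "wine"},
}


def frequent_itemsets(transactions, min_sup):
    """Return {itemset: support} for every itemset with support >= min_sup."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Support = number of transactions containing the whole itemset.
            support = sum(1 for t in transactions.values() if itemset <= t)
            if support >= min_sup:
                result[itemset] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size.
            break
    return result
```

Calling `frequent_itemsets(TRANSACTIONS, 3)` yields only {wine} and {cigar}, while a threshold of 2 adds {tea}, {beer}, {wine, cigar}, and {wine, beer}.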

THE NEEDS OF OUTSOURCING FIM
Data owners who lack FIM expertise and/or computing resources need to outsource their mining tasks to a professional third party, such as a mining service provider in the cloud. Handing the data over to the provider, however, raises a privacy concern.

THE RISKS OF OUTSOURCING FIM
An encryption/decryption method is believed to be a possible solution for the exchange between the data owner and the mining service provider (cloud computing). The question is how to achieve the encryption and decryption with:
- privacy protected
- correct mining results
- reasonable overhead

THE RISKS OF OUTSOURCING FIM
Encrypting the example database by item substitution:

Trans. ID   Items                 Encrypted items
1           wine                  a
2           cigar, wine           a, c
3           cigar, tea            c, d
4           beer, cigar, wine     a, b, c
5           beer, tea, wine       a, b, d

THE RISKS OF OUTSOURCING FIM
Simple item substitution (a = wine, b = beer, c = cigar, d = tea in the encrypted example) leaves every item's support unchanged, so an attacker with background knowledge can re-identify items:
- Top frequency attack: wine is known to be the most frequent item, so 'a' is 'wine'.
- Approximate support attack: the support of cigar is known to be about 55%~60%, so 'c' is 'cigar'.
The support information about the frequent itemsets can be utilized to effectively reveal the raw data as well as the sensitive information from the anonymized transactions. (T. Mielikäinen. Privacy problems with anonymized transaction databases. In Proc. of Discovery Science, 2004.)
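Both attacks can be illustrated on the substituted example database. The sketch below assumes the attacker sees only the encrypted transactions and knows either that the target is the most frequent item or a support range for it; the function names are hypothetical.

```python
from collections import Counter

# Encrypted example database from the slides (simple item substitution).
ENCRYPTED = {
    1: {"a"},
    2: {"a", "c"},
    3: {"c", "d"},
    4: {"a", "b", "c"},
    5: {"a", "b", "d"},
}


def relative_supports(db):
    """Map each encrypted item to its relative support (fraction of transactions)."""
    counts = Counter(item for items in db.values() for item in items)
    return {item: count / len(db) for item, count in counts.items()}


def top_frequency_attack(db):
    """Candidates for the item known to be the most frequent one."""
    supports = relative_supports(db)
    best = max(supports.values())
    return sorted(item for item, s in supports.items() if s == best)


def approximate_support_attack(db, low, high):
    """Candidates for an item whose support is known to lie in [low, high]."""
    supports = relative_supports(db)
    return sorted(item for item, s in supports.items() if low <= s <= high)
```

Here `top_frequency_attack(ENCRYPTED)` leaves a single candidate for wine, and `approximate_support_attack(ENCRYPTED, 0.55, 0.60)` leaves a single candidate for cigar. A range around 40% leaves two candidates (b and d), which is the kind of ambiguity k-support anonymity aims to guarantee for every sensitive item.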

RELATED WORKS
- Wong et al. encrypt each real item with a one-to-many mapping function. Wong, W. K., Cheung, D. W., Hung, E., Kao, B., Mamoulis, N.: Security in Outsourcing of Association Rule Mining. In: Proc. of VLDB.
- However, this scheme does not try to anonymize the support information, and it was recently broken. Molloy, I., Li, N., Li, T.: On the (In)Security and (Im)Practicality of Outsourcing Precise Association Rule Mining. In: Proc. of ICDM.

K-SUPPORT ANONYMITY & ANONYMIZATION
K-support anonymity: for every sensitive item, there are at least k-1 other items with the same support. The probability of an item being correctly re-identified is therefore limited to 1/k, even when the precise support information is known.
Anonymization problem: given a transactional database T, encrypt T into E(T) such that
- there exists a decryption function D such that MiningResult(T, Δ) = D(MiningResult(E(T), Δ)) for any minimum support Δ, and
- E(T) is k-support anonymous.
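The definition can be read as a simple check on a table of supports. The following is a minimal sketch under the assumption that supports are given as a mapping from item to support count (in the taxonomy-based scheme this table would also include taxonomy nodes); `is_k_support_anonymous` is an illustrative name.

```python
from collections import Counter


def item_supports(db):
    """Count, for each item, the number of transactions that contain it."""
    return Counter(item for items in db.values() for item in items)


def is_k_support_anonymous(supports, sensitive_items, k):
    """True if every sensitive item shares its support with at least k-1 other items."""
    group_size = Counter(supports.values())  # support value -> number of items
    return all(group_size[supports[item]] >= k for item in sensitive_items)


# Encrypted example database obtained by plain item substitution.
ENCRYPTED = {
    1: {"a"},
    2: {"a", "c"},
    3: {"c", "d"},
    4: {"a", "b", "c"},
    5: {"a", "b", "d"},
}
```

On `ENCRYPTED`, items b and d share support 2 and so are 2-support anonymous, whereas a (support 4) and c (support 3) are each alone in their group.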

A NAÏVE SOLUTION
For each set of real items of the same support, add enough fake items randomly into transactions to make the fake items as frequent as the real ones.

Trans. ID   Items                 Encrypted items (k = 3)
1           wine                  a, e, g, h, i
2           cigar, wine           a, c, e, f, h, i
3           cigar, tea            c, d, e, f, g
4           beer, cigar, wine     a, b, c, f, h
5           beer, tea, wine       a, b, d, e, f, g

For k = 3, 16 additional item occurrences are required:
- 4 x 2 = 8 (e, f) for wine
- 3 x 2 = 6 (g, h) for cigar
- 2 x 1 = 2 (i) for beer and tea
The storage overhead can become too large when k is large.
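The count of 16 can be reproduced with a short sketch of the naïve padding. It assumes fake items are named `fake0`, `fake1`, and so on, and it omits the renaming of real items; the random placement will generally differ from the table above, but the number of added occurrences does not depend on it.

```python
import random
from collections import Counter, defaultdict

TRANSACTIONS = {
    1: {"wine"},
    2: {"cigar", "wine"},
    3: {"cigar", "tea"},
    4: {"beer", "cigar", "wine"},
    5: {"beer", "tea", "wine"},
}


def naive_anonymize(transactions, k, seed=0):
    """Pad every support group up to k items using randomly placed fake items.

    Returns the padded database and the number of item occurrences added.
    """
    rng = random.Random(seed)
    supports = Counter(i for t in transactions.values() for i in t)
    groups = defaultdict(list)  # support value -> real items with that support
    for item, sup in supports.items():
        groups[sup].append(item)

    padded = {tid: set(items) for tid, items in transactions.items()}
    added, fake_id = 0, 0
    for sup, members in sorted(groups.items()):
        for _ in range(max(0, k - len(members))):
            fake = f"fake{fake_id}"  # hypothetical naming scheme
            fake_id += 1
            # Place the fake item in `sup` random transactions.
            for tid in rng.sample(sorted(padded), sup):
                padded[tid].add(fake)
            added += sup
    return padded, added
```

For the example database and k = 3 the function adds 16 occurrences, which more than doubles the original 11.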

GENERALIZED FIM
Discover all frequent itemsets across concept levels, given a taxonomy indicating the hierarchical concepts between items.
[Figure: taxonomy over the example database. "all prod." has children "beverage" and "cigar"; "beverage" has children "alcoholic" and "tea"; "alcoholic" has children "beer" and "wine".]
With the threshold set to 3 (= 60%), {wine}, {cigar}, {alcoholic}, {beverage}, and {all prod.} are frequent. {beverage, cigar} is also frequent.
1. The support of a parent node comes from the supports of its child nodes.
2. Only leaf nodes need to appear in the transactions.
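A minimal sketch of how supports are counted across concept levels: each transaction is extended with the ancestors of its items before the usual containment test. The `PARENT` map is an assumption, reconstructed from the flattened taxonomy figure so that it matches the frequent itemsets listed on the slide.

```python
TRANSACTIONS = {
    1: {"wine"},
    2: {"cigar", "wine"},
    3: {"cigar", "tea"},
    4: {"beer", "cigar", "wine"},
    5: {"beer", "tea", "wine"},
}

# Assumed taxonomy (child -> parent), reconstructed from the slide's figure.
PARENT = {
    "beer": "alcoholic",
    "wine": "alcoholic",
    "alcoholic": "beverage",
    "tea": "beverage",
    "beverage": "all prod.",
    "cigar": "all prod.",
}


def extend(transaction, parent):
    """Return the transaction's items together with all of their ancestors."""
    extended = set()
    for item in transaction:
        while item is not None:
            extended.add(item)
            item = parent.get(item)
    return extended


def generalized_support(itemset, transactions, parent):
    """Number of transactions whose extended form contains the whole itemset."""
    target = set(itemset)
    return sum(1 for t in transactions.values() if target <= extend(t, parent))
```

With this taxonomy, {beverage, cigar} has support 3 and is frequent at threshold 3, while {alcoholic, cigar} has support 2 and is not.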

ANONYMIZATION: OVERVIEW
For storage efficiency, we suggest converting FIM to GFIM.
- Data owner (encrypt): generates a pseudo taxonomy during encryption and hands the encrypted transaction data and the pseudo taxonomy to the third party.
- Third party: performs generalized frequent itemset mining and returns the frequent itemsets.
- Data owner (decrypt): decrypts the returned frequent itemsets.

ANONYMIZATION: STORAGE EFFICIENCY
In GFIM, items can be at multiple levels of a taxonomy, and only the items at the leaf level need to appear in the database.

Trans. ID   Items                 Encrypted items (k = 3)
1           wine                  c, d, g
2           cigar, wine           b, d, g
3           cigar, tea            b, h
4           beer, cigar, wine     a, b, c
5           beer, tea, wine       a, c, d, h

[Figure: pseudo taxonomy over the nodes a to k, with a = beer, b = cigar, f = wine, h = tea.]
Anonymous groups: wine {e, f, j}; cigar {b, c, d}; beer and tea {a, g, h}.
Only 4 additional item occurrences are required, a small storage overhead compared to the naïve method.
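The two overhead figures quoted in the slides (16 for the naïve method, 4 for the taxonomy-based one) follow from counting item occurrences in the three versions of the example database. The sketch below copies the encrypted transactions from the slides; the variable names are illustrative.

```python
# Item occurrences per transaction, in transaction-ID order 1..5.
ORIGINAL = [
    {"wine"},
    {"cigar", "wine"},
    {"cigar", "tea"},
    {"beer", "cigar", "wine"},
    {"beer", "tea", "wine"},
]
NAIVE = [  # naive padding with fake items, k = 3
    {"a", "e", "g", "h", "i"},
    {"a", "c", "e", "f", "h", "i"},
    {"c", "d", "e", "f", "g"},
    {"a", "b", "c", "f", "h"},
    {"a", "b", "d", "e", "f", "g"},
]
TAXONOMY_BASED = [  # only leaf items of the pseudo taxonomy are stored, k = 3
    {"c", "d", "g"},
    {"b", "d", "g"},
    {"b", "h"},
    {"a", "b", "c"},
    {"a", "c", "d", "h"},
]


def occurrences(db):
    """Total number of item occurrences stored in the database."""
    return sum(len(transaction) for transaction in db)


def overhead(encrypted, original):
    """Additional item occurrences introduced by the encryption."""
    return occurrences(encrypted) - occurrences(original)
```

`overhead(NAIVE, ORIGINAL)` evaluates to 16 and `overhead(TAXONOMY_BASED, ORIGINAL)` to 4, against 11 occurrences in the original data.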

ANONYMIZATION: EASY DECRYPTION
The data owner can obtain the real frequent itemsets by filtering out patterns containing any fake item, in one scan of the returned results.
Example with min_sup = 2 on the example database and its encrypted form (a = beer, b = cigar, f = wine, h = tea):
- Returned results = {a, b, c, d, e, f, g, h, i, j, k, ac, af, bf, ce, …}
- Real results = {{beer}, {cigar}, {wine}, {tea}, {beer, wine}, {cigar, wine}}
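The one-scan filter can be sketched as follows, assuming the data owner keeps a private map from encrypted codes to real items (here a, b, f, h, as in the slides' figure). Every code missing from that map is treated as fake. Only the itemsets explicitly listed on the slide are used as input; the elided ones are left out.

```python
# Data owner's secret: encrypted code -> real item. Other codes are fake.
REAL_ITEM = {"a": "beer", "b": "cigar", "f": "wine", "h": "tea"}

# Itemsets listed on the slide as returned by the third party (min_sup = 2).
RETURNED = [
    {"a"}, {"b"}, {"c"}, {"d"}, {"e"}, {"f"}, {"g"}, {"h"}, {"i"}, {"j"}, {"k"},
    {"a", "c"}, {"a", "f"}, {"b", "f"}, {"c", "e"},
]


def decrypt(returned, real_item):
    """Keep itemsets made only of real items and translate them, in one pass."""
    real_results = []
    for itemset in returned:
        if all(code in real_item for code in itemset):
            real_results.append(frozenset(real_item[code] for code in itemset))
    return real_results
```

`decrypt(RETURNED, REAL_ITEM)` returns the six real frequent itemsets shown on the slide; patterns such as ac and ce are discarded because they contain a fake item.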

ANONYMIZATION: ENCRYPTION
The problem is how to build the taxonomy and encrypt T (k = 3 in the running example) so that the result is k-support anonymous.

ANONYMIZATION: ENCRYPTION
1: Generalization of the mining task. Generate a pseudo taxonomy that can (a) conserve the correct and complete mining results and (b) facilitate k-support anonymization.
2: Anonymization with the taxonomy tree. Encrypt T for k-support anonymity with the help of the constructed taxonomy tree.

1: GENERALIZATION OF THE MINING TASK
Build a k-bud tree of T:
- All real items are at the leaf level.
- The number of nodes in the following three categories is equal to or greater than k, where $x_M$ denotes the most frequent real item in T:
  $A_{>} = \{\, v \mid \mathrm{sup}(v) > \mathrm{sup}(x_M) \text{ and } v \text{ is a leaf} \,\}$,
  $A_{=} = \{\, v \mid \mathrm{sup}(v) = \mathrm{sup}(x_M) \,\}$, and
  $A_{<} = \{\, v \mid \mathrm{sup}(v) < \mathrm{sup}(x_M) < \mathrm{sup}(u), \text{ where } u \text{ is the parent node of } v \,\}$.
[Figure: a 3-bud tree for the example database, with leaves beer, wine, cigar, and tea.]
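The three categories can be computed directly from a tree and the leaf items' transaction sets. The tree used below (`root`, `n1`, `n2`) is a hypothetical one built for illustration, not the tree from the slide's figure, and the sketch assumes that the k-bud condition counts the nodes of all three categories together.

```python
# Hypothetical taxonomy (parent -> children) over the example's real items.
CHILDREN = {"root": ["n1", "cigar"], "n1": ["wine", "n2"], "n2": ["beer", "tea"]}

# T(x) for each leaf: IDs of the transactions containing the item.
LEAF_TIDS = {"wine": {1, 2, 4, 5}, "cigar": {2, 3, 4}, "beer": {4, 5}, "tea": {3, 5}}


def node_tids(node, children, leaf_tids):
    """T(node): union of the transaction sets of the leaves below the node."""
    if node in leaf_tids:
        return set(leaf_tids[node])
    tids = set()
    for child in children[node]:
        tids |= node_tids(child, children, leaf_tids)
    return tids


def bud_categories(children, leaf_tids, x_m):
    """Return (A_greater, A_equal, A_less) relative to the item x_m."""
    nodes = set(children) | set(leaf_tids)
    sup = {v: len(node_tids(v, children, leaf_tids)) for v in nodes}
    parent = {c: u for u, cs in children.items() for c in cs}
    target = sup[x_m]
    greater = {v for v in leaf_tids if sup[v] > target}
    equal = {v for v in nodes if sup[v] == target}
    less = {v for v in nodes if v in parent and sup[v] < target < sup[parent[v]]}
    return greater, equal, less
```

For this hypothetical tree and `x_m = "wine"`, the categories contain wine, n2, and cigar, three nodes in total, so under the stated assumption the tree would qualify as a 3-bud tree.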

1: GENERALIZATION OF THE MINING TASK (EXAMPLE)
For the example database, the tree is constructed in stages:
- The items beer, cigar, wine, and tea are divided into 3 groups.
- Subtrees are formed.
- A subtree whose root satisfies sup(root) ≥ sup(wine) is iteratively connected with another subtree.
- The result is a 3-bud tree.
[Figures: the groups, the subtrees, and the resulting 3-bud tree.]

2: ANONYMIZATION WITH TAXONOMY TREE
Alter the k-bud tree and modify T simultaneously to achieve k-support anonymity, using three operations:
- Insertion
- Split
- Increase

2: ANONYMIZATION WITH TAXONOMY TREE: INSERTION
Applicable when sup(v) < target-sup < sup(u), where u is the parent of v. A new node p is inserted between u and v, and a new leaf q is added under p:
- p: the node with the target support.
- q: placed in sup(p) - sup(v) transactions randomly selected from T(u) - T(v).
T(x) is the set of transactions containing the item x. sup(u) and sup(v) should not be changed.
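A minimal sketch of the transaction-side effect of an insertion, assuming T(u) and T(v) are passed in as sets of transaction IDs and that the tree update (adding p and q) is handled elsewhere. The function name and the choice of T(u) in the usage note are illustrative assumptions.

```python
import random

DB = {
    1: {"wine"},
    2: {"cigar", "wine"},
    3: {"cigar", "tea"},
    4: {"beer", "cigar", "wine"},
    5: {"beer", "tea", "wine"},
}


def insertion(db, t_u, t_v, target_sup, new_leaf, rng):
    """Add the fake leaf `new_leaf` (q) so that sup(p) = |T(v) | T(q)| = target_sup.

    q goes into target_sup - sup(v) transactions drawn from T(u) - T(v),
    so neither sup(u) nor sup(v) changes. Returns T(q).
    """
    if not len(t_v) < target_sup < len(t_u):
        raise ValueError("insertion requires sup(v) < target-sup < sup(u)")
    chosen = rng.sample(sorted(t_u - t_v), target_sup - len(t_v))
    for tid in chosen:
        db[tid].add(new_leaf)
    return set(chosen)
```

For instance, with v = tea (T(v) = {3, 5}), an assumed parent u covering all five transactions, and a target support of 4, the new leaf lands in two of the transactions 1, 2, and 4, and the new node p then has support 4.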

2: ANONYMIZATION WITH TAXONOMY TREE: INSERTION EXAMPLE
For wine: starting from the 3-bud tree, an insertion at tea adds the fake item p1.

TID   Items                 Items after insertion
1     wine                  wine, p1
2     cigar, wine           cigar, wine, p1
3     cigar, tea            cigar, tea
4     beer, cigar, wine     beer, cigar, wine
5     beer, tea, wine       beer, tea, wine

[Figure: the 3-bud tree before and after the insertion.]

2: ANONYMIZATION WITH TAXONOMY TREE: SPLIT
Applicable when target-sup < sup(v). The node v receives two new children, p and q:
- p: placed in target-sup transactions randomly selected from T(v).
- q: placed in the remaining transactions, T(q) = T(v) - T(p).
T(x) is the set of transactions containing the item x. sup(v) should not be changed.
The split operation turns a leaf node into an internal node.
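The split can be sketched on the transaction data as follows, assuming that v disappears from the stored transactions once it becomes an internal node, since only leaf items need to be stored. The function name is illustrative and the tree update is left out.

```python
import random

DB = {
    1: {"wine", "p1"},
    2: {"cigar", "wine", "p1"},
    3: {"cigar", "tea"},
    4: {"beer", "cigar", "wine"},
    5: {"beer", "tea", "wine"},
}


def split(db, v, target_sup, p, q, rng):
    """Replace leaf v by two new leaves p and q with sup(p) = target_sup.

    T(p) is a random subset of T(v) and T(q) = T(v) - T(p), so the support
    of v, now the parent of p and q, stays |T(p) | T(q)| = |T(v)|.
    Returns (T(p), T(q)).
    """
    t_v = {tid for tid, items in db.items() if v in items}
    if not target_sup < len(t_v):
        raise ValueError("split requires target-sup < sup(v)")
    t_p = set(rng.sample(sorted(t_v), target_sup))
    for tid in t_v:
        db[tid].discard(v)
        db[tid].add(p if tid in t_p else q)
    return t_p, t_v - t_p
```

Splitting wine (support 4) with a target support of 3 produces one leaf with support 3 and one with support 1, while the union of their transactions is still {1, 2, 4, 5}.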

2: ANONYMIZATION WITH TAXONOMY TREE: SPLIT EXAMPLE
For cigar: after the insertion for wine, a split at wine replaces wine with the new leaves p2 and p3.

TID   Items after insertion   Items after split
1     wine, p1                p1, p2
2     cigar, wine, p1         cigar, p1, p3
3     cigar, tea              cigar, tea
4     beer, cigar, wine       beer, cigar, p2
5     beer, tea, wine         beer, tea, p2

[Figure: the tree after the insertion and the split.]

2: ANONYMIZATION WITH TAXONOMY TREE: INCREASE
Applicable when sup(v) < target-sup, where u is the parent of v:
- v is added to target-sup - sup(v) transactions randomly selected from T(u) - T(v).
Because the supports of nodes already in an anonymous group should not be changed, and this operation raises sup(v), the increase operation is applicable only to a node that does not belong to any anonymous group.
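A sketch of the increase operation on the transaction data, assuming T(u) is supplied as a set of transaction IDs. The starting database is the slides' state after the split, and the function name is illustrative.

```python
import random

# Example database after the insertion and the split, as shown in the slides.
DB = {
    1: {"p1", "p2"},
    2: {"cigar", "p1", "p3"},
    3: {"cigar", "tea"},
    4: {"beer", "cigar", "p2"},
    5: {"beer", "tea", "p2"},
}


def increase(db, v, t_u, target_sup, rng):
    """Raise sup(v) to target_sup by adding v to transactions from T(u) - T(v).

    Drawing only from T(u) keeps the support of the parent u unchanged.
    Returns the set of transactions that received v.
    """
    t_v = {tid for tid, items in db.items() if v in items}
    if not len(t_v) < target_sup <= len(t_u):
        raise ValueError("increase requires sup(v) < target-sup <= sup(u)")
    chosen = rng.sample(sorted(t_u - t_v), target_sup - len(t_v))
    for tid in chosen:
        db[tid].add(v)
    return set(chosen)
```

Raising p3 from support 1 to 3 inside T(wine) = {1, 2, 4, 5} adds it to two of the transactions 1, 4, and 5; the slides' example happens to pick 1 and 5.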

2: ANONYMIZATION WITH TAXONOMY TREE: INCREASE EXAMPLE
For cigar: after the split, an increase raises the support of p3 by adding it to transactions 1 and 5.

TID   Items after split   Items after increase
1     p1, p2              p1, p2, p3
2     cigar, p1, p3       cigar, p1, p3
3     cigar, tea          cigar, tea
4     beer, cigar, p2     beer, cigar, p2
5     beer, tea, p2       beer, tea, p2, p3

[Figure: the tree after the insertion, the split, and the increase.]

2: ANONYMIZATION WITH TAXONOMY TREE: RESULT
After the insertion (for wine) and the split and increase (for cigar), the items are renamed to obtain the encrypted database with 3-support anonymity.

TID   Original items        Items after increase   Encrypted items
1     wine                  p1, p2, p3             c, d, g
2     cigar, wine           cigar, p1, p3          b, d, g
3     cigar, tea            cigar, tea             b, h
4     beer, cigar, wine     beer, cigar, p2        a, b, c
5     beer, tea, wine       beer, tea, p2, p3      a, c, d, h

[Figure: the original 3-bud tree and the final pseudo taxonomy over the nodes a to k, with a = beer, b = cigar, f = wine, h = tea.]

PERFORMANCE STUDIES
Data sets:
- Retail dataset: transactions with 2117 different items
- T10I1kD100k dataset: 100k transactions with 1000 different items
Security:
- Against precise item support attacks
- Against precise itemset support attacks
Storage overhead
Execution efficiency

SECURITY
Against precise item support attacks:
- Item accuracy: the ratio of items being re-identified.
- DB accuracy: the average ratio of items in a transaction being re-identified.
[Figure: results on (a) the Retail dataset and (b) the T10I1kD100k dataset.]
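The two metrics can be sketched as below. The slide gives only the one-line definitions, so the exact denominators used in the experiments are assumptions: here item accuracy is taken over all encrypted items, and DB accuracy averages the per-transaction fraction of correctly guessed items. The guess in the example is hypothetical.

```python
def item_accuracy(guess, truth):
    """Fraction of encrypted items whose real identity was guessed correctly."""
    correct = sum(1 for code, real in truth.items() if guess.get(code) == real)
    return correct / len(truth)


def db_accuracy(encrypted_db, guess, truth):
    """Average, over transactions, of the fraction of items guessed correctly."""
    ratios = []
    for items in encrypted_db:
        correct = sum(1 for code in items if guess.get(code) == truth[code])
        ratios.append(correct / len(items))
    return sum(ratios) / len(ratios)


# Substituted example database and the true mapping from the slides.
ENCRYPTED_DB = [{"a"}, {"a", "c"}, {"c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}]
TRUTH = {"a": "wine", "b": "beer", "c": "cigar", "d": "tea"}

# Hypothetical attacker guess: a and c are right, b and d are swapped.
GUESS = {"a": "wine", "b": "tea", "c": "cigar", "d": "beer"}
```

For this guess the item accuracy is 0.5, while the DB accuracy is higher (about 0.7) because the correctly guessed items a and c are also the most frequent ones.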

SECURITY
Against precise itemset support attacks:
- Item accuracy: the ratio of items being re-identified.
- DB accuracy: the average ratio of items in a transaction being re-identified.
[Figure: results on (a) the Retail dataset and (b) the T10I1kD100k dataset.]

STORAGE OVERHEAD & EXECUTION EFFICIENCY
[Figures: storage overhead and execution efficiency, each shown for (a) the Retail dataset and (b) the T10I1kD100k dataset.]

SUMMARY
- We proposed k-support anonymity to enhance privacy protection in the outsourcing of frequent itemset mining (FIM).
- For storage efficiency, we transformed FIM into GFIM and proposed a taxonomy-based anonymization algorithm.
- Our method allows the data owner to obtain the real frequent itemsets in one scan of the returned results.
- Experimental results on both real and synthetic data sets showed that our method can achieve very good privacy protection with moderate storage overhead.

Q & A