

Reconstruction-Based Association Rule Hiding
Author: Yuhong Guo (MS-Ph.D. Candidate, Peking Univ., China)
Advisor: Prof. Shiwei Tang
Co-Advisors: Prof. Dongqing Yang, Jian Pei
Sunday, June 10, 2007

SIGMOD Ph.D. Workshop IDAR '07

Association Rule Hiding: what? why?? and how???
Problem: hide sensitive association rules in data without losing the non-sensitive ones.
Motivation: large data repositories contain confidential rules whose disclosure can have serious adverse effects.
Solutions:
  - Data modification (distortion, blocking). Traditional: fine-tunes the data and controls the hiding effects only indirectly.
  - Data reconstruction. New and promising: knowledge sanitization, which controls the hiding effects directly.

Outline
  - Background: Motivation, Problem statement, Related work
  - Proposed Solution
  - Current Progress
  - Evaluation Plan

Background: Motivation
Data mining, data sharing, and privacy preserving come together in Privacy Preserving Data Mining (PPDM).
Two problems addressed in PPDM:
  - the protection of private data
  - the protection of sensitive rules (knowledge) contained in the data

Background: Problem statement
Given
  - a database D to be released
  - the minimum support threshold (MST) and minimum confidence threshold (MCT)
  - a set of association rules R mined from D
  - a set of sensitive rules R_h ⊆ R to be hidden
Find a new database D' such that
  - the rules in R_h cannot be mined from D'
  - as many of the rules in R - R_h as possible can still be mined from D'
This is the KHD (Knowledge Hiding in Database) problem.
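The problem statement can be made concrete with a small check of whether a rule is still minable from a database. This is only an illustrative sketch: it assumes the usual definitions of support and confidence, and the names `support`, `confidence`, and `is_minable` are hypothetical, not taken from the slides.

```python
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]
Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def support(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(rule: Rule, db: List[Transaction]) -> float:
    """supp(antecedent and consequent together) / supp(antecedent)."""
    antecedent, consequent = rule
    denom = support(antecedent, db)
    return support(antecedent | consequent, db) / denom if denom else 0.0


def is_minable(rule: Rule, db: List[Transaction], mst: float, mct: float) -> bool:
    """A rule is minable when it reaches both thresholds (assumed definition)."""
    antecedent, consequent = rule
    return support(antecedent | consequent, db) >= mst and confidence(rule, db) >= mct
```

Under this reading, D' solves the KHD problem when `is_minable` is false on D' for every rule in R_h and true for as many rules in R - R_h as possible.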

Background: Related work
Data modification approaches
  - Basic idea: data sanitization, D -> D'
  - Current status: distortion and blocking; many algorithms exist
  - Drawbacks: cannot control the hiding effects intuitively; require a lot of I/O
Data reconstruction approaches
  - Basic idea: knowledge sanitization, D -> K -> D'
  - Current status: limited, 3 papers
  - Advantages: can easily control the availability of rules and control the hiding effects directly and intuitively

Background: Classification of current algorithms
Algorithms are classified by target (hide rules or hide large itemsets) and by approach:
  - Data modification
      Data-Distortion: Algo1a, Algo1b, Algo2a, WSDA, PDA, Algo2b, Algo2c, Naïve, MinFIA, MaxFIA, IGA, RRA, RA, SWA, Border-Based, Integer-Programming, Sanitization-Matrix
      Data-Blocking: CR, CR2, GIH
  - Data reconstruction: CIILM
Much more reconstruction-based work is expected.

Outline
  - Background
  - Proposed Solution: Framework, Example, Discussion
  - Current Progress
  - Evaluation Plan

Proposed Solution: Framework of our approach
  1. Frequent Set Mining: D -> FS
  2. Perform sanitization algorithm: FS (with R and R - R_h) -> FS'
  3. FP-tree-based Inverse Frequent Set Mining: FS' -> FP-tree -> D'

Proposed Solution: The first two phases
1. Frequent set mining
  - Generate all frequent itemsets FS, with their supports and support counts, from the original database D
2. Perform sanitization algorithm
  - Input: FS (the output of phase 1), R, R_h
  - Output: sanitized frequent itemsets FS'
  - Process: select a hiding strategy, identify the sensitive frequent sets, perform sanitization
In the best case, the sanitization algorithm ensures that exactly the non-sensitive rule set R - R_h can be derived from FS'.
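Phase 2 works on the frequent itemsets rather than on the data. The sketch below shows one possible hiding strategy, chosen purely for illustration: every frequent itemset that contains all items of a sensitive rule is dropped. It is not the author's sanitization algorithm (which the slides list as future work), and the function name `sanitize` and the dictionary representation of FS are assumptions.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent, consequent)


def sanitize(fs: Dict[Itemset, int], sensitive_rules: Iterable[Rule]) -> Dict[Itemset, int]:
    """Return FS' by removing each sensitive rule's generating itemset and its supersets.

    Assumed strategy: a rule X -> Y can no longer be derived once the itemset
    formed by X and Y together is not frequent. Supports of the remaining
    itemsets are left unchanged.
    """
    blocked = [antecedent | consequent for antecedent, consequent in sensitive_rules]
    return {
        itemset: count
        for itemset, count in fs.items()
        if not any(b <= itemset for b in blocked)
    }
```

This sketch does not check that the remaining supports are still mutually consistent, which a sound sanitization algorithm would also have to consider.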

Proposed Solution: The third phase, FP-tree-based inverse mining
Basic idea: use the FP-tree as a transition "bridge", which narrows the gap between a database and its frequent itemsets and makes the transformation easier.
Proposed method:
  FS (frequent itemsets) -(i)-> FP-tree -(ii)-> TempD (temporary database) -(iii)-> D1, D2, ... (a set of compatible databases)
  (i) Generate a compatible FP-tree
  (ii) Generate a TempD that includes only frequent items
  (iii) Scatter infrequent items into TempD
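Steps (ii) and (iii) can be sketched as follows, assuming step (i) has already produced a tree. This is a hedged illustration, not the published method: the `Node` class, `expand`, and `scatter` are hypothetical names, the null root of the FP-tree is omitted, and placing each infrequent item into the first sigma - 1 transactions is just one simple way to keep it below the support count threshold sigma.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Set, Tuple


@dataclass
class Node:
    item: str
    count: int
    children: List["Node"] = field(default_factory=list)


def expand(node: Node, prefix: Tuple[str, ...] = ()) -> List[Set[str]]:
    """Step (ii): turn the subtree under `node` into transactions of frequent items.

    The number of transactions ending at a node is its count minus the
    counts of its children.
    """
    path = prefix + (node.item,)
    ending_here = node.count - sum(c.count for c in node.children)
    rows = [set(path) for _ in range(ending_here)]
    for child in node.children:
        rows.extend(expand(child, path))
    return rows


def scatter(temp_db: Sequence[Set[str]], infrequent_items: Sequence[str], sigma: int) -> List[Set[str]]:
    """Step (iii): add each infrequent item to at most sigma - 1 transactions."""
    db = [set(t) for t in temp_db]  # copy so TempD itself is not modified
    for item in infrequent_items:
        for t in db[: max(sigma - 1, 0)]:
            t.add(item)
    return db
```

Because each scattered item occurs in fewer than sigma transactions, no itemset containing it can become frequent, and the supports of itemsets made only of frequent items are unaffected. Choosing different target transactions yields different compatible databases.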

Proposed Solution: Example, the first two phases

Original database D:
  TID   Items
  T1    ABCE
  T2    ABC
  T3    ABCD
  T4    ABD
  T5    AD
  T6    ACD

σ = 4, MST = 66%, MCT = 75%

1. Frequent set mining
2. Perform sanitization algorithm

Frequent itemsets FS':
  Itemset   Count   Support
  A         6       100%
  C         4       66%
  D         4       66%
  AC        4       66%
  AD        4       66%

Association rules R - R_h:
  Rule      Confidence   Support
  C -> A    100%         66%
  D -> A    100%         66%
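Phase 1 of the example can be reproduced with a brute-force miner. The sketch below is not Apriori or FP-growth and is only suitable for toy data; the function names `frequent_itemsets` and `mine_rules` are hypothetical, and the support count threshold σ = 4 and MCT = 75% are the values from the slide.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def frequent_itemsets(db: List[Itemset], sigma: int) -> Dict[Itemset, int]:
    """Enumerate itemsets level by level and keep those with count >= sigma."""
    items = sorted(set().union(*db))
    fs: Dict[Itemset, int] = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in db if s <= t)
            if count >= sigma:
                fs[s] = count
                found = True
        if not found:  # no frequent k-itemset means no larger one either
            break
    return fs


def mine_rules(fs: Dict[Itemset, int], mct: float) -> Dict[Tuple[Itemset, Itemset], float]:
    """Derive rules X -> Y from each frequent itemset; keep confidence >= mct."""
    rules = {}
    for itemset, count in fs.items():
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                a = frozenset(ante)
                conf = count / fs[a]  # every subset of a frequent itemset is frequent
                if conf >= mct:
                    rules[(a, itemset - a)] = conf
    return rules


D = [frozenset(t) for t in ["ABCE", "ABC", "ABCD", "ABD", "AD", "ACD"]]
FS = frequent_itemsets(D, sigma=4)
R = mine_rules(FS, mct=0.75)
```

On the database D above, `FS` holds A, B, C, D, AB, AC, and AD, and `R` holds B -> A, C -> A, and D -> A. Comparing this with the FS' and R - R_h shown on the slide suggests that B -> A is the sensitive rule in this example.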

Proposed Solution: Example, the third phase

Frequent itemsets FS' (σ = 4):
  Itemset   Count   Support
  A         6       100%
  C         4       66%
  D         4       66%
  AC        4       66%
  AD        4       66%

FP-tree: A:6 with children C:4 and D:2; C:4 has one child D:2.

Released database D':
  TID   Items
  T1    ACD
  T2    ACD
  T3    AC
  T4    AC
  T5    AD
  T6    AD

Difficulties:
  1. How to find the target FP-tree
  2. How to control |D'|
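Whether a released database is compatible with the sanitized itemsets can be checked by brute force. The sketch below assumes that "compatible" means every itemset in FS' has exactly its stated count and no other itemset reaches σ; the function name `is_compatible` is hypothetical, and the exhaustive enumeration is only practical for toy examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def is_compatible(db: List[Itemset], fs: Dict[Itemset, int], sigma: int) -> bool:
    """True if the frequent itemsets of `db` at threshold sigma are exactly `fs`."""
    items = sorted(set().union(*db) | set().union(*fs))
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in db if s <= t)
            if s in fs:
                if count != fs[s]:  # listed itemset with the wrong count
                    return False
            elif count >= sigma:  # frequent itemset that should be hidden
                return False
    return True


FS_PRIME = {
    frozenset("A"): 6, frozenset("C"): 4, frozenset("D"): 4,
    frozenset("AC"): 4, frozenset("AD"): 4,
}
D_PRIME = [frozenset(t) for t in ["ACD", "ACD", "AC", "AC", "AD", "AD"]]
print(is_compatible(D_PRIME, FS_PRIME, sigma=4))
```

For the released database on the slide the check succeeds, while the original database D fails it because B and AB are still frequent there.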

Proposed Solution: Discussion
Sanitization algorithm
  - Compared with the earlier, popular data sanitization: performs sanitization directly at the knowledge level of the data
Inverse frequent set mining algorithm
  - Deals with frequent items and infrequent items separately: more efficient, and produces a large number of outputs
Our solution gives the user a knowledge-level window for performing sanitization conveniently and generates a number of secure databases.

Outline
  - Background
  - Proposed Solution
  - Current Progress: Work to date, Future work, Expected contributions
  - Evaluation Plan

Current Progress: Work to date
FP-tree-based method for inverse frequent set mining (used in the 3rd phase of our framework)
  - First effort: published in Proc. of BNCOD'06. Provides a good heuristic search strategy to rapidly find an FP-tree satisfying the given constraints, which in turn rapidly yields a set of compatible databases.
  - Further work: accepted by Journal of Software (JOS). A more mature and well-designed FP-tree-based method for inverse frequent set mining that iteratively solves a sub linear constraint problem.

Current Progress: Future work
  - Develop a sound sanitization algorithm with the following considerations:
      The support and confidence of the rules in R - R_h should remain unchanged as far as possible
      It can select appropriate hiding strategies according to the different kinds of correlations among the rules in R and R_h
      It can prevent rule-based reasoning
  - Investigate how to restrict the number of transactions in the newly released database
  - Develop an integrated secure association rule mining tool (DHD + KHD -> integrated secure tool):
      It can protect private data
      It can protect sensitive rules contained in the data

Current Progress: Expected contributions
  - Reconstruction-based ARH framework
  - Inverse frequent set mining algorithm
  - Rule sanitization algorithm
  - ARH evaluation metrics
  - CHART: Credible Hiding Association Rule Tool

Outline
  - Background
  - Proposed Solution
  - Current Progress
  - Evaluation Plan

Evaluation Plan
Datasets: BMS-POS, BMS-WebView-1, BMS-WebView-2, …
Evaluation criteria:
  - Hiding effects
      ① Hiding Failure Ratio: |R_h(D')| / |R_h(D)|
      ② Lost Rules Ratio: (|~R_h(D)| - |~R_h(D')|) / |~R_h(D)|
      ③ Ghost Rules Ratio: (|R'| - |R ∩ R'|) / |R'|
  - Data utility
  - Time performance
Here R and R' are the rule sets mined from D and D', R_h is the set of sensitive rules, and ~R_h is the set of non-sensitive rules.
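The three hiding-effect ratios translate directly into set operations. The sketch below assumes rules are represented as hashable values (plain strings here), that R_h(D') means the sensitive rules still minable from D', and that an empty denominator yields 0.0; the function names are hypothetical.

```python
from typing import Hashable, Set


def hiding_failure(r_h: Set[Hashable], r_new: Set[Hashable]) -> float:
    """|R_h(D')| / |R_h(D)|: share of sensitive rules still minable from D'."""
    return len(r_h & r_new) / len(r_h) if r_h else 0.0


def lost_rules(r: Set[Hashable], r_h: Set[Hashable], r_new: Set[Hashable]) -> float:
    """(|~R_h(D)| - |~R_h(D')|) / |~R_h(D)|: share of non-sensitive rules lost."""
    non_sensitive = r - r_h
    if not non_sensitive:
        return 0.0
    return (len(non_sensitive) - len(non_sensitive & r_new)) / len(non_sensitive)


def ghost_rules(r: Set[Hashable], r_new: Set[Hashable]) -> float:
    """(|R'| - |R ∩ R'|) / |R'|: share of rules in D' that were not in D."""
    return (len(r_new) - len(r & r_new)) / len(r_new) if r_new else 0.0


# Hypothetical rule sets: R mined from D, R_h sensitive, R' mined from D'.
R = {"B->A", "C->A", "D->A"}
R_H = {"B->A"}
R_NEW = {"C->A", "D->A"}
print(hiding_failure(R_H, R_NEW), lost_rules(R, R_H, R_NEW), ghost_rules(R, R_NEW))
```

An ideal release scores zero on all three ratios: no sensitive rule survives, no non-sensitive rule is lost, and no new rule appears.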

Summary: Reconstruction-based Association Rule Hiding
  1. Frequent Set Mining: D -> FS
  2. Perform sanitization algorithm: FS (with R and R - R_h) -> FS'. Ongoing!
  3. FP-tree-based Inverse Frequent Set Mining: FS' -> FP-tree -> D'. Basically completed!

Thanks for your attention. Any suggestions or questions?