An integer programming approach for frequent itemset hiding

An integer programming approach for frequent itemset hiding
Aris Gkoulalas-Divanis, Vassilios S. Verykios
CIKM’06

Outline: Introduction, Basic definitions, Methodology, Experimental results, Conclusions

Introduction
The approach is based on the notion of distance between the original database D and its sanitized version D′. Goal: use integer programming to minimize this distance while hiding the sensitive itemsets and minimally affecting the non-sensitive ones.
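A minimal Python sketch of such a distance, assuming it counts the bitmap cells that differ between D and D′. Because sanitization here only turns 1s into 0s, this equals the number of 1s removed, which is why minimizing the distance is the same as maximizing the 1s left in D′. The bitmaps are hypothetical toy data, not taken from the slides.

```python
from typing import List

# Rows are transactions, columns are items; 1 means the item is present.
Bitmap = List[List[int]]


def distance(d: Bitmap, d_sanitized: Bitmap) -> int:
    """Number of bitmap cells that differ between D and D' (assumed metric)."""
    if len(d) != len(d_sanitized):
        raise ValueError("D and D' must have the same transactions")
    return sum(abs(a - u)
               for row, row_s in zip(d, d_sanitized)
               for a, u in zip(row, row_s))


def ones(d: Bitmap) -> int:
    """Number of 1s (item occurrences) in a bitmap."""
    return sum(sum(row) for row in d)


# Hypothetical toy bitmap over items a, b, c (not from the slides).
D = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
D_prime = [[1, 0, 0],  # item b removed from the first transaction
           [1, 0, 1],
           [0, 1, 1]]

print(distance(D, D_prime))     # 1
print(ones(D) - ones(D_prime))  # 1: same value, since only 1s were removed
```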

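Basic definitions
Bitmap representation of D: $a_{nm} = 1$ if transaction $T_n$ contains item $i_m$, else 0 [slide example: bitmap over items a, b, c]. Support count of an itemset I: $\sup(I, D) = \sum_{n=1}^{N} \prod_{i_m \in I} a_{nm}$.
Write $u_{nm}$ for the entries of D′: $u_{nm} = 0$ where $a_{nm} = 0$, and $u_{nm} \in \{0, 1\}$ where $a_{nm} = 1$. Objective: maximize the number of 1s left in D′, $\max \sum_{n,m} u_{nm}$.
Non-sensitive frequent itemsets I must stay frequent in D′: $\sum_{n=1}^{N} \prod_{i_m \in I} u_{nm} \ge msup \cdot N$.
Sensitive itemsets I must become infrequent in D′: $\sum_{n=1}^{N} \prod_{i_m \in I} u_{nm} < msup \cdot N$.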
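The support count in bitmap form and the two families of constraints on this slide can be sketched in a few lines of Python. This is an illustration under assumptions: the threshold is taken as msup · N (matching the fractional msup values used in these slides), and the database, the item indices (a=0, b=1, c=2) and the choice of sensitive itemset are hypothetical.

```python
from math import prod
from typing import List, Sequence

# Rows are transactions, columns are items (a=0, b=1, c=2); 1 = item present.
Bitmap = List[List[int]]


def support(d: Bitmap, itemset: Sequence[int]) -> int:
    """sup(I, D) = sum over n of the product of a[n][m] for m in I."""
    return sum(prod(row[m] for m in itemset) for row in d)


def satisfies_hiding(d_prime: Bitmap,
                     nonsensitive: List[Sequence[int]],
                     sensitive: List[Sequence[int]],
                     msup: float) -> bool:
    """Check both families of constraints against the threshold msup * N."""
    threshold = msup * len(d_prime)
    stays_frequent = all(support(d_prime, i) >= threshold for i in nonsensitive)
    is_hidden = all(support(d_prime, i) < threshold for i in sensitive)
    return stays_frequent and is_hidden


# Hypothetical toy data: N = 4, msup = 0.5 (threshold 2), sensitive itemset ab.
D = [[1, 1, 0], [1, 1, 1], [1, 0, 1], [0, 1, 1]]
D_prime = [[1, 0, 0], [1, 1, 1], [1, 0, 1], [0, 1, 1]]  # b removed from the first row
nonsensitive = [[0], [1], [2], [0, 2], [1, 2]]          # a, b, c, ac, bc
sensitive = [[0, 1]]                                    # ab

print(support(D, [0, 1]), support(D_prime, [0, 1]))             # 2 1
print(satisfies_hiding(D, nonsensitive, sensitive, 0.5))        # False
print(satisfies_hiding(D_prime, nonsensitive, sensitive, 0.5))  # True
```

Removing a single 1 is enough here: ab drops below the threshold while a, b, c, ac and bc all keep a support of at least 2.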

(cont.) Solving this problem is NP-hard, and the system can contain up to $2^M - 1$ inequalities, one per non-empty itemset (M: the number of items).
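To make the growth concrete, a small Python sketch that enumerates every non-empty itemset (one potential inequality each) for a few values of M. With 10 items, as in the reported experiment, that is already 1,023 candidate inequalities.

```python
from itertools import combinations
from typing import FrozenSet, List


def all_itemsets(items: str) -> List[FrozenSet[str]]:
    """Every non-empty itemset over `items`; each may need its own inequality."""
    return [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


for m in (3, 5, 10):
    count = len(all_itemsets("abcdefghij"[:m]))
    assert count == 2 ** m - 1
    print(m, count)  # 3 7, then 5 31, then 10 1023
```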

(cont.) SI = {e, ae, bc}: sensitive itemsets
S = {e, bc}: minimal sensitive itemsets
SS = {e, ae, bc, ce, abc, …}: all sensitive itemsets and their supersets
Ideal case: F′ = F − SS, i.e., the sanitized database D′ contains all the frequent itemsets of D except the sensitive ones and their supersets.
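A Python sketch of these set definitions. The slides do not list F for this example, so the frequent family F below is hypothetical; it was chosen so that SS restricted to F contains exactly the members the slide lists (e, ae, bc, ce, abc).

```python
from typing import FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def fam(*names: str) -> Set[Itemset]:
    """Build a family of itemsets from strings, e.g. fam('e', 'ae')."""
    return {frozenset(n) for n in names}


def minimal_sensitive(si: Set[Itemset]) -> Set[Itemset]:
    """S: sensitive itemsets that contain no other sensitive itemset."""
    return {x for x in si if not any(y < x for y in si)}


def sensitive_supersets(frequent: Set[Itemset], si: Set[Itemset]) -> Set[Itemset]:
    """SS restricted to F: frequent itemsets containing a sensitive itemset."""
    return {x for x in frequent if any(s <= x for s in si)}


def show(f: Iterable[Itemset]) -> List[str]:
    return sorted("".join(sorted(x)) for x in f)


SI = fam("e", "ae", "bc")
# Hypothetical frequent family F (not given on the slides).
F = fam("a", "b", "c", "e", "ab", "ac", "ae", "bc", "ce", "abc")

print(show(minimal_sensitive(SI)))           # ['bc', 'e']
print(show(sensitive_supersets(F, SI)))      # ['abc', 'ae', 'bc', 'ce', 'e']
print(show(F - sensitive_supersets(F, SI)))  # ['a', 'ab', 'ac', 'b', 'c'], i.e. F' = F - SS
```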

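(cont.) Negative border B−(F): infrequent itemsets whose proper subsets are all frequent. Example: acd is infrequent, while ac, cd and ad are frequent.
Positive border B+(F): frequent itemsets whose proper supersets are all infrequent. Example: ac is frequent, while ac# is infrequent (#: any other item).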

Border revision
Original borders: B−(F) = {CD, ABD}, B+(F) = {AD, BD, ABC}
[Figure: itemset lattice over A, B, C, D (null through ABCD), marking the frequent and infrequent regions, the original border and the revised border]
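Both borders can be computed directly from their definitions. In this Python sketch (an illustration, not the authors' code), F is rebuilt as the downward closure of the slide's B+(F) = {AD, BD, ABC}; computing the borders back reproduces B+(F) and the slide's B−(F) = {CD, ABD}.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def downward_closure(maximal: Iterable[str]) -> Set[Itemset]:
    """All non-empty subsets of the given itemsets (a downward-closed family)."""
    family: Set[Itemset] = set()
    for m in maximal:
        for k in range(1, len(m) + 1):
            family.update(frozenset(c) for c in combinations(m, k))
    return family


def positive_border(f: Set[Itemset]) -> Set[Itemset]:
    """B+(F): itemsets of F with no proper superset in F."""
    return {x for x in f if not any(x < y for y in f)}


def negative_border(f: Set[Itemset], items: str) -> Set[Itemset]:
    """B-(F): itemsets outside F whose proper non-empty subsets are all in F.

    Checking only the (k-1)-subsets suffices because F is downward closed.
    """
    border: Set[Itemset] = set()
    for k in range(1, len(items) + 1):
        for c in combinations(items, k):
            x = frozenset(c)
            if x not in f and all(frozenset(s) in f
                                  for s in combinations(c, k - 1) if s):
                border.add(x)
    return border


def show(f: Iterable[Itemset]) -> List[str]:
    return sorted("".join(sorted(x)) for x in f)


F = downward_closure(["AD", "BD", "ABC"])  # F rebuilt from the slide's B+(F)
print(show(positive_border(F)))            # ['ABC', 'AD', 'BD']
print(show(negative_border(F, "ABCD")))    # ['ABD', 'CD']
```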

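Problem size minimization
C: the set of all affected itemsets (the itemsets whose inequalities appear in the system); L_C: the set of solutions of the corresponding inequalities.
If the inequality of an itemset C1 can be removed without affecting the global solution of the system, because the inequality of itemset C2 already enforces it, then C2 covers C1.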

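(cont.) Corollary: any itemset in the positive border of F − SS covers all its subsets (if it is frequent in D′, so are all its subsets). Hence B+(F′) covers all itemsets of F′, and B−(F′) covers all itemsets that must be infrequent in D′ (if a border itemset is infrequent, so are all its supersets). The system for the ideal solution L_C can therefore be restricted to itemsets of B+(F′) and B−(F′).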
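The corollary can be checked empirically. The Python sketch below is an illustration, not the authors' code: it draws hypothetical random databases, uses the frequent itemsets of one of them as a stand-in for the downward-closed family F′, and confirms that testing only B+(F′) (must be frequent) and B−(F′) (must be infrequent) gives the same verdict on a candidate D′ as testing every itemset.

```python
import random
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
ITEMS = "ABCD"
ALL = [frozenset(c) for k in range(1, len(ITEMS) + 1)
       for c in combinations(ITEMS, k)]


def support(db: List[Itemset], x: Itemset) -> int:
    return sum(1 for t in db if x <= t)


def frequent(db: List[Itemset], threshold: int) -> Set[Itemset]:
    return {x for x in ALL if support(db, x) >= threshold}


def borders(f: Set[Itemset]) -> Tuple[Set[Itemset], Set[Itemset]]:
    """(B+, B-) of a downward-closed family f."""
    pos = {x for x in f if not any(x < y for y in f)}
    neg = {x for x in ALL if x not in f and
           all(frozenset(s) in f for s in combinations(sorted(x), len(x) - 1) if s)}
    return pos, neg


def full_check(db, target, threshold) -> bool:
    """One inequality per itemset: frequent exactly when it belongs to F'."""
    return all((support(db, x) >= threshold) == (x in target) for x in ALL)


def border_check(db, target, threshold) -> bool:
    """Only the covering itemsets: B+(F') frequent, B-(F') infrequent."""
    pos, neg = borders(target)
    return (all(support(db, x) >= threshold for x in pos) and
            all(support(db, x) < threshold for x in neg))


def random_db(rng: random.Random, n: int = 8) -> List[Itemset]:
    return [frozenset(i for i in ITEMS if rng.random() < 0.5) for _ in range(n)]


rng = random.Random(0)
THRESHOLD = 3
for _ in range(500):
    target = frequent(random_db(rng), THRESHOLD)  # stands in for F' (downward closed)
    candidate = random_db(rng)                    # stands in for a candidate D'
    assert full_check(candidate, target, THRESHOLD) == border_check(candidate, target, THRESHOLD)
print("border check matched the full check in all 500 cases")
```

With 4 items the full system has 15 inequalities, while the border system usually needs far fewer; the gap widens quickly as M grows.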

(cont.)

Example (msup = 0.2)
F = {A, B, C, D, AB, AC, AD, CD, ACD}, SI = {AB}, S = {AB}
F′ = {A, B, C, D, AC, AD, CD, ACD}, B+(F′) = {B, ACD}
Constraints in D′: B frequent, ACD frequent, AB infrequent
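The reduced system of this example can be derived mechanically. A Python sketch, assuming only the slide's F and SI: it computes F′, the revised borders, and the negative-border itemsets that actually need an inequality. The reasoning step in the comments (sanitization only deletes items, so itemsets already infrequent in D stay infrequent) is why BC and BD, although in B−(F′), need no constraint.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]
ITEMS = "ABCD"


def fam(*names: str) -> Set[Itemset]:
    return {frozenset(n) for n in names}


def show(f: Iterable[Itemset]) -> List[str]:
    return sorted("".join(sorted(x)) for x in f)


F = fam("A", "B", "C", "D", "AB", "AC", "AD", "CD", "ACD")  # frequent in D
SI = fam("AB")                                             # sensitive itemsets

# F' = F - SS: drop every frequent itemset that contains a sensitive one.
F_prime = {x for x in F if not any(s <= x for s in SI)}

# Revised borders of F'.
pos = {x for x in F_prime if not any(x < y for y in F_prime)}
neg = {frozenset(c)
       for k in range(1, len(ITEMS) + 1)
       for c in combinations(ITEMS, k)
       if frozenset(c) not in F_prime
       and all(frozenset(s) in F_prime for s in combinations(c, k - 1) if s)}

# Sanitization only deletes items, so supports can only decrease: border
# itemsets that were already infrequent in D (BC, BD) need no inequality.
must_become_infrequent = neg & F

print(show(F_prime))                 # ['A', 'AC', 'ACD', 'AD', 'B', 'C', 'CD', 'D']
print(show(pos), show(neg))          # ['ACD', 'B'] ['AB', 'BC', 'BD']
print(show(must_become_infrequent))  # ['AB']
```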

Constraint satisfaction problem
A solution of a constraint satisfaction problem (CSP) is a complete assignment of values to the variables that satisfies all the constraints. Here we additionally want to maximize an objective function (the number of 1s left in D′) subject to these constraints, so the CSP is transformed into an optimization problem and solved with binary integer programming (BIP).

Binary integer programming formulation
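For intuition, the optimization can be solved on a tiny instance by exhaustive search over the binary variables u_nm. This Python sketch is a brute-force stand-in for a BIP solver (it does not reproduce the solver or the linearization used in the paper) and is only feasible for a handful of variables. The toy database is hypothetical, constructed so that its frequent itemsets at msup = 0.2 are exactly the F of the example above.

```python
from itertools import combinations, product
from typing import FrozenSet, List, Optional, Set, Tuple

Itemset = FrozenSet[str]
ITEMS = "ABCD"
ALL = [frozenset(c) for k in range(1, len(ITEMS) + 1)
       for c in combinations(ITEMS, k)]


def support(db: List[Itemset], x: Itemset) -> int:
    return sum(1 for t in db if x <= t)


def hide_exhaustively(db: List[Itemset], sensitive: Set[Itemset],
                      msup: float) -> Tuple[Optional[List[Itemset]], Optional[int]]:
    """Brute-force stand-in for a BIP solver.

    One binary variable u_nm per 1 in D; every assignment is tried, and the
    feasible one (frequent itemsets of D' exactly F' = F - SS) that leaves the
    most 1s is returned with its distance (number of 1s removed).
    """
    threshold = msup * len(db)
    freq = {x for x in ALL if support(db, x) >= threshold}
    target = {x for x in freq if not any(s <= x for s in sensitive)}
    cells = [(n, i) for n, t in enumerate(db) for i in sorted(t)]
    best: Optional[List[Itemset]] = None
    best_kept = -1
    for u in product((0, 1), repeat=len(cells)):
        kept = sum(u)
        if kept <= best_kept:
            continue  # cannot beat the current optimum
        rows: List[Set[str]] = [set() for _ in db]
        for (n, i), keep in zip(cells, u):
            if keep:
                rows[n].add(i)
        candidate = [frozenset(r) for r in rows]
        if all((support(candidate, x) >= threshold) == (x in target) for x in ALL):
            best, best_kept = candidate, kept
    if best is None:
        return None, None
    return best, len(cells) - best_kept


# Hypothetical toy database whose frequent itemsets at msup = 0.2 (N = 5)
# are exactly F = {A, B, C, D, AB, AC, AD, CD, ACD}.
D = [frozenset(t) for t in ("AB", "ACD", "C", "D", "")]
sanitized, dist = hide_exhaustively(D, {frozenset("AB")}, 0.2)
print(["".join(sorted(t)) for t in sanitized], dist)  # ['B', 'ACD', 'C', 'D', ''] 1
```

The optimum removes A from the first transaction: removing B instead would also hide AB, but it would make B infrequent and violate a B+(F′) constraint.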

Experimental results
10,000 transactions, 10 items, msup = 0.1

Conclusions
Defined a new metric to quantify the distance between the original database D and its sanitized version D′.
The approach has the benefit of being exact when the ideal solution can be identified.

Exact knowledge hiding through database extension
Aris Gkoulalas-Divanis, Vassilios S. Verykios
TKDE’08

Introduction
The goal of the hiding algorithm is to create a minimal extension D_X to the original database D_O, so that the sensitive itemsets are hidden in the resulting database D = D_O ∪ D_X.

(cont.) Sensitive itemsets: S = {e, ae, bc}

Methodology
P = |D|, N = |D_O|, Q = |D_X|, with P = N + Q
Example supports in D_O: e: 4, ae: 3, bc: 4

(cont.) The distance between D_O and D is measured on the extension D_X, and the algorithm minimizes it.

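(cont.) Optimal solution set C
S = {e, ae, bc}, mfreq = 0.3, Q = 4, C = {e, f, bc, bd, ab, acd}
For the itemsets with support 4 in D_O (e and bc): $mfreq \cdot (N + Q) - \sup(I, D_O) = 0.3 \cdot (10 + 4) - 4 = 0.2$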
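The numbers in this example can be reproduced with exact arithmetic. A Python sketch, assuming the lower bound Q = ⌊sup_max / mfreq − N⌋ + 1, which follows from requiring sup(I, D_O) < mfreq · (N + Q) for every sensitive I when no transaction of D_X supports I; it yields the slide's Q = 4. The second function reads the slide's 0.3 · (10 + 4) − 4 as the slack left to a sensitive itemset, giving the largest support it may still have in D_X. Function names are hypothetical; N = 10 is taken from the slide's expression.

```python
from fractions import Fraction
from math import ceil, floor
from typing import Dict


def min_extension_size(sens_sup: Dict[str, int], n: int, mfreq: Fraction) -> int:
    """Smallest Q such that sup(I, D_O) < mfreq * (N + Q) for every sensitive I,
    assuming no transaction of D_X supports a sensitive itemset."""
    top = max(sens_sup.values())
    return max(floor(top / mfreq - n) + 1, 0)


def max_support_in_extension(sup_o: int, n: int, q: int, mfreq: Fraction) -> int:
    """Largest sup(I, D_X) keeping a sensitive I infrequent in D (negative: impossible)."""
    slack = mfreq * (n + q) - sup_o  # for e: 0.3 * (10 + 4) - 4 = 0.2
    return ceil(slack) - 1


sens_sup = {"e": 4, "ae": 3, "bc": 4}  # supports in D_O, N = 10
mfreq = Fraction(3, 10)
q = min_extension_size(sens_sup, 10, mfreq)
print(q)  # 4
print({i: max_support_in_extension(s, 10, q, mfreq) for i, s in sens_sup.items()})
# {'e': 0, 'ae': 1, 'bc': 0}
```

Fraction avoids floating-point surprises at the boundary: with Q = 3, e would still have relative support 4/13 ≥ 0.3, while with Q = 4 it drops to 4/14 < 0.3.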

Safety margin
Under certain circumstances, the lower bound of Q may be insufficient to allow an exact solution to be identified.
Safety margin (SM): expand the size Q of D_X by SM transactions; SM can be predefined or computed dynamically.
Example: for S = {abc}, a single transaction is insufficient to provide an exact solution.

(cont.) Null transactions (transactions of D_X left with no items) arise from:
(i) an unnecessarily large safety margin: these should be removed from D_X;
(ii) a large value of Q that is essential for proper hiding: these need to be validated, since Q is the lower bound on the number of transactions needed to ensure proper hiding.

(cont.) To ensure the minimum size of D_X, the hiding algorithm keeps only k null transactions.
Q_inv: number of null transactions; V = Q + SM − Q_inv
Example: S = {abc}, Q = 1, SM = 3: k = max(1 − 3, 0) = 0
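A Python sketch of the pruning step. Since |D_X| = Q + SM, V = Q + SM − Q_inv is the number of non-null transactions. The rule k = max(Q − V, 0) is an assumption read from the slide's max(1 − 3, 0): it keeps just enough null transactions for D_X to reach the lower bound Q. The D_X contents are hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def prune_null_transactions(dx: List[Transaction], q: int) -> List[Transaction]:
    """Keep every non-null transaction of D_X and only k null ones.

    V = Q + SM - Q_inv equals the number of non-null transactions, because
    |D_X| = Q + SM. The rule k = max(Q - V, 0) is an assumption: it keeps just
    enough null transactions for D_X to reach the lower bound Q.
    """
    valid = [t for t in dx if t]
    nulls = [t for t in dx if not t]
    k = max(q - len(valid), 0)
    return valid + nulls[:k]


# Hypothetical D_X for S = {abc} with Q = 1 and SM = 3 (4 transactions, 1 null).
dx = [frozenset("ab"), frozenset("bc"), frozenset("ac"), frozenset()]
print(len(prune_null_transactions(dx, q=1)))   # 3: V = 3, k = max(1 - 3, 0) = 0

# Hypothetical case where Q is large: Q = 4, SM = 1, only one non-null transaction.
dx2 = [frozenset("a")] + [frozenset()] * 4
print(len(prune_null_transactions(dx2, q=4)))  # 4: V = 1, k = 3 null transactions kept
```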

Experimental results

(cont.)

Conclusions
Uses a minimal extension of the original database to hide the sensitive itemsets.
The approach has the benefit of being exact when the ideal solution can be identified.