Published by Aubrey Watkins. Modified over 5 years ago.
1
An integer programming approach for frequent itemset hiding
Aris Gkoulalas-Divanis, Vassilios S. Verykios CIKM’06
2
Outline
Introduction
Basic definitions
Methodology
Experimental results
Conclusions
3
Introduction
The approach is based on the notion of distance between the original database and the sanitized database.
Goal: use integer programming to minimize this distance while hiding the sensitive itemsets and minimally affecting the non-sensitive itemsets.
4
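Basic definitions
Databases are in bitmap representation: one row per transaction and one column per item (a, b, c), with a 1 marking an item that is present in the transaction.
The support count of an itemset is the number of transactions that contain all of its items.
Objective: maximize the number of 1s left in D'.
Non-sensitive itemsets should satisfy the frequency rule in D' (support at least msup).
Sensitive itemsets should satisfy the hiding rule in D' (support below msup).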
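The support count in a bitmap representation can be illustrated with a minimal sketch. The four-transaction database below is hypothetical (it does not come from the slides), and `support_count` and `ones_left` are illustrative helper names.

```python
# Hypothetical bitmap database: one row per transaction, one column per item.
ITEMS = ["a", "b", "c"]
D = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
]

def support_count(db, itemset, items=ITEMS):
    """Number of transactions whose bits are 1 for every item of the itemset."""
    cols = [items.index(i) for i in itemset]
    return sum(all(row[c] == 1 for c in cols) for row in db)

def ones_left(db):
    """Total number of 1s in the bitmap (the quantity the objective maximizes)."""
    return sum(sum(row) for row in db)

print(support_count(D, "ab"), support_count(D, "abc"), ones_left(D))
```

Turning a 1 into a 0 can only lower support counts, which is why hiding an itemset and keeping as many 1s as possible pull in opposite directions.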
5
(cont.)
Solving this problem is NP-hard: there are 2^m - 1 inequalities, one per non-empty itemset (m: number of items).
6
(cont.)
SI={e,ae,bc} (sensitive itemsets)
S={e,bc} (minimal sensitive itemsets)
SS={e,ae,bc,ce,abc,...} (the set of all sensitive itemsets and their supersets)
Ideal case: F'=F-SS, i.e. the sanitized database D' contains all the frequent itemsets of D except the sensitive ones and their supersets.
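A small sketch of how the minimal sensitive itemsets and membership in SS could be computed from SI. The function names `minimal_itemsets` and `in_SS` are illustrative, not taken from the paper.

```python
def minimal_itemsets(itemsets):
    """Keep only the itemsets that have no proper subset in the collection."""
    sets = [frozenset(s) for s in itemsets]
    return {s for s in sets if not any(t < s for t in sets)}

def in_SS(itemset, minimal_sensitive):
    """An itemset is in SS if it contains some minimal sensitive itemset."""
    x = frozenset(itemset)
    return any(s <= x for s in minimal_sensitive)

SI = ["e", "ae", "bc"]
S = minimal_itemsets(SI)  # ae is dropped because it contains e

print(sorted("".join(sorted(s)) for s in S))
print(in_SS("ce", S), in_SS("abc", S), in_SS("ab", S))
```

Testing against S alone is enough because every sensitive itemset and every superset of one contains some member of S.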
7
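(cont.)
Negative border: the infrequent itemsets whose proper subsets are all frequent. Ex: acd is infrequent while ac, cd, ad are frequent.
Positive border: the frequent itemsets whose proper supersets are all infrequent. Ex: ac is frequent while ac# is infrequent (#: any item).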
8
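Border revision
B-(F)={CD,ABD}, B+(F)={AD,BD,ABC}
[Figure: itemset lattice over the items A, B, C, D, from the null itemset up to ABCD (levels: A B C D; AB AC AD BC BD CD; ABC ABD ACD BCD; ABCD), split into frequent and infrequent regions, with the original border and the revised border marked.]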
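The two borders on this slide can be checked with a short sketch. The set F below is an assumption: it is reconstructed as every non-empty subset of the positive-border itemsets AD, BD, ABC (frequent itemsets are downward closed). The function names are illustrative.

```python
from itertools import combinations

def positive_border(frequent):
    """Maximal frequent itemsets: no proper superset is frequent."""
    F = {frozenset(x) for x in frequent}
    return {x for x in F if not any(x < y for y in F)}

def negative_border(frequent, items):
    """Minimal infrequent itemsets: every subset one item smaller is frequent."""
    F = {frozenset(x) for x in frequent}
    F.add(frozenset())  # the empty itemset counts as frequent
    border = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            if x not in F and all(x - {i} in F for i in x):
                border.add(x)
    return border

# Assumed F: all non-empty subsets of AD, BD and ABC.
F = ["A", "B", "C", "D", "AB", "AC", "BC", "AD", "BD", "ABC"]

print(sorted("".join(sorted(x)) for x in positive_border(F)))
print(sorted("".join(sorted(x)) for x in negative_border(F, "ABCD")))
```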
9
Problem size minimization
C: the total set of affected itemsets
Lc: the set of solutions of the corresponding inequalities
Cover: if the inequality of C2 can be removed without affecting the global solution of the system, then C1 covers C2.
10
(cont.)
Corollary: any itemset belonging to the positive border of F-SS covers all its subsets
=> B+(F') covers all itemsets of F'
B-(F') covers all itemsets of [set missing in the source]
Ideal solution Lc: [formula missing in the source]
11
(cont.)
12
Example
F={A,B,C,D,AB,AC,AD,CD,ACD}, SI={AB}, S={AB}
F'={A,B,C,D,AC,AD,CD,ACD}
B+(F')={B,ACD}
Constraints for msup=0.2: B frequent, ACD frequent, AB infrequent
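The example can be reproduced with a short sketch that removes the sensitive itemsets and their supersets from F and then keeps one constraint per border itemset. The helper names `hide` and `maximal` are illustrative.

```python
F = ["A", "B", "C", "D", "AB", "AC", "AD", "CD", "ACD"]
S = ["AB"]  # minimal sensitive itemsets

def hide(frequent, minimal_sensitive):
    """F' = F - SS: drop every itemset that contains a minimal sensitive itemset."""
    sens = [frozenset(s) for s in minimal_sensitive]
    return {frozenset(x) for x in frequent
            if not any(s <= frozenset(x) for s in sens)}

def maximal(itemsets):
    """Positive border: itemsets with no proper superset in the collection."""
    return {x for x in itemsets if not any(x < y for y in itemsets)}

F_prime = hide(F, S)
must_stay_frequent = maximal(F_prime)          # B+(F')
must_be_infrequent = {frozenset(s) for s in S}

for x in sorted(must_stay_frequent, key=sorted):
    print("".join(sorted(x)), "must stay frequent")
for x in sorted(must_be_infrequent, key=sorted):
    print("".join(sorted(x)), "must become infrequent")
```

Only three inequalities remain (for B, ACD and AB), compared with one inequality for each of the nine itemsets of F.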
13
Constraint satisfaction problem
A solution of a CSP is a complete assignment of values to the variables that satisfies all the constraints.
In a CSP we usually wish to maximize or minimize an objective function subject to a number of constraints.
To solve this problem we use binary integer programming (BIP), which transforms the CSP into an optimization problem.
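To make the optimization view concrete, the sketch below solves a toy instance by exhaustive search over binary variables. This is not a BIP solver and not the formulation from the slides (which is not preserved here); it only illustrates "maximize the 1s left subject to the constraints". The five-transaction database is hypothetical, chosen so that its frequent itemsets at msup=0.2 are A, B, C, D, AB, AC, AD, CD, ACD. Limiting the variables to the items of sensitive itemsets inside supporting transactions is also an assumption of the sketch.

```python
from fractions import Fraction
from itertools import combinations

MSUP = Fraction(1, 5)  # msup = 0.2, kept exact
D = [set("ACD"), set("AB"), set("B"), set("AC"), set("D")]  # hypothetical data
SENSITIVE = [set("AB")]
KEEP_FREQUENT = [set("B"), set("ACD")]

def support(db, itemset):
    return sum(itemset <= t for t in db)

def feasible(db):
    """All sensitive itemsets hidden and all border itemsets still frequent."""
    threshold = MSUP * len(db)
    hidden = all(support(db, s) < threshold for s in SENSITIVE)
    kept = all(support(db, x) >= threshold for x in KEEP_FREQUENT)
    return hidden and kept

def sanitize(db):
    """Try the fewest 1 -> 0 flips first, so the first feasible hit is optimal."""
    variables = sorted({(n, i) for n, t in enumerate(db)
                        for s in SENSITIVE if s <= t for i in s})
    for k in range(len(variables) + 1):
        for zeros in combinations(variables, k):
            candidate = [set(t) for t in db]
            for n, item in zeros:
                candidate[n].discard(item)
            if feasible(candidate):
                return candidate, zeros
    return None, None

D_prime, flipped = sanitize(D)
print(flipped, sum(len(t) for t in D_prime))
```

Exhaustive search is exponential in the number of variables, so it is only usable on toy inputs; a real instance would be handed to an integer programming solver.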
14
Binary integer problem
15
Experimental results
10,000 transactions, 10 items, msup=0.1
16
Conclusions
Defined a new metric to quantify the distance between the initial database D and its sanitized version D'
The approach has the benefit of being exact when an ideal solution can be identified
17
Exact knowledge hiding through database extension
Aris Gkoulalas-Divanis, Vassilios S. Verykios TKDE’08
18
Introduction
The goal of the hiding algorithm is to create a minimal extension Dx to the original database Do; the sanitized database D is Do together with Dx.
19
(cont.) S={e,ae,bc}
20
Methodology
P=|D|, N=|Do|, Q=|Dx|
Ex (support counts): e: 4, ae: 3, bc: 4
21
(cont.)
The distance between Do and D is measured on the extension Dx, whose size is to be minimized.
22
(cont.)
Optimal solution set C: S={e,ae,bc}, mfreq=0.3, Q=4
C={e,f,bc,bd,ab,acd}
0.3*(10+4)-4 = 0.2
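The value Q=4 can be checked with a small sketch under two assumptions: the original database has N=10 transactions (inferred from the expression 0.3*(10+4)), and no transaction added in the extension contains a sensitive itemset. An itemset with support count 4 is then hidden once 4 < mfreq*(N+Q). The function name `min_extension_size` is illustrative.

```python
from fractions import Fraction

def min_extension_size(max_sensitive_support, n_original, mfreq):
    """Smallest Q with max_sensitive_support < mfreq * (n_original + Q).

    Assumes the added transactions do not support the sensitive itemset.
    """
    if mfreq <= 0:
        raise ValueError("mfreq must be positive")
    q = 0
    while max_sensitive_support >= mfreq * (n_original + q):
        q += 1
    return q

MFREQ = Fraction(3, 10)  # mfreq = 0.3, kept exact
N = 10                   # assumed size of the original database
MAX_SUPPORT = 4          # highest sensitive support count (e: 4, bc: 4)

Q = min_extension_size(MAX_SUPPORT, N, MFREQ)
margin = MFREQ * (N + Q) - MAX_SUPPORT
print(Q, float(margin))
```

With Q=3 the threshold would be 0.3*13 = 3.9, which is still below the support count 4, so three added transactions would not be enough.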
23
Safety margin
Under certain circumstances the lower bound of Q is insufficient to allow an exact solution to be identified.
Safety margin (SM): expands the size Q of Dx; it can be predefined or computed dynamically.
Ex: for S={abc}, only 1 transaction is insufficient to provide an exact solution.
24
(cont.)
A null transaction in Dx indicates one of two cases:
(i) an unnecessarily large safety margin: the null transaction should be removed from Dx
(ii) a large value of Q that is essential for proper hiding: the null transaction needs to be validated, since Q denotes the lower bound on the number of transactions needed to ensure proper hiding
25
(cont.)
To ensure the minimum size of Dx, the hiding algorithm keeps only k null transactions.
Qinv: number of null transactions
V=Q+SM-Qinv
Ex: S={abc}, Q=1, SM=3: k=max(1-3,0)=0
26
Experimental results
27
(cont.)
28
Conclusions
Uses a minimal extension to the original database
The approach has the benefit of being exact when an ideal solution can be identified