An integer programming approach for frequent itemset hiding


1 An integer programming approach for frequent itemset hiding
Aris Gkoulalas-Divanis, Vassilios S. Verykios, CIKM’06

2 Outline Introduction, Basic definitions, Methodology, Experimental results, Conclusions

3 Introduction The approach is based on the notion of distance between the original database and the sanitized database. Goal: minimize this distance using integer programming while hiding the sensitive itemsets and minimally affecting the non-sensitive itemsets.

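4 Basic definitions Support count of itemsets in the bitmap representation: each transaction is a row of 0/1 entries with one column per item (a, b, c, ...), and an itemset is supported by every row that has a 1 in all of its columns. Objective: maximize the number of 1s left in D'. Constraints: every non-sensitive frequent itemset must remain frequent in D' (support at least msup), and every sensitive itemset must be infrequent in D' (support below msup).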
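The support count in the bitmap representation can be sketched as follows. This is a minimal illustration, not the authors' code: the 4x3 bitmap and the helper name `support_count` are assumptions made up for the example.

```python
def support_count(bitmap, items, itemset):
    """Count the rows that have a 1 in every column of the itemset."""
    cols = [items.index(i) for i in itemset]
    return sum(1 for row in bitmap if all(row[c] == 1 for c in cols))

# Hypothetical database: one row per transaction, one column per item.
items = ["a", "b", "c"]
bitmap = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
]

sup_ab = support_count(bitmap, items, "ab")  # rows 0 and 2 contain both a and b
sup_a = support_count(bitmap, items, "a")
ones = sum(sum(row) for row in bitmap)       # the quantity the objective maximizes in D'
```

Hiding an itemset means flipping some 1s to 0 until its support count drops below the threshold, so the number of remaining 1s measures how close D' stays to D.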

5 (cont.) Solving this problem is NP-hard: there are 2^m-1 inequalities (m: transaction lists)

6 (cont.) SI={e,ae,bc} (sensitive itemsets)
S={e,bc} (minimal sensitive itemsets) SS={e,ae,bc,ce,abc,...}: the set of all sensitive itemsets and their supersets. Ideal case: F'=F-SS, i.e., the sanitized database D' contains all the frequent itemsets of D except the sensitive ones and their supersets.
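The three sets can be derived mechanically from SI and F. The sketch below uses the slide's SI; the frequent set F is a hypothetical one chosen so that SS matches the members listed on the slide, and the helper names are assumptions.

```python
def fs(text):
    """Shorthand: 'ae' -> frozenset({'a', 'e'})."""
    return frozenset(text)

def minimal_sensitive(sensitive):
    """Keep only the sensitive itemsets that have no sensitive proper subset."""
    return {s for s in sensitive if not any(other < s for other in sensitive)}

def sensitive_and_supersets(frequent, minimal):
    """Frequent itemsets that contain at least one minimal sensitive itemset."""
    return {x for x in frequent if any(s <= x for s in minimal)}

SI = {fs("e"), fs("ae"), fs("bc")}
# Hypothetical set of frequent itemsets (not given on the slide).
F = {fs(x) for x in ["a", "b", "c", "e", "ab", "ac", "ae", "bc", "ce", "abc"]}

S = minimal_sensitive(SI)            # ae is dropped because e is already sensitive
SS = sensitive_and_supersets(F, S)   # everything in F that hiding S must also hide
F_ideal = F - SS                     # the ideal F' = F - SS
```

Hiding only the minimal itemsets in S is enough, because any superset of an infrequent itemset is infrequent as well.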

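7 (cont.) Negative border: the infrequent itemsets all of whose proper subsets are frequent. Ex: acd is infrequent while ac, cd, ad are frequent.
Positive border: the maximal frequent itemsets. Ex: ac is frequent while ac# is infrequent (#: any item).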

8 Border revision Original border: B-(F)={CD,ABD}, B+(F)={AD,BD,ABC}
[Figure: itemset lattice over the items A, B, C, D, from null up to ABCD, separating frequent from infrequent itemsets and showing the original border and the revised border]
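Both borders of the lattice above can be recomputed from a set of frequent itemsets. In this sketch F is not taken from the slide: it is reconstructed, as an assumption, as the set of all non-empty subsets of the positive border {AD, BD, ABC}. The function names are illustrative.

```python
from itertools import combinations

def downward_closure(itemsets):
    """All non-empty subsets of the given itemsets."""
    closed = set()
    for x in itemsets:
        for size in range(1, len(x) + 1):
            closed.update(frozenset(c) for c in combinations(sorted(x), size))
    return closed

def positive_border(frequent):
    """Maximal frequent itemsets: no frequent proper superset exists."""
    return {x for x in frequent if not any(x < y for y in frequent)}

def negative_border(frequent, items):
    """Infrequent itemsets whose immediate subsets are all frequent.

    Checking only the subsets one item smaller is sufficient as long as
    `frequent` is downward closed.
    """
    border = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(sorted(items), size):
            x = frozenset(combo)
            if x in frequent:
                continue
            subsets = [frozenset(s) for s in combinations(combo, size - 1) if s]
            if all(s in frequent for s in subsets):
                border.add(x)
    return border

# Assumption: F is the downward closure of the slide's positive border.
F = downward_closure([frozenset("AD"), frozenset("BD"), frozenset("ABC")])
pos = positive_border(F)
neg = negative_border(F, "ABCD")
```

With this F the computed borders agree with the B+(F) and B-(F) stated on the slide; after hiding, the same two functions applied to F' give the revised border.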

9 Problem size minimization
C: the total set of affected itemsets. Lc: the set of solutions of the corresponding inequalities. Cover: if the inequality of C1 can be removed without affecting the global solution of the system because the inequality of C2 already implies it, then C2 covers C1.

10 (cont.) Corollary: any itemset belonging to the positive border of F-SS covers all its subsets. Hence B+(F') covers all itemsets of F', and B-(F') covers all itemsets outside F'. The ideal solution Lc is obtained from these covering itemsets.

11 (cont.)

12 Example F={A,B,C,D,AB,AC,AD,CD,ACD}, SI={AB}, S={AB}
F'={A,B,C,D,AC,AD,CD,ACD}, B+(F')={B,ACD}. Constraints on D' (msup=0.2): B frequent, ACD frequent, AB infrequent.

13 Constraint satisfaction problem
A solution of a CSP is a complete assignment of values to the variables that satisfies all the constraints. In a CSP we usually wish to maximize or minimize an objective function subject to a number of constraints. To solve this problem we use binary integer programming (BIP), which transforms the CSP into an optimization problem.

14 Binary integer problem
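The shape of the binary integer problem can be illustrated on a toy database. This sketch is not the formulation or solver used in the paper: it swaps in plain enumeration of all 0/1 assignments, it constrains every itemset of F' rather than only the border itemsets, and the database, threshold, and function names are hypothetical.

```python
from itertools import combinations, product

def support(itemset, db):
    """Number of transactions that contain the itemset."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, min_count):
    """All itemsets with support >= min_count (exhaustive, toy sizes only)."""
    items = sorted(set().union(*db))
    return {frozenset(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)
            if support(frozenset(c), db) >= min_count}

def hide_by_enumeration(db, sensitive, min_count):
    """Try every 0/1 assignment of the candidate entries; keep the best exact one."""
    freq = frequent_itemsets(db, min_count)
    keep = {x for x in freq if not any(s <= x for s in sensitive)}  # F' = F - SS
    # One binary variable per item of a sensitive itemset in a supporting transaction.
    variables = sorted({(n, i) for n, t in enumerate(db)
                        for s in sensitive if s <= t for i in s})
    best, best_ones = None, -1
    for bits in product((1, 0), repeat=len(variables)):
        cand = [set(t) for t in db]
        for (n, item), bit in zip(variables, bits):
            if bit == 0:
                cand[n].discard(item)
        hidden = all(support(s, cand) < min_count for s in sensitive)
        preserved = all(support(x, cand) >= min_count for x in keep)
        ones = sum(len(t) for t in cand)  # objective: number of 1s left in D'
        if hidden and preserved and ones > best_ones:
            best, best_ones = cand, ones
    return best, best_ones  # (None, -1) when no exact solution exists

# Hypothetical database with a support-count threshold of 3.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
sanitized, ones = hide_by_enumeration(db, [frozenset("ab")], min_count=3)
```

In this toy case the only exact solutions remove a single item from the second transaction, because touching either of the two abc transactions would push ac or bc below the threshold. Enumeration is exponential in the number of variables, which is why an actual BIP solver and the border-based reduction of the inequalities matter in practice.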

15 Experimental results 10,000 transactions, 10 items, msup=0.1

16 Conclusions Defined a new metric to quantify the distance between the initial database D and its sanitized version D'. The approach has the benefit of being exact when an ideal solution can be identified.

17 Exact knowledge hiding through database extension
Aris Gkoulalas-Divanis, Vassilios S. Verykios, TKDE’08

18 Introduction The goal of the hiding algorithm is to create a minimal extension Dx to the original database Do, which together form the sanitized database D

19 (cont.) S={e,ae,bc}

20 Methodology P=|D|, N=|Do|, Q=|Dx|. Ex (supports in Do): e:4, ae:3, bc:4

21 (cont.) The distance between Do and D is measured based on the extension Dx and is to be minimized

22 (cont.) Optimal solution set C: S={e,ae,bc}, mfreq=0.3, Q=4
C={e,f,bc,bd,ab,acd}; 0.3*(10+4)-4 = 0.2
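The value Q=4 is consistent with the following reading, which is an assumption rather than a formula shown in the transcript: no transaction of Dx supports a sensitive itemset, so an itemset with support sup in Do is hidden in D once sup < mfreq*(N+Q). Solving for the smallest integer Q gives floor(sup/mfreq - N) + 1, evaluated for the sensitive itemset with the highest support. N=10 is inferred from the expression 0.3*(10+4)-4, and the function name is illustrative.

```python
from fractions import Fraction
from math import floor

def min_extension_size(max_sensitive_support, n, mfreq):
    """Smallest Q with max_sensitive_support < mfreq * (n + Q).

    Assumes the added transactions never support a sensitive itemset.
    Exact fractions avoid floating-point surprises at the boundary.
    """
    ratio = Fraction(max_sensitive_support) / Fraction(str(mfreq))
    return max(floor(ratio - n) + 1, 0)

# Supports in Do from the slides: e:4, ae:3, bc:4; assumed N=10, mfreq=0.3.
supports = {"e": 4, "ae": 3, "bc": 4}
q = min_extension_size(max(supports.values()), n=10, mfreq=0.3)

# Margin by which an itemset of support 4 stays below the threshold in D.
margin = Fraction("0.3") * (10 + q) - 4
```

Under the same reading, Q=3 would not be enough, since 0.3*(10+3) = 3.9 is below the support of 4.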

23 Safety margin The lower bound of Q may, under certain circumstances, be insufficient to allow the identification of an exact solution. Safety margin (SM): expands the size Q of Dx; it can be predefined or computed dynamically. Ex: S={abc}: a single transaction is insufficient to provide an exact solution

24 (cont.) Null transactions arise from: (i) an unnecessarily large safety margin; these should be removed from Dx. (ii) a large value of Q that is essential for proper hiding; these need to be validated, since Q denotes the lower bound on the number of transactions needed to ensure proper hiding

25 (cont.) To ensure the minimum size of Dx, the hiding algorithm keeps only k null transactions. Qinv: the number of null transactions; V=Q+SM-Qinv. Ex: S={abc}, Q=1, SM=3: k=max(1-3,0)=0 null transactions kept
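The bookkeeping for null transactions can be sketched as below. The rule k = max(Q - V, 0) and the value Qinv = 1 are assumptions inferred from the numbers on the slide (they are the reading under which V = 3 and the expression max(1 - 3, 0) arise); note that this expression evaluates to 0. The function name is illustrative.

```python
def null_transactions_to_keep(q, sm, q_inv):
    """Return (V, k) for an extension of q + sm transactions.

    V is the number of non-null transactions, V = Q + SM - Qinv.
    k is the number of null transactions kept (and later validated)
    so that the extension still has at least Q transactions.
    """
    valid = q + sm - q_inv
    keep = max(q - valid, 0)
    return valid, keep

# Slide example under the assumed reading: Q=1, SM=3, one null transaction.
valid, keep = null_transactions_to_keep(q=1, sm=3, q_inv=1)

# Hypothetical contrast: a tight bound where one null transaction must stay.
valid2, keep2 = null_transactions_to_keep(q=4, sm=2, q_inv=3)
```

In the first case the three valid transactions already exceed Q, so every null transaction can be dropped; in the second, only three of the four required transactions are valid, so one null transaction is kept.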

26 Experimental results

27 (cont.)

28 Conclusions Uses a minimal extension to the original database.
The approach has the benefit of being exact when an ideal solution can be identified.

