Security in Outsourcing of Association Rule Mining

1 Security in Outsourcing of Association Rule Mining

2 Agenda
Introduction. Item mapping. Algorithms and experiments. Summary.

3 Introduction
Data mining in a company: know about the past activities of its customers and make strategic decisions. Types of data mining: association rule mining, clustering, classification.

4 Association rules
Possible solutions: Apriori, FPclose, … The complexity is of exponential order, so even a simple mining algorithm requires a long time. This motivates outsourcing the mining responsibility.

5 Concerns in outsourcing
Correctness. Security: privacy of records, information of the company. [Diagram: Company DB, Data Miner]

6 Item mapping

7 Item mapping
Substitution ciphers have been widely used to encrypt text: each letter is replaced by another letter. To encrypt transactions, each item is replaced by another item.

8 Example
bread -> 54, chocolate -> 165. What is <8, 69, 153, 756>? It is <cheese, fork, ice-cream, clock>.

9 Security
A substitution cipher is not secure: frequency analysis can be done to recover the original plain text. Frequency analysis: we know the relative frequency of each letter in the dictionary; find the frequencies of the letters in the cipher text; perform matching using both sets of frequencies.
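The matching step can be sketched in a few lines. This is only a minimal illustration of rank-based frequency matching, not a tool from the presentation; the function names and the toy strings `reference_text` and `cipher_text` are assumptions made for the example.

```python
from collections import Counter

def rank_symbols(text):
    """Symbols ordered from most to least frequent (ties broken alphabetically)."""
    counts = Counter(text)
    return [s for s, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def guess_substitution(cipher_text, reference_text):
    """Pair the i-th most frequent cipher symbol with the i-th most frequent reference symbol."""
    return dict(zip(rank_symbols(cipher_text), rank_symbols(reference_text)))

# Toy data (hypothetical): the reference text stands in for known letter frequencies.
reference_text = "aaaabbc"
cipher_text = "xyxzxyx"

guess = guess_substitution(cipher_text, reference_text)
recovered = "".join(guess[s] for s in cipher_text)
print(guess, recovered)
```

Real text needs far more than rank matching (digrams, dictionary checks), but the sketch shows why known frequencies are enough to start undoing a letter substitution.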

10 Security in item substitution
Item substitution is more secure than substitution in text. The number of possible mappings is much larger (|I|!, where I is the entire set of items). The relative frequencies of itemsets are not known. There is no way to determine whether a guess on the mappings is correct.
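To get a feeling for the |I|! figure, the sketch below compares the number of one-to-one mappings over a 26-letter alphabet with that over an item set. The item count of 1000 is a hypothetical value chosen only for illustration.

```python
import math

def count_mappings(n_symbols):
    """Number of one-to-one mappings of n_symbols onto a set of equal size (n!)."""
    return math.factorial(n_symbols)

letters = count_mappings(26)    # substitution cipher over an alphabet
items = count_mappings(1000)    # hypothetical store with 1000 distinct items

# Compare the sizes by their number of decimal digits.
print(len(str(letters)), len(str(items)))
```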

11 Architecture
[Architecture diagram with components: DB, Transformer, Mappings, Association Rules]

12 Simple scheme (one-to-one item mapping)
T: the database. I: the set of items in T. D: another set of items with |D| = |I|. T’: the transformed database. m: I -> D: a one-to-one item mapping.
Encryption: each transaction t = <x1, x2, …, xn> in T becomes t’ = <m(x1), m(x2), …, m(xn)>.
Decryption of association rules: for every association rule <a1, …, ak> => <b1, …, bn> in T’, <m^-1(a1), …, m^-1(ak)> => <m^-1(b1), …, m^-1(bn)> is a rule in T.
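The simple scheme amounts to a dictionary lookup in each direction. The following is a minimal sketch under that reading; the function names are illustrative, and the mapping reuses the codes shown in the earlier example slide (bread -> 54, chocolate -> 165, cheese -> 8).

```python
# One-to-one item mapping m: I -> D and its inverse.
m = {"bread": 54, "chocolate": 165, "cheese": 8}
m_inv = {code: item for item, code in m.items()}

def encrypt_transaction(t, mapping):
    """t' = <m(x1), ..., m(xn)> for a transaction t = <x1, ..., xn>."""
    return [mapping[x] for x in t]

def decrypt_rule(antecedent, consequent, inverse):
    """Translate an encrypted rule <a1..ak> => <b1..bn> back to original items."""
    return ([inverse[a] for a in antecedent], [inverse[b] for b in consequent])

T = [["bread", "cheese"], ["chocolate"]]          # toy database (hypothetical)
T_prime = [encrypt_transaction(t, m) for t in T]  # what the miner receives
rule = decrypt_rule([54], [8], m_inv)             # a rule returned by the miner
print(T_prime, rule)
```

The miner works only on `T_prime`; the data owner keeps `m` and applies `m_inv` to the rules that come back.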

13 Privacy breach
Statistical information becomes known to the third-party miner: counts of some itemsets, the number of rules in the database, the number of large itemsets.

14 Fake items
m: I -> D: a one-to-one item mapping. F: a set of items not in D. For each transaction t, a subset of F is added to t’.

15 Example
I = {a, b}, D = {1, 2}, m(a) = 1, m(b) = 2, F = {3}. T = {<a>, <b>}, T’ = {<1, 3>, <2>}.
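A possible way to add fake items is sketched below, using the sets from the example. How the subset of F is chosen is not specified in the slides; drawing a random subset with a seeded generator is an assumption of this sketch, as are the function names.

```python
import random

m = {"a": 1, "b": 2}   # one-to-one item mapping from the example
F = {3}                # fake items, disjoint from D = {1, 2}

def encrypt_with_fakes(t, mapping, fake_items, rng):
    """Map each item, then add a random (possibly empty) subset of the fake items."""
    k = rng.randint(0, len(fake_items))
    fakes = rng.sample(sorted(fake_items), k)
    return sorted([mapping[x] for x in t] + fakes)

def strip_fakes(itemset, fake_items):
    """Data-owner side: drop fake items before inverting the mapping."""
    return [y for y in itemset if y not in fake_items]

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
T = [["a"], ["b"]]
T_prime = [encrypt_with_fakes(t, m, F, rng) for t in T]
print(T_prime, [strip_fakes(t, F) for t in T_prime])
```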

16 Functions of fake items
Increase the difficulty of guessing the mapping of an item correctly (from 1/|I| to 1/(|I| + |F|)). Protect the statistical information of the data owner: the miner does not know the exact information.

17 Better way to utilize “fake” items
A one-to-n item mapping. B: a set of items. m: I -> 2^B. Example: m(a) = {1, 4, 5}, m(b) = {2}, m(c) = {3, 5}.

18 Advantage
In an ideal situation, it is even more difficult to guess the mapping of an item correctly. One-to-one item mapping (no fake items): 1/|I|. With fake items: 1/(|I| + |F|). One-to-n item mapping: 1/(2^|B| - 1). It does not introduce additional overhead compared to adding fake items.
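The three guess probabilities can be compared numerically. The sizes used below (20 items, 5 fake items, |B| = 25) are hypothetical values picked for illustration, and the function name is not from the presentation.

```python
from fractions import Fraction

def guess_probabilities(n_items, n_fake, n_b):
    """Chance of guessing one item's mapping under each scheme (ideal situation)."""
    return {
        "one_to_one": Fraction(1, n_items),               # 1/|I|
        "with_fake_items": Fraction(1, n_items + n_fake), # 1/(|I| + |F|)
        "one_to_n": Fraction(1, 2 ** n_b - 1),            # 1/(2^|B| - 1)
    }

p = guess_probabilities(n_items=20, n_fake=5, n_b=25)
for scheme, prob in p.items():
    print(scheme, prob, float(prob))
```

The one-to-n figure counts every non-empty subset of B as a candidate image, which is why it shrinks exponentially in |B| rather than linearly.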

19 Itemset mapping
m: I -> 2^B: a one-to-n item mapping. M: 2^I -> 2^B: the itemset mapping. X: an itemset <x1, x2, …, xn>. M(X) = U_{x in X} m(x). M^-1(Y) = {x | x in I and m(x) is a subset of Y}.

20 Example itemset mapping
m(a) = {1, 4, 5}, m(b) = {2}, m(c) = {3, 5}. M(<a, b>) = <1, 2, 4, 5>, M(<a, c>) = <1, 3, 4, 5>, M(<b, c>) = <2, 3, 5>.

21 Example inverse itemset mapping
m(a) = {1, 4, 5}, m(b) = {2}, m(c) = {3, 5}. M^-1(<1, 2, 4, 5>) = <a, b>, M^-1(<1, 3, 4, 5>) = <a, c>. M^-1(<1, 2, 3, 5>) is not well-defined.
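The itemset mapping and its inverse can be tried out directly on the example mapping m(a) = {1, 4, 5}, m(b) = {2}, m(c) = {3, 5}. The slides do not spell out when an inverse is "well-defined"; this sketch assumes it means that the recovered items map back exactly onto Y, and returns None otherwise.

```python
m = {"a": {1, 4, 5}, "b": {2}, "c": {3, 5}}   # one-to-n item mapping from the slides

def M(X):
    """Itemset mapping: union of m(x) over all x in X."""
    return set().union(*(m[x] for x in X))

def M_inv(Y):
    """Items whose image is a subset of Y, or None when Y is not reproduced exactly."""
    Y = set(Y)
    X = {x for x, image in m.items() if image <= Y}
    # Assumption: "well-defined" means M(M_inv(Y)) == Y.
    return X if X and M(X) == Y else None

print(M({"a", "b"}), M({"a", "c"}), M({"b", "c"}))
print(M_inv({1, 2, 4, 5}), M_inv({1, 3, 4, 5}), M_inv({1, 2, 3, 5}))
```

For <1, 2, 3, 5> the candidate items are b and c, but their images leave element 1 unexplained, which is why the inverse is rejected.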

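22 Correctness – restrictions on one-to-n mapping
m(b) = {2, 3}, m(c) = {1, 3}. M(<a, b>) = <1, 2, 3> and M(<a, b, c>) = <1, 2, 3>, so the two itemsets cannot be told apart after encryption. The mapping of each item must contain at least one element that appears in no other item's mapping.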

23 Admissible item mapping
m: I -> 2^B. m is admissible iff, for all x in I, m(x) – U_{y in I – {x}} m(y) is not empty. This guarantees M^-1(M(X)) = X (correct decryption).
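The admissibility condition translates into a short check. In the sketch below the function name is illustrative, and the value m(a) = {1, 2} in the failing mapping is a hypothetical choice made only to build a counterexample.

```python
def is_admissible(m):
    """True iff every m(x) has an element outside the union of all other images."""
    for x, image in m.items():
        others = set().union(*(m[y] for y in m if y != x))
        if not image - others:
            return False
    return True

good = {"a": {1, 4, 5}, "b": {2}, "c": {3, 5}}
# Hypothetical non-admissible mapping: every element of each image is shared.
bad = {"a": {1, 2}, "b": {2, 3}, "c": {1, 3}}

print(is_admissible(good), is_admissible(bad))
```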

24 Security of one-to-n item mapping
Function cover. M1: 2^I -> 2^D1, M2: 2^I -> 2^D2. M1 covers M2 iff, for every subset X of I with Y = M2(X), M2^-1(Y) = M1^-1(Y intersect D1).

25 Function cover
If M1 covers M2, any transaction encrypted by M2 can be decrypted by using the inverse of M1. Our results: a one-to-n itemset mapping built on an admissible one-to-n item mapping is covered by some one-to-one itemset mapping.

26 Example transformation
T = {a} {b} {c} {a, b} {a, c} {b, c} {a, b, c}
T’ = {1, 4, 5} {2} {3, 5} {1, 2, 4, 5} {1, 3, 4, 5} {2, 3, 5} {1, 2, 3, 4, 5}
M: a -> {1, 4, 5}, b -> {2}, c -> {3, 5}
M’: a -> {1}, b -> {2}, c -> {3}
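The covering mapping M’ in this example can be derived mechanically: keep, for each item, one element that no other item's image contains. The sketch below is an illustration of that idea, not the presentation's proof; taking the smallest unique element is an arbitrary choice, and the function names are assumptions.

```python
from itertools import combinations

m = {"a": {1, 4, 5}, "b": {2}, "c": {3, 5}}   # one-to-n mapping M from the example

def M(X):
    """Itemset mapping built on m."""
    return set().union(*(m[x] for x in X))

def covering_one_to_one(mapping):
    """For each item keep one element that no other item's image contains."""
    m1 = {}
    for x, image in mapping.items():
        others = set().union(*(mapping[y] for y in mapping if y != x))
        m1[x] = min(image - others)   # any unique element works; min is arbitrary
    return m1

def decrypt_with(m1, Y):
    """Inverse of the one-to-one itemset mapping; codes outside D1 are ignored."""
    return {x for x, code in m1.items() if code in Y}

m1 = covering_one_to_one(m)
subsets = [set(c) for r in range(1, 4) for c in combinations("abc", r)]
all_decrypted = all(decrypt_with(m1, M(X)) == X for X in subsets)
print(m1, all_decrypted)
```

Every encrypted transaction is recovered using only the three codes 1, 2 and 3, so an adversary can ignore the other elements of B entirely.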

27 Function cover
Suppose a data owner encrypts his own database using a one-to-n item mapping m. An adversary does not need to find m; instead, he can look for a one-to-one item mapping m’. Conclusion: a one-to-n item mapping is not more secure than a one-to-one item mapping.

28 Transaction transformation
M: 2^I -> 2^B: a one-to-n itemset mapping. F: a set of items not in B, used to hide B. N: the transaction transformation, which maps from 2^I to 2^(B U F). N(t) = M(t) U E, where E is a random subset of B U F. N^-1(t’) = {x | m(x) is a subset of t’}.

29 Transaction transformation
Valid: N^-1(N(t)) = t. Complete (based on valid): s is a program to generate E, and out_s(t) is the set of all possible outputs of s for transaction t. N is complete if, for all t and for any E such that t’ = M(t) U E and N^-1(t’) = t, E is in out_s(t).
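One simple way to obtain a transformation that is both valid and complete is rejection sampling: draw E at random and keep it only if decryption still returns t. This is a sketch under that assumption, not the algorithm developed by the authors (which the slides do not show); the fake item set F = {6, 7} is hypothetical.

```python
import random

m = {"a": {1, 4, 5}, "b": {2}, "c": {3, 5}}   # admissible mapping from the slides
B = {1, 2, 3, 4, 5}
F = {6, 7}                                    # hypothetical fake items, not in B

def M(t):
    """Itemset mapping: union of m(x) over the items of t."""
    return set().union(*(m[x] for x in t))

def N_inv(t_prime):
    """Items whose whole image is contained in t_prime."""
    return {x for x, image in m.items() if image <= t_prime}

def N(t, rng):
    """N(t) = M(t) U E, with E redrawn until N_inv(N(t)) == t (validity)."""
    pool = sorted(B | F)
    while True:
        # Each element joins E with probability 1/2, so every subset of B U F
        # can be drawn, and every E that keeps decryption exact can be output.
        E = {y for y in pool if rng.random() < 0.5}
        t_prime = M(t) | E
        if N_inv(t_prime) == set(t):
            return t_prime

rng = random.Random(1)   # fixed seed only to make the sketch repeatable
for t in (["a"], ["b"], ["c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]):
    print(t, sorted(N(t, rng)))
```

The loop always terminates for an admissible m, because an E contained in M(t) is accepted and is drawn with non-zero probability.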

30 Valid and complete transaction transformation
N: a valid and complete transaction transformation. There does not exist a one-to-one itemset mapping M’ that covers N, so N cannot be analyzed as a one-to-one mapping. This increases the difficulty of cryptanalysis.

31 Example transformation
T = {a} {b} {c} {a, b} {a, c} {b, c} {a, b, c}
T’ = {1, 4, 5} {2, 1, 3} {3, 5, 4} {1, 2, 4, 5} {1, 3, 4, 5} {2, 3, 5, 1} {1, 2, 3, 4, 5}
M: a -> {1, 4, 5}, b -> {2}, c -> {3, 5}
M’: a -> ?

32 Conclusion on transaction transformation
Correct results. More difficult to perform cryptanalysis. Overhead comparable to one-to-one item mapping with fake items. Transaction transformation is a non-deterministic approach.

33 Recovering association rules
Given an encrypted rule X => Y in T’, find A = M^-1(X) and B = M^-1(X U Y). If both are well-defined, the rule in T is A => B - A.

34 Example
a -> 1, 4, 5; b -> 2; c -> 3, 5.
Given: 1 => 4 (rejected); 2 => 1, 5 (rejected); 2 => 1, 3, 5 (rejected); 2 => 1, 3, 4, 5 (b => ac); 2, 3, 5 => 1, 4 (bc => a).
2, 3, 5 -> bc; 1, 2, 3, 4, 5 -> abc.
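The recovery procedure can be replayed on the five encrypted rules of this example. The sketch assumes the mapping a -> {1, 4, 5}, b -> {2}, c -> {3, 5} used throughout the slides, and it assumes that "well-defined" means the recovered items map back exactly onto the encrypted itemset; rejected rules are returned as None.

```python
m = {"a": {1, 4, 5}, "b": {2}, "c": {3, 5}}

def M(X):
    """Itemset mapping: union of m(x) over all x in X."""
    return set().union(*(m[x] for x in X))

def M_inv(Y):
    """Inverse itemset mapping, or None when it is not well-defined (assumption)."""
    X = {x for x, image in m.items() if image <= Y}
    return X if X and M(X) == Y else None

def recover_rule(X, Y):
    """Turn an encrypted rule X => Y into A => B - A, or None if rejected."""
    A = M_inv(set(X))
    B = M_inv(set(X) | set(Y))
    if A is None or B is None:
        return None
    return A, B - A

encrypted_rules = [([1], [4]), ([2], [1, 5]), ([2], [1, 3, 5]),
                   ([2], [1, 3, 4, 5]), ([2, 3, 5], [1, 4])]
recovered = [recover_rule(X, Y) for X, Y in encrypted_rules]
for rule, result in zip(encrypted_rules, recovered):
    print(rule, "->", result)
```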

35 Algorithms

36 Algorithms developed
Generate an admissible one-to-n item mapping. Perform a transaction transformation, given a transaction t and an admissible one-to-n item mapping, that is valid and complete.

37 Experiments

38 Frequency analysis
One-to-one item mapping vs. one-to-n transaction transformation. One-to-n mapping: limited to size 2 (each item can be mapped to at most 2 items). Adversary: a genetic algorithm, an automated frequency analysis tool using a random approach.

39 Parameters
One-to-one: 5 fake items. One-to-n: |I| items map to |I| + 5 items. Two initialization strategies. Good: each item's mapping has a chance of being initialized to the correct one. Random: each item's mapping is initialized completely at random.

41 [Figures: results for 20 items and for 35 items]

42 Observation
One-to-n requires more information to be prepared. It takes a longer time to calculate the goodness of the current state, so a long time is spent in each iteration. It shows slow progress when near the optimal solution. Similar progress, but in fact a poorer result.

43 Background knowledge
Assume the adversary has an approximation from a global result of mining. alpha: the adversary knows alpha% of the large itemsets in the original database. beta: the support in the knowledge is in the range real support * (1 ± beta).
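The two parameters can be simulated as follows. This is only one plausible reading of the slide: beta is treated as a fraction, the known itemsets are sampled uniformly, and each known support is perturbed uniformly within real support * (1 ± beta). The supports and the function name are hypothetical.

```python
import random

def simulate_knowledge(real_supports, alpha, beta, rng):
    """Adversary's view: alpha% of the large itemsets, supports off by at most beta."""
    itemsets = sorted(real_supports)
    k = round(len(itemsets) * alpha / 100)
    known = rng.sample(itemsets, k)
    return {s: real_supports[s] * (1 + rng.uniform(-beta, beta)) for s in known}

# Hypothetical large itemsets with their real supports.
real_supports = {("a",): 0.6, ("b",): 0.5, ("a", "b"): 0.4, ("c",): 0.3}

knowledge = simulate_knowledge(real_supports, alpha=50, beta=0.1, rng=random.Random(0))
print(knowledge)
```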

44 [Figures: two plots with axes alpha and beta]

45 Overhead measure
Measure the overhead introduced by transaction transformation. Parameters: Em: expected size of the mapping of an item. NE: expected size of B intersect E. NF: expected size of F intersect E.

46 [Figure panels: Em = 1.6, Em = 1.6, Em = 1.4]

47 Summary
The idea of the substitution cipher is applied to the problem of encrypting a transaction database. One-to-n item mapping cannot be directly applied since it is effectively a one-to-one item mapping. Transaction transformation is proposed. Experiments show that it is suitable for outsourcing.

48 The End

