Published by Josephine Poole. Modified over 9 years ago.
1
Security in Outsourced Association Rule Mining
2
Agenda
- Introduction
- Approximate randomized technique
- Encryption
- Summary and future work
3
Introduction
- Data mining in a company
  - Learn about the past activities of its customers
  - Make strategic decisions
- Types of data mining
  - Association rule mining
  - Clustering
  - Classification
4
Association rules
- “X => Y”: if a transaction contains itemset X, the transaction will probably also contain itemset Y
- Support: number of supporting transactions (those containing both X and Y)
- Confidence: proportion of the transactions containing X that also contain Y
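The two measures can be illustrated with a small sketch. The function names and the toy transactions below are illustrative assumptions, not part of the original slides.

```python
def support(transactions, itemset):
    """Count the transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def confidence(transactions, x, y):
    """Proportion of the transactions containing X that also contain Y."""
    supp_x = support(transactions, x)
    if supp_x == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, set(x) | set(y)) / supp_x


# Hypothetical toy database, for illustration only.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
]

print(support(transactions, {"milk", "bread"}))
print(confidence(transactions, {"milk"}, {"bread"}))
```

Here support is an absolute count, as on the slide; dividing by the number of transactions would give the relative form used in some texts.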
5
Performing data mining
- Build an application
  - Development cost? Time?
- Buy software
  - Fit requirements? Maintenance?
- Outsource
6
Concerns in outsourcing
- Assurance
  - Execution
  - Correctness of output
- Security
  - Privacy of records
  - Information of the company
- [Diagram: Company DB, Data Miner]
7
Approximate randomized technique
8
Approximate solution
- “Privacy Preserving Mining of Association Rules”, SIGKDD 2002
- Authors: Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, Johannes Gehrke
9
Problem formulation
- Let the set of transactions be $T = \{t_1, t_2, \dots, t_N\}$
- Transform $T$ to $T' = \{t'_1, t'_2, \dots, t'_N\}$
- Mine in $T'$
- Privacy breaches
  - Itemset $A$ causes a privacy breach of level $p$ if, for some item $a \in A$, $P[a \in t_i \mid A \subseteq t'_i] \ge p$
10
Select-a-size randomization
- For each transaction $t_i$ in $T$
  - Let $m$ be the length of $t_i$
  - Select an integer $j$ from $[0, m]$ at random (non-uniformly)
  - Copy $j$ items of $t_i$, chosen uniformly at random, to $t'_i$
  - For every item $a$ not in $t_i$, add $a$ to $t'_i$ with a given probability $p_m$
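The steps above can be sketched in Python as follows. The function name `select_a_size`, the `size_probs` list (standing in for the non-uniform distribution over j) and the injectable `rng` are assumptions made for illustration; the slides do not give code.

```python
import random


def select_a_size(t, all_items, size_probs, p_m, rng=random):
    """Randomize one transaction `t` (a set of items).

    size_probs[j] is the weight of keeping exactly j original items (j = 0..m).
    p_m is the probability of inserting each item that is not in `t`.
    """
    m = len(t)
    if len(size_probs) != m + 1:
        raise ValueError("size_probs needs one weight for each j in 0..m")
    # Non-uniform choice of how many original items survive.
    j = rng.choices(range(m + 1), weights=size_probs, k=1)[0]
    # Uniform choice of which j items survive.
    kept = set(rng.sample(sorted(t), j))
    # Every absent item is inserted independently with probability p_m.
    added = {a for a in sorted(all_items) if a not in t and rng.random() < p_m}
    return kept | added


rng = random.Random(7)  # fixed seed only to make the demo repeatable
items = {"A", "B", "C", "D"}
print(select_a_size({"A", "B"}, items, [0.2, 0.3, 0.5], 0.1, rng))
```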
11
Run on real data
- Privacy breach of level <= 50%: $P[a \in t_i \mid A \subseteq t'_i] \le 50\%$
- Accuracy = # true positives / # found itemsets
- Set 1

Itemset Size | True Itemsets | True Positives | False Drops | False Positives | Accuracy
1            | 65            | 65             | 0           | 0               | 100%
2            | 228           | 212            | 16          | 28              | 88%
3            | 22            | 18             | 4           | 5               | 78%
12
Accuracy
- Set 2

Itemset Size | True Itemsets | True Positives | False Drops | False Positives | Accuracy
1            | 266           | 254            | 12          | 31              | 89%
2            | 217           | 195            | 22          | 45              | 81%
3            | 48            | 43             | 5           | 26              | 62%
13
Problems
- Estimated counts of large itemsets vary
- Lower accuracy of association rules
  - "Beer and diapers" story: customers who buy diapers also tend to buy beer
  - Some strange rules are hard to believe
- Wrong decisions are expensive
  - Supermarket: layout design
  - Health center: identifying a new disease
14
Security concerns
- Individual transactions are protected
- Private association rules can be estimated by other parties
- Adversary actions may be based on the association rules found
15
Encryption
16
Problem formulation
- Let the set of transactions be $T = \{t_1, t_2, \dots, t_N\}$
- $I$ is the entire set of items
- Every $t_i$ is a subset of $I$
- Transform $T$ to $T' = \{t'_1, t'_2, \dots, t'_N\}$
- A third party mines in $T'$ and gets $AR'$
- Transform $AR'$ to $AR$
17
Architecture
[Diagram: DB, Transformer, Mappings, Association Rules]
18
Encryption
- To protect a message, simple encryption can be applied
  - “GOOD DOG” can be encrypted as “PLLX XLP”
- Association rule encryption
  - 752 => 891? Milk => Bread
- Transaction encryption?
19
Simple scheme
- Encryption
  - For every transaction $t_i$
    - For every item $x$ in $t_i$, add $f(x)$ to $t'_i$, where $f$ is a bijective function
- Decryption
  - For every association rule $r_i$
    - For every item $y$ in $r_i$, replace $y$ by $f^{-1}(y)$
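A minimal sketch of this scheme, assuming items are strings and codes are the integers 1..|I|. The helper names `make_bijection`, `encrypt` and `decrypt_rule` are hypothetical.

```python
import random


def make_bijection(items, rng):
    """Assign each item a distinct integer code (a random bijection f)."""
    ordered = sorted(items)
    codes = list(range(1, len(ordered) + 1))
    rng.shuffle(codes)
    return dict(zip(ordered, codes))


def encrypt(transactions, f):
    """Replace every item x of every transaction by f(x)."""
    return [{f[x] for x in t} for t in transactions]


def decrypt_rule(rule, f):
    """Replace every code y of an encrypted rule (lhs, rhs) by f^-1(y)."""
    f_inv = {code: item for item, code in f.items()}
    lhs, rhs = rule
    return {f_inv[y] for y in lhs}, {f_inv[y] for y in rhs}


f = make_bijection({"milk", "bread", "eggs"}, random.Random(1))
T_enc = encrypt([{"milk", "bread"}, {"eggs"}], f)
rule_enc = ({f["milk"]}, {f["bread"]})  # what the third party would report
print(decrypt_rule(rule_enc, f))
```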
20
Problems with simple encryption
- Simple schemes are easy to crack
  - “PLLX XLP”: $^{26}P_3$ combinations, with at least one vowel
  - Association rules: # Bread > # Car
- The number of association rules and the number of large itemsets are disclosed
- Solution: use a more complex scheme
21
Fake items
- Probability of correctly guessing a single mapping = $1 / |I|$
- Randomly add some fake items to each transaction
- This decreases the above probability to $1 / (|I| + |F|)$
22
One-to-n mapping
- Originally the mapping is “one-to-one”: one item maps to one item
  - A -> 1; B -> 2; C -> 3
- We form a “one-to-n” mapping
  - A -> 1, 4, 5; B -> 2; C -> 3, 5
- This greatly increases the number of possible mappings of an item: $\binom{|I|+|F|}{1} + \binom{|I|+|F|}{2} + \dots + \binom{|I|+|F|}{|F|}$
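To get a feel for how quickly this count grows, the sum of binomial coefficients on the slide can be evaluated directly. The function name `possible_mappings` and the sample sizes are illustrative assumptions.

```python
from math import comb


def possible_mappings(n_items, n_fake):
    """Sum of C(|I|+|F|, k) for k = 1..|F|, as written on the slide."""
    n = n_items + n_fake
    return sum(comb(n, k) for k in range(1, n_fake + 1))


# With |I| = 3 and |F| = 2 this is C(5,1) + C(5,2).
print(possible_mappings(3, 2))
# A larger, hypothetical setting.
print(possible_mappings(1000, 10))
```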
23
Example transformation
- Mapping: A -> 1, 4, 5; B -> 2; C -> 3, 5
- T = {A} {B} {C} {A, B} {A, C} {B, C} {A, B, C}
- T’ = {1, 4, 5} {2} {3, 5} {1, 2, 4, 5} {1, 3, 4, 5} {2, 3, 5} {1, 2, 3, 4, 5}
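The example can be reproduced mechanically by taking, for each transaction, the union of the mapped sets of its items. This is a sketch of the mapping step only (no noise or fake items); the function name `transform` is an assumption.

```python
# Mapping from the slide: each item maps to a set of codes.
f = {"A": {1, 4, 5}, "B": {2}, "C": {3, 5}}


def transform(t, f):
    """Union of f(x) over all items x in transaction t."""
    out = set()
    for x in t:
        out |= f[x]
    return out


T = [{"A"}, {"B"}, {"C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
T_prime = [transform(t, f) for t in T]

for t, t_prime in zip(T, T_prime):
    print(sorted(t), "->", sorted(t_prime))
```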
24
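Limitation on the mapping f
- For any item $x$, there must not exist items $y_1, y_2, \dots, y_k$ (all different from $x$) such that $f(x) \subseteq f(y_1) \cup f(y_2) \cup \dots \cup f(y_k)$
- Consider an example
  - A -> 1, 2; B -> 2, 3; C -> 3, 4
  - AC -> 1, 2, 3, 4
  - ABC -> 1, 2, 3, 4
  - AC and ABC map to the same set, so the presence of B cannot be recovered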
25
Limitation on the mapping f
- For any item $x$: $f(x) \setminus \bigcup_{i \in I,\, i \ne x} f(i) \ne \emptyset$
- Every item must map to something unique
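The constraint can be checked directly on a candidate mapping. The sketch below uses hypothetical helper names (`unique_codes`, `is_valid_mapping`) and the two mappings that appear in the slides.

```python
def unique_codes(f, x):
    """Codes of f(x) that appear in no other item's mapping."""
    others = set().union(*(codes for i, codes in f.items() if i != x))
    return f[x] - others


def is_valid_mapping(f):
    """True if every item keeps at least one code that is unique to it."""
    return all(unique_codes(f, x) for x in f)


bad = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}   # f(B) is covered by f(A) and f(C)
good = {"A": {1, 4, 5}, "B": {2}, "C": {3, 5}}

print(is_valid_mapping(bad), is_valid_mapping(good))
print(unique_codes(good, "A"))
```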
26
Mapping generation: Item Extend
- Initialize every item to map to something unique in I’
- For every item x in IE
  - Randomly pick some mappings
  - Extend each picked mapping by x
27
Example run
- A -> 1; B -> 2; C -> 3
- IE = {4, 5}
28
Considering item 4
- Before: A -> 1; B -> 2; C -> 3
- Pick A
- After: A -> 1, 4; B -> 2; C -> 3
29
Considering item 5
- Before: A -> 1, 4; B -> 2; C -> 3
- Pick A, C
- After: A -> 1, 4, 5; B -> 2; C -> 3, 5
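The Item Extend procedure walked through above can be sketched as follows. The slides do not say how "some mappings" are picked, so the independent coin flip with `pick_prob` is an assumption, as are the function name and the choice of integer codes.

```python
import random


def item_extend(items, n_extend, rng, pick_prob=0.5):
    """Build a one-to-n mapping.

    Each item first gets one core code that is unique to it (the set I').
    Each of the n_extend extension codes (the set IE) is then added to a
    randomly picked group of mappings. pick_prob is a hypothetical parameter.
    """
    ordered = sorted(items)
    f = {x: {k + 1} for k, x in enumerate(ordered)}  # core unique codes
    first_ext = len(ordered) + 1
    for code in range(first_ext, first_ext + n_extend):
        for x in ordered:
            if rng.random() < pick_prob:  # pick this mapping
                f[x].add(code)            # extend it by the new code
    return f


print(item_extend({"A", "B", "C"}, 2, random.Random(3)))
```

Because the core code of an item is never given to another item, every mapping produced this way keeps at least one unique code.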
30
Item Extend
- Every item must map to something unique
  - Say 1 is unique to f(A); then $\mathrm{supp}_T(A) = \mathrm{supp}_{T'}(1)$
- For a transaction t without item A
  - Add a subset of the unique mapping set to t’ with some probability
  - {1, 4} is the unique mapping set in f(A)
  - {}, {1}, {4}, {1, 4} may be added
- Mapping: A -> 1, 4, 5; B -> 2; C -> 3, 5
31
Fake items again
- Now, every item in $t'_i$ must be in some mapping
- Randomly add some fake items from F to each transaction
- Mapping f: I -> I’ U IE U F
  - I’: core “unique” items
  - IE: expanding items
  - F: fake items
32
Basic transformation framework
- For each transaction t
  - For each item x in t, add f(x) to t’
  - For each item i in I - t, add a random subset of the unique mapping set of f(i) to t’
  - For each fake item in F, toss a biased coin and add the item to t’ on heads (the probability should differ between items)
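A sketch of the framework for a single transaction is given below. Several details are assumptions rather than the authors' method: the subset of unique codes is drawn by independent coin flips with `noise_prob`, fake items carry their own probabilities in `fake_probs`, and a guard drops one code whenever the noise would make all of f(i) appear for an absent item i (which would otherwise look like a real occurrence of i).

```python
import random


def unique_set(f, i):
    """Codes of f(i) that appear in no other item's mapping."""
    others = set().union(*(codes for j, codes in f.items() if j != i))
    return f[i] - others


def transform_transaction(t, f, fake_probs, noise_prob, rng):
    base = set().union(*(f[x] for x in t))  # step 1: add f(x) for x in t
    out = set(base)
    for i in sorted(f):                     # step 2: noise for absent items
        if i in t:
            continue
        uniq = unique_set(f, i)
        extra = {c for c in sorted(uniq) if rng.random() < noise_prob}
        # Assumed guard: never complete f(i) for an item that is not in t.
        if extra and extra == uniq and (f[i] - uniq) <= base:
            extra.discard(rng.choice(sorted(extra)))
        out |= extra
    for code in sorted(fake_probs):         # step 3: biased coin per fake item
        if rng.random() < fake_probs[code]:
            out.add(code)
    return out


f = {"A": {1, 4, 5}, "B": {2}, "C": {3, 5}}
fake_probs = {6: 0.3, 7: 0.6}  # hypothetical fake items and probabilities
print(transform_transaction({"C"}, f, fake_probs, 0.5, random.Random(0)))
```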
33
Recovering association rules
- Given an encrypted rule $r': X \Rightarrow Y$ in $AR'$
- If there exist $i_1, i_2, \dots, i_m \in I$ such that $\bigcup_{k=1}^{m} f(i_k) = X$
- and there exist $j_1, j_2, \dots, j_n \in I$ such that $\bigcup_{k=1}^{n} f(j_k) = X \cup Y$
- then $r: \{i_1, i_2, \dots, i_m\} \Rightarrow \{j_1, j_2, \dots, j_n\} \setminus \{i_1, i_2, \dots, i_m\}$ is a rule in $AR$
- Otherwise, the rule is not correct
34
Example
- Mapping f: A -> 1, 4, 5; B -> 2; C -> 3, 5
- Given
  - 1 => 4 (rejected)
  - 2 => 1, 5 (rejected)
  - 2 => 1, 3, 5 (rejected)
  - 2 => 1, 3, 4, 5 (B => AC)
  - 2, 3, 5 => 1, 4 (BC => A)
- Decoding itemsets
  - {2, 3, 5} decodes to BC
  - {1, 2, 3, 4, 5} decodes to ABC
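The recovery test can be sketched as a decode-and-filter step. The approach below (collect every item whose mapped set is contained in the code set, then check that those sets cover it exactly) is one way to implement the existence condition; the function names are assumptions.

```python
f = {"A": {1, 4, 5}, "B": {2}, "C": {3, 5}}


def decode_itemset(codes, f):
    """Return the items whose mapped sets union to exactly `codes`, else None."""
    codes = set(codes)
    items = {i for i, mapped in f.items() if mapped <= codes}
    covered = set().union(*(f[i] for i in items))
    return items if covered == codes else None


def recover_rule(lhs_codes, rhs_codes, f):
    """Decode the encrypted rule lhs => rhs, or return None if it is rejected."""
    lhs = decode_itemset(lhs_codes, f)
    both = decode_itemset(set(lhs_codes) | set(rhs_codes), f)
    if not lhs or both is None or not (both - lhs):
        return None
    return lhs, both - lhs


for lhs, rhs in [({1}, {4}), ({2}, {1, 5}), ({2}, {1, 3, 5}),
                 ({2}, {1, 3, 4, 5}), ({2, 3, 5}, {1, 4})]:
    print(sorted(lhs), "=>", sorted(rhs), ":", recover_rule(lhs, rhs, f))
```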
35
Correctness
- Proposition
  - For any items x, y, where f is the transformation mapping
    - $\mathrm{supp}_T(x) = \mathrm{supp}_{T'}(f(x))$
    - $\mathrm{supp}_T(x \cup y) = \mathrm{supp}_{T'}(f(x) \cup f(y))$
  - For any itemsets X, Y, where F is the transformation mapping
    - $\mathrm{supp}_T(X) = \mathrm{supp}_{T'}(F(X))$
    - $\mathrm{supp}_T(X \cup Y) = \mathrm{supp}_{T'}(F(X) \cup F(Y))$
- No false drops and no false positives
36
Summary
- Generation of mappings
  - One-to-n mappings
  - Item Extend
- Transformation of transactions
  - Mapping f(x)
  - Subsets of unique mapping set
  - Fake items
- Recovering association rules
  - Reverse mappings and filtering
37
Test run
- # Items = 1k, |T| = 1k
- Without transformation
  - One rule
  - Time: 8 s
- Item Extend
  - 147 rules
  - Total time: 26 s
  - Mappings generation and transformation: 219 ms
38
Future work
- Define the parameters of the problem
  - Size of IE
  - Size of F
- Give a clear measure of security
- Give a clear measure of overhead
- Correctness of association rules
  - Query execution proof
  - Result verification
39
The End
40
Choosing probability
- A uniform distribution, or any fixed distribution, gives patterns which may be easily identified
- Random probability distribution
  - {}: 70%, {1}: 5%, {4}: 15%, {1, 4}: 20%
- Storage: needs additional storage
41
Algorithm for transformation
- Transformation is the most costly process
- Execution time is linear in the database size |T|
- It should be as fast as possible
42
Optimization
- Mapping retrieval
  - For an item x, use a hash table to retrieve the mapping, h(x)
- Adding fake items
  - First determine at random (according to the probability of adding items) the number of items to add
  - Then pick that many items at random from the set (non-uniform distribution)
  - Gives a much shorter running time on average
43
Choice of mapped items
- Codes: 1, 2, …, (|I| + |IE| + |F|) * (1 + δ)
- Acceptable as long as it is not easy to identify I’, IE, F
- One way is to use a random permutation of the first |I| + |IE| + |F| natural numbers
  - The first |I| numbers are mapped to I’
  - The next |IE| numbers are IE
44
Cut and paste randomization
- One case of select-a-size randomization
- The way to perform the selection of j
  - Given an integer $K_m > 0$
  - Randomly choose $j$ in $[0, K_m]$
  - If $j > m$, set $j = m$
- Overall input parameters: $K_m$, $p_m$
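The selection of j above induces a specific non-uniform distribution over [0, m], which is why this is a special case of select-a-size. The sketch below assumes the draw from [0, K_m] is uniform (the slide only says "randomly"); the function names are illustrative.

```python
import random
from fractions import Fraction


def choose_j(m, K_m, rng):
    """Draw j in [0, K_m] (assumed uniform), then cap it at m."""
    return min(rng.randint(0, K_m), m)


def j_distribution(m, K_m):
    """Exact probability of each j in 0..m under choose_j."""
    dist = [Fraction(0)] * (m + 1)
    for raw in range(K_m + 1):
        dist[min(raw, m)] += Fraction(1, K_m + 1)
    return dist


# For m = 2 and K_m = 4, the capped values 2, 3, 4 all collapse onto j = 2.
print(j_distribution(2, 4))
print(choose_j(2, 4, random.Random(5)))
```

The resulting list could be passed as the size distribution of a select-a-size randomizer.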
45
Effects on support
- Support of A in T’ comes from
  - A in t, and A is not dropped
  - A’ in t (A absent), and A is randomly added
- Support of AB in T’ comes from
  - AB in t, and neither A nor B is dropped
  - AB’ in t, and B is randomly added
  - A’B in t, and A is randomly added
  - A’B’ in t, and both A and B are randomly added
46
Estimating original support
- Let x be the support of A in T and y the support of A in T’
  - $x \cdot P(\text{A remains in original transaction}) + (|DB| - x) \cdot p_m = y$
- Support of AB in T
  - Support of AB in T’
  - Support of AB’, A’B in T’
  - Support of A’B’ in T’
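The single-item equation can be solved for x directly. The sketch below assumes the retention probability P(A remains) is supplied as a number `p_keep`; how it is derived from the randomization parameters is not shown here, and the function names are hypothetical.

```python
def expected_randomized_support(x, n, p_keep, p_m):
    """y = x * P(A remains) + (|DB| - x) * p_m, with n = |DB|."""
    return x * p_keep + (n - x) * p_m


def estimate_original_support(y, n, p_keep, p_m):
    """Invert the equation above: x = (y - n * p_m) / (p_keep - p_m)."""
    if p_keep == p_m:
        raise ValueError("p_keep must differ from p_m for x to be identifiable")
    return (y - n * p_m) / (p_keep - p_m)


# Hypothetical numbers: 40 of 100 transactions contain A.
y = expected_randomized_support(40, 100, 0.8, 0.1)
print(y, estimate_original_support(y, 100, 0.8, 0.1))
```

On real data y is an observed count rather than its expectation, so the recovered x is only an estimate.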
47
Apriori property
- Suppose m = 2 for all t in T, |T| = 10, I = {A, B}, $p_m = 0$, j = 1
- Support of B in T’
  - $\mathrm{supp}_{T'}(B) = 0$
  - $E(\mathrm{supp}_T(B)) = 0$
- $\mathrm{supp}_{T'}(A) = 10$
- $\mathrm{supp}_{T'}(AB) = 0$
- $E(\mathrm{supp}_T(AB)) = \mathrm{supp}_{T'}(A) \cdot 1 = 10$
48
Apriori property
- An itemset expected to be large may have a subset expected to be small
- But generally the supports of subsets are not too small
- Instead of using the support threshold to filter out all small candidates, use a smaller value
49
Apriori algorithm
- Generate candidate sets
- Scan the database for counts
- Recover the predicted support
- Discard candidates with support smaller than the candidate limit
- Save for output the candidates with support >= support threshold
- Apriori_gen(remaining candidates)
50
Candidate limit
- A high value
  - Increases the number of false drops
  - Poor correctness
- A small value
  - Increases the number of candidate sets
  - High running time
- Experiment
  - Support threshold: $s_{\min}$
  - Estimated s.d.: $\delta$
  - $s_{\min} - \delta$ is found to be a good value
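The two thresholds can be sketched as a filtering step inside one Apriori pass. The function name, the dictionary of estimated supports and the numbers are illustrative assumptions; only the rule "candidate limit = s_min - δ" comes from the slide.

```python
def filter_candidates(estimated_support, s_min, delta):
    """Split candidates using the candidate limit s_min - delta.

    Returns (survivors, output):
      survivors are kept to generate the next level of candidates,
      output are the survivors that also reach the support threshold.
    """
    limit = s_min - delta
    survivors = {c for c, s in estimated_support.items() if s >= limit}
    output = {c for c in survivors if estimated_support[c] >= s_min}
    return survivors, output


# Hypothetical estimated supports after the recovery step.
estimated = {("A",): 100.0, ("B",): 92.0, ("C",): 80.0}
survivors, output = filter_candidates(estimated, s_min=95, delta=5)
print(sorted(survivors), sorted(output))
```

Candidate ("B",) falls below the support threshold but stays above the limit, so it is not reported yet still contributes to larger candidates.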
51
Other applications
- Outsourced transaction database (secure) storage
- Outsourced association rule mining using data streams
- Secure distributed association rule mining with a third-party miner
52
Outsourced database with association rule mining service
[Diagram: DB, Transformer, Mappings, Transactions, Query, Association Rules]