1
Privacy Preserving Data Mining Yehuda Lindell & Benny Pinkas
2
Summary
– Objective
– Various components / tools needed
– Algorithm
3
Objective
– Perform data mining on the union of two private databases
– Data stays private, i.e. neither party learns anything but the output
4
Assumptions
– Large databases: generic secure-computation solutions are impractical at this scale
– Semi-honest parties
5
Classification by Decision Tree Learning
– Each transaction has a set of attributes plus a class attribute
– Goal: predict the class using only the non-class attributes
6
Decision Tree
– Rooted tree with nodes and edges
– Internal nodes => attributes
– Edges leaving a node => possible values of that attribute
– Leaves => expected class for a transaction
– To classify: traverse the tree using the known attributes, then predict the class given by the leaf reached (a minimal code sketch follows)
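Below is a minimal sketch of the tree structure and prediction traversal just described; the class and function names (DecisionNode, predict) and the tiny example tree are illustrative, not taken from the paper.

# Minimal sketch of the decision-tree structure described above.
# Names (DecisionNode, predict) are illustrative, not from the paper.

class DecisionNode:
    def __init__(self, attribute=None, children=None, class_label=None):
        self.attribute = attribute        # internal node: attribute tested here
        self.children = children or {}    # edge value -> child node
        self.class_label = class_label    # leaf: predicted class

def predict(node, transaction):
    """Traverse the tree using the transaction's known (non-class) attributes."""
    while node.class_label is None:          # keep descending until a leaf
        value = transaction[node.attribute]  # follow the edge for this value
        node = node.children[value]
    return node.class_label

# Example: a tiny tree that predicts 'play' from 'outlook'
tree = DecisionNode(attribute="outlook", children={
    "sunny": DecisionNode(class_label="no"),
    "overcast": DecisionNode(class_label="yes"),
})
print(predict(tree, {"outlook": "overcast"}))  # -> "yes"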
7
Constructing the Tree
– Build top-down: at each node, find the attribute that "best" classifies the remaining transactions (least overhead in the resulting tree)
– Best => the attribute that minimizes the conditional entropy of the class, i.e. maximizes information gain
– Entropy is a sum of terms of the form -x ln x
– A node whose transactions all share one class has entropy 0 and becomes a leaf
8
Entropy Calculations
– H(T) = Σ (-x ln x), where each x is the fraction of transactions belonging to one class
– H_C(T) => information needed to identify the class of a transaction in T:
  H_C(T) = Σ_i -( |T(c_i)| / |T| ) ln( |T(c_i)| / |T| ), summing over all possible classes c_i
– H_C(T | A) => information needed to identify the class of a transaction, given the value of attribute A:
  H_C(T | A) = Σ_v ( |T(v)| / |T| ) · H_C(T(v)), where T(v) is the set of transactions with value v for attribute A
– Gain(A) = H_C(T) – H_C(T | A)
(a plaintext sketch of these formulas follows)
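The following plaintext, single-party sketch shows how these quantities are computed on data held in the clear; in the protocol the same formulas are evaluated jointly via shares of -x ln x, as described later. Function and variable names (class_entropy, gain, etc.) are illustrative.

# Plaintext (non-private) sketch of the entropy and gain formulas above.
# 'transactions' is a list of dicts; 'class_attr' and 'attr' name attributes.
import math
from collections import Counter

def class_entropy(transactions, class_attr):
    """H_C(T) = sum over classes c of -(|T(c)|/|T|) * ln(|T(c)|/|T|)."""
    total = len(transactions)
    counts = Counter(t[class_attr] for t in transactions)
    return sum(-(n / total) * math.log(n / total) for n in counts.values())

def conditional_entropy(transactions, class_attr, attr):
    """H_C(T|A): weighted entropy of the subsets T(v), one per value v of A."""
    total = len(transactions)
    by_value = {}
    for t in transactions:
        by_value.setdefault(t[attr], []).append(t)
    return sum((len(sub) / total) * class_entropy(sub, class_attr)
               for sub in by_value.values())

def gain(transactions, class_attr, attr):
    """Gain(A) = H_C(T) - H_C(T|A); the 'best' attribute maximizes this."""
    return class_entropy(transactions, class_attr) - conditional_entropy(transactions, class_attr, attr)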
9
Private Computation
– A protocol privately computes f for Party 1 if there is a simulator S1 that, given only P1's input x1 and output f1(x1,y), can generate P1's entire view of the protocol
– Everything P1 sees during the protocol is therefore already implied by its own input and output, so P1 learns nothing more (the formal statement follows)
[Diagram: Party 1 holds x1 and receives f1(x1,y); S1 applied to (x1, f1(x1,y)) reproduces Party 1's view]
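Formally, this is the standard simulation-based privacy definition for semi-honest parties, which the slide paraphrases; the notation below is a common formulation and is not copied from the slides.

% Privacy for Party 1 in the semi-honest model: its view of protocol \pi is
% computationally indistinguishable from what simulator S_1 produces from
% Party 1's input and output alone.
\[
\{\, S_1\big(x_1,\; f_1(x_1,y)\big) \,\}_{x_1,y}
\;\stackrel{c}{\equiv}\;
\{\, \mathrm{view}^{\pi}_{1}(x_1,y) \,\}_{x_1,y}
\]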
10
Oblivious Evaluation
– What if, in the previous example, Party 2 does not want Party 1 to know which input x1 it is providing?
– Oblivious evaluation: the receiver obtains P(x) without learning anything else about the polynomial P, and the sender learns nothing about x
11
Oblivious Evaluation (2) – Simplified Version
– r_i = receiver's random number, R_i = sender's random number, x = receiver's input, s = receiver's secret key
– Receiver -> Sender: ( a^(r_i), a^(s·r_i) · a^x )
– Sender -> Receiver: ( a^(R_i), a^(s·R_i) · a^(P(x)) · a^(s·r_i) )
– Receiver divides the 2nd element by (the 1st element · a^(r_i)) raised to the power s to recover a^(P(x)):
  a^(P(x)) = ( a^(s·R_i) · a^(P(x)) · a^(s·r_i) ) / ( a^(R_i) · a^(r_i) )^s
(a toy numeric check of this arithmetic follows)
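The following toy Python check illustrates the blinding/unblinding arithmetic above for the trivial case P(x) = x. The modulus, the base, and the restriction to this degenerate polynomial are assumptions made purely for illustration; this is not the paper's actual oblivious polynomial evaluation protocol, only a demonstration that the blinding factors cancel.

# Toy check of the blinding/unblinding arithmetic, for P(x) = x only.
import random

p = 2147483647             # a known prime (2^31 - 1), illustrative choice
a = 5                      # base of all exponentiations, illustrative choice

# Receiver: secret key s, randomness r, private input x; a^s acts as a public key
s = random.randrange(2, p - 1)
r = random.randrange(2, p - 1)
x = 42
pub = pow(a, s, p)
to_sender = (pow(a, r, p), pow(a, s * r, p) * pow(a, x, p) % p)   # (a^r, a^(s·r)·a^x)

# Sender: randomness R; with P(x) = x it simply re-blinds the received pair
R = random.randrange(2, p - 1)
c1, c2 = to_sender
to_receiver = (pow(a, R, p), c2 * pow(pub, R, p) % p)             # (a^R, a^(s·R)·a^(P(x))·a^(s·r))

# Receiver: divide the 2nd element by (1st element · a^r) raised to the power s
d1, d2 = to_receiver
blind = pow(d1 * pow(a, r, p) % p, s, p)                          # (a^R · a^r)^s
recovered = d2 * pow(blind, p - 2, p) % p                         # = a^(P(x))
assert recovered == pow(a, x, p)                                  # P(x) = x here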
12
Algorithm
– Step 1: each party runs the ID3 decision-tree learning computation on its own data – O(# attributes)
– Step 2: the results are combined using cryptographic protocols such as oblivious evaluation – O(log(# transactions))
– Result: each party obtains the data-mining output without learning more than necessary
13
Algorithm (2)
– Finding the "best" attribute is the hardest part
– Each party computes its "share" of the entropy: for each candidate attribute, the values from the two parties must be combined, which reduces to a private computation of terms of the form -x ln x
– The attribute that minimizes entropy is chosen: it provides the maximum information gain and yields the most efficient tree with the least overhead
– Oblivious evaluation is used for the private -x ln x computation (its functionality is sketched below)
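The sketch below shows only the functionality of that private -x ln x sub-protocol, i.e. what inputs go in and what shares come out. It is not itself private (it sees both counts in the clear); the real protocol achieves the same input/output behaviour using oblivious polynomial evaluation so that neither party ever sees x = x1 + x2. All names and the field size are illustrative.

# Functionality sketch (NOT itself private) of the -x ln x sub-protocol:
# each party holds a private count (x1, x2) for a candidate split, and the
# parties end up with random additive shares of x*ln(x) where x = x1 + x2.
import math
import random

FIELD = 2**61 - 1   # modulus the shares live in (illustrative choice)

def x_ln_x_shares(x1, x2):
    """Return additive shares (u1, u2) with (u1 + u2) mod FIELD = round(x*ln x)."""
    x = x1 + x2
    value = round(x * math.log(x)) if x > 0 else 0   # fixed-point approximation
    u1 = random.randrange(FIELD)                     # Party 1's random share
    u2 = (value - u1) % FIELD                        # Party 2's complementary share
    return u1, u2

# Each H_C(T) / H_C(T|A) term is a weighted sum of such x*ln(x) values, so the
# parties can locally sum their shares per attribute and then privately compare
# the totals to pick the attribute with the maximum information gain.
u1, u2 = x_ln_x_shares(30, 12)
assert (u1 + u2) % FIELD == round(42 * math.log(42))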
14
Discussion of Algorithm
Efficient:
– Accommodates large databases: the cost depends on the number of possible values for the attributes, NOT on the number of transactions in the database
Private:
– Each step relies only on local computation plus a private sub-protocol
– Uses techniques such as oblivious transfer / oblivious evaluation to exchange information
– The paper proves that the individual steps are private, AND that the control flow between steps can be predicted from the input/output alone – so the composition is also private
15
Discussion of Algorithm (2)
– An approximation of ID3 is used instead of exact ID3; it is shown to be just as secure and to provide essentially the same information