Published by Molly Williams. Modified over 9 years ago.
Slide 1: Mining Multiple Private Databases
Topk Queries Across Multiple Private Databases (2005)
Li Xiong (Emory University), Subramanyam Chitti (GA Tech), Ling Liu (GA Tech)
Presented by: Cesar Gutierrez
Slide 2: About Me
ISYE senior and CS minor
Graduating December 2008
Humanitarian logistics and/or supply chain
Originally from Lima, Peru
Travel, paintball and politics
Slide 3: Outline
Intro. & Motivation
Problem Definition
Important Concepts & Examples
Private Algorithm
Conclusion
Slide 4: Introduction
↓ Information-sharing restrictions, due to technology
↑ Need for distributed data-mining tools that preserve privacy
Trade-off: accuracy, efficiency, privacy
Slide 5: Motivating Scenarios
CDC needs to study insurance data to detect disease outbreaks:
  Disease incidents
  Disease seriousness
  Patient background
Legal and commercial problems prevent release of policyholders' information
Slide 6: Motivating Scenarios (cont'd)
Industrial trade group collaboration
Useful pattern: "manufacturing using chemical supplies from supplier X have high failure rates"
Trade secret: "manufacturing process Y gives low failure rate"
Slide 7: Problem & Assumptions
Model: n nodes, horizontal partitioning
Assume semi-honesty:
  Nodes follow the specified protocol
  Nodes attempt to learn additional information about other nodes
Slide 8: Challenges
Why not use a Trusted Third Party (TTP)?
  Difficult to find one that is trusted
  Increased danger from a single point of compromise
Why not use secure multi-party computation techniques?
  High communication overhead
  Feasible for small inputs only
Slide 9: Recall Our 3-D Goal
Privacy
Accuracy
Efficiency
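Slide 10: Private Max
[Figure: ring of four nodes, labeled 1 to 4, holding the values 30, 20, 40 and 10; the values 30 and 40 are passed along the ring from the marked start node]
Actual data sent on first pass
Static starting point known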
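The single-pass protocol on this slide can be sketched as below. This is a minimal illustration, not the authors' code: the function name `naive_ring_max`, the list-based ring, and the order in which the values 30, 20, 40 and 10 sit around the ring are assumptions made for the example. It shows why the slide flags the first pass as a problem: the first message is the start node's actual value.

```python
def naive_ring_max(values, start=0):
    """Pass a running maximum once around a ring of nodes, beginning at `start`.

    `values[i]` is the private value of node i (ring order is an assumption).
    Returns the final maximum and the list of messages sent along the ring.
    """
    n = len(values)
    running = values[start]            # the start node sends its actual value
    messages = [(start, running)]      # (sender index, value its successor receives)
    for step in range(1, n):
        node = (start + step) % n
        running = max(running, values[node])
        messages.append((node, running))
    return running, messages


result, messages = naive_ring_max([30, 20, 40, 10])
print(result)        # global maximum
print(messages[0])   # first message: the start node's own private value
```

Because the starting point is static and known, the successor of the start node always learns that node's exact value.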
Slide 11: Multi-Round Max
[Figure: ring of nodes holding the values 30, 20, 40 and 10, with perturbed values passed between nodes over several rounds]
Randomly perturbed data passed to successor during multiple passes
No successor can determine actual data from its predecessor
Randomized starting point
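A rough sketch of a multi-round randomized max follows. The slide does not state the update rule, so several things here are assumptions labeled as such: in round r a node whose value v exceeds the incoming value g replaces g with a random integer in [g, v) with probability p0 * d**(r-1), and otherwise reveals v; the initial value is a public floor of 0; values are integers. The names `multi_round_max`, `p0` and `floor` are made up for the example.

```python
import random


def multi_round_max(values, rounds, p0=1.0, d=0.5, floor=0, seed=None):
    """Estimate max(values) by passing a perturbed running value around a ring.

    Assumed rule (not taken from the slides): in round r, a node holding
    v > g sends a random integer in [g, v) with probability p0 * d**(r-1),
    and its true value v otherwise. Nodes with v <= g pass g on unchanged.
    """
    rng = random.Random(seed)
    n = len(values)
    start = rng.randrange(n)               # randomized starting point
    g = floor                              # assumed public lower bound on all values
    for r in range(1, rounds + 1):
        p = p0 * d ** (r - 1)              # randomization probability shrinks each round
        for step in range(n):
            v = values[(start + step) % n]
            if g >= v:
                continue                   # nothing to add; pass g on unchanged
            if rng.random() < p:
                g = rng.randrange(g, v)    # perturbed value, strictly below v
            else:
                g = v                      # reveal the true value
    return g


print(multi_round_max([30, 20, 40, 10], rounds=5, seed=1))
```

Under this rule the running value never exceeds the true maximum, and a receiver cannot tell whether the value it gets is its predecessor's real data or a random placeholder.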
Slide 12: Evaluation Parameters
Large k = "avoid information leaks"
Large d = more randomization = more privacy
Small d = more accurate (deterministic)
Large r = "as accurate as ordinary classifier"
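To make the d and r trade-off concrete, here is a small calculation under an assumed model that the slide does not spell out: a node randomizes in round j with probability p0 * d**(j-1), and the node holding the unique maximum yields a wrong final answer only if it randomizes in every one of the r rounds. The function names and the parameter `p0` are hypothetical, and k is not modeled.

```python
def randomization_schedule(p0, d, rounds):
    """Assumed per-round randomization probabilities: p0 * d**(j-1) for j = 1..rounds."""
    return [p0 * d ** (j - 1) for j in range(1, rounds + 1)]


def exact_max_probability(p0, d, rounds):
    """Probability the true max is returned, assuming a unique max holder.

    The result is wrong only if that node randomizes in every round, so the
    miss probability is the product of the per-round probabilities.
    """
    miss = 1.0
    for p in randomization_schedule(p0, d, rounds):
        miss *= p
    return 1.0 - miss


for d in (0.25, 0.5, 0.75):
    print(d, [round(exact_max_probability(1.0, d, r), 4) for r in (1, 2, 3, 4)])
```

In this model a larger d keeps the randomization probability high for longer (more privacy, lower accuracy per round), while adding rounds r pushes the accuracy toward 1.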
Slide 13: Accuracy Results
Slide 14: Varying Rounds
Slide 15: Privacy Results
Slide 16: Conclusion
Problems tackled:
  Preserving efficiency and accuracy while introducing provable privacy to the system
  Improving a naive protocol
  Reducing privacy risk in an efficient manner
Slide 17: Critique
Depends on other research papers for a full understanding
Few or no illustrations
A real-life example would have made the charts easier to understand