Mining Multiple Private Databases; Top-k Queries Across Multiple Private Databases (2005). Li Xiong (Emory University), Subramanyam Chitti (GA Tech), Ling Liu.

Mining Multiple Private Databases; Top-k Queries Across Multiple Private Databases (2005). Li Xiong (Emory University), Subramanyam Chitti (GA Tech), Ling Liu (GA Tech). Presented by: Cesar Gutierrez

2 About Me: ISYE senior and CS minor; graduating December 2008; humanitarian logistics and/or supply chain; originally from Lima, Peru; travel, paintball, and politics

3 Outline: Intro. & Motivation; Problem Definition; Important Concepts & Examples; Private Algorithm; Conclusion

4 Introduction: ↓ of information-sharing restrictions due to technology; ↑ need for distributed data-mining tools that preserve privacy. Trade-off: Accuracy, Efficiency, Privacy

5 Motivating Scenarios: CDC needs to study insurance data to detect disease outbreaks (disease incidents, disease seriousness, patient background). Legal/commercial problems prevent release of policyholders' information.

6 Motivating Scenarios (cont'd): Industrial trade group collaboration. Useful pattern: "manufacturing using chemical supplies from supplier X have high failure rates". Trade secret: "manufacturing process Y gives low failure rate".

7 Problem & Assumptions... Model: n nodes, horizontal partitioning. Assume semi-honesty: nodes follow the specified protocol; nodes attempt to learn additional information about other nodes.

8 Challenges: Why not use a Trusted Third Party (TTP)? Difficult to find one that is trusted; increased danger from a single point of compromise. Why not use secure multi-party computation techniques? High communication overhead; feasible for small inputs only.

9 Recall Our 3-D Goal: Privacy, Accuracy, Efficiency

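10 Private Max [Figure: ring of four nodes labeled 1, 3, 2, 4 with local values 30, 20, 40, 10; the values 30 and 40 are passed along the ring from the start node] Actual data sent on first pass. Static starting point known.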
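The single-pass ring idea on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `naive_ring_max`, the `start` parameter, and the node ordering are assumptions made for the example. It shows why the slide flags the first pass as a leak: the first node's message is exactly its own value.

```python
def naive_ring_max(local_values, start=0):
    """Pass a running maximum once around a ring of nodes.

    local_values[i] is the private value held by node i (hypothetical layout).
    Returns the final maximum and a trace of what each node sent onward.
    """
    n = len(local_values)
    running = None
    trace = []  # (node index, value sent to the successor)
    for step in range(n):
        node = (start + step) % n
        value = local_values[node]
        # Each node forwards the larger of what it received and its own value.
        running = value if running is None else max(running, value)
        trace.append((node, running))
    return running, trace


result, trace = naive_ring_max([30, 20, 40, 10])
# trace[0] exposes the starting node's true value to its successor.
```

Because the starting node is fixed and known, its successor always learns that node's actual value, which is the weakness the multi-round variant addresses.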

11 Multi-Round Max [Figure: ring of nodes (D2, D3, D4 labeled) holding local values 30, 20, 40, 10 and exchanging perturbed intermediate values] Randomly perturbed data passed to successor during multiple passes. No successor can determine actual data from its predecessor. Randomized starting point.
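A rough sketch of a multi-round, randomly perturbed maximum is given below. Several details are assumptions that do not appear on the slide: the randomization probability is taken to be `p0 * d ** (r - 1)` in round `r`, a perturbed message is drawn uniformly between the received value and the node's own value, and the global value starts at 0 (so private values are assumed positive). The names `multi_round_max`, `p0`, and `seed` are hypothetical.

```python
import random


def multi_round_max(local_values, rounds=5, p0=1.0, d=0.5, seed=None):
    """Estimate the maximum over a ring using randomized multi-round passing."""
    rng = random.Random(seed)
    n = len(local_values)
    start = rng.randrange(n)  # randomized starting point
    g = 0  # global value passed around the ring (assumes positive data)
    for r in range(1, rounds + 1):
        p_r = p0 * d ** (r - 1)  # assumed decay of the randomization probability
        for step in range(n):
            node = (start + step) % n
            v = local_values[node]
            if v > g:
                if rng.random() < p_r:
                    # Hide the true value: send something between g and v.
                    g = rng.uniform(g, v)
                else:
                    # Reveal the true value, which raises the global value.
                    g = v
            # If v <= g, the node forwards g unchanged.
    return g


estimate = multi_round_max([30, 20, 40, 10], rounds=8, seed=1)
exact = multi_round_max([30, 20, 40, 10], p0=0.0)
```

In this sketch the passed value never decreases and never exceeds the true maximum, so more rounds can only move the estimate closer to the exact answer; with `p0=0.0` it degenerates to the deterministic single-pass behavior.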

12 Evaluation Parameters: Large k = "avoid information leaks"; large d = more randomization = more privacy; small d = more accurate (deterministic); large r = "as accurate as ordinary classifier"
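To make the effect of d and r concrete, the snippet below tabulates a per-round randomization probability. It assumes a geometric schedule `p0 * d ** (r - 1)`, which is an illustrative assumption consistent with "large d = more randomization" rather than something stated on the slide; `randomization_schedule` and `p0` are hypothetical names.

```python
def randomization_schedule(p0, d, rounds):
    """Randomization probability for rounds 1..rounds under geometric decay."""
    return [p0 * d ** (r - 1) for r in range(1, rounds + 1)]


slow_decay = randomization_schedule(1.0, 0.75, 4)  # larger d: stays random longer
fast_decay = randomization_schedule(1.0, 0.5, 4)   # smaller d: turns deterministic sooner
```

Under this assumption, a larger d keeps the probability of perturbing a message high for more rounds (more privacy), while a smaller d drives it toward zero quickly (more accuracy for a fixed number of rounds r).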

13 Accuracy Results

14 Varying Rounds

15 Privacy Results

16 Conclusion: Problems tackled: preserving efficiency and accuracy while introducing provable privacy to the system; improving a naive protocol; reducing privacy risk in an efficient manner.

17 Critique: Depends on other research papers for a full understanding; few or no illustrations; a real-life example would have made the charts easier to understand.
