Mining Multiple Private Databases: "Top-k Queries Across Multiple Private Databases" (2005) and "Mining Multiple Private Databases Using a kNN Classifier" (2007). Li Xiong (Emory University), Subramanyam Chitti (GA Tech), Ling Liu (GA Tech). Presented by: Chris Baker
2 About Me: 4th-year undergraduate, graduating December 2009; interests in data management and software engineering; from Macon, GA; (very) small business owner (web development); enjoys SCUBA diving and travel
3 Outline Intro. & Motivation Problem Definition Important Concepts & Examples kNN Algorithm kNN Experiment Conclusion
4 Introduction: ↓ technological restrictions on information sharing → ↑ need for distributed data-mining tools that preserve privacy. Trade-off: Privacy vs. Accuracy vs. Efficiency
5 Motivating Scenarios: The CDC needs to study insurance data (disease incidents, disease seriousness, patient background) to detect disease outbreaks, but legal and commercial problems prevent release of policyholders' information
6 Motivating Scenarios (cont'd): Industrial trade group collaboration. Useful pattern: "manufacturing processes using chemical supplies from supplier X have high failure rates." Trade secret: "manufacturing process Y gives a low failure rate."
7 Problem & Assumptions: Model: n nodes with horizontally partitioned data. Semi-honesty assumed: nodes follow the specified protocol, but may attempt to learn additional information about other nodes
8 Challenges Why not use a Trusted Third Party (TTP)? Difficult to find one that is trusted Increased danger from single point of compromise Why not use secure multi-party computation techniques? High communication overhead Feasible for small inputs only
9 Recall Our 3-D Goal Privacy Accuracy Efficiency
10 Important Concepts: Successor, Multi-round, Local Computation, Randomization (Probabilistic Protocols). [Diagram: from a starting point, nodes holding private data D1, D2, D3, …, Dn form a ring; each node's local computation takes its predecessor's output as input and passes its own output to its successor.]
11 Multi-Round Protocol Examples Primitive Min/Max Top K Sum Union Join/Intersection Complex kNN Classification K-Means Clustering k-Anonymization
12 Naïve Max: actual data is sent on the first pass, and the static starting point is known
13 Multi-Round Max: randomly perturbed data is passed to the successor during multiple passes, so no successor can determine the actual data of its predecessor; the starting point is randomized
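The multi-round idea can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: the truth-telling schedule `p_truth` and the subtractive noise model are hypothetical choices, picked only so that early rounds hide each node's data while the final round still yields the exact maximum.

```python
import random

def multi_round_max(local_values, rounds=3, noise_scale=10.0):
    """Illustrative randomized multi-round max (not the paper's exact
    protocol). Each node holds one private value. In early rounds a node
    may pass on a randomly under-reported value, so its successor cannot
    tell whether the received value is real data or noise."""
    n = len(local_values)
    start = random.randrange(n)       # randomized starting point
    current = float("-inf")           # value passed around the ring
    for r in range(rounds):
        # Hypothetical schedule: probability of sending the true local
        # max grows each round, reaching 1 in the final round.
        p_truth = (r + 1) / rounds
        for i in range(n):
            node = (start + i) % n
            local_max = max(current, local_values[node])
            if random.random() < p_truth:
                current = local_max
            else:
                # Pass a value strictly below the local max, so the
                # successor learns nothing certain about this node.
                current = local_max - random.uniform(0, noise_scale)
    return current
```

Because perturbation only subtracts and the final round is fully truthful, the last pass around the ring restores the exact global maximum while earlier passes reveal only noisy candidates.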
14 kNN Sub-Problems: (1) Nearest-neighbor selection: identify the k nearest neighbors of query x. (2) Classification: each node classifies x, and the nodes cooperate to determine the global classification. Both parts must function in a privacy-preserving manner: nodes should not be able to tell what information comes from any other node, and local classifications must not be revealed during global classification
15 Parameters: Large k: "avoid information leaks." Large d: more randomization, hence more privacy. Small d: more accurate (more deterministic). Large r: "as accurate as ordinary classifier"
16 kNN Classification Algorithm
Input: x, an instance to be classified. Output: classification(x), the classification of x.
1. Each node computes the distance d(x,y) between x and each point y in its database, selects the k smallest distances locally, and stores them in a local distance vector ldv.
2. Using the ldv's as input, the nodes run the privacy-preserving nearest-distance selection protocol to select the k globally nearest distances, stored in gdv.
3. Each node selects the kth nearest distance: ∆ = gdv(k).
4. Assuming there are v classes c_1, …, c_v, each node computes a local classification vector lcv over the points y in its database: lcv(j) = Σ_{y : d(x,y) ≤ ∆} [f(y) = c_j], where f(y) is the classification of point y and [p] evaluates to 1 if predicate p is true and 0 otherwise.
5. Using the lcv's as input, the nodes run the privacy-preserving classification protocol to calculate the global classification vector gcv.
6. Each node assigns classification(x) = c_j, where j = argmax_j gcv(j).
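The steps above can be sketched in plain Python. This is a non-private sketch: the two global merges below stand in for the paper's privacy-preserving selection and classification protocols (which are the actual contribution and are not implemented here), and the one-dimensional distance and node layout in the usage example are illustrative assumptions.

```python
import heapq
from collections import Counter

def knn_classify(nodes, x, k, classes, dist):
    """Plaintext sketch of the slide's kNN steps. Each node is a list of
    (point, label) pairs; `dist` is the distance function d(x, y)."""
    # Step 1: local distance vectors (k smallest distances per node).
    ldvs = [sorted(dist(x, y) for y, _ in db)[:k] for db in nodes]
    # Step 2: global k nearest distances (privately selected in the paper,
    # merged in the clear here).
    gdv = heapq.nsmallest(k, (d for ldv in ldvs for d in ldv))
    # Step 3: the kth nearest distance.
    delta = gdv[k - 1]
    # Steps 4-5: local classification vectors lcv(j) = #{y : d(x,y) <= delta,
    # f(y) = c_j}, summed into the global vector gcv (a plaintext stand-in
    # for the privacy-preserving classification protocol).
    gcv = Counter()
    for db in nodes:
        for y, label in db:
            if dist(x, y) <= delta:
                gcv[label] += 1
    # Step 6: the class with the largest global count.
    return max(classes, key=lambda c: gcv[c])
```

For example, with two hypothetical nodes holding 1-D points, `knn_classify([[(1.0, 'a'), (2.0, 'a')], [(10.0, 'b'), (11.0, 'b'), (1.5, 'a')]], 1.2, 3, ['a', 'b'], lambda a, b: abs(a - b))` classifies the query by majority vote among the 3 globally nearest points.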
17 Experimental Datasets
GLASS: 214 instances, 9 attributes, 7 classes (glass identification for crime-scene investigation)
PIMA: 768 instances, 8 attributes, 2 classes (diabetes diagnosis)
ABALONE: 4177 instances, 8 attributes, 29 classes (age prediction of abalone, a marine mollusk)
18 Accuracy Results
19 Varying Rounds
20 Privacy Results
21 Conclusion Problems Tackled Preserving efficiency and accuracy while introducing provable privacy to the system Constructing k-nearest neighbor classifier over horizontally partitioned databases
22 Critique: Weakness: assumes semi-honesty (unrealistic against active adversaries). Few or no illustrations. Heavy dependence on prior research