Privacy-Protecting Statistics Computation: Theory and Practice

Presentation transcript:

Privacy-Protecting Statistics Computation: Theory and Practice Rebecca Wright Stevens Institute of Technology 27 March, 2003

Erosion of Privacy “You have zero privacy. Get over it.” - Scott McNealy, 1999 Changes in technology are making privacy harder to protect: reduced cost of data storage, increased ability to process large amounts of data. Especially critical now, given the increased demand for security-related surveillance and data mining.

Overview Announcements Introduction Privacy-preserving statistics computation Selective private function evaluation

Announcements DIMACS working group on secure efficient extraction of data from multiple datasets. Initial workshop to be scheduled for Fall 2003. DIMACS crypto and security tutorials to kick off Special Focus on Communication Security and Information Privacy: August 4-7, 2003. NJITES Cybersecurity Symposium, Stevens Institute of Technology, April 28, 2003.

What is Privacy? May mean different things to different people seclusion: the desire to be left alone property: the desire to be paid for one’s data autonomy: the ability to act freely Generally: the ability to control the dissemination and use of one’s personal information.

Different Types of Data Transaction data created by interaction between stakeholder and enterprise current privacy-oriented solutions useful Authored data created by stakeholder digital rights management (DRM) useful Sensor data stakeholders not clear at time of creation growing rapidly

Sensor Data Examples surveillance cameras (especially with face recognition software) desktop monitoring software (e.g. for intrusion or misbehavior detection) GPS transmitters, RFID tags wireless sensors (e.g. for location-based PDA services)

Sensor Data Can be difficult to identify stakeholders and even data collectors Cross boundary between “real world” and cyberspace Boundary between transaction data and sensor data can be blurry (e.g. Web browsing data) Presents a real and growing privacy threat

Product Design as Policy Decision product decisions by large companies or public organizations become de facto policy decisions often such decisions are made without conscious thought to privacy impacts, and without public discussion this is particularly true in the United States, where there is not much relevant legislation

Example: Metro Cards Washington, DC - no record kept of per card transactions - damaged card can be replaced if printed value still visible New York City - transactions recorded by card ID - damaged card can be replaced if card ID still readable - have helped find suspects, corroborate alibis

Transactions without Disclosure Don’t disclose information in the first place! Anonymous digital cash [Chaum et al] Limited-use credit cards [Sha01, RW01] Anonymous web browsing [Crowds, Anonymizer] Secure multiparty computation and other cryptographic protocols: perceived (often correctly) as too cumbersome or inefficient to use, but the same advances in computing are changing this.

Privacy-Preserving Data Mining Allow multiple data holders to collaborate to compute important (e.g., security-related) information while protecting the privacy of other information. Particularly relevant now, with increasing focus on security even at the expense of some privacy.

Advantages of privacy protection protection of personal information protection of proprietary or sensitive information fosters collaboration between different data owners (since they may be more willing to collaborate if they need not reveal their information)

Privacy Tradeoffs? Privacy vs. security: maybe, but giving up one does not mean getting the other (who is this person? is this a dangerous person?) Privacy vs. usability: reasonable defaults, easy and extensive customization, visualization tools. The real tradeoffs are against cost or computing power, rather than an inherent conflict with privacy.

Privacy/Security Tradeoff? Claim: No inherent tradeoff between security and privacy, though the cost of having both may be significant. Experimentally evaluate the practical feasibility of strong (cryptographic) privacy-preserving solutions.

Examples Privacy-preserving computation of decision trees [LP00] Secure computation of approximate Hamming distance of two large data sets [FIMNSW01] Privacy-protecting statistical analysis [CIKRRW01] Selective private function evaluation [CIKRRW01]

Similarity of Two Data Sets Two parties, each holding a large database, can efficiently and privately determine whether their data sets are similar. The current measure of similarity is approximate Hamming distance [FIMNSW01]; securing other measures is a topic for future research.

Privacy-Protecting Statistics [CIKRRW01] CLIENT Wishes to compute statistics of servers’ data SERVERS Each holds large database Parties communicate using cryptographic protocols designed so that: Client learns desired statistics, but learns nothing else about data (including individual values or partial computations for each database) Servers do not learn which fields are queried, or any information about other servers’ data Computation and communication are very efficient

Privacy Concerns Protect clients from revealing type of sample population, type of specific data used Protect database owners from revealing unnecessary information or providing a higher quality of service than paid for Protect individuals from large-scale dispersal of their personal information

Privacy-Protecting Statistics (single DB) Database contains public information (e.g. zip code) and private information (e.g. income). Client wants to compute statistics on the private data of a subset selected by the public data, without revealing the selection criteria or the private values used. Database wants to reveal only the outcome, not personal data.

Non-Private and Inefficient Solutions Database sends client entire database (violates database privacy) For sample size m, use SPIR to learn m values (violates database privacy) Client sends selections to database, database does computation (violates client privacy, doesn’t work for multiple databases) general secure multiparty computation (not efficient for large databases)

Secure Multiparty Computation Allows k players to privately compute a function f of their inputs. Overhead is polynomial in the size of the inputs and the complexity of f [Yao, GMW, BGW, CCD, ...]
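
As a concrete illustration of the flavor of MPC (a minimal sketch, not part of the talk): additive secret sharing lets k players learn the sum of their private inputs and nothing else, assuming the players do not collude. The general protocols cited above handle arbitrary functions f.

```python
# Minimal additive-secret-sharing sum: each player splits its input into k
# random shares, distributes them, and only the column totals are published.
import random

MODULUS = 2**61 - 1   # any modulus larger than the true sum works

def share(secret, k):
    """Split `secret` into k additive shares that sum to it mod MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(k - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def secure_sum(inputs):
    k = len(inputs)
    all_shares = [share(s, k) for s in inputs]        # player i shares s_i
    # player j adds the shares it received (column j) and publishes the total
    column_totals = [sum(row[j] for row in all_shares) % MODULUS for j in range(k)]
    return sum(column_totals) % MODULUS               # reveals only the overall sum

inputs = [12, 7, 30]                                  # s_1, s_2, s_3
assert secure_sum(inputs) == sum(inputs)              # players learn 49, nothing else
```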

Symmetric Private Information Retrieval Allows a client with input i to interact with a database server holding x = (x_1, ..., x_n) so that the client learns (only) x_i. Overhead is polylogarithmic in the size of the database x [KO, CMS, GIKM]

Homomorphic Encryption Certain computations on encrypted messages correspond to other computations on the cleartext messages. For additive homomorphic encryption, E(m1) • E(m2) = E(m1 + m2), which also implies E(m)^x = E(m·x)
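
To make the additive property concrete, here is a toy Paillier-style sketch (my illustration, not from the talk; toy primes, not secure parameters) exhibiting exactly the two identities above.

```python
# Toy Paillier cryptosystem: E(m1)*E(m2) decrypts to m1+m2, E(m)^x decrypts to m*x.
import math, random

def keygen(p=1789, q=2003):               # toy primes; real use needs large primes
    n = p * q
    lam = math.lcm(p - 1, q - 1)          # private key
    return n, lam

def encrypt(n, m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:            # randomness r must be a unit mod n
        r = random.randrange(1, n)
    return pow(n + 1, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, lam, c):
    L = (pow(c, lam, n * n) - 1) // n
    return L * pow(lam, -1, n) % n

n, lam = keygen()
m1, m2, x = 15, 27, 6
assert decrypt(n, lam, encrypt(n, m1) * encrypt(n, m2) % (n * n)) == m1 + m2
assert decrypt(n, lam, pow(encrypt(n, m1), x, n * n)) == m1 * x
```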

Privacy-Protecting Statistics Protocol To learn mean and variance, it is enough to learn the sum and the sum of squares. The server stores the values x_i and their squares x_i^2 and responds to queries on both, so an efficient protocol for sums gives an efficient protocol for mean and variance.
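
The reduction itself is plain arithmetic (no cryptography in this snippet): once the client can privately obtain the sum S1 and the sum of squares S2 of the m selected values, the mean and variance follow locally.

```python
xs = [4, 8, 15, 16]                  # hypothetical selected private values
m = len(xs)
S1 = sum(xs)                         # answer to the "sum" query
S2 = sum(x * x for x in xs)          # answer to the "sum of squares" query
mean = S1 / m
variance = S2 / m - mean ** 2        # population variance
print(mean, variance)                # 10.75 24.6875
```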

Weighted Sum Client wants to compute a selected linear combination of m items, Σ a_i·x_i. Using homomorphic encryption (E, D), the client sends E(a_i) for each item, where a_i is the desired weight if item i is selected and 0 otherwise. The server computes v = Π_i E(a_i)^{x_i} = E(Σ_i a_i·x_i) and returns it; the client decrypts v to obtain the weighted sum.
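
A sketch of this exchange under the reading above (assumed details, toy Paillier parameters): the client's weights stay hidden inside the encryptions, and the server sees only ciphertexts.

```python
import math, random

def keygen(p=1789, q=2003):                 # toy primes; real use needs large primes
    n = p * q
    return n, math.lcm(p - 1, q - 1)

def encrypt(n, m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, lam, c):
    return (pow(c, lam, n * n) - 1) // n * pow(lam, -1, n) % n

database = [10, 20, 30, 40]                 # server's private values x_1..x_n
weights = [0, 1, 0, 1]                      # client's a_i: select items 2 and 4

n, lam = keygen()                           # client key pair; server sees only n
query = [encrypt(n, a) for a in weights]    # client -> server: E(a_1)..E(a_n)
v = 1
for c, x in zip(query, database):           # server: v = prod_i E(a_i)^{x_i}
    v = v * pow(c, x, n * n) % (n * n)
assert decrypt(n, lam, v) == 20 + 40        # client decrypts the sum of selected items
```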

Efficiency Linear communication and computation (feasible in many cases) If n is large and m is small, would like to do better

Selective Private Function Evaluation Allows a client to privately compute a function f over m of the server's n inputs: the client learns only f(x_{i_1}, ..., x_{i_m}), and the server does not learn the selected indices i_1, ..., i_m. Unlike general secure multiparty computation, we want communication complexity to depend on m, not n (more accurately, polynomial in m, polylogarithmic in n).

Security Properties Correctness: If client and server follow the protocol, client’s output is correct. Client privacy: malicious server does not learn client’s input selection. Database privacy: weak: malicious client learns no more than output of some m-input function g strong: malicious client learns no more than output of specified function f

Solutions based on MPC Input selection phase: the server obtains a blinded version of each selected item x_{i_j}. Function evaluation phase: client and server use MPC to compute f on the m blinded items.

Input Selection Phase The server generates homomorphic encryption keys (D, E) and computes the encrypted database E(x_1), ..., E(x_n). The client retrieves E(x_{i_1}), ..., E(x_{i_m}) using SPIR(m, n), picks random blinds r_1, ..., r_m, computes E(x_{i_j} + r_j) = E(x_{i_j}) • E(r_j), and sends these back. The server decrypts the received values to obtain the blinded items x_{i_j} + r_j.

Function Evaluation Phase Client has the blinds r_1, ..., r_m. Server has the blinded values y_1, ..., y_m, where y_j = x_{i_j} + r_j. Use MPC to compute f(y_1 - r_1, ..., y_m - r_m). Total communication cost is polylogarithmic in n and polynomial in m and |f|.
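
A toy illustration of the blinding arithmetic (my sketch, with assumed parameters): the server's blinded value y_j = x_{i_j} + r_j mod N is uniformly distributed on its own, yet subtracting the client's blind inside the MPC recovers the true item.

```python
import random

N = 2**32                                        # message space of the encryption scheme
selected = [13, 99, 250]                         # x_{i_1..i_m}, held by the server
blinds = [random.randrange(N) for _ in selected]            # client's r_j
blinded = [(x + r) % N for x, r in zip(selected, blinds)]   # server's y_j after decrypting

# inside the MPC, f (here: the sum) is evaluated on the unblinded items
unblinded = [(y - r) % N for y, r in zip(blinded, blinds)]
assert sum(unblinded) == sum(selected)
```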

Distributed Databases Same approach works to compute function over distributed databases. Input selection phase done in parallel with each database server Function evaluation phase done as single MPC only final outcome is revealed to client.

Performance Current experimentation aims to determine whether these methods are efficient in real-world settings.

Conclusions Privacy is in danger, but some important progress has been made. Important challenges ahead: Usable privacy solutions Sensor data better use of hybrid approach: decide what can safely be disclosed, use cryptographic protocols to protect critical information, weaker and more efficient solutions for the rest Technology, policy, and education must work together.