Privacy in Today’s World: Solutions and Challenges Rebecca Wright Stevens Institute of Technology 26 June, 2003.

Talk Outline
Overview of privacy landscape
Privacy-preserving data mining
Privacy-protecting statistical analysis of large databases
Selective private function evaluation
Conclusions

Erosion of Privacy
“You have zero privacy. Get over it.” - Scott McNealy, 1999
Changes in technology are making privacy harder.
  – reduced cost for data storage
  – increased ability to process large amounts of data
Increased need for security may make privacy seem less critical.

Historical Changes
Small towns, little movement:
  – very little privacy; social mechanisms helped prevent abuse
Large cities, increased movement:
  – lost social mechanisms, but gained privacy through anonymity
Now:
  – advancing technology is reducing privacy; social mechanisms have not been replaced

What Can We Do?
Use technology, policy, and education to
  – maintain/increase privacy
  – provide new social mechanisms
  – create new mathematical models for better understanding
Problem: Using old models and old modes of thought in dealing with situations arising from new technology.

What is Privacy?
May mean different things to different people
  – seclusion: the desire to be left alone
  – property: the desire to be paid for one’s data
  – autonomy: the ability to act freely
Generally: the ability to control the dissemination and use of one’s personal information.

Privacy of Data
Stored data
  – encryption, computer security, intrusion detection, etc.
Data in transit
  – encryption, network security, etc.
Release of data
  – current privacy-oriented work: P3P, Privacy Bird, EPA, Internet Explorer 6.0, etc.

Internet Explorer V.6
Fairly simple; deals only with cookies, limited info

Different Types of Data
Transaction data
  – created by interaction between stakeholder and enterprise
  – current privacy-oriented solutions useful
Authored data
  – created by stakeholder
  – digital rights management (DRM) useful
Sensor data
  – stakeholders not clear at time of creation
  – presents a real and growing privacy threat

Sensor Data Examples
surveillance cameras (especially with face recognition software)
desktop monitoring software (e.g. for intrusion or misbehavior detection)
GPS transmitters, RFID tags
wireless sensors (e.g. for location-based PDA services)

Sensor Data
Can be difficult to identify stakeholders and even data collectors
Crosses the boundary between “real world” and cyberspace
Boundary between transaction data and sensor data can be blurry (e.g. Web browsing data)
Presents a real and growing privacy threat

Product Design as Policy Decision
Product decisions by large companies or public organizations become de facto policy decisions.
Often such decisions are made without conscious thought to privacy impacts, and without public discussion.
This is particularly true in the United States, where there is not much relevant legislation.

Example: Metro Cards
Washington, DC
  – no record kept of per-card transactions
  – damaged card can be replaced if printed value still visible
New York City
  – transactions recorded by card ID
  – damaged card can be replaced if card ID still readable
  – transaction records have helped find suspects and corroborate alibis

Privacy Tradeoffs?
Privacy vs. security: maybe, but giving up one does not mean getting the other (who is this person? is this a dangerous person?)
Privacy vs. usability: reasonable defaults, easy and extensive customizations, visualization tools
The tradeoffs are with cost or power, rather than an inherent conflict with privacy.

Surveillance and Data Mining
Analyze large amounts of data from diverse sources.
Law enforcement and homeland security:
  – detect and thwart possible incidents before they occur
  – identify and prosecute criminals after incidents occur
Companies like to do this, too.
  – marketing, personalized customer service

Privacy-Preserving Data Mining
Allow multiple data holders to collaborate to compute important (e.g. security-related) information while protecting the privacy of other information.
Particularly relevant now, with increasing focus on security even at the expense of privacy (e.g. TIA).

Advantages of privacy protection
protection of personal information
protection of proprietary or sensitive information
fosters collaboration between different agencies (since they may be more willing to collaborate if they need not reveal their information)

Cryptographic Approach
Using cryptography, provably does not reveal anything except output of computation.
  – Privacy-preserving computation of decision trees [LP00]
  – Secure computation of approximate Hamming distance of two large data sets [FIMNSW01]
  – Privacy-protecting statistical analysis [CIKRRW01]
  – Privacy-preserving association rule mining [KC02]

Randomization Approach
Randomizes data before computation (which can then either be distributed or centralized).
Induces a tradeoff between privacy and computation error.
  – Distribution reconstruction algorithm from randomized data [AS00]
  – Association rule mining [ESAG02]
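The privacy/error tradeoff of the randomization approach can be illustrated with a minimal sketch. It assumes simple additive uniform noise and estimates only the mean; it is not the distribution reconstruction algorithm of [AS00]. The names `perturb`, `ages`, and the noise widths are hypothetical choices made for this illustration.

```python
import random

def perturb(values, width, rng):
    """Each data holder adds independent uniform noise in [-width, width]."""
    return [v + rng.uniform(-width, width) for v in values]

def mean(values):
    return sum(values) / len(values)

rng = random.Random(2003)                      # fixed seed so the run is repeatable
ages = [rng.randint(18, 90) for _ in range(20000)]   # synthetic private data
true_mean = mean(ages)

# Wider noise hides each individual value better, but the estimate
# computed from the released data tends to drift further from the truth.
errors = {}
for width in (5, 50, 500):
    released = perturb(ages, width, rng)       # only this list is disclosed
    errors[width] = abs(mean(released) - true_mean)
    print(f"noise width {width:>3}: error in estimated mean = {errors[width]:.3f}")
```

Because the noise has zero mean, the estimate stays unbiased; what grows with the noise width is its standard error, which is the "inaccuracy" side of the tradeoff.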

Comparison of Approaches
[Figure: the randomization approach and the cryptographic approach compared along three axes: inefficiency, privacy loss, and inaccuracy.]

Privacy-Protecting Statistics [CIKRRW01]
CLIENT: wishes to compute statistics of servers’ data
SERVERS: each holds a large database
Parties communicate using cryptographic protocols designed so that:
  – Client learns desired statistics, but learns nothing else about data (including individual values or partial computations for each database)
  – Servers do not learn which fields are queried, or any information about other servers’ data
  – Computation and communication are very efficient

Non-Private and Inefficient Solutions
Database sends client entire database (violates database privacy)
For sample size m, use SPIR to learn m values (violates database privacy)
Client sends selections to database, database does computation (violates client privacy)
General secure multiparty computation (not efficient for large databases)

Secure Multiparty Computation
Allows k players $P_1, P_2, \ldots, P_k$ to privately compute a function f of their inputs.
Overhead is polynomial in size of inputs and complexity of f [Yao, GMW, BGW, CCD, ...]
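As a small illustration of the idea, the sketch below lets k players compute the sum of their inputs using additive secret sharing. It is a deliberately simple special case (one function, honest-but-curious players, all players simulated in one process), not one of the general constructions cited on the slide. `MODULUS`, `share`, and `secure_sum` are names invented for this example.

```python
import secrets

MODULUS = 2**61 - 1   # all arithmetic is done modulo this value

def share(value, k):
    """Split value into k random shares that add up to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(k - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_sum(inputs):
    k = len(inputs)
    # dealt[i][j] is the share that player i sends to player j.
    dealt = [share(x, k) for x in inputs]
    # Player j adds up the shares it received; one share alone looks random.
    partial = [sum(dealt[i][j] for i in range(k)) % MODULUS for j in range(k)]
    # Publishing the partial sums reveals only the total.
    return sum(partial) % MODULUS

inputs = [52, 17, 88]          # private inputs of P1, P2, P3
total = secure_sum(inputs)
print(total)
```

Each player sees only uniformly random shares of the other inputs, so in this simplified setting nothing beyond the final sum is disclosed.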

Symmetric Private Information Retrieval
Allows client with input $i$ to interact with database server with input $x$ to learn (only) $x_i$.
Overhead is polylogarithmic in size of database $x$ [CMS, GIKM]

Homomorphic Encryption
Certain computations on encrypted messages correspond to other computations on the cleartext messages.
For additive homomorphic encryption,
  – $E(m_1) \cdot E(m_2) = E(m_1 + m_2)$
  – also implies $E(m)^x = E(mx)$
Paillier encryption is an example.
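The two identities can be checked with a toy Paillier implementation. This is a minimal sketch for illustration only: the primes are far too small for real use, the generator is fixed to N + 1, and the names `encrypt`, `decrypt`, `P`, `Q` are choices made here rather than anything taken from the talk.

```python
import math
import secrets

# Toy key: two known Mersenne primes. Not secure, illustration only.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                                 # valid for generator g = N + 1

def encrypt(m):
    """E(m) = (1 + m*N) * r^N mod N^2 for a fresh random r coprime to N."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    """D(c) = L(c^LAM mod N^2) * MU mod N, where L(u) = (u - 1) // N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

m1, m2, x = 1234, 5678, 9
c1, c2 = encrypt(m1), encrypt(m2)

sum_plain = decrypt(c1 * c2 % N2)        # E(m1) * E(m2) decrypts to m1 + m2
scaled_plain = decrypt(pow(c1, x, N2))   # E(m1)^x decrypts to m1 * x
print(sum_plain, scaled_plain)
```

Both operations are performed on ciphertexts only, so a party without the decryption key can add hidden values and scale them by known constants.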

Privacy-Protecting Statistics Protocol
To learn mean and variance: enough to learn sum and sum of squares.
Server stores: $x_1, x_2, \ldots, x_n$ and $z_1 = x_1^2, z_2 = x_2^2, \ldots, z_n = x_n^2$, and responds to queries from both.
An efficient protocol for sum therefore gives an efficient protocol for mean and variance.
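The reduction from mean and variance to two sums is short enough to spell out. The sketch below assumes the population variance (dividing by m); the function name `mean_and_variance` and the sample values are made up for illustration.

```python
import math
import statistics

def mean_and_variance(total, total_sq, m):
    """Recover mean and population variance from a sum and a sum of squares."""
    mean = total / m
    variance = total_sq / m - mean ** 2    # E[x^2] - (E[x])^2
    return mean, variance

sample = [12, 7, 30, 5, 18, 42]            # the m selected values
total = sum(sample)                        # what a private "sum" query returns
total_sq = sum(x * x for x in sample)      # the same query over the squares
mean, variance = mean_and_variance(total, total_sq, len(sample))
print(mean, variance)
```

The client therefore needs only two private sum queries, one over the stored values and one over their squares; a sample variance would just rescale the result by m / (m - 1).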

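Weighted Sum
Client wants to compute selected linear combination of m items: $\sum_{j=1}^{m} w_j x_{i_j}$
Client: generates homomorphic encryption keys $E, D$ and sends $E(\alpha_1), \ldots, E(\alpha_n)$, where $\alpha_i = w_j$ if $i = i_j$, and $\alpha_i = 0$ o/w
Server: computes $v = \prod_{i=1}^{n} E(\alpha_i)^{x_i} = E\left(\sum_{i=1}^{n} \alpha_i x_i\right)$ and returns $v$
Client: decrypts $v$ to obtain $\sum_{i=1}^{n} \alpha_i x_i = \sum_{j=1}^{m} w_j x_{i_j}$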
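A weighted-sum exchange of this kind can be sketched with toy Paillier encryption. The sketch assumes the client encrypts one coefficient per database position (the weight for selected positions, 0 otherwise) and the server combines them using the additive homomorphism; that reading of the slide, the tiny key, and the names `client_query` and `server_answer` are assumptions made for illustration.

```python
import math
import secrets

# Toy Paillier key held by the CLIENT (known Mersenne primes, not secure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def client_query(n, weights):
    """One ciphertext per position: the weight if selected, else 0."""
    return [encrypt(weights.get(i, 0)) for i in range(n)]

def server_answer(query, database):
    """v = product of E(alpha_i)^(x_i), an encryption of sum(alpha_i * x_i)."""
    v = 1
    for ciphertext, x in zip(query, database):
        v = v * pow(ciphertext, x, N2) % N2
    return v

database = [12, 7, 30, 5, 18, 42]          # server's private data
weights = {1: 1, 4: 1, 5: 2}               # client's private selection and weights
v = server_answer(client_query(len(database), weights), database)
result = decrypt(v)                        # 1*7 + 1*18 + 2*42
print(result)
```

The server sees n ciphertexts and cannot tell which positions carry nonzero weights, while the client receives a single ciphertext. Communication and computation are linear in n.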

Efficiency
Linear communication and computation (feasible in many cases)
If n is large and m is small, would like to do better

Selective Private Function Evaluation
Allows client to privately compute a function $f$ over $m$ inputs $x_{i_1}, \ldots, x_{i_m}$
  – client learns only $f(x_{i_1}, \ldots, x_{i_m})$
  – server does not learn $i_1, \ldots, i_m$
Unlike general secure multiparty computation, we want communication complexity to depend on $m$, not $n$ (more accurately, polynomial in $m$, polylogarithmic in $n$).

Security Properties
Correctness: If client and server follow the protocol, client’s output is correct.
Client privacy: malicious server does not learn client’s input selection.
Database privacy:
  – weak: malicious client learns no more than output of some m-input function g
  – strong: malicious client learns no more than output of specified function f

Solutions based on MPC
Input selection phase:
  – server obtains blinded version of each selected item $x_{i_j}$
Function evaluation phase:
  – client and server use MPC to compute $f$ on the $m$ blinded items

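Input Selection Phase
Server: generates homomorphic encryption keys $D, E$ and computes encrypted database $E(x_1), \ldots, E(x_n)$
Client: retrieves $E(x_{i_1}), \ldots, E(x_{i_m})$ using SPIR($m$, $n$), along with $E$
Client: picks random $c_1, \ldots, c_m$, computes $E(x_{i_j} + c_j) = E(x_{i_j}) \cdot E(c_j)$, and sends the results to the server
Server: decrypts received values: $s_j = x_{i_j} + c_j$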

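Function Evaluation Phase
Client has $c_1, \ldots, c_m$
Server has $s_1, \ldots, s_m$, where $s_j = x_{i_j} + c_j$
Use MPC to compute: $f(s_1 - c_1, \ldots, s_m - c_m)$
Total communication cost polylogarithmic in $n$, polynomial in $m$, $|f|$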
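The two phases can be traced end to end in a single-process sketch. Several pieces are stand-ins and are assumptions of this illustration: the SPIR step is replaced by a direct list lookup (which hides nothing from the server), the MPC step is replaced by computing f in the clear, the blinding value for item j is called c_j and the blinded item s_j = x_{i_j} + c_j, f is taken to be a plain sum, and the Paillier key is a toy key held by the server.

```python
import math
import secrets

# Toy Paillier key held by the SERVER (known Mersenne primes, not secure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def encrypt(m):                      # needs only the public value N
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):                      # server only
    return (pow(c, LAM, N2) - 1) // N * MU % N

def f(items):                        # hypothetical function to evaluate
    return sum(items)

database = [12, 7, 30, 5, 18, 42]    # server's private data
indices = [1, 4, 5]                  # client's private selection

# Input selection phase.
encrypted_db = [encrypt(x) for x in database]                 # server
retrieved = [encrypted_db[i] for i in indices]                # stand-in for SPIR(m, n)
blinds = [secrets.randbelow(N) for _ in indices]              # client picks c_j
to_server = [ct * encrypt(c) % N2 for ct, c in zip(retrieved, blinds)]
server_items = [decrypt(ct) for ct in to_server]              # s_j = x_{i_j} + c_j mod N

# Function evaluation phase (stand-in for MPC on inputs c_j and s_j).
output = f([(s - c) % N for s, c in zip(server_items, blinds)])
print(output)
```

In this sketch the server only ever decrypts values masked by a uniformly random c_j, so the decrypted items by themselves reveal nothing about which entries were selected; a real protocol would also need genuine SPIR and MPC subprotocols to protect the remaining steps.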

Distributed Databases
Same approach works to compute function over a distributed database.
  – Input selection phase done in parallel with each database server
  – Function evaluation phase done as single MPC
  – Database privacy means only final outcome is revealed to client.

Performance
Experiments are under way to understand whether these methods are efficient in real-world settings.

Initial Experimental Results
Initial implementation of linear computation and communication solution [H. Subramaniam & Z. Yang]
  – implementation in Java and C++
  – uses Paillier encryption
  – uses synthetic data, with client and server as separate processes on the same machine (2-year-old Toshiba laptop)

Conclusions
Privacy is in danger, but some important progress has been made.
Important challenges ahead:
  – Usable privacy solutions (efficiency and user interface)
  – Sensor data
  – Better use of hybrid approach: decide what can safely be disclosed, what needs moderate protection, and use cryptographic protocols to protect most critical information.
  – Mathematical/formal models to understand and compare different solutions.

Research Directions
Investigate integration of cryptographic approach and randomization approach:
  – seek to maintain strong privacy and accuracy of cryptographic approach, ...
  – while benefiting from improved efficiency of randomization approach
Understand mathematically what the resulting privacy and/or accuracy compromises are.
Technology, policy, and education must work together.