When small data is better data

Presentation transcript:

When small data is better data
Paul Francis, MPI-SWS
Ruichuan Chen, Ekin Akkus, Johannes Gehrke

The user data "understanding"
There is a tacit understanding among users that if you send data to a company, they are free to use it how they wish.
- OK: Facebook "knowing" all kinds of personal information
- LESS OK: Doubleclick monitoring your browsing behavior
- NOT OK: Google gathering WLAN traffic during drive-bys

Leads to data-gathering services
Companies build (free) services designed to gather as much data about users as they can, and often secretly gather data about users when they can't.
They then try to monetize that data, mainly through advertising (though Jean Bolot had some interesting ideas).

Leads to "big data" (mining)
Companies gather what they can, but don't always get what they want:
- Google knows your searches, but not your relationship status
- Facebook knows your relationship status, but not what you buy
- Amazon knows what you buy, but not what you search for
So they use big-data mining to infer what they don't know.

A new user data understanding
It is OK to monetize (or otherwise benefit from) user data if:
- The user data is very expensive to collect in any identifiable form
- Users can know what is going on, and users can opt out

Why is this interesting?
Keeping user data on the user device is the key to user privacy.
Most user data is at, or has passed through, the user device:
- Search and browsing in the browser history
- Facebook user profile easily scraped
- Amazon purchases easily scraped

Premise of "Private by Design"
If we can monetize user data without collecting user data, then we have legitimate access to far more user data:
- Less need to deal with big data
- Better monetization, less overhead

My group's research agenda
- "Private by Design" behavioral advertising
- "Private by Design" aggregate analytics

Aggregate analytics
- Web analytics: want to know demographics of the user base, what other websites users visit, etc.
- App analytics: want to know what other apps the user runs (competitors)
- Mobile analytics, general analytics, ...

Typical database privacy setting: a trusted component sees the database
[Diagram: an untrusted analyst sends queries; a trusted query module adds noise or anonymizes before answering from the trusted database.]
Traditional differential privacy assumes a centralized database front-ended by a trusted query module. In a distributed setting, however, there is no centralized database: individual users maintain their own data. Some form of distributed differential privacy is therefore required.

Our setting: nobody (except the user) sees individual user data
[Diagram: an untrusted analyst queries many users, each holding their own data; every other party is untrusted.]

Previous work in our setting
- Assumed differential privacy
- Poor scaling characteristics, and/or
- Could not tolerate user fraud
Our goal: assume differential privacy, but fix the scaling and user-fraud problems.

Differential privacy
Differential privacy adds noise to the output of a computation (i.e., a query), instead of adding noise to the original user data.
[Diagram: a query module adds noise before returning results; DB1 and DB2 differ by one user.]
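
To make the noisy-output idea concrete, here is a minimal sketch of a trusted query module in the centralized setting answering a counting query with Laplace noise. This is my illustration, not code from the talk; the example records, the predicate, and the epsilon value are all made up.

```python
import random

def laplace_noise(scale):
    # Laplace(0, scale) noise: the difference of two independent
    # exponential samples, each with mean `scale`.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon=1.0):
    # A counting query changes by at most 1 when one user is added or
    # removed (sensitivity 1), so Laplace noise with scale 1/epsilon
    # gives epsilon-differential privacy for the released count.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical database and query: how many users are male and aged 10-20?
users = [{"sex": "m", "age": 15}, {"sex": "f", "age": 34}, {"sex": "m", "age": 12}]
print(noisy_count(users, lambda u: u["sex"] == "m" and 10 <= u["age"] <= 20))
```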

Components & assumptions
- The analyst is potentially malicious (violating user privacy)
- The proxy is honest but curious: 1) it follows the specified protocol (does not collude); 2) it tries to exploit additional info that can be learned in so doing. It adds the DP noise blindly.
- Clients are user devices. Clients are potentially malicious (distorting the final results).

The PDDP system consists of three components: analysts, clients, and the proxy. Analysts make queries to the system and collect answers. Clients locally maintain their own data and answer queries. The proxy mediates between the analysts and the clients, and adds differentially private noise to clients' answers to preserve privacy. Analysts are assumed to be potentially malicious, with the goal of violating individual users' privacy. An analyst may collude with other analysts, or pretend to be multiple distinct analysts. An analyst may take control of clients and attempt to use the PDDP protocol to reveal information about those clients. An analyst may deploy its own clients and manipulate their answers. An analyst may also publish its collected answers. Analysts can intercept and modify all messages (e.g., an ISP posing as an analyst). Clients are also assumed to be potentially malicious, with the goal of distorting the statistical results learned by analysts. Clients may generate false or illegitimate answers under coordinated control (e.g., as a botnet), and may act as Sybils [11]. The proxy is assumed to be honest but curious (HbC): it will faithfully follow the specified protocol, but may try to exploit additional information that can be learned in so doing. The proxy does not collude with other components. We discuss how we may be able to relax the HbC assumption by using trusted hardware in §6.

Actually, two proxies!
- The honest-but-curious proxy must not see user data
- With one proxy, we would need expensive public-key encryption between clients and the analyst
- With two proxies, we can use a much cheaper form of encryption (a one-time pad)

Message XOR Random_String = Result
The sender sends Result through Proxy 1 and Random_String through Proxy 2; the receiver computes Result XOR Random_String = Message. Neither proxy alone sees the message.
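
Below is a minimal sketch of this XOR split for a client's bit-vector answer. It is my reconstruction of the idea on the slide, not the system's actual code, and all names are hypothetical.

```python
import secrets

def split_answer(answer_bits):
    """XOR-split a client's answer so neither proxy sees it alone.

    Returns (share_for_proxy1, share_for_proxy2); XORing the two
    shares bit-wise recovers the original answer.
    """
    pad = [secrets.randbits(1) for _ in answer_bits]    # one-time pad
    masked = [a ^ p for a, p in zip(answer_bits, pad)]  # answer XOR pad
    return masked, pad

def recombine(share1, share2):
    """At the receiver: XOR the two shares to recover the answer bits."""
    return [s1 ^ s2 for s1, s2 in zip(share1, share2)]

answer = [1, 0, 0, 1]  # e.g., yes/no answers to four buckets
to_proxy1, to_proxy2 = split_answer(answer)
assert recombine(to_proxy1, to_proxy2) == answer
```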

Queries are counting queries.
Example: how many users are male and between the ages of 10 and 20?

Clients answer 'yes' or 'no' only.

The proxies add N additional random yes/no answers (coins), where N = 2σ².
But a proxy must not know how many yes's and no's it added!

Each proxy independently adds N random coins.
The XOR at the analyst will produce a random result, but neither proxy knows what that result will be.
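
A small illustration (mine, with hypothetical names) of why neither proxy knows the noise it is contributing: each proxy draws its N coin bits independently, and only the XOR of the paired bits, computed later at the analyst, determines whether a coin counts as a 'yes'.

```python
import secrets

N = 8  # number of noise coins per bucket, e.g. N = 2 * sigma**2

# Each proxy independently generates its own share of the N coins.
proxy1_coins = [secrets.randbits(1) for _ in range(N)]
proxy2_coins = [secrets.randbits(1) for _ in range(N)]

# Only at the analyst are the shares combined; each combined coin is an
# unbiased random bit even if one proxy chose its share adversarially.
coins = [c1 ^ c2 for c1, c2 in zip(proxy1_coins, proxy2_coins)]
print(sum(coins), "of", N, "coins landed on 'yes'")
```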

Coins and answers
[Diagram: the coins and the clients' encrypted answers flow through the two blind proxies to the analyst.]

Decrypt and tabulate
[Diagram: the analyst combines the two proxies' streams, decrypting and tabulating the answers and coins.]
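
A sketch of the analyst-side step, under my assumptions about the data layout: for each bucket the analyst XORs the paired shares from the two proxies (decrypting both real answers and coins) and counts the 'yes' bits; subtracting the expected coin contribution of N/2 would then give an unbiased estimate of the true count.

```python
def tabulate_bucket(shares_from_proxy1, shares_from_proxy2, num_coins):
    # Decrypt: XOR each bit from proxy 1 with its partner bit from proxy 2.
    decrypted = [a ^ b for a, b in zip(shares_from_proxy1, shares_from_proxy2)]
    # Tabulate: count the 'yes' bits (real answers and coins are indistinguishable).
    noisy_count = sum(decrypted)
    # Remove the coins' expected contribution to get an unbiased estimate.
    return noisy_count - num_coins / 2.0

# Hypothetical bucket with 5 client answers followed by 4 coins.
p1 = [1, 0, 1, 1, 0, 1, 0, 0, 1]
p2 = [0, 0, 1, 0, 0, 0, 1, 0, 1]
print(tabulate_bucket(p1, p2, num_coins=4))
```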

Buckets
Not "is your age between 10 and 20?", but "are you 1?", "are you 2?", "are you 3?", ...
- A query is generally a vector of yes/no questions
- The answer is a vector of 1's and 0's
- The vector can be big: a list of 20K websites, or 185K combinations of 10 of 20 attributes
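
As an illustration of bucketed queries, here is a minimal sketch of a client turning its local data into a yes/no answer vector. The buckets and the client record are hypothetical, not taken from the talk.

```python
# Hypothetical query: one bucket per (sex, age-decade) combination.
buckets = [("m", 10), ("m", 20), ("m", 30), ("f", 10), ("f", 20), ("f", 30)]

def answer_query(client, buckets):
    # One yes/no (1/0) answer per bucket; the client's data never leaves
    # the device in any other form.
    return [1 if (client["sex"], (client["age"] // 10) * 10) == b else 0
            for b in buckets]

client = {"sex": "f", "age": 27}
print(answer_query(client, buckets))  # -> [0, 0, 0, 0, 1, 0]
```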

Proxies add coins and shuffle user answers (per bucket)
[Diagram: each proxy reorganizes the per-user answer vectors (u1: b1, b2, b3, ...) into per-bucket lists that interleave user answers and coins (b1: u4, u12, c2, ...; b2: u6, c3, u19, ...; b3: u12, c7, u6, ...).]

The shuffling at each proxy must be identical (though random), because each bit must be paired with its XOR partner.

But the proxies may have a (slightly) different set of answers.

Synchronize the list of answers.
The proxies share a random seed for a random number generator and use it to shuffle.
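
A minimal sketch of the shared-seed shuffle: if both proxies seed the same pseudorandom generator and shuffle the same-length list of answer slots, they produce the identical (but otherwise random) permutation, so each bit stays aligned with its XOR partner. The names and the seed here are mine, for illustration only.

```python
import random

def shuffle_with_seed(items, seed):
    # Both proxies call this with the same seed and the same item order,
    # so they apply the identical permutation independently.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled

shared_seed = 42  # agreed between the two proxies out of band
slots = ["u1", "u2", "u3", "u4", "c1", "c2"]
print(shuffle_with_seed(slots, shared_seed))
print(shuffle_with_seed(slots, shared_seed))  # identical output at the other proxy
```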

Time
Queries (unfortunately) take time: there is a period during which a query is active. Tens of minutes, hours, or days???
[Timeline: start query; clients pull in and answer queries; synchronize and add coins.]

Differential privacy, good and bad
Good:
- Adds noise
- Lots of machinery being built
Bad:
- Very pessimistic (the measure of privacy loss is almost certainly way worse than the actual privacy loss)
- "Throwing away the database" is not realistic

From the INTIMATE workshop
- Jean's mobility work is a good application
- Collaborative filtering (Bach, Aruna) looks hard to do
- Serge's social knowledge may be centered on user devices... query for people's opinions...
- Real-time analytics may be possible: streamed coin addition???

Status and future
- Building an application analytics tool: initial focus is PC platforms; hope to get real app developers to bundle our tool
- Additional privacy mechanisms (beyond differential privacy)
- Work on a better understanding of privacy loss in a realistic setting