
The Changing Landscape of Privacy in a Big Data World
A Symposium of the Board on Research Data and Information
September 23, 2013
Rebecca Wright, Rutgers University

The Big Data World
– Internet, WWW, social computing, cloud computing, mobile phones as computing devices.
– Embedded systems in cars, medical devices, household appliances, and other consumer products.
– Critical infrastructure heavily reliant on software for control and management, with fine-grained monitoring and increasing human interaction (e.g., the smart grid).
– Computing, especially data-intensive computing, drives advances in almost all fields.
– Users (or, in the medical setting, patients) as content providers, not just consumers.
– Everyday activities conducted over networked computers.

Privacy
– Means different things to different people, to different cultures, and in different contexts.
– Simple approaches to “anonymization” don’t work in today’s world, where many data sources are readily available.
– Appropriate uses of data: What is appropriate? Who gets to decide? What if different stakeholders disagree?
– There are good definitions for some specific notions of privacy.

Personally Identifiable Information
Many privacy policies and solutions are based on the concept of “personally identifiable information” (PII). However, this concept is not robust in the face of today’s realities. Any interesting and relatively accurate data about someone can be personally identifiable if you have enough of it and appropriate auxiliary information, and in today’s data landscape both are often available. Examples: Sweeney’s work [Swe90s], AOL web search data [NYT06], Netflix challenge data [NS08], social network reidentification [BDK07], …

Reidentification
– Sweeney: 87% of the US population can be uniquely identified by their date of birth, 5-digit ZIP code, and gender.
– AOL search logs released August 2006: user IDs and IP addresses removed, but replaced by unique random identifiers. Some queries provide information about who the querier is; others give insight into the querier’s mind.
[Diagram: linking the birth date, ZIP code, and gender fields of a sensitive database to an “innocuous” database containing names allows complete or partial reidentification of individuals in the sensitive database.]
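
Such a linkage attack is, at its core, just a join on the quasi-identifier triple. A toy sketch with hypothetical records (names, values, and attributes are invented for illustration):

```python
# "Anonymized" sensitive table: names removed, but quasi-identifiers kept.
sensitive = [
    {"birth": "1962-07-31", "zip": "02138", "gender": "F", "diagnosis": "flu"},
    {"birth": "1975-01-02", "zip": "10001", "gender": "M", "diagnosis": "asthma"},
]
# Public "innocuous" table (e.g., a voter-roll-style list) with names.
public = [
    {"name": "Alice", "birth": "1962-07-31", "zip": "02138", "gender": "F"},
    {"name": "Bob", "birth": "1990-05-05", "zip": "60601", "gender": "M"},
]

def reidentify(sensitive, public):
    """Join the two tables on the (birth date, ZIP, gender) triple."""
    index = {(p["birth"], p["zip"], p["gender"]): p["name"] for p in public}
    matches = {}
    for row in sensitive:
        key = (row["birth"], row["zip"], row["gender"])
        if key in index:
            matches[index[key]] = row["diagnosis"]
    return matches

print(reidentify(sensitive, public))  # {'Alice': 'flu'}
```

The join needs no sophisticated machinery; whenever the quasi-identifier combination is unique in both tables, the "anonymized" record is fully reidentified.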

Differential Privacy [DMNS06]
The risk of inferring something about an individual should not increase (significantly) because of her being in a particular database or dataset, even with background information available. Has proven useful for obtaining good utility and rigorous privacy, especially for “aggregate” results. Can’t hope to hide everything while still providing useful information. Example: medical studies determine that smoking causes cancer, and I know you’re a smoker; the inference about you follows whether or not your records were in the study.

Differential Privacy [DMNS06]
A randomized algorithm A provides ε-differential privacy if, for all neighboring inputs x and x′ and all outputs t:
Pr[A(x) = t] ≤ e^ε · Pr[A(x′) = t]
where ε is a privacy parameter.

Differential Privacy [DMNS06]
Outputs, and consequences of those outputs, are no more or less likely whether any one individual is in the database or not; ε is a privacy parameter bounding how much the output distribution may change.
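
The guarantee can be checked directly for the standard Laplace mechanism on a counting query (sensitivity 1): at every output t, the density ratio between neighboring databases is at most e^ε. The query values below are illustrative, not from the talk:

```python
import math

def laplace_pdf(t, mu, scale):
    """Density of the Laplace(mu, scale) distribution at t."""
    return math.exp(-abs(t - mu) / scale) / (2 * scale)

# Counting query with sensitivity 1: true count 100 on x, 101 on neighbor x'.
epsilon = 0.5
scale = 1.0 / epsilon          # Laplace mechanism: A(x) = f(x) + Lap(scale)
fx, fx_neighbor = 100, 101

# The DP definition holds pointwise: the density ratio never exceeds e^eps,
# because |t - fx'| - |t - fx| is at most the sensitivity (here 1).
for t in [90 + i * 0.5 for i in range(41)]:
    ratio = laplace_pdf(t, fx, scale) / laplace_pdf(t, fx_neighbor, scale)
    assert ratio <= math.exp(epsilon) + 1e-12
```

The bound is tight: for any t at or below the smaller count, the ratio equals e^ε exactly.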

Differentially Private Human Mobility Modeling at Metropolitan Scales [MICMW13]
Human mobility models have many applications in a broad range of fields:
– Mobile computing
– Urban planning
– Epidemiology
– Ecology

Goals
Realistically model how large populations move within different metropolitan areas:
– Generate location/time pairs for synthetic individuals moving between important places
– Aggregate individuals to reproduce human densities at the scale of a metropolitan area
– Account for differences in mobility patterns across different metropolitan areas
– While ensuring privacy of the individuals whose data is used

WHERE modeling approach [Isaacman et al.]
– Identify key spatial and temporal properties of human mobility
– Extract corresponding probability distributions from empirical data, e.g., “anonymized” Call Detail Records (CDRs)
– Intelligently sample those distributions
– Create synthetic CDRs for synthetic people

WHERE modeling procedure
[Diagram: Home distribution, Commute distribution, and Work distribution, with commute distance d linking Home and Work.] Select Work conditioned on Home. Locate the person and calls according to activity times at each location. Repeat as needed to produce a synthetic population of the desired duration.

WHERE modeling procedure
Input distributions: distribution of home locations; distributions of commute distances per home region; distribution of work locations; distribution of # of calls in a day; probability of a call at each minute of the day; probabilities of a call at each location per hour.
Sampling steps: select Home (lat, long); select commute distance c; form a circle with radius c around Home and select Work (lat, long); select # of calls q in the current day; select times of day for the q calls; assign the Home or Work location to each call to produce a synthetic CDR with appropriate (time, lat, long) entries.
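
The sampling steps above can be sketched as follows. The stand-in distributions, the flat-Earth degree conversion, and the simple day/night Home-vs-Work rule are hypothetical simplifications for illustration, not the actual WHERE distributions:

```python
import math
import random

# Hypothetical stand-in distributions; WHERE extracts these from real CDRs.
home_cells = [((40.7, -74.0), 0.6), ((40.8, -73.9), 0.4)]  # (lat, lon) -> prob
commute_km = [5, 10, 20]                                    # commute distances
calls_per_day = [2, 5, 8]                                   # of calls q

def weighted_choice(pairs):
    """Sample a value from a list of (value, probability) pairs."""
    r, acc = random.random(), 0.0
    for value, p in pairs:
        acc += p
        if r < acc:
            return value
    return pairs[-1][0]

def synthesize_day(rng=random):
    home = weighted_choice(home_cells)          # select Home (lat, long)
    c = rng.choice(commute_km)                  # select commute distance c
    theta = rng.uniform(0, 2 * math.pi)         # place Work on a circle of
    work = (home[0] + (c / 111.0) * math.cos(theta),  # radius c km around Home
            home[1] + (c / 111.0) * math.sin(theta))  # (~111 km per degree)
    q = rng.choice(calls_per_day)               # select # of calls q
    calls = []
    for _ in range(q):
        minute = rng.randrange(24 * 60)         # select a time of day
        # Assign Home or Work to the call: a crude day/night activity rule.
        loc = home if minute < 8 * 60 or minute > 18 * 60 else work
        calls.append((minute, loc))             # one synthetic CDR entry
    return sorted(calls)
```

Repeating `synthesize_day` over many synthetic people and days yields the synthetic population of the desired duration.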

WHERE models are realistic
[Figure: population density on a typical Tuesday in the NY metropolitan area, comparing real CDRs with WHERE and WHERE2 synthetic CDRs.]

One way to achieve differential privacy
Example: Home distribution (empirical).
– Measure the biggest change to the Home distribution that any one user can cause.
– Add Laplace noise to the Home distribution proportional to this change [DMNS06].
[Table: sample CDR rows with ID, date-time, (lat, long), and inferred Home location.]
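
A minimal sketch of this recipe, under the simplifying assumption that each user contributes exactly one Home cell, so that adding or removing one user changes any bin count by at most 1 and Laplace noise of scale 1/ε per bin suffices. The data and ε are illustrative:

```python
import math
import random
from collections import Counter

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise by inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def dp_histogram(user_homes, epsilon):
    """Noisy histogram of Home cells. With one Home cell per user, the
    per-user sensitivity is 1, so Laplace(1/epsilon) noise per bin gives
    epsilon-differential privacy."""
    counts = Counter(user_homes)
    return {cell: n + laplace_noise(1.0 / epsilon) for cell, n in counts.items()}

# Hypothetical example: 60 users with Home in cell "NYC-a", 40 in "NYC-b".
homes = ["NYC-a"] * 60 + ["NYC-b"] * 40
noisy = dp_histogram(homes, epsilon=0.5)
```

In a real CDR setting one user contributes many records, so the measured worst-case change per user (and hence the noise scale) is correspondingly larger.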

DP version of Home distribution

DP-WHERE modeling procedure
Add Laplace noise to each empirical input distribution to obtain its DP version: DP Home distribution, DP Commute Distance distributions, DP Work distribution, DP CallsPerDay distribution, DP CallTime distribution, and DP HourlyLoc distributions. The sampling steps of the WHERE modeling procedure then run unchanged on these noisy distributions.

DP-WHERE reproduces population densities
[Figure: Earth Mover’s Distance error in the NY area.]
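
For 1-D distributions given as histograms over a common set of bins, Earth Mover’s Distance reduces to the area between the two CDFs. A minimal sketch (the bin values are illustrative):

```python
def emd_1d(p, q, bin_width=1.0):
    """Earth Mover's Distance between two 1-D histograms over the same bins:
    normalize each to a probability distribution, then sum the absolute
    differences of their running CDFs, scaled by the bin width."""
    assert len(p) == len(q)
    sp, sq = sum(p), sum(q)
    total, acc_p, acc_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        acc_p += pi / sp
        acc_q += qi / sq
        total += abs(acc_p - acc_q) * bin_width
    return total

# Identical histograms have EMD 0; moving all mass one bin over costs
# one bin width of "earth moving".
print(emd_1d([1, 0, 0], [0, 1, 0]))  # 1.0
```

The 2-D population-density comparison in the figure requires a general EMD solver, but the 1-D case conveys the idea: EMD measures how far probability mass must be moved to turn one distribution into the other.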

DP-WHERE reproduces daily range of travel

DP-WHERE Summary
Synthetic CDRs produced by DP-WHERE mimic movements seen in real CDRs:
– Work at metropolitan scales
– Capture differences between geographic areas
– Reproduce population density distributions over time
– Reproduce daily ranges of travel
Models can be made to preserve differential privacy while retaining good modeling properties:
– Achieve provable differential privacy with a “small” overall ε
– Resulting CDRs still mimic real-life movements
We hope to make the models available.

Conclusions
The big data world creates opportunities for value, but also for privacy invasion. Emerging privacy models and techniques have the potential to “unlock” the value of data for more uses while protecting privacy:
– biomedical data
– location data (e.g., from personal mobile devices or sensors in automobiles)
– social network data
– search data
– crowd-sourced data
It is important to recognize that different parties have different goals and values.
