The Role of History and Prediction in Data Privacy
Kristen LeFevre, University of Michigan
May 13, 2009

2 Data Privacy
Personal information is collected every day:
– Healthcare and insurance information
– Supermarket transaction data
– RFID and GPS data
– Employment history
– Web search / clickstream data

3 Data Privacy
Legal, ethical, and technical issues surrounding:
– Data ownership
– Data collection
– Data dissemination and use
Considerable recent interest from the technical community:
– High-profile mishaps and lawsuits
– Compliance with data-sharing mandates

4 Privacy Protection Technologies for Public Datasets
Goal: Protect sensitive personal information while preserving data utility.
Privacy policies and mechanisms:
Example policies:
– Protect individual identities
– Protect the values of sensitive attributes
– Differential privacy [Dwork 06]
Example mechanisms:
– Generalize ("coarsen") the data
– Aggregate the data
– Add random noise to the data
– Add random noise to query results

5 Observations
Much work has focused on static data:
– One-time snapshot publishing
– Disclosure by composing multiple different snapshots of a static database [Xiao 07, Ganta 08]
– Auditing queries on a static database [Chin 81, Kenthapadi 06, …]
What are the unique challenges when the data evolves over time?

6 Outline
Sample problem: continuously publishing privacy-sensitive GPS traces
– Motivation & problem setup
– Framework for reasoning about privacy
– Algorithms for continuous publishing
– Experimental results
Applications to other dynamic data (speculation)

7 GPS Traces (ongoing work w/ Wen Jin, Jignesh Patel)
GPS devices are attached to phones and cars.
There is interest in collecting and distributing location traces in real time:
– Real-time traffic reporting
– Adaptive pricing / placement of outdoor ads
At the same time, there is concern for personal privacy.
Challenge: Can we continuously collect and publish location traces without compromising individual privacy?

8 Problem Setting
[Diagram: GPS users report their locations to a central trace repository at each epoch (7:00 AM, 7:05 AM, …); guided by a privacy policy, the repository releases a "sanitized" location snapshot to the data recipient at each epoch.]

9 Problem Setting
Finite population of n users with unique identifiers {u_1, …, u_n}.
Assume users' locations are reported and published in discrete epochs t_1, t_2, …
Location snapshot D(t_j):
– Associates each user with a location during epoch t_j
Publish a sanitized version D*(t_j).
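
As a concrete reference point, this setting can be sketched in a few lines of Python; the class and variable names below (Snapshot, Location) are illustrative, not from the talk.

```python
# A minimal sketch of the problem setting; names are illustrative.
from dataclasses import dataclass
from typing import Dict, Tuple

Location = Tuple[float, float]  # a position during one epoch

@dataclass
class Snapshot:
    """D(t_j): associates each user with a location during epoch t_j."""
    epoch: int
    locations: Dict[str, Location]  # user identifier u_i -> location

# Example: a population of n = 3 users observed during epoch t_1.
d_t1 = Snapshot(epoch=1, locations={"u1": (0.0, 0.0),
                                    "u2": (1.0, 0.5),
                                    "u3": (2.0, 1.0)})
```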

10 Threat Model
The attacker wants to determine the location of a target user u_i during epoch t_j.
Auxiliary information: the attacker knows location information for some other epochs (e.g., from the Yellow Pages).

11 Some Naïve Solutions
Strawman 1: Replace users' identifiers ({u_1, …, u_n}) with pseudonyms ({p_1, …, p_n}).
– Problem: Once the attacker "unmasks" user p_i, he can track her location forever.
Strawman 2: Assign new pseudonyms ({p_1^j, …, p_n^j}) at each epoch t_j.
– Problem: Users can still be tracked using multi-target tracking tools [Gruteser 05, Krumm 07].

12 Key Problem: Motion Prediction
[Diagram: an anonymized group {Alice, Bob, Charlie}. "What if the speed limit is 60 mph?" — a speed constraint lets an attacker single out Alice's location in the next snapshot.]

13 Threat Model
The attacker wants to determine the location of a target user u_i during epoch t_j.
Auxiliary information: the attacker knows location information for some other epochs (e.g., from the Yellow Pages).
Motion prediction: given one or more locations for u_i, the attacker can predict (probabilistically) u_i's location during following and preceding epochs.

14 Privacy Principle: Temporal Unlinkability
Consider an attacker who is able to identify (locate) a target user u_i during m sequential epochs.
Under reasonable assumptions, he should not be able to locate u_i with high confidence during any other epochs.*
*Similar in spirit to "mix zones" [Beresford 03], which addressed a related problem in a less formal way.

15 Sanitization Mechanism
Needed to select a sanitization mechanism; chose one for maximum flexibility.
Assign each user u_i a consistent pseudonym p_i.
Divide users into clusters:
– Within each cluster, break the association between pseudonym and location.
Release candidate for D(t_j): D*(t_j) = {(C_1(t_j), L_1(t_j)), …, (C_B(t_j), L_B(t_j))}
– ∪_{i=1..B} C_i(t_j) = {p_1, …, p_n} (the clusters cover all pseudonyms)
– C_i(t_j) ∩ C_h(t_j) = ∅ for i ≠ h (the clusters are disjoint)
– Each L_i(t_j) contains the locations of the users in C_i(t_j)
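
A minimal sketch of this mechanism, assuming the 2-D locations from the earlier sketch; the sanitize helper and type names are hypothetical, not from the talk.

```python
# Illustrative sketch of the cluster-based release candidate defined above.
from typing import Dict, FrozenSet, List, Tuple

Location = Tuple[float, float]
# D*(t_j): a list of (pseudonym cluster C_i, location bag L_i) pairs.
ReleaseCandidate = List[Tuple[FrozenSet[str], List[Location]]]

def sanitize(snapshot: Dict[str, Location],
             clusters: List[FrozenSet[str]]) -> ReleaseCandidate:
    """Build D*(t_j) from D(t_j) and a partition of the pseudonyms."""
    # The clusters must be disjoint and cover all pseudonyms.
    assert sum(len(c) for c in clusters) == len(snapshot)
    assert set().union(*clusters) == set(snapshot)
    release = []
    for cluster in clusters:
        # Publishing the cluster's locations in sorted order severs the
        # pseudonym-to-location association within the cluster.
        locations = sorted(snapshot[p] for p in cluster)
        release.append((cluster, locations))
    return release

# Usage, mirroring the example on the next slide:
d_tj = {"p1": (0.0, 0.0), "p2": (0.1, 0.0), "p3": (5.0, 5.0), "p4": (5.1, 5.0)}
print(sanitize(d_tj, [frozenset({"p1", "p2"}), frozenset({"p3", "p4"})]))
```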

16 Sanitization Mechanism: Example
Pseudonyms {p_1, p_2, p_3, p_4}.
[Diagram: at the first two epochs the published clusters are {p_1, p_2} and {p_3, p_4}; at the third epoch they are {p_1, p_3} and {p_2, p_4}. Each cluster is released with its bag of locations, hiding which pseudonym is where.]

17 Reasoning about Privacy
How can we guarantee temporal unlinkability under the threats of auxiliary information and motion prediction (using the cluster-based sanitization mechanism)?
Novel framework with two key components:
– A motion model describes location correlations between epochs.
– A breach probability function describes an attacker's ability to compromise temporal unlinkability.

18 Motion Models
Model motion using an h-step Markov chain:
– Conditional probability for a user's location, given his location during the h prior (or future) epochs
– The same motion model is used by the attacker and the publisher.
Forward motion model template:
– Pr[Loc(P, t_j) = L_j | Loc(P, t_{j-1}) = L_{j-1}, …, Loc(P, t_{j-h}) = L_{j-h}]
Backward motion model template:
– Pr[Loc(P, t_j) = L_j | Loc(P, t_{j+1}) = L_{j+1}, …, Loc(P, t_{j+h}) = L_{j+h}]
The motion model is an independent and replaceable component:
– For this work, a 1-step motion model based on the velocity distribution (speed and direction) was used; a sketch follows below.
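
Hedged sketch of a 1-step forward motion model. The talk specifies only that the model is based on a velocity (speed and direction) distribution; the Gaussian speed likelihood, uniform direction, and parameter values below are illustrative assumptions.

```python
import math

def forward_motion_prob(prev_loc, loc, epoch_seconds=300.0,
                        mean_speed=15.0, speed_sigma=5.0):
    """Pr[Loc(P, t_j) = loc | Loc(P, t_{j-1}) = prev_loc], up to a
    discretization constant: a Gaussian likelihood on the implied speed
    (meters/second), with direction treated as uniform."""
    distance = math.hypot(loc[0] - prev_loc[0], loc[1] - prev_loc[1])
    speed = distance / epoch_seconds
    z = (speed - mean_speed) / speed_sigma
    return math.exp(-0.5 * z * z) / (speed_sigma * math.sqrt(2.0 * math.pi))
```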

19 Motion Models: Example
Pseudonyms {p_1, p_2, p_3, p_4}; epochs t_0, t_1, t_2.
[Diagram: clusters {p_1, p_2} and {p_3, p_4} published over the epochs, with candidate locations a, b, c, d at t_1. The motion model supplies, e.g., the forward probabilities Pr[Loc(p_1, t_1) = a | Loc(p_1, t_0) = x] and Pr[Loc(p_1, t_1) = b | Loc(p_1, t_0) = x], and the backward probability Pr[Loc(p_1, t_1) = a | Loc(p_1, t_2) = y].]

20 Privacy Breaches
Forward breach probability:
– Pr[Loc(P, t_j) = L_j | D(t_{j-1}), …, D(t_{j-h}), D*(t_j)]
Backward breach probability:
– Pr[Loc(P, t_j) = L_j | D(t_{j+1}), …, D(t_{j+h}), D*(t_j)]
Privacy breach: release candidate D*(t_j) causes a breach iff either of the following holds for a threshold C:
– max over P, L_j of Pr[Loc(P, t_j) = L_j | D(t_{j-1}), …, D(t_{j-h}), D*(t_j)] > C
– max over P, L_j of Pr[Loc(P, t_j) = L_j | D(t_{j+1}), …, D(t_{j+h}), D*(t_j)] > C

21 Privacy Breaches: Example
[Diagram: cluster {p_1, p_2} published at t_0 and t_1; at t_1 the published location bag for {p_1, p_2} is {a, b}, with p_1 previously at x and p_2 previously at y.]
e1 = Pr[Loc(p_1, t_1) = a | Loc(p_1, t_0) = x]
e2 = Pr[Loc(p_1, t_1) = b | Loc(p_1, t_0) = x]
e3 = Pr[Loc(p_2, t_1) = a | Loc(p_2, t_0) = y]
e4 = Pr[Loc(p_2, t_1) = b | Loc(p_2, t_0) = y]
Pr[Loc(p_1, t_1) = a | D(t_0), D*(t_1)] = (e1 * e4) / (e1 * e4 + e2 * e3)
Goal: Verify that all (forward and backward) breach probabilities are below the threshold C.
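
A numeric check of the posterior above; the values of e1..e4 are assumed for illustration, not from the talk. Within cluster {p1, p2} the published bag {a, b} admits two assignments: (p1 -> a, p2 -> b) and (p1 -> b, p2 -> a).

```python
e1, e2 = 0.6, 0.4  # Pr[p1: x -> a], Pr[p1: x -> b]   (assumed values)
e3, e4 = 0.1, 0.9  # Pr[p2: y -> a], Pr[p2: y -> b]   (assumed values)

# Pr[Loc(p1, t_1) = a | D(t_0), D*(t_1)]
posterior = (e1 * e4) / (e1 * e4 + e2 * e3)
print(posterior)  # ~0.931: with threshold C = 0.9, this release breaches
```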

22 Checking for Breaches
Does release candidate D*(t_j) cause a breach?
Brute-force algorithm (sketched below):
– Exponential in the release candidate's cluster size
Heuristic pruning tools:
– Reduce the search space considerably in practice
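
A sketch of the brute-force check for a single cluster, using the data shapes and motion model assumed in the earlier sketches: enumerate every assignment of the cluster's pseudonyms to its published locations (exponential in cluster size, as noted above), weight each assignment by the motion model, and compare the maximum posterior to the threshold C. This covers only the forward direction, for brevity.

```python
from itertools import permutations

def cluster_breaches(pseudonyms, prev_locs, published_locs,
                     motion_prob, threshold):
    """True iff max Pr[Loc(p, t_j) = L | D(t_{j-1}), D*(t_j)] > threshold."""
    mass = {}    # (pseudonym, location) -> unnormalized posterior mass
    total = 0.0
    for assignment in permutations(published_locs):
        w = 1.0  # likelihood of this pseudonym-to-location assignment
        for p, loc in zip(pseudonyms, assignment):
            w *= motion_prob(prev_locs[p], loc)
        total += w
        for p, loc in zip(pseudonyms, assignment):
            mass[(p, loc)] = mass.get((p, loc), 0.0) + w
    if total == 0.0:
        return False  # no assignment is consistent with the motion model
    return any(w / total > threshold for w in mass.values())
```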

23 Publishing Algorithms
How can we publish useful data without causing a privacy breach?
The cluster-based sanitization mechanism offers two main options:
– Increase the cluster size (or change the cluster composition)
– Reduce the publication frequency

24 Publishing Algorithms
General case:
– At each epoch t_j, publish the most compact release candidate D*(t_j) that does not cause a breach.
– Need to delay publishing until epoch t_{j+h} to check for backward breaches.
– This is an NP-hard optimization problem; alternative heuristics were proposed (one plausible greedy strategy is sketched below).
Special case:
– Durable clusters (same individuals at each epoch)
– Motion model satisfies a symmetry property
– No need to delay publishing
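
The talk proposes heuristics for this NP-hard problem; the greedy sketch below is one plausible strategy, not necessarily the authors': start from singleton clusters and merge a breaching cluster with its nearest neighbor until no cluster breaches (forward checks only). It reuses sanitize and cluster_breaches from the earlier sketches; the centroid-distance compactness measure is an assumption.

```python
import math

def publish_epoch(snapshot, prev_locs, motion_prob, threshold):
    clusters = [frozenset({p}) for p in snapshot]  # most compact candidate
    while True:
        breaching = [c for c in clusters
                     if cluster_breaches(sorted(c), prev_locs,
                                         [snapshot[p] for p in sorted(c)],
                                         motion_prob, threshold)]
        if not breaching:
            return sanitize(snapshot, clusters)
        c = breaching[0]
        others = [d for d in clusters if d != c]
        if not others:
            return None  # even one big cluster breaches: skip this epoch
        nearest = min(others, key=lambda d: _centroid_dist(snapshot, c, d))
        clusters = [d for d in others if d != nearest] + [c | nearest]

def _centroid_dist(snapshot, c, d):
    """Distance between cluster centroids (illustrative utility measure)."""
    def centroid(cluster):
        pts = [snapshot[p] for p in cluster]
        return (sum(x for x, _ in pts) / len(pts),
                sum(y for _, y in pts) / len(pts))
    (x1, y1), (x2, y2) = centroid(c), centroid(d)
    return math.hypot(x1 - x2, y1 - y2)
```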

25 Experimental Study
Used real highway traffic data from the UM Transportation Research Institute:
– GPS data sampled from the cars of 72 volunteers
– Sampling rate (epoch) = 0.01 seconds
– Speed range (km/hour)
Also synthetic data:
– Able to control the generative motion distribution

26 Experimental Study
All static "snapshot" anonymization mechanisms are vulnerable to motion prediction attacks:
– Applied two representative algorithms (r-Gather [Aggarwal 06] and k-Condense [Aggarwal 04])
– Each produces a set of clusters with ≥ k users each
[Plots: results for r-Gather and k-Condense.]

27 Speculation / Future Work
The GPS example illustrates the importance to data privacy of reasoning about data dynamics, history, and predictable patterns of change.
Dynamic private data arises in other applications, e.g., longitudinal social science data:
– Study subjects age predictably
– Most people don't move very far
– Income changes predictably
Hypothesis: History and prediction are important in these settings, too!