Christian S. Jensen www.cs.aau.dk/~csj joint work with Man Lung Yiu, Hua Lu, Jesper Møller, Gabriel Ghinita, and Panos Kalnis Privacy for Spatial Queries.


Christian S. Jensen joint work with Man Lung Yiu, Hua Lu, Jesper Møller, Gabriel Ghinita, and Panos Kalnis Privacy for Spatial Queries and Data

Motivation
- Outsourcing and cloud computing are on the rise.
- Big and growing mobile Internet
  - 2.7 B mobile phone users (cf. 850 MM PCs)
  - 1.1 B Internet users; 750 MM access the Internet from phones
  - This year, 1.2 B mobile phones will be sold, 200 MM of them high-end (cf. 200 MM PCs); 13 MM new users in China and India monthly
  - Africa has surpassed North America in number of users
- The mobile Internet will be location aware.
  - GPS, Wi-Fi-based, cell-id-based, Bluetooth-based, other
  - A very important signal in a mobile setting!
- Privacy is an enabling technology.
Stanford InfoSeminar, March 6, © Christian S. Jensen. All rights reserved.

Outline
- Query Location Privacy
  - Motivation and related work
  - Solution: SpaceTwist
  - Granular search in SpaceTwist
  - Empirical study
  - Summary
- Spatial Data Privacy
- Closing remarks

Query Location Privacy
- A mobile user wants nearby points of interest.
- A service provider offers this functionality.
  - Requires an account and login
- The user does not trust the service provider.
- The user wants location privacy.
[Figure: client and server. The client asks: "What should I do? I want the nearest x. I don't want to tell where I am."]

Spatial Cloaking
- kNN query (k = 1)
- K-anonymity
- Range kNN query
- Candidate set is {p1, …, p6}
- Result is p1
- Identity vs. location privacy
- Peer-to-peer (p-2-p) or client only
- Cloaking without K-anonymity
- Q' may be other shapes, dummies.
[Figure: client, anonymizer, and server. Users u1, u2, u3 and the query point q are cloaked into a region Q'; the server returns the candidate points p1, …, p6.]

Transformation-Based Privacy
[Figure: client and server. The client holds the transformations H, H^-1 and H', H'^-1 and sends H(q) and H'(q); the server stores only the transformed points H(p_i) and H'(p_i). Example values shown for the data points p1, p2, p3: {10, 13, 14} under H and {5, 8, 11} under H'.]

Definitions of Privacy
- K-anonymity: the user cannot be distinguished from K-1 other users.
- The area of the region within which the user's position can lie.
- The average distance between the true position and all possible positions.

Solution Requirements
- The solution must enable the user to retrieve the nearest points of interest while affording the user location privacy.
- Should offer flexibility in the degree of privacy guaranteed, so that the user can decide
  - Settings should be meaningful to the user
  - Like browser security settings or a slider
- Should work with a standard client-server architecture
  - The user trusts only the mobile client
- Should assume a typical setting where the user must log in to use the service
- Should provide privacy at low performance overhead
  - Server-side costs: workload and complexity
  - Communication costs: bits transferred
  - Client-side costs: workload, complexity, power
- Should enable better performance by reducing the result accuracy

SpaceTwist Concepts
- Anchor location q' (fake client location)
  - Defines an ordering on the data points
  - The client fetches points from the server incrementally
- Supply space
  - The part of space explored by the client so far
  - Known by both server and client
  - Grows as more data points are retrieved
- Demand space
  - Guaranteed to cover the actual result
  - Known only by the client
  - Shrinks when a "better" result is found
- Termination when the supply space contains the demand space
[Figure: two panels, "the beginning" and "the end", each showing q', q, the demand space, and the supply space.]
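The concepts above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, one point per server reply (the slides batch points into packets), and a server that sorts points by distance from the anchor instead of using a spatial index. The names `incremental_nn` and `spacetwist_nn` are hypothetical.

```python
import heapq
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def incremental_nn(points, anchor):
    """Server side (sketch): yield points in ascending distance from the anchor."""
    heap = [(dist(p, anchor), p) for p in points]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)[1]

def spacetwist_nn(points, q, anchor):
    """Client side (sketch): return (nearest point to q, number of points fetched).

    The true location q never leaves the client; the server only sees `anchor`.
    """
    best, gamma = None, math.inf      # gamma: radius of the demand space around q
    fetched = 0
    for p in incremental_nn(points, anchor):
        fetched += 1
        tau = dist(anchor, p)         # radius of the supply space around the anchor
        d = dist(q, p)
        if d < gamma:                 # a better result shrinks the demand space
            best, gamma = p, d
        # Stop once the supply space contains the demand space.
        if dist(q, anchor) + gamma <= tau:
            break
    return best, fetched

points = [(0, 0), (10, 0), (3, 4), (7, 7), (20, 20)]
result, fetched = spacetwist_nn(points, q=(2, 3), anchor=(6, 3))
```

The stopping test is the containment check for two circles: the circle of radius `gamma` around q lies inside the circle of radius `tau` around q' exactly when dist(q, q') + gamma <= tau.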

SpaceTwist Example
[Figure: client and server. The server reports the points p1, p2, p3 in ascending distance from the anchor q' while the client, located at q, refines its result.]

SpaceTwist Properties
- Retrieves data points from the server incrementally until the client can produce the exact result
- Fundamentally different from previous approaches
  - No cloaking region
  - Queries are evaluated in the original space.
- Offers privacy guarantees
- Relatively easy to support in existing systems
  - Simple client-server architecture (no trusted components, peers)
  - Simple server-side query processing: incremental NN search
- Granular search (improved server-side performance)
  - Reduced communication cost for results with guaranteed accuracy

Privacy Analysis
- What does the server know?
  - The anchor location q'
  - The reported points (in reporting order): $p_1, p_2, \ldots, p_{m\beta}$
  - The termination condition: $\mathrm{dist}(q, q') + \mathrm{dist}(q, NN) \le \mathrm{dist}(q', p_{m\beta})$
- Possible query location $q_c$
  - The client did not stop at point $p_{(m-1)\beta}$: $\mathrm{dist}(q_c, q') + \min\{\mathrm{dist}(q_c, p_i) : i \in [1, (m-1)\beta]\} > \mathrm{dist}(q', p_{(m-1)\beta})$
  - The client stopped at point $p_{m\beta}$: $\mathrm{dist}(q_c, q') + \min\{\mathrm{dist}(q_c, p_i) : i \in [1, m\beta]\} \le \mathrm{dist}(q', p_{m\beta})$
- Inferred privacy region $\Psi$: the set of all possible $q_c$
- Quantification of privacy
  - Privacy value: $\Gamma(q, \Psi)$ = the average distance of the locations in $\Psi$ from q
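The two inequalities above translate directly into a membership test for the inferred privacy region, and the privacy value can then be approximated by sampling. The sketch below writes `beta` for the packet capacity and assumes Euclidean distance, a reported list whose length is a multiple of `beta`, and uniform Monte Carlo sampling over a bounding box; the function names and the sampling approach are illustrative assumptions, not taken from the slides.

```python
import math
import random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def is_possible_location(qc, anchor, reported, beta):
    """Check the two inequalities the server can test for a candidate location qc.

    `reported` holds the points in reporting order; its length is assumed
    to be a multiple of the packet capacity `beta`.
    """
    m = len(reported) // beta
    if m == 0:
        return False
    base = dist(qc, anchor)
    # The client did not stop after packet m-1 (vacuous when m == 1).
    if m > 1:
        prev = reported[: (m - 1) * beta]
        if base + min(dist(qc, p) for p in prev) <= dist(anchor, prev[-1]):
            return False
    # The client stopped after packet m.
    nearest = min(dist(qc, p) for p in reported[: m * beta])
    return base + nearest <= dist(anchor, reported[m * beta - 1])

def estimate_privacy_value(q, anchor, reported, beta, bounds, samples=20000, seed=0):
    """Monte Carlo estimate of the average distance from q over the privacy region."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    total, count = 0.0, 0
    for _ in range(samples):
        qc = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        if is_possible_location(qc, anchor, reported, beta):
            total += dist(q, qc)
            count += 1
    return total / count if count else float("nan")

anchor, q = (6, 3), (2, 3)
reported = [(3, 4), (7, 7), (10, 0), (0, 0)]   # ascending distance from the anchor
value = estimate_privacy_value(q, anchor, reported, beta=1, bounds=(0, 0, 10, 10))
```

In this toy instance the true location q passes the test while the anchor itself does not, which matches the observation that the region is roughly ring-shaped around q'.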

Visualization of Ψ
- Visualization with different types of points
- Characteristics of Ψ (i.e., the possible locations q_c)
  - Roughly an irregular ring shape centered at q'
  - Radius approx. dist(q, q')
[Figure: visualizations of Ψ, including a panel labeled "coarser granularity".]

Privacy Analysis
- By carefully selecting the distance between q and q', it is possible to guarantee a privacy setting specified by the user.
- SpaceTwist extension: instead of terminating when possible, request additional query points.
  - This makes the problem harder for the adversary.
  - It makes it easier (and more practical) to guarantee a privacy setting.

Communication Cost
- The communication cost is the number of (TCP/IP) packets transmitted.
  - It is inefficient to use a packet for each point.
  - Rather, packets are filled before transmission.
- The packet capacity β is the number of points in a packet.
- Actual value of β?
  - Depends on the Maximum Transmission Unit (MTU)
  - In empirical studies, we use MTU = 576 bytes and β = 67.
- The cost has been characterized analytically.
- Empirical studies have been conducted.
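As a worked example of how the packet capacity relates to the MTU, the sketch below reproduces the figure of 67 points per packet from MTU = 576 bytes. The 40-byte TCP/IP header and the 8 bytes per point (two 4-byte coordinates) are assumptions chosen here because they are consistent with the stated numbers; the slides do not give these values.

```python
import math

def packet_capacity(mtu_bytes, header_bytes=40, bytes_per_point=8):
    """Points per packet. header_bytes and bytes_per_point are assumed values."""
    return (mtu_bytes - header_bytes) // bytes_per_point

def packets_needed(num_points, beta):
    """Communication cost in packets when packets are filled before transmission."""
    return math.ceil(num_points / beta)

beta = packet_capacity(576)         # (576 - 40) // 8
cost = packets_needed(100, beta)    # 100 points need two packets of 67
```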

Granular Search
- What if the server considers searching on a small sample of the data points instead of all?
  - Lower communication cost
  - [symbol missing] becomes large at low data density
  - But less accurate results
- Accuracy requirement: the user specifies an error bound ε
  - A point $p \in P$ is a relaxed NN of q iff $\mathrm{dist}(q, p) \le \varepsilon + \min\{\mathrm{dist}(q, p') : p' \in P\}$
- A grid with cell length $\varepsilon / \sqrt{2}$ is applied.
  - As before, the server reports points in ascending distance from q', but it never reports more than one data point p from the same cell.
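A minimal sketch of the server-side filtering described above, writing `eps` for the error bound. With cell length eps / sqrt(2) the cell diagonal is eps, so any suppressed point lies within eps of the point reported from its cell. The grid origin at (0, 0), the sort-based ordering, and the function names are assumptions made for illustration.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def granular_incremental_nn(points, anchor, eps):
    """Yield points in ascending distance from the anchor, at most one per grid cell."""
    cell = eps / math.sqrt(2)          # cell diagonal equals eps
    seen = set()
    for p in sorted(points, key=lambda p: dist(p, anchor)):
        key = (math.floor(p[0] / cell), math.floor(p[1] / cell))
        if key in seen:
            continue                   # a point from this cell was already reported
        seen.add(key)
        yield p

def is_relaxed_nn(p, q, points, eps):
    """True if p is within eps of the true nearest-neighbor distance from q."""
    return dist(q, p) <= eps + min(dist(q, x) for x in points)

points = [(0.1, 0.1), (0.2, 0.3), (5, 5), (5.1, 5.2), (9, 1)]
reported = list(granular_incremental_nn(points, anchor=(0, 0), eps=1.5))
```

Here (0.2, 0.3) and (5.1, 5.2) are suppressed because a closer point from the same cell has already been reported.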

Granular Search Example
[Figure: grid over the data points p1, p2, p3, p4 with the query location q and the anchor q'.]

Experimental Study
- Our solution: GST (Granular SpaceTwist)
  - Without delayed termination
- Spatial datasets (domain: [0, 10,000]^2)
  - Two real datasets: SC (172,188 points), TG (556,696 points)
  - Synthetic uniform random UI datasets
- Performance metrics (workload size = 100)
  - Communication cost (in number of packets; 1 packet = 67 points)
  - Result error (result NN distance minus actual NN distance)
  - Privacy value of the inferred privacy region Ψ
- Default parameter values
  - Anchor distance dist(q, q'): 200
  - Error bound ε: 200
  - Data size N: 500,000

Transformation-Based Privacy vs. GST
- Hilbert transformation [Khoshgozaran and Shahabi, 2007]
  - SHB: single Hilbert curve
  - DHB: two orthogonal Hilbert curves
- GST computes results with low error
  - Very low error on real (skewed) data
  - Stable error across different data distributions
[Table: error (meters) by k for SHB, DHB, and GST on UI (N = 0.5 M), SC, and TG.]

Spatial Cloaking vs. GST
- Our problem setting: no trusted middleware
- Competitor: client-side spatial cloaking (CLK)
  - CLK: enlarge q into a square with side length 2*dist(q, q')
  - Extent comparable to the inferred privacy region Ψ of GST
- GST produces results at low communication cost
  - Low cost even at high privacy
  - Cost independent of N
[Figure: communication cost (number of packets) of CLK and GST. One panel varies dist(q, q') on SC and TG; the other varies the data size N (million) on UI.]

Summary
- SpaceTwist is a novel solution for query location privacy of mobile users
  - Granular search at the server
- Advantages
  - Guaranteed, flexible privacy settings
  - Assumes only a simple client-server setting
  - Low processing and communication cost
  - Enables trading of (guaranteed) accuracy for performance
- Extensions
  - Ring-based server-side retrieval order, spatial networks
- Future work
  - Additional query types

Outline
- Query Location Privacy
  - Motivation and related work
  - Solution: SpaceTwist
  - Granular search in SpaceTwist
  - Empirical study
  - Summary
- Spatial Data Privacy
  - Problem setting, solution framework, and objectives
  - Tailored and general attack models
  - Solution overview
  - Summary
- Closing remarks

Problem Setting
- On a trip to Paris, Alice takes photos with her GPS phone camera
  - Private spatial data: each photo is tagged with its GPS location (automatically)
  - Example of user-generated content
- Alice wants to outsource spatial search on the above data to a service provider, e.g., Flickr, Facebook, Picasa
- Trusted query users: Alice's friends
- Nobody else (including the service provider) can be trusted

Solution Framework
[Figure: three parties. The private data owner (PDO) holds the original dataset P; the service provider (SP) holds the transformed dataset P' and the index T(P') over it; a trusted friend (query user) queries the service provider.]

Objectives
- Objectives of the solution
  - Support efficient and accurate processing of range queries
  - Make it hard to reconstruct the original points in P from the transformed points in P'
- Orthogonal aspects
  - Verifying the correctness of the query results
  - Protecting the identities of the data owner and query users

Attack Models
- What does the attacker know?
  - The set P' of the transformed points
  - Background information: a subset S of points in P and their corresponding points S' in P'
    - But no other points in P
    - Cannot choose S (S')
- Tailored attack
  - Specific to the known transformation method
  - Goal: determine the exact location of each point
  - Formulate a system of equations and solve for the key parameters by using the values in S and S'
  - Tailored attacks can be computationally infeasible

Attack Models: General Attack
- Independent of the (unknown) transformation method
- Goal: estimate a location c such that the feature vector of c (w.r.t. S) is the most similar to the feature vector of p' (w.r.t. S')
[Figure: original space with the known points s1, s2, s3 and the estimate c; transformed space with s'1, s'2, s'3 and p'; the feature vectors V(p', S') and V(c, S) are compared.]
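The general attack can be illustrated with a small sketch. The slide does not define the feature vector, so this example assumes it is the vector of Euclidean distances from a point to each known reference point, and it searches a regular grid of candidate locations for the best match. The function names, the squared-difference similarity, and the grid search are all assumptions made for illustration.

```python
import math

def feature_vector(p, refs):
    """Assumed feature vector: distances from p to each reference point."""
    return [math.hypot(p[0] - s[0], p[1] - s[1]) for s in refs]

def general_attack(p_t, known, known_t, bounds, step):
    """Estimate the original location of the transformed point p_t.

    known / known_t are the attacker's background points S and S'.
    Returns the grid location c whose feature vector w.r.t. S is closest
    to the feature vector of p_t w.r.t. S'.
    """
    target = feature_vector(p_t, known_t)
    xmin, ymin, xmax, ymax = bounds
    nx = int(round((xmax - xmin) / step)) + 1
    ny = int(round((ymax - ymin) / step)) + 1
    best, best_err = None, math.inf
    for i in range(nx):
        for j in range(ny):
            c = (xmin + i * step, ymin + j * step)
            err = sum((a - b) ** 2 for a, b in zip(feature_vector(c, known), target))
            if err < best_err:
                best, best_err = c, err
    return best

# Toy transformation that preserves distances: rotate by 90 degrees, then shift.
def toy_transform(p):
    return (p[1] + 3, -p[0] + 7)

S = [(0, 0), (10, 0), (0, 10)]
S_t = [toy_transform(s) for s in S]
secret = (4, 6)
estimate = general_attack(toy_transform(secret), S, S_t, bounds=(0, 0, 10, 10), step=1.0)
```

Against a distance-preserving transformation the attack recovers the point exactly; a transformation that distorts distances strongly leaves the attacker with a poor estimate.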

Overview of Solutions
See papers (listed at the end) for details!

Method  Tailored attack                    General attack   Transferred data cost                Round trips
HSD     2 known points in same partition   High distortion  Low                                  1
ERB     N/A                                Low distortion   High (grows with [symbol missing])   1
HSD*    N/A                                High distortion  Moderate                             1
CRT     N/A                                                 Moderate                             Tree height

Summary
- Contributions
  - A framework that enables service providers to process range queries without knowing the actual data
  - Spatial transformations: HSD, ERB, HSD*
  - Cryptographic transformation: CRT
  - Proposals for tailored and general attacks
- Future work
  - Support other spatial queries, e.g., nearest neighbors, spatial joins

Concluding Remarks
- The contributions to spatial query and data privacy presented here are part of a trend.
- Many other challenges, e.g., relating to
  - Privacy for historical data
  - Trust
  - Authentication (e.g., "does the server produce 'correct' results?")
- Data management infrastructure for cloud computing

References
- M. F. Mokbel, C.-Y. Chow, and W. G. Aref. The New Casper: Query Processing for Location Services without Compromising Privacy. In VLDB.
- P. Indyk and D. Woodruff. Polylogarithmic Private Approximations and Efficient Matching. In Theory of Cryptography Conference.
- A. Khoshgozaran and C. Shahabi. Blind Evaluation of Nearest Neighbor Queries Using Space Transformation to Preserve Location Privacy. In SSTD, 2007.
- G. R. Hjaltason and H. Samet. Distance Browsing in Spatial Databases. TODS, 24(2):265–318.
- R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu. Order-Preserving Encryption for Numeric Data. In SIGMOD.
- E. Damiani, S. D. C. Vimercati, S. Jajodia, S. Paraboschi, and P. Samarati. Balancing Confidentiality and Efficiency in Untrusted Relational DBMSs. In Proc. of Computer and Communications Security.
- H. Hacigümüs, B. R. Iyer, C. Li, and S. Mehrotra. Executing SQL over Encrypted Data in the Database-Service-Provider Model. In SIGMOD.
- R. Agrawal and R. Srikant. Privacy-Preserving Data Mining. In SIGMOD.

Readings
- C. S. Jensen. When the Internet Hits the Road. Proc. BTW.
- C. S. Jensen, C. R. Vicente, and R. Wind. User-Generated Content – The case for Mobile Services. IEEE Computer, 41(12):116–118, December.
- M. L. Yiu, C. S. Jensen, H. Lu. SpaceTwist: Managing the Trade-Offs Among Location Privacy, Query Performance, and Query Accuracy in Mobile Services. Proc. ICDE, April.
- M. L. Yiu, C. S. Jensen, J. Møller, and H. Lu. Design and Analysis of an Incremental Approach to Location Privacy for Location-Based Services. In submission.
- M. L. Yiu, G. Ghinita, C. S. Jensen, and P. Kalnis. Outsourcing Search Services on Private Spatial Data. In Proc. ICDE, 2009, to appear.
- M. L. Yiu, G. Ghinita, C. S. Jensen, and P. Kalnis. Enabling Search Services on Outsourced Private Spatial Data. In submission.
To obtain permission to use these slides and to obtain copies of papers in submission, send to