Quantifying Location Privacy
Reza Shokri, George Theodorakopoulos, Jean-Yves Le Boudec, and Jean-Pierre Hubaux
Presented by: Solomon Njorombe
Abstract
- Security and privacy issues arise with advances in personal communication
- Many Location-Privacy Protection Mechanisms (LPPMs) have been proposed, but with no systematic quantification and with incomplete assumptions
- A framework for analyzing LPPMs: the information and attacks available to the adversary
- Formalizes attack performance: the adversary's inference attacks evaluated by accuracy, certainty, and correctness
- Implements the Location-Privacy Meter tool
- Assesses popular metrics (entropy and k-anonymity): low correlation with the adversary's success
Introduction
Introduction
- Smartphones with location sensors: GPS / triangulation
- Convenience, but leaves traces of your whereabouts
- Allows inference of habits, interests, relationships, secrets
- Increased computing power: data-mining algorithms, parallel database analysis
- A threat to privacy: users have the right to control the information they share, exposing minimal information or sharing only with trusted entities
Introduction: Motivation
- Aim: advance the quantification of LPPM performance
- Why?
  - Humans are bad estimators of risk
  - A meaningful way to compare different LPPMs is needed
  - The literature is not mature enough on this topic: it lacks a unified, generic formal framework, hence divergent contributions and confusion about which LPPM is more effective
Introduction: Contributions
- A generic model to formalize adversarial attacks
- Tracking and localization attacks on anonymous traces defined as statistical inference problems
- Statistical methods to evaluate the performance of such inference attacks
- Expected estimation error identified as the right metric
- The Location-Privacy Meter tool
- Demonstration of the inappropriateness of existing metrics
Framework
Framework
- Location privacy is modeled as a tuple ⟨𝒰, 𝒜, LPPM, 𝒪, ADV, METRIC⟩
- 𝒰: set of mobile users
- 𝒜: actual traces of the users
- LPPM: Location-Privacy Preserving Mechanism; acts on a ∈ 𝒜 and produces o ∈ 𝒪
- 𝒪: traces observed by the adversary
- ADV: adversary; tries to infer a having observed o, relying on knowledge of the LPPM and of the users' mobility models
- METRIC: metric for the performance and success of ADV; implies the users' location privacy
Framework: Mobile Users
- 𝒰 = {u₁, u₂, …, u_N}: set of N mobile users within an area partitioned into M regions ℛ = {r₁, r₂, …, r_M}
- 𝒯 = {1, …, T}: discrete set of time instants at which users can be observed
- The spatiotemporal positions of users are modeled through events and traces
- Event: a tuple ⟨u, r, t⟩, where u ∈ 𝒰, r ∈ ℛ, t ∈ 𝒯
- Trace of user u: a T-size vector of events a_u = (a_u(1), a_u(2), …, a_u(T)), e.g. a_u(1) = ⟨u, r_j, t₁⟩, …, a_u(T) = ⟨u, r_x, t_T⟩
Framework: Mobile Users
- 𝒜_u: set of all traces that may belong to user u
- Actual trace of u: the one true trace of u for the period t = 1…T, (a_u(1), a_u(2), …, a_u(T)) (see the sketch below)
- Actual events: the events of the actual trace of user u, e.g. ⟨u, r_x, t₁⟩, ⟨u, r_e, t₂⟩, …, ⟨u, r_i, t_T⟩
- 𝒜 = 𝒜_{u₁} × 𝒜_{u₂} × … × 𝒜_{u_N}: set of all possible traces of all users
- Could 𝒜_u contain more than one trace? Possibly
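To make the event/trace notation concrete, here is a minimal sketch (Python; the names are illustrative, not from the paper) of events as ⟨user, region, time⟩ tuples and a trace as a T-size vector:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    user: int    # index into the user set U = {u_1, ..., u_N}
    region: int  # index into the region set R = {r_1, ..., r_M}
    time: int    # time instant in T = {1, ..., T}

# A trace of user u is a T-size vector of events, one per time instant.
def make_trace(user: int, regions: list[int]) -> list[Event]:
    return [Event(user, r, t) for t, r in enumerate(regions, start=1)]

# Actual trace of user 0 over T = 4 time instants.
a_u = make_trace(0, [3, 3, 7, 2])
```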
Framework: Location-Privacy Preserving Mechanisms (LPPM)
- LPPM: mechanism that modifies and distorts actual traces before exposure
- Different implementations: offline (e.g. from a database) vs. online (on the fly); centralized (a central anonymity server) vs. distributed (on users' phones)
- Receives the N actual traces and modifies them in two steps:
  - Obfuscation: each location is replaced with a location pseudonym in ℛ′ = {r′₁, r′₂, …, r′_{M′}}
  - Anonymization: the user part of each trace is replaced with a user pseudonym in 𝒰′ = {u′₁, …, u′_{N′}}
- A region may be obfuscated to a different pseudonym each time it is encountered; a user is always mapped to the same pseudonym
- The information used for obfuscation varies with the LPPM architecture and type
Framework: Location-Privacy Preserving Mechanisms (LPPM)
- Obfuscated event: ⟨u, r′, t⟩, where u ∈ 𝒰, r′ ∈ ℛ′, t ∈ 𝒯
- Obfuscated trace: o_u = (o_u(1), o_u(2), …, o_u(T))
- 𝒪_u: set of all possible obfuscated traces of user u
Framework: Location-Privacy Preserving Mechanisms (LPPM)
- Obfuscation mechanism: a function that maps a trace a_u ∈ 𝒜_u to a random variable O_u taking values in 𝒪_u
- Probability density function: f_{a_u}(o_u) = Pr{O_u = o_u | A_u = a_u}
- Methods used by LPPMs to reduce the accuracy and/or precision of the events' spatiotemporal information:
  - Perturbation
  - Adding dummy regions
  - Reducing precision (merging regions)
  - Location hiding
Framework: Location-Privacy Preserving Mechanisms (LPPM)
- Anonymization mechanism: a function Σ chosen at random from the functions mapping 𝒰 to 𝒰′
- Σ is drawn according to the probability function g(σ) = Pr(Σ = σ)
- Here, a random permutation over the N! possible permutations of the users is considered (both steps are sketched below)
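A minimal sketch of the two LPPM steps, under stated assumptions (Python; location hiding is used as the example obfuscation, and all helper names are hypothetical): a per-event obfuscation f that hides a location with probability λ_h, and an anonymization g that draws one permutation σ uniformly over all N! permutations:

```python
import random

LAMBDA_H = 0.3   # location-hiding level (assumed value)
HIDDEN = None    # stands for the "empty" location pseudonym

def obfuscate_trace(regions):
    """f: independently hide each location with probability LAMBDA_H."""
    return [HIDDEN if random.random() < LAMBDA_H else r for r in regions]

def anonymize(obfuscated_traces):
    """g: one random permutation sigma over the N users (uniform over N!)."""
    n = len(obfuscated_traces)
    sigma = list(range(n))
    random.shuffle(sigma)
    # user u's obfuscated trace is published under pseudonym sigma[u]
    return {sigma[u]: t for u, t in enumerate(obfuscated_traces)}

# Two users' actual traces (region indices per time instant):
observed = anonymize([obfuscate_trace([3, 3, 7]), obfuscate_trace([1, 2, 2])])
```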
Framework: Location-Privacy Preserving Mechanisms (LPPM)
- An LPPM is the pair (f, g)
- f: (a_{u₁}, …, a_{u_N}) ⟼ {o_{u₁}, …, o_{u_N}}, mapping the set of actual traces to a set of obfuscated traces
- {o_{u₁}, …, o_{u_N}} is an instantiation of the random variables {O_{u₁}, …, O_{u_N}}
- g: {o_{u₁}, …, o_{u_N}} ⟼ {o_{σ(u₁)}, …, o_{σ(u_N)}}, mapping the set of obfuscated traces to the set of anonymized traces
Framework: Location-Privacy Preserving Mechanisms (LPPM)
- The LPPM is summarized by the probability distribution that gives the probability of mapping a ∈ 𝒜 to o ∈ 𝒪 = 𝒪₁ × 𝒪₂ × … × 𝒪_N:
  LPPM_a(o) = Pr{ ⋂_{i=1}^{N} O_{Σ(u_i)} = o_{σ(u_i)} | ⋂_{i=1}^{N} A_{u_i} = a_{u_i} }
- 𝒪_{σ(u)}: set of all observable traces of user u
- The adversary's aim is to reconstruct a, given o
Framework: Adversary
- Knows the obfuscation and anonymization probability distribution functions f and g
- Has access to training traces plus the users' public information
- Based on this information, constructs a mobility profile P_u for each user
- Given the LPPM (i.e. f and g), the users' profiles {(u, P_u)}, and the observed traces {o₁, o₂, …, o_N}, the attacker runs an inference attack, formulating its objective as a 𝒰–ℛ–𝒯 question (a subset of users, regions, and times)
- The adversary model is one of the most important parts of the framework
Framework: Adversary
- Possible types of attack:
  - Presence/absence disclosure attacks: infer the user-region relationship over time
    - Tracking attacks: ADV tries to recover the full/partial sequence of a user's trace
    - Localization attacks: ADV targets a single event in a user's trace
  - Meeting disclosure attacks: ADV is interested in the proximity between two users (a meeting at a given time)
- The paper's algorithms implement the general attack: try to recover the traces of all users
Framework: Evaluation
- Traces are probabilistically generated:
  - Actual traces: probabilistic over the users' mobility profiles
  - Observed traces: probabilistic over the LPPM
- The attack output can be:
  - A probability distribution over possible outcomes
  - The most probable outcome
  - The expected outcome under the distribution of possible outcomes
  - Any function of the actual trace
Framework: Evaluation
- φ(·): function expressing the attacker's objective
- If its argument is a, then φ(a) is the correct answer to the attack
- 𝒳: set of values φ(·) can take for a given attack (M regions, N users, M^T possible traces of one user)
- But the attacker cannot obtain φ(a) exactly; the task is highly probabilistic
- Best hope: extract all available information about it from the observed traces
Framework: Evaluation
- The extracted information takes the form P̂r(x|o), x ∈ 𝒳, where x ranges over all possible values of φ(·) derivable from the observed o
- Uncertainty: ambiguity of P̂r(x|o) with respect to finding a unique answer (maximal under a uniform distribution)
- Inaccuracy: difference between P̂r(x|o) and the true Pr(x|o); P̂r(x|o) is an estimate, since ADV does not have infinite resources
- But uncertainty and inaccuracy do not quantify the user's privacy; correctness does
Framework: Evaluation
- Correctness: distance between the result of the attack and the true answer
- The three qualities of the attack result: accuracy, certainty, correctness
- Only correctness really matters to the user
- Accuracy and certainty may not be equivalent to correctness; consider a situation with insufficient traces
Framework: Evaluation
- Accuracy: quantified with a confidence interval and confidence level; computing P̂r(x|o) exactly for some x is prohibitively costly, so it is estimated within some confidence interval
- Certainty: quantified through entropy (a concentrated/biased vs. a uniform distribution); higher entropy means lower certainty:
  H(X) = Σ_x P̂r(x|o) · log(1 / P̂r(x|o))
- Entropy captures chaos/randomness; concentration captures how easy it is to pinpoint a single outcome x out of 𝒳
Framework: Evaluation
- Correctness: quantified as the expected distance between the true outcome x_c and the estimate P̂r(x|o)
- If there is a distance ‖·‖ between members of 𝒳, the expected estimation error is:
  Σ_x P̂r(x|o) · ‖x − x_c‖
- If the distance were 0 iff x = x_c and 1 otherwise, incorrectness would be the probability of error: 1 − P̂r(x_c|o) (see the sketch below)
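A small sketch (Python, with a hypothetical posterior) of the two quantities just defined: certainty via entropy, and correctness via the expected estimation error under the 0/1 distance:

```python
import math

# Hypothetical posterior Pr_hat(x | o) over 4 candidate regions.
posterior = {"r1": 0.6, "r2": 0.2, "r3": 0.15, "r4": 0.05}
x_true = "r3"                      # the (unknown-to-ADV) correct answer x_c

# Certainty: Shannon entropy of the posterior; higher entropy = lower certainty.
entropy = sum(p * math.log2(1 / p) for p in posterior.values() if p > 0)

# Correctness: expected estimation error under the 0/1 distance,
# i.e. the probability of error 1 - Pr_hat(x_c | o).
prob_error = 1.0 - posterior[x_true]

print(f"entropy = {entropy:.3f} bits, incorrectness = {prob_error:.2f}")
```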
Framework: Evaluation
- So correctness is the metric that determines the user's privacy
- The adversary does not know x_c and cannot observe this parameter
- Note that accuracy, certainty, and correctness are largely independent of one another
Location-Privacy Meter
Location-Privacy Preserving Mechanisms
- Implemented two obfuscation mechanisms (sketched below):
  - Precision reduction (merging regions): drop the low-order bits of the region identifier, e.g. drop µ_x and µ_y bits of the x and y coordinates
  - Location hiding: events are independently eliminated, replacing the location with ∅ with probability λ_h (the location-hiding level)
- To import an LPPM into the tool, specify its probability functions by importing the anonymization function and the obfuscation function
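A sketch of the two implemented obfuscations (Python; the grid-coordinate encoding of region identifiers is an assumption for illustration):

```python
import random

def reduce_precision(x: int, y: int, mu_x: int, mu_y: int) -> tuple[int, int]:
    """Merge regions by dropping the mu_x / mu_y low-order bits
    of the region's x and y grid coordinates."""
    return x >> mu_x, y >> mu_y

def hide_location(region, lambda_h: float):
    """Independently eliminate an event's location with probability lambda_h."""
    return None if random.random() < lambda_h else region

# Example: LPPM(mu_x=2, mu_y=3, lambda_h=0.9) applied to region (x=5, y=13).
coarse = reduce_precision(5, 13, mu_x=2, mu_y=3)   # -> (1, 1)
maybe_hidden = hide_location(coarse, lambda_h=0.9)
```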
Knowledge of the Adversary
Knowledge of the Adversary
- The adversary collects information about user mobility
- It can take the form of events, transitions, or full/partial traces
- This can be encoded as traces or as a transition-count matrix TC
- TC is an M × M matrix whose entry (i, j) counts the r_i-to-r_j transitions the user made that are not already encoded in the traces
- The adversary also considers the users' mobility constraints
Knowledge of the Adversary
- ADV models user mobility as a Markov chain, such that P_u is the user's transition matrix for their Markov chain: P^u_{ij} is the probability that the user moves from r_j to r_i in the next time slot
- Objective: construct P_u starting from the prior mobility information, with the bigger goal of estimating the underlying Markov chain (a sketch follows below)
- Fill in the training traces TT to obtain the estimated traces ET, exploiting the convergence of Gibbs sampling
- Markov chain: a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event
- Gibbs sampling: an MCMC technique suited to this task; the idea is to generate posterior samples by sweeping through each variable (or block of variables), sampling from its conditional distribution with the remaining variables fixed to their current values
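A minimal sketch (Python/NumPy, assuming a row-stochastic convention; the paper's indexing may differ) of estimating a mobility profile P_u by counting transitions in training traces and normalizing. The actual tool additionally fills gaps in the training traces via Gibbs sampling, which is not shown here:

```python
import numpy as np

def mobility_profile(training_traces, M: int) -> np.ndarray:
    """Estimate the user's M x M Markov transition matrix from
    training traces (lists of region indices), with add-one smoothing."""
    counts = np.ones((M, M))          # Laplace prior avoids zero rows
    for trace in training_traces:
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur, nxt] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # row-stochastic P_u

# Example: two short training traces over M = 4 regions.
P_u = mobility_profile([[0, 1, 1, 2], [0, 1, 3, 3]], M=4)
```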
Tracking Attack
- ADV tries to reconstruct partial/complete actual traces
- Maximum-likelihood tracking attack objective: find the jointly most likely traces for all users, given the observed traces
- This is done within a space of N!·M^{NT} elements, so a brute-force approach is not practical
Tracking Attack: Maximum-Likelihood Tracking Attack
- Proceeds through two steps: de-anonymization, then de-obfuscation
- De-anonymization (sketched below):
  - Cannot simply assign each trace its most probable user; multiple users may get the same trace
  - Compute the likelihood for all user-trace pairs
  - Create an edge-weighted bipartite graph (set of users vs. set of observed traces), where each edge weight is the user-trace likelihood
  - Find the maximum-weight assignment using the Hungarian algorithm
- The Hungarian method is a combinatorial-optimization algorithm that solves the assignment problem in polynomial time and anticipated later primal-dual methods
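A sketch of the de-anonymization step (Python with SciPy; the user-trace log-likelihood matrix is assumed to be precomputed from the mobility profiles and the LPPM):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# loglik[u, i] = log Pr(observed trace o_i | user u's mobility profile).
# Hypothetical 3-user example; in the tool these come from P_u and the LPPM.
loglik = np.log(np.array([[0.70, 0.20, 0.10],
                          [0.25, 0.60, 0.15],
                          [0.05, 0.20, 0.75]]))

# Maximum-weight bipartite assignment (what the Hungarian algorithm solves):
# assign each user one observed trace, maximizing the total log-likelihood.
users, traces = linear_sum_assignment(loglik, maximize=True)
sigma_hat = dict(zip(users, traces))   # estimated de-anonymization sigma
```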
Tracking Attack: Maximum-Likelihood Tracking Attack
- De-obfuscation: uses the Viterbi algorithm (sketched below)
- Maximizes the joint probability of the most likely trace: recursively compute the maximum probabilities up to time T
- But the interest is in the trace itself, so the maximizing path is recovered by backtracking
- Almost the same as finding a shortest path in an edge-weighted directed graph with vertex set ℛ × 𝒯
- The Viterbi algorithm is a dynamic-programming algorithm for finding the most likely sequence of hidden states (the Viterbi path) that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models
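A compact Viterbi sketch (Python/NumPy, hypothetical inputs): `P_u` is the user's transition matrix, `emit[t, r]` the probability of the observed obfuscated symbol at time t given actual region r, and `prior` the initial region distribution:

```python
import numpy as np

def viterbi(prior, P_u, emit):
    """Most likely actual region sequence given obfuscated observations."""
    T, M = emit.shape
    delta = np.log(prior) + np.log(emit[0])     # log-prob of best path so far
    back = np.zeros((T, M), dtype=int)          # backpointers for the path
    for t in range(1, T):
        scores = delta[:, None] + np.log(P_u)   # extend every path by one step
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(emit[t])
    # Backtrack from the best final state to recover the trace itself.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```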
Tracking Attack: Distribution Tracking Attack
- Computes the distribution of traces for each user, rather than only the most likely trace
- Uses the Metropolis-Hastings algorithm: draw samples from 𝒜 × Σ that are distributed according to the desired distribution
- MH performs a random walk over the possible values of (a, σ)
- Can answer a wide range of 𝒰–ℛ–𝒯 questions, but is very computationally intensive (a skeleton follows below)
- The Metropolis-Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution from which direct sampling is difficult
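An illustrative Metropolis-Hastings skeleton (Python; `log_posterior` and `propose` are hypothetical stand-ins for the joint posterior over (a, σ) and a local proposal, e.g. swapping two pseudonyms or re-sampling one event):

```python
import math
import random

def metropolis_hastings(state, log_posterior, propose, n_samples, burn_in=1000):
    """Random walk over candidate (a, sigma) values; the retained samples'
    frequencies approximate the posterior distribution over traces."""
    samples = []
    logp = log_posterior(state)
    for step in range(burn_in + n_samples):
        cand = propose(state)                    # symmetric proposal assumed
        logp_cand = log_posterior(cand)
        # Accept with probability min(1, posterior ratio).
        if random.random() < math.exp(min(0.0, logp_cand - logp)):
            state, logp = cand, logp_cand
        if step >= burn_in:
            samples.append(state)
    return samples
```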
Localization Attack
- Find the location of user u at time t
- Output: a distribution over possible regions, from which the most probable one can be selected
- The attacker needs the estimate of the user's observed trace (the maximum-weight assignment)
- The distribution can be computed using the Forward-Backward algorithm (sketched below)
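A sketch of the forward-backward computation (Python/NumPy, same hypothetical `prior`, `P_u`, `emit` as in the Viterbi sketch), yielding the posterior distribution over regions at every time t:

```python
import numpy as np

def forward_backward(prior, P_u, emit):
    """Posterior Pr(actual region at each time | observed trace)."""
    T, M = emit.shape
    alpha = np.zeros((T, M))
    beta = np.ones((T, M))
    alpha[0] = prior * emit[0]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ P_u) * emit[t]
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = P_u @ (emit[t + 1] * beta[t + 1])
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)   # one distribution per t
```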
Meeting Disclosure Attack
- Objective 1: given a pair of users (u and v), a region r, and a time t, did they meet there?
  - Computed as the product of the distributions of the two events, each established through a localization attack (see the sketch below)
- Objective 2: given just a pair of users, how often would they have met, and where?
  - Also answered using localization attacks
- Objective 3: given a location and a time, what is the expected number of users present?
  - Again answered through localization attacks
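For instance, assuming the two users' posteriors are treated as independent, the meeting probability is the product of the localization posteriors obtained from forward_backward(); a hypothetical sketch:

```python
# post_u, post_v: posteriors from forward_backward() for users u and v;
# post_u[t, r] = Pr(user u is in region r at time t | observations).
def meeting_probability(post_u, post_v, r: int, t: int) -> float:
    return post_u[t, r] * post_v[t, r]

# Expected number of meetings of u and v over the whole period, any region:
def expected_meetings(post_u, post_v) -> float:
    return float((post_u * post_v).sum())
```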
Using The Tool: Evaluation of LPPMs
Using The Tool: Evaluation of LPPMs
- Goals:
  - Use the Location-Privacy Meter to quantify the effectiveness of LPPMs
  - Evaluate the effectiveness of entropy and k-anonymity at quantifying location privacy
- Location samples: N = 20 users, 5-minute intervals for 8 hours (T = 96), Bay Area, M = 40 regions (5-by-8 grid)
- Privacy mechanism: precision reduction; traces anonymized using a random permutation (unique pseudonyms 1…N)
Using The Tool: Evaluation of LPPMs
- To consider the strongest adversary: feed the Knowledge Constructor (KC) with the users' actual traces
- 𝒰–ℛ–𝒯 attack scenarios:
  - LO-ATT (localization attack): for user u at time t, what is his location?
  - MD-ATT (meeting disclosure attack): in how many instances in 𝒯 are two users in the same region?
  - AP-ATT (aggregate presence attack): for a region r and time t, what is the expected number of users present at t?
- Metric: the adversary's incorrectness
Using The Tool: Evaluation of LPPMs
[Figure: LP_{LO-ATT}(u, t) for all users u and times t, under various LPPM(µ_x, µ_y, λ_h) settings; plotted metric: adversary incorrectness]
Using The Tool: Evaluation of LPPMs
[Figure: LP_{MD-ATT}(u, v) for all pairs of users u, v, under various LPPM(µ_x, µ_y, λ_h) settings; plotted metric: incorrectness of the number of meetings]
Using The Tool: Evaluation of LPPMs
[Figure: LP_{AP-ATT}(r, t) for all regions r and times t, under various LPPM(µ_x, µ_y, λ_h) settings; plotted metric: incorrectness of the number of users in a region]
Using The Tool: Evaluation of LPPMs
[Figure: users' privacy (x-axis) vs. normalized entropy (y-axis); ***: LPPM(2, 3, 0.9), strong mechanism; ···: LPPM(1, 2, 0.5), medium; ooo: LPPM(1, 0, 0.0), weak]
Using The Tool: Evaluation of LPPMs
[Figure: users' privacy (x-axis) vs. normalized k-anonymity (y-axis); ***: LPPM(2, 3, 0.9), strong mechanism; ···: LPPM(1, 2, 0.5), medium; ooo: LPPM(1, 0, 0.0), weak]
Conclusion
Conclusion
- A unified formal framework to describe and evaluate a variety of location-privacy preserving mechanisms with respect to various inference attacks
- LPPM evaluation is modeled as an estimation problem, and the expected estimation error metric is provided
- Designed the Location-Privacy Meter tool to evaluate and compare location-privacy preserving mechanisms
Questions
Framework: Notation
- 𝒰: set of mobile users; N: number of users
- ℛ: set of regions that partition the whole area; M: number of regions
- 𝒯: time period under consideration; T: number of considered time instants
- 𝒜: set of all possible traces; 𝒜_u: set of all possible (actual) traces of user u
- 𝒪: set of all observable traces; 𝒪_u: set of all possible obfuscated traces of user u; 𝒪_{σ(u)}: set of all observable traces of user u
- 𝒰′: set of user pseudonyms; N′: number of user pseudonyms
- ℛ′: set of location pseudonyms; M′: number of location pseudonyms
- f: obfuscation function; g: anonymization function
- a_u: actual trace of user u; o_u: obfuscated trace of user u; o_i: observed trace of the user with pseudonym i
- P_u: profile of user u
- φ(·): attacker's objective; 𝒳: set of values that φ(·) can take