1
Preserving Privacy in GPS Traces via Uncertainty-Aware Path Cloaking Baik Hoh, Marco Gruteser, Hui Xiong, Ansaf Alrabady Presented by Joseph T. Meyerowitz
2
Location Based Services ► Location Based Services (LBSs) are services that use the user's location in some way ► Examples: GPS navigation in your car, Microblog, etc. ► A growing field
3
Privacy Issues ► Giving your location to another party creates privacy concerns ► Two kinds of privacy are involved: location privacy and query privacy ► Example (location privacy): You need to visit the hospital and don't want anyone to know that you are at the hospital, but you ask an LBS for directions. ► Example (query privacy): You are at home and want to ask where the nearest hospital is.
4
Hospital Example ► Pseudonyms are insufficient because of temporal and spatial correlations in your GPS trace ► Identifying locations may be tied to sensitive locations [Figure: a GPS trace linking Home to Hospital]
5
Existing Work ► Location k-anonymity – Queries do not give a coordinate but instead give the LBS a region that encloses k users ► Path perturbation – Traces are perturbed to reduce the number of points that can be unambiguously assigned to a single user ► Subsampling – Same goal as perturbation, but data points are removed rather than perturbed
6
CliqueCloak ► Best published k-anonymity algorithm ► Data from vehicles in a 70 km × 70 km area
7
Overview ► Suggest a different metric, Time To Confusion (TTC) ► Create an algorithm to meet a TTC bound based on empirical data ► Less focus on road coverage metrics
8
Testbed – Traffic Monitoring ► 233 vehicles ► 1 sample per minute while the car is moving ► Data used for a hypothetical traffic management system ► Determined that 100 m spatial accuracy and a 1-sample/minute frequency were sufficient to determine which major road a car was on
9
Architecture
10
Empirical Data
11
► A gap of more than 10 minutes splits a trace into separate “trips”
12
Empirical Data ► The average trip time is about 10 minutes; thus, being tracked for 10 minutes may be enough to connect an identifying location with a sensitive location.
13
Privacy Metric and Adversary ► Adversaries can link correlated space/time anonymous coordinates into paths ► Here this is modeled with a simple, momentum-free extrapolation based on the current velocity ► Time to Confusion (TTC) is the length of time an adversary could correctly follow a trace ► Suggested as a good metric because a low TTC breaks the link between identifying locations and sensitive locations
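As a rough illustration of that adversary model, here is a minimal Python sketch; the function name and the one-sample-per-minute interval are my assumptions, not code from the paper.

```python
# Minimal sketch (my naming, not the authors' code) of the simple adversary:
# predict the target's next position from its last reported position and
# velocity only, one sampling interval ahead.

def predict_next_position(x, y, vx, vy, dt=60.0):
    """Extrapolate dt seconds ahead (1 sample/minute in this testbed)
    using only the current velocity, with no acceleration or history."""
    return x + vx * dt, y + vy * dt
```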
14
Privacy Metric and Adversary ► Tracking uncertainty: H = -Σ p_i log(p_i) ► p_i is the probability that a location sample belongs to a given user ► Tracking confidence: C = 1 – H ► p_i = exp(-d_i/μ) ► μ is taken from the empirical PDF of trip times ► d_i is the distance from the predicted location ► In this dataset, μ = 2094 meters ► One must choose an H threshold
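The per-sample uncertainty calculation on this slide can be sketched as follows; normalizing the exp(-d/μ) weights into probabilities and using a base-2 logarithm are my assumptions (base 2 matches the H = 0.4 vs. 0.92-confidence figure later in the talk).

```python
import math

MU = 2094.0  # meters, the empirical value quoted on the slide

def tracking_uncertainty(distances, mu=MU):
    """Entropy-based tracking uncertainty at one time step.

    distances: distance in meters from the adversary's predicted location
    to each candidate sample. Each candidate is weighted by exp(-d/mu);
    the weights are normalized to probabilities (my assumption) before
    computing H = -sum(p_i * log2(p_i)).
    """
    weights = [math.exp(-d / mu) for d in distances]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0)

def tracking_confidence(distances, mu=MU):
    """Tracking confidence C = 1 - H."""
    return 1.0 - tracking_uncertainty(distances, mu)

# Example: one clearly closest candidate gives low uncertainty, while two
# nearly equidistant candidates give uncertainty close to 1.
# tracking_uncertainty([100.0, 10000.0])  -> ~0.07
# tracking_uncertainty([100.0, 120.0])    -> ~1.0
```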
15
Proposed Solution ► A maximum time to confusion can be guaranteed if a sample is revealed only when the time since the last point of confusion is less than the maximum time to confusion ► A point of confusion is a point where tracking uncertainty is above the confusion threshold (H_i > H_thresh)
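A minimal sketch of this release rule, under my own naming, with an illustrative maximum time to confusion; how the very first samples of a trip are handled is simplified here.

```python
H_THRESH = 0.4      # confusion threshold (the value used later in the talk)
MAX_TTC = 5 * 60.0  # maximum time to confusion in seconds (illustrative choice)

class PathCloaker:
    """Reveal a sample only while the time since the last point of
    confusion is within the maximum time to confusion."""

    def __init__(self, h_thresh=H_THRESH, max_ttc=MAX_TTC):
        self.h_thresh = h_thresh
        self.max_ttc = max_ttc
        self.last_confusion_time = None

    def maybe_release(self, t, h):
        """t: timestamp in seconds, h: tracking uncertainty of this sample."""
        if h > self.h_thresh:
            self.last_confusion_time = t  # this sample is a point of confusion
        if self.last_confusion_time is None:
            return False                  # no confusion observed yet: suppress
        return (t - self.last_confusion_time) <= self.max_ttc
```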
16
Proposed Solution ► An adversary may simply discard points with high H ► The path may still be determinable even with such individual points removed (reacquisition after a gap) ► The empirical CDF of reacquisition shows what proportion of reacquisitions can occur after a given time gap; the time gaps are taken from the empirical data ► Remember that each minute is one data point in this system
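For reference, an empirical CDF over reacquisition time gaps can be computed along these lines; this is my construction, not code from the paper.

```python
def empirical_cdf(gaps_seconds):
    """Return (sorted gaps, cumulative fractions): for each gap value,
    the fraction of observed reacquisition gaps that are <= that value."""
    xs = sorted(gaps_seconds)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]
```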
17
Empirical Reacquisition CDF
18
Extension ► Calculate confusion/uncertainty over the past ten minutes ► After the maximum time to confusion: release samples if the past 10 minutes contain an aggregate uncertainty value above the threshold ► Before the maximum time to confusion: release samples if the past 10 minutes plus all samples since the last release contain an aggregate uncertainty value above the threshold
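A sketch of this windowed variant; summing the per-sample uncertainties as the "aggregate uncertainty" and all names here are my assumptions.

```python
WINDOW = 10 * 60.0  # seconds, the "past ten minutes" window from the slide

def window_release_ok(history, since_last_release, now, past_max_ttc, h_agg_thresh):
    """history: (timestamp, H) pairs observed so far for this vehicle.
    since_last_release: (timestamp, H) pairs for samples after the last release.
    past_max_ttc: True once the maximum time to confusion has elapsed.
    """
    # Samples from the past ten minutes.
    pool = {t: h for (t, h) in history if now - t <= WINDOW}
    if not past_max_ttc:
        # Before the maximum time to confusion, also count everything since
        # the last release (timestamps as keys avoid double-counting).
        pool.update(dict(since_last_release))
    aggregate_h = sum(pool.values())
    return aggregate_h >= h_agg_thresh
```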
19
Evaluation ► Added traces from the same drivers over different days to reach the desired density ► Simulated high-density and low-density systems with n = 2000 and n = 500 vehicles ► Privacy was measured by maximum time to confusion and median time to confusion ► Data quality was measured by relative weighted road coverage
20
Evaluation ► Black dots are suppressed samples, gray dots are released samples
21
Does it work? ► Evaluated first without reacquisition ► Compared against a baseline of random sampling ► Uncertainty threshold set to H = 0.4 ► H = 0.4 means the tracker needs to believe that the next sample has a 0.92 chance of belonging to the correct target
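A quick sanity check of those two numbers, assuming a two-candidate case and base-2 entropy:

```python
import math

# With probability 0.92 on the correct target and 0.08 on the alternative,
# the binary entropy is about 0.40, matching the stated threshold.
p = 0.92
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(round(H, 2))  # -> 0.4
```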
22
Does it work? (n=2000)
24
Does it work? (n=500)
25
Release Quantity
26
Continuing Problems ► No defenses against a priori knowledge ► Requires a centralized location server ► All users in this dataset worked at the same site, artificially aiding the algorithm in finding places of high confusion ► The modeled tracker is crude – knowledge of road topology may allow more accurate tracking
27
Takeaway Concepts ► Path entropy can be calculated for intelligent suppression/subsampling of GPS traces ► Tracking can be made more difficult ► Time to Confusion is a useful privacy metric: bounding it breaks the link between identifying locations and sensitive locations
28
My Critique ► No guidance for choosing confusion threshold values ► The algorithm will still fail in low-density situations by obscuring too many data points (the authors claim low-density areas are irrelevant because they target traffic management) ► They tested on the same empirical data they optimized for – where's the cross-validation? ► Does not protect short trips at all
29
My Conclusion ► Anonymity and privacy are difficult, especially because they are volatile and contextual ► Existing methods cope poorly with low density, but are improving ► Early adoption phases will require better low-density methods ► Hot research topic – an ACM workshop on network data anonymization is coming up if you're interested
30
Questions?
31
Presenter can be reached at jtm10@duke.edu