Published by Amy O’Brien. Modified over 9 years ago.
1
Differentially Private Transit Data Publication: A Case Study on the Montreal Transportation System Rui Chen, Concordia University Benjamin C. M. Fung, Concordia University Bipin C. Desai, Concordia University Nériah M. Sossou, Société de transport de Montréal
2
2 Outline Introduction Related Work Preliminaries Sanitization Algorithm Experimental Results Conclusion 2
3
3 The STM Story 3 The Société de transport de Montréal (STM) is the public transit agency in Montreal area. The smart card automated fare collection system generates and collects huge volume of transit data every day. Transit data needs to be shared for many reasons.
4
4 Transit Data Transit data, a kind of sequential data, consists of sequences of time-ordered locations. 4 A station in the STM network
5
5 Privacy Threats 5 Alice visited L4 and then L1
6
6 Privacy Threats 6 Alice also visited L2 …
7
7 Differential Privacy [1] 7 Pr M [M(D) = D*] ≤ exp(ε) × Pr M [M(D’) = D*]
8
8 Technical Challenges 8 Suppose there are 1,000 stations in the STM network Suppose the maximum number of stations visited by a passenger is 20 Traditional differentially private mechanisms are data- independent: Computationally infeasible sequences to consider!
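To make the scale concrete, the following minimal sketch counts the candidate sequences under the two suppositions on the slide. It assumes that a sequence may have any length from 1 to 20 and that stations may repeat; the function name `count_candidate_sequences` is illustrative and not from the presentation.

```python
# Count the output domain a data-independent mechanism would have to enumerate.
NUM_STATIONS = 1_000  # stations in the network (supposition from the slide)
MAX_LENGTH = 20       # maximum stations visited by one passenger (from the slide)


def count_candidate_sequences(num_stations: int, max_length: int) -> int:
    """Number of location sequences of length 1..max_length, repeats allowed (assumption)."""
    return sum(num_stations ** length for length in range(1, max_length + 1))


total = count_candidate_sequences(NUM_STATIONS, MAX_LENGTH)
print(f"{total:.3e} candidate sequences")
```

Under these assumptions the count is dominated by the longest sequences (1,000 to the power 20), which is why enumerating the whole domain is not an option.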
9
9 Contributions 9 The first practical solution for publishing real-life sequential data under differential privacy A study of the real-life transit data sharing scenario at the STM The use of a hybrid-granularity prefix tree for data- dependent publication and an efficient implementation based on a statistical process Enforcement of two sets of consistency constraints Seamless extension to trajectory data
Related Work
Abul et al. [2] achieve (k, δ)-anonymity by space translation.
Terrovitis and Mamoulis [3] limit an adversary’s confidence in inferring the presence of a location by global suppression.
Yarovoy et al. [4] k-anonymize a moving object database (MOD) by considering timestamps as the quasi-identifiers.
Chen et al. [5] achieve the (K, C)_L-privacy model by local suppression.
Is it possible to employ a much stronger privacy model while achieving desirable utility?
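Laplace Mechanism [1]
ε: privacy parameter (privacy budget)
Δf: global sensitivity (e.g., the maximum change of f due to the change of a single record)
The mechanism adds noise drawn from the Laplace distribution with scale Δf/ε, whose density is $p(x) = \frac{\varepsilon}{2\Delta f} \exp\left(-\frac{\varepsilon |x|}{\Delta f}\right)$.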
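A minimal sketch of the Laplace mechanism in Python, using only the standard library. The function names and the example count of 1,200 are illustrative assumptions, not taken from the presentation. Noise is sampled as the difference of two independent exponential variables, which follows a Laplace distribution with the same scale.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return true_value plus Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical example: a count with sensitivity 1, released with epsilon = 1.
rng = random.Random(0)
noisy_count = laplace_mechanism(1200, sensitivity=1.0, epsilon=1.0, rng=rng)
print(noisy_count)
```

A smaller ε gives a larger noise scale, so stronger privacy costs accuracy.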
Composition Properties
Sequential composition: $\sum_i \varepsilon_i$-differential privacy
Parallel composition: $\max(\varepsilon_i)$-differential privacy
Prefix Tree
A simple but effective way to explore the entire output domain
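As an illustration of the data structure only (no noise and no privacy guarantee), the sketch below computes exact prefix counts for a toy dataset. Each key is a prefix, i.e. a root-to-node path in the prefix tree, and its value is the number of sequences that start with that prefix. The toy sequences and the name `prefix_counts` are assumptions made for this example.

```python
from collections import Counter


def prefix_counts(sequences):
    """Map each prefix (as a tuple) to the number of sequences that start with it."""
    counts = Counter()
    for seq in sequences:
        for end in range(1, len(seq) + 1):
            counts[tuple(seq[:end])] += 1
    return counts


# Toy data, invented for illustration.
toy = [
    ["L1", "L2", "L3"],
    ["L1", "L2"],
    ["L3", "L2", "L1"],
    ["L1", "L2", "L4"],
]
tree = prefix_counts(toy)
for prefix, count in sorted(tree.items()):
    print(" -> ".join(prefix), count)
```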
Utility Requirements
Count query: e.g., how many passengers have visited both Guy-Concordia and McGill stations?
Frequent sequential pattern mining: e.g., what are the most popular sequences of stations being visited?
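A count query of the kind described above can be sketched as follows. The passenger sequences are invented toy data, and `count_query` is a hypothetical helper name; the query counts sequences that contain every required station, in any order.

```python
def count_query(sequences, required_stations):
    """Count the sequences that contain all of the required stations."""
    required = set(required_stations)
    return sum(1 for seq in sequences if required.issubset(seq))


# Toy passenger sequences, invented for illustration.
trips = [
    ["Guy-Concordia", "Peel", "McGill"],
    ["McGill", "Guy-Concordia"],
    ["Guy-Concordia", "Peel"],
    ["Berri-UQAM", "McGill"],
]
answer = count_query(trips, ["Guy-Concordia", "McGill"])
print(answer)
```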
Sanitization Algorithm
Complexity:
Noisy Prefix Tree
Each level consists of two sub-levels with different location granularities.
Each level receives a privacy budget of ε/h.
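The budget split can be illustrated with a deliberately simplified sketch. It is not the hybrid-granularity algorithm of the presentation: it uses a single granularity, enumerates every child of every retained node (feasible only for a tiny location universe), and prunes with a fixed threshold. It assumes that adding or removing one sequence changes at most one count per level by 1, so each level uses Laplace noise of scale h/ε, and the h levels compose sequentially to ε. All names and parameters are illustrative.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_prefix_tree(sequences, locations, epsilon, height, threshold, seed=0):
    """Build noisy prefix counts level by level, spending epsilon / height per level."""
    rng = random.Random(seed)
    level_epsilon = epsilon / height  # per-level budget (sequential composition)

    # Exact prefix counts, truncated at the tree height.
    exact = Counter()
    for seq in sequences:
        for end in range(1, min(len(seq), height) + 1):
            exact[tuple(seq[:end])] += 1

    released = {}
    frontier = [()]  # start from the root (the empty prefix)
    for _ in range(height):
        next_frontier = []
        for prefix in frontier:
            # Every possible child gets a noisy count, including empty ones.
            for loc in locations:
                child = prefix + (loc,)
                noisy = exact[child] + laplace_noise(1.0 / level_epsilon, rng)
                if noisy >= threshold:  # keep and expand only "non-empty" nodes
                    released[child] = noisy
                    next_frontier.append(child)
        frontier = next_frontier
    return released


toy = [["L1", "L2", "L3"], ["L1", "L2"], ["L3", "L2", "L1"], ["L1", "L2", "L4"]]
tree = noisy_prefix_tree(toy, ["L1", "L2", "L3", "L4"], epsilon=1.0, height=3, threshold=2.0)
print(len(tree), "nodes released")
```

Because every child of a retained node is examined, the cost of this naive version grows with the size of the location universe at each expansion.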
Efficient Implementation
Separately handle empty and non-empty nodes
Hybrid-Granularity
For an empty node on level i, we reduce noise by a factor of [formula not recovered]
Consistency Constraints
For any root-to-leaf path p, [formula not recovered], where v_i is a child of v_{i+1}.
For each node v, [formula not recovered]
Consistency Enforcement
Constraint Type Ⅰ [6]
Constraint Type Ⅱ
STM Datasets
Real-life STM datasets are used for evaluation:
Datasets   |D|       |L|   max|S|   avg|S|
Metro      847,668   68    90       4.21
Bus        778,724   944   121      5.67
Average Relative Error vs. ε
[Figures: two slides]
Average Relative Error vs. h
[Figures: two slides]
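The slides do not define the error measure. A common choice for count queries, assumed here, is the relative error with a sanity bound s in the denominator, so that queries with very small true answers do not dominate the average. The function names, the sanity bound value, and the sample numbers below are illustrative.

```python
def relative_error(true_answer, noisy_answer, sanity_bound):
    """|noisy - true| / max(true, sanity_bound); the sanity bound is an assumption."""
    return abs(noisy_answer - true_answer) / max(true_answer, sanity_bound)


def average_relative_error(answer_pairs, sanity_bound):
    """Mean relative error over (true_answer, noisy_answer) pairs."""
    errors = [relative_error(t, n, sanity_bound) for t, n in answer_pairs]
    return sum(errors) / len(errors)


# Invented (true, noisy) answers for two count queries.
pairs = [(100, 90), (0, 5)]
avg = average_relative_error(pairs, sanity_bound=10)
print(avg)
```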
Utility vs. k
k     TP (M/B) Simple   FP (FD) (M/B) Simple   TP (M/B) Hybrid   FP (FD) (M/B) Hybrid
100   99/97             1/3                    100/100           0/0
150   143/139           7/11                   149/144           1/6
200   178/168           22/32                  185/177           15/23
250   209/195           41/55                  220/209           30/41
300   241/212           59/88                  257/233           43/67
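The slides do not spell out the abbreviations. Reading TP as true positives, FP as false positives and FD as false drops (an assumption), the measures for top-k frequent pattern mining can be sketched as below. When both lists hold exactly k distinct patterns, FP and FD are equal and TP + FP = k, which matches the rows of the table above. The pattern tuples are invented.

```python
def top_k_utility(true_top_k, released_top_k):
    """Return (true positives, false positives, false drops) for two top-k pattern lists."""
    true_set = set(true_top_k)
    released_set = set(released_top_k)
    tp = len(true_set & released_set)  # frequent in both datasets
    fp = len(released_set - true_set)  # reported, but not truly in the top k
    fd = len(true_set - released_set)  # truly in the top k, but missed
    return tp, fp, fd


# Invented top-3 patterns from an original and a sanitized dataset.
original = [("L1", "L2"), ("L2", "L3"), ("L1",)]
sanitized = [("L1", "L2"), ("L1",), ("L4",)]
tp, fp, fd = top_k_utility(original, sanitized)
print(tp, fp, fd)
```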
Utility vs. ε
ε      TP (M/B) Simple   FP (FD) (M/B) Simple   TP (M/B) Hybrid   FP (FD) (M/B) Hybrid
0.5    227/194           73/106                 244/215           56/85
0.75   239/206           61/94                  253/224           47/76
1.0    241/212           59/88                  257/233           43/67
1.25   243/216           57/84                  259/238           41/62
1.5    248/224           52/76                  261/242           39/58
Utility vs. h
h    TP (M/B) Simple   FP (FD) (M/B) Simple   TP (M/B) Hybrid   FP (FD) (M/B) Hybrid
6    234/212           66/88                  241/221           59/79
8    240/217           60/83                  254/232           46/68
10   241/215           58/85                  255/236           45/64
12   241/212           59/88                  257/233           43/67
14   241/212           59/88                  258/233           42/67
16   240/210           60/90                  258/231           42/69
18   240/209           60/91                  255/230           45/70
20   238/206           62/94                  254/228           46/72
Scalability
[Figure]
Conclusion
It is possible to publish useful transit data (sequential data) under differential privacy.
Generally, a data-dependent solution outperforms a data-independent solution.
It is worth exploring the utility of released data for more complex data analysis tasks.
It is important to educate transport service practitioners.
References
[1] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[2] O. Abul, F. Bonchi, and M. Nanni. Never walk alone: Uncertainty for anonymity in moving objects databases. In ICDE, 2008.
[3] M. Terrovitis and N. Mamoulis. Privacy preservation in the publication of trajectories. In MDM, 2008.
[4] R. Yarovoy, F. Bonchi, L. V. S. Lakshmanan, and W. H. Wang. Anonymizing moving objects: How to hide a MOB in a crowd? In EDBT, 2009.
[5] R. Chen, B. C. M. Fung, N. Mohammed, and B. C. Desai. Privacy-preserving trajectory data publishing by local suppression. Information Sciences, in press.
[6] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. PVLDB, 2010.
Q&A
Thank You Very Much
Back-up Slides
Detailed Algorithm