Service-Oriented Architecture for Sharing Private Spatial-Temporal Data
Hani AbuSharkh and Benjamin C. M. Fung (fung (at) ciise.concordia.ca)
Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Canada
IEEE CSC 2011
The research is supported in part by a Discovery Grant (356065-2008) from the Natural Sciences and Engineering Research Council of Canada (NSERC).

Agenda
Motivating Scenario
Problem Description
Service-Oriented Architecture
Anonymization Algorithm
Empirical Study
Related Works
Summary and Conclusion

Motivating Scenario
Passengers use personal rechargeable smart/RFID cards for their travel. Transit companies want to share passengers' trajectory information with third parties for analysis. The data may contain person-specific sensitive information, such as age, disability status, and employment status. How can a transit company safeguard data privacy while keeping the released spatial-temporal data useful?
Source: http://www.stl.laval.qc.ca/

The Two Problems
How can a data miner identify an appropriate service provider? How can service providers share their private data without compromising their clients' privacy or the information utility of the data for mining?

Service-Oriented Architecture
1. Fetch the DB schema
2. Authenticate the data miner
3. Identify the contributing data providers
4. Initialize the session
5. Negotiate requirements
6. Anonymize the data
7. Share the data

Spatial-Temporal Data Table
Raw data <EPC#; loc; time>:
<EPC1; a; t1>, <EPC2; b; t1>, <EPC3; c; t2>, <EPC2; d; t2>, <EPC1; e; t2>, <EPC3; e; t4>, <EPC1; c; t3>, <EPC2; f; t3>, <EPC1; g; t4>
Paths:
EPC1: <a1 → e2 → c3 → g4>
EPC2: <b1 → d2 → f3>
EPC3: <c2 → e4>
Person-specific data:
[EPC1, Full-time], [EPC2, Part-time], [EPC3, On-welfare]
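The raw-reading-to-path step above can be sketched in a few lines. The readings and doublet encoding mirror the slide's example; the function name `build_paths` is illustrative, not from the paper:

```python
from collections import defaultdict

# Raw readings from the slide: (EPC, location, time) triples.
raw = [
    ("EPC1", "a", 1), ("EPC2", "b", 1), ("EPC3", "c", 2),
    ("EPC2", "d", 2), ("EPC1", "e", 2), ("EPC3", "e", 4),
    ("EPC1", "c", 3), ("EPC2", "f", 3), ("EPC1", "g", 4),
]

def build_paths(readings):
    """Group (epc, loc, time) triples into time-ordered paths of (loc, time) doublets."""
    paths = defaultdict(list)
    for epc, loc, t in readings:
        paths[epc].append((loc, t))
    return {epc: sorted(p, key=lambda d: d[1]) for epc, p in paths.items()}

paths = build_paths(raw)
# paths["EPC3"] == [("c", 2), ("e", 4)], i.e. the path <c2 -> e4>
```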

Spatial-Temporal Data Table
<(loc1t1) → … → (locntn)> : s1, …, sp
where (lociti) is a doublet indicating a location and a time, <(loc1t1) → … → (locntn)> is a path, and s1, …, sp are sensitive values.

Privacy Threats: Record Linkage
Assumption: an adversary knows at most L doublets about a target victim; L represents the power of the adversary.
q = <d2 → f6>, G(q) = {EPC#1, 4, 5}
q = <e4 → c7>, G(q) = {EPC#1}
A table T satisfies LK-anonymity if and only if |G(q)| ≥ K for every subsequence q with |q| ≤ L of any path in T, where G(q) is the set of records containing q and K is an anonymity threshold.
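For small tables, the LK-anonymity condition can be verified by brute force over all subsequences up to length L. A minimal sketch (exponential in L, illustrative only; the helper names are not from the paper):

```python
from itertools import combinations

def is_subsequence(q, path):
    """True if q appears in path in order (not necessarily contiguously)."""
    it = iter(path)
    return all(d in it for d in q)

def satisfies_lk_anonymity(table, L, K):
    """Check |G(q)| >= K for every subsequence q with |q| <= L of any path."""
    for path in table.values():
        for n in range(1, L + 1):
            for q in combinations(path, n):
                group = [r for r, p in table.items() if is_subsequence(q, p)]
                if len(group) < K:
                    return False
    return True

# The slide's example table: every doublet occurs in only one path,
# so the check already fails for L = 1, K = 2.
table = {
    "EPC1": [("a", 1), ("e", 2), ("c", 3), ("g", 4)],
    "EPC2": [("b", 1), ("d", 2), ("f", 3)],
    "EPC3": [("c", 2), ("e", 4)],
}
assert not satisfies_lk_anonymity(table, 1, 2)
```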

Privacy Threats: Attribute Linkage
q = <d2 → f6>, G(q) = {EPC#1, 4, 5}
Let S be a set of data-holder-specified sensitive values. A table T satisfies LC-dilution if and only if Conf(s|G(q)) ≤ C for every s ∈ S and every subsequence q with |q| ≤ L of any path in T, where Conf(s|G(q)) is the percentage of the records in G(q) containing s and C ≤ 1 is a confidence threshold.
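The LC-dilution condition can be checked the same way, by computing Conf(s|G(q)) for each group. A sketch under the same brute-force assumption, with illustrative names:

```python
from itertools import combinations

def is_subsequence(q, path):
    """True if q appears in path in order (not necessarily contiguously)."""
    it = iter(path)
    return all(d in it for d in q)

def satisfies_lc_dilution(table, sens, S, L, C):
    """Check Conf(s|G(q)) <= C for every sensitive value s in S and every
    subsequence q with |q| <= L of any path in the table."""
    for path in table.values():
        for n in range(1, L + 1):
            for q in combinations(path, n):
                group = [r for r, p in table.items() if is_subsequence(q, p)]
                for s in S:
                    if sum(sens[r] == s for r in group) / len(group) > C:
                        return False
    return True

# Two passengers share the same path; half the group is On-welfare,
# so LC-dilution holds for C = 0.5 but not for C = 0.4.
table = {"EPC1": [("a", 1)], "EPC2": [("a", 1)]}
sens = {"EPC1": "On-welfare", "EPC2": "Full-time"}
assert satisfies_lc_dilution(table, sens, {"On-welfare"}, 1, 0.5)
assert not satisfies_lc_dilution(table, sens, {"On-welfare"}, 1, 0.4)
```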

LKC-Privacy Model
A spatial-temporal data table T satisfies LKC-privacy if T satisfies both LK-anonymity and LC-dilution.
Privacy guarantee: given that the adversary's background knowledge is bounded by L doublets, LKC-privacy bounds the probability of a successful record linkage to ≤ 1/K and the probability of a successful attribute linkage to ≤ C.

Information Utility
q = <d2 → c7>, G(q) = {EPC#1, 4, 5, 7}
A sequence q is a frequent sequence if |G(q)| ≥ K′, where G(q) is the set of records in T containing q and K′ is a minimum support threshold.

Spatial-Temporal Anonymizer
A naïve approach would first enumerate all violating sequences and then remove them, which is not efficient. Instead, we repeatedly suppress the doublet d with the highest score:
ST-Anonymizer
1: Supp ← ∅;
2: while |V(T)| > 0 do
3:   select a doublet d with the maximum Score(d);
4:   Supp ← Supp ∪ {d};
5:   update Score(d′) for every doublet d′ such that some sequence in V(T) or F(T) contains both d and d′;
6: end while
7: return table T after suppressing the doublets in Supp;
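The slide does not spell out Score(d). In the spirit of the algorithm, the sketch below assumes a score that favors doublets appearing in many violating sequences and few frequent ones; both the score and the function name are illustrative choices, not the authors' definition:

```python
def greedy_suppress(violating, frequent):
    """Greedy suppression loop of ST-Anonymizer (sketch).
    violating / frequent: lists of sequences, each a collection of doublets.
    Assumed Score(d) = (# violating sequences containing d)
                     / (1 + # frequent sequences containing d)."""
    vs = [set(q) for q in violating]
    fs = [set(q) for q in frequent]
    supp = set()
    while vs:
        candidates = {d for q in vs for d in q}
        def score(d):
            return sum(d in q for q in vs) / (1 + sum(d in q for q in fs))
        d = max(candidates, key=score)
        supp.add(d)
        vs = [q for q in vs if d not in q]  # violations containing d are resolved
        fs = [q for q in fs if d not in q]  # frequent sequences containing d are lost
    return supp

# Suppressing "a1" resolves both violations and spares the frequent sequence <b2>:
assert greedy_suppress([["a1"], ["a1", "b2"]], [["b2"]]) == {"a1"}
```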

Border Representation
Violating Sequence (VS) border: the upper bound (UB) contains the minimal violating sequences; the lower bound (LB) contains the maximal sequences y with support |T(y)| ≥ 1.
Frequent Sequence (FS) border: the UB contains the doublets d with support |T(d)| ≥ max(K, K′); the LB contains the maximal sequences y with support |T(y)| ≥ K′, where K is the anonymity threshold and K′ is the minimum support threshold.

Minimal Violating Sequence
A sequence q with length ≤ L is a violating sequence with respect to an LKC-privacy requirement if |G(q)| < K or Conf(s|G(q)) > C.
A violating sequence q is a minimal violating sequence if every proper subsequence of q is not a violating sequence.

Suppose L = 2 and K = 2. <e4 → c7> is a minimal violating sequence because <e4> is not a violation and <c7> is not a violation.

Suppose L = 2 and K = 2. <d2 → e4 → c7> is a violating sequence but not minimal because <e4 → c7> is a violating sequence.

Intuition
Generate the minimal violating sequences of size i+1 by incrementally extending non-violating sequences of size i with an additional doublet. [Mohammed et al. (2009)]
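This incremental generation can be prototyped Apriori-style. The sketch below checks only the record-linkage condition |G(q)| < K (the confidence condition is analogous), assumes doublets are (loc, time) pairs extended forward in time, and uses helper names that are not from the paper:

```python
from itertools import combinations

def is_subsequence(q, path):
    """True if q appears in path in order (not necessarily contiguously)."""
    it = iter(path)
    return all(d in it for d in q)

def minimal_violating_sequences(table, L, K):
    """Build size-(i+1) candidates only from non-violating size-i sequences;
    keep a candidate as a minimal violating sequence when it violates and
    every proper subsequence does not."""
    def violates(q):
        return sum(is_subsequence(q, p) for p in table.values()) < K

    universe = sorted({d for p in table.values() for d in p},
                      key=lambda d: (d[1], d[0]))      # order doublets by time
    mvs, nonviolating = [], []
    for d in universe:
        (mvs if violates((d,)) else nonviolating).append((d,))
    for size in range(2, L + 1):
        next_nv = []
        for q in nonviolating:
            for d in universe:
                if d[1] <= q[-1][1]:
                    continue                           # extend forward in time only
                cand = q + (d,)
                if not any(is_subsequence(cand, p) for p in table.values()):
                    continue                           # candidate never occurs
                if not violates(cand):
                    next_nv.append(cand)
                elif all(not violates(sub) for n in range(1, size)
                         for sub in combinations(cand, n)):
                    mvs.append(cand)
        nonviolating = next_nv
    return mvs

# <a1> and <b2> each have support >= 2, but <c2> occurs once, so with
# L = 2, K = 2 the only minimal violating sequence is <c2>:
table = {"r1": [("a", 1), ("b", 2)],
         "r2": [("a", 1), ("b", 2)],
         "r3": [("a", 1), ("c", 2)]}
assert minimal_violating_sequences(table, 2, 2) == [(("c", 2),)]
```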

Counting Function
Consider a single edge ⟨x, y⟩ in a border. The counting function returns the number of sequences with maximum length L that are covered by ⟨x, y⟩ and are supersequences of a given sequence q.

Counting Function – Example

Suppressing Sequences
1. Select the doublet d with the maximum score to be suppressed
2. Get the affected edges
3. Compute the number of affected sequences
4. Update the scores
5. Update the borders by removing the violating sequences


Empirical Study – Dataset
Evaluate the performance of our proposed method on:
Utility loss: (|F(T)| − |F(T′)|) / |F(T)|, where |F(T)| and |F(T′)| are the numbers of frequent sequences before and after anonymization.
Scalability of anonymization.
Dataset: the Metro100K dataset consists of the travel routes of 100,000 passengers in the Montreal subway transit system with 65 stations. Each record corresponds to the route of one passenger.
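The utility-loss metric above is a one-line computation; this sketch only assumes the two sets of frequent sequences are available as Python collections:

```python
def utility_loss(fs_before, fs_after):
    """Utility loss from the slides: (|F(T)| - |F(T')|) / |F(T)|,
    the fraction of frequent sequences destroyed by anonymization."""
    return (len(fs_before) - len(fs_after)) / len(fs_before)

# If 100 frequent sequences exist before anonymization and 85 survive:
assert utility_loss(list(range(100)), list(range(85))) == 0.15
```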

Empirical Results – Utility Loss

Related Works
Anonymizing relational data:
Sweeney (2002): k-anonymity
Wang et al. (2005): Confidence bounding
Machanavajjhala et al. (2007): l-diversity
Wong et al. (2009): (α, k)-anonymity
Mohammed et al. (2009): LKC-privacy

Related Works
Anonymizing trajectory data:
Abul et al. (2008) proposed (k, δ)-anonymity based on space translation.
Pensa et al. (2008) proposed a variant of the k-anonymity model for sequential data, with the goal of preserving frequent sequential patterns.
Terrovitis and Mamoulis (2008) further assumed that different adversaries may possess different background knowledge and that the data holder has to be aware of all such adversarial knowledge.

Related Works
Fung et al. (in press) proposed an SOA for achieving LKC-privacy for relational data mashup (IEEE Transactions on Services Computing).
Xu et al. (2008) proposed a border-based anonymization method for set-valued data.
Fung et al. (2010): Privacy-preserving data publishing: a survey of recent developments (ACM Computing Surveys).

Summary and Conclusion
Studied the problem of privacy-preserving spatial-temporal data publishing.
Proposed a service-oriented architecture to determine an appropriate location-based service provider for a given data request.
Presented a border-based anonymization algorithm for spatial-temporal datasets.
Demonstrated the feasibility of simultaneously preserving both privacy and information utility for data mining.

Thank you! Questions?
Contact: Benjamin Fung <fung@ciise.concordia.ca>
Website: http://www.ciise.concordia.ca/~fung

References
O. Abul, F. Bonchi, and M. Nanni. Never walk alone: Uncertainty for anonymity in moving objects databases. In Proc. of the 24th IEEE International Conference on Data Engineering (ICDE), pages 376–385, 2008.
B. C. M. Fung, T. Trojer, P. C. K. Hung, L. Xiong, K. Al-Hussaeni, and R. Dssouli. Service-oriented architecture for high-dimensional private data mashup. IEEE Transactions on Services Computing (TSC), in press.
B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys, 42(4):14:1–14:53, June 2010.
A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam. ℓ-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data, 1(1):3, March 2007.

References
N. Mohammed, B. C. M. Fung, and M. Debbabi. Walking in the crowd: Anonymizing trajectory data for pattern analysis. In Proc. of the 18th ACM Conference on Information and Knowledge Management (CIKM), pages 1441–1444, November 2009.
N. Mohammed, B. C. M. Fung, P. C. K. Hung, and C. Lee. Anonymizing healthcare data: A case study on the blood transfusion service. In Proc. of the 15th ACM SIGKDD, pages 1285–1294, June 2009.
R. G. Pensa, A. Monreale, F. Pinelli, and D. Pedreschi. Pattern preserving k-anonymization of sequences and its application to mobility data mining. In Proc. of the International Workshop on Privacy in Location-Based Applications, 2008.
L. Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness, and Knowledge-based Systems, 10(5):571–588, 2002.

References
M. Terrovitis and N. Mamoulis. Privacy preservation in the publication of trajectories. In Proc. of the 9th International Conference on Mobile Data Management, pages 65–72, April 2008.
K. Wang, B. C. M. Fung, and P. S. Yu. Template-based privacy preservation in classification problems. In Proc. of the 5th IEEE International Conference on Data Mining (ICDM), pages 466–473, November 2005.
R. C. W. Wong, J. Li, A. W. C. Fu, and K. Wang. (α, k)-anonymous data publishing. Journal of Intelligent Information Systems, 33(2):209–234, October 2009.
Y. Xu, B. C. M. Fung, K. Wang, A. W. C. Fu, and J. Pei. Publishing sensitive transactions for itemset utility. In Proc. of the 8th IEEE International Conference on Data Mining (ICDM), December 2008.