Anonymizing Location-based data Jarmanjit Singh Jar_sing(at)encs.concordia.ca Harpreet Sandhu h_san(at)encs.concordia.ca Qing Shi q_shi(at)encs.concordia.ca.

Presentation transcript:

Anonymizing Location-based data. Jarmanjit Singh Jar_sing(at)encs.concordia.ca, Harpreet Sandhu h_san(at)encs.concordia.ca, Qing Shi q_shi(at)encs.concordia.ca, Benjamin Fung fung(at)ciise.concordia.ca. Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8. C3S2E-2009. The research is supported in part by the Discovery Grants ( ) from Natural Sciences and Engineering Research Council of Canada (NSERC).

1 Overview  RFID basics  RFID data publishing  Problem statement  Proposed algorithms  Evaluation  Conclusion

2 RFID basics  Radio frequency identification  Uses radio frequency (RF) to identify (ID) objects.  Wireless technology  That allows a sensor (reader) to read, from a distance, and without line of sight, a unique electronic product code (EPC) associated with a tag. Tag Reader

3 Data flow in RFID system This is where we use anonymization algorithms to preserve the privacy of the data to be published.

4 Motivating example Alice has used her RFID-based credit card at a grocery store, dental clinic, shopping mall, beer bar, casino, AIDS clinic, etc. Assume that Eve has seen Alice using her card at the grocery store and the shopping mall. If the RFID company publishes its data and there is only one record containing both the grocery store and the shopping mall, then Eve can immediately infer that this record belongs to Alice and can also learn the other locations Alice visited. How can the RFID company safeguard data privacy while keeping the released RFID data useful?

5 RFID database (Figure: RFID database and person-specific path table, with one path per tag: EPC1, EPC2, EPC3.) Here (loc i t i ) is a pair indicating a location and a time (called a transaction), and a sequence of such pairs is a path (called a record).

6 Attacker knowledge Suppose the adversary knows that the target victim, Alice, has visited e and c at times 4 and 7, respectively. If there is only one record containing e4 and c7, then the attacker can easily infer that this record belongs to Alice and can also learn the other locations visited by Alice.

7 Problem statement We model attacker knowledge by I: the attacker can learn a maximum of I transactions within any record, because knowledge is constrained by the "effort" required to learn it. We transform the person-specific path database D into a (K,I)-anonymized database D’ such that no attacker with prior knowledge of m transactions of a record r ∈ D, where m ≤ I, can use that knowledge to identify fewer than K records from D’.

8 Problem statement cont. A table T satisfies (K,I)-anonymity if and only if r ≥ K for any subsequence s with |s| ≤ I of any path in T, where r is the number of records containing s and K is an anonymity threshold. Assume attacker knowledge I = 2 and K = 3. s =, r = 1 s =, r = 3. The first case (r = 1 < K) violates the threshold; the second (r = 3 ≥ K) meets it.
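The definition above can be checked by brute force. The following is a minimal sketch, not the authors' implementation. It assumes each transaction is written as a string such as "e4" (location letter followed by timestamp) and that timestamps within a record are distinct, so "record contains s" reduces to a set-containment test. The names `support` and `is_k_i_anonymous` are hypothetical.

```python
from itertools import combinations

def support(subseq, table):
    """Number of records that contain every transaction in subseq."""
    needed = set(subseq)
    return sum(1 for record in table if needed <= set(record))

def is_k_i_anonymous(table, K, I):
    """True if every subsequence of size <= I of every record has support >= K."""
    for record in table:
        for size in range(1, I + 1):
            for subseq in combinations(record, size):
                if support(subseq, table) < K:
                    return False
    return True

# Hypothetical toy tables, not taken from the slides.
safe = [["a1", "b3", "e4"], ["a1", "b3", "e4"], ["a1", "b3", "e4"]]
unsafe = [["a1", "b3", "e4"], ["a1", "b3", "e4"], ["a1", "b3", "e4", "c7"]]
print(is_k_i_anonymous(safe, K=3, I=2))
print(is_k_i_anonymous(unsafe, K=3, I=2))
```

The check is exponential in I, which is acceptable only because I models a small amount of attacker knowledge.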

9 This is easily said, but how do we transform database D into a version D’ that is immunized against re-identification attacks?

10 Proposed method: Three steps. Pre-suppression: first, we scan D to find items with support < K and delete them from D to get D pre. Generate subsets of size-i: we generate subsets of size-I from D pre and make an additional scan to count their support. Add dummy records: we make infrequent subsets frequent by using the IF-anonymity algorithm.

11 Subset generation: generate subsets of size-i. Subsets are generated in increasing lexicographical order, which means we do not consider the reverse combinations of transactions within a record. The size of the subsets generated should not exceed I. Suppose I = 2. For one record: {a1, d2}, {a1, b3}, {a1, e4}, {a1, f6}, {a1, c7}, {d2, b3}, {d2, e4}, {d2, f6}, {d2, c7}, {b3, e4}, {b3, f6}, {b3, c7}, {e4, f6}, {e4, c7}, {f6, c7}. For another record: {b3, e4}, {b3, f6}, {b3, e8}, {e4, f6}, {e4, e8}, {f6, e8}. Do this for all records.
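The enumeration above can be sketched with the standard library. This is an illustrative sketch under stated assumptions: transactions are strings with a single-letter location followed by an integer timestamp, and the helper name `size_i_subsets` is hypothetical. Sorting by timestamp before calling `itertools.combinations` yields each subset once, in path order, so reverse combinations never appear.

```python
from itertools import combinations

def size_i_subsets(record, i):
    """All size-i subsets of a record, in time order, without reversed duplicates."""
    ordered = sorted(record, key=lambda t: int(t[1:]))  # sort by timestamp
    return list(combinations(ordered, i))

first = size_i_subsets(["a1", "d2", "b3", "e4", "f6", "c7"], 2)
second = size_i_subsets(["b3", "e4", "f6", "e8"], 2)
print(len(first), first[0], first[-1])
print(second)
```

For the two records on the slide this produces the same 15 and 6 pairs, in the same order.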

12 Count support Count the support for each subset and identify frequent and infrequent subsets. Frequent subsets: these subsets have support value ≥ K. Infrequent subsets: these subsets have support value < K, and we need to add dummy records to make them (K,I)-anonymous.
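A minimal sketch of the counting step, assuming the same string encoding of transactions ("b3" means location b at time 3). The function name `split_by_support` and the toy table are hypothetical, and a single pass with `collections.Counter` stands in for the extra database scan mentioned on the slides.

```python
from collections import Counter
from itertools import combinations

def split_by_support(table, i, K):
    """Count size-i subsets over all records and split them by the threshold K."""
    counts = Counter()
    for record in table:
        ordered = sorted(set(record), key=lambda t: int(t[1:]))
        counts.update(combinations(ordered, i))  # each subset counted once per record
    frequent = {s: c for s, c in counts.items() if c >= K}
    infrequent = {s: c for s, c in counts.items() if c < K}
    return frequent, infrequent

# Hypothetical toy table, not taken from the slides.
table = [["b3", "e4", "c7"], ["b3", "e4"], ["b3", "e4", "f6"]]
frequent, infrequent = split_by_support(table, i=2, K=3)
print(frequent)
print(sorted(infrequent))
```

Only the infrequent subsets are passed on to the dummy-record step.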

13 Pre-suppression If a transaction itself does not meet the support value, then any superset containing this transaction will not meet the support value either.
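The pre-suppression scan can be sketched as follows. This is an illustration rather than the authors' code: the name `pre_suppress` is hypothetical, and dropping records that become empty after suppression is an assumption the slides do not state.

```python
from collections import Counter

def pre_suppress(table, K):
    """Delete every transaction whose own support is below K."""
    # Support of a single transaction = number of records containing it.
    counts = Counter(t for record in table for t in set(record))
    kept = [[t for t in record if counts[t] >= K] for record in table]
    # Assumption: records left empty by suppression are dropped.
    return [record for record in kept if record]

# Hypothetical toy table: a1 and c7 each appear in only one record.
table = [["a1", "b3", "e4"], ["b3", "e4", "c7"], ["b3", "e4"]]
d_pre = pre_suppress(table, K=3)
print(d_pre)
```

Because no superset of a suppressed transaction could reach support K, removing it first shrinks the number of subsets that the later steps have to generate and count.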

14 Pre-suppression: Example Suppose K = 3. (Figure labels: infrequent subsets; subsets containing ‘a1’.)

15 What is a dummy record? Dummy records are fake records inserted into a database in order to make infrequent subsets meet the support value. Some properties of adding a dummy record: Property 1: the length of a dummy record should not exceed the maximum length. Property 2: the transactions within a dummy record should have a reasonable time difference.
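The two properties can be expressed as a small validity check. This is a sketch under assumptions: transactions are strings such as "e4", the table `min_gap` of minimum travel times between locations is hypothetical (the slides say it can be learned from D or taken from geographical databases), and a default gap of 1 time unit for unlisted location pairs is an arbitrary choice made here.

```python
def when(t):
    """Timestamp of a transaction string such as 'e4'."""
    return int(t[1:])

def is_plausible_dummy(dummy, max_len, min_gap):
    """Check Property 1 (length) and Property 2 (reasonable time difference)."""
    if len(dummy) > max_len:
        return False
    ordered = sorted(dummy, key=when)
    for prev, nxt in zip(ordered, ordered[1:]):
        # Assumption: unlisted location pairs need at least 1 time unit.
        needed = min_gap.get((prev[0], nxt[0]), 1)
        if when(nxt) - when(prev) < needed:
            return False
    return True

# Hypothetical minimum travel times between locations.
min_gap = {("e", "c"): 2, ("b", "c"): 3}
print(is_plausible_dummy(["e4", "c7"], max_len=5, min_gap=min_gap))
print(is_plausible_dummy(["b6", "c7"], max_len=5, min_gap=min_gap))
```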

16 Process to add dummy record Construct a tree out of the infrequent subsets. Two properties must hold: reasonable time difference and length of dummy record. We can get the minimum reasonable time difference between any two locations either by learning it from D or by using geographical databases. (Figure: tree rooted at Null with nodes e4: 3, b6: 2, d2: 1, c7: 2, g9: 1, e4: 1, a5: 1, b6: 1.)

17 Divide tree if time conflict Rule 1: Let β be the set of nodes at level 1 of the tree, and let ‘n’ be the node at which the tree needs to be divided. Let γ be the set of children nodes of ‘n’. Suppose there exists a non-empty intersection α between β and γ, that is, β ∩ γ = α ≠ ∅, and let δ be the set of children nodes of α. If |δ ∩ γ| ≥ |δ| / 2, we separate ‘n’ and α, along with their children nodes (γ and δ respectively), from the original tree to construct a different tree.

18 Divide tree if large Count the number of nodes in each tree, excluding Null. If any tree has more nodes than the threshold λ, divide the tree again by taking a ratio: let X be the number of nodes in the tree, with X > λ; the ratio is X / λ.

19 Divide tree cont. Rule 2: Suppose the number of nodes at level 1 of the tree is |1x|. If the ratio X / λ ≥ |1x|, we divide the tree for each node at level 1 and compute the ratio again for each resulting tree. If the ratio X / λ < |1x|, we divide the tree at level 1 by combining into one tree the level-1 nodes that have more intersecting children.

20 Add dummy Once each tree satisfies X ≈ λ, we can write a dummy record by following Rule 3. Rule 3: Let X1 be the set of nodes at level 1, X2 the set of nodes at level 2, ..., and Xm the set of nodes at level m. All sets have their values in ascending order by time. We get the dummy record by taking the union of (X1, X2, ..., Xm).
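Rule 3 can be sketched as a level-by-level walk over the tree. This is a minimal illustration, not the authors' implementation: the tree is assumed to be a nested dict mapping a node label such as "e4" to its subtree, with the Null root left implicit, and the names `levels_of` and `dummy_from_tree` are hypothetical.

```python
def when(t):
    """Timestamp of a transaction string such as 'e4'."""
    return int(t[1:])

def levels_of(tree):
    """Node sets per level (level 1 first), each sorted in ascending time order."""
    levels, frontier = [], [tree]
    while frontier:
        labels, next_frontier = set(), []
        for subtree in frontier:
            for label, children in subtree.items():
                labels.add(label)
                next_frontier.append(children)
        if labels:
            levels.append(sorted(labels, key=lambda t: (when(t), t)))
        frontier = next_frontier
    return levels

def dummy_from_tree(tree):
    """Union of all level sets, written out in ascending time order."""
    union = set()
    for level in levels_of(tree):
        union.update(level)
    return sorted(union, key=lambda t: (when(t), t))

# Hypothetical tree built from the infrequent subsets {e4, c7} and {e4, g9}.
tree = {"e4": {"c7": {}, "g9": {}}}
print(levels_of(tree))
print(dummy_from_tree(tree))
```

The resulting record contains every infrequent subset encoded in the tree, so adding it raises the support of each of them by one.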

21 Recount support A dummy record will also generate some subsets for which we do not know the support. For example, suppose {a, b} and {b, c} are infrequent subsets and we add the dummy a → b → c to make them frequent; this also creates one new subset, {a, c}, for which we do not know the support value. So we generate subsets of size-I from the dummies and count the support for each. We repeat the IF-anonymity algorithm for the new infrequent subsets. The process stops when there is no infrequent subset.
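The side effect described above is easy to reproduce. The sketch below lists the subsets a dummy record introduces beyond those it was built to fix; the function name `new_subsets_from_dummy` and the use of `frozenset` keys for the known subsets are assumptions made for this illustration.

```python
from itertools import combinations

def new_subsets_from_dummy(dummy, i, known):
    """Size-i subsets of the dummy whose support has not been counted yet."""
    return [s for s in combinations(dummy, i) if frozenset(s) not in known]

# The example from the slide: {a, b} and {b, c} were the infrequent subsets.
known = {frozenset(("a", "b")), frozenset(("b", "c"))}
extra = new_subsets_from_dummy(["a", "b", "c"], 2, known)
print(extra)
```

Each subset returned here needs its support counted against the database plus the dummies, and any that fall below K feed the next round.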


23 Experimental evaluation: Distortion vs. Dimensions

24 Distortion vs. Attacker knowledge

25 Distortion vs. Maximum length

26 Distortion vs. Number of records

27 Conclusion Privacy in publishing high-dimensional data has become an important issue. We illustrate the threat of re-identification attacks caused by publishing RFID data. In this paper, we have proposed an efficient scheme to (K,I)-anonymize high-dimensional data.


Thank you ?