
Preservation of Proximity Privacy in Publishing Numerical Sensitive Data
J. Li, Y. Tao, and X. Xiao, SIGMOD '08
Presented by Hongwei Tian

Outline
- What is PPDP
  - Existing Privacy Principles
- Proximity Attack
  - (ε, m)-anonymity
  - Determine ε and m
  - Algorithm
- Experiments and Conclusion

Privacy-Preserving Data Publishing
A true story from Massachusetts, 1997: the Group Insurance Commission (GIC) published "anonymized" medical records of state employees. Latanya Sweeney bought the Cambridge voter registration list for 20 dollars and, by joining the two datasets on ZIP code, birth date, and sex, re-identified the medical record of Governor William Weld.

PPDP: Privacy vs. Utility
- Privacy: sensitive information of individuals should be protected in the published data (more anonymized data).
- Utility: the published data should remain useful for analysis (more accurate data).
The two goals conflict: stronger anonymization generally means less accurate data.

PPDP: Anonymization Techniques
- Generalization: replace a specific value with a more general one while maintaining its semantic meaning, e.g., a five-digit ZIP code -> 7825*, UTSA -> University, 28 -> [20, 30].
- Perturbation: replace a value with another random value; this incurs huge information loss and hence poor utility.

PPDP: Example of Generalization

Some Existing Privacy Principles (for Generalization)
- Categorical SA: k-anonymity; l-diversity, (α, k)-anonymity, m-invariance, ...; (c, k)-safety, Skyline-privacy, ...
- Numerical SA: (k, e)-anonymity, Variance Control, t-closeness, δ-presence, ...

Next…
- What is PPDP
  - Existing Privacy Principles
- Proximity Attack
  - (ε, m)-anonymity
  - Determine ε and m
  - Algorithm
- Experiments and Conclusion

Proximity Attack
Even when the exact SA value is hidden, an adversary who locates a target's equivalence class can infer with high confidence that the target's sensitive value falls within a narrow interval, whenever many SA values in the class are close to each other.

(ε, m)-anonymity: Definitions
- I(t): the private neighborhood of tuple t
  - absolute: I(t) = [t.SA − ε, t.SA + ε]
  - relative: I(t) = [t.SA · (1 − ε), t.SA · (1 + ε)]
- P(t): the risk of proximity breach of tuple t
  - P(t) = x / |G|, where x is the number of tuples in t's equivalence class G whose SA values fall in I(t)

(ε, m)-anonymity: Example
With ε = 20 and t1.SA = 1000, I(t1) = [980, 1020]; if x = 3 of the |G| = 4 SA values in t1's class fall in this interval, then P(t1) = 3/4.

(ε, m)-anonymity: Principle
Given a real value ε and an integer m ≥ 1, a generalized table T* fulfills absolute (relative) (ε, m)-anonymity if P(t) ≤ 1/m for every tuple t ∈ T, where P(t) is computed under the absolute (relative) neighborhood.
Larger ε and larger m mean a stricter privacy requirement.
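(Not from the slides: a minimal Python sketch of the two definitions and the principle's check. `group` stands for the multiset of SA values in one equivalence class G; all names are illustrative, not the authors' code.)

```python
def neighborhood(sa, eps, relative=False):
    """Private neighborhood I(t) of a tuple with sensitive value sa."""
    if relative:
        return sa * (1 - eps), sa * (1 + eps)
    return sa - eps, sa + eps

def breach_risk(sa, group, eps, relative=False):
    """P(t) = x / |G|: the fraction of SA values in G that fall in I(t)."""
    lo, hi = neighborhood(sa, eps, relative)
    x = sum(1 for v in group if lo <= v <= hi)
    return x / len(group)

def satisfies_em_anonymity(group, eps, m, relative=False):
    """(eps, m)-anonymity holds for G iff P(t) <= 1/m for every tuple t."""
    return all(breach_risk(v, group, eps, relative) <= 1.0 / m for v in group)
```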

(ε, m)-anonymity: What is the Meaning of m?
- |G| ≥ m is necessary.
- The best situation is that for any two tuples ti and tj in G, tj.SA ∉ I(ti) and ti.SA ∉ I(tj).
- This is similar to l-diversity when an equivalence class has l tuples with distinct SA values.

(ε, m)-anonymity: How to ensure tj.SA does not fall in I(ti)?
- Sort all tuples in G in ascending order of their SA values.
- Let left(t, G) be the tuples of G whose SA values lie within ε to the left of t.SA (t itself included), and right(t, G) those within ε to the right; then it suffices that |j − i| ≥ max{ |left(tj, G)|, |right(ti, G)| }.

(ε, m)-anonymity: Let maxsize(G) = max over all t ∈ G of max{ |left(t, G)|, |right(t, G)| }. Then it suffices that |j − i| ≥ maxsize(G) for any two tuples placed in the same bucket.

(ε, m)-anonymity: Partitioning
- Sort the tuples of G in ascending order of SA value.
- Hash the i-th tuple into the j-th bucket using j = (i mod maxsize(G)) + 1.
- Tuples sharing a bucket are then at least maxsize(G) positions apart in sorted order, so no SA value in a bucket falls into the neighborhood of another; a minimal sketch follows.
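(A minimal sketch of this partitioning step. The exact left/right counting used here, which includes t itself together with the tuples within ε on that side, is an assumption chosen to match the worked example on the next slide.)

```python
def maxsize(group, eps):
    """max over t in G of max{|left(t,G)|, |right(t,G)|} (counts assumed inclusive of t)."""
    g = 0
    for sa in group:
        left = sum(1 for v in group if sa - eps < v <= sa)
        right = sum(1 for v in group if sa <= v < sa + eps)
        g = max(g, left, right)
    return g

def partition(group, eps):
    """Sort G by SA and hash the i-th tuple into bucket j = (i mod maxsize(G)) + 1."""
    srt = sorted(group)
    g = maxsize(srt, eps)
    buckets = {}
    for i, sa in enumerate(srt, start=1):  # 1-based index in SA order
        j = (i % g) + 1
        buckets.setdefault(j, []).append(sa)
    return list(buckets.values())
```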

(ε, m)-anonymity: Example ((6, 2)-anonymity)

tupleNo | QI | SA
      1 |  q | 10
      2 |  q | 20
      3 |  q | 25
      4 |  q | 30

- Privacy is breached: I(t3) = [19, 31] covers the SA values 20, 25, 30, so P(t3) = 3/4 > 1/m = 1/2.
- Partitioning is needed: the tuples are already in ascending SA order, g = maxsize(G) = 2, and j = (i mod 2) + 1 yields buckets {t1, t3} and {t2, t4}; the new P(t3) = 1/2.
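(Running the sketches above on this table reproduces the slide's numbers, assuming the functions defined earlier.)

```python
G = [10, 20, 25, 30]                      # the SA column of the table above
print(breach_risk(25, G, eps=6))          # 0.75 -> (6, 2)-anonymity is breached
for bucket in partition(G, eps=6):        # buckets [10, 25] and [20, 30]
    print(bucket, [breach_risk(v, bucket, 6) for v in bucket])  # every P(t) = 0.5
```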

Determine ε and m
Given ε and m, check whether an equivalence class G satisfies, or can be made to satisfy, (ε, m)-anonymity:
- Theorem: G has at least one (ε, m)-anonymous generalization iff m · maxsize(G) ≤ |G|.
- Scan the sorted tuples of G once to find maxsize(G), then predict whether G can be partitioned or not.
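(As a one-line feasibility test; the condition m · maxsize(G) ≤ |G| is a reconstruction from the surrounding slides, where it holds with equality in the running example.)

```python
def can_partition(group, eps, m):
    """True iff G admits an (eps, m)-anonymous partitioning (assumed condition)."""
    return m * maxsize(sorted(group), eps) <= len(group)
```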

Algorithm
Step 1: Splitting
- Based on Mondrian (LeFevre et al., ICDE 2006); splitting uses only the QI attributes.
- Iteratively find the median of the frequency sets on one selected QI dimension to cut G into G1 and G2, making sure both G1 and G2 remain legal to partition; a rough sketch follows.
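(A rough sketch of such a splitting loop, assuming the `can_partition` test above; `tuples` is a list of (qi_vector, sa) pairs, and the median/recursion details are simplified relative to the paper.)

```python
def split(tuples, eps, m, dims):
    """Recursively median-cut on QI dimensions while both halves stay partitionable."""
    for d in dims:
        tuples = sorted(tuples, key=lambda t: t[0][d])  # order by QI dimension d
        mid = len(tuples) // 2                          # median cut
        g1, g2 = tuples[:mid], tuples[mid:]
        if g1 and g2 and all(
            can_partition([sa for _, sa in g], eps, m) for g in (g1, g2)
        ):
            return split(g1, eps, m, dims) + split(g2, eps, m, dims)
    return [tuples]  # no legal cut on any dimension: stop splitting here
```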

Algorithm: Splitting example ((6, 2)-anonymity)

Algorithm
Step 2: Partitioning
- After Step 1 stops, check every group G produced by splitting:
  - if G already satisfies (ε, m)-anonymity, release it directly;
  - otherwise, partition it into buckets as above and release the new buckets.

Algorithm: Partitioning example ((6, 2)-anonymity)

Next…
- What is PPDP
  - Existing Privacy Principles
- Proximity Attack
  - (ε, m)-anonymity
  - Determine ε and m
  - Algorithm
- Experiments and Conclusion

Experiments
- Real database SAL: attributes Age, Birthplace, Occupation, and Income, with domains [16, 93], [1, 710], [1, 983], and [1k, 100k], respectively; 500K tuples.
- Baseline for comparison: a perturbation method (Privacy-Preserving OLAP, Agrawal et al., SIGMOD 2005).

Experiments - Utility
Measured with a workload of 1,000 count queries; a hedged sketch of such an evaluation follows.
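(The slides omit the query details, so this is only a hedged sketch of how such a measurement could look: random range-count queries over the SA attribute, answered from the published buckets under a uniform-spread assumption, scored by average relative error. The paper's actual workload also involves QI attributes; all names here are assumptions.)

```python
import random

def estimate_count(buckets, lo, hi):
    """Expected number of tuples with SA in [lo, hi], assuming uniform spread per bucket."""
    est = 0.0
    for sas in buckets:
        b_lo, b_hi = min(sas), max(sas)
        if b_hi == b_lo:  # degenerate bucket: a single SA value
            est += len(sas) if lo <= b_lo <= hi else 0
            continue
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        est += len(sas) * overlap / (b_hi - b_lo)
    return est

def avg_relative_error(data, buckets, workload=1000):
    """Average relative error of `workload` random count queries."""
    total = 0.0
    for _ in range(workload):
        lo = random.uniform(min(data), max(data))
        hi = random.uniform(lo, max(data))
        true = sum(1 for v in data if lo <= v <= hi)
        total += abs(estimate_count(buckets, lo, hi) - true) / max(true, 1)
    return total / workload
```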

Experiments - Efficiency

Conclusion
- Surveyed the existing privacy principles in PPDP.
- Identified the proximity attack and proposed (ε, m)-anonymity to prevent it.
- Verified experimentally that the method is both effective and efficient.

Any Questions?