Anonymizing Sequential Releases ACM SIGKDD 2006 Benjamin C. M. Fung Simon Fraser University Ke Wang Simon Fraser University


2 Motivation: Sequential Releases Previous works address a single release only, but data are often released in multiple shots. An organization makes a new release when: –New information becomes available. –A tailored view is needed for each data-sharing purpose. –Sensitive information and identifying information are released separately. Related releases sharpen the identification of individuals by a global quasi-identifier.

3 T2: Previous Release
Pid  Job       Disease
1    Banker    Cancer
2    Banker    Cancer
3    Clerk     HIV
4    Driver    Cancer
5    Engineer  HIV

T1: Current Release
Pid  Name   Job       Class
1    Alice  Banker    c1
2    Alice  Banker    c1
3    Bob    Clerk     c2
4    Bob    Driver    c3
5    Cathy  Engineer  c4

The join on T1.Job = T2.Job
Pid  Name   Job       Disease  Class
1    Alice  Banker    Cancer   c1
2    Alice  Banker    Cancer   c1
3    Bob    Clerk     HIV      c2
4    Bob    Driver    Cancer   c3
5    Cathy  Engineer  HIV      c4
-    Alice  Banker    Cancer   c1
-    Alice  Banker    Cancer   c1
We do not want Name to be linked to Disease in the join of the two releases.

4 (Same tables as slide 3.) The join sharpens identification: {Bob, HIV} has group size 1.

5 (Same tables as slide 3.) The join weakens identification: {Alice, Cancer} has group size 4. A lossy join can therefore combat the join attack.

6 (Same tables as slide 3.) The join enables inferences across tables: Alice → Cancer has 100% confidence.
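
The group sizes and the confidence quoted on slides 4 to 6 can be checked with a short sketch. It assumes the join is a plain equi-join on Job and that Pid is not available as a join key; the variable names (`joined`, `pair_counts`, `confidence`) are illustrative and not from the slides.

```python
from collections import Counter

# Tables from the running example, without Pid (assumed not usable as a join key).
T1 = [("Alice", "Banker", "c1"), ("Alice", "Banker", "c1"),
      ("Bob", "Clerk", "c2"), ("Bob", "Driver", "c3"),
      ("Cathy", "Engineer", "c4")]                 # (Name, Job, Class)
T2 = [("Banker", "Cancer"), ("Banker", "Cancer"), ("Clerk", "HIV"),
      ("Driver", "Cancer"), ("Engineer", "HIV")]   # (Job, Disease)

# Equi-join on Job. The two Banker records on each side pair up four ways,
# which is what makes the join lossy.
joined = [(name, job, disease, cls)
          for (name, job, cls) in T1
          for (job2, disease) in T2
          if job == job2]

pair_counts = Counter((name, disease) for name, _, disease, _ in joined)
name_counts = Counter(name for name, _, _, _ in joined)

def confidence(name, disease):
    """Fraction of joined records for `name` that carry `disease`."""
    return pair_counts[(name, disease)] / name_counts[name]

print(len(joined))                      # number of records in the join
print(pair_counts[("Bob", "HIV")])      # group size of {Bob, HIV}
print(pair_counts[("Alice", "Cancer")]) # group size of {Alice, Cancer}
print(confidence("Alice", "Cancer"))    # confidence of Alice -> Cancer
```
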

7 Related Work k-anonymity [SS98, FWY05, BA05, LDR05, WYC04, WLFW06] –Quasi-identifier (QID): a set of identifying attributes in the table. If some record is linked to an external source by a QID value, so are at least k-1 other records. –The database is made anonymous to itself. –In sequential releases, the database must be made anonymous to the combination of all releases thus far.

8 Related Work l-diversity [MGK06] –Ensures that sensitive values are “well-represented” in each QID group, measured by entropy. Confidence limiting [WFY05, WFY06]: qid → s, confidence < h, where qid is a value on QID and s is a sensitive value.

9 Related Work View releases –e.g., T1 and T2 are two views, both can be modified before the release: more room for satisfying privacy and information requirements. –[MW04, DP05] measure information disclosure of a view set wrt a secret view. –[YWJ05, KG06] detect privacy violation by a view set over a base table. –They measure or detect violations, but do not remove them.

10 Sequential Release Sequential release: –Current release T1. Previous release T2. –T1 was unknown when T2 was released. –T2, once released, cannot be modified when T1 is released. Solution #1: k-anonymize all attributes in T1. –Excessive distortion. Solution #2: generalize T1 based on T2. –Monotonically distorts the later release. Solution #3: release a “complete” cohort of all potential releases anonymized at one time. –Requires predicting all future releases.

11 Intuition of Our Approach A lossy join hides the true join relationship to cripple a global QID. We generalize the current release T1 so that the join with the previous release T2 becomes lossy enough to disorient the attacker. Two general notions of privacy: (X,Y)-anonymity and (X,Y)-linkability, where X and Y are sets of attributes.

12 (X,Y)-Anonymity Let x be a value on X. The anonymity of x wrt Y, denoted $a_Y(x)$, is the number of distinct values on Y that co-occur with x, i.e., $|\pi_Y \sigma_x(T)|$. Let $A_Y(X) = \min\{a_Y(x) \mid x \in X\}$. T satisfies (X,Y)-anonymity if $A_Y(X) \ge k$, where k is a user-specified integer threshold.

13 (X,Y)-Linkability Let x be a value on X and y be a value on Y. The linkability of x to y, denoted $l_y(x)$, is the percentage of the records that contain both x and y among those that contain x, i.e., $a(y,x)/a(x)$. $L_y(X) = \max\{l_y(x) \mid x \in X\}$ and $L_Y(X) = \max\{L_y(X) \mid y \in Y\}$. T satisfies (X,Y)-linkability if $L_Y(X) \le k$, where k is a user-specified real-valued threshold.

14 (X,Y)-Privacy When no distinction is necessary, we use the term (X,Y)-privacy to refer to either (X,Y)-anonymity or (X,Y)-linkability. This privacy notion generalizes k-anonymity [SS98] and sensitive inferences [WFY05, WFY06].

15 (X,Y)-Privacy k-anonymity: # of distinct records for each QID value ≥ k. (X,Y)-anonymity: # of distinct Y values for each X value ≥ k. (X,Y)-linkability: the maximum confidence that a record contains y given that it contains x is ≤ k, where (x,y) are values on X and Y. These notions generalize k-anonymity [SS98] and confidence limiting [WFY05, WFY06].

16 Example: (X,Y)-Anonymity
Pid  Job     Zip  PoB     Test
1    Banker  123  Canada  HIV
1    Banker  123  Canada  Diabetes
1    Banker  123  Canada  Eye
2    Clerk   456  Japan   HIV
2    Clerk   456  Japan   Diabetes
2    Clerk   456  Japan   Eye
2    Clerk   456  Japan   Heart
QID = {Job, Zip, PoB} is not a key. k-anonymity fails to ensure that each value on QID is linked to at least k distinct patients.

17 Example: (X,Y)-Anonymity With (X,Y)-anonymity, –specify the anonymity wrt patients by letting X = {Job, Zip, PoB} and Y = Pid –Each X group must be linked to at least k distinct values on Pid. If X = {Job, Zip, PoB} and Y = Test, each X group is required to be linked to at least k distinct tests.
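
A minimal sketch of this example, using the table from slide 16. The function names `anonymity` and `k_anonymity` are illustrative, not from the slides; `anonymity` computes the minimum number of distinct Y values per X group, and `k_anonymity` counts records per QID group for comparison.

```python
from collections import defaultdict

def anonymity(table, X, Y):
    """A_Y(X): minimum number of distinct Y values co-occurring with any X value."""
    groups = defaultdict(set)
    for row in table:
        x = tuple(row[a] for a in X)
        groups[x].add(tuple(row[a] for a in Y))
    return min(len(ys) for ys in groups.values())

def k_anonymity(table, qid):
    """Classic k: minimum number of records sharing a QID value."""
    counts = defaultdict(int)
    for row in table:
        counts[tuple(row[a] for a in qid)] += 1
    return min(counts.values())

COLS = ("Pid", "Job", "Zip", "PoB", "Test")
ROWS = [
    (1, "Banker", "123", "Canada", "HIV"),
    (1, "Banker", "123", "Canada", "Diabetes"),
    (1, "Banker", "123", "Canada", "Eye"),
    (2, "Clerk", "456", "Japan", "HIV"),
    (2, "Clerk", "456", "Japan", "Diabetes"),
    (2, "Clerk", "456", "Japan", "Eye"),
    (2, "Clerk", "456", "Japan", "Heart"),
]
table = [dict(zip(COLS, r)) for r in ROWS]
QID = ["Job", "Zip", "PoB"]

print(k_anonymity(table, QID))          # records per QID group (minimum)
print(anonymity(table, QID, ["Pid"]))   # distinct patients per QID group (minimum)
print(anonymity(table, QID, ["Test"]))  # distinct tests per QID group (minimum)
```

On this table the record count per QID group is 3, yet each group covers a single patient, which is the gap the slide points out.
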

18 Example: (X,Y)-Linkability
Pid  Job     Zip  PoB     Test
1    Banker  123  Canada  HIV
2    Banker  123  Canada  HIV
3    Banker  123  Canada  HIV
4    Banker  123  Canada  Diabetes
5    Clerk   456  Japan   Diabetes
6    Clerk   456  Japan   Diabetes
{Banker,123,Canada} → HIV (75% confidence). With Y = Test, the (X,Y)-linkability states that no test can be inferred from a value on X with a confidence higher than a given threshold.
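
The confidence above can be reproduced with a small sketch of the linkability definition. The helper names `linkabilities` and `max_linkability` are illustrative assumptions; the data is the table on this slide.

```python
from collections import Counter

def linkabilities(table, X, Y):
    """Map each co-occurring (x, y) pair to l_y(x) = a(y, x) / a(x)."""
    x_count = Counter()
    xy_count = Counter()
    for row in table:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        x_count[x] += 1
        xy_count[(x, y)] += 1
    return {(x, y): n / x_count[x] for (x, y), n in xy_count.items()}

def max_linkability(table, X, Y):
    """L_Y(X): the largest l_y(x) over all x and y."""
    return max(linkabilities(table, X, Y).values())

COLS = ("Pid", "Job", "Zip", "PoB", "Test")
ROWS = [
    (1, "Banker", "123", "Canada", "HIV"),
    (2, "Banker", "123", "Canada", "HIV"),
    (3, "Banker", "123", "Canada", "HIV"),
    (4, "Banker", "123", "Canada", "Diabetes"),
    (5, "Clerk", "456", "Japan", "Diabetes"),
    (6, "Clerk", "456", "Japan", "Diabetes"),
]
table = [dict(zip(COLS, r)) for r in ROWS]
X, Y = ["Job", "Zip", "PoB"], ["Test"]

l = linkabilities(table, X, Y)
print(l[(("Banker", "123", "Canada"), ("HIV",))])  # Banker group -> HIV
print(max_linkability(table, X, Y))                # worst case over the whole table
```

Note that the maximum over the whole table is set by the Clerk group, where both records carry Diabetes.
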

19 Problem Statement The data holder has previously released T2 and wants to release T1, where T2 and T1 are projections of the same underlying table. Want to ensure (X,Y)-privacy on the join of T1 and T2. Sequential anonymization is to generalize T1 on X ∩ att(T1) so that the join of T1 and T2 preserves the (X,Y)-privacy and T1 remains as useful as possible.

20 Generalization / Specialization Each generalization replaces all child values with the parent value. –A cut contains exactly one value on every root-to-leaf path. Each specialization $v \to \{v_1,\dots,v_c\}$ replaces the value v in every record containing v with the child value $v_i$ that is consistent with the original domain value in the record.
Taxonomy for Job:
ANY
  Professional
    Engineer
    Lawyer
  Admin
    Banker
    Clerk
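
A minimal sketch of a cut and a specialization over the Job taxonomy. The parent/child assignment (Engineer and Lawyer under Professional, Banker and Clerk under Admin) is my reading of the flattened slide and should be treated as an assumption, as are the function names.

```python
# Child -> parent map for the Job taxonomy (assumed layout, see lead-in).
PARENT = {"Professional": "ANY", "Admin": "ANY",
          "Engineer": "Professional", "Lawyer": "Professional",
          "Banker": "Admin", "Clerk": "Admin"}

def path_to_root(value):
    """Return the generalization path from value up to the root, inclusive."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def generalize(raw_value, cut):
    """Map a raw domain value to the unique cut value on its root-to-leaf path."""
    hits = [v for v in path_to_root(raw_value) if v in cut]
    if len(hits) != 1:
        raise ValueError("a cut must contain exactly one value on every path")
    return hits[0]

def specialize(cut, v):
    """Return a new cut in which v is replaced by its child values."""
    children = {c for c, p in PARENT.items() if p == v}
    if v not in cut or not children:
        raise ValueError(f"{v!r} cannot be specialized in this cut")
    return (cut - {v}) | children

cut0 = {"ANY"}
cut1 = specialize(cut0, "ANY")      # {"Professional", "Admin"}
cut2 = specialize(cut1, "Admin")    # {"Professional", "Banker", "Clerk"}
print(generalize("Banker", cut0), generalize("Banker", cut1), generalize("Banker", cut2))
```
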

21 Generalization / Specialization An interval of a continuous attribute is split on-the-fly to maximize information utility. –e.g., age [30-40) → [30-37), [37-40) –The split at 37 maximizes the information gain. A taxonomy tree is dynamically grown for each continuous (non-join) attribute.
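
As a rough illustration of choosing such a split, the sketch below scans candidate split points and keeps the one with the largest class-entropy reduction. The ages and class labels are made-up toy data, and `best_split` is a hypothetical helper, not the authors' implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Class entropy (base 2) of a list of labels; 0 for an empty list."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(points, lo):
    """points: (value, class) pairs inside one interval starting at lo.
    Returns (split_point, information_gain) for the best split [lo, s), [s, hi)."""
    labels = [c for _, c in points]
    base = entropy(labels)
    best = (None, 0.0)
    for s in sorted({v for v, _ in points if v > lo}):
        left = [c for v, c in points if v < s]
        right = [c for v, c in points if v >= s]
        gain = (base
                - len(left) / len(points) * entropy(left)
                - len(right) / len(points) * entropy(right))
        if gain > best[1]:
            best = (s, gain)
    return best

# Toy data (hypothetical): ages in [30, 40) with a binary class label.
ages = [(30, "Y"), (32, "Y"), (35, "Y"), (37, "N"), (38, "N"), (39, "N")]
split, gain = best_split(ages, lo=30)
print(split, gain)
```
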

22 Match Function Given T1 and T2, the attacker may apply prior knowledge to match the records in T1 and T2. So, the data holder applies such prior knowledge for matching: –schema information of T1 and T2. –taxonomies for attributes. –following inclusion-exclusion principle.

23 Match Function Let t1 ∈ T1 and t2 ∈ T2. Consistency Predicate: t1.A matches t2.A if they are on the same generalization path for attribute A. –e.g., Male matches Single Male. Inconsistency Predicate: t1.A matches t2.B only if t1.A and t2.B are not semantically inconsistent. –Excludes impossible matches. –e.g., Male and Pregnant are semantically inconsistent, so are Married Male and 6 Month Pregnant.
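
The two predicates can be sketched as follows. The taxonomy fragment (`PARENT`), the root names `ANY_Sex` and `ANY_Status`, and the `FORBIDDEN` set are hypothetical and built only from the slide's examples; the sketch assumes an inconsistency between two values also applies to their specializations.

```python
# Hypothetical taxonomy fragments: child -> parent.
PARENT = {
    "Male": "ANY_Sex", "Female": "ANY_Sex",
    "Single Male": "Male", "Married Male": "Male",
    "Pregnant": "ANY_Status", "6 Month Pregnant": "Pregnant",
}
# Pairs declared semantically inconsistent (from the slide's example).
FORBIDDEN = {frozenset({"Male", "Pregnant"})}

def path_to_root(value):
    """Generalization path from value up to its taxonomy root."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def consistent(a, b):
    """Consistency predicate: a and b lie on the same generalization path."""
    return a in path_to_root(b) or b in path_to_root(a)

def not_inconsistent(a, b):
    """Inconsistency predicate: no ancestor pair of (a, b) is forbidden."""
    return not any(frozenset({p, q}) in FORBIDDEN
                   for p in path_to_root(a)
                   for q in path_to_root(b))

print(consistent("Male", "Single Male"))                     # same path
print(consistent("Single Male", "Married Male"))             # siblings
print(not_inconsistent("Married Male", "6 Month Pregnant"))  # inherited conflict
print(not_inconsistent("Female", "6 Month Pregnant"))        # allowed match
```
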

24 Algorithm Overview
Top-Down Specialization for Sequential Anonymization
Input: T1, T2, an (X,Y)-privacy requirement, a taxonomy tree for each attribute in X1, where X1 = X ∩ att(T1).
Output: a generalized T1 satisfying the privacy requirement.
generalize every value of A_j to ANY_j, where A_j ∈ X1;
while there is a valid candidate in ∪Cut_j do
    find the winner w of highest Score(w) from ∪Cut_j;
    specialize w on T1 and remove w from ∪Cut_j;
    update Score(v) and the valid status for all v in ∪Cut_j;
end while
output the generalized T1 and ∪Cut_j;
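
The loop structure can be illustrated with a heavily simplified sketch. It is not the algorithm on the slide: it works on a single table rather than on the join of T1 and T2, uses one attribute (Job) with (X,Y)-anonymity where Y = Pid, and replaces Score(v) with a placeholder that prefers the candidate leaving the most anonymity. The taxonomy layout, the toy rows, and all names are assumptions.

```python
from collections import defaultdict

# Assumed Job taxonomy: child -> parent.
PARENT = {"Professional": "ANY", "Admin": "ANY",
          "Engineer": "Professional", "Lawyer": "Professional",
          "Banker": "Admin", "Clerk": "Admin"}
CHILDREN = defaultdict(set)
for child, parent in PARENT.items():
    CHILDREN[parent].add(child)

def generalize(value, cut):
    """Climb the taxonomy until the value lands on the cut."""
    while value not in cut:
        value = PARENT[value]
    return value

def anonymity(rows, cut):
    """A_Y(X) with X = generalized Job and Y = Pid."""
    groups = defaultdict(set)
    for pid, job in rows:
        groups[generalize(job, cut)].add(pid)
    return min(len(pids) for pids in groups.values())

def top_down(rows, k):
    """Start fully generalized; specialize while some candidate stays valid."""
    cut = {"ANY"}
    while True:
        candidates = []
        for v in sorted(cut):
            if v in CHILDREN:                      # v can be specialized
                new_cut = (cut - {v}) | CHILDREN[v]
                a = anonymity(rows, new_cut)
                if a >= k:                         # valid: requirement still holds
                    candidates.append((a, v, new_cut))
        if not candidates:
            return cut
        # Placeholder score: keep the candidate with the highest remaining anonymity.
        _, _, cut = max(candidates, key=lambda c: c[0])

# Toy rows (hypothetical): (Pid, Job).
rows = [(1, "Engineer"), (2, "Engineer"), (3, "Lawyer"),
        (4, "Lawyer"), (5, "Banker"), (6, "Clerk")]
print(top_down(rows, k=2))
```

With k = 2 the sketch specializes ANY and then Professional, but stops before splitting Admin, because Banker and Clerk would each cover a single Pid.
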

25 Monotonic Privacy Theorem 1: On a single table, (X,Y)-privacy is anti-monotone wrt specialization on X. –If violated, it remains violated after a specialization. $A_Y(X)$ is non-increasing wrt specialization on X. –A specialization on X can only shrink the set of records that contain a given X value and therefore can only shrink the set of Y values that co-occur with that X value. $L_Y(X)$ is non-decreasing wrt specialization on X. –A specialization $v \to \{v_1,\dots,v_c\}$ transforms a value x on X into the specialized values $x_1,\dots,x_c$ on X. –If $l_y(x_i) < l_y(x)$ for some $x_i$, then $l_y(x_j) > l_y(x)$ for some other $x_j$, because $l_y(x)$ is a weighted average of the $l_y(x_i)$.
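
The averaging step behind the linkability claim can be written out as below. This is a sketch that assumes every record containing x contains exactly one of the specialized values $x_1,\dots,x_c$ after the specialization, and it uses $a(\cdot)$ for record counts as in the definition of $l_y(x)$; the sums range over the $x_i$ with $a(x_i) > 0$.

```latex
\[
l_y(x) \;=\; \frac{a(y,x)}{a(x)}
       \;=\; \frac{\sum_{i} a(y,x_i)}{\sum_{i} a(x_i)}
       \;=\; \sum_{i} \frac{a(x_i)}{a(x)}\, l_y(x_i)
       \;\le\; \max_{i}\, l_y(x_i).
\]
```

Since the weights $a(x_i)/a(x)$ are non-negative and sum to 1, at least one specialized value has linkability no smaller than $l_y(x)$, so the maximum over all X values cannot decrease.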

26 Monotonic Privacy On the join of T1 and T2, in general, (X,Y)-anonymity is not anti-monotone wrt a specialization on X ∩ att(T1). –Specializing T1 may create dangling records. Two tables are population-related if every record in each table has at least one matching record in the other table, i.e., there is no dangling record. Lemma 1: If T1 and T2 are population-related, $A_Y(X)$ is non-increasing wrt specialization on X ∩ att(T1).

27 Monotonic Privacy Lemma 2: If Y contains attributes from T1 or T2, but not from both, $L_Y(X)$ does not decrease after specialization of T1 on the attributes X ∩ att(T1). Theorem 2: Assume that T1 and T2 are projections of the same underlying table. Then (X,Y)-anonymity and (X,Y)-linkability on the join of T1 and T2 are anti-monotone wrt specialization of T1 on X ∩ att(T1).

28 Score Metric Score(v) evaluates the “goodness” of a specialization v for preserving privacy and information. Each specialization v gains some information and loses some privacy. We maximize the information gained per unit of privacy lost. InfoGain(v) is measured on T1. PrivLoss(v) is measured on the join of T1 and T2.

29 Information Gain If T1 is released for classification on a specified class column, InfoGain(v) could be the reduction of the class entropy: $\mathrm{InfoGain}(v) = E(T1[v]) - \sum_i \frac{|T1[v_i]|}{|T1[v]|}\, E(T1[v_i])$, where $E(S)$ denotes the class entropy of a record set S. T1[v] denotes the set of generalized records in T1 that contain v before the specialization. T1[v_i] denotes the set of records in T1 that contain v_i after the specialization. Alternatively, InfoGain(v) could be based on the notion of distortion.
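
A minimal sketch of the entropy-based option, assuming the standard definition of information gain over the class labels of T1[v] and of each T1[v_i]. The function names and the toy labels are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Class entropy (base 2) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(parent_labels, child_label_lists):
    """Entropy of T1[v] minus the size-weighted entropy of the T1[v_i]."""
    n = len(parent_labels)
    remainder = sum(len(ch) / n * entropy(ch) for ch in child_label_lists)
    return entropy(parent_labels) - remainder

# Toy labels (hypothetical): class labels of the records in T1[v],
# then the same records regrouped by child value v_i.
parent = ["Y", "Y", "N", "N"]
clean_split = [["Y", "Y"], ["N", "N"]]    # children separate the classes
useless_split = [["Y", "N"], ["Y", "N"]]  # children keep the same class mix

print(info_gain(parent, clean_split))
print(info_gain(parent, useless_split))
```
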

30 Privacy Loss PrivLoss(v) is measured by the decrease of $A_Y(X)$ or the increase of $L_Y(X)$ due to the specialization of v: $A_Y(X) - A_Y(X^v)$ for (X,Y)-anonymity, and $L_Y(X^v) - L_Y(X)$ for (X,Y)-linkability, where $X$ and $X^v$ represent the attributes before and after specializing v respectively.
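
The two privacy-loss measures are simple differences. The sketch below also combines them with an information gain into a score; the ratio InfoGain / (PrivLoss + 1) is an assumption on my part (a common choice in top-down specialization, with the +1 guarding against division by zero), since the slides do not show the formula.

```python
def priv_loss_anonymity(a_before, a_after):
    """Decrease of A_Y(X) caused by a specialization."""
    return a_before - a_after

def priv_loss_linkability(l_before, l_after):
    """Increase of L_Y(X) caused by a specialization."""
    return l_after - l_before

def score(info_gain, priv_loss):
    """Assumed trade-off: information gained per unit of privacy lost."""
    return info_gain / (priv_loss + 1)

# Hypothetical numbers: anonymity drops from 6 to 2, gain is 1.0 bit.
loss = priv_loss_anonymity(6, 2)
print(loss, score(1.0, loss))
```
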

31 Challenges 1. Each specialization on w affects the matching of the join and thus the privacy checking. It is too expensive to rejoin the two tables for each specialization. 2. Materializing the join is impractical. A lossy join can be very large. Our solution: incrementally maintain some count statistics to update Score(v) without executing the join.
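
To illustrate why counts suffice, the sketch below derives group sizes in the join from per-partition counts, without building any joined record. It is a simplified stand-in for the authors' data structures, run on the Name/Job/Disease example from the earlier slides; `join_pair_counts` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def join_pair_counts(t1, t2):
    """t1: (x, j) pairs, t2: (j, y) pairs. Returns the number of joined
    records for each (x, y), computed from partition counts only."""
    left = Counter(t1)                  # count per (x, join value)
    right = defaultdict(Counter)        # join value -> count per y
    for j, y in t2:
        right[j][y] += 1
    sizes = Counter()
    for (x, j), n1 in left.items():
        for y, n2 in right.get(j, {}).items():
            sizes[(x, y)] += n1 * n2    # each pairing contributes n1 * n2 records
    return sizes

T1 = [("Alice", "Banker"), ("Alice", "Banker"), ("Bob", "Clerk"),
      ("Bob", "Driver"), ("Cathy", "Engineer")]          # (Name, Job)
T2 = [("Banker", "Cancer"), ("Banker", "Cancer"), ("Clerk", "HIV"),
      ("Driver", "Cancer"), ("Engineer", "HIV")]         # (Job, Disease)

sizes = join_pair_counts(T1, T2)
print(sizes[("Alice", "Cancer")], sizes[("Bob", "HIV")], sum(sizes.values()))
```
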

32 Data Structure Expensive operations on specializing w: –accessing the records in T1 containing w –matching the records in T1 with the records in T2. Notation: X1 = X ∩ att(T1) and X2 = X ∩ att(T2); J1 and J2 denote the join attributes in T1 and T2.

33 Data Structure Tree1: partition T1 records by the attributes X1 and J1-X1 in that order, one level per attribute. –Link[v] links up all nodes for v at the attribute level of v. Tree2: partition T2 records by the attributes J2 and X2-J2 in that order. –Tree2 is static. Probe the matching partitions in Tree2. –Match the last |J1| attributes in a partition in Tree1 with the first |J2| attributes in Tree2.

34 Analysis On specializing w, Link[w] provides direct access to the records involved in T1, and Tree2 provides direct access to the matching partitions in T2. Matching is performed at the partition level, not at the record level. The cost of each iteration has two parts. 1. Specialize the affected partitions on Link[w]. 2. Update the score and status of candidates using count statistics. Each record in T1 is accessed at most |X ∩ att(T1)| × h times, where h is the maximum height of the taxonomies.

35 Empirical Study Data: the Adult data set. Two versions of (T1,T2): Set A (categorical attributes only) –T1 contains the Class attribute, the 3 categorical attributes and the 3 join attributes. –T2 contains the 2 categorical attributes and the 3 join attributes. Set B (both categorical and continuous) –T1 contains the additional 6 continuous attributes from Taxation Department.

36 Schema for Set A
Department        Attribute            # of Leaves  # of Levels
Taxation (T1)     Education (E)        16           5
                  Occupation (O)       14           3
                  Work-class (W)       8            5
Common (T1 & T2)  Marital-status (M)   7            4
                  Relationship (Ra)    6            3
                  Sex (S)              2            2
Immigration (T2)  Native-country (Nc)  40           5
                  Race (Ra)            5            3
T1 contains the Class attribute.

37 Empirical Study Classification metric –Classification error on the generalized testing set of T1. Distortion metric [SS98] –Categorical: 1 unit of distortion for each generalization. –Continuous: Suppose v is generalized to interval [a-b). Unit of distortion = (b-a)/(f2-f1), where [f1,f2) is the full range of the attribute. –Normalize total distortion by the number of records.
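
The distortion metric can be sketched as below, reading "one unit per generalization" as one unit per taxonomy level climbed in a record. That reading, the function names, and the sample numbers are assumptions for illustration.

```python
def categorical_distortion(levels_climbed):
    """One unit of distortion per generalization step applied to a value."""
    return float(levels_climbed)

def continuous_distortion(a, b, f1, f2):
    """Value generalized to [a, b) when the attribute's full range is [f1, f2)."""
    return (b - a) / (f2 - f1)

def normalized_distortion(per_record_units):
    """Total distortion divided by the number of records."""
    return sum(per_record_units) / len(per_record_units)

# Hypothetical two-record table:
# record 1: Job generalized one level, age generalized to [30, 40) of [0, 100);
# record 2: left unchanged.
r1 = categorical_distortion(1) + continuous_distortion(30, 40, 0, 100)
r2 = 0.0
print(normalized_distortion([r1, r2]))
```
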

38 (X,Y)-Anonymity TopN attributes: most important for classification. –Chosen by successively removing the top attribute in a decision tree. Join attributes are the Top3 attributes. –If not important, simply remove them. X contains –TopN attributes in T1 for a specified N (to ensure that the generalization is performed on important attributes), –all join attributes, –all attributes in T2 (to ensure X is global).

39 Distortion of (X,Y)-anonymity Ki is a key in Ti. XYD: produced by our method with Y = K1. KAD: produced by k-anonymity on T1 with QID=att(T1). [Figure panels: Set A, Set B]

40 Classification error of (X,Y)-anonymity XYE: produced by our method with Y = K1. XYE(row): produced by our method with Y={K1,K2}. BLE: produced by the unmodified data. KAE: produced by k-anonymity on T1 with QID=att(T1). RJE: produced by removing all join attributes from T1. [Figure panels: Set A, Set B]

41 (X,Y)-Linkability Y contains the TopN attributes. –If not important, simply remove them. X contains the rest of the attributes in T1 and T2, except T2.Ra and T2.Nc because otherwise no privacy requirement can be satisfied. Focus on the classification error because the distortion due to (X,Y)-linkability is not comparable with the distortion due to k-anonymity.

42 Classification error of (X,Y)-linkability XYE: produced by our method with Y = TopN. BLE: produced by the unmodified data. RJE: produced by removing all join attributes from T1. RSE: produced by removing all attributes in Y from T1. [Figure panels: Set A, Set B]

43 Scalability [Figure panels: (X,Y)-anonymity (k=40), (X,Y)-linkability (k=90%)]

44 Conclusion Previous works on k-anonymization focused on a single release of data. We studied the sequential anonymization problem, extended the privacy notion to this model, introduced the lossy join as a way to hide the join relationship among releases, and addressed the computational challenges due to the large size of the lossy join. The approach is extendable to more than one previously released table T2,…,Tp.

45 References
[BA05] R. Bayardo and R. Agrawal. Data privacy through optimal k-anonymization. In IEEE ICDE, 2005.
[DP05] A. Deutsch and Y. Papakonstantinou. Privacy in database publishing. In ICDT, 2005.
[FWY05] B. C. M. Fung, K. Wang, and P. S. Yu. Top-down specialization for information and privacy preservation. In IEEE ICDE, April 2005.
[KG06] D. Kifer and J. Gehrke. Injecting utility into anonymized datasets. In ACM SIGMOD, Chicago, IL, June 2006.

46 References
[LDR05] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient full-domain k-anonymity. In ACM SIGMOD, 2005.
[MGK06] A. Machanavajjhala, J. Gehrke, and D. Kifer. l-diversity: Privacy beyond k-anonymity. In IEEE ICDE, 2006.
[MW04] A. Meyerson and R. Williams. On the complexity of optimal k-anonymity. In PODS, 2004.
[SS98] P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In IEEE Symposium on Research in Security and Privacy, May 1998.

47 References
[WFY05] K. Wang, B. C. M. Fung, and P. S. Yu. Template-based privacy preservation in classification problems. In IEEE ICDM, November 2005.
[WFY06] K. Wang, B. C. M. Fung, and P. S. Yu. Handicapping attacker's confidence: An alternative to k-anonymization. Knowledge and Information Systems: An International Journal.
[WYC04] K. Wang, P. S. Yu, and S. Chakraborty. Bottom-up generalization: A data mining solution to privacy protection. In IEEE ICDM, November 2004.

48 References
[WLFW06] R. C. W. Wong, J. Li, A. W. C. Fu, and K. Wang. (α,k)-anonymity: An enhanced k-anonymity model for privacy preserving data publishing. In ACM SIGKDD, 2006.
[YWJ05] C. Yao, X. S. Wang, and S. Jajodia. Checking for k-anonymity violation by views. In VLDB, 2005.