Anonymizing Healthcare Data: A Case Study on the Blood Transfusion Service Benjamin C.M. Fung Concordia University Montreal, QC, Canada


Anonymizing Healthcare Data: A Case Study on the Blood Transfusion Service
Benjamin C.M. Fung, Concordia University, Montreal, QC, Canada
Noman Mohammed, Concordia University, Montreal, QC, Canada
Cheuk-kwong Lee, Hong Kong Red Cross Blood Transfusion Service, Kowloon, Hong Kong
Patrick C. K. Hung, UOIT, Oshawa, ON, Canada
KDD 2009

Outline  Motivation & background  Privacy threats & information needs  Challenges  LKC-privacy model  Experimental results  Related work  Conclusions 2

Motivation & background
- Organization: Hong Kong Red Cross Blood Transfusion Service and Hospital Authority

Data flow in Hong Kong Red Cross [figure]

Healthcare IT Policies
- Hong Kong Personal Data (Privacy) Ordinance
- Personal Information Protection and Electronic Documents Act (PIPEDA)
- Underlying principles:
  - Principle 1: Purpose and manner of collection
  - Principle 2: Accuracy and duration of retention
  - Principle 3: Use of personal data
  - Principle 4: Security of personal data
  - Principle 5: Information to be generally available
  - Principle 6: Access to personal data

Contributions
- A very successful showcase of privacy-preserving technology
- Proposed the LKC-privacy model for anonymizing healthcare data
- Provided an algorithm that satisfies both the privacy and the information requirements
- Will benefit similar challenges in information sharing

Outline  Motivation & background  Privacy threats & information needs  Challenges  LKC-privacy model  Experimental results  Related work  Conclusions 7

Privacy threats
- Identity linkage: occurs when the number of records containing the same QID values is small, or the combination is unique.
[Figure: identity linkage attack. An adversary with the knowledge "Mover, age 34" links the victim to a specific record released to the data recipients.]

Privacy threats
- Identity linkage: occurs when the number of records containing the known QID values is small, or the combination is unique.
- Attribute linkage: occurs when the adversary can infer the victim's sensitive value with high confidence, even without identifying the exact record.
[Figure: attribute linkage attack. The adversary knows "Male, age 34".]
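Both threats can be measured directly on a released table: identity linkage is a matter of how few records match the adversary's knowledge, and attribute linkage of how skewed the sensitive values are inside that matching group. A minimal sketch in Python, using a small hypothetical table (the `released` data and the `linkage_risk` helper are illustrative, not the paper's):

```python
from collections import Counter

# Hypothetical released table (Job, Age, Surgery); Surgery is sensitive.
released = [
    ("Mover",   34, "Urology"),
    ("Mover",   34, "Urology"),
    ("Janitor", 25, "Plastic"),
    ("Janitor", 40, "Vascular"),
]

def linkage_risk(table, job, age):
    """Return (group_size, confidence): how many records match the
    adversary's knowledge, and the share of the most common sensitive
    value within that matching group."""
    matches = [s for (j, a, s) in table if j == job and a == age]
    if not matches:
        return 0, 0.0
    top_count = Counter(matches).most_common(1)[0][1]
    return len(matches), top_count / len(matches)

size, conf = linkage_risk(released, "Mover", 34)
# A small group size enables identity linkage; a high confidence
# enables attribute linkage even when the group is not unique.
```

Here the group matching "Mover, age 34" has two records, so the victim is not uniquely re-identified, yet both records share the sensitive value Urology, so the adversary learns it anyway: exactly the attribute linkage attack above.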

Information needs
- Two types of data analysis:
  - Classification model on blood transfusion data
  - Some general count statistics
- Why not release a classifier or some statistical information instead?
  - No expertise and interest ….
  - Impractical to continuously request ….
  - Much better flexibility to perform ….


Challenges
- Why not use the existing techniques?
- The blood transfusion data is high-dimensional
- It suffers from the "curse of dimensionality"
- Our experiments also confirm this reality

Curse of High-dimensionality

K = 2, QID = {Job, Sex, Age, Education}

ID | Job     | Sex | Age | Education | Sensitive Attribute
1  | Janitor | M   | 25  | Primary   | …
2  | Janitor | M   | 40  | Primary   | …
3  | Janitor | F   | 25  | Secondary | …
4  | Janitor | F   | 40  | Secondary | …
5  | Mover   | M   | 25  | Secondary | …
6  | Mover   | F   | 40  | Primary   | …
7  | Mover   | M   | 40  | Secondary | …
8  | Mover   | F   | 25  | Primary   | …

Taxonomy trees: Job: ANY → {Janitor, Mover}; Sex: ANY → {Male, Female}; Age: ANY → {25, 40}; Education: ANY → {Primary, Secondary}

Curse of High-dimensionality (cont.)

Generalizing Job to ANY (K = 2, QID = {Job, Sex, Age, Education}):

ID | Job | Sex | Age | Education | Sensitive Attribute
1  | Any | M   | 25  | Primary   | …
2  | Any | M   | 40  | Primary   | …
3  | Any | F   | 25  | Secondary | …
4  | Any | F   | 40  | Secondary | …
5  | Any | M   | 25  | Secondary | …
6  | Any | F   | 40  | Primary   | …
7  | Any | M   | 40  | Secondary | …
8  | Any | F   | 25  | Primary   | …

Curse of High-dimensionality (cont.)

Generalizing Job and Sex to ANY (K = 2, QID = {Job, Sex, Age, Education}):

ID | Job | Sex | Age | Education | Sensitive Attribute
1  | Any | Any | 25  | Primary   | …
2  | Any | Any | 40  | Primary   | …
3  | Any | Any | 25  | Secondary | …
4  | Any | Any | 40  | Secondary | …
5  | Any | Any | 25  | Secondary | …
6  | Any | Any | 40  | Primary   | …
7  | Any | Any | 40  | Secondary | …
8  | Any | Any | 25  | Primary   | …

What if we have 10 attributes? 20 attributes? 40 attributes?
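The blow-up hinted at by those questions can be reproduced numerically: as the QID grows, the smallest equivalence class shrinks toward 1, so k-anonymity over the full QID forces most attributes to be generalized away. A small simulation under assumed uniform binary attributes (`smallest_group` and the synthetic `data` are our own illustration, not the paper's experiment):

```python
import random
from collections import Counter

def smallest_group(records, num_attrs):
    """Size of the smallest equivalence class when the first
    num_attrs attributes are used as the QID."""
    groups = Counter(tuple(r[:num_attrs]) for r in records)
    return min(groups.values())

random.seed(0)
# 10,000 synthetic records over 40 binary attributes (illustrative only).
data = [tuple(random.randint(0, 1) for _ in range(40)) for _ in range(10_000)]

for d in (4, 10, 20, 40):
    print(d, smallest_group(data, d))
# With 4 QID attributes (16 possible combinations) every group is large;
# once the QID reaches 20+ attributes almost every record is unique,
# so enforcing K = 2 over the full QID would require massive generalization.
```

Real attributes have more than two values each, so the collapse happens even faster than in this binary sketch.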


LKC-privacy

L = 2, K = 2, C = 50%
Adversary's possible background knowledge (all size-2 QID subsets):
QID1 = {Job, Sex}, QID2 = {Job, Age}, QID3 = {Job, Education}, QID4 = {Sex, Age}, QID5 = {Sex, Education}, QID6 = {Age, Education}

ID | Job     | Sex | Age | Education | Surgery
1  | Janitor | M   | 25  | Primary   | Plastic
2  | Janitor | M   | 40  | Primary   | Transgender
3  | Janitor | F   | 25  | Secondary | Transgender
4  | Janitor | F   | 40  | Secondary | Vascular
5  | Mover   | M   | 25  | Secondary | Urology
6  | Mover   | F   | 40  | Primary   | Plastic
7  | Mover   | M   | 40  | Secondary | Vascular
8  | Mover   | F   | 25  | Primary   | Urology

Is it possible for an adversary to acquire all the information about a target victim?


LKC-privacy
A database T satisfies LKC-privacy if and only if |T(qid)| ≥ K and Pr(s | T(qid)) ≤ C for any adversary background knowledge qid with |qid| ≤ L, where:
- s is a value of the sensitive attribute
- K is a positive integer
- qid denotes the adversary's prior knowledge (values on at most L QID attributes)
- T(qid) is the group of records in T that contain qid
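The definition can be verified by brute force over all QID subsets of size at most L. A sketch (the `satisfies_lkc` function and the record encoding are our own illustration, not the paper's code), run against the 8-record table from the earlier slides:

```python
from itertools import combinations
from collections import Counter

def satisfies_lkc(records, qid_attrs, sens_attr, L, K, C):
    """Brute-force LKC-privacy check: for every subset qid of at most
    L QID attributes, every group T(qid) sharing the same values must
    have |T(qid)| >= K, and no sensitive value may occur in a group
    with relative frequency above C."""
    for size in range(1, min(L, len(qid_attrs)) + 1):
        for subset in combinations(qid_attrs, size):
            groups = {}
            for r in records:
                key = tuple(r[a] for a in subset)
                groups.setdefault(key, []).append(r[sens_attr])
            for sens_values in groups.values():
                if len(sens_values) < K:
                    return False  # identity linkage: group too small
                top = Counter(sens_values).most_common(1)[0][1]
                if top / len(sens_values) > C:
                    return False  # attribute linkage: confidence too high
    return True

# The 8-record table from the slides:
rows = [
    ("Janitor", "M", 25, "Primary",   "Plastic"),
    ("Janitor", "M", 40, "Primary",   "Transgender"),
    ("Janitor", "F", 25, "Secondary", "Transgender"),
    ("Janitor", "F", 40, "Secondary", "Vascular"),
    ("Mover",   "M", 25, "Secondary", "Urology"),
    ("Mover",   "F", 40, "Primary",   "Plastic"),
    ("Mover",   "M", 40, "Secondary", "Vascular"),
    ("Mover",   "F", 25, "Primary",   "Urology"),
]
names = ("Job", "Sex", "Age", "Education", "Surgery")
records = [dict(zip(names, row)) for row in rows]
qids = ["Job", "Sex", "Age", "Education"]
```

For this table, L = 2 with K = 2 and C = 100% holds (every size-1 or size-2 combination is shared by at least two records), but C = 50% fails because both movers aged 25 underwent urology surgery; and raising L to the full QID size reduces LKC-privacy to k-anonymity, which the raw table violates.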

Some properties of LKC-privacy:
- It only requires each combination of at most L QID attributes (rather than the full QID) to be shared by at least K records.
- k-anonymity is a special case of LKC-privacy with L = |QID| and C = 100%.
- Confidence bounding is a special case of LKC-privacy with L = |QID| and K = 1.
- (α, k)-anonymity is a special case of LKC-privacy with L = |QID|, K = k, and C = α.

Algorithm for LKC-privacy
- We extended the TDS (Top-Down Specialization) algorithm to incorporate LKC-privacy
  - B. C. M. Fung, K. Wang, and P. S. Yu. Anonymizing classification data for privacy preservation. In TKDE.
- The LKC-privacy model can also be achieved by other algorithms
  - R. J. Bayardo and R. Agrawal. Data privacy through optimal k-anonymization. In ICDE.
  - K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Workload-aware anonymization techniques for large-scale data sets. In TODS.
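The top-down idea can be sketched in a few lines. This is a toy version of our own: it enforces only the group-size part of the privacy requirement and tries specializations in a fixed order, whereas the published TDS greedily picks the candidate specialization with the best information-gain vs. privacy-loss score at each step:

```python
from collections import Counter

def min_group(records, attrs, masked):
    """Smallest equivalence class when the masked attributes are
    generalized to the top taxonomy value ANY."""
    groups = Counter(
        tuple("ANY" if a in masked else r[a] for a in attrs)
        for r in records
    )
    return min(groups.values())

def top_down_specialize(records, attrs, K):
    """Start fully generalized, then unmask attributes one at a time,
    keeping a specialization only while every group retains >= K
    records; roll back any specialization that breaks the threshold."""
    masked = set(attrs)
    for a in attrs:
        masked.discard(a)  # try specializing attribute a
        if min_group(records, attrs, masked) < K:
            masked.add(a)  # too specific: roll back
    return masked  # attributes left generalized in the release

# The slides' 8-record table (sensitive attribute omitted here):
rows = [
    ("Janitor", "M", 25, "Primary"), ("Janitor", "M", 40, "Primary"),
    ("Janitor", "F", 25, "Secondary"), ("Janitor", "F", 40, "Secondary"),
    ("Mover", "M", 25, "Secondary"), ("Mover", "F", 40, "Primary"),
    ("Mover", "M", 40, "Secondary"), ("Mover", "F", 25, "Primary"),
]
attrs = ("Job", "Sex", "Age", "Education")
records = [dict(zip(attrs, row)) for row in rows]
```

On this table with K = 2, Job, Sex, and Education can all be specialized, but Age must remain generalized, since specializing it would leave singleton groups.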


Experimental Evaluation
We employ two real-life datasets:
- Blood: a real-life blood transfusion dataset
  - 41 attributes are QID attributes
  - Blood Group is the class attribute (8 values)
  - Diagnosis Codes is the sensitive attribute (15 values)
  - 10,000 blood transfusion records
- Adult: census data (from the UCI repository)
  - 6 continuous attributes
  - 8 categorical attributes
  - 45,222 census records

Data Utility: Blood dataset [figures]

Data Utility: Adult dataset [figures]

Efficiency and Scalability
- Took at most 30 seconds for all previous experiments


Related work
- Y. Xu, K. Wang, A. W. C. Fu, and P. S. Yu. Anonymizing transaction databases for publication. In SIGKDD.
- Y. Xu, B. C. M. Fung, K. Wang, A. W. C. Fu, and J. Pei. Publishing sensitive transactions for itemset utility. In ICDM.
- M. Terrovitis, N. Mamoulis, and P. Kalnis. Privacy-preserving anonymization of set-valued data. In VLDB.
- G. Ghinita, Y. Tao, and P. Kalnis. On the anonymization of sparse high-dimensional data. In ICDE.


Conclusions
- Successful demonstration of a real-life application
- It is important to educate health institute managements and medical practitioners
- Health data are complex: a combination of relational, transaction, and textual data
- Source code and datasets download:

Q&A
Thank you very much!