
1 Methods of Secure Computation and Data Integration Jerome Reiter, Duke University Alan Karr, NISS Xiaodong Lin, University of Cincinnati Ashish Sanil, Bristol Myers Squibb

2 General setting
-- Multiple agencies seek to improve analyses by “pooling” their data.
-- Agencies do not want to reveal individual data values unknown to other agencies.
-- Agencies want accurate results from pooling procedures.

3 Pooling situations
Horizontally Partitioned: Agencies have different records but the same variables.
Purely Vertically Partitioned: Agencies have the same records but different variables.
Partially Overlapping, Vertically Partitioned: Agencies have different records and different variables, with some common records and variables.

4 Horizontal partitioning
Karr, Lin, Sanil, Reiter (JCGS, 2005)
Secure data integration
-- shares data but protects sources.
-- allows any analysis to be done.
Secure summation
-- shares sums without sharing data.
-- allows regressions, association rules, classifications, clustering.

5 Secure summation
Obtain the sum of the agencies' values x without sharing individual values.
1. Agency A adds a random number R to its x and passes (x + R) to Agency B.
2. Agency B adds its x to this value and passes the sum to Agency C.
3. The process continues until all agencies have added their x.
4. Agency A subtracts R from the sum.
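The four steps above can be sketched in a few lines. This is a minimal single-process illustration, not the authors' implementation: the function name `secure_sum`, the use of modular arithmetic, and the `modulus` value are assumptions added here (working modulo a large number is a common way to keep the masked running total from leaking magnitude), and the values are assumed to be non-negative integers whose sum is below the modulus.

```python
import random


def secure_sum(values, modulus=2**61, rng=None):
    """Simulate secure summation over one value per agency.

    values[0] belongs to Agency A, values[1] to Agency B, and so on.
    Assumes non-negative integers whose total is below `modulus`
    (both the modulus and this function are illustrative assumptions).
    """
    rng = rng or random.SystemRandom()

    # Step 1: Agency A masks its value with a secret random number R.
    r = rng.randrange(modulus)
    running = (values[0] + r) % modulus

    # Steps 2-3: each remaining agency adds its own value and passes
    # the masked running total on; none of them can recover R.
    for x in values[1:]:
        running = (running + x) % modulus

    # Step 4: Agency A removes R to reveal only the total.
    return (running - r) % modulus
```

For example, `secure_sum([5, 12, 7])` returns 24, while each intermediate message is offset by the secret R. As the later slides note, the protocol assumes semi-honest agencies: two colluding neighbours could still difference their messages to recover the value of the agency between them.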

6 Purely vertical partitioning
Secure dot/matrix product
-- shares dot/matrix products without sharing data.
-- allows regressions, association rules, classification, clustering.
-- assumes semi-honest agencies.
Synthetic data approaches
-- share synthetic copies of data across agencies.
-- allow any analysis when distributions used to generate data are accurate.
-- generate public use data file.

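7 Secure dot/matrix products
Karr, Lin, Reiter, Sanil (NISS tech. report)
Compute [formula missing from transcript] without revealing individual values.
1. Agency A passes [formula missing], where [condition missing] for all i, j, to Agency B.
2. Agency B sends [formula missing] to Agency A.
3. Agency A computes [formula missing].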
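The slide's formulas did not survive in this transcript. One construction consistent with the surviving wording is sketched below as an assumption, not as a confirmed copy of the slide: Agency A holds X, Agency B holds Y, A sends orthonormal vectors Z with Z_i^T X_j = 0 for all i, j; B returns W = (I - Z Z^T) Y; and A computes X^T W, which equals X^T Y because X^T Z = 0. All function names are hypothetical, matrices are stored as lists of columns, and the number of masking vectors `g` must not exceed n minus the number of columns of X.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, against=()):
    """Gram-Schmidt: orthonormal vectors spanning `vectors`, each also
    orthogonal to the already-orthonormal vectors in `against`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in list(against) + basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-10:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def agency_a_make_z(x_cols, g, rng):
    """Agency A: draw g random vectors and make them orthonormal and
    orthogonal to every column of X (assumed construction)."""
    n = len(x_cols[0])
    x_basis = orthonormalize(x_cols)
    candidates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(g)]
    return orthonormalize(candidates, against=x_basis)


def agency_b_make_w(y_cols, z_cols):
    """Agency B: W = (I - Z Z^T) Y, computed column by column."""
    w_cols = []
    for y in y_cols:
        w = list(y)
        for z in z_cols:
            c = dot(z, y)
            w = [wi - c * zi for wi, zi in zip(w, z)]
        w_cols.append(w)
    return w_cols


def agency_a_product(x_cols, w_cols):
    """Agency A: X^T W, which equals X^T Y because X^T Z = 0."""
    return [[dot(x, w) for w in w_cols] for x in x_cols]
```

In this sketch Agency A never sees Y itself, only its projection W, and Agency B sees only Z. How much each side can infer depends on the choice of `g`, which the sketch leaves to the caller.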

8 Purely vertical partitioning
Secure dot/matrix product
-- shares dot/matrix products without sharing data.
-- allows regressions, association rules, classification, clustering.
-- assumes semi-honest agencies.
Synthetic data approaches
-- share synthetic copies of data across agencies.
-- allow any analysis when distributions used to generate data are accurate.
-- generate public use data file.

9 Synthetic data approach
Kohnen (PhD thesis, 2005)
Assume X is not sensitive.
-- Agency A passes real X to Agency B.
-- Agency B simulates multiple copies of Y for each X from f(Y | X), estimated using the dataset from Agency A.
-- Agency B passes the copies to Agency A.
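Agency B's simulation step might look like the sketch below. It assumes, purely for illustration, a single predictor and a normal linear regression for f(Y | X); the slide does not specify the model, and both function names are hypothetical. A fuller synthesizer would also draw the regression parameters from their posterior distribution rather than plugging in point estimates.

```python
import random


def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus the
    residual standard deviation (assumed model for f(Y | X))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return intercept, slope, sigma


def simulate_copies(x, model, m, rng):
    """Agency B: draw m synthetic copies of Y, one value per record in x."""
    intercept, slope, sigma = model
    return [
        [rng.gauss(intercept + slope * xi, sigma) for xi in x]
        for _ in range(m)
    ]
```

Agency B would call `fit_simple_regression` on the X received from Agency A together with its own Y, then return the output of `simulate_copies` instead of the real Y.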

10 Synthetic data approach
Kohnen (PhD thesis, 2005)
-- Agency A uses partially synthetic data methods (Reiter, Surv. Meth., 2003) for inferences based on Y | X.
-- Agency A can release fully synthetic data to the public.
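Inference from multiple synthetic copies needs a rule for combining the per-copy results. The sketch below uses the combining rules usually stated for partially synthetic data: the point estimate is the average of the per-copy estimates, and its variance is the average within-copy variance plus the between-copy variance divided by the number of copies m. The function name is hypothetical, and the formulas should be checked against Reiter (2003) before use.

```python
import statistics


def combine_partially_synthetic(estimates, variances):
    """Combine per-copy point estimates and their variance estimates.

    Returns (q_bar, t_p), where
      q_bar = mean of the m point estimates,
      t_p   = u_bar + b / m,
      u_bar = mean within-copy variance,
      b     = sample variance of the point estimates across copies.
    Requires at least two copies.
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)
    b = statistics.variance(estimates)  # between-copy variance
    u_bar = statistics.mean(variances)  # average within-copy variance
    return q_bar, u_bar + b / m
```

For instance, Agency A could fit the same regression on each synthetic copy, collect the slope and its squared standard error from each fit, and pass the two lists to this function.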

11 Synthetic data approaches
Kohnen (PhD thesis, 2005)
1. Agency A simulates disguiser values of X that look like the genuine values of X, ideally from a distribution close to f(X | Y). Agency A passes the real X and the disguisers to Agency B.
2. Agency B simulates multiple copies of Y for each X from f(Y | X), estimated using the datasets from Agency A. Agency B passes the copies to Agency A.

12 Synthetic data approaches
Kohnen (PhD thesis, 2005)
-- Agency A discards disguisers and uses partially synthetic data methods (Reiter, Surv. Meth., 2003) to obtain inferences using the real X.
-- Agency A can release fully synthetic data to the public.
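The disguiser bookkeeping on Agency A's side can be sketched as follows. The function names are hypothetical, and drawing disguisers from a normal distribution matched to the mean and spread of the real X is a stand-in assumption; the slides ask for a distribution close to f(X | Y) and list generating good disguisers as an open problem.

```python
import random
import statistics


def add_disguisers(real_x, n_disguisers, rng):
    """Agency A: mix real X values with simulated disguisers.

    Returns the shuffled values to send to Agency B and a private
    list of flags marking which positions hold real values.
    """
    mean = statistics.mean(real_x)
    spread = statistics.stdev(real_x)
    disguisers = [rng.gauss(mean, spread) for _ in range(n_disguisers)]
    labeled = [(x, True) for x in real_x] + [(x, False) for x in disguisers]
    rng.shuffle(labeled)  # hide which positions are real
    sent = [x for x, _ in labeled]
    is_real = [flag for _, flag in labeled]
    return sent, is_real


def discard_disguisers(copies, is_real):
    """Agency A: keep only the synthetic Y values for real records."""
    return [
        [y for y, keep in zip(copy, is_real) if keep]
        for copy in copies
    ]
```

Agency A keeps `is_real` private, sends `sent` to Agency B, and applies `discard_disguisers` to the synthetic copies that come back before running its analysis.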

13 Partially overlapping, vertical partitioning
Secure EM algorithm
-- uses secure dot products.
-- continuous data: estimate covariance matrix for multivariate normal data.
-- categorical data: estimate parameters of log-linear models.

14 Limitations of methods: Defining a research agenda
Secure computation methods:
- How to specify models without viewing data?
- What if sophisticated models are needed?
- How to do posterior simulation?
Synthetic data methods:
- How to generate good disguisers?
All methods:
- How to incorporate matching errors, differences in data quality and definitions?
- How to account for disclosure risks from models that “fit too well”?

