Published by Gabriel Harper. Modified over 9 years ago.
1
Methods of Secure Computation and Data Integration
Jerome Reiter, Duke University
Alan Karr, NISS
Xiaodong Lin, University of Cincinnati
Ashish Sanil, Bristol Myers Squibb
2
General setting
-- Multiple agencies seek to improve analyses by “pooling” their data.
-- Agencies do not want to reveal individual data values unknown to other agencies.
-- Agencies want accurate results from pooling procedures.
3
Pooling situations
-- Horizontally Partitioned: Agencies have different records but same variables.
-- Purely Vertically Partitioned: Agencies have same records but different variables.
-- Partially Overlapping, Vertically Partitioned: Agencies have different records and different variables, with some common records and variables.
4
Horizontal partitioning
Karr, Lin, Sanil, Reiter (JCGS, 2005)
Secure data integration
-- shares data but protects sources.
-- allows any analysis to be done.
Secure summation
-- shares sums without sharing data.
-- allows regressions, association rules, classifications, clustering.
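One way to see why sums are enough for a regression: with horizontally partitioned data, the least-squares fit of y on x depends only on totals such as n, the sum of x, and the sum of x*y, and each total is a sum of per-agency pieces. The sketch below is illustrative only. The function names and the toy records are assumptions, and the totals are added in the clear here, where in practice each one would go through a secure summation.

```python
# Hypothetical example: three agencies hold different records of the same
# two variables (x, y). Only the pooled totals are needed to fit y = a + b*x.

def local_stats(records):
    """Return (n, sum x, sum y, sum x*x, sum x*y) for one agency's records."""
    n = len(records)
    sx = sum(x for x, _ in records)
    sy = sum(y for _, y in records)
    sxx = sum(x * x for x, _ in records)
    sxy = sum(x * y for x, y in records)
    return (n, sx, sy, sxx, sxy)

def fit_from_totals(totals):
    """Least-squares intercept and slope computed from pooled totals only."""
    n, sx, sy, sxx, sxy = totals
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Toy data (made up for illustration), generated from y = 1 + 2x.
agencies = [
    [(1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0)],
    [(4.0, 9.0), (5.0, 11.0)],
]

# Plain sums stand in for secure summation of each component.
totals = tuple(sum(parts) for parts in zip(*(local_stats(a) for a in agencies)))
intercept, slope = fit_from_totals(totals)
```

No agency's individual records enter `fit_from_totals`; it sees only the five pooled totals.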
5
Secure summation
Obtain the sum $\sum_j x_j$ of the agencies' values without sharing individual values.
1. Agency A adds a random number R, known only to A, to its x and passes (x + R) to Agency B.
2. Agency B adds its x to this value and passes the sum to Agency C.
3. The process continues until all agencies have added their x.
4. Agency A subtracts R from the final sum.
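The four steps can be simulated in a few lines. This is a minimal sketch, not the authors' implementation: it runs all agencies in one process, and it assumes non-negative integer values whose total is below a modulus, with arithmetic done modulo that number so the masked running total looks uniformly random to each recipient. The name `secure_sum` and the choice of modulus are assumptions made for illustration.

```python
import random

def secure_sum(values, modulus=2**61 - 1, rng=None):
    """Simulate the ring protocol; values[0] belongs to Agency A.

    Assumes non-negative integers whose true total is below `modulus`.
    """
    rng = rng or random.SystemRandom()
    r = rng.randrange(modulus)            # Agency A's secret mask R
    running = (values[0] + r) % modulus   # step 1: A passes x + R
    for x in values[1:]:                  # steps 2-3: each agency adds its x
        running = (running + x) % modulus
    return (running - r) % modulus        # step 4: A removes R

total = secure_sum([10, 20, 30])
```

Each agency after A sees only a masked running total, never another agency's x. The protocol still relies on agencies following the steps and not colluding: the agencies on either side of a given agency could subtract the messages they hold to recover that agency's value.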
6
Purely vertical partitioning
Secure dot/matrix product
-- shares dot/matrix products without sharing data.
-- allows regressions, association rules, classification, clustering.
-- assumes semi-honest agencies.
Synthetic data approaches
-- share synthetic copies of data across agencies.
-- allow any analysis when distributions used to generate data are accurate.
-- generate public use data file.
7
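Secure dot/matrix products
Karr, Lin, Reiter, Sanil (NISS tech. report)
Compute $X^{T}Y$ without revealing individual values, where Agency A holds $X$ and Agency B holds $Y$.
1. Agency A passes $Z$, where $Z_i^{T}X_j = 0$ for all i,j, to Agency B.
2. Agency B sends $W = (I - ZZ^{T})Y$ to Agency A.
3. Agency A computes $X^{T}W = X^{T}Y$.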
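A sketch of one protocol of this kind, under stated assumptions: Agency A holds the columns of X, Agency B holds the columns of Y for the same records, and the target is the cross-product matrix X^T Y. A builds orthonormal vectors Z that are orthogonal to every column of X; B removes the Z-components from Y to get W = (I - Z Z^T) Y; A then computes X^T W, which equals X^T Y because X^T Z = 0. All function names, the use of Gram-Schmidt to build Z, and the toy data are assumptions for illustration, not details taken from the cited report.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def remove_components(v, basis):
    """Subtract from v its projection onto each orthonormal vector in basis."""
    v = list(v)
    for b in basis:
        coef = dot(v, b)
        v = [vi - coef * bi for vi, bi in zip(v, b)]
    return v

def normalize(v):
    norm = dot(v, v) ** 0.5
    return [vi / norm for vi in v]

def make_mask(x_cols, g, rng):
    """Agency A: g orthonormal vectors, each orthogonal to all columns of X."""
    x_basis = []
    for col in x_cols:                       # orthonormal basis for span(X)
        resid = remove_components(col, x_basis)
        if dot(resid, resid) > 1e-12:
            x_basis.append(normalize(resid))
    n = len(x_cols[0])
    if g > n - len(x_basis):
        raise ValueError("g exceeds the dimension of the space orthogonal to X")
    z_cols = []
    while len(z_cols) < g:                   # random directions, made orthogonal
        draw = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z_cols.append(normalize(remove_components(draw, x_basis + z_cols)))
    return z_cols

def mask_data(y_cols, z_cols):
    """Agency B: W = (I - Z Z^T) Y, computed column by column."""
    return [remove_components(col, z_cols) for col in y_cols]

def cross_product(a_cols, b_cols):
    """Matrix of dot products between every column pair."""
    return [[dot(a, b) for b in b_cols] for a in a_cols]

rng = random.Random(0)
x_cols = [[1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, 1.0, 0.0, 1.0]]  # held by A
y_cols = [[2.0, 1.0, 0.0, 3.0, 1.0], [5.0, 4.0, 3.0, 2.0, 1.0]]  # held by B

z_cols = make_mask(x_cols, g=2, rng=rng)   # step 1: A -> B
w_cols = mask_data(y_cols, z_cols)         # step 2: B -> A
xty = cross_product(x_cols, w_cols)        # step 3: A computes X^T W
```

The number of mask vectors g is a tuning choice in this sketch: a larger g hides more of Y from A but reveals more about X to B, since Z tells B a subspace that X's columns cannot lie in.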
8
Purely vertical partitioning
Secure dot/matrix product
-- shares dot/matrix products without sharing data.
-- allows regressions, association rules, classification, clustering.
-- assumes semi-honest agencies.
Synthetic data approaches
-- share synthetic copies of data across agencies.
-- allow any analysis when distributions used to generate data are accurate.
-- generate public use data file.
9
Synthetic data approach
Kohnen (PhD thesis, 2005)
-- Assume X is not sensitive.
-- Agency A passes real X to Agency B.
-- Agency B simulates multiple copies of Y for each X from f(Y|X), estimated using the dataset from Agency A.
-- Agency B passes the copies to Agency A.
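A minimal sketch of Agency B's step, assuming a single numeric X and Y and taking f(Y|X) to be a normal linear regression with plug-in parameter estimates. The slide does not specify the model; a fuller treatment would also draw the regression parameters from their posterior distribution before simulating. The function names and toy values are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual standard deviation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sigma = (rss / (n - 2)) ** 0.5
    return intercept, slope, sigma

def synthesize(xs, ys, m, rng):
    """Agency B: draw m synthetic copies of Y, one value per received X."""
    intercept, slope, sigma = fit_line(xs, ys)
    return [[rng.gauss(intercept + slope * x, sigma) for x in xs]
            for _ in range(m)]

# xs is what Agency A sent; ys is Agency B's own sensitive variable
# for the same records (toy values, made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
copies = synthesize(xs, ys, m=3, rng=random.Random(0))
```

Agency A receives `copies`, never `ys`, so what it learns about Y is limited to what the fitted model carries.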
10
Synthetic data approach
Kohnen (PhD thesis, 2005)
-- Agency A uses partially synthetic data methods (Reiter, Surv. Meth., 2003) for inferences based on Y|X.
-- Agency A can release fully synthetic data to public.
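For reference, a sketch of the combining rules usually associated with partially synthetic data: the analyst computes a point estimate and its variance estimate on each of the m synthetic copies, averages the point estimates, and takes the total variance as the average within-copy variance plus the between-copy variance divided by m. The function name and example numbers are assumptions; degrees of freedom and interval construction are left out.

```python
from statistics import mean, variance

def combine_partially_synthetic(estimates, variances):
    """Combine per-copy results: returns (point estimate, total variance)."""
    m = len(estimates)
    q_bar = mean(estimates)      # average of the m point estimates
    b = variance(estimates)      # between-copy variance (divisor m - 1)
    u_bar = mean(variances)      # average within-copy variance
    return q_bar, u_bar + b / m

# Made-up results from m = 3 synthetic copies.
q_bar, total_var = combine_partially_synthetic(
    [10.0, 12.0, 14.0], [1.0, 1.0, 1.0]
)
```

The between-copy term enters as b/m, not the (1 + 1/m) b used in multiple imputation for missing data, so the two sets of rules should not be mixed.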
11
Synthetic data approaches
Kohnen (PhD thesis, 2005)
1. Agency A simulates disguiser X values that look like the genuine values of X, ideally from a distribution close to f(X|Y). Agency A passes real X and disguisers to Agency B.
2. Agency B simulates multiple copies of Y for each X from f(Y|X), estimated using the datasets from Agency A. Agency B passes the copies to Agency A.
12
Synthetic data approaches
Kohnen (PhD thesis, 2005)
-- Agency A discards disguisers and uses partially synthetic data methods (Reiter, Surv. Meth., 2003) to obtain inferences using the real X.
-- Agency A can release fully synthetic data to public.
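The bookkeeping around disguisers can be sketched as follows. This is an illustration under simplifying assumptions, not the method in the thesis: disguisers are drawn from a normal distribution fitted to the marginal of X, where the slides ask for something close to f(X|Y), and `stand_in_agency_b` is a hypothetical placeholder for whatever model Agency B actually uses. Only the mixing, shuffling, and discarding steps are the point.

```python
import random
from statistics import mean, stdev

def add_disguisers(real_x, n_disguisers, rng):
    """Agency A: mix real X with simulated disguisers and shuffle.

    Returns the values sent to B and a private mask marking the real rows.
    """
    centre, spread = mean(real_x), stdev(real_x)
    disguisers = [rng.gauss(centre, spread) for _ in range(n_disguisers)]
    rows = [(x, True) for x in real_x] + [(x, False) for x in disguisers]
    rng.shuffle(rows)
    sent_x = [x for x, _ in rows]
    is_real = [flag for _, flag in rows]   # kept by A, never sent
    return sent_x, is_real

def stand_in_agency_b(sent_x, m, rng):
    """Hypothetical placeholder for B: m simulated Y copies, one per row."""
    return [[rng.gauss(2.0 * x, 1.0) for x in sent_x] for _ in range(m)]

def discard_disguisers(copies, is_real):
    """Agency A: keep only the synthetic Y values paired with real X."""
    return [[y for y, keep in zip(copy, is_real) if keep] for copy in copies]

rng = random.Random(7)
real_x = [1.2, 3.4, 2.2, 5.1]              # toy values, made up
sent_x, is_real = add_disguisers(real_x, n_disguisers=6, rng=rng)
copies = stand_in_agency_b(sent_x, m=3, rng=rng)
kept = discard_disguisers(copies, is_real)
```

Agency B cannot tell from `sent_x` alone which rows are genuine, provided the disguisers are plausible; that is why the quality of the disguiser distribution matters.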
13
Partially overlapping, vertical partitioning
Secure EM algorithm
-- uses secure dot products.
-- continuous data: estimate covariance matrix for multivariate normal data.
-- categorical data: estimate parameters of log-linear models.
14
Limitations of methods: Defining a research agenda
Secure computation methods:
- How to specify models without viewing data?
- What if sophisticated models needed?
- How to do posterior simulation?
Synthetic data methods:
- How to generate good disguisers?
All methods:
- How to incorporate matching errors, differences in data quality and definitions?
- How to account for disclosure risks from models that “fit too well?”