Characterization and Management of Multiple Components of Cost and Risk in Disclosure Protection for Establishment Surveys
Discussion of Advances in Disclosure Protection: Releasing More Business and Farm Data to the Public
John L. Eltinge
Bureau of Labor Statistics
ICES III Session 54 - June 21, 2007

Disclaimer and Acknowledgements: The views expressed in this paper are those of the author and do not necessarily reflect the policies of the Bureau of Labor Statistics. The author thanks the authors of the session papers for the opportunity to review their papers and slides, and Steve Cohen and Larry Ernst for helpful comments on some of the issues considered here.

Each paper covers fascinating and important work
Special importance for establishment surveys: dominating units; unequal selection probabilities; plethora of subpopulations, cells
Many potential topics for discussion
I. Background: Manage Costs and Risks
   A. Disclosure Protection as a Form of Technology
   B. Multiple Stakeholders, Multiple Utility Functions
II. Comments on Individual Papers

I. Background: Management of Costs and Risks in Disclosure Protection
   A. View Disclosure Protection as a Form of Technology
      Practical implication: Need to examine
      1. Tangible costs to data users, producers
      2. Risks to data providers, users, producers and other intermediaries
         a. Funding, primary dissemination
         b. Fieldwork, other direct contacts with data providers
         c. Secondary dissemination (U.S. example: states under fed-state programs)

   B. Multiple Stakeholders and Multiple Utility Functions
      1. Even within a given class, stakeholders may have very different utility functions and risk profiles
         a. Respondents: Publicity-phobic and publicity-philic
         b. Data users:
            - Some consider only published point estimates
            - Some want highly sophisticated inference methods
      2. For disclosure: Utility function of data intruder?
         - May involve very high tolerance for error

II. Comments on Individual Papers
   A. Morehart and Towe
      1. Note emphasis on system development
         - Impact of the underlying statistical science depends on the technology for implementation
      2. Sec. 1: Expand access to farm survey data as a public good
         Costs incurred by users for partial data access:
         - Effort (more sophisticated statistical analysis)
         - Inconvenience (travel to data center)
         - Loss of some data quality, efficiency (added noise, cell suppression)

      3. Relative weight assigned by NASS to:
         a. Privacy of respondents
         b. Production of standard aggregate reports
         c. Specific tasks in economic research
      4. Low demand for advanced statistical analysis component (fewer than 30 requests in 30 months)
         - Limited general interest, or
         - Reflects high threshold requirement for a large suite of analytic tools?

   B. Massell and Funk
      1. Added-noise method of Evans, Zayatz and Slanta (1998) has a very strong appeal
         a. Conceptual simplicity
         b. Coherence of resulting point estimates across, e.g., multiple levels of aggregation
      2. Tuning of the added-noise process to specific high-priority sets of problem cells?
      3. Possible extension: Weighted influence function associated with a given observation for a cell-level estimator
         - Impact of unit-level added noise?
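
The coherence property in item 1b and the influence question in item 3 can be illustrated with a minimal sketch. It assumes a simplified multiplicative-noise scheme in which each unit is perturbed once and every table is then built from the same perturbed values. The multiplier distribution (about 10% away from 1), the field names, the helper functions and the sample data are all hypothetical illustrations, not the specification in Evans, Zayatz and Slanta (1998) or the procedure studied by Massell and Funk.

```python
import random

def noise_multiplier(rng, offset=0.10, spread=0.02):
    # Assumed noise distribution: multiplier lies near 1 - offset or 1 + offset.
    direction = rng.choice((-1.0, 1.0))
    return 1.0 + direction * (offset + rng.uniform(-spread, spread))

def perturb_units(units, seed=0):
    # Perturb each unit exactly once; all tables reuse these noisy values.
    rng = random.Random(seed)
    return [dict(u, value=u["value"] * noise_multiplier(rng)) for u in units]

def cell_totals(units, keys):
    # Weighted totals for the cells defined by the given classification keys.
    totals = {}
    for u in units:
        cell = tuple(u[k] for k in keys)
        totals[cell] = totals.get(cell, 0.0) + u["weight"] * u["value"]
    return totals

def influence_shares(units, keys, cell):
    # Share of the weighted cell total contributed by each unit in the cell.
    members = [u for u in units if tuple(u[k] for k in keys) == cell]
    total = sum(u["weight"] * u["value"] for u in members)
    return [u["weight"] * u["value"] / total for u in members]

# Hypothetical microdata: one dominating unit in cell (A, x).
units = [
    {"state": "A", "industry": "x", "weight": 1.0, "value": 900.0},
    {"state": "A", "industry": "x", "weight": 5.0, "value": 20.0},
    {"state": "A", "industry": "y", "weight": 4.0, "value": 35.0},
    {"state": "B", "industry": "x", "weight": 2.0, "value": 60.0},
]

noisy = perturb_units(units, seed=42)
fine = cell_totals(noisy, ("state", "industry"))   # state-by-industry cells
coarse = cell_totals(noisy, ("state",))            # state-level cells
shares = influence_shares(units, ("state", "industry"), ("A", "x"))
```

Because the noise is attached to units rather than to cells, the state-by-industry totals add up to the state totals without any further adjustment. In this sketch the relative change in a cell total equals the influence-weighted average of the unit-level relative noise, so a cell such as (A, x), where one unit holds 90% of the weighted total, inherits nearly all of that unit's perturbation.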

      4. Methods in all three papers (e.g., weight trimming, added noise and controlled tabular adjustment) lead to examination of:
         a. Realistic inferential goals for disclosure-protected data?
            - Original true distribution (and parameters thereof)?
            - Trimmed version of the original distribution?
            - Unspecified distribution in neighborhood of original distribution?
              (cf. similar comments on outliers by Pat Cantwell)
         b. Resulting magnitude of inferential error induced by disclosure-protection tool, relative to natural sources of error (sampling, nonresponse, measurement)?
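
One way to make the comparison in item 4b concrete is sketched below for a weighted total, under strong simplifying assumptions that are not taken from any of the three papers: unit-level multiplicative noise that is independent across units with mean 1 and standard deviation `noise_sd`, Poisson sampling with inclusion probability equal to 1/weight, and no nonresponse or measurement error. The function names and the sample data are hypothetical.

```python
def noise_variance(units, noise_sd):
    # Variance added to the weighted total by independent mean-1 unit noise:
    # noise_sd^2 * sum of (weight * value)^2.
    return noise_sd ** 2 * sum((u["weight"] * u["value"]) ** 2 for u in units)

def poisson_sampling_variance(units):
    # Horvitz-Thompson variance estimate under assumed Poisson sampling with
    # inclusion probability 1 / weight: sum of weight * (weight - 1) * value^2.
    return sum(u["weight"] * (u["weight"] - 1.0) * u["value"] ** 2 for u in units)

def noise_share(units, noise_sd):
    # Fraction of (noise + sampling) variance attributable to the added noise.
    nv = noise_variance(units, noise_sd)
    sv = poisson_sampling_variance(units)
    return nv / (nv + sv)

# Hypothetical sample: the first unit is a certainty unit (weight 1).
sample = [
    {"weight": 1.0, "value": 900.0},
    {"weight": 5.0, "value": 20.0},
    {"weight": 4.0, "value": 35.0},
]

share = noise_share(sample, noise_sd=0.10)
```

For this hypothetical sample the added noise accounts for roughly 27% of the combined variance. The certainty unit contributes nothing to the sampling variance but most of the noise variance, which is one reason dominating units deserve separate attention in establishment surveys.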

   C. Cox
      1. Careful attention to performance criteria
         a. Link with utility functions of key stakeholders?
         b. Constrained optimization or satisficing: Degree of domination by constraints, optimality criteria?
      2. One stated objective: Analyses on original vs. adjusted data yield comparable results
         a. Emphasize: Many possible analyses
         b. Possible baselines (preservation and sensitivity):
            - Original data
            - Results from fit of loglinear model with moderate number of interactions, possible extramultinomial noise

      3. Disclosure protection methods often involve complex constrained optimization approaches, but:
         a. Coherence of implied utility function with primary stakeholders' concerns?
         b. Uncertainty in utility functions, constraints?
         c. True uncertainty vs. lack of a priori articulation of utility functions?
      4. Characterization of cells: Sensitive/Not sensitive
         - Consistent with largely deterministic language in legislation and regulations

III. Closing Remarks
   A. Disclosure protection as a form of technology
   B. Risk is a fact of life for any technology, including disclosure protection technology
   C. Critical importance of a high level of sophistication regarding
      1. Multiple stakeholders, utility functions and constraints
      2. Mathematical solutions
      3. Technological implementation
   D. Also consider operational risk
      1. Will a given disclosure protection method be carried out as specified?
      2. Distribution of the resulting impact on tangible costs and risks to stakeholders?