Published by Caroline Ross. Modified over 8 years ago.
1 New Implementations of Noise for Tabular Magnitude Data, Synthetic Tabular Frequency and Microdata, and a Remote Microdata Analysis System
Laura Zayatz
U.S. Census Bureau
4600 Silver Hill Road
Washington, DC 20233-9100
Phone: 301-763-4955
Fax: 301-763-8399
laura.zayatz@census.gov
2 Legal Requirements and the Balancing Act
- Title 13, U.S. Code, and the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) of 2002
- Publish as much valuable statistical information as possible without violating the confidentiality of respondents
- Preserve data utility while avoiding disclosure
3 This Presentation
1. Noise for Tabular Magnitude Data
2. Synthetic Tabular Frequency and Microdata
3. Remote Microdata Analysis System
4 This Presentation
For each of the three topics:
A. Introduction to the method
B. What happened with real data
C. How we altered the method
D. Current uses of the method on real data products
5 Noise for Tabular Magnitude Data: Introduction to the Method
- Before tables are created, perturb each establishment's underlying microdata value by a small percentage (e.g., 10%), randomly up or down
- Sensitive cells needing protection end up being changed by a large amount, because one or a few contributors dominate them and their noise passes almost undiluted into the cell total
- Non-sensitive cells end up being changed by a small amount, because the noise of their many contributors partially offsets
- The procedure is simple: values can be shown for all cells, additivity is guaranteed, and there are no coordination problems across related (overlapping) tables
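A minimal sketch of why the same per-establishment noise protects sensitive cells while barely moving non-sensitive ones. For reproducibility it assumes every establishment moves by exactly 10%, with hand-picked up/down directions; the cell values and the names `perturbed_total` and `relative_change` are hypothetical.

```python
# Hypothetical illustration: every establishment moves exactly 10% up (+1)
# or down (-1); directions are fixed by hand so the output is reproducible.
NOISE_PCT = 0.10

def perturbed_total(values, directions, pct=NOISE_PCT):
    """Cell total after multiplying each establishment value by 1 +/- pct."""
    return sum(v * (1 + d * pct) for v, d in zip(values, directions))

def relative_change(values, directions, pct=NOISE_PCT):
    """Absolute change in the cell total as a fraction of the true total."""
    original = sum(values)
    return abs(perturbed_total(values, directions, pct) - original) / original

# Sensitive cell: one establishment dominates the total.
dominated_values = [1000.0, 10.0, 5.0]
dominated_dirs = [+1, -1, +1]

# Non-sensitive cell: several similar contributors with mixed directions.
spread_values = [100.0, 95.0, 110.0, 105.0, 90.0, 100.0]
spread_dirs = [+1, -1, -1, +1, +1, -1]

print(f"dominated cell changes by {relative_change(dominated_values, dominated_dirs):.1%}")
# dominated cell changes by 9.8%
print(f"spread cell changes by {relative_change(spread_values, spread_dirs):.1%}")
# spread cell changes by 0.2%
```

Additivity follows directly: each published cell is a sum of noisy establishment values, so totals and subtotals built from the same noisy values always agree.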
6 Noise for Tabular Magnitude Data: Introduction to the Method
- To perturb an establishment's value by about 10%, multiply it by a random number close to 1.1 or 0.9
- The distribution of multipliers must be symmetric about 1 so that the noise introduces no bias
- All establishments within the same company are perturbed in the same direction
- The increase in variance can be incorporated into published coefficients of variation
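One way to draw multipliers that satisfy these three requirements is sketched below. It is a sketch, not the production method: the triangular magnitude distribution on [0.05, 0.15] with its peak at 0.10, the function name `draw_multipliers`, and the sample establishments are all assumptions for illustration.

```python
import random

# Assumed parameters: noise magnitude is triangular on [0.05, 0.15] with its
# peak at 0.10, so multipliers cluster near 0.9 and 1.1.
LOW, MODE, HIGH = 0.05, 0.10, 0.15

def draw_multipliers(establishments, rng):
    """Return {establishment_id: multiplier} for (establishment_id, company_id) pairs.

    Each company gets one direction (+1 or -1, equally likely) shared by all of
    its establishments; each establishment gets its own magnitude. Since the
    magnitude distribution is the same in both directions, the multipliers are
    symmetric about 1 and have mean 1, so the noise adds no bias.
    """
    company_direction = {}
    multipliers = {}
    for estab_id, company_id in establishments:
        if company_id not in company_direction:
            company_direction[company_id] = rng.choice((-1, 1))
        magnitude = rng.triangular(LOW, HIGH, MODE)  # argument order: low, high, mode
        multipliers[estab_id] = 1 + company_direction[company_id] * magnitude
    return multipliers

rng = random.Random(2008)
estabs = [("e1", "acme"), ("e2", "acme"), ("e3", "acme"), ("e4", "bolt"), ("e5", "corp")]
m = draw_multipliers(estabs, rng)
for estab_id, multiplier in m.items():
    print(estab_id, round(multiplier, 3))
```

Drawing the direction once per company keeps a multi-establishment company from partially cancelling its own noise across the cells it contributes to.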
7 Noise for Tabular Magnitude Data: What Happened with Real Data
- Because of randomness, the method can occasionally add excessive noise to some non-sensitive cells (a problem that cell suppression avoids for non-suppressed cells and that controlled tabular adjustment can minimize)
- Is there anything we can do to avoid this problem, or at least improve results?
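A small exact calculation shows how often randomness alone can hit a non-sensitive cell hard. It assumes, for illustration only, a cell of `n` equal contributors, each moved by exactly 10% in an independent, equally likely direction; `prob_large_change` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def prob_large_change(n, share):
    """P(cell changes by at least share * 10%) for n equal, independent contributors.

    If U of the n contributors move up, the cell's relative change is
    10% * |2U - n| / n, and U is Binomial(n, 1/2).
    """
    hits = sum(comb(n, u) for u in range(n + 1) if abs(2 * u - n) >= share * n)
    return Fraction(hits, 2 ** n)

# Chance that a cell of n equal contributors moves by the full 10%
for n in (2, 4, 8, 16):
    print(n, prob_large_change(n, 1))
# 2 1/2
# 4 1/8
# 8 1/128
# 16 1/32768
```

With thousands of published cells, even small per-cell probabilities produce some badly perturbed non-sensitive cells. Real cells can fare worse than this model suggests, because establishments of the same company share a direction.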
8 Noise for Tabular Magnitude Data: What Happened with Real Data
- U.S. Census Bureau magnitude data are almost always published in rounded form (integers representing thousands or millions)
- Noise changes individual response values by only a small percentage
- Rounding can remove the effect of noise on small response values
- Is that acceptable? Does rounding provide enough protection, or should additional steps be taken to protect such small values?
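The rounding concern can be seen with two response values, assuming publication in whole thousands with standard rounding. The helper names and the example values are hypothetical.

```python
def published(value_thousands):
    """Assumed publication convention: whole thousands, standard rounding.

    (Python's round() breaks exact .5 ties to the even integer.)
    """
    return round(value_thousands)

def noise_survives_rounding(true_value, multiplier):
    """True if the published noisy value differs from the published true value."""
    return published(true_value * multiplier) != published(true_value)

# A small response (1.2 thousand) and a large one (1,234 thousand), +/-10% noise
for true_value in (1.2, 1234.0):
    for multiplier in (0.9, 1.1):
        print(true_value, multiplier, noise_survives_rounding(true_value, multiplier))
```

For 1.2 both noisy versions (1.08 and 1.32) round back to 1, the same figure the true value would produce, so the noise leaves no trace; for 1,234 the noise shifts the published value by more than 100.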
9 Noise for Tabular Magnitude Data: How We Altered the Method
- Balanced noise (see Massell and Funk)
- Experiment and choose one or more tables; a lower-level table in the hierarchy is often a good choice and has a trickle-up effect
- Use random noise for establishments in sensitive cells and for establishments in companies represented in more than one cell
- For the remaining establishments, use a sort to choose noise directions that minimize the change to non-sensitive cells
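A minimal sketch of the sort-based balancing idea for a single non-sensitive cell. It is a hypothetical simplification, not Massell and Funk's exact procedure: it takes each eligible establishment's noise amount (value times noise magnitude) as given, ignores establishments that keep random noise, and handles one cell rather than a table hierarchy.

```python
def balanced_directions(noise_amounts):
    """Choose +1/-1 for each establishment so the cell's net noise stays small.

    Sort establishments by the size of their noise (largest first) and give each
    the direction that pulls the running net change back toward zero. Because
    each later amount is no larger than the first, the absolute net change never
    exceeds the largest single noise amount.
    """
    order = sorted(range(len(noise_amounts)), key=lambda i: noise_amounts[i], reverse=True)
    directions = [0] * len(noise_amounts)
    net = 0.0
    for i in order:
        d = -1 if net > 0 else 1
        directions[i] = d
        net += d * noise_amounts[i]
    return directions, net

# Hypothetical noise amounts (in the cell's units) for five establishments
amounts = [12.0, 50.0, 8.0, 30.0, 25.0]
directions, net = balanced_directions(amounts)
print(directions, net)  # [1, 1, -1, -1, -1] -1.0
```

With random directions the same five establishments could move the cell by as much as 125; the sorted assignment leaves a net change of 1.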
10 Noise for Tabular Magnitude Data: How We Altered the Method
- Currently testing various modifications to standard rounding techniques
- Options include rounding the underlying microdata values and rounding the tabulated cell values
- Want to ensure that standard rounding does not undo the protection provided by the noise
- Ceiling/floor techniques seem to work well, but results differ across data products
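One plausible reading of a ceiling/floor rule is sketched below; the slide does not specify the exact rule. The assumption is that an upward-perturbed value is rounded up and a downward-perturbed value is rounded down, so rounding can never pull the published figure back across the true value. `directional_round` is a hypothetical name.

```python
import math

def directional_round(true_value, multiplier):
    """Round a noisy value in the direction of its noise.

    Assumed rule: ceiling if the noise moved the value up, floor if it moved it
    down. For positive values, the published integer is then strictly above
    the true value for upward noise and strictly below it for downward noise.
    """
    noisy = true_value * multiplier
    return math.ceil(noisy) if multiplier > 1 else math.floor(noisy)

for true_value in (1.2, 1234.0):
    print(true_value, directional_round(true_value, 0.9), directional_round(true_value, 1.1))
```

For the small value 1.2, standard rounding publishes 1 whether the noise went up or down; the ceiling step publishes 2 for upward noise, so that noise is no longer erased.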
11 Noise for Tabular Magnitude Data: Current Uses on Real Data Products
Done:
- Quarterly Workforce Indicators
- Non-Employer Data Products
Near future:
- Commodity Flow Survey
- Census of Island Areas
- Survey of Business Owners
Under study:
- County Business Patterns
12 Synthetic Tabular Frequency and Microdata: Introduction to the Method
- Posterior predictive models generate synthetic data with many of the same statistical properties as the original data
- Sequential regression imputation, one variable in one record at a time (blank, then impute, the variables causing a disclosure risk for a given record)
- Full or partial synthesis; demographic or economic data; tables or microdata; one or more implicates
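A minimal sketch of one step of partial synthesis: draw a replacement for one variable, only in the records flagged as at risk, from a posterior predictive distribution. The model is an assumption chosen for brevity: a simple linear regression on one predictor with the standard noninformative prior. In a sequential scheme this step would be repeated variable by variable, conditioning on variables already synthesized. The function name and the data are hypothetical.

```python
import math
import random

def synthesize_variable(x, y, at_risk, rng):
    """Replace y only where at_risk is True, using posterior predictive draws.

    Model (assumed): y = alpha + beta * (x - mean_x) + e, e ~ N(0, sigma^2),
    with prior p(alpha, beta, sigma^2) proportional to 1 / sigma^2.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 records to fit the model")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta_hat = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - mean_y - beta_hat * (xi - mean_x)) ** 2 for xi, yi in zip(x, y))

    # sigma^2 | data ~ RSS / chi-square(n - 2); chi-square(k) = Gamma(shape k/2, scale 2)
    sigma = math.sqrt(rss / rng.gammavariate((n - 2) / 2, 2.0))
    # With a centered predictor, alpha and beta are independent given sigma^2
    alpha = rng.gauss(mean_y, sigma / math.sqrt(n))
    beta = rng.gauss(beta_hat, sigma / math.sqrt(sxx))

    # Posterior predictive draw for flagged records; keep original values otherwise
    return [
        alpha + beta * (xi - mean_x) + rng.gauss(0.0, sigma) if risky else yi
        for xi, yi, risky in zip(x, y, at_risk)
    ]

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
at_risk = [False, True, False, False, True, False, False, False]
synthetic = synthesize_variable(x, y, at_risk, random.Random(13))
print([round(v, 2) for v in synthetic])
```

Drawing the parameters before drawing the values is what makes this a posterior predictive draw rather than a plug-in prediction; running it several times with different seeds gives multiple implicates.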
13 Synthetic Tabular Frequency and Microdata: What Happened with Real Data
- Problems with relationships between variables within a data set
- Household records are linked to the records of all people within the household (father, mother, son, daughter, etc.)
- Structurally missing (blank) values arise from skip patterns in the survey instrument
- Examples: people under age 15 cannot have income; a mother cannot be only 6 years older than her child
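Synthetic records must pass checks like the two examples above. The sketch below is a hypothetical edit check for one household: the age-15 income rule comes from the slide, while the minimum mother-child age gap of 12 years, the `Person` record layout, and the messages are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

MIN_INCOME_AGE = 15        # from the slide: no income for people under 15
MIN_MOTHER_CHILD_GAP = 12  # assumed minimum plausible age gap (hypothetical)

@dataclass
class Person:
    person_id: str
    age: int
    income: Optional[float]          # None = structurally missing (skipped)
    mother_id: Optional[str] = None  # link to another person in the household

def constraint_violations(household):
    """Return readable descriptions of constraint violations in one household."""
    by_id = {p.person_id: p for p in household}
    problems = []
    for p in household:
        if p.age < MIN_INCOME_AGE and p.income is not None:
            problems.append(f"{p.person_id}: age {p.age} but income reported")
        if p.mother_id is not None:
            mother = by_id.get(p.mother_id)
            if mother is None:
                problems.append(f"{p.person_id}: mother {p.mother_id} not in household")
            elif mother.age - p.age < MIN_MOTHER_CHILD_GAP:
                problems.append(f"{p.person_id}: mother only {mother.age - p.age} years older")
    return problems

household = [
    Person("m", 30, 40000.0),
    Person("c", 24, None, mother_id="m"),
    Person("k", 10, 500.0, mother_id="m"),
]
print(constraint_violations(household))
```

Each record can look plausible on its own; the violations only appear when linked records are checked together, which is why household structure complicates synthesis one variable at a time.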
14 Synthetic Tabular Frequency and Microdata: How We Altered the Method
- Impute some of the structurally missing values, but then restore them to missing for the standard imputation and edits
- For one product, an additional layer of programming grew into a nine-level collection of parent-child relationships to enforce all constraints
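The impute-then-restore step can be sketched as follows, assuming dictionary records with `age` and `income`. The value 0.0 stands in for the imputation of skipped incomes, `synthesize` is any hypothetical synthesizer that needs complete records, and the skip pattern is re-derived from the final (possibly synthetic) ages.

```python
MIN_INCOME_AGE = 15  # skip pattern from the slide: no income questions under 15

def synthesize_with_skip_pattern(records, synthesize, fill_value=0.0):
    """Fill structural blanks, run a synthesizer, then re-impose the skip pattern."""
    # 1. Temporarily impute skipped incomes so the model sees no blanks
    #    (fill_value is an assumed stand-in for a real imputation).
    completed = [
        {**r, "income": fill_value if r["income"] is None else r["income"]}
        for r in records
    ]
    # 2. Synthesize on the completed records.
    synthetic = synthesize(completed)
    # 3. Restore income to missing wherever the final age triggers the skip.
    return [{**r, "income": None} if r["age"] < MIN_INCOME_AGE else r for r in synthetic]

def toy_synthesizer(recs):
    """Hypothetical stand-in: shifts ages by one year and incomes by 5%."""
    return [{**r, "age": r["age"] + 1, "income": r["income"] * 1.05} for r in recs]

records = [{"age": 40, "income": 50000.0}, {"age": 10, "income": None}, {"age": 14, "income": None}]
out = synthesize_with_skip_pattern(records, toy_synthesizer)
print(out)
```

The third record shows why the skip rule is re-applied after synthesis rather than copied from the input: its synthetic age crosses 15, so it keeps an income value instead of a blank.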
15 Synthetic Tabular Frequency and Microdata: Current Uses on Real Data Products
Done:
- SSA Earnings and CB SIPP Data
- “On The Map”
- ACS Group Quarters Data
Under study:
- ACS Household Data
- Special Tabs for Veterans
16 Remote Microdata Analysis System: Introduction to the Method
- The Advanced Query System (AQS) allows users to generate tables from Census 2000 data
- A request passes through two firewalls to previously swapped, recoded, and topcoded files; tables are generated and electronically reviewed for disclosure problems; if none are found, the results are sent to the user
- Can we extend this to data from demographic surveys and to other types of statistical analyses?
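The generate-review-release flow can be sketched as below. The review rule (refuse any table with a cell count below 3), the variable names, and the records are assumptions for illustration; the slide does not say what the actual AQS checks are.

```python
from collections import Counter

MIN_CELL_COUNT = 3  # hypothetical review threshold

def tabulate(records, row_var, col_var):
    """Cross-tabulate two categorical variables into {(row, col): count}."""
    return Counter((rec[row_var], rec[col_var]) for rec in records)

def review_table(table, min_count=MIN_CELL_COUNT):
    """Return the cells that fail the assumed minimum-count rule."""
    return sorted(cell for cell, count in table.items() if count < min_count)

def answer_request(records, row_var, col_var):
    """Generate the table, review it, and release it only if nothing is flagged."""
    table = tabulate(records, row_var, col_var)
    problems = review_table(table)
    if problems:
        return None, problems
    return dict(table), []

records = [{"sex": "F", "tenure": "own"}] * 5 + [{"sex": "M", "tenure": "own"}] * 4
print(answer_request(records, "sex", "tenure"))
print(answer_request(records + [{"sex": "M", "tenure": "rent"}], "sex", "tenure"))
```

The user never touches the microdata: only a table that passes review leaves the system, and a refused request returns nothing but the fact of refusal.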
17 Remote Microdata Analysis System: What Happened with Real Data
- Enabled or disabled system? We chose enabled
- A disabled system is more flexible for the user but may require “babysitting”
- An enabled system is more restricted in the types of analyses it allows but can be made available to more people without strict monitoring
- Users choose from lists of data sets, geographic areas, universes, analyses, and variables (the system writes the code)
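In an enabled system, every choice comes from a fixed menu and the system, not the user, writes the request. A hypothetical sketch: the menu contents, variable names, and output format are assumptions, with ACS and CPS borrowed from the surveys named later in this presentation.

```python
# Hypothetical menus; users may pick only from these lists.
MENUS = {
    "dataset": {"ACS", "CPS"},
    "geography": {"US", "State"},
    "universe": {"Persons", "Households"},
    "analysis": {"crosstab", "regression"},
    "variable": {"AGEGRP", "SEX", "TENURE", "INCOME"},
}

def build_request(dataset, geography, universe, analysis, variables):
    """Validate every menu choice, then write the analysis request for the user."""
    picks = [("dataset", dataset), ("geography", geography),
             ("universe", universe), ("analysis", analysis)]
    picks += [("variable", v) for v in variables]
    for menu, item in picks:
        if item not in MENUS[menu]:
            raise ValueError(f"{item!r} is not an allowed {menu}")
    return (f"{analysis.upper()} {' '.join(variables)} "
            f"FROM {dataset} GEO={geography} UNIVERSE={universe}")

print(build_request("ACS", "State", "Persons", "crosstab", ["SEX", "TENURE"]))
# CROSSTAB SEX TENURE FROM ACS GEO=State UNIVERSE=Persons
```

Because free-form code is never accepted, the disclosure review only has to cover the finite set of requests the menus can produce.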
18 Remote Microdata Analysis System: How We Altered the Method
- In looking for disclosure problems, we first focused on the model statements but later realized the need to examine the underlying data tables (in particular, marginal totals of size 1) in various types of analyses
- Working on methods to best identify “cut points” for continuous variables that need to be categorized, at short, medium, and long levels of detail
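Two checks from this slide can be sketched together. The first flags categories whose one-way marginal count is exactly 1 in the analysis universe. The second computes equal-count cut points, used here only as a simple baseline for categorizing a continuous variable at different levels of detail (for example, 3, 5, or 10 categories); it is not the method under development. Function names and data are hypothetical.

```python
from collections import Counter

def singleton_margins(records, variables):
    """Return {variable: [categories whose one-way marginal count is exactly 1]}."""
    result = {}
    for var in variables:
        counts = Counter(rec[var] for rec in records)
        ones = sorted(cat for cat, count in counts.items() if count == 1)
        if ones:
            result[var] = ones
    return result

def equal_count_cut_points(values, n_categories):
    """Cut points giving roughly equal-sized categories (baseline, assumed).

    Categories are value < cuts[0], cuts[0] <= value < cuts[1], ..., value >= cuts[-1].
    Tied values can merge cut points, producing fewer categories.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_categories):
        cut = ordered[k * n // n_categories]
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts

records = [{"race": "A", "occ": "x"}, {"race": "A", "occ": "y"}, {"race": "B", "occ": "y"}]
print(singleton_margins(records, ["race", "occ"]))        # {'race': ['B'], 'occ': ['x']}
print(equal_count_cut_points(list(range(1, 101)), 4))     # [26, 51, 76]
```

A regression output can look harmless while an underlying margin of size 1 effectively isolates one respondent, which is why the check runs on the tables behind the model rather than on the model statement alone.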
19 Remote Microdata Analysis System: Current Uses on Real Data Products
Done:
- Advanced Query System, available to Census Bureau State Data Centers, Census Information Centers, and researchers who request an account
Under study:
- Extended Microdata Analysis System, being tested with the American Community Survey and the Current Population Survey
20 Conclusion
- Many recent developments in disclosure avoidance at the U.S. Census Bureau
- Using the noise technique for several tabular magnitude data products
- Releasing several products based on partially synthetic data
- The AQS is being used widely, and work continues on the Microdata Analysis System (MAS)
- It takes time, but it is worth the effort