Census Data for Transportation Planning—Some Thoughts George Duncan 11/15/2018
Three Themes: Confidentiality; Disclosure limitation methods; Multiple sources of information
Confidentiality: Identified by Jay Waite of Census. More restrictions? Concern of George Schoener of DOT. Repeated theme throughout conference.
Privacy: “Privacy is dead, deal with it,” Sun Microsystems CEO Scott McNealy. But like the rattlesnake, it can bite you …
Why Confidentiality Matters. Ethical: keeping promises; a basic value tied to privacy concerns of solitude, autonomy and individuality. Pragmatic: without confidentiality, a respondent may not provide data; worse, may provide inaccurate data. Legal: required under law. Confidentiality is fundamental to the workings of a democratic system and its respect for the individual. Pragmatically, patients fear not getting it. Without assurances of confidentiality, will some patients shy from providing information about potentially embarrassing conditions or activities?
Restricted Access, Restricted Data. There is an inverse relationship between restricted data and restricted access: as data restrictions increase, fewer restrictions on access are needed, and vice versa. Some user needs cannot be met with restricted data, however, because the data transformation required to ensure data confidentiality is so extreme that the restricted data are useless for inference purposes. In response, an agency may allow more restricted access to less restricted data. Conversely, to ensure confidentiality, access may have to be so restricted that a legitimate user cannot, as a practical matter, obtain the data. Again in response, an agency may allow less restricted access to more restricted data. Confidentiality protection shields must be both effective and faithful to the original data.
R-U Confidentiality Map. [Figure: axes are Data Utility U and Disclosure Risk R; marked points are Original Data, Released Data and No Data; a threshold marks the Maximum Tolerable Risk.] Disclosure Risk: information about confidential items. Data Utility: information about legitimate objects (DATA).
R-U Confidentiality Map. [Figure: axes are Data Utility U and Disclosure Risk R; marked points are Original Data and No Data; a threshold marks the Maximum Tolerable Risk; annotations read "Attention from Census" and "Attention from Transportation Community".] Disclosure Risk: information about confidential items. Data Utility: information about legitimate objects (DATA).
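The R-U map can be read as a selection rule: among candidate releases, pick the one with the highest data utility whose disclosure risk stays at or below the maximum tolerable risk. The sketch below illustrates that reading only. The `Release` type, the candidate names and every numeric score are hypothetical values made up for illustration; the slides do not say how R or U would be measured.

```python
from typing import List, NamedTuple, Optional


class Release(NamedTuple):
    name: str
    risk: float     # disclosure risk R (hypothetical score, higher = riskier)
    utility: float  # data utility U (hypothetical score, higher = more useful)


def best_release(candidates: List[Release],
                 max_tolerable_risk: float) -> Optional[Release]:
    """Return the highest-utility candidate with risk <= the threshold."""
    admissible = [c for c in candidates if c.risk <= max_tolerable_risk]
    if not admissible:
        return None  # nothing can be released at this risk tolerance
    return max(admissible, key=lambda c: c.utility)


# Hypothetical points on an R-U map, from "original data" down to "no data".
candidates = [
    Release("original data", risk=0.9, utility=1.0),
    Release("rounded tables", risk=0.4, utility=0.7),
    Release("marginals only", risk=0.1, utility=0.3),
    Release("no data", risk=0.0, utility=0.0),
]

chosen = best_release(candidates, max_tolerable_risk=0.5)
print(chosen.name)
```

Raising the threshold moves the chosen release toward the original data; lowering it moves the choice toward releasing nothing.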
Questions for Joint Consideration: Does old data decay and so pose less disclosure risk? Can providing seasonal data increase data utility without increasing disclosure risk? How can the transportation community be educated about the impact of disclosure limitation? What specifically causes increased disclosure risk? How can access to Research Data Centers be improved?
Disclosure Limitation for Tables. Coarsening: aggregate attributes. Suppress some cells: publish only the marginal totals, or suppress the sensitive cells, plus others as necessary. Perturb some cells: round; fuzz.
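Two of the table methods listed above can be sketched in a few lines. The sketch assumes a small-count rule for deciding which cells are sensitive (counts between 1 and `threshold - 1`) and simple rounding to a multiple of `base`. Both rules, the function names and the example table are illustrative assumptions, not agency practice. The sketch also shows why "plus others as necessary" matters: if the row total is published, a single suppressed cell can be recovered by subtraction.

```python
def suppress_small_cells(table, threshold=3):
    """Primary suppression: hide nonzero counts below `threshold` (assumed rule)."""
    return [[None if 0 < v < threshold else v for v in row] for row in table]


def round_cells(table, base=5):
    """Round each nonnegative count to the nearest multiple of `base`."""
    return [[base * ((v + base // 2) // base) for v in row] for row in table]


# Hypothetical 2 x 3 table of counts.
table = [[12, 2, 30],
         [7, 15, 8]]

suppressed = suppress_small_cells(table)
rounded = round_cells(table)

# If the true row total is also published, one suppressed cell in that row
# is recoverable, so additional (complementary) suppressions are needed.
row_total = sum(table[0])
recovered = row_total - sum(v for v in suppressed[0] if v is not None)

print(suppressed)
print(rounded)
print(recovered)
```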
Perturbation Methods: Controlled rounding (Cox). Cyclic perturbation (Duncan & Roehrig): stochastically modify cell values in a known way, allowing a Bayesian analysis of cell value distributions.
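The idea of modifying cells stochastically in a known way can be illustrated with a simplified move that leaves every row and column total unchanged. This is not the published cyclic perturbation algorithm or controlled rounding. It is a minimal sketch under assumed rules: choose two rows and two columns at random, then add a step drawn from {-1, 0, +1} to two diagonal cells and subtract it from the other two. The function name and the example table are hypothetical.

```python
import random


def cycle_perturb(table, rng, magnitude=1):
    """Apply one random move that preserves all row and column totals."""
    out = [row[:] for row in table]
    r1, r2 = rng.sample(range(len(out)), 2)
    c1, c2 = rng.sample(range(len(out[0])), 2)
    step = rng.choice([-magnitude, 0, magnitude])
    # Skip the move if it would create a negative count. Note that this
    # guard changes the move probabilities near zero cells.
    if min(out[r1][c1] + step, out[r2][c2] + step,
           out[r1][c2] - step, out[r2][c1] - step) < 0:
        step = 0
    out[r1][c1] += step
    out[r2][c2] += step
    out[r1][c2] -= step
    out[r2][c1] -= step
    return out


# Hypothetical 3 x 3 table of counts.
original = [[5, 0, 9],
            [3, 7, 1],
            [4, 6, 2]]

rng = random.Random(0)  # fixed seed so the run is repeatable
perturbed = original
for _ in range(20):
    perturbed = cycle_perturb(perturbed, rng)

print(perturbed)
```

Because each step is drawn with known probabilities, an analyst who knows the mechanism can, in principle, reason about which original values are consistent with the released table.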
Multiple Sources of Information: Alan Pisarski, Thomas Palumbo. Research and Academic Breakout Group. Session 8. “Mixing apples and oranges produces fruit salad.” “Institutionalize creative quilting” (Nancy McGuckin).
“Institutionalize Creative Quilting”: Quilting is the systematic use of multiple data sources. Creative means drawing on the available resources to address a particular problem. Institutionalize means establishing an apparatus whereby creative quilting can consistently happen.
Quilting: ACS can’t meet all transportation planning data needs; national data are not the be-all and end-all. Integrate with other data sets to maximize combined utility to support decision making. Complement with NHTS, private sector databases, public domain property data, satellite imagery, etc. Build in dynamic databases that track a changing world, with close to real-time data. New methodologies are needed to produce a high-quality data quilt (data stitching).
Creative: Constrain focus to particular decision making and policy needs. Search broadly and be imaginative in use of data products.
Institutionalize: Not left to the individual planner/researcher. Identify commonalities and differences among data users, say MPOs. Find representation for users with common needs, say AASHTO. Establish ongoing links with information organizations, say Census.
Scenario of Potential: Several MPOs have similar data needs. AASHTO solicits data needs and formulates a “quilt” to cover them. AASHTO negotiates with Census for a way to develop a disclosure-limited data product. Access to a Census Research Data Center is used to validate statistical inferences from the original data (“red light”/“green light”).
Opportunities and Threats. Opportunity: Together, Census and Transportation support respective institutional needs for confidentiality and access to quality data. Threat: Divided, Census and Transportation fall to privacy appeals or inability to cope with methodological challenges of disclosure limitation and complex, new data sources.