Screening Data for Disclosure Risk and the Research behind One Possible Tool
Kristine Witkowski
Research support from the National Institute of Child Health and Human Development, Grant 5 P01 HD045753, supplement to the project Human Subject Protection and Disclosure Risk Analysis.

Finding a Needle in a Haystack: The Theoretical and Empirical Foundations of Assessing Disclosure Risk for Contextualized Microdata
Kristine Witkowski
Research support from the National Institute of Child Health and Human Development, Grant 5 P01 HD045753, supplement to the project Human Subject Protection and Disclosure Risk Analysis.

Disclosure Standards of Microdata: Population-Size Threshold of Geographies
– Established by the U.S. Census Bureau in 1940, with guidelines expanded in 1970 and after
– Geographic areas are directly identified only if they meet a population threshold of 100K, 250K, or 400K persons
– Higher thresholds offset the risk that is heightened by complex survey designs and elevated sampling rates
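A population-size rule of this kind amounts to a simple screen over geographic units. The Python sketch below is a minimal illustration, not the Census Bureau's procedure: it assumes unit populations are available as a dictionary, and the function name `units_below_threshold` and the example counties are hypothetical.

```python
from typing import Dict, List


def units_below_threshold(populations: Dict[str, int], threshold: int) -> List[str]:
    """Return IDs of geographic units whose population falls below `threshold`.

    Under a population-size rule, these units are too small for their
    identifiers to be released directly in public-use microdata.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return sorted(unit for unit, pop in populations.items() if pop < threshold)


if __name__ == "__main__":
    # Hypothetical counties and populations, for illustration only.
    counties = {"County A": 45_000, "County B": 180_000, "County C": 98_500}
    print(units_below_threshold(counties, threshold=100_000))  # ['County A', 'County C']
```

Swapping in a higher threshold (e.g. 250,000 or 400,000) flags more units, which mirrors how stricter thresholds trade geographic detail for lower risk.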

Motivation of Study
– The limited population size of rural counties, census tracts, and blockgroups precludes the safe release of their identifiers
– Possible solution: release contextual data instead of directly identifying locations
– Published research is lacking; little is known about the risk of contextualized microdata

Questions
– How do measures of geographic context affect the chances that a study participant’s identity is revealed?
– What factors should be considered in assessing disclosure risk?
– How can we design studies so that geographic data can be shared?

Needle-in-Haystack Elements
– Field: total population = look-alike geographies
– Haystacks: subgroups = look-alike persons
– Needle: target respondent

Identifying Locations Increases Risk
Subjects (needles) are easy to identify in a known neighborhood (field) because few people there share the same traits (haystacks). Two approaches to reducing risk:
– Enlarge the size of the known location, e.g., by releasing county instead of neighborhood
– Release attributes of geographies, without their identifiers
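The effect of enlarging the known location can be illustrated by counting look-alike persons (the haystack around a target) at two geographic levels. This is a minimal Python sketch under stated assumptions: the `Person` record, the `haystack_size` function, and the toy population are hypothetical, and traits are compared by exact match.

```python
from typing import List, NamedTuple, Tuple


class Person(NamedTuple):
    neighborhood: str
    county: str
    traits: Tuple[str, ...]  # e.g. (age, sex, race/ethnicity)


def haystack_size(people: List[Person], target: Person, level: str) -> int:
    """Count look-alike persons: people sharing the target's traits in the
    target's known location at the given level ("neighborhood" or "county").
    The count includes the target, so 1 means the target is unique."""
    if level not in ("neighborhood", "county"):
        raise ValueError(f"unknown level: {level}")
    location = getattr(target, level)
    return sum(
        1 for p in people
        if getattr(p, level) == location and p.traits == target.traits
    )


# Hypothetical toy population: three neighborhoods within one county.
PEOPLE = [
    Person("N1", "C1", ("20", "M", "Black")),
    Person("N1", "C1", ("20", "M", "White")),
    Person("N2", "C1", ("20", "M", "Black")),
    Person("N3", "C1", ("20", "M", "Black")),
    Person("N3", "C1", ("35", "F", "White")),
]

if __name__ == "__main__":
    target = PEOPLE[0]
    print(haystack_size(PEOPLE, target, "neighborhood"))  # 1: unique in N1
    print(haystack_size(PEOPLE, target, "county"))        # 3: among 3 look-alikes
```

Releasing the county rather than the neighborhood turns a unique record into one of three look-alikes in this toy example.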

Contextual Data in Social Science Research
– Geographic attributes: neighborhood characteristics that reflect economic conditions, health services, etc.
– Institutional attributes: schools, hospitals, prisons

Laying the Groundwork for the Search
– Aggregation process
  – Number of look-alike geographies
  – Number of look-alike persons
– Intruder search behavior
  – Search behavior
  – Limited vs. full search
  – Search priorities
– Ability to assign a name
  – Accuracy of haystack size
  – Accuracy of ID files
  – Coverage error
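One way to make the aggregation step concrete is a simplified model in which an intruder knows a target's traits and contextual profile but not the location, so the haystack is the sum of look-alike persons across all look-alike geographies searched. The Python sketch below assumes that model, a uniformly random pick from the haystack, and a hypothetical `coverage_factor` standing in for coverage error; the function names and tract counts are illustrative, not the study's method.

```python
from typing import Dict, Iterable, Optional


def aggregated_haystack(
    lookalikes_by_geo: Dict[str, int],
    searched: Optional[Iterable[str]] = None,
) -> int:
    """Total look-alike persons across the look-alike geographies searched.

    `lookalikes_by_geo` maps each look-alike geography ID to its number of
    look-alike persons. A full search (searched=None) covers every
    look-alike geography; a limited search covers only the IDs given.
    """
    geos = lookalikes_by_geo.keys() if searched is None else searched
    return sum(lookalikes_by_geo[g] for g in geos)


def match_probability(haystack: int, coverage_factor: float = 1.0) -> float:
    """Chance that a uniformly random pick from the haystack is the target.

    `coverage_factor` is a hypothetical adjustment from recorded to true
    counts (e.g. 1.05 for a 5% net undercount); a larger true haystack
    lowers the chance of a correct match.
    """
    true_size = haystack * coverage_factor
    if true_size <= 0:
        raise ValueError("haystack must contain at least one person")
    return 1.0 / true_size


if __name__ == "__main__":
    # Hypothetical look-alike tracts sharing the target's contextual profile.
    tracts = {"Tract 1": 1, "Tract 2": 4, "Tract 3": 5}
    full = aggregated_haystack(tracts)                   # 10
    limited = aggregated_haystack(tracts, ["Tract 1"])   # 1
    print(match_probability(full), match_probability(limited))
    print(match_probability(full, coverage_factor=1.25))
```

The limited-search probability is only meaningful if the target actually lives in one of the searched units; a fuller model would weight it by that chance.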

Methodology
– Microdata composed of a synthetic sample of persons, reflecting the distribution of the U.S. population
– Attributes of geographies:
  – Metropolitan status
  – Five contextual variables at 3 spatial scales (U.S. counties, census tracts, and blockgroups), collapsed into 10% categories (0–9%, 10–19%, and so on up to 100%): % persons, non-Hispanic white; % persons, foreign-born; % persons, in poverty; % housing units, owner-occupied; % civilian labor force, unemployed
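Collapsing each contextual variable into 10% categories turns a geography's attributes into a short categorical profile. A minimal Python sketch, assuming percentages on a 0–100 scale, half-open 10-point bins, and that exactly 100% joins the top category; the field names in `VARIABLES`, the helper functions, and the example tract are hypothetical, and metropolitan status is omitted for brevity.

```python
from typing import Dict, Tuple

# Hypothetical field names for the five contextual variables.
VARIABLES = (
    "pct_nh_white",
    "pct_foreign_born",
    "pct_in_poverty",
    "pct_owner_occupied",
    "pct_unemployed",
)


def ten_pct_category(pct: float) -> int:
    """Collapse a percentage into a 10% category index 0..9.

    [0, 10) -> 0, [10, 20) -> 1, ..., [90, 100] -> 9; exactly 100% is
    assumed to join the top category.
    """
    if not 0 <= pct <= 100:
        raise ValueError(f"percentage out of range: {pct}")
    return min(int(pct // 10), 9)


def context_profile(attrs: Dict[str, float]) -> Tuple[int, ...]:
    """Binned categories for one geographic unit, in VARIABLES order."""
    return tuple(ten_pct_category(attrs[v]) for v in VARIABLES)


if __name__ == "__main__":
    tract = {  # hypothetical tract
        "pct_nh_white": 72.4,
        "pct_foreign_born": 8.0,
        "pct_in_poverty": 15.2,
        "pct_owner_occupied": 100.0,
        "pct_unemployed": 4.9,
    }
    print(context_profile(tract))  # (7, 0, 1, 9, 0)
```

Geographies with identical profiles are indistinguishable to a data user, which is what makes them look-alikes.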

Methodology
– Number of look-alike geographies: those having any members of the subpopulations of 20-year-old non-Hispanic white and non-Hispanic black males
– 100% counts from the 2000 U.S. census of subpopulations characterized by age, sex, and race/ethnicity
– Likely under- and over-counted populations, derived from hard-to-count scores (Bruce & Robinson 2003)
– Difference in rentership between race/ethnic groups
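Counting look-alike geographies can be sketched as grouping units by contextual profile and keeping only those with at least one member of the subgroup. This Python sketch assumes profiles are already binned into tuples and subgroup counts are taken at face value (no coverage adjustment); the `Geography` record, the `lookalike_geographies` function, and the example tracts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Geography(NamedTuple):
    geo_id: str
    profile: Tuple[int, ...]  # binned contextual categories
    subgroup_count: int       # census count of the subgroup of interest


def lookalike_geographies(geos: List[Geography]) -> Dict[Tuple[int, ...], List[str]]:
    """Group geographies by contextual profile, keeping only those with at
    least one member of the subgroup; each group's length is the number of
    look-alike geographies for that profile."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for g in geos:
        if g.subgroup_count > 0:
            groups[g.profile].append(g.geo_id)
    return dict(groups)


if __name__ == "__main__":
    geos = [  # hypothetical tracts
        Geography("T1", (7, 0, 1), 3),
        Geography("T2", (7, 0, 1), 0),   # same profile, but no subgroup members
        Geography("T3", (7, 0, 1), 12),
        Geography("T4", (2, 3, 4), 5),
    ]
    for profile, ids in lookalike_geographies(geos).items():
        print(profile, len(ids), ids)
```

A profile matched by only one geography with subgroup members effectively identifies that location, even though its identifier is never released.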

Methodology
Calculate summary statistics that reflect the distribution of survey respondents within:
– Individual geographic units
– Aggregated contexts
– Less populated areas ("sparse", <100K persons)
– Highly populated areas ("dense", 100K+ persons)
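Splitting units at 100K persons and summarizing look-alike counts within each class can be sketched as follows. This is a minimal Python illustration, assuming each unit is given as a (population, look-alike count) pair and that "unique" means a look-alike count of 1 (the target alone); the statistics chosen, the function name `summarize_by_density`, and the example data are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple

SPARSE_CUTOFF = 100_000  # persons: "sparse" below, "dense" at or above


def summarize_by_density(units: List[Tuple[int, int]]) -> Dict[str, Dict[str, float]]:
    """Summarize look-alike counts separately for sparse and dense areas.

    Each unit is (total population, look-alike persons for the target
    subgroup). For each density class with data, returns the number of
    units, the minimum and median look-alike count, and the share of units
    where the target would be unique (count == 1).
    """
    groups: Dict[str, List[int]] = {"sparse": [], "dense": []}
    for population, lookalikes in units:
        key = "sparse" if population < SPARSE_CUTOFF else "dense"
        groups[key].append(lookalikes)

    summary: Dict[str, Dict[str, float]] = {}
    for key, counts in groups.items():
        if counts:
            summary[key] = {
                "n_units": len(counts),
                "min": min(counts),
                "median": median(counts),
                "share_unique": sum(c == 1 for c in counts) / len(counts),
            }
    return summary


if __name__ == "__main__":
    units = [  # hypothetical (population, look-alike count) pairs
        (40_000, 1), (60_000, 3), (85_000, 1),
        (250_000, 40), (500_000, 120),
    ]
    for density, stats in summarize_by_density(units).items():
        print(density, stats)
```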

Conclusions
– Contextualized microdata may be a viable method of safely distributing geographically rich information, particularly county-level information
– Coverage error may play an important role in ensuring the anonymity of respondents

Balancing Risk and Utility

References
Witkowski, K. M. 2008a. Disclosure risk of contextual data: The role of spatial scale, identified geography, and measurement detail in public-use files. Revised and resubmitted to Public Opinion Quarterly.
Witkowski, K. M. 2008b. Disclosure risk components of contextualized microdata: Identifying unique geographic units and the implications for pinpointing survey respondents. Revised and resubmitted to Sociological Methodology.
Witkowski, K. M. 2008c. Finding a needle in a haystack: The theoretical and empirical foundations of assessing disclosure risk for contextualized microdata. Submitted to Journal of Official Statistics.