Screening Data for Disclosure Risk and the Research behind One Possible Tool Kristine Witkowski Research support from the National Institute of Child Health.

1 Screening Data for Disclosure Risk and the Research behind One Possible Tool Kristine Witkowski Research support from the National Institute of Child Health and Human Development, Grant 5 P01 HD045753, supplement to the project Human Subject Protection and Disclosure Risk Analysis.

2 Finding a Needle in a Haystack: The Theoretical and Empirical Foundations of Assessing Disclosure Risk for Contextualized Microdata Kristine Witkowski Research support from the National Institute of Child Health and Human Development, Grant 5 P01 HD045753, supplement to the project Human Subject Protection and Disclosure Risk Analysis.

3 Disclosure Standards of Microdata: Population-Size Threshold of Geographies
– Established by U.S. Census Bureau in 1940, with expansion of guidelines in 1970 and after
– Directly identify geographic areas with 100, 250, or 400K persons
– Higher thresholds offset risk that is heightened by complex surveys and elevated sampling rates
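The population-size rule on this slide can be sketched as a simple screen. This is only an illustration: the function name, the example areas, and the reading of the threshold as "at least N persons" are assumptions, not part of the presentation.

```python
def can_release_identifier(population: int, threshold: int = 100_000) -> bool:
    """Return True if a geography meets the population-size threshold.

    Assumption: the threshold is read as "at least `threshold` persons".
    The slide lists 100K, 250K, and 400K as thresholds in use.
    """
    return population >= threshold


# Hypothetical areas and population counts, for illustration only.
areas = {"County A": 850_000, "County B": 42_000, "Tract C": 4_100}

# Screen every area against the lowest (100K) threshold.
releasable = {name: can_release_identifier(pop) for name, pop in areas.items()}
```

Raising `threshold` to 250_000 or 400_000 corresponds to the stricter standards the slide associates with complex surveys and elevated sampling rates.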

4 Motivation of Study
– Limited population-size of rural counties, tracts, and blockgroups precludes safe release of their identifiers
– Possible solution: Contextual data, instead of directly identifying locations
– Lack of published research, little known about risk of contextualized microdata

5 Questions
– How do measures of geographic context affect the chances that a study participant’s identity is revealed?
– What factors should be considered in assessing disclosure risk?
– How can we design studies so that geographic data can be shared?

6 Needle-in-Haystack Elements
– Field of Total Population == Look-alike Geographies
– Haystacks of Subgroups == Look-alike Persons
– Needle as Target Respondent

7 Identifying Locations Increases Risk
Subjects (needles) are easy to identify in a known neighborhood (field) because few people have the same traits (haystacks).
Two approaches to reducing risk
– Enlarge size of known location, by releasing county instead of neighborhood
– Release attributes of geographies, without their identifiers

8 Contextual Data in Social Science Research
Geographic attributes
– Neighborhood characteristics that reflect economic conditions, health services, etc.
Institutional attributes
– Schools, hospitals, prisons

10 Laying the Groundwork for the Search
Aggregation Process
– Number of Look-Alike Geographies
– Number of Look-Alike Persons
Intruder Search Behavior
– Search Behavior
– Limited vs. Full Search
– Search Priorities
Ability to Assign Name
– Accuracy of Haystack Size
– Accuracy of ID Files
– Coverage Error

12 Methodology
Microdata composed of synthetic sample of persons, reflecting distribution of U.S. population
Attributes of geographies
– Metropolitan status
– Five contextual variables at 3 spatial scales, collapsed into 10% categories
  – U.S. counties, census tracts, and blockgroups
  – % Persons, Non-Hispanic White; % Persons, Foreign-Born; % Persons, In-Poverty; % Housing-Units, Owner-Occupied; % Civilian Labor Force, Unemployed
  – 0 – 9%, 10 – 19%, ... 90 – 100%
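Collapsing a contextual percentage into the 10% categories listed on this slide could look roughly like the sketch below. The function names, the variable keys, and the handling of fractional values (9.9% falls in the 0-9% category) are assumptions made for illustration; the presentation does not specify them.

```python
def ten_percent_category(pct: float) -> str:
    """Map a percentage (0 to 100) to one of ten categories.

    Categories follow the slide: 0-9%, 10-19%, ..., 90-100%.
    Assumption: fractional values are floored into the lower category.
    """
    if not 0 <= pct <= 100:
        raise ValueError(f"percentage out of range: {pct}")
    lower = min(int(pct // 10) * 10, 90)   # 100 belongs to the top category
    upper = 100 if lower == 90 else lower + 9
    return f"{lower}-{upper}%"


def context_signature(metro: bool, measures: dict) -> tuple:
    """Combine metropolitan status and categorized measures into one tuple.

    `measures` maps a (hypothetical) variable name to its percentage.
    Keys are sorted so that the tuple order is stable across geographies.
    """
    return (metro,) + tuple(
        ten_percent_category(measures[name]) for name in sorted(measures)
    )


# Hypothetical county described by two of the five contextual variables.
example = context_signature(True, {"pct_poverty": 12.0, "pct_foreign_born": 3.0})
```

Geographies that share the same tuple cannot be told apart by their released contextual attributes, which is what makes them look-alikes in the sense used in these slides.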

13 Methodology
– Number of look-alike geographies, those having any subpopulation members of 20-year-old males, non-Hispanic whites & blacks
– 100% 2000 U.S. census counts of subpopulations characterized by age, sex, & race/ethnicity
– Population likely under- and over-counted, derived from hard-to-count scores (Bruce & Robinson 2003)
– Difference in rentership between race/ethnic groups
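One way to count look-alike geographies and look-alike persons is sketched below, under stated assumptions. Each geography is represented by a hypothetical record of an identifier, a tuple of its categorized contextual attributes, and the census count of the target subpopulation in that geography. Only geographies with at least one subpopulation member are counted, following the slide; the record layout and names are illustrative, not the author's implementation.

```python
from collections import defaultdict


def look_alike_counts(geographies):
    """Tally look-alike geographies and persons per attribute signature.

    `geographies` is an iterable of (geo_id, signature, subpop_count),
    where `signature` is a hashable tuple of categorized attributes.
    """
    totals = defaultdict(lambda: {"geographies": 0, "persons": 0})
    for _geo_id, signature, subpop_count in geographies:
        if subpop_count > 0:  # keep geographies with any subpopulation members
            totals[signature]["geographies"] += 1
            totals[signature]["persons"] += subpop_count
    return dict(totals)


# Hypothetical tracts: (identifier, signature, count of the subpopulation).
tracts = [
    ("T1", ("metro", "10-19%"), 4),
    ("T2", ("metro", "10-19%"), 0),   # no subpopulation members: not counted
    ("T3", ("metro", "10-19%"), 6),
    ("T4", ("nonmetro", "30-39%"), 1),
]
counts = look_alike_counts(tracts)
```

In the needle-in-haystack terms of the earlier slides, `geographies` is the size of the field and `persons` is the size of the haystack in which a target respondent is hidden.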

14 Methodology
Calculate summary statistics that reflect the distribution of survey respondents within
– Individual geographic units
– Aggregated contexts
– Less populated areas, “sparse”, <100K
– Highly populated areas, “dense”, 100K+
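The sparse/dense split can be illustrated with a small sketch. The slide does not say which summary statistics were calculated, so the mean and median used here are assumptions, as are the function name and the example figures.

```python
from statistics import mean, median


def summarize_by_density(units, threshold=100_000):
    """Summarize respondents per unit, split at the 100K population line.

    `units` is an iterable of (population, n_respondents) pairs.
    Units below `threshold` are "sparse"; units at or above it are "dense".
    """
    groups = {"sparse": [], "dense": []}
    for population, n_respondents in units:
        key = "dense" if population >= threshold else "sparse"
        groups[key].append(n_respondents)
    summary = {}
    for key, values in groups.items():
        if values:
            summary[key] = {
                "units": len(values),
                "mean": mean(values),
                "median": median(values),
            }
        else:  # avoid calling mean/median on an empty group
            summary[key] = {"units": 0, "mean": None, "median": None}
    return summary


# Hypothetical units: (population, number of survey respondents).
result = summarize_by_density([(50_000, 2), (80_000, 4), (250_000, 10)])
```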

16 Conclusions
– Contextualized microdata may be a viable method of safely distributing geographically rich information, particularly for county-level information
– Potentially important role of coverage error in ensuring the anonymity of respondents

17 Balancing Risk and Utility

22 References
Witkowski, K. M. 2008a. Disclosure risk of contextual data: The role of spatial scale, identified geography, and measurement detail in public-use files. Revised and resubmitted to: Public Opinion Quarterly. http://hdl.handle.net/2027.42/58626
----. 2008b. Disclosure risk components of contextualized microdata: Identifying unique geographic units and the implications for pinpointing survey respondents. Revised and resubmitted to: Sociological Methodology. http://hdl.handle.net/2027.42/58627
----. 2008c. Finding a needle in a haystack: The theoretical and empirical foundations of assessing disclosure risk for contextualized microdata. Submitted to: Journal of Official Statistics. http://hdl.handle.net/2027.42/58628

