Anonymised Integrated Event History Datasets for Researchers
Johan Heldal, Statistics Norway

1. Anonymised Integrated Event History Datasets for Researchers
Johan Heldal, Statistics Norway

2. Contents
The social security database FD-trygd
About the Norwegian Social Science Data Services
Event history data from FD-trygd
Data to researchers: principles laid down
Anonymisation
Establishing a measure of disclosure risk

3. Social security event history database FD-trygd
Contains all events related to the Norwegian social security system for every person resident in Norway since 1992:
–All benefits and associated variables
–All dates for events
–All demographic histories (birth/immigration, sex, marital status, children)
–All places of residence
These are kept in different Oracle "tables" that can be merged exactly by a personal identification code.
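The exact merge by personal identification code can be illustrated with a minimal Python sketch. It is not FD-trygd's actual Oracle schema: the field names `person_id` and `date`, the table contents, and the function name are all hypothetical.

```python
from collections import defaultdict

def merge_event_tables(*tables):
    """Combine rows from several tables into one dated history per person."""
    histories = defaultdict(list)
    for table in tables:
        for row in table:
            # The personal identification code is the exact merge key.
            histories[row["person_id"]].append(row)
    for events in histories.values():
        # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
        events.sort(key=lambda row: row["date"])
    return dict(histories)

# Hypothetical extracts from two separate tables (illustrative values only).
residence = [{"person_id": 1234567, "date": "2003-03-31", "residence": "Ski"}]
employment = [
    {"person_id": 1234567, "date": "2005-02-22", "employment": "Unempl"},
    {"person_id": 1234567, "date": "2002-11-08", "employment": "Empl"},
]
histories = merge_event_tables(residence, employment)
```

Each person's merged list is then a single chronological event history drawn from all the tables.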

4. FD-trygd
Has high data quality
Can also be merged with
–Education histories
–Incomes from the yearly assessment
Is extremely valuable for research purposes.

5. NSD: The Norwegian Social Science Data Services
Established (1971) to simplify access to data for researchers and students in Norway.
Distributes anonymised survey datasets from Statistics Norway and others.
NSD requested (2009) a 20 % sample of all individual histories in FD-trygd, to be anonymised for researchers at their own premises.
The request has been approved by Statistics Norway (SN), but
–SN has (by law) the responsibility for the confidentiality
–Anonymisation and dissemination must take place according to rules set by SN.

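6. A hypothetical event history for a woman immigrating to Norway 22. November 1999. Bold face in cell indicates the changed variable.

Event history variables

Serial number | Birth year | Dates  | Resident | Sex | Marital status | Employment | Residence | Children | Benefits
--------------|------------|--------|----------|-----|----------------|------------|-----------|----------|----------
1234567       | 1975       | 220199 | Yes      | F   | U              | Empl       | Oslo      | 0        | …
1234567       | 1975       | 170601 | Yes      | F   | U              | Sick       | Oslo      | 0        | 86 000 …
1234567       | 1975       | 170901 | Yes      | F   | U              | Empl       | Oslo      | 0        | …
1234567       | 1975       | 080202 | Yes      | F   | U              | Maternity  | Oslo      | 1        | …
1234567       | 1975       | 250502 | Yes      | F   | M              | Maternity  | Oslo      | 1        | 252 000 …
1234567       | 1975       | 081102 | Yes      | F   | M              | Empl       | Oslo      | 1        | …
1234567       | 1975       | 310303 | Yes      | F   | M              | Empl       | Ski       | 1        | …
1234567       | 1975       | 220205 | Yes      | F   | M              | Unempl     | Ski       | 1        | …
1234567       | 1975       | 270705 | Yes      | F   | M              | Unempl     | Ski       | 2        | …
1234567       | 1975       | 090407 | Yes      | F   | M              | Empl       | Ski       | 2        | …
1234567       | 1975       | 310508 | Yes      | F   | D              | Empl       | Ski       | 2        | …
1234567       | 1975       | …      | …        | …   | …              | …          | …         | …        | …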
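The "changed variable" marking can be recomputed by comparing each row of the history with the row before it. The following Python sketch does this for the rows of the example; the field names and the function name are illustrative assumptions, and the Benefits column is left out because most of its values are elided in the slide.

```python
# State variables, in the column order of the slide (names are illustrative).
FIELDS = ("resident", "sex", "marital_status", "employment", "residence", "children")

# Rows of the example: (date as DDMMYY, then the state variables in FIELDS order).
HISTORY = [
    ("220199", "Yes", "F", "U", "Empl", "Oslo", 0),
    ("170601", "Yes", "F", "U", "Sick", "Oslo", 0),
    ("170901", "Yes", "F", "U", "Empl", "Oslo", 0),
    ("080202", "Yes", "F", "U", "Maternity", "Oslo", 1),
    ("250502", "Yes", "F", "M", "Maternity", "Oslo", 1),
    ("081102", "Yes", "F", "M", "Empl", "Oslo", 1),
    ("310303", "Yes", "F", "M", "Empl", "Ski", 1),
    ("220205", "Yes", "F", "M", "Unempl", "Ski", 1),
    ("270705", "Yes", "F", "M", "Unempl", "Ski", 2),
    ("090407", "Yes", "F", "M", "Empl", "Ski", 2),
    ("310508", "Yes", "F", "D", "Empl", "Ski", 2),
]

def changed_variables(history):
    """For every row after the first, list the state variables that changed."""
    changes = []
    for prev, curr in zip(history, history[1:]):
        changed = [
            name
            for name, before, after in zip(FIELDS, prev[1:], curr[1:])
            if before != after
        ]
        changes.append((curr[0], changed))
    return changes

changes = changed_variables(HISTORY)
```

Storing one row per event in this way means every row carries the full current state, so the history can be read at any date without replaying earlier events.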

7. Classes of event variables in FD-trygd
Demographic variables
Pensions
Supports
Rehabilitation
Labour market
Education
Income from assessment

8. Principles laid down
Combining tables in FD-trygd is data integration.
It must respect the principles laid down in "Principles and Guidelines on Confidentiality Aspects of Data Integration Undertaken for Statistical or Related Research Purposes".
It should not be possible, with reasonable means, to identify someone in the dataset.
It is important for SN to establish clear rules for NSD's anonymisation based on this.

9. Rules should
Manage realistic disclosure scenarios
Be able to stand scrutiny from investigative journalists
Be transparent for the researchers
Adapt to each researcher's needs as well as possible (need-to-know principle)
Creating one complete anonymised 20 % sample is out of the question.

10. Restrict information to researchers with respect to
Sample size (from the 20 % sample)
Variable scope
Length of event histories
Detail for each variable
Detail for dates
If too large: different samples with different variables for different analyses
Finding the best balance is a challenge.

11. For strict rules:
Need to establish a model of risk for this type of data.
Can the μ-Argus risk measure (Franconi & Polettini 2004) be extended to event history data?
Must take into account increased risk from
–Identifying variables given at all times
–Model for memory on event history
–Precision of times for events
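How the precision of event times affects risk can be illustrated with a much simpler indicator than the μ-Argus measure: counting the records whose combination of identifying values is unique in the sample. The Python sketch below is only that simple count, not the Franconi and Polettini measure, and the records and field names are invented for illustration.

```python
from collections import Counter

def count_sample_uniques(records, key_vars):
    """Count records whose combination of key variables occurs exactly once."""
    keys = [tuple(record[var] for var in key_vars) for record in records]
    frequencies = Counter(keys)
    return sum(1 for key in keys if frequencies[key] == 1)

# Invented records: sex, birth year and the date of one event (ISO format).
records = [
    {"sex": "F", "birth_year": 1975, "event_date": "2002-05-25"},
    {"sex": "F", "birth_year": 1975, "event_date": "2002-05-03"},
    {"sex": "M", "birth_year": 1980, "event_date": "2005-02-22"},
]

# Same records with the event date cut down to year and month.
coarsened = [dict(r, event_date=r["event_date"][:7]) for r in records]

key_vars = ("sex", "birth_year", "event_date")
uniques_exact = count_sample_uniques(records, key_vars)
uniques_monthly = count_sample_uniques(coarsened, key_vars)
```

In this toy sample all three records are unique with exact dates, while only one remains unique once dates are reduced to year and month. A real risk model would also have to account for sampling and for what an intruder can plausibly know about a person's history.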

12. Preliminary rules
Limit sample sizes to 10 percent of the target population as represented in the 20 % sample, i.e. about 2 % of the total target population.
Restrict detail for the most visible identifying variables.
Round economic benefits associated with states.
Restrict all dates to YYYYMM.
Only five levels for education.
Positive incomes only in quintiles of the distribution.
NSD has started test deliveries based on these rules.
The tests will be evaluated next year.
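Several of these rules are simple transformations, and a minimal Python sketch of them follows. The slide does not state the rounding base for benefits, how non-positive incomes are coded, or how the subsample is drawn, so the base of 1 000, the category 0, the simple random draw and all function names are assumptions made for illustration.

```python
import random
from bisect import bisect_left
from datetime import date

def coarsen_date(d):
    """Restrict a date to YYYYMM by dropping the day."""
    return f"{d.year:04d}{d.month:02d}"

def round_benefit(amount, base=1000):
    """Round a benefit to the nearest multiple of `base` (base is an assumption)."""
    return (amount + base // 2) // base * base

def income_quintile(income, cut_points):
    """Map a positive income to its quintile (1-5).

    `cut_points` holds the four ascending upper bounds of quintiles 1-4.
    Non-positive incomes get category 0 (an assumption, not from the slide).
    """
    if income <= 0:
        return 0
    return bisect_left(cut_points, income) + 1

def draw_subsample(person_ids, fraction=0.10, seed=0):
    """Draw a simple random subsample of persons from the 20 % sample."""
    pool = sorted(person_ids)
    size = round(len(pool) * fraction)
    return random.Random(seed).sample(pool, size)

# 10 % of a 20 % sample is about 2 % of the total target population.
overall_fraction = 0.20 * 0.10

subsample = draw_subsample(range(1000))
```

Note that the sample is drawn by person rather than by event, so each selected person keeps a complete (coarsened) history.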

13. With a good measure of risk, researchers could choose
–a larger sample size with less variable scope or variable detail, or
–a smaller sample size with larger variable scope and detail,
as long as the total risk stays within a limit.
We hope the experiences from the test deliveries will be useful here.

14. Thank you for your attention

