1 Statistical Disclosure Control for Communal Establishments in the UK 2011 Census Joe Frend Office for National Statistics
2 Definitions UK SDC Policy decision Household methodology Communal establishment methodology Summary Q&A Overview
3 Communal Establishment (CE): An establishment providing managed residential accommodation CE type: The broader category under which a CE will appear in the Census output tables Residents: All persons living in a CE Client: Non-staff residents that the CE caters for (e.g. patients of a hospital, clients of a hotel) Staff: Staff / owners living in a CE Family: Family members / partners who live in a CE with either a member of staff or a client Definitions
4 104 Delivery Groups (DGs) in England & Wales (≈ 500,000 persons & 200,000 households per DG) LADs in a DG ≈ 20 MSOAs in an LAD (≈ 7,500 persons & 3,000 households per MSOA) Geography DG LAD MSOA
5 Registrars General’s Agreement, November 2006 In line with the Statistics and Registration Service Act, 2007 (SRSA) More importance placed on protecting against attribute disclosure than against identification Small cells (0s, 1s, 2s) allowed provided there is sufficient uncertainty as to whether those cell counts are real, and that creating this uncertainty does not cause significant damage to the data. Targeted Record Swapping selected UK SDC Policy
6 Whole households are swapped Risk score calculated for each household Non-response affects the swap rate SDC for households I High risk = Higher chance of being sampled Low non-response rate in delivery group = Higher swap rate
7 House is sampled Matched only as far as necessary Match on household size and other variables SDC for households II MSOA 1, MSOA 2 Household matched on: No. of adults No. of children Are there pets? Household unique within MSOA = Find match outside MSOA
8 House is swapped SDC for households III MSOA 1, MSOA 2
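The three household slides above (sample by risk, match, swap) can be sketched in Python. This is a minimal illustration under stated assumptions, not the ONS implementation: the `Household` class, the `risk` values, the `swap_rate` argument and the rule of taking the first matching household in a different MSOA are all hypothetical, and the matching variables are the ones used in the slide's illustration.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Household:
    hid: int
    msoa: str
    adults: int
    children: int
    has_pets: bool
    risk: float  # hypothetical risk score, must be > 0


def match_key(h):
    # Matching variables taken from the slide's illustration.
    return (h.adults, h.children, h.has_pets)


def sample_by_risk(households, swap_rate, rng):
    """Draw round(swap_rate * N) households without replacement, weighted by risk."""
    pool = list(households)
    chosen = []
    for _ in range(round(swap_rate * len(pool))):
        pick = rng.choices(pool, weights=[h.risk for h in pool], k=1)[0]
        pool.remove(pick)
        chosen.append(pick)
    return chosen


def swap_households(households, swap_rate, seed=0):
    """Exchange the MSOA of each sampled household with a matching household elsewhere."""
    rng = random.Random(seed)
    swapped = set()
    for target in sample_by_risk(households, swap_rate, rng):
        if target.hid in swapped:
            continue  # already moved as another household's partner
        partner = next(
            (h for h in households
             if h.hid not in swapped
             and h.msoa != target.msoa
             and match_key(h) == match_key(target)),
            None,
        )
        if partner is None:
            continue  # no match found; left unswapped in this sketch
        target.msoa, partner.msoa = partner.msoa, target.msoa
        swapped.update({target.hid, partner.hid})
    return swapped


households = [
    Household(1, "MSOA 1", 2, 0, True, 5.0),
    Household(2, "MSOA 1", 2, 2, False, 1.0),
    Household(3, "MSOA 2", 2, 0, True, 1.0),
    Household(4, "MSOA 2", 2, 2, False, 1.0),
]
before = Counter(h.msoa for h in households)
swapped_ids = swap_households(households, swap_rate=0.5, seed=1)
after = Counter(h.msoa for h in households)
```

Because whole households trade places in pairs, the number of households in each MSOA is the same before and after the swap.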
9 1. SDC methodology for CEs to remain consistent with that of households: targeted record swapping 2. Keep the numbers of persons and the numbers of CEs unchanged at all geographies: individual records swapped 3. Keep swapping within delivery group SDC for CEs: The rules
10 The wide range of CE types and resident types Population characteristics will vary between CE types and resident types The risk and impact of disclosure will vary between CE types and resident types The public nature of CEs If a CE is identified it can essentially be viewed as a smaller geography SDC for CEs: The challenge
11 Maximise utility / Minimise damage Minimise swap rate Create an efficient matching process Swap individuals within the same CE type Swap individuals within the same resident type (eg. staff with staff, clients with clients, family residents with family residents) SDC for CEs: The aims
12 Response rates are likely to vary as much between CE types as between delivery groups Impact and likelihood of disclosure vary between CE types The factors which will affect the disclosure risk are: Rarity of CE type in the area Number of residents in the CE type Other factors impacting on uncertainty Set swap rate for each CE type in each MSOA Minimising the swap rate I
13 Numbers of clients and staff vary within CE types Set protection scores for staff and client residents, in each CE type, in each MSOA Family residents have set swap rate within the delivery group Minimising the swap rate II
14 Calculating the Protection Scores For client residents: For staff residents:
15 Characteristics of residents will be different between CE types Swap within CE type The problem: Rule 3: Keep swapping within delivery group How do we swap individuals in a CE type that is unique in the delivery group? Must swap between CE types when this happens Matching variables chosen so that key attributes remain consistent with the CE type Swap within CE type
16 Swap rates may not be the same Characteristics of staff, clients and family residents will be different Swap within resident type Matching variables chosen so that key attributes remain consistent with the CE type Matching variables will be different between staff, client and family residents Swap within resident type
17 Resident type: Clients 73 client residents CE type: Hotels, B&Bs and guest houses 8 CEs of this type in MSOA 1 Protection score: A = 1 B = 1 C = 1 D = 2 E = 1 1 x 1 x 1 x 2 x 1 = 2 So, CPS = 2 Example 1: Creating the CPS I MSOA 1
18 Resident type: Clients 73 client residents CE type: Hotels, B&Bs and guest houses 8 CEs of this type in MSOA 1 Protection score: A = 1 B = 1 C = 1 D = 2 E = 1 1 x 1 x 1 x 2 x 1 = 2 So, CPS = 2 Low swap rate Example 1: Creating the CPS II MSOA 1
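The arithmetic in Example 1 multiplies the five factor scores together. A minimal Python sketch of that step follows. The slides do not define factors A to E, nor how a score maps to a swap rate, so the factors are treated here as opaque numbers and the function name `protection_score` is hypothetical.

```python
from math import prod


def protection_score(factor_scores):
    """Multiply the factor scores, as in the worked example on the slide."""
    return prod(factor_scores.values())


# Factor values copied from Example 1 (clients in hotels, B&Bs and guest houses, MSOA 1).
example_1 = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 1}
cps = protection_score(example_1)
print(cps)  # 2
```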
19 Individuals are swapped Risky records are targeted Swap rate dependent on Protection Score Example 1: Matching I MSOA 1 High risk = Higher chance of being sampled Low protection score = Lower swap rate
20 Individual is sampled Matched only as far as necessary Matched on CE type, resident type and client specific variables Example 1: Matching II MSOA 1 MSOA 2 Clients matched on: Pattern of jumper Do they have a hat? Do they have glasses?
21 Example 1: Matching III MSOA 1 MSOA 2 Individual is swapped
22 Prison is unique within delivery group = Swap individual outside of the CE type Matched only as far as necessary Matched on resident type and client specific variables Example 2: Matching I MSOA 1 MSOA 2 Clients matched on: Pattern of jumper Do they have a hat? Do they have glasses?
23 Individual is swapped Still able to find a match Limit the damage to the data Example 2: Matching II MSOA 1 MSOA 2 Individual matched on: Pattern of jumper Is there a hat? Do they have glasses?
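The two matching examples (a hotel client matched within the CE type, and a prison client matched outside it because the prison is unique in the delivery group) can be sketched as one lookup with a fallback. This is an illustrative sketch only: the `Resident` class, the `find_partner` function, the "first eligible candidate" rule and the requirement that the partner sits in a different MSOA are assumptions, and the matching variables are the cartoon ones from the slides. The candidate list is assumed to be restricted to one delivery group already.

```python
from dataclasses import dataclass


@dataclass
class Resident:
    rid: int
    msoa: str
    ce_type: str
    resident_type: str  # "client", "staff" or "family"
    attrs: tuple        # (jumper pattern, has hat, has glasses), as in the slides


def find_partner(target, candidates):
    """Prefer a partner in the same CE type; otherwise fall back to another CE type.

    The partner always has the same resident type and matching variables,
    and lives in a different MSOA (an assumption of this sketch).
    """
    eligible = [
        c for c in candidates
        if c.rid != target.rid
        and c.msoa != target.msoa
        and c.resident_type == target.resident_type
        and c.attrs == target.attrs
    ]
    same_ce_type = [c for c in eligible if c.ce_type == target.ce_type]
    if same_ce_type:
        return same_ce_type[0]
    return eligible[0] if eligible else None


residents = [
    Resident(1, "MSOA 1", "hotel", "client", ("striped", True, False)),
    Resident(2, "MSOA 2", "hotel", "client", ("striped", True, False)),
    Resident(3, "MSOA 1", "prison", "client", ("plain", False, True)),
    Resident(4, "MSOA 2", "hostel", "client", ("plain", False, True)),
    Resident(5, "MSOA 2", "hostel", "staff", ("plain", False, True)),
]
hotel_partner = find_partner(residents[0], residents)   # same CE type available
prison_partner = find_partner(residents[2], residents)  # falls back to another CE type
```

The staff resident is never offered as a partner for a client, even when the other variables agree, which keeps swaps within resident type.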
24 SDC for CEs: The Process 1. Calculate protection score for each CE type and resident type in each MSOA 2. Set swap rate dependent on the protection score 3. Assign risk score for individual records 4. Select a sample of records to be swapped using the risk score as a weighting 5. Match records on CE type and a selection of variables, dependent on resident type 6. Swap records
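The part of the process that sets a swap rate from the protection score and then draws a risk-weighted sample could look like the Python sketch below. The slides only state that a low protection score means a lower swap rate, so the linear mapping, the `rate_per_point` and `cap` values, the risk scores and both function names are placeholders invented for illustration, not figures from the Census methodology.

```python
import random


def swap_rate_from_score(score, rate_per_point=0.01, cap=0.10):
    """Hypothetical mapping: a higher protection score gives a higher swap rate, up to a cap."""
    return min(cap, rate_per_point * score)


def select_for_swapping(records, score, rng):
    """records: (record_id, risk_score) pairs for one MSOA, CE type and resident type."""
    n = round(swap_rate_from_score(score) * len(records))
    pool = list(records)
    chosen = []
    for _ in range(min(n, len(pool))):
        # Weighted draw without replacement: riskier records are more likely to be picked.
        pick = rng.choices(pool, weights=[risk for _, risk in pool], k=1)[0]
        pool.remove(pick)
        chosen.append(pick[0])
    return chosen


# 73 client residents with a protection score of 2, as in Example 1; risk scores are made up.
records = [(i, 1.0 + (i % 3)) for i in range(73)]
chosen = select_for_swapping(records, score=2, rng=random.Random(0))
```

With these placeholder values the cell has a 2% swap rate, so one of the 73 records is selected for matching and swapping.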
25 Both CE and household methodology will use targeted record swapping Numbers of households, CEs and persons will remain unchanged at all geographies CE methodology will swap individuals SDC methodology aims to maximise utility: Minimise amount of swapping using protection scores Swap only as far as necessary Aim to swap within CE type and resident type Match on different variables for different resident types Summary
26 Q&A For general SDC questions: For CE SDC questions: