Statistical Disclosure Control for the 2011 UK Census Jane Longhurst, Caroline Young and Caroline Miller (ONS)

1 Statistical Disclosure Control for the 2011 UK Census Jane Longhurst, Caroline Young and Caroline Miller (ONS)

2 Outline
Context
Workplan
Progress
Short-listing the SDC Methods
Quantitative Evaluation
Description of the Methods (Advantages and Disadvantages)
Example Evaluation (Risk-Utility Framework)
Summary

3 Context The UK takes a census every 10 years. Next census due in 2011. This will comprise separate, simultaneous Censuses for England & Wales (ONS), Scotland (GROS) and Northern Ireland (NISRA).

4 Context
SDC for 2011 Census outputs is a major concern for users
Different SDC methodologies were adopted for standard tabular 2001 Census outputs across UK
Late addition of small cell adjustment by ONS/NISRA resulted in high level of user confusion and dissatisfaction
Publicised commitment to aim for a common UK SDC methodology for all 2011 Census outputs

5 Workplan
Phase 1 (March ’06 – Jan ’07)
– UK agreement of key SDC policy issues
Phase 2 (Jan ’07 – Sept ’08)
– Evaluation of all methods complying with agreed SDC policy position in terms of risk/utility framework and feasibility of implementation
Phase 3 (Sept ’08 – Spring/Summer ’09)
– Recommendations and UK agreement of SDC methodologies for 2011 Census tabular outputs
Phase 4 (Feb ’09 onwards)
– Evaluate and develop SDC methods for microdata, future work on output specification, system specification, development and testing

6 Progress
The UK SDC Policy Position (Nov ’06) highlighted:
– Key risk is attribute disclosure
– Consideration of pre-tabular and post-tabular methods
– Small cell counts can be included in tables provided uncertainty about the true value is created
– Different access agreements for tabular outputs that are seriously compromised by SDC
Tolerable threshold not yet determined, but steer towards less conservative approach

7 Progress
Development of SDC Strategy
– UK SDC working group established to take forward methodological work
– UKCDMAC subgroup set up to QA work
Initial stage of methodological research:
– Review of SDC in census context (May ’07)
– Qualitative evaluation of SDC methods for 2011 Census outputs
Focus on tabular outputs whilst considering impact on other outputs

8 Progress
UK SDC working group met in August
– Produced short-list of SDC methods
– SDC methods assessed against criteria in line with Registrars General policy statement
Formal QA and sign-off of criteria and short-listed SDC methods
Short-listed methods will undergo thorough quantitative evaluation and should maximise data utility whilst minimising disclosure risk

9 Short-listing: Criteria
– Method should: prevent new information being derived, prevent disclosure by differencing, and enable flexible table generation
– Could use special access arrangements if disclosure control seriously compromises some tabular outputs
– Table design methods applied alongside chosen method

10 Short-listing: Criteria
Trade off between risk and utility needs to be evaluated quantitatively
Many potential SDC methods which could be used but not possible to conduct quantitative evaluation of each method
Need to consider qualitative aspect using high-level review of advantages and disadvantages of SDC methods
Qualitative and subsequent quantitative evaluations used in combination to establish recommended SDC method(s) for 2011 Census

11 Short-listing: Criteria
Each method assessed against a set of 7 qualitative criteria (primary and secondary):
Primary criteria
– Additivity and consistency
– Overall user acceptability
– Protection against differencing
– Feasibility of implementation
Secondary criteria
– Impact on microdata releases
– Simple to understand
– Easy to account for in analyses

12 Short-listing: Scoring
Following methods considered for short-listing:
– Record Swapping
– Over-Imputation
– Data Switching
– Post Randomisation Method (PRAM)
– Sampling
– Conventional Rounding
– Random Rounding
– Small Cell Adjustment
– Controlled Rounding
– Semi-Controlled Rounding
– Suppression
– Barnardisation
– ABS Cell Perturbation Method

13 Short-listing: Scoring
For each criterion, each method was assigned a score:
– 0 = method does not meet the criterion
– 1 = method partly meets the criterion
– 2 = method meets the criterion
Primary criteria given double weighting
Overall score and ranking assigned to each method
Methods failing on primary criteria were discounted
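The scoring scheme above can be sketched in a few lines. This is only an illustration: the criterion identifiers, the method names and the scores are made up, and treating a score of 0 on any primary criterion as "failing" is an assumption, since the slides do not define the failure rule.

```python
# Hypothetical sketch of the weighted scoring; names and scores are invented.
PRIMARY = ["additivity_consistency", "user_acceptability",
           "differencing_protection", "feasibility"]
SECONDARY = ["microdata_impact", "simple_to_understand", "easy_to_account_for"]


def overall_score(scores):
    """Weighted total: primary criteria count double, secondary count once."""
    primary = sum(2 * scores[c] for c in PRIMARY)
    secondary = sum(scores[c] for c in SECONDARY)
    return primary + secondary


def fails_primary(scores):
    """Assumption: a method 'fails' if it scores 0 on any primary criterion."""
    return any(scores[c] == 0 for c in PRIMARY)


def short_list(methods):
    """Discount methods failing a primary criterion, rank the rest by score."""
    kept = {name: overall_score(s) for name, s in methods.items()
            if not fails_primary(s)}
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores (0, 1 or 2 per criterion) for two placeholder methods.
example = {
    "method_a": dict(zip(PRIMARY + SECONDARY, [2, 1, 1, 2, 1, 2, 1])),
    "method_b": dict(zip(PRIMARY + SECONDARY, [2, 0, 2, 2, 2, 2, 2])),
}
ranking = short_list(example)
```

Here `method_b` has the higher raw total but is discounted because it scores 0 on one primary criterion, which mirrors how most candidate methods dropped off the short-list.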

14 Short-listing: Scoring
Majority of SDC methods failed primary criteria and were discounted from short-list. For example:
– PRAM: difficult to implement and not proven for Census data
– Sampling: low user acceptance of weighted tables
– Rounding: low user acceptance of rounding methods
– Suppression: extremely difficult to implement to protect against differencing

15 Short-listed SDC Methods
Record swapping
Over-imputation
ABS Cell Perturbation method
Small cell adjustment with record swapping (to provide comparison with 2001)

16 Quantitative Evaluation
Examine how methods protect and manage risk and how they impact on data utility
Plan to use range of 2001 Census tables, varying parameters, different geographies
Information Loss software will be used to evaluate each short-listed method
Consideration will be given to other issues, e.g. comparisons over time, communal establishments, imputation rates

17 What do the methods do? The short-list Record Swapping ABS Cell Perturbation Over-imputation

18 Record Swapping - Summary
2001 Random Record Swapping method:
% households swapped across OAs
Swap within LA to preserve marginal distributions at this level
Matches found using control variables
– Age
– Gender
– Hard to Count Index (census enumeration)
– Household Size
All non-geographic fields swapped
Random / Targeted
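A minimal sketch of random record swapping, under stated assumptions: the field names (`la`, `oa`, `age_band`, `sex`, `htc_index`, `hh_size`), the swap rate and the toy households are all hypothetical, and swapping every non-geographic field between two households is implemented as the equivalent operation of exchanging their OA codes. It is not the 2001 production method.

```python
import random

# Hypothetical names for the control variables listed on the slide.
CONTROLS = ("age_band", "sex", "htc_index", "hh_size")


def swap_records(households, rate, seed=0):
    """Swap roughly `rate` of households across OAs within the same LA.

    Partners must agree on every control variable, so LA-level marginal
    distributions are unchanged. Returns (copy of data, swapped indices).
    """
    rng = random.Random(seed)
    out = [dict(h) for h in households]      # leave the input untouched
    swapped = set()
    n_target = round(rate * len(out))
    order = list(range(len(out)))
    rng.shuffle(order)
    for i in order:
        if len(swapped) >= n_target:
            break
        if i in swapped:
            continue
        a = out[i]
        candidates = [
            j for j in range(len(out))
            if j != i and j not in swapped
            and out[j]["la"] == a["la"]          # stay within the LA
            and out[j]["oa"] != a["oa"]          # but move across OAs
            and all(out[j][c] == a[c] for c in CONTROLS)
        ]
        if not candidates:
            continue                             # no match: leave unswapped
        j = rng.choice(candidates)
        a["oa"], out[j]["oa"] = out[j]["oa"], a["oa"]
        swapped.update((i, j))
    return out, swapped


# Toy data: 12 households in one LA, spread over 4 OAs, identical controls.
households = [
    {"id": k, "la": "LA1", "oa": "OA%d" % (k % 4), "age_band": "30-44",
     "sex": "F", "htc_index": 1, "hh_size": 2, "tenure": "t%d" % k}
    for k in range(12)
]
protected, swapped = swap_records(households, rate=0.5)
```

A targeted variant would replace the random ordering with one that visits risky households first; that choice is left open here, as it is on the slide.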

19 Record Swapping - Summary
Advantages
– Consistent and additive
– Some protection against differencing
– Risk of inconsistent / illogical records low
– Flexibility of swapping rates
Disadvantages
– Effects of perturbation hidden and hard to measure or account for
– Tables not visibly perturbed
– Geographic fields such as workplace not swapped (Origin-Destination tables not protected)

20 ABS Cell Perturbation - Summary
Developed by the Australian Bureau of Statistics
In use for their 2006 Census data
Based on random numbers assigned to each record
Then each table is adjusted independently in two stages:
– (1) Adding perturbations to each cell
– (2) Restoring additivity of whole table

21 ABS Cell Perturbation - Summary
Assign each microdata record a random number between 1 and m, called an rkey
For each cell in a particular table:
– Calculate the cell key (ckey) as a function of the rkeys of the records in the cell
– Using a look-up table in which ckeys are the columns and original cell values are the rows, read off the perturbation to add
– Add the perturbation to the original cell value
ABS additivity module not yet evaluated

22 Example Look-up Table
Original cell value | Perturbation drawn from following distribution (using the cell key)
0 | No perturbation
1 | Normal (0, 2) truncated at -1 and +5
2 | Normal (0, 2) truncated at -2 and +5
3 | Normal (0, 2) truncated at -3 and +5
4 | Normal (0, 2) truncated at -4 and +5
5+ | Normal (0, 2) truncated at -5 and +5
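The perturbation stage (stage 1 only) might look roughly like the sketch below. Several details are assumptions rather than anything stated in the slides: the value of `M`, taking the cell key as the sum of rkeys modulo `M`, reading the 2 in Normal (0, 2) as a standard deviation, rounding draws to integers, and replacing the precomputed look-up table with a random generator seeded by the cell key and the cell value. The additivity stage is not covered.

```python
import random

M = 2 ** 16  # assumed upper bound "m" for the record keys


def assign_rkeys(n_records, seed=0):
    """Give every microdata record a fixed random key between 1 and M."""
    rng = random.Random(seed)
    return [rng.randint(1, M) for _ in range(n_records)]


def cell_key(rkeys_in_cell):
    """Assumed key function: sum of the cell's record keys modulo M."""
    return sum(rkeys_in_cell) % M


def perturbation(value, ckey):
    """Stand-in for the look-up table on the slide.

    Zero cells are never perturbed. Otherwise an integer is drawn from a
    normal distribution (assumed sd 2), truncated at -min(value, 5) and +5.
    Seeding with (ckey, value) makes the draw repeatable, so the same cell
    receives the same perturbation in every table it appears in.
    """
    if value == 0:
        return 0
    lower, upper = -min(value, 5), 5
    rng = random.Random(ckey * 1_000_003 + value)
    while True:                      # rejection sampling for the truncation
        draw = round(rng.gauss(0, 2))
        if lower <= draw <= upper:
            return draw


def perturb_cell(rkeys_in_cell):
    """Perturbed count for one cell, given the rkeys of its records."""
    value = len(rkeys_in_cell)
    return value + perturbation(value, cell_key(rkeys_in_cell))


rkeys = assign_rkeys(50)
protected_count = perturb_cell(rkeys[:3])   # a cell containing three records
```

Because the cell key depends only on which records fall in the cell, the same set of records yields the same perturbed count wherever it is tabulated, which is where the consistency and differencing protection come from.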

23 ABS Cell Perturbation - Summary
Advantages
– Tables consistent
– Protects against differencing
– Efficient: allegedly quick run-time
– Flexible: lookup table can be designed to suit needs
Disadvantages
– After additivity stage, consistency is lost to some extent
– Needs to be applied to each table separately

24 Over-imputation - Summary
Involves randomly selecting a percentage of microdata records, which then have certain variables erased
Donors matching on control variables are selected, and the erased variables are imputed from them
Various approaches to over-imputation will be considered
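One simple form of over-imputation can be sketched as a donor-based (hot-deck style) procedure. The function name, the field names, the rate and the fallback of restoring the original value when no donor exists are all assumptions made for illustration; the approaches actually under consideration are not specified in the slides.

```python
import random


def over_impute(records, rate, controls, targets, seed=0):
    """Erase `targets` on a random share of records and re-impute them.

    Donors are unselected records that agree on all `controls`.
    Returns (copy of data, indices of the selected records).
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]
    n_selected = round(rate * len(out))
    chosen = set(rng.sample(range(len(out)), n_selected))
    for i in sorted(chosen):
        for var in targets:
            out[i][var] = None                   # erase the true values
        donors = [
            records[j] for j in range(len(records))
            if j not in chosen
            and all(records[j][c] == records[i][c] for c in controls)
        ]
        # Assumption: with no matching donor, the original value is restored.
        source = rng.choice(donors) if donors else records[i]
        for var in targets:
            out[i][var] = source[var]
    return out, chosen


# Toy data: ten records, two control groups, a unique occupation each.
records = [
    {"sex": "F" if k % 2 == 0 else "M", "age_band": "30-44",
     "occupation": "occ%d" % k}
    for k in range(10)
]
protected, chosen = over_impute(
    records, rate=0.3, controls=("sex", "age_band"), targets=("occupation",)
)
```

Targeting risky records would mean replacing the random sample with a selection rule based on risk, and including geographical fields among `targets` is what would extend protection to workplace tables.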

25 Over-imputation - Summary
Advantages
– Imputation software already in place
– Can target risky records
– Can protect workplace tables if includes geographical fields
– Provides some protection against differencing
Disadvantages
– Errors (bias and variance of estimates) may be introduced
– Difficult to account for impacts, e.g. standard errors at high levels of geography
– Can alter association between characteristics of members within same household

26 Quantitative Evaluation
An example of how the quantitative evaluation will be carried out:
Preliminary study comparing swapping and ABS cell perturbation using ideas developed by Natalie Shlomo (framework of balancing risk and utility)

27 Preliminary Evaluation: Tables used
2001 UK Census Tables
EA: Southampton, Eastleigh, Test Valley (SJ)
Table | Variables | Persons in table | Cells in table | Avg cell size | % zero cell | % small cells
A | Religion(9) * Age-sex(6) * OA(1487) | 437,744 | 80,298 | 5.5 | 59.1 | 12.6
B | Sex(2) * LLTI(2) * Econ-Activity(9) * Ward(70) | 317,064 | 25,250 | 125.8 | 16.9 | 9.0

28 Measuring Disclosure Risk
Main risk
– small cells in tables
– small cells in differenced tables
Disclosure Risk = proportion of records in the small cells that have not been perturbed
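For a record-level method such as swapping, the risk measure above could be computed roughly as follows. The threshold for a "small" cell (counts of 1 or 2) and the data layout (one cell label per record plus a set of perturbed record indices) are assumptions made for this sketch.

```python
from collections import Counter


def disclosure_risk(cells, perturbed, small=2):
    """Share of records in small cells that were left unperturbed.

    cells     : cell label of each record in the protected table
    perturbed : set of record indices changed by the SDC method
    small     : assumed threshold; cells with 1..small records are 'small'
    """
    counts = Counter(cells)
    in_small = [i for i, c in enumerate(cells) if counts[c] <= small]
    if not in_small:
        return 0.0
    unperturbed = sum(1 for i in in_small if i not in perturbed)
    return unperturbed / len(in_small)


# Toy example: cell "a" holds 3 records, "b" 1, "c" 2 and "d" 1.
cells = ["a", "a", "a", "b", "c", "c", "d"]
risk = disclosure_risk(cells, perturbed={3, 5})
```

Four records sit in small cells and two of them were perturbed, so the risk in this toy case is 0.5. Applying the same function to a differenced table would cover the second risk listed on the slide.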

29 Disclosure Risk: OA and Ward

30 Measuring Information Loss
Utility (information loss) measures compare statistical quality of original and protected tables
Measure distortion to internal cell distributions
Compare variance of cell counts
Measure impact on rank correlations
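The three kinds of measure listed above can be illustrated with simple stand-ins. The specific choices here (average absolute distance per cell, ratio of population variances, Spearman rank correlation between original and protected counts) are assumptions for the sketch; the slides do not say which exact metrics the Information Loss software computes.

```python
from statistics import mean, pvariance


def avg_abs_distance(orig, prot):
    """Distortion: mean absolute difference between matching cells."""
    return mean(abs(o - p) for o, p in zip(orig, prot))


def variance_ratio(orig, prot):
    """Variance of protected cell counts relative to the original."""
    return pvariance(prot) / pvariance(orig)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up cell counts before and after protection.
original = [0, 1, 3, 7, 12]
protected = [0, 2, 3, 6, 13]
distance = avg_abs_distance(original, protected)
ratio = variance_ratio(original, protected)
rho = spearman(original, protected)
```

A distance near 0, a variance ratio near 1 and a rank correlation near 1 would all indicate little information loss; the risk-utility framework then sets these against the disclosure risk achieved by each method.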

31 Distance Metrics at Output Area level

32 Variance of Cell Counts: OA and Ward

33 Impact on Rank Correlations: OA and Ward

34 Summary
Ongoing progress made for 2011 Census
Thorough quantitative evaluation of short-list over next year, using 2001 method as benchmark
Important to strike balance between minimising disclosure risk and maximising data utility
Qualitative and quantitative evaluations used in combination to establish recommended approach to SDC for 2011 Census
User communication and consultation will take place throughout the work programme

35 Contact Details Jane.Longhurst@ons.gov.uk Caroline.Miller@ons.gov.uk Caroline.Young@ons.gov.uk

