Progress on Real Time Remote Access
Michelle Simard
Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality
Tarragona, Spain, November 23rd, 2011
19/02/2016 Statistics Canada Statistique Canada

Since 2009
- Developed a prototype offering tabulated counts
- Developed Statistical Disclosure Control (SDC)
- Continued development on different fronts

The Prototype
Spring 2010
- Tabular (counts) outputs only, SAS only
- Modified PROC FREQ, DATA steps
- Limited to particular household survey data sets
- Confidentiality automated, no manual intervention
- Limited to some Canadian federal departments only
- Ability to query RTRA microdata at any time
- Access from any computer with internet access, using a secure username and password
- No travel to Research Data Centres
- Minimum 4 minutes plus process time
- Maximum 3 hours plus process time
- Notification for outputs, with 7-day retention
- Formatted table in HTML or in SAS

Current SDC
Additive and Controlled Rounding (ACROUND)
- Create a rounded additive table close to the original table, with a controlled grand total → semi-controlled rounding
- Use an iterative process to improve the semi-controlled result → controlled rounding
- Protects against possible matching of information with the PUMF, with a small impact on precision
- Maximum: 5 dimensions
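
To make the idea of a rounded table that stays additive with a controlled grand total more concrete, here is a minimal sketch for a one-way table of counts. It is not ACROUND: the slides do not describe the algorithm in detail, so this example swaps in a simple largest-remainder rule. The function name `largest_remainder_round` and the rounding base of 5 are assumptions made for illustration only.

```python
def largest_remainder_round(cells, base=5):
    """Round non-negative integer counts to multiples of `base` so that the
    rounded cells add up exactly to the rounded grand total.

    Illustrative only: a largest-remainder rule for a one-way table,
    not the ACROUND algorithm itself.
    """
    total = sum(cells)
    # Controlled grand total: nearest multiple of `base` (halves round up).
    rounded_total = base * ((total + base // 2) // base)

    # Start by rounding every cell down to a multiple of `base`.
    floors = [base * (c // base) for c in cells]
    residuals = [c - f for c, f in zip(cells, floors)]

    # Number of cells that must be rounded up to reach the controlled total.
    units_to_add = (rounded_total - sum(floors)) // base

    # Round up the cells with the largest residuals (ties broken by position).
    order = sorted(range(len(cells)), key=lambda i: (-residuals[i], i))
    rounded = list(floors)
    for i in order[:units_to_add]:
        rounded[i] += base
    return rounded, rounded_total


cells = [3, 7, 12, 1]
rounded, rounded_total = largest_remainder_round(cells, base=5)
print(rounded, rounded_total)
```

Every rounded cell stays within one base of its original value, and the cells sum to the rounded total. Extending this to multi-way tables, where all marginals must remain additive at once, is the hard part that an iterative controlled-rounding procedure addresses.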

Recent Development: Proc Percentile
Release the percentile only if
1. there are at least n1 observations ≥ the percentile value and at least n2 observations ≤ the percentile value
2. it is ≠ the minimum or maximum value
3. the total number of unweighted observations is ≥ m
4. the rounded frequency associated with the percentile (from ACROUND) is ≠ 0
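
The four conditions above can be sketched as a single check. This is a hedged illustration, not the RTRA implementation: the function name `can_release_percentile` is hypothetical, the percentile value is passed in rather than computed, and the thresholds n1, n2 and m are not given in the slides, so the values used in the example call are placeholders.

```python
def can_release_percentile(values, pct_value, n1, n2, m, rounded_freq):
    """Return True only if all four release conditions hold.

    values       : unweighted observations in the domain
    pct_value    : the percentile value proposed for release
    n1, n2, m    : thresholds (not disclosed in the slides; supplied by caller)
    rounded_freq : rounded frequency for the domain, e.g. from ACROUND
    """
    if not values:
        return False

    n_at_or_above = sum(1 for v in values if v >= pct_value)
    n_at_or_below = sum(1 for v in values if v <= pct_value)

    return (
        n_at_or_above >= n1                 # rule 1, upper side
        and n_at_or_below >= n2             # rule 1, lower side
        and pct_value != min(values)        # rule 2
        and pct_value != max(values)        # rule 2
        and len(values) >= m                # rule 3
        and rounded_freq != 0               # rule 4
    )


incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# Placeholder thresholds, chosen only for the example.
print(can_release_percentile(incomes, 50, n1=3, n2=3, m=10, rounded_freq=10))
```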

Recent Development: Proc Mean
Release the mean only if
1. there are at least n3 observations present in the domain
2. the rounded frequency associated with the mean (from ACROUND) is ≠ 0

For both procedures, "magnitude" rounding will be applied to the statistics to balance precision and noise.
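
The two conditions for the mean can be sketched the same way. The function name `release_mean` is hypothetical and the threshold n3 is not given in the slides, so the value in the example is a placeholder. The "magnitude" rounding step is deliberately left out because the slides do not define it, and the mean here is unweighted for simplicity.

```python
def release_mean(values, n3, rounded_freq):
    """Return the mean if both release conditions hold, otherwise None.

    values       : observations present in the domain
    n3           : minimum number of observations (supplied by caller)
    rounded_freq : rounded frequency for the domain, e.g. from ACROUND
    """
    if len(values) < n3:        # rule 1: too few observations
        return None
    if rounded_freq == 0:       # rule 2: rounded frequency is zero
        return None
    # Unweighted mean, before any magnitude rounding (not shown here).
    return sum(values) / len(values)


print(release_mean([2.0, 4.0, 6.0], n3=3, rounded_freq=5))
```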

Challenges and Issues
- Balancing not only confidentiality and precision, but quality measures as well
- Evaluating the risk
- Displaying information (what and how): statistics, standard error (SE), variance, coefficient of variation (CV), confidence interval (CI), quality indicator (QI), weighted counts, unweighted counts, ACROUND outputs?

Quality Measures
- Release the estimate with its SE and a Quality Indicator (QI)
- If not releasable, put 'X's or another symbol; otherwise release the SE and QI as follows:

Value of CV*           Quality Indicator   Guideline
0 ≤ C.V. ≤ 0.10        (a)                 very good
0.10 < C.V. ≤ 0.20     (b)                 acceptable
0.20 < C.V. ≤ 0.35     (c)                 marginal
C.V. > 0.35            (d)                 very poor

*Note: CV is calculated from the original non-rounded S.E. and percentile
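
The table above maps directly onto a small lookup. The sketch below assumes the usual definition of the coefficient of variation, CV = SE / |estimate|, computed from non-rounded inputs as the note requires. The function name `quality_indicator` and the choice to return None for a zero estimate (where the CV is undefined) are assumptions for illustration.

```python
def quality_indicator(estimate, se):
    """Map an estimate and its standard error to a quality indicator letter.

    Uses CV = SE / |estimate| on the original, non-rounded values.
    Returns None when the estimate is zero, since the CV is undefined.
    """
    if estimate == 0:
        return None
    cv = se / abs(estimate)
    if cv <= 0.10:
        return "a"   # very good
    if cv <= 0.20:
        return "b"   # acceptable
    if cv <= 0.35:
        return "c"   # marginal
    return "d"       # very poor


print(quality_indicator(estimate=200.0, se=30.0))
```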

Next steps
- So far, used output control techniques rather than input control techniques
- Next step: proportions, ratios, totals, models
- May need input control techniques when going into modeling
- Expansion to the academic community
- Expansion to Censuses, then administrative data
- Streamlining the approval processes
- Developing a "fee" structure and "penalty" processes

THANK YOU
For more information, please contact: Michelle Simard