Operationalising ‘safe statistics’: the case of linear regression
Felix Ritchie, Bristol Business School, University of the West of England, Bristol

Presentation transcript:

Operationalising ‘safe statistics’: the case of linear regression
Felix Ritchie, Bristol Business School, University of the West of England, Bristol

Plan
- Background: output SDC
- Safe statistics in principle
- Making it work: regression and totals
- What does ‘safe’ mean?

Background: output SDC
- Researchers increasingly using very sensitive data
- ‘Traditional’ SDC research (tables and anonymisation) of limited relevance
- Need rules for generalised output
  - ‘Output SDC’
  - Ideally, principles-based

Making output SDC manageable
- How do you devise guidelines for output when everything is possible?
- The ‘research zoo’
  - Separate lions from rabbits
  - Focus on the lions
  - Forget about the rabbits

‘Safe statistics’
- Define a statistic (sum, regression, odds ratio, index etc.) as ‘safe’ or ‘unsafe’
  - safe: release unless there’s a reason not to
  - unsafe: don’t release unless shown to be safe in the specific context
- SDC efforts concentrated on problematic output
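The two default rules on this slide can be sketched as a small decision function. This is only an illustration: the names `Classification` and `release_decision`, and the two boolean flags, are hypothetical and do not come from the presentation.

```python
from enum import Enum


class Classification(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"


def release_decision(classification, reason_to_withhold=False,
                     shown_safe_in_context=False):
    """Return True if the output may be released under the default rules.

    safe:   release unless there is a specific reason not to.
    unsafe: withhold unless shown to be safe in this specific context.
    """
    if classification is Classification.SAFE:
        return not reason_to_withhold
    return shown_safe_in_context


# Defaults: a 'safe' statistic goes out, an 'unsafe' one is held back.
print(release_decision(Classification.SAFE))
print(release_decision(Classification.UNSAFE))
```

The point of the asymmetry is where the checking effort falls: only ‘unsafe’ outputs, and the exceptions among ‘safe’ ones, need case-by-case attention.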

Categorising ‘safe statistics’
1. Define the functional form
2. Identify the disclosure potential
   1. Can it directly reveal a single data point?
   2. Can it be differenced?
   3. Anything else?
3. If provisionally ‘safe’, identify special cases
4. Draft guidelines

Example: regression coefficients
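The worked content of this slide is not preserved in the transcript. As a hedged illustration of the ‘special cases’ step, the sketch below uses a well-known property of least squares rather than anything taken from the slide: when the only regressors are a full set of group dummies, each coefficient is simply that group's mean, so a group with one member has its value reproduced exactly. The data and the helper `ols` are made up for the example.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y.

    Minimal sketch: assumes X has full column rank (no singularity check).
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        best = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[best] = A[best], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Made-up wages: three people in group A, one person in group B.
wages = [20.0, 22.0, 24.0, 95.0]
# One dummy column per group, no intercept (a saturated model).
dummies = [[1, 0], [1, 0], [1, 0], [0, 1]]

coef_a, coef_b = ols(dummies, wages)
print(coef_a)  # mean wage of group A
print(coef_b)  # the single group-B wage itself
```

In this special case the regression output behaves like a table of cell means, so it would presumably need the same scrutiny as one.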

Example: total
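The worked content of this slide is not preserved in the transcript. As a rough illustration, the sketch below applies the first two disclosure questions from the categorisation steps to a simple total. The firm names and turnover values are invented.

```python
# Made-up turnover figures for three firms.
turnover = {"firm_a": 120, "firm_b": 340, "firm_c": 75}

# Direct disclosure: a 'total' over a single unit is that unit's value.
single_unit_total = sum(v for k, v in turnover.items() if k == "firm_c")

# Differencing: two publishable-looking totals that differ by one unit.
total_all = sum(turnover.values())
total_without_b = sum(v for k, v in turnover.items() if k != "firm_b")
revealed_b = total_all - total_without_b

print(single_unit_total)  # equals firm_c's turnover
print(revealed_b)         # equals firm_b's turnover
```

Both questions get a ‘yes’ here, which is consistent with treating totals as output that needs checking rather than output released by default.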

Safe statistics decision chart

How can we say X is unconditionally ‘safe’?
- We can’t, nor are we trying to
- ‘safe’ = ‘for all practical purposes posing no significant disclosure risk’
- ‘unsafe’ = ‘high risk; spend time on this’

Risk assessment for grown-ups
- Everything has a theoretical risk
- Resources are limited
- Overall risk protection is maximised by concentrating on real risks
- …and also the non-experienced

Questions