

School of Geography, Faculty of Earth & Environment
New disclosure threats in Census interaction data
Presented at the 6th International Conference on Population Geographies, Umeå, Sweden, 17th June 2011
Oliver Duke-Williams

What are interaction data?
- Migration data
- Journey to work data
- Journey to school data
- Interaction data are flow data
- Also referred to as ‘origin-destination’ data

Sources of interaction data
- Interaction data are derived from census questions that are asked in many countries
  - What is your place of work?
  - What was your usual address 1 year ago?
  - Different time periods used in different countries
- These can (potentially) be used to derive detailed flow matrices
  - Special Migration Statistics (SMS)
  - Special Workplace Statistics (SWS)
  - Special Travel Statistics (STS)

UK Census questions 2011

1991 SMS Set 2
[Figure: flow matrix. Legend: White: no migrants; Blue: 1-9 migrants; Red: 10+ migrants. Labelled areas: London; Metropolitan counties]

Location tracing
- Mobile phone handsets (and other equipment) are often location aware: they can determine a location using GPS or cell tower triangulation
- This can be used to offer a variety of location based services
  - Restaurant recommendation
  - Local services etc

tripadvisor.com

geocaching.com

getlondonreading.com

Social networking and location
- A potential use of location aware hardware is to add location data to social networking updates
  - Geo-tagged Tweets
  - Google Places
  - FourSquare

fourwhere.com

4mapper.appspot.com

seekatweet.com

twittervision.com

The risks
- If third party location tracing makes it easy to monitor the location of an individual (possibly without their knowledge), what does it tell us about them?
- Association with particular locations
  - Many adults spend a lot of time in particular locations:
    - Their home
    - Their workplace

Location trace examples

Can people be identified from location traces?
- Krumm (2007): using volunteer donated records
  - Using 2 weeks' worth of in-car GPS data
  - Try to determine the subject’s home location
  - Given a location, try to determine personal attributes
Source: Krumm (2007); Fig 2
Source: Krumm (2007); un-numbered table

How easy is it to get location data?
- Data can be purchased
  - Danezis et al (2005): median value placed on data by students of £10
  - Cvrcek et al (2006): variations by country, purpose and length of observation
- Data can be exchanged for a chance to win
  - Krumm (2007): data donated for 1/100 chance to win a $200 MP3 player
- Data can be exchanged for services
  - Location based apps etc.
- Data could be obtained without permission
  - iPhone tracking etc.

Source: Cvrcek et al (2006); Figure 1(a)

Uniqueness
- Combinations of characteristics make people unique
  - 87% of Americans have a unique combination of gender, birth date and ZIP-code (Sweeney, 2000)
- This includes spatial identifiers
  - Area of residence is not unique
  - Area of workplace is not unique
  - A combination of home and workplace might be unique...

Workplace flow data
- Golle and Partridge (2009)
  - Examined US Longitudinal Employer-Household Dynamics dataset
  - Studied potential disclosure threat from generalised home and workplace location

Golle & Partridge (2009)

Assessing risk in the UK
- Workplace data
  - What proportion of all workers are uniquely identifiable on the basis of their home and workplace location alone?
- Migration data
  - What proportion of all migrants are uniquely identifiable on the basis of their origin and destination alone?
- What if we add extra attributes such as gender or age group?
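The questions on this slide can be sketched as a simple counting exercise. The following is a minimal illustration, not the method used in the presentation: it assumes individual-level records are available (the census outputs are aggregate flow counts, where the equivalent is counting cells with a value of 1), and the function name, field names and toy records are all hypothetical.

```python
from collections import Counter


def proportion_unique(records, key_fields):
    """Share of records whose combination of key_fields occurs exactly once."""
    if not records:
        return 0.0
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(records)


# Hypothetical toy records; not real census data.
migrants = [
    {"origin": "A", "destination": "B", "sex": "F", "age_group": "16-29"},
    {"origin": "A", "destination": "B", "sex": "M", "age_group": "16-29"},
    {"origin": "A", "destination": "C", "sex": "F", "age_group": "30-44"},
    {"origin": "B", "destination": "C", "sex": "M", "age_group": "30-44"},
    {"origin": "B", "destination": "C", "sex": "M", "age_group": "45-59"},
]

p_od = proportion_unique(migrants, ["origin", "destination"])
p_od_sex = proportion_unique(migrants, ["origin", "destination", "sex"])
p_od_sex_age = proportion_unique(
    migrants, ["origin", "destination", "sex", "age_group"]
)
print(p_od, p_od_sex, p_od_sex_age)
```

In this toy example the share of unique records rises as attributes are added to the key, which is the pattern the slide asks about.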

1991 SMS
- 1991 SMS Set 1
  - Migration within and between 10,000+ wards
  - Limited attribute detail
- 1991 SMS Set 2
  - Migration within and between 459 districts
  - More attribute detail
  - Subject to suppression
- Scope of both sets is Great Britain

1991 SMS
Proportion of migrants uniquely disclosed by various keys, 1991 SMS

Key                                   1991 SMS 1   1991 SMS 2 broad age groups   1991 SMS 2 narrow age groups
Origin, destination                   12.5%        0.6%
Origin, destination, sex              21.5%        1.7%
Origin, destination, age group        22.2%        2.7%                          6.8%
Origin, destination, age group, sex   37.4%        6.2%                          11.0%

2001 SMS
- Data are released at different ‘Levels’
  - 2001 SMS Level 1: migration within and between 426 Local Authorities
  - 2001 SMS Level 2: migration within and between 10,000+ wards
  - 2001 SMS Level 3: migration within and between 223,000+ Output Areas
- As the spatial detail increases, the attribute detail is reduced
- Scope of the data is United Kingdom
  - Some variations in detail in different countries

2001 SMS
- Analysis is affected by disclosure control
  - Small Cell Adjustment Methodology: small counts randomly adjusted
  - Not applied to data for destinations in Scotland

                        Persons   Male   Female
All ages
Under pensionable age   12        7      5
Pensionable age+        5         1      4

                        Persons   Male   Female
All ages
Under pensionable age   12        7      5
Pensionable age+        5         3      4

                        Persons   Male   Female
All ages
Under pensionable age   12        7      5
Pensionable age+        7         3      4

2001 SMS
Proportions of migrants uniquely disclosed by origin and destination, 2001 SMS, destinations in Scotland

Data set   Proportion of all migrants   Additional attributes
Level 3    35.1%                        2 (age, sex)
Level 2    11.2%                        3 (age, sex, ethnic group)
Level 1    0.3%                         7 (age, sex, ethnic group, family status, household status, illness status, economic activity)

2001 SWS / STS
- Published for same set of Levels as 2001 SMS
- SWS published for residences in England, Wales and Northern Ireland
- STS published for residences in Scotland
  - Includes SWS components, plus
    - Students’ travel to place of study
    - Non-working, non-student population
- Similar disclosure control issues, but more confusing
  - Small Cell Adjustment applied to residences in Scotland at Level 3
  - Not applied to residences in Scotland at Levels 1 and 2

Accommodating SCAM
- Much of the data is affected by SCAM, so values of ‘1’ are not seen
- We can identify general ‘small’ flows by looking at those with a revised total of 3
- We can estimate the proportion of these that originally had a value of ‘1’
  - By applying flow frequencies observed in un-modified Scottish data
  - By generating a mean value across multiple affected tables
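The two estimation routes on this slide can be illustrated roughly as follows. This is a sketch under stated assumptions, not the presenter's implementation: the probabilities with which an original count of 1, 2 or 3 ends up as an adjusted 3 are an illustrative assumption (an unbiased rounding to 0 or 3), not documented SCAM behaviour, and the frequencies and function names are hypothetical.

```python
from fractions import Fraction


def share_originally_one(unmodified_freq, p_becomes_three=None):
    """Estimate the share of flows with an adjusted total of 3 that were
    originally 1, given flow-size frequencies from unmodified data.

    p_becomes_three maps an original value to the assumed probability that
    it is published as 3. The default is an ASSUMPTION for illustration.
    """
    if p_becomes_three is None:
        p_becomes_three = {1: Fraction(1, 3), 2: Fraction(2, 3), 3: Fraction(1)}
    weighted = {
        value: unmodified_freq.get(value, 0) * p
        for value, p in p_becomes_three.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no small flows in the unmodified frequencies")
    return float(weighted.get(1, 0) / total)


def mean_flow(adjusted_totals):
    """Mean of the adjusted totals reported for one flow across several tables."""
    return sum(adjusted_totals) / len(adjusted_totals)


# Hypothetical frequencies of flows of size 1, 2 and 3 in unmodified data.
scottish_freq = {1: 600, 2: 300, 3: 100}
share = share_originally_one(scottish_freq)

# Hypothetical totals for the same flow in three separately adjusted tables.
estimate = mean_flow([3, 0, 0])
print(share, estimate)
```

The first function corresponds to applying frequencies from unmodified data; the second to averaging over multiple affected tables, which only helps if the tables are adjusted independently of one another.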

2001 SWS
Proportion of all workers uniquely identifiable (or in small flows) from home and workplace locations only

Data set                      Flows with total=3   Estimated flow=1 [mean method]   Estimated flow=1 [sts2 method]
Level 3 (England and Wales)   63.4%                16.8%                            27.4%
Level 2 (UK exc. Scotland)    9.2%                 2.4%                             4.0%
Level 1 (UK exc. Scotland)    0.4%                 0.1%                             0.2%

Varying geography
- Spatial scale has an important effect on the results
- What happens if we vary the scale for only one end of the flow?
  - e.g. Detailed workplace geography, but less detailed home geography?
- Three sets of flows constructed from mean flow data
  - Ward-to-ward
  - District-to-ward
  - Government Office Region-to-ward
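Coarsening one end of the flow can be sketched as an aggregation over a lookup table. The example below is a simplified illustration: it uses integer counts rather than the mean flow data mentioned on the slide, and the area codes, lookup and function names are hypothetical.

```python
from collections import defaultdict


def aggregate_origins(flows, origin_lookup):
    """Re-key (origin, destination) flows so that origins use a coarser geography."""
    coarse = defaultdict(int)
    for (origin, destination), count in flows.items():
        coarse[(origin_lookup[origin], destination)] += count
    return dict(coarse)


def share_in_unit_flows(flows):
    """Share of all people who sit in a flow with a value of exactly 1."""
    total = sum(flows.values())
    if total == 0:
        return 0.0
    return sum(n for n in flows.values() if n == 1) / total


# Hypothetical ward-to-ward flows: (home ward, workplace ward) -> workers.
ward_flows = {
    ("w1", "x"): 1,
    ("w2", "x"): 1,
    ("w3", "x"): 4,
    ("w1", "y"): 1,
    ("w3", "y"): 3,
}
# Hypothetical lookup from home ward to district.
ward_to_district = {"w1": "d1", "w2": "d1", "w3": "d2"}

district_flows = aggregate_origins(ward_flows, ward_to_district)
ward_share = share_in_unit_flows(ward_flows)
district_share = share_in_unit_flows(district_flows)
print(ward_share, district_share)
```

The workplace geography stays at ward level throughout; only the home end is coarsened, so the drop in the share of unit flows reflects the asymmetric design rather than a loss of destination detail.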

Asymmetric 2001 SWS results
Proportions of workers uniquely identifiable given home and workplace location only

Data set           Flow=1
Ward to ward       2.4%
District to ward   0.5%
GOR to ward        0.03%

Anonymity sets for asymmetric SWS

Existing asymmetric data sets
- As well as standard outputs, there are also commissioned outputs
- Some of these have been interaction data, including some asymmetric data sets
- Two data sets were studied, both showing commuting flows by mode of transport to Output Areas in Greater London

Flows to OAs in Greater London
Proportions of workers uniquely identifiable given home and workplace location only

Data set         Origins                       Flow=3   Estimated flow=1 [mean method]
C0310            Wards in Greater London       26.9%    7.0%
C0311            Districts in E&W              8.8%     2.3%
C0311 - subset   Districts in Greater London   4.6%     1.2%

Does it matter?
- What are the implications of these results?
- The 1991 and 2001 data sets are old
  - No location traces from then are likely to exist
  - Even if they did, the individuals may have moved, died, changed status etc.
- Any risk from location tracing will apply to future data sets
  - e.g. from the 2011 Census
- Publishing data sets for detailed geographies may constitute a potential risk, but it is limited

Disclosure risks
- The most obvious risks are posed by the OA-OA flows
  - However, there is little potential for attribute disclosure
  - A simple headcount data set at this scale would allow modelling of coarser flows, but with little attribute disclosure risk
- The ward-to-ward flows pose a smaller risk
  - Proposed record-swapping based disclosure control can introduce enough noise to reduce the risk
- The district-to-district flows pose no practical risk
  - Statistical agencies should not be afraid of publishing detailed flow data at this level

Asymmetric data sets have important potential
- Can act as hybrid between flexible interaction data and area-based data
- Can show high level of spatial detail
- Could be published as complementary pairs
- Utility would depend on user satisfaction

Questions?

References
Danezis, G., Lewis, S. and Anderson, R. (2005) How much is location privacy worth? In: Fourth Workshop on the Economics of Information Security.
Cvrcek, D., Kumpost, M., Matyas, V. and Danezis, G. (2006) A study on the value of location privacy. In Proceedings of the 5th ACM workshop on Privacy in electronic society (WPES '06). ACM, New York, NY, USA, DOI= /
Golle, P. and Partridge, K. (2009) On the Anonymity of Home/Work Location Pairs, Pervasive Computing, Lecture Notes in Computer Science, vol 5538/2009, Springer Berlin / Heidelberg
Krumm, J. (2007) Inference Attacks on Location Tracks. In Proc. of Fifth International Conference on Pervasive Computing (Pervasive 2007), pp
Sweeney, L. (2000) Uniqueness of Simple Demographics in the U.S. Population, Laboratory for International Data Privacy, Carnegie Mellon University

Distribution of 2001 SWS Level 2 mean flows