An introduction to the large-scale Government Surveys & Samples of Anonymised Records Jo Wathan ESDS(Government) & SARs support team CCSR, University of Manchester

Today
- What data is available?
- What is it like?
- Considerations when using the data
- How are they used in research?
- How do you access them?
- Resources & Support

Why should you want to know?
Because the data are...
- Very cost effective: data are free of charge to academic researchers
- Time saving: no need to conduct your own survey
- High quality and well documented
- Able to provide nationally representative data, which allows generalisation to the population
- Suitable for historical and geographical comparisons
- Backed by ESRC-funded data support services

What data am I talking about?
- The UK is particularly rich in microdata which is available for secondary analysis
- Today's focus is on cross-sectional microdata from government surveys and the Census
  - Samples of Anonymised Records
  - ESDS Government Surveys (e.g. LFS, GHS)
- Other major sources:
  - Longitudinal data (e.g. LS, BHPS)
  - International microdata (e.g. ESS)
  - ESDS core function/UK Data Archive
  - Aggregate data

The Samples of Anonymised Records (SARs)
- Microdata samples from Census 1991 & 2001
- Available for the first time after research into the confidentiality risk
- More flexible than conventional aggregate tables

SAR Files                        | Individual        | Household                                  | Small Area Microdata
1991 (GB/NI)                     | 2% with SAR area  | 1% with Region                             | -
2001 licensed data               | 3% with GOR (UK)  | 1% England & Wales only (special licence)  | 5% with LA/UA/PC
2001 Controlled Access Microdata | 3% with LA/UA/PC  | 1% with LA/UA/PC                           | -

What's in the SARs? UK Census microdata
- The Census has a high response rate because it is compulsory
  - 1991: only enumerated cases are in the data
  - 2001: missing people are imputed
- Census topics only (brief self-completion form)
  - Accommodation, transport, socio-economic characteristics, ethnicity, religion, health
- Anonymised, and data limited to ensure confidentiality
  - Most restrictive in the end user licence files for 2001, e.g. less geography in the individual and household files, age banded
  - Unusual cases perturbed
- Extremely large sample sizes!

ESDS Government Surveys
- General Household Survey
- Labour Force Survey
- Family Resources Survey
- Expenditure and Food Survey (previously the National Food Survey and Family Expenditure Survey)
- ONS Omnibus Survey
- National Travel Survey
- Time Use Survey
- British Crime Survey/Scottish Crime Survey
- British Social Attitudes/Scottish Social Attitudes/Northern Ireland Life & Times/Young People's Social Attitudes
- Health Survey for England/Wales/Scotland
- Survey of English Housing (England only)

What are ESDS Government data like?
- Nationally representative survey microdata
- Large sample sizes (but smaller than the SARs)
- Identifying information is removed
- Most are conducted on an annual basis
- Continuous surveys, so always up to date
- Cross-sectional (although the LFS has a 5-quarter panel element)
- Specialist topic surveys, with more depth than the Census

All of these microdata:
- Contain individual information akin to the sort of data you would collect if you were conducting your own survey
- Need to be analysed in an appropriate software package (like SPSS or Stata)
- Are cross-sectional snapshots (exception: the LFS is actually 5 snapshots per address!)
- Are good quality, collected by a professional data collection organisation
  - Office for National Statistics
  - National Centre for Social Research
- Are collected for policy purposes
- Have good quality documentation & support services

Thinking about using the data?
1. What is your research question?
2. What evidence do you need to answer your research question?
3. Is the evidence you need already available? Check the literature and published reports.
4. Is cross-sectional secondary microdata appropriate for your research question? Is your question quantitative? Do you need to follow individuals over time?
5. Is data available?

Locating and assessing data
- Locating data:
  - What data is available for my topic?
  - Are the variables I need available?
- Assessing data for analysis:
  - What population is the sample drawn from?
  - What sampling scheme was used?
  - Do I need to weight?

What datasets cover my topic?
- Question Bank
  - Has topic guides and a search engine across questionnaires
- Census topics:
  - Limited due to legislation, scale & self-completion
  - View the codebooks on the SARs web pages to see what data is in which files
- Finding topics in surveys:
  - Much wider range of topics from a large number of different sources
  - ESDS Government topic guides on employment, health, social capital, Scotland
  - ESDS/UK Data Archive search engine

What variables are available for my topic?
To understand the variables you have available:
- View the documentation/user guide
- A list of variables & codings should be available
- Information on how derived variables were created should be available
- Double check in the dataset!

What do the variables mean?
Unless you...
- can track your variable back to the question(s) asked on the questionnaire
- know who the questions were asked of
- know what was done with the raw data to turn it into the final data
...you don't understand the data

Routeing in the documentation: GHS

Variable Name : ECSTILO
Variable Label : Economic status (harmonised)
Topic : Employment
Population : Adults
Hhld/indiv.level : Individual
Range : 1 to 10
Missing values : -6, -8
   1 'Working (incl Unpaid FW'
   2 'Gov sch with emp'
   3 'Gov sch at coll'
   4 'Unemployed (ILO)'
   5 'Other Unemployed'
   7 'Retired'
   6 'Perm unable to work'
   8 'Keeping house'
   9 'Student'
  10 'Other inactive'
  -8 'NA, ECSTA not known'
  -6 'Child/No int'.

Derived variables
DO IF SCHEDTYP = 3 OR AGE LT
COMPUTE ECSTILO = -6.
ELSE.
+ DO IF DVILO3A = 1.
+   DO IF SCHEMEET = 1.
+     DO IF TRN = 1.
+       COMPUTE ECSTILO = 2.
+     ELSE IF TRN = 2.
+       COMPUTE ECSTILO = 3.
+     END IF.
+   ELSE.
+     COMPUTE ECSTILO = 1.
+   END IF.
+ ELSE IF DVILO3A = 2.
+   COMPUTE ECSTILO = 4.
+ ELSE IF DVILO3A = 3.
+   DO IF YINACT = 1.
+     COMPUTE ECSTILO = 9.
+   ELSE IF YINACT = 2.
+     COMPUTE ECSTILO = 8.
+   ELSE IF YINACT = 3.
+     COMPUTE ECSTILO = 10.
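To make the routeing in the ECSTILO syntax easier to follow, here is a minimal Python sketch of the branches visible in the excerpt. It is an illustration only, not the GHS production code: the excerpt is cut off after the YINACT = 3 branch, so any case it does not cover returns None, and the age threshold is not legible in the excerpt, so it is passed in as a hypothetical `child_age_cutoff` parameter. SPSS missing-value handling is not reproduced.

```python
from typing import Optional


def derive_ecstilo(schedtyp: int, age: int, dvilo3a: Optional[int],
                   schemeet: Optional[int] = None,
                   trn: Optional[int] = None,
                   yinact: Optional[int] = None,
                   *, child_age_cutoff: int) -> Optional[int]:
    """Follow the visible branches of the ECSTILO derivation.

    child_age_cutoff is hypothetical: the threshold is missing from the
    excerpt, so the caller has to supply it.
    """
    # Children / no interview are coded -6.
    if schedtyp == 3 or age < child_age_cutoff:
        return -6
    if dvilo3a == 1:                      # in employment
        if schemeet == 1:                 # on a government scheme
            if trn == 1:
                return 2                  # Gov sch with emp
            if trn == 2:
                return 3                  # Gov sch at coll
            return None                   # left unset by the syntax
        return 1                          # Working
    if dvilo3a == 2:
        return 4                          # Unemployed (ILO)
    if dvilo3a == 3:                      # inactive; excerpt covers YINACT 1 to 3 only
        return {1: 9, 2: 8, 3: 10}.get(yinact)
    return None                           # not covered by the excerpt


# Usage with an assumed cutoff of 16 (an assumption, not taken from the slide).
status = derive_ecstilo(schedtyp=1, age=30, dvilo3a=3, yinact=2, child_age_cutoff=16)
```

Tracing a few respondents through a function like this is one way of checking who a derived variable applies to before using it.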

The population base: nation
- Most large scale surveys seek to be nationally representative, but what is a nation?
  - Labour Force Survey = UK
  - General Household Survey = GB (but strange things can happen North of the Caledonian Canal)
  - Health Survey for England = England
  - Not always apparent from the name
  - Increase of country-specific surveys following devolution
- Over 80% of the population live in England (9% Scotland, 5% Wales, 3% NI), so surveys designed for UK-wide analyses will not generally have large enough samples to analyse separate countries
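The arithmetic behind that last point can be sketched quickly. The snippet below takes the rounded population shares quoted on the slide and assumes a proportionately allocated sample with no country boosts. That allocation is an assumption for illustration, and real surveys often deviate from it.

```python
from typing import Dict

# Rounded population shares (percent) as quoted on the slide.
SHARE_PERCENT: Dict[str, int] = {"Scotland": 9, "Wales": 5, "Northern Ireland": 3}


def expected_respondents(total_sample: int) -> Dict[str, int]:
    """Expected respondents per country under proportionate allocation."""
    return {country: total_sample * pct // 100
            for country, pct in SHARE_PERCENT.items()}


# A hypothetical UK-wide survey of 10,000 respondents.
by_country = expected_respondents(10_000)
```

Once such a sample is split further by age, sex or employment status, the country-level cells shrink very quickly.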

Population base: type of survey
- Most large scale surveys are household surveys: they interview one or more people in private households
  - This will exclude people in institutions
  - Has knock-on effects for particular topics: health, age etc.
- Surveys tend to gather limited information about children
  - May only relate to their existence, age and relationships to other household members
  - There may also be other age restrictions on all or part of the survey

Population base - setting
- You may need to subset to obtain a reasonable database
  - SARs 1991 could double count visitors (at place of residence AND location on Census night)
  - SARs 2001 can double count students (at place of term-time residence AND parental address)
  - Need to subset to prevent double counting
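As a rough sketch of what subsetting means in practice, the snippet below drops records flagged as a second enumeration of the same person. The flag name `student_away` and the records are hypothetical. The real SARs files have their own variables for this, which should be looked up in the codebook.

```python
from typing import Dict, List


def drop_double_counted(records: List[Dict], flag: str) -> List[Dict]:
    """Keep only records where the (hypothetical) double-count flag is False."""
    return [r for r in records if not r[flag]]


# Made-up records: person 3 is a student who also appears at the parental address.
records = [
    {"id": 1, "student_away": False},
    {"id": 2, "student_away": False},
    {"id": 3, "student_away": False},   # enumerated at term-time address
    {"id": 4, "student_away": True},    # same student recorded at parental address
]

population_base = drop_double_counted(records, flag="student_away")
```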

The sampling strategy will affect your results
- Few data sources approximate simple random sampling; the SARs does
- Stratification increases the precision of estimates; the Labour Force Survey is stratified
- Clustering reduces the precision of estimates, e.g. the General Household Survey
- Many major surveys use stratification and clustering
- Guidance should be available in the documentation
- PEAS website
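One common way to express the loss of precision from clustering is the design effect. The sketch below uses the standard approximation deff = 1 + (m - 1) × rho, where m is the average number of respondents per cluster and rho is the intra-cluster correlation. The numbers are purely illustrative and are not taken from any ESDS survey. It covers clustering only, so gains from stratification are not modelled.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate design effect from clustering: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n: int, deff: float) -> float:
    """Size of the simple random sample giving the same precision."""
    return n / deff


# Illustrative values only: 21 respondents per cluster, rho = 0.05.
deff = design_effect(cluster_size=21, icc=0.05)
n_eff = effective_sample_size(10_000, deff)
```

With these illustrative inputs the clustered sample behaves like a simple random sample of about half its nominal size, which is why the survey documentation should be checked before computing standard errors.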

Disproportionate sampling
- The British Social Attitudes survey takes only 1 person per household
  - If left like this, the chance of selection in the sample would be inversely proportional to the size of one's household
- Over-sampling in order to obtain satisfactory sample sizes for minority groups (often referred to as boosts)
  - Health Survey for England has done this with ethnic minorities

Weighting can be used to prevent bias from disproportionate sampling
[Table: Number in household including R, weighted and unweighted frequency and % of all. Dataset: British Social Attitudes Survey, 2003]
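The effect of a selection weight can be sketched with made-up data. When one person is interviewed per household, a respondent from a household with k eligible people had a 1/k chance of selection, so a simple correction is a weight of k. The snippet below assumes household size equals the number of eligible people and uses an invented sample, not the BSA 2003 figures.

```python
from collections import Counter
from typing import Dict, List


def size_distribution(household_sizes: List[int], weighted: bool) -> Dict[int, float]:
    """Percentage of respondents by household size, optionally weighting
    each respondent by household size (the inverse of selection probability)."""
    totals: Counter = Counter()
    for size in household_sizes:
        totals[size] += size if weighted else 1
    grand_total = sum(totals.values())
    return {size: 100 * t / grand_total for size, t in sorted(totals.items())}


# Invented sample: one respondent per household, value = household size.
sample = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
unweighted = size_distribution(sample, weighted=False)
weighted = size_distribution(sample, weighted=True)
```

In this toy sample, people living alone make up 40% of unweighted respondents but a noticeably smaller share once weighted, which is the direction of correction the slide describes.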

Non-response trends – another reason for weighting Source: Barton in ESDS weighting guide

Imputation: 2001 SARs
[Table: Not ONC imputed vs ONC imputed, by ethnic group: White, Mixed, Asian, Black, Chinese/Other, All]

Exercise
Suggest datasets which would fulfil the following criteria, for a range of employment projects:
1. A large up-to-date UK dataset with extensive questions on employment and training
2. The maximum possible sample size for a single time point to allow minority groups to be distinguished in analysis
3. Any 1960s employment microdata
4. A dataset with extensive questions on income from sources other than just earnings
5. A dataset which could be used to look at attitudes to work

What would you use the data for?
- Straightforward secondary analysis
  - To assess theoretical accounts
  - To quantify characteristics or behaviours
  - To challenge official views
  - To apply alternative definitions
- Context to your own primary research
  - Your research could be quantitative or qualitative
  - To assess the national context of an area study
  - To assess whether your sample is typical
  - To assess the scale of behaviours

Practical research uses of the data
- Looking at change over time
- Looking at sub-populations
- Using the flexibility of the data to look at alternative definitions
- Looking within households

Secondary analysis: change for subpopulations Marmot, M (2003)

Using successive cross-sectional data over time
Pros…
- Reasonable amount of comparability
- Can pool years/quarters
- Data is representative at each time point
- Good at looking at impacts on groups
Cons…
- Limits to continuity in the data (e.g. ethnicity)
- Cannot establish individual change

Looking at small populations
- Many surveys have 10,000+ respondents
  - Permits minority groups to be represented
  - For rare subpopulations the sample size may still be too small; consider combining years if appropriate
- Largest sample sizes are available from the Samples of Anonymised Records
  - The Small Area Microdata file contains nearly 3 million records!

Survey data is subject to sampling error!
Combining datasets to increase sample size
Example: Pregnancy and Employment
- Using General Household Survey data alone there are only 168 pregnant women in the age group studied
- 95% confidence interval for % of pregnant women economically inactive: 34.2 to 49.1%
- Combined 3 years' data to obtain a sample of 465 pregnant women
- Confidence interval using 3 years' data: 34.9 to 43.9%
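The narrowing of the interval can be checked with a normal-approximation confidence interval for a proportion. This is a sketch under two assumptions that are not stated on the slide: the point estimate is taken as the midpoint of each quoted interval, and simple random sampling is assumed (the GHS is clustered, so published intervals may allow for the design).

```python
import math
from typing import Tuple


def proportion_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Point estimates assumed to be the midpoints of the quoted intervals.
one_year = proportion_ci(p=0.4165, n=168)
three_years = proportion_ci(p=0.394, n=465)

width_one_year = one_year[1] - one_year[0]
width_three_years = three_years[1] - three_years[0]
```

Under these assumptions the computed intervals come out close to the quoted ones, and the three-year interval is markedly narrower, since the standard error falls with the square root of the sample size.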

Using the flexibility of the data to look at alternative definitions
What are hours worked?
- Is it just paid work? Or unpaid as well?
- Hours usually worked, or actually worked last week?
- In main job, or in any job?
- What about students?
- Overtime: paid? Unpaid?
- Lunch hours?
- Do non-workers work zero hours or should they be excluded?
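The last question alone can change a headline figure considerably. The sketch below computes mean weekly hours under both treatments of non-workers. The data are invented, and None is used here, as an assumption, to mark people not in work.

```python
from typing import List, Optional


def mean_hours(hours: List[Optional[float]], include_non_workers: bool) -> float:
    """Mean weekly hours; None marks a person not in work.

    include_non_workers=True counts them as working zero hours,
    False drops them from the calculation. Expects at least one value
    to remain after filtering.
    """
    if include_non_workers:
        values = [h if h is not None else 0.0 for h in hours]
    else:
        values = [h for h in hours if h is not None]
    return sum(values) / len(values)


# Invented sample: three workers and two non-workers.
sample = [40, 20, 30, None, None]
mean_all_adults = mean_hours(sample, include_non_workers=True)
mean_workers_only = mean_hours(sample, include_non_workers=False)
```

Because the microdata hold the individual responses, either definition can be applied by the analyst, which is not possible with a published table.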

Hierarchical data: conceptually
Household 1: North West, Social rented
  Person 1: HoH, Female, 28, GCSE, P/T Work, No LTILL
  Person 2: Son of HoH, Male, 12, N/A, No LTILL
Household 2: Wales, Owner occupier
  Person 1: HoH, Male, 33, Degree, F/T Employee, No LTILL
  Person 2: Spouse of HoH, Female, 31, Degree, P/T Employee, No LTILL
  Person 3: Parent of HoH, Female, 72, No quals, Econ Inactive, LTILL
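The same two households can be written as a nested structure to show how household-level and person-level information relate. This is a minimal sketch with hypothetical field names. The actual files store the hierarchy as flat records linked by household and person identifiers.

```python
from typing import Dict, List

# The two example households from the slide (field names are hypothetical).
households: List[Dict] = [
    {"region": "North West", "tenure": "Social rented", "persons": [
        {"relationship": "HoH", "sex": "Female", "age": 28, "ltill": False},
        {"relationship": "Son of HoH", "sex": "Male", "age": 12, "ltill": False},
    ]},
    {"region": "Wales", "tenure": "Owner occupier", "persons": [
        {"relationship": "HoH", "sex": "Male", "age": 33, "ltill": False},
        {"relationship": "Spouse of HoH", "sex": "Female", "age": 31, "ltill": False},
        {"relationship": "Parent of HoH", "sex": "Female", "age": 72, "ltill": True},
    ]},
]


def household_summary(household: Dict) -> Dict:
    """Derive household-level variables from the people in the household."""
    persons = household["persons"]
    return {"size": len(persons),
            "any_ltill": any(p["ltill"] for p in persons)}


def person_rows(all_households: List[Dict]) -> List[Dict]:
    """Flatten to one row per person, carrying household attributes down."""
    rows = []
    for hh_number, hh in enumerate(all_households, start=1):
        for person in hh["persons"]:
            rows.append({"household": hh_number, "region": hh["region"],
                         "tenure": hh["tenure"], **person})
    return rows


summaries = [household_summary(h) for h in households]
rows = person_rows(households)
```

Moving in both directions, aggregating people up to households and attaching household characteristics to people, is what makes analysis within households possible.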

Source: Richard Dickens, Paul Gregg and Jonathan Wadsworth (2000) New Labour and the Labour Market, CMPO Working Paper Series 00/19 Table 5

Finding out about what's been/being done with the data
- User meetings
  - General Household Survey
  - Labour Force Survey
  - Health Surveys
  - Samples of Anonymised Records
- ESDS Government
  - Publications database
  - Usage pages

Accessing & Support Services
- The data teams:
  - ESDS Government
  - SARs team at CCSR
- Registering to use the data
- Special licence and CAM data
- Getting support

SARs Data team: CENSUS MICRODATA SUPPORT
- Register for the data
- Access SARs documentation for all SARs datasets
- Explore data online or download datasets in SPSS, Stata, or tab delimited form for:
  - 1991 data, 2001 Individual licensed file, 2001 Small Area Microdata
- Information about the 2001 Special Licence Household SAR, with a link to the UK Data Archive for download

ESDS Government: MAJOR CROSS-SECTIONAL UK SURVEYS
- Survey pages
- Introductory guides and resources including topic guides, weighting guide, software guides
- Links to relevant external resources
- Links to the UK Data Archive to:
  - Register for the data
  - Download the data in Stata, SPSS etc.
  - Explore the data online in Nesstar
  - Access documentation

The licence
- All users need to be licensed
- Academics complete the licence as part of the Census Registration System process
- Non-academic users contact the UK Data Archive (Surveys) or CCSR (SARs) to arrange registration; charges may apply
- You cannot pass the data to an unlicensed user
- You cannot attempt to identify an individual

The licence: good practice
- Keep your data password protected
- Destroy your data when you have finished using it
- Remove files before passing on your PC to someone else
- Tell the data team about your publications
- Tell the data team if you leave your institution

Special licence files
- Special licence is a new way of making more detailed data available to social researchers
  - Annual Population Survey data
  - Household SAR 2001
- Full & legally binding paper registration process, which requires institutional signature & ONS approval
- Must agree to extensive data stewardship conditions

Controlled Access Microdata SARs
- Controlled Access Microdata are designed for professional researchers who have no other data options open to them
- Access in a safe setting only, at an ONS site
- Specification on SARs website
- Individual file and Household file
- Files contain much more detail, e.g.
  - Individual year of age (topcoded at 95)
  - Full coding on country of birth
  - SOC Unit Group
  - Local authority geography
  - Index of Deprivation for SOAs
  - Index of Deprivation for migrant's last address
- Further information and appropriate forms at
- Contact for more details

User support
- SARs: helpdesk; tel: (0161); SARS jiscmail list
- ESDS Government: helpdesk; tel: (0161); ESDS-Govsurveys jiscmail list