tinyurl.com/ydfyav8p
Data Sources and Data Challenges
Data Publishers
We will be taking a look at a small number of national datasets produced by several data publishers.
For a full list see: https://data.gov.uk/publisher
Datasets, reports and documentation
Datasets are often produced as part of a series. For example, the children looked after statistics produced by the DfE are released in September every year.
Depending on your question, you may be interested in only the latest release or in every release in the series.
Each dataset release comes with a report, which typically covers:
  Headline statements.
  The current status of each measure in the dataset.
  Any changes in trend from previous releases.
These reports are a good way to quickly assess how useful a dataset might be.
Areas and Mapping
There are two main ways of defining a geographical area:
  Administrative boundaries
  Output Areas and Super Output Areas
For more information on these and on other geographical units, go to: https://www.ons.gov.uk/methodology/geography/ukgeographies
Administrative Boundaries for England
Electoral wards/divisions are the building blocks of administrative and electoral areas.
In 2016 there were 7,445 electoral wards/divisions in England.
Ward boundaries are subject to change, and changes can occur every year.
Boundaries are reviewed by the Local Government Boundary Commission for England (LGBCE), which tries to maintain equal populations across neighbouring wards for the purposes of voting.
Output Areas for England
Specifically designed for analysis.
Output Areas (OAs) were introduced as part of the 2001 Census for England and Wales, and were last updated as part of the 2011 Census.
Built from clusters of neighbouring postcodes.
Each OA was designed to be as socially homogeneous as possible, based on tenure of household and dwelling type.
Mixtures of urban and rural postcodes were avoided where possible.
An Output Area has an average population of 309.
OAs are grouped into Super Output Areas (SOAs):
  32,844 Lower-layer Super Output Areas (LSOAs)
  6,791 Middle-layer Super Output Areas (MSOAs)
OAs and SOAs were designed so that they can be aligned to local authority district (LAD) boundaries.
How to assign data to an area?
If your data has postcodes, you can use the ONS Postcode Directory (ONSPD) to look up the ward, Output Area, LSOA, etc.
The ONSPD provides lots of different areas, some of them historic.
Key columns in the ONSPD:
  PCD – Postcode
  DOINTR – Date of introduction
  DOTERM – Date of termination
  OSWARD – Electoral ward/division
  OSLAUA – Local authority district (LAD, LB, UA, MB)
  OA01, OA11 – 2001 and 2011 Census Output Area
  LSOA01, LSOA11 – 2001 and 2011 Census LSOA
  MSOA01, MSOA11 – 2001 and 2011 Census MSOA
For a full definition of each column, see the User Guide that is downloaded with the ONSPD.
[Example: ONSPD_AUG_UK_2017_UK_LA.csv]
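The postcode lookup described above can be sketched with Python's csv module. This is only an illustration: the column names follow the ONSPD key columns listed above (assumed here to appear in lower case in the file header), while the sample rows, the area codes and the helper names normalise, build_lookup and assign_area are all made up. The real file may store postcodes in a different layout, so check the User Guide before relying on it.

```python
import csv
import io

# Made-up stand-in for a few ONSPD rows; the real file has many more columns
# and real area codes. Header names are assumed, not copied from the file.
SAMPLE_ONSPD = """pcd,doterm,osward,oslaua,lsoa11
AB1 2CD,,W0000001,D0000001,L0000001
EF3 4GH,201501,W0000002,D0000002,L0000002
"""

def normalise(postcode):
    """Upper-case and remove spaces so 'ab1 2cd' matches 'AB1 2CD'."""
    return "".join(postcode.split()).upper()

def build_lookup(csv_file):
    """Map each normalised postcode to its full row (a dict of columns)."""
    return {normalise(row["pcd"]): row for row in csv.DictReader(csv_file)}

def assign_area(postcode, lookup, area_column="lsoa11"):
    """Return the area code for a postcode, or None if it is not found."""
    row = lookup.get(normalise(postcode))
    return row[area_column] if row else None

lookup = build_lookup(io.StringIO(SAMPLE_ONSPD))
print(assign_area("ab12cd", lookup))             # LSOA of the first sample row
print(assign_area("EF3 4GH", lookup, "osward"))  # ward of the second sample row
print(assign_area("ZZ9 9ZZ", lookup))            # postcode not in the sample
```

Normalising both sides before matching is a design choice to cope with inconsistent spacing in your own data; for the full directory you would pass an opened file to build_lookup instead of the in-memory sample.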
Getting area boundaries
Some tips for finding boundaries online.
ONS provide boundaries at each of the different area levels: http://geoportal.statistics.gov.uk/datasets?q=Administrative%20Boundaries
Resolution:
  Full – highest resolution, used for advanced GIS analysis, very large file sizes
  (Super | Ultra) Generalised – much lower resolution, file size can be 10% of ‘Full’, used for visualisation
Extent:
  Extent of the realm – used for analysis
  Clipped to the coastline – used for visualisation, as the maps resemble what people expect the coastal boundary to look like
Map boundaries can come in a variety of formats: Shapefile, KML, GeoJSON, WMS or WFS
Resolution: Full vs Generalised
Population Estimates and Projections
ONS population estimates
What are they?
They represent the number of people who usually live in an area on 30 June of each year, regardless of nationality.
ONS provide annual population estimates at all the area levels already mentioned.
Annual estimates are available from 1981 onwards.
Estimates for an area are provided by gender and single year of age (capped at 90 and above).
ONS use a variety of data sources and methods to produce the estimates; information on these can be found in their “QMI report”.
They normally release the estimates for the previous year in June each year.
What do the estimates look like?
https://www.ons.gov.uk/visualisations/dvc411/pyramids/pyramids/pyramids.html
What can population estimates be used for?
  Local government and the health sector use them for planning and monitoring service delivery, and for managing resource allocation.
  They are often used as an input: the estimates are not necessarily helpful by themselves, but they can be used to calculate measures such as rates.
Where can I find the data?
Two download options from the ONS population estimates page:
  Mid-2016 single spreadsheet – good for visual inspection
  Mid-2016 detailed time series – good for loading into a database and using for analysis
Rates of stop and search (1)
How does the rate of stop and search differ by age?
Define the rate:
$$\text{Rate} = \frac{\text{Number of people stopped}}{\text{Size of population}} \times 10{,}000$$
Look at Lancashire, Q4 (Oct, Nov, Dec) 2016:
Age     Pop. Est.   Number Stopped   Rate
0-9     140,000
10-17   106,700     150              14
18-24   108,600     222              20
25-34   142,800     175              12
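The rate calculation can be checked with a few lines of Python. The figures are the Lancashire Q4 2016 values from the table above; the function name rate_per_10k and the dictionary layout are just illustrative choices.

```python
def rate_per_10k(number_stopped, population):
    """Number of people stopped per 10,000 residents."""
    return number_stopped / population * 10_000

# (population estimate, number stopped) for Lancashire, Q4 2016.
lancashire_q4_2016 = {
    "10-17": (106_700, 150),
    "18-24": (108_600, 222),
    "25-34": (142_800, 175),
}

# Round to whole numbers, as the rates are shown on the slide.
rates = {
    age: round(rate_per_10k(stopped, population))
    for age, (population, stopped) in lancashire_q4_2016.items()
}
print(rates)
```

Dividing by the population estimate is what makes the age groups comparable: 25-34 year olds have more stops than 10-17 year olds in absolute terms, but a lower rate.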
Rates of stop and search (2)
Table of rates in Lancashire, from Q2 2015 to Q4 2016
Columns: Age; 2015 Q2, Q3, Q4; 2016 Q1, Q2, Q3, Q4
0-9:
10-17: 23 31 38 32 21 13 14
18-24: 36 46 37 28 18 20
25-34: 19 25 12
Rates of stop and search (3) Figures of counts and rates in Lancashire, from Q2 2015 to Q4 2016
ONS population projections
What are they?
Projections are published every 2-3 years, but there is no set timetable.
ONS provide annual projections of the future size, gender and age structure of the population, down to the Local Authority District level.
The latest release was published in 2016 and contains annual projections from mid-2016 up to mid-2116.
Projections are provided by gender and year of age.
They use the mid-year population estimates and assumptions about future fertility, mortality and migration.
They represent the number of people who are expected to be usually living in an area on 30 June of each year, regardless of nationality.
ONS population projections What do the projections look like?
What can population projections be used for?
  Similar usage to population estimates.
  Local government and the health sector use them for planning service delivery and managing future resource allocation.
  Often used as an input, along with existing rates, to calculate future numbers.
  Forecasting methods can use the population projections to improve the quality of predicted values.
Where can I find the data?
Several dataset options from the ONS population projections page:
  Table A2-4: Principal projection - England population in age groups
  Table 2: Subnational Population Projections for Local Authorities in England
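As a minimal sketch of using projections as an input alongside an existing rate, the snippet below scales a projected population by a rate per 10,000. It assumes the current rate still holds in the future year, which is a strong simplification, and both input numbers are hypothetical rather than ONS figures.

```python
def expected_count(projected_population, rate_per_10k):
    """Expected number of events if the current rate still applies."""
    return projected_population * rate_per_10k / 10_000

# Hypothetical inputs for illustration only.
current_rate = 20                 # events per 10,000 people in an age group
projected_population = 110_000    # projected size of that age group

future_events = expected_count(projected_population, current_rate)
print(future_events)
```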
English Indices of Deprivation
What are they?
  A measure of relative deprivation in small areas, not actual deprivation.
  Good for identifying the most deprived areas, not deprived people.
  Good for comparing small areas across England, but not with areas in Scotland, Wales and Northern Ireland.
  A general measure of deprivation, the Index of Multiple Deprivation (IMD), is published.
  IMD scores are provided for each LSOA, and averages are provided for the Local Authority Districts.
  As well as the overall IMD measure, scores are provided across seven domains.
What have they been used for?
Targeting resources: used by national and local organisations to identify places for prioritising resources and for more effective targeting of funding.
  Example: used to distribute £448m of funding to local authorities for the Troubled Families Programme.
  Example: the most deprived 15 per cent of neighbourhoods were eligible for insulation measures from energy companies.
Policy and strategy: development of an evidence base to help understand current need and demand for services.
  Example: used to assess the equality of access to local health and other services.
Research: understanding the challenges and performance of different areas subject to different services and policies.
  Example: the relationship between pupil attainment and neighbourhood deprivation.
Seven domains of deprivation
Income Deprivation – proportion of the population experiencing deprivation relating to low income.
Employment Deprivation – proportion of the working-age population who would like to work but are unable to do so.
Education Deprivation – lack of attainment and skills in the local population.
Health Deprivation – measures risk of premature death and the impairment of quality of life through poor health.
Crime – risk of personal and material victimisation at a local level.
Barriers to Housing and Services – physical and financial accessibility of housing and local services.
Living Environment Deprivation – quality of the local environment: housing, air quality, road traffic accidents.
Association between the domains
Where can I find the data?
The Indices have already been aggregated and summarised in lots of different ways at the LSOA and Local Authority level.
https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015
  File 2: Domains of deprivation – ranks of each LSOA on the Index of Multiple Deprivation and on each of the seven domains.
  File 5: Scores for the indices of deprivation – score for each LSOA on the IMD and on each of the seven domains.
  File 10: Local Authority District Summaries – average rank for the IMD and the seven domains for each Local Authority District.
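Once the LSOA ranks are loaded, a threshold such as "the most deprived 15 per cent of neighbourhoods" can be applied directly to the rank. The sketch below assumes that rank 1 is the most deprived LSOA and uses the 32,844 LSOAs quoted earlier; the function name in_most_deprived and the example ranks are illustrative.

```python
N_LSOAS = 32_844  # number of LSOAs in England, as quoted earlier

def in_most_deprived(rank, share, n_areas=N_LSOAS):
    """True if the rank falls within the most deprived `share` of areas.

    Assumes rank 1 is the most deprived area.
    """
    return rank <= share * n_areas

print(in_most_deprived(4_000, 0.15))  # inside the most deprived 15 per cent
print(in_most_deprived(5_000, 0.15))  # outside it
```

Because the Indices are relative, this only says where an LSOA sits compared with the other LSOAs in England, not how deprived it is in absolute terms.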
Pupil Absence
What is it?
DfE reports on absence levels of all pupils using two key measures. One session is equal to half a day.
Overall absence: the total of authorised and unauthorised absent sessions, including illness.
$$\text{Overall absence rate} = \frac{\text{Total absence sessions}}{\text{Total sessions}} \times 100$$
Persistent absence: a pupil is a persistent absentee when their overall absence equates to 10 per cent or more of their possible sessions.
$$\text{Persistent absence rate} = \frac{\text{Number of pupils classed as persistent absentees}}{\text{Number of pupils}} \times 100$$
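The two measures can be sketched in Python as follows. The pupil records are hypothetical, including the assumption of 380 possible sessions per pupil, and the function names are illustrative rather than anything DfE publishes.

```python
def overall_absence_rate(absence_sessions, possible_sessions):
    """Absent sessions as a percentage of possible sessions."""
    return absence_sessions / possible_sessions * 100

def is_persistent_absentee(absence_sessions, possible_sessions, threshold=10):
    """True if absence is `threshold` per cent or more of possible sessions."""
    # Integer comparison avoids floating-point issues exactly at the threshold.
    return absence_sessions * 100 >= threshold * possible_sessions

def persistent_absence_rate(pupils):
    """Percentage of pupils classed as persistent absentees."""
    persistent = sum(is_persistent_absentee(a, p) for a, p in pupils)
    return persistent / len(pupils) * 100

# Hypothetical pupils: (sessions missed, sessions possible).
pupils = [(10, 380), (38, 380), (0, 380), (57, 380)]

total_missed = sum(missed for missed, _ in pupils)
total_possible = sum(possible for _, possible in pupils)
print(overall_absence_rate(total_missed, total_possible))
print(persistent_absence_rate(pupils))
```

Note that the overall rate is computed from session totals across all pupils, while the persistent rate counts pupils, so the two can move in different directions.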
What level are the rates reported at?
National, local authority, and school.
What is it used for?
  Specialist publications such as the Good Schools Guide.
  Development of policy (Education Policy Institute).
  Research: does missing one week of school lead to lower grades?
What does the data look like?
2015/16 map of local authority overall absence rates.
Lancashire appears to be fairly low. Liverpool, Blackpool and the Wirral appear fairly high.
Where can I find the data?
https://www.gov.uk/government/collections/statistics-pupil-absence
Data is released by autumn term, spring term, or as a full-year release.
  Main Text – this is the associated report.
  Main Tables – pretty tables, easy to read.
  Underlying Data – good for loading into stats software.
Limitations of area-based analyses
Ecological fallacy
The ecological fallacy occurs when someone takes inferences and associations made at an area level and uses them to make statements about individual people.
We cannot use information we have about an area to make inferences about an individual.
Associations at an area level may not exist at the individual level.
Example: over a 12-month period, look at two groups of pupils in terms of:
  whether or not a pupil has been absent
  whether or not they have been stopped and searched.
Example
[Figure: two groups of pupils, each labelled with the group size and the numbers absent and stopped]
Example
Area ID Absent Stopped 1 36 15 2 42 9 3 35 17 4 33 5 39 13 6 40 7 8 32 10 26
We can only make statements about the area.
From the example, it is true to say that areas with higher absence have higher rates of children being stopped and searched. It does not follow that a pupil who is absent is more likely to be stopped and searched.
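The gap between area-level and individual-level associations can be demonstrated with a small made-up dataset. In the sketch below every pupil record is hypothetical and deliberately constructed so that the area with more absence also has more stops, even though no absent pupil was stopped.

```python
# Hypothetical pupils, one tuple each: (area, was_absent, was_stopped).
pupils = (
    [("A", True, False)] * 2 + [("A", False, True)] * 1 + [("A", False, False)] * 7
    + [("B", True, False)] * 6 + [("B", False, True)] * 4
)

def area_rates(pupils, area):
    """Share of pupils in an area who were absent, and who were stopped."""
    rows = [p for p in pupils if p[0] == area]
    absent = sum(p[1] for p in rows) / len(rows)
    stopped = sum(p[2] for p in rows) / len(rows)
    return absent, stopped

def stopped_share(pupils, absent):
    """Share stopped among pupils with the given absence status."""
    rows = [p for p in pupils if p[1] == absent]
    return sum(p[2] for p in rows) / len(rows)

# Area level: B has more absence and more stops than A.
print(area_rates(pupils, "A"), area_rates(pupils, "B"))
# Individual level: absent pupils were never stopped in this made-up data.
print(stopped_share(pupils, absent=True), stopped_share(pupils, absent=False))
```

With only the area totals available, the two situations (absent pupils being stopped, or different pupils being absent and stopped) cannot be told apart.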
Finally
Two data sources I have not talked about
Understanding Society (https://www.understandingsociety.ac.uk/)
  UK household longitudinal survey.
  Interviews people of all ages every year about their lives.
  National, regional and local data.
  Annual findings reports.
  Case studies.
  Amazing data documentation.
Crime Survey for England and Wales (http://www.crimesurvey.co.uk/)
  Monitors the extent of crime.
  Used to evaluate and develop crime reduction policies.
  Publishes data at the Police Force Area level.
  Sample size in 2015/16 was 35,000.
  Minimum of 650 per police force area.
Data for both is accessed via the UK Data Service, which is beyond the scope of this session.
Lab exercise: Explore!
Pick a data publisher or a dataset.
Find the latest release and the corresponding report.
Look for the underlying data.
What is the lowest area level available for the data?
How might you use it?
Are you interested in having data over time, or just a one-year snapshot?
TIP: If you get lost in the streams of tables that get published, go back to Google and try again.