Statistical Research Update
Becky Tinsley, Louise Morris
Overview
- Brief reminder of what you saw last April
- Update on the research we have been doing on population estimates using admin data
- Findings from some case studies we have undertaken to produce statistics about population characteristics using admin data
Population Estimates
Framework for producing population estimates using linked admin data
Last time... on matching
- Recall = percentage of true matches that were made, out of all possible true matches available
- Precision = percentage of true matches, out of all the matches that Beyond 2011 made
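As a minimal sketch of the two measures, assume the true matches are known (for example from a clerically checked sample) and that each match is a pair of hypothetical record IDs. Recall and precision can then be computed as:

```python
"""Recall and precision for a record-linkage run, following the slide's definitions.

Assumption: true matches are known, and each match is a hypothetical
(record_a, record_b) ID pair.
"""


def recall(made: set, true_matches: set) -> float:
    """Percentage of all true matches available that the run actually made."""
    return 100 * len(made & true_matches) / len(true_matches)


def precision(made: set, true_matches: set) -> float:
    """Percentage of the matches the run made that are true matches."""
    return 100 * len(made & true_matches) / len(made)


# Hypothetical example: 4 true pairs exist; the run makes 3 links, 2 of them correct.
true_pairs = {("A1", "B1"), ("A2", "B2"), ("A3", "B3"), ("A4", "B4")}
made_pairs = {("A1", "B1"), ("A2", "B2"), ("A5", "B9")}

print(f"recall = {recall(made_pairs, true_pairs):.1f}%")
print(f"precision = {precision(made_pairs, true_pairs):.1f}%")
```

The two measures pull in opposite directions. Accepting more candidate links tends to raise recall at the cost of precision.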
New developments – associative matching
Framework for producing population estimates using linked admin data
Last time... on SPDs
[Diagram: NHS Patient Register; DWP/HMRC Customer Information System; HESA data (students); Statistical Population Dataset (SPD); 1% coverage survey; population estimates]
SPD 5
[Map: areas where the admin data method is lower than the 2011 Census; areas where the admin data method is higher than the 2011 Census]
Population Pyramids using admin data
New developments – SPDs by quinary age, sex, and LA
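A quinary age band is a five-year group (0-4, 5-9, ...). The sketch below shows one way to tabulate SPD records by quinary age, sex and LA. The open-ended 90+ top band, the record layout and the LA names are illustrative assumptions, not taken from the SPD specification.

```python
from collections import Counter


def quinary_band(age: int, top: int = 90) -> str:
    """Return the five-year ('quinary') age band for an age in years.

    Ages at or above `top` go into one open-ended band; 90+ is an ASSUMED choice.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if age >= top:
        return f"{top}+"
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


# Hypothetical SPD records: (age, sex, local authority). LA names are placeholders.
records = [
    (3, "F", "LA_A"),
    (1, "F", "LA_A"),
    (17, "M", "LA_A"),
    (93, "F", "LA_B"),
]

# Count people in each (quinary age band, sex, LA) cell.
counts = Counter((quinary_band(age), sex, la) for age, sex, la in records)
for cell, n in sorted(counts.items()):
    print(cell, n)
```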
New developments – SPDs at OA level
[Chart: percentage of OAs by percentage difference from 2011 Census estimates]
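The percentage difference for each OA can be read as 100 * (SPD - Census) / Census. The sketch below turns those differences into the percentage of OAs falling in each band. The OA counts are hypothetical and the 5-percentage-point band width is an assumption.

```python
import math
from collections import Counter


def pct_difference(spd: float, census: float) -> float:
    """Percentage difference of the SPD count from the 2011 Census estimate."""
    return 100 * (spd - census) / census


def band(diff: float, width: float = 5.0) -> str:
    """Half-open band [lower, upper) of the given (ASSUMED) width containing diff."""
    lower = math.floor(diff / width) * width
    return f"[{lower:g}, {lower + width:g})"


# Hypothetical OA-level counts: OA -> (SPD count, 2011 Census estimate).
oa_counts = {
    "OA_1": (310, 300),
    "OA_2": (285, 300),
    "OA_3": (330, 300),
    "OA_4": (302, 300),
}

# Count OAs per band, then express each count as a percentage of all OAs.
bands = Counter(band(pct_difference(spd, census)) for spd, census in oa_counts.values())
shares = {b: 100 * n / len(oa_counts) for b, n in bands.items()}
print(shares)
```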
New developments – evidence-based rules
Research finding: Some overseas students do not work whilst they are at university, and some are unable to work because of the conditions of their visas. As a result, they are unlikely to register for a National Insurance Number and so do not appear on the CIS dataset. Under the current rules this means they are not counted in the SPD, resulting in an undercount of students.
Refinement (new rule): If a student appears on the HESA dataset and is linked with either the PR or the CIS, include them in the SPD.

Research finding: Children are only included in the CIS dataset if their parents claim Child Benefit. We have seen an undercount of children in the SPD, which supports this and particularly affects 0-4 year olds. This will worsen as changes to Child Benefit eligibility rules are implemented.
Refinement (new rule): All children aged 0-4 on the PR will be included in the SPD, even if they are not found on the CIS.

Research finding: School Census (SC) data contains accurate, regularly updated address information for children.
Refinement (new rule): All children aged 5 to 15 who are on either the PR or the CIS, and on the School Census, are included in the SPD. The SC address takes precedence over addresses on the other sources (in much the same way as the HESA address does for students).

Research finding: In some instances, people who have died are not removed from some data sources.
Refinement (new rule): Any records that link to a death registration are removed from the SPD.
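The refinements above can be expressed as inclusion and address-precedence logic over linked records. This sketch is illustrative only. The record fields are hypothetical. The base rule (a person found on both the PR and the CIS is included) is an assumption standing in for the "current rules", which are not spelled out here. The PR-then-CIS address fallback is also assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedPerson:
    """One person after linkage. All field names are hypothetical."""
    person_id: str
    age: int
    on_pr: bool                      # NHS Patient Register
    on_cis: bool                     # DWP/HMRC Customer Information System
    on_hesa: bool = False            # HESA student record
    on_school_census: bool = False
    linked_to_death: bool = False    # links to a death registration
    pr_address: Optional[str] = None
    cis_address: Optional[str] = None
    hesa_address: Optional[str] = None
    sc_address: Optional[str] = None


def include_in_spd(p: LinkedPerson) -> bool:
    """Apply the evidence-based rules, plus an ASSUMED base rule."""
    # Deaths: any record linking to a death registration is removed.
    if p.linked_to_death:
        return False
    # ASSUMED base rule (not stated in the source): found on both PR and CIS.
    if p.on_pr and p.on_cis:
        return True
    # Students: on HESA and linked with either the PR or the CIS.
    if p.on_hesa and (p.on_pr or p.on_cis):
        return True
    # Children aged 0-4: on the PR, even if not found on the CIS.
    if 0 <= p.age <= 4 and p.on_pr:
        return True
    # Children aged 5-15: on the PR or CIS, and on the School Census.
    if 5 <= p.age <= 15 and p.on_school_census and (p.on_pr or p.on_cis):
        return True
    return False


def spd_address(p: LinkedPerson) -> Optional[str]:
    """School Census address first for 5-15s, then HESA address for students,
    then an ASSUMED fallback of PR address followed by CIS address."""
    if 5 <= p.age <= 15 and p.on_school_census and p.sc_address:
        return p.sc_address
    if p.on_hesa and p.hesa_address:
        return p.hesa_address
    return p.pr_address or p.cis_address


people = [
    LinkedPerson("p1", 21, on_pr=True, on_cis=False, on_hesa=True, hesa_address="Hall X"),
    LinkedPerson("p2", 2, on_pr=True, on_cis=False, pr_address="3 C St"),
    LinkedPerson("p3", 9, on_pr=False, on_cis=True, on_school_census=True,
                 cis_address="1 A St", sc_address="2 B St"),
    LinkedPerson("p4", 80, on_pr=True, on_cis=True, linked_to_death=True),
    LinkedPerson("p5", 9, on_pr=True, on_cis=False),  # not on CIS or School Census
]

for p in people:
    if include_in_spd(p):
        print(p.person_id, spd_address(p))
```

Checking the death rule first means a record linked to a death registration is dropped whatever other sources it appears on.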
SPD 5
Using only PR for 0-4 year olds
SPD 9
New developments – the potential of ‘activity’ data
Framework for producing population estimates using linked admin data
Questions
Population Characteristics
Population Characteristics – last time
- Statistics about population and household characteristics
  - Population: ethnicity, educational qualifications, health status etc.
  - Household and housing: household size, accommodation type etc.
  - Single variables and cross-classifications
  - Range of geographic areas
- Integrated system: combine admin data and direct data collection (survey)
  - Survey (census) initially, with more use of admin data and modelling as coverage of each topic improves and methods develop
- Trade-off between detail and frequency: do we need to better understand requirements for small area attribute data?
- Outputs from a 4% survey design: combining across time raises consistency issues (complex output structure)
Consultation & Research Conclusions
- Clear need for multivariate statistics for small populations within small geographic areas
- Admin data will be key to the production of these outputs
- A survey-based approach cannot provide this detail:
  - insufficient power to measure differences within LAs, between subgroups or through time
- Significant further research is needed to explore the potential of admin data in this context:
  - information available
  - methods of application
Admin data research – last time
- High-level assessment of a wide range of data sources (approx. 150) for socio-demographic coverage
  - Considered direct and indirect use
  - Focussed on a shortlist of priority topics
- Coverage of characteristics data, but also challenges for some sources:
  - limited population coverage
  - quality issues (definitional differences, limited response etc.)
- Priority list of topics and sources identified (M12)
- Initial thinking on other applications:
  - covariates for a modelling/integrated system approach
  - modelling a health variable using health records
  - combining census and admin records on educational qualifications
Further research – case studies
- Initial, relatively simple applications
  - Based on assessment of available data sources
  - Significant further research needed to better understand sources and develop methods of application
- Considered options for potential applications, e.g. direct use, model-based application
- Research of an exploratory nature: no conclusions drawn
- Objective: to help inform development of a long-term plan
Case studies – topics & aims
Topic | Aim | Data
Ethnicity; Household Estimates | Comparison of admin sources with 2011 Census responses to compare consistency and provide an initial assessment of coverage/definitional differences | Access to record-level data
Unemployment Estimates | Potential for model-based estimates at lower geographies | Access to aggregate data
Income | Initial assessment of issues in direct use of admin data | No data access
2011 Census and English School Census ethnicity
[Table: cross-tabulation of 2011 Census ethnicity against English School Census ethnicity, giving for each row category the percentage recorded in each category of the other source, with the row total as denominator. Categories: White British; Irish; Irish Traveller/Gypsy/Romany; Indian; Bangladeshi; Pakistani; White and Asian; Other Asian; Chinese; African; White and Black African; Caribbean; White and Black Caribbean; Other White; Other Black; Other Mixed; Other Ethnicity; Missing]
Row totals (denominators): White British 5,048,672; Irish 22,609; Irish Traveller/Gypsy/Romany 9,150; Indian 169,609; Bangladeshi 99,905; Pakistani 252,189; White and Asian 82,152; Other Asian 84,028; Chinese 27,577; African 190,489; White and Black African 38,611; Caribbean 71,256; White and Black Caribbean 108,920; Other White 169,626; Other Black 27,625; Other Mixed 23,763; Other Ethnicity 66,760; Missing 222,193
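Each row of the table is a percentage distribution over the other source's categories, with the row total as denominator. The sketch below shows that calculation from a cross-tabulation of counts. It also extracts the agreement rate, meaning the share of each row recorded in the same category by both sources. The counts are toy values and the group names are placeholders, not figures from the study.

```python
from __future__ import annotations

from collections import defaultdict


def row_percentages(crosstab: dict[tuple[str, str], int]):
    """Convert (row, column) -> count into row percentages plus row totals."""
    totals: dict[str, int] = defaultdict(int)
    for (row, _col), n in crosstab.items():
        totals[row] += n
    pct: dict[str, dict[str, float]] = defaultdict(dict)
    for (row, col), n in crosstab.items():
        # Each cell is divided by its own row total (the 'denominator').
        pct[row][col] = 100 * n / totals[row]
    return dict(pct), dict(totals)


# Toy counts for two placeholder groups; these are not figures from the study.
crosstab = {
    ("Group A", "Group A"): 45,
    ("Group A", "Group B"): 5,
    ("Group B", "Group A"): 10,
    ("Group B", "Group B"): 30,
}

pct, totals = row_percentages(crosstab)
# Agreement: share of each row recorded in the same category by both sources.
agreement = {row: pct[row].get(row, 0.0) for row in totals}
print(pct)
print(totals)
print(agreement)
```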
Comparison of the percentage of households of each size: 2011 Census and Administrative Data Method, selected LAs
[Chart: percentage of households by household size for Birmingham; Boston; Bournemouth; Brent; Cambridge; Camden; Cardiff; Ceredigion; Cheshire East; Chesterfield; Coventry; East Devon; Eastbourne; Forest Heath; Herefordshire, County of; Hillingdon; Kensington and Chelsea; Kingston upon Thames; Lambeth; Leicester; Manchester; Newcastle upon Tyne; Newham; Northumberland; Oxford; Powys; Reading; Richmondshire; Rotherham; Stratford-on-Avon; Tonbridge and Malling; Waltham Forest; Warwick; Waveney; Westminster; Wirral]
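The sketch below outlines the household-size comparison under stated assumptions. Household sizes are available as lists of integers. Grouping admin person records by a shared, hypothetical household ID is only an assumption about how admin households might be formed. Sizes of 6 or more are grouped together, also by assumption.

```python
from __future__ import annotations

from collections import Counter


def sizes_from_people(household_ids: list[str]) -> list[int]:
    """Household sizes from person records sharing a (hypothetical) household ID."""
    return list(Counter(household_ids).values())


def size_distribution(sizes: list[int], top: int = 6) -> dict[str, float]:
    """Percentage of households of each size; sizes >= top are grouped as 'top+'."""
    labels = [str(k) for k in range(1, top)] + [f"{top}+"]
    counts = Counter(str(s) if s < top else f"{top}+" for s in sizes)
    return {label: 100 * counts[label] / len(sizes) for label in labels}


# Hypothetical household sizes for one LA from each source.
census_dist = size_distribution([1, 1, 2, 2, 2, 3, 4, 7])
admin_dist = size_distribution([1, 2, 2, 2, 3, 3, 4, 6])

# Difference in percentage points (admin minus Census) for each size group.
diff = {size: admin_dist[size] - census_dist[size] for size in census_dist}
print(diff)
```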
Unemployment estimates
Overview
- Methodology focus: model-based estimation
- Extending the current LA-level approach:
  - no census covariates
  - uses DWP Jobseeker's Allowance data (aggregate)
  - predicting unemployment as a proportion of the population within each MSOA
Findings
- Did not perform well compared with other small area estimation (SAE) models
- Confidence intervals showed a high level of uncertainty, limiting the ability to distinguish differences between MSOAs
- Assessment of coefficients of variation (CVs) provided a comparison against quality standards:
  - Standard: an attribute for 3% of the population estimated with a CV of 20% or less
  - Model: CVs consistently high (above 20.1%, median 30%)
- Highlights the need to think about methodological issues:
  - models applied for one topic or geographic area don't necessarily work elsewhere
  - access to record-level data will help us understand sources and allow scope for other applications
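The quality-standard check amounts to computing each MSOA's CV as 100 * standard error / estimate and comparing it with the 20% threshold. The sketch uses hypothetical MSOA estimates (unemployment as a percentage of population, with made-up standard errors). It checks only the CV threshold, not the 3%-of-population part of the standard. The 95% intervals use a normal approximation, and overlapping intervals are only a rough signal that two MSOAs cannot be distinguished, not a formal test.

```python
import statistics

CV_STANDARD = 20.0  # maximum acceptable CV, in per cent


def cv_percent(estimate: float, standard_error: float) -> float:
    """Coefficient of variation as a percentage of the estimate."""
    return 100 * standard_error / estimate


def ci95(estimate: float, standard_error: float) -> tuple:
    """Approximate 95% confidence interval, assuming normality."""
    return (estimate - 1.96 * standard_error, estimate + 1.96 * standard_error)


def overlap(a: tuple, b: tuple) -> bool:
    """True if two intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical MSOA -> (estimated % unemployed, standard error).
msoa = {
    "MSOA_A": (5.0, 1.5),
    "MSOA_B": (10.0, 1.5),
    "MSOA_C": (8.0, 2.0),
    "MSOA_D": (6.0, 3.0),
    "MSOA_E": (10.0, 2.0),
}

cvs = {m: cv_percent(e, se) for m, (e, se) in msoa.items()}
share_meeting = 100 * sum(cv <= CV_STANDARD for cv in cvs.values()) / len(cvs)
median_cv = statistics.median(cvs.values())
print(f"{share_meeting:.0f}% of MSOAs meet the CV standard; median CV {median_cv:.1f}%")
print("A and C distinguishable?", not overlap(ci95(*msoa["MSOA_A"]), ci95(*msoa["MSOA_C"])))
```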
Income case study
- Income: key topic of interest to users
  - Not included (response concerns)
- Potential sources:
  - admin data: definitional issues
  - surveys: limited geographic level
- Available sources:
  - range of administrative sources: PAYE, benefits, pensions etc.
  - survey and model-based estimates: ONS, DWP (FRS), HMRC
- Opportunities: collaborative working with other departments to
  - better understand sources
  - review statistical methods and applications, e.g. the Survey of Personal Incomes
- Solution: combining admin data and survey sources?
Admin data case studies – findings
- Potential for using admin and survey data in combination to produce statistics about population characteristics
- High level of agreement for children between ethnicity on the School Census and the Census (but agreement differed by ethnic group)
- Distribution of household size and composition on admin sources similar to the Census
- However, there are differences due to definitions, collection processes, classifications and time lags, and we need to think about methods
- We will carry on the research: more on this tomorrow
Questions