1
Identifying the cash-rich and the cash-poor: Lessons from the Census Rehearsal
Dr Paul Williamson, Department of Geography
ESRC Census Development Programme
2
Most requested addition to the 2001 Census: INCOME…
3
The 2001 Census Geography of income:
4
Other sources of data on income
– Benefits data
– Government surveys (e.g. GHS, LFS, FES, FRS, NES)
– Commercially-held data [postcode sector and postcode unit estimates]
– The Census Rehearsal (1999)
5
Objectives
Evaluation of:
– extant methods for small-area income estimation
– new approaches
– utility of non-census information (e.g. council tax; house price; benefits data)
[Methods of imputing income band means]
6
Definition of ‘income’
– Income vs wealth
– Gross or net income?
– Pre- or post-housing costs?
– Adult or household?
– Household?
  – total
  – equivalised [per capita / OECD / McClements]
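To make the per capita and equivalised options concrete, here is a minimal Python sketch. It assumes the modified OECD weights (1.0 for the first person aged 14+, 0.5 for each further person aged 14+, 0.3 for each child under 14); the slide does not say which OECD variant was meant, nor how the McClements scale was applied, and the function names are hypothetical.

```python
def modified_oecd_scale(ages):
    """Equivalence scale for one household, given the age of each member.

    Assumed weights (modified OECD): 1.0 for the first person aged 14+,
    0.5 for each further person aged 14+, 0.3 for each child under 14.
    """
    adults = sum(1 for age in ages if age >= 14)
    children = len(ages) - adults
    if adults == 0:
        raise ValueError("household needs at least one member aged 14+")
    return 1.0 + 0.5 * (adults - 1) + 0.3 * children


def per_capita_income(total_income, ages):
    """Simplest equivalisation: divide by the number of household members."""
    return total_income / len(ages)


def equivalised_income(total_income, ages):
    """Total household income divided by the (assumed) modified OECD scale."""
    return total_income / modified_oecd_scale(ages)


# Hypothetical household: two adults, two children, £30,000 total income
household_ages = [38, 36, 8, 5]
print(per_capita_income(30000, household_ages))          # 7500.0
print(round(equivalised_income(30000, household_ages)))  # 14286
```

The two measures diverge because equivalence scales assume economies of scale within a household, whereas per capita division treats every member as an extra full adult.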
7
Surrogates
Univariate
– % unemployed
– % 2+ car households
– % residents in Social Classes I + II
– % owner-occupation
8
Multivariate (deprivation indices)
– Carstairs
– Townsend
– Breadline
– DTLR Index of Multiple Deprivation 2000
– Green (Wealth) [owning 2+ cars; NS-SEC I or II; high qualifications]
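As an illustration of how a multivariate index such as Townsend combines its inputs, the sketch below sums the z-scores of the four percentages listed under Definitions (I). It assumes the commonly used construction in which unemployment and overcrowding are log-transformed as ln(x + 1) before standardising, and it uses the population standard deviation; the presentation does not state which variant was used, and the area values are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Standardise a list of area values to mean 0, (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def townsend_scores(unemployed, overcrowded, no_car, not_owner_occupied):
    """Sum of z-scores; each argument is a list of percentages, one per area."""
    # Assumed: unemployment and overcrowding are log-transformed to reduce skew
    unemp = z_scores([math.log(x + 1) for x in unemployed])
    crowd = z_scores([math.log(x + 1) for x in overcrowded])
    car = z_scores(no_car)
    tenure = z_scores(not_owner_occupied)
    return [sum(parts) for parts in zip(unemp, crowd, car, tenure)]


# Hypothetical percentages for three areas (higher score = more deprived)
scores = townsend_scores(
    unemployed=[4.0, 9.5, 15.2],
    overcrowded=[1.2, 3.0, 6.8],
    no_car=[15.0, 35.0, 55.0],
    not_owner_occupied=[20.0, 45.0, 70.0],
)
print([round(s, 2) for s in scores])
```

Because every component is standardised, the index is purely relative: it ranks areas against each other rather than measuring income directly.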
9
Geodemographic
– SuperProfiles
– MOSAIC
– GB Profiles
10
Model
Individual income
– Dale (SOC2000; economic activity; age; sex; region)
– Lee (SOC2000; economic activity)
– Regression (individual and/or ecological)
Household income
– Regression (household and/or ecological)
– Bramley & Smart (household composition; earners; tenure; area-level deprivation)
11
The 1999 Census Rehearsal
Key features
– full census questionnaire + INCOME
– large achieved sample
  – c. 65,000 households
  – c. 140,000 individuals
– spatially contiguous
12
Clustered sampling strategy:
– 7 part districts [excluding NI]
– 38 wards
– 650 EDs
13
Potential problems
– Non-response rate
  – overall (~50%)
  – income (~15%)
  – other variables (5-20%)
  – full responses for ~55% of achieved sample [individuals and households]
– Non-response bias
17
Banding of income question
What is your total current gross income from all sources?
Per week | or per year (approximately)
Nil | Nil
Less than £60 | Less than £3,000
£60 to £119 | £3,000 to £5,999
£120 to £199 | £6,000 to £9,999
£200 to £299 | £10,000 to £14,999
£300 to £479 | £15,000 to £24,999
£480 or more | £25,000 or more
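The banded question can be coded mechanically. This Python sketch maps a weekly or annual amount to a band code using the limits in the table; the codes 0-6 and the function name are hypothetical labels, not the Rehearsal's own coding.

```python
from bisect import bisect_right

# Lower limits of bands 1-6 on each scale; code 0 is reserved for "Nil"
ANNUAL_LOWER = [0, 3000, 6000, 10000, 15000, 25000]
WEEKLY_LOWER = [0, 60, 120, 200, 300, 480]


def income_band(amount, per_week=False):
    """Return a band code: 0 = Nil, 1-5 = closed bands, 6 = open top band."""
    if amount < 0:
        raise ValueError("income cannot be negative")
    if amount == 0:
        return 0
    limits = WEEKLY_LOWER if per_week else ANNUAL_LOWER
    # bisect_right counts how many lower limits are <= amount
    return bisect_right(limits, amount)


print(income_band(6100))                      # 3  (£6,000 to £9,999 a year)
print(income_band(6100 / 52, per_week=True))  # 2  (£60 to £119 a week)
```

The weekly limits multiplied by 52 (£3,120; £6,240; £10,400; £15,600; £24,960) are close to, but not identical with, the annual limits, so a borderline respondent can fall into different bands depending on which scale they answer on, as the £6,100 example shows.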
18
– Only 10% of adults in top band
– but problem compounded when individual incomes are aggregated to estimate household income
– band mid-point → band mean
– are band mean values area-sensitive?
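Why aggregation compounds the top-band problem can be shown in a few lines. The sketch below sums a representative value per adult to get a household total, using approximate mid-points for the closed annual bands; the open top band (code 6) has no mid-point, so its value is an assumption passed in by the caller. All names and the two candidate top-band values are hypothetical.

```python
# Approximate mid-points of the closed annual bands (codes 0-5)
CLOSED_MIDPOINTS = {0: 0, 1: 1500, 2: 4500, 3: 8000, 4: 12500, 5: 20000}


def household_income(member_bands, top_band_value):
    """Aggregate banded individual incomes to a household total.

    Closed bands use their mid-points; the open top band (code 6) has no
    mid-point, so a value must be assumed or modelled via top_band_value.
    """
    total = 0
    for band in member_bands:
        total += top_band_value if band == 6 else CLOSED_MIDPOINTS[band]
    return total


def top_band_sensitivity(member_bands, low, high):
    """Spread of household totals implied by two candidate top-band values."""
    return household_income(member_bands, high) - household_income(member_bands, low)


one_top = [6, 3]      # one top-band adult, one in £6,000 to £9,999
two_top = [6, 6, 3]   # two top-band adults
print(top_band_sensitivity(one_top, 30000, 45000))  # 15000
print(top_band_sensitivity(two_top, 30000, 45000))  # 30000
```

The uncertainty about the top-band value is multiplied by the number of top-band earners in the household, which is why the choice of band mean matters more for household than for individual income.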
19
Source: FRS 1998/9 (Crown Copyright)
20
Digression: modelling income band means
Alternative modelling strategies include:
– national mean
– sub-group mean (e.g. by council tax band)
– statistical distributions (log-normal; Pareto)
– new variant of log-normal approach with addition of modelled median etc.
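One standard way to use a Pareto distribution for the open top band is Pareto interpolation from the two highest bands. The sketch below assumes the tail above £15,000 is Pareto, estimates the shape parameter from the band counts, and returns the implied mean above £25,000. It illustrates the general technique only; the presentation does not give its exact procedure, and the counts are hypothetical.

```python
import math


def pareto_alpha(n_below_top, n_top, lower_prev, lower_top):
    """Estimate the Pareto shape parameter from counts in the two highest bands.

    Under a Pareto tail, P(X >= x) is proportional to x ** -alpha, so
    (n_below_top + n_top) / n_top = (lower_top / lower_prev) ** alpha.
    """
    ratio = (n_below_top + n_top) / n_top
    return math.log(ratio) / math.log(lower_top / lower_prev)


def pareto_top_band_mean(alpha, lower_top):
    """Mean of a Pareto distribution above lower_top (finite only if alpha > 1)."""
    if alpha <= 1:
        raise ValueError("mean is infinite for alpha <= 1")
    return alpha * lower_top / (alpha - 1)


# Hypothetical counts: 200 adults in £15,000 to £24,999, 100 in £25,000 or more
alpha = pareto_alpha(200, 100, 15000, 25000)
top_mean = pareto_top_band_mean(alpha, 25000)
print(round(alpha, 2), round(top_mean))
```

A small alpha (a heavy tail) pushes the estimated band mean up sharply, which is one reason Pareto estimates of open-band means can be unstable for small areas.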
21
Results
– For all bands, sub-group mean best (if possible)
– For closed bands, national mean is next best
– For open (top) band, new proposed log-normal approach is best, particularly where there is evidence of strong spatial clustering
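The slides do not spell out the "new variant" of the log-normal approach, so the following is only a hedged sketch of how a modelled median might enter. It assumes the median comes from some model, sets mu = ln(median), chooses sigma so that the log-normal reproduces the observed share above the top-band limit, and then returns the conditional mean above that limit. The inputs (a £12,000 median, a 10% top-band share) and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def lognormal_sigma(median, lower_top, share_top):
    """Sigma such that P(X > lower_top) = share_top, with mu = ln(median)."""
    if not (median < lower_top and 0 < share_top < 0.5):
        raise ValueError("need median < lower_top and 0 < share_top < 0.5")
    z = STD_NORMAL.inv_cdf(1 - share_top)
    return (math.log(lower_top) - math.log(median)) / z


def lognormal_top_band_mean(median, sigma, lower_top):
    """E[X | X > lower_top] for a log-normal with mu = ln(median)."""
    mu = math.log(median)
    z = (math.log(lower_top) - mu) / sigma
    # E[X; X > L] = exp(mu + sigma^2 / 2) * Phi(sigma - z); P(X > L) = Phi(-z)
    return math.exp(mu + sigma ** 2 / 2) * STD_NORMAL.cdf(sigma - z) / STD_NORMAL.cdf(-z)


sigma = lognormal_sigma(median=12000, lower_top=25000, share_top=0.10)
mean_top = lognormal_top_band_mean(12000, sigma, 25000)
print(round(sigma, 3), round(mean_top))
```

Letting the median vary by area, while the top-band share is taken from the data, is one way such an approach could respond to the strong spatial clustering mentioned above.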
22
Spatial scale
– At what scale does income vary most?
MAUP
– 1991 vs 1998/9 boundaries
– zones with fewer than 10 households or 25 residents excluded from analysis
SOC 2000 / NS-SEC
– lack of alternative SOC2000-coded data
– therefore have to use Census Rehearsal data
– use partitioned data to avoid unduly advantaging SOC2000-based approaches
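Partitioning guards against a method "seeing" the very incomes it is judged on. This minimal sketch assumes a random 50/50 split: sub-group means are estimated on one half and applied to the other. The split rule, the group labels and the incomes are hypothetical; the presentation does not describe how its partition was drawn.

```python
import random
from collections import defaultdict


def split_in_two(records, seed=0):
    """Randomly partition records into two halves (an assumed 50/50 split)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def subgroup_means(records):
    """Mean income per sub-group; records are (group, income) pairs."""
    totals, counts = defaultdict(float), defaultdict(int)
    for group, income in records:
        totals[group] += income
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def impute_held_out(training, held_out, fallback):
    """Impute held-out incomes from means estimated on the training half only."""
    means = subgroup_means(training)
    return [means.get(group, fallback) for group, _ in held_out]


# Hypothetical (occupation group, annual income) records
records = [("occ_A", 41000.0), ("occ_A", 39000.0), ("occ_B", 9000.0),
           ("occ_B", 11000.0), ("occ_C", 26000.0), ("occ_C", 24000.0)]
training, held_out = split_in_two(records, seed=42)
overall_mean = sum(income for _, income in training) / len(training)
imputed = impute_held_out(training, held_out, fallback=overall_mean)
print(list(zip(held_out, imputed)))
```

Groups that appear only in the held-out half fall back to the overall training mean, which is the price paid for an honest out-of-sample test.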
23
Results
24
Census Rehearsal Income Distribution
25
At ward level the % of household reps. in the top income band averaged 9.1%, but ranged from 2.8% to 21.6%
89% of EDs contained one or more household reps. in the top income band
– i.e. in the top income decile of the population
Heterogeneity rules OK!
27
Missing data
Missing data have minimal impact on results
– from ‘Raw’ to ‘Ideal’ data, most correlations change by <0.02
– very few values change by >0.05
– exception is NS-SEC 8 [by definition!]
– correlations lower for ‘Ideal’ than ‘Raw’
Surrogates calculated directly from Rehearsal data
– circumvents response bias in the data?
28
Scale
– Higher correlations at higher geographies
– District effect small but significant
  – BUT none of the districts in SE England
Overfitting
– No significant impact
29
MAUP
– Correlations vary by up to 0.1 between alternative boundaries at the same spatial scale
– BUT no detectable effect on rankings of surrogate income measures
30
Adult income (r²)
31
Caveats
‘Best’ performing surrogates in danger of over-fitting?
– For Dale, Lee and Voas, mean occupational income calculated directly from Census Rehearsal dataset (no other SOC2000 sources available at time of analysis)
BUT
– no significant difference if SOC minor or unit codes used
– no significant difference if data partitioned
32
Household income (r²)
33
Accuracy
For many purposes relative, rather than absolute, accuracy is most important → ranking
38
Other data sources
– < 1% of unexplained spatial variation in income attributable to area-level effects
– House price has no significant impact
  – could be due to data problems
– Council tax band has a small but significant effect [for areas of enumeration district size and below]
– Lack of utility counter-intuitive?
  – current value ≠ purchase price
  – income at purchase ≠ current income
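A rough way to ask how much unexplained variation is attributable to areas is to group model residuals by area and compute the between-area share of their variance (a one-way ANOVA eta-squared). This is a swapped-in, simplified technique: the presentation does not say how its <1% figure was derived (a multilevel model is a common alternative), and the residuals below are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def between_area_share(residuals):
    """Share of residual variance lying between areas (one-way ANOVA eta-squared).

    residuals: list of (area_id, residual) pairs, one per person or household.
    """
    by_area = defaultdict(list)
    for area, value in residuals:
        by_area[area].append(value)
    grand_mean = fmean(value for _, value in residuals)
    total_ss = sum((value - grand_mean) ** 2 for _, value in residuals)
    between_ss = sum(len(vals) * (fmean(vals) - grand_mean) ** 2
                     for vals in by_area.values())
    return between_ss / total_ss


# Hypothetical model residuals (£) for households in three EDs
residuals = [("ED1", 1200.0), ("ED1", -900.0), ("ED1", -250.0),
             ("ED2", 800.0), ("ED2", -1100.0), ("ED2", 350.0),
             ("ED3", -50.0), ("ED3", 400.0), ("ED3", -450.0)]
print(round(between_area_share(residuals), 4))
```

With few households per area this simple ratio tends to overstate the area share, since chance differences between area means are counted as area effects.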
39
Conclusions (I)
– Best approaches capture 80-90% of spatial variation in income, even for smallest spatial units
– But considerable within-area heterogeneity
– Best approaches are regression or sub-group mean based
– Conventional deprivation indices a poor second to % social class / NS-SEC I+II
40
Conclusions (II)
– Geodemographic classifications at best perform as well as % NS-SEC I+II, and perform best for areas of ward size and above
– Qualified support for use of statistical distributions in modelling top income band means
41
Implications
Moral for marketers:
– target people, not places
Moral for policy makers:
– deprivation indices not the best proxy for income
– ONS ward income estimates (based on ecological regression) likely to perform well
42
Longer term
– Consider external correlates (e.g. IMD 2000; benefits data)
– Lobby for Census Office to create small-area income estimates
  – by imputing income on Census microdata
  – include non-census information (?)
43
Acknowledgements
House price data were taken from the Experian Limited Postal Sector Data, ESRC/JISC Agreement. Grateful thanks are due to the Census Custodians of England, Wales and Scotland for granting permission to access the Census Rehearsal dataset. A debt of gratitude is also owed to a number of people at the Office for National Statistics, in particular Keith Whitfield and Philip Clarke. Finally, thanks are due to David Voas for undertaking some of the preparatory work for this project. All analyses and conclusions remain my sole responsibility.
44
Definitions (I)
NS-SEC I+II: % persons aged 16-74 in NS-SEC I or II
Townsend: multiple deprivation indicator based on % economically active unemployed; % overcrowded households; % households with no car; and % households not owner-occupied
Green (Wealth): affluence indicator based on % households with 2+ cars; % persons aged 16-74 in NS-SEC I; and % adults with high educational qualifications
PCA_96: geodemographic classification based on principal components analysis of 20 normalised census variables; individuals in each of 96 area types are assumed to have the mean income of all persons in that area type
Voas: alternative geodemographic classification, in which five census variables are each divided into above or below the median and one variable into thirds, all cross-tabulated to give a total of 96 (2⁵ × 3) discrete area types
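The Voas classification can be sketched as a bit pattern: five median splits give 2⁵ = 32 combinations, and a tercile split multiplies that by 3 to give 96 types. Each area then receives the population-weighted mean income of its type, mirroring the wording used for PCA_96 (assuming Voas types were applied the same way). The six variables are not named on the slide, so random placeholder data are used; treating values equal to the median as "below" is also an assumption.

```python
import random
from collections import defaultdict
from statistics import median, quantiles


def voas_type_codes(binary_vars, tercile_var):
    """Assign each area one of 96 types (2**5 median splits x 3 terciles).

    binary_vars: five lists, each holding one census variable for every area.
    tercile_var: one list holding the sixth variable for every area.
    """
    if len(binary_vars) != 5:
        raise ValueError("expected exactly five median-split variables")
    medians = [median(values) for values in binary_vars]
    low_cut, high_cut = quantiles(tercile_var, n=3)
    codes = []
    for i, value in enumerate(tercile_var):
        bits = 0
        for k, (values, med) in enumerate(zip(binary_vars, medians)):
            if values[i] > med:  # values equal to the median count as "below"
                bits |= 1 << k
        third = 0 if value <= low_cut else (1 if value <= high_cut else 2)
        codes.append(bits * 3 + third)  # 0..95
    return codes


def type_mean_income(codes, total_income, population):
    """Mean income per person within each area type (population-weighted)."""
    income_sum, people = defaultdict(float), defaultdict(int)
    for code, income, pop in zip(codes, total_income, population):
        income_sum[code] += income
        people[code] += pop
    return {code: income_sum[code] / people[code] for code in income_sum}


# Hypothetical data for 40 areas
rng = random.Random(1)
n_areas = 40
binary_vars = [[rng.random() for _ in range(n_areas)] for _ in range(5)]
tercile_var = [rng.random() for _ in range(n_areas)]
codes = voas_type_codes(binary_vars, tercile_var)
population = [rng.randint(150, 400) for _ in range(n_areas)]
total_income = [pop * rng.uniform(8000, 20000) for pop in population]
type_means = type_mean_income(codes, total_income, population)
# Every resident of an area is assigned the mean income of that area's type
area_estimates = [type_means[code] for code in codes]
```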
45
Definitions (II)
Dale: income imputed given mean income for population sub-group defined by:
– sex
– SOC 2000 minor group
– economic activity (missing; employed full-time; employed part-time; self-employed; other)
– age (missing; 0-15; 16-19; 20-29; 30-49; 50+)
[Maximum of 4860 valid sub-groups]
Lee: income imputed given mean income for population sub-group defined by:
– SOC 2000 minor group
– economic activity (child; not applicable; employed full-time; employed part-time; self-employed; unemployed; retired; other inactive)
[Maximum of 649 valid sub-groups]
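Sub-group mean imputation of the Dale type reduces to a lookup keyed on the defining variables. This minimal sketch builds the key from sex, SOC 2000 minor group, economic activity and the age groups listed above, fits mean incomes, and imputes them, falling back to a supplied value for unseen sub-groups. Field names, codes and incomes are hypothetical placeholders.

```python
from collections import defaultdict


def age_group(age):
    """Age groups used in the Dale definition; None means age is missing."""
    if age is None:
        return "missing"
    if age <= 15:
        return "0-15"
    if age <= 19:
        return "16-19"
    if age <= 29:
        return "20-29"
    if age <= 49:
        return "30-49"
    return "50+"


def dale_key(person):
    """Sub-group key: sex x SOC 2000 minor group x economic activity x age group."""
    return (person["sex"], person["soc_minor"], person["activity"], age_group(person["age"]))


def fit_subgroup_means(people, incomes):
    """Mean observed income for each sub-group key."""
    totals, counts = defaultdict(float), defaultdict(int)
    for person, income in zip(people, incomes):
        key = dale_key(person)
        totals[key] += income
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def impute(people, means, fallback):
    """Give each person the mean income of their sub-group (fallback if unseen)."""
    return [means.get(dale_key(p), fallback) for p in people]


# Hypothetical respondents (field names and SOC minor codes are placeholders)
people = [
    {"sex": "F", "soc_minor": "241", "activity": "employed full-time", "age": 34},
    {"sex": "F", "soc_minor": "241", "activity": "employed full-time", "age": 41},
    {"sex": "M", "soc_minor": "923", "activity": "employed part-time", "age": 58},
]
incomes = [38000.0, 42000.0, 7000.0]
means = fit_subgroup_means(people, incomes)
newcomer = {"sex": "F", "soc_minor": "241", "activity": "employed full-time", "age": 45}
print(impute([newcomer], means, fallback=15000.0))  # [40000.0]
```

With 2 sexes × 81 SOC 2000 minor groups × 5 economic activity categories × 6 age groups, the key space has 4,860 cells, matching the maximum quoted for Dale.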
46
Definitions (III)
Voas (individual): regression model for adult income (children assumed to have 0 income); INCOME^0.5 predicted given: mean income by SOC2000 unit; mean income by Industry category; age, age², residents, residents², rooms and cars; plus dummy variables for sex, white, full-time student, married, Single/Widowed/Divorced, Long-term ill, No qualifications, GCSE or equivalent, A levels or equivalent, Undergraduate degree or equivalent, employed full-time, employed part-time, self-employed, unemployed, retired, permanently sick, other economically inactive excluding pensioners and students, Semi-detached, terrace, flat, caravan, privately rented, social rented, employed manager or supervisor, and district of residence
Voas (household): regression model for total household income; HHINC^0.5 predicted given the same set of predictors as for Voas (individual), but based only upon the head of household’s characteristics
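The square-root transform can be sketched with ordinary least squares in NumPy: regress INCOME^0.5 on the predictors, then square the fitted values to return to pounds. Only two of the listed predictors (age and age²) are used, on synthetic data constructed so the coefficients are known; this illustrates the transform, not the full Voas specification, and the function names are hypothetical.

```python
import numpy as np


def add_intercept(X):
    """Prepend a column of ones to the predictor matrix."""
    return np.column_stack([np.ones(len(X)), X])


def fit_sqrt_model(X, income):
    """OLS regression of income ** 0.5 on the columns of X (plus an intercept)."""
    coef, *_ = np.linalg.lstsq(add_intercept(X), np.sqrt(income), rcond=None)
    return coef


def predict_income(coef, X):
    """Square the fitted values to return to the income scale (negatives set to 0)."""
    root = add_intercept(X) @ coef
    return np.clip(root, 0.0, None) ** 2


# Synthetic example with age and age squared as the only predictors
age = np.arange(20.0, 65.0)
X = np.column_stack([age, age ** 2])
true_root = 40.0 + 3.0 * age - 0.03 * age ** 2
income = true_root ** 2
coef = fit_sqrt_model(X, income)
print(np.round(coef, 4))  # close to [40, 3, -0.03]
```

With real, noisy data, squaring the fitted square root tends to understate mean income, because E[Y²] = E[Y]² + Var(Y); some retransformation adjustment may be needed when area totals matter.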