Jean-Luc LIPATZ INSEE - France 2007/10
Using gridded census data to analyze the socio-spatial structure of French cities
Short history of grids at INSEE
Page 2 EFGS meeting JL.LIPATZ 2009/10
1) The use of gridded data at INSEE
2) The production of gridded data in the complex environment of the French census
Starting point
Sub-city districts for public action
› A question from the DIV (« Délégation Interministérielle à la Ville »), the ministry responsible for urban social policies (2005)
–Context: the 2005 urban riots. Are public actions ineffective, or are their target areas badly chosen?
‐ Design new ones
‐ Check the relevance of existing ones
–Question: how to check the relevance of deprived districts designed by local authorities?
› Cannot use existing zones
–Existing districts: outdated, partial
–Existing output areas for statistical products: too large, too much internal heterogeneity
–No data source was completely usable at point level.
–=> Use more detailed data, but how to transform a set of points into a zone boundary?
The tool – an example, what the data says…
Poitiers - Health insurance register
Blue: existing deprived districts
Red: areas of high probability of low income
Grey shade: population density
200 m × 200 m grid cells
[Table of detected clusters; columns: cluster, area, total count, sub-population share, x, y]
… and an actual result
Blue: new deprived districts as defined by local government
The tool – how it works
Grids everywhere!
› Probability density estimates using the kernel method
–-> gridded data instead of individual data
‐ Part of the data cannot be fully localized (down to the address)
‐ Faster processing without quality loss
‐ Weaker confidentiality issues, allowing use in INSEE's regional offices
–Estimate 1: whole population in the data source
–Estimate 2: « deprived » population relative to this data source
› Ratio of probabilities to compute relative risk
–-> grid cells as the support of the estimated functions
› Cartography of high estimated risk
› Zones are a selection of contiguous grid cells using an automatic rule
–A signal, not a design
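The two density estimates and their ratio can be sketched as follows. This is a minimal illustration, not the INSEE tool: the Gaussian kernel, the bandwidth of 1.5 cells, and the names `kernel_smooth` and `relative_risk` are all assumptions made for the example.

```python
import math


def kernel_smooth(counts, bandwidth_cells):
    """Spread each cell's count over its neighbours with a Gaussian kernel."""
    n_rows, n_cols = len(counts), len(counts[0])
    # Ignore cells further than 3 bandwidths away along either axis.
    reach = math.ceil(3 * bandwidth_cells)
    smoothed = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if counts[r][c] == 0:
                continue
            for dr in range(-reach, reach + 1):
                for dc in range(-reach, reach + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        weight = math.exp(
                            -(dr * dr + dc * dc) / (2 * bandwidth_cells ** 2)
                        )
                        smoothed[rr][cc] += counts[r][c] * weight
    return smoothed


def relative_risk(sub_counts, all_counts, bandwidth_cells=1.5):
    """Ratio of the two normalised density surfaces, cell by cell.

    Values above 1 mark cells where the sub-population is over-represented.
    """
    sub = kernel_smooth(sub_counts, bandwidth_cells)
    whole = kernel_smooth(all_counts, bandwidth_cells)
    sub_total = sum(map(sum, sub))
    whole_total = sum(map(sum, whole))
    if sub_total == 0 or whole_total == 0:
        raise ValueError("both populations must be non-empty")
    return [
        [
            (s / sub_total) / (w / whole_total) if w > 0 else 0.0
            for s, w in zip(sub_row, whole_row)
        ]
        for sub_row, whole_row in zip(sub, whole)
    ]
```

Working on cell counts rather than individual records is what keeps this cheap: the cost depends on the number of occupied cells, not on the number of people.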
From data to final map
Inputs: whole population, sub-population
1) Simplify the maps
2) Combine the maps
3) Extract the outlines
4) Superpose the maps
Stages: raw data, probability estimates, relative risk estimate
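The outline-extraction step can be illustrated by grouping contiguous high-risk cells into zones. The sketch below assumes a simple rule (cells at or above a threshold, joined through shared edges); the slides do not give the actual rule, and `extract_zones` is a hypothetical name.

```python
from collections import deque


def extract_zones(risk, threshold):
    """Group cells with risk >= threshold into edge-connected zones.

    Returns a list of zones, each a sorted list of (row, col) cells.
    """
    n_rows, n_cols = len(risk), len(risk[0])
    seen = set()
    zones = []
    for r in range(n_rows):
        for c in range(n_cols):
            if risk[r][c] < threshold or (r, c) in seen:
                continue
            # Breadth-first flood fill from this unvisited high-risk cell.
            zone = []
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                zone.append((cr, cc))
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    inside = 0 <= nr < n_rows and 0 <= nc < n_cols
                    if inside and (nr, nc) not in seen and risk[nr][nc] >= threshold:
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            zones.append(sorted(zone))
    return zones
```

The boundary of each zone is then the set of cell edges separating a zone cell from a non-zone cell, which can be superposed on the base map.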
And the census?
› The tool is now used (within INSEE) to describe other phenomena, with every available source
› Using the census
–Small LAU2s (out of reach for the tool: no detail at small geographical levels, but mainly not urban)
‐ Exhaustive
‐ Data collection over 5 years (each LAU2 in one year)
–Large LAU2s (city cores)
‐ 40 % sample
‐ Address register maintained for sampling purposes, used as a reference when localizing administrative registers
‐ Data collection over 5 years
Idea
› Compare census data and administrative data at locations where both are available, to estimate jointly:
–The administrative bias
–The time shift
Filling the gaps of census collection
Collected census data
Data from an administrative register
An address from the census sampling register
Can we deduce this from that?
GWR
› A regression, but not a global one
–A standard regression gives spatially correlated residuals: the estimated spatial distribution will be biased
–Regression models with autocorrelated residuals do not seem easy to apply (different variograms for different parts of the city)
=> Local regressions (Geographically Weighted Regression, cf. Fotheringham)
GWR
Space as an explanatory factor
Estimates
Local subsets for regressions
Weights decreasing with distance
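A local regression of this kind can be sketched with a single explanatory variable: each observation is weighted by its proximity to the target location, and an ordinary weighted least-squares line is fitted. The Gaussian weights, the single regressor, and the name `gwr_local_fit` are assumptions for illustration, not the model actually used.

```python
import math


def gwr_local_fit(target, sites, x, y, bandwidth):
    """Fit y = a + b*x around `target`, weighting observations by proximity.

    target: (x, y) location where the local coefficients are wanted.
    sites:  list of (x, y) locations of the observations.
    x, y:   explanatory and response values observed at each site.
    """
    weights = [
        math.exp(-((sx - target[0]) ** 2 + (sy - target[1]) ** 2) / (2 * bandwidth ** 2))
        for sx, sy in sites
    ]
    w_sum = sum(weights)
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    # Weighted covariance and variance give the local slope.
    s_xy = sum(w * (xi - x_mean) * (yi - y_mean) for w, xi, yi in zip(weights, x, y))
    s_xx = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope
```

Calling the function at two distant targets returns two different coefficient pairs whenever the relationship between `x` and `y` changes across space, which is the sense in which space acts as an explanatory factor.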
Data
Grids coming back!
› Two kinds of data
–Census data + explanatory data (administrative data and dwellings from the address register)
–Explanatory data only
–Administrative data not connected to the address register (20 %) is ignored, but the corresponding addresses are used with zeroed administrative data
› …added up to avoid singularity problems in matrices during estimation
–-> grids
› Multiplication of cells by intersecting with:
–Housing type
–Administrative districts
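The aggregation of address-level records into cells split by housing type might look like the sketch below. The record fields (`x`, `y`, `housing_type`, `dwellings`, `admin_count`), the 200 m cell side, and the function name are hypothetical; the intersection with administrative districts is left out for brevity.

```python
from collections import defaultdict

CELL_SIZE = 200.0  # metres; assumed cell side for this example


def aggregate_to_cells(records, cell_size=CELL_SIZE):
    """Sum address records into (cell_x, cell_y, housing_type) cells."""
    cells = defaultdict(lambda: {"dwellings": 0, "admin_count": 0})
    for rec in records:
        key = (
            int(rec["x"] // cell_size),
            int(rec["y"] // cell_size),
            rec["housing_type"],
        )
        cells[key]["dwellings"] += rec["dwellings"]
        # Addresses without matched administrative data contribute zero.
        cells[key]["admin_count"] += rec.get("admin_count") or 0
    return dict(cells)
```

Splitting a cell by housing type multiplies the number of observations without mixing houses and apartments in the same unit.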
Internals
› Weights
–The actual weighting function doesn't really matter
–Classical
–Added penalty (doubled distance) when cells have different building types (houses vs. apartments)
› Radius
–Derived from a fixed number of neighbours
–The actual number of neighbours minimizes the corrected Akaike Information Criterion (AICc)
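The distance penalty and the neighbour-based radius can be sketched together. The bisquare kernel is an assumption (the slide only says the weighting function is classical), as are the field names and the function names; choosing `n_neighbours` by minimizing AICc is not shown.

```python
import math


def effective_distance(cell_a, cell_b):
    """Euclidean distance, doubled when the building types differ."""
    d = math.hypot(cell_a["x"] - cell_b["x"], cell_a["y"] - cell_b["y"])
    if cell_a["building_type"] != cell_b["building_type"]:
        d *= 2.0
    return d


def adaptive_weights(target, cells, n_neighbours):
    """Bisquare weights with a radius set by the n-th nearest neighbour.

    The radius adapts to local density: small where cells are close together,
    large where they are sparse. Cells at or beyond the radius get weight 0.
    """
    distances = [effective_distance(target, cell) for cell in cells]
    radius = sorted(distances)[n_neighbours - 1]
    weights = []
    for d in distances:
        if radius > 0 and d < radius:
            weights.append((1 - (d / radius) ** 2) ** 2)
        else:
            weights.append(0.0)
    return weights
```

With this penalty, an apartment cell 100 m away counts as if it were 200 m away from a house cell, so neighbours of the same building type dominate the local fit.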
Prediction
Small area estimation is: [equation not recoverable from the slide]
Annotations on its terms:
–Unable to compute locally
–Non-spatial error term: ignored
–Spatial trend
Accuracy questions
› Key issue is spatial autocorrelation
–Behaviour of local regressions (adjusted R², residuals)
–Classical LISAs (local Moran…)
› But no local accuracy measure
› Just a trend anyway
› => Validation at global level, where the census gives its own figure (mainly a Horvitz-Thompson estimate)
–Must include the omitted correction term
–Theoretically, the GWR gives the best results, but there is no estimate of accuracy in either case (simulations are now being developed to produce reference charts)
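For reference, the Horvitz-Thompson estimate of a total weights each sampled value by the inverse of its inclusion probability. The sketch below is generic; the function name and the figures in the usage note are illustrative, not census values.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total from sampled values.

    Each sampled unit stands for 1 / p units of the population,
    where p is its probability of being included in the sample.
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("one inclusion probability is needed per value")
    return sum(y / p for y, p in zip(values, inclusion_probs))
```

With a 40 % sample, every sampled address carries a weight of 2.5, so four sampled addresses counting 3, 0, 2 and 1 young people give an estimated total of 15.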
Example
Young people in Toulouse
› From census: (5 years)
› From estimations
–Model (1 year of data collection)
–With fiscal source:
–With health insurance source: 95440
Strasbourg – High unemployment areas
Estimations with 2004, 2005, 2006 census surveys
Final census figures (5 years estimation)
Estimated populations in deprived districts
Thank you
Any questions?