4th International Conference on e-Social Science: Workshop 5: Agent-Based Modelling for the Spatial-Social Sciences. 2008-06-18 Reconstruction of the.


1 4th International Conference on e-Social Science: Workshop 5: Agent-Based Modelling for the Spatial-Social Sciences. 2008-06-18 Reconstruction of the entire UK population using microsimulation Andy Turner http://www.geog.leeds.ac.uk/people/a.turner/

2 Overview Introduction What Why How What next

3 Introduction The title is a bit odd and vague –“reconstruction… using microsimulation” I can only guess what this is. I don’t think this presentation addresses that. Hopefully it does address something relevant and of interest!

4 This presentation focuses on: Developing digital demographic data for the UK –A reconstruction of data which has existed for 2001 since around 2003. MoSeS Genetic Algorithm that attempts to reconstruct individual level data for every individual in the UK in 2001 How you can reconstruct the MoSeS reconstruction

5 What is MoSeS? Modelling and Simulation for e-Social Science –http://www.ncess.ac.uk/research/nodes/MoSeS/ –e-Social Science being the application of e-Science concepts to social science problem domains. e-Science is enhanced science that uses the Internet, software tools and structured information for collaborative work. A first phase research node of NCeSS –Part of a UK collaborative partnership developing e-Social Science –The key part of its programme of work is to develop an individually based demographic model of the UK for 2001 to 2031 MoSeS people

6 What am I on about and what do we want? UK demographic data reconstruction for 2001. The UK demographic data we want largely exists as 2001 human population (census) data, but it is not available as 2001 census output

7 Why do we want it? Reconstructed data is input into a dynamic model that operates at the individual and household level to simulate population change for MoSeS applications. Belinda and/or Mark will be talking about the dynamic model work later on It is theorised that in order to be realistic and of use in local service and transport planning, the demographic models have to operate at this individual and household level.

8 Enriching the base population Efforts are being made to enrich the census reconstruction with additional data from other sources (e.g. British Household Panel Survey) The results of this data integration are new constructions, data that has not previously existed. The idea is to add non-census variables to the base census data reconstruction. Chengchao Zuo is doing some of this work, but is not presenting it here.

9 Introducing Census Data In reconstructing the census data it is necessary to: –know some details of the available published data; –consider the different ways of doing it. So I’m going to describe the available census data and then introduce a couple of ways of reconstructing the individual census data for all individuals.

10 2001 UK Human Population Census: Scope and general characteristics Attempt to collect demographic data about all individuals in the UK at a specific time. Data collected via a paper form and digitised. Includes (in the region of) a hundred variables that detail each individual.

11 2001 UK Human Population Census: Key Units and references Data collected for households and communal establishments –For each household there is a household reference person (HRP) and there are some variables that inform of the relationships between each household individual and this HRP –Communal establishments include hospitals, hospices, prisons etc and in Scotland, residential schools. The definition and difference between households and communal establishments is important. Output Areas (OAs) –Smallest regions of aggregated data dissemination –Grouped into MSOA, Wards, Regions –New to 2001 –A typical OA might contain 300 people and about a hundred households and may contain a communal establishment.

12 Households

13 Communal Establishments

14 2001 UK Human Population Census: Anonymisation and the individual data Digitised data was anonymised –A new version was produced that had names and addresses removed. Data with names and addresses is more useful than the anonymised form, but due to various concerns the file that would link individual records with the name and address information is classified. In MoSeS we have not been concerned with trying to assign the correct names to our individual data. –It is the anonymised data that we are trying to reconstruct.

15 2001 UK Human Population Census: The individual data exists! The individual data are not available due to concerns over abuse of the data. –This is a legitimate concern, but it could be harmless to allow some way to link other data on names and addresses with this individual census data. This has been done for some epidemiological work. It is not routine to do this even in controlled facilities, AFAIK. For similar reasons of concern, the anonymised data is subjected to further obfuscation by Disclosure Control Measures (DCMs)

16 2001 UK Human Population Census: Variable aggregation For the different data products variables (e.g. age) are aggregated into groups differently. Consequently reconstruction is non-trivial. NB. Although the full address is removed from the data, for some outputs it is necessary to know which Output Area or higher spatial unit an individual is from.

17 2001 UK Human Population Census: Available census outputs Sample of Anonymised Records (SARs) and Small Area Microdata (SAM) Census Aggregate Statistics (CAS) Special Transport Statistics (STS) Special Migration Statistics (SMS) Longitudinal Study (LS) Commissioned Tables

18 HSAR The 2001 Household SAR is available for England and Wales only. 1% stratified sample of households. 225,436 household records. 525,715 individual records. Individual records are available only for households with 11 or fewer residents. There are 60 variables, some of which are aggregated. –Age is in 2-year bands

19 ISAR The 2001 Individual SAR is for all of the UK. 3% sample. 1,843,525 records. Includes people from the Communal Establishment Population (CEP). Very similar variables to HSAR, but some crucial differences (e.g. Age)

20 CAS Census Aggregate Statistics Available at Output Area Level (and larger aggregate spatial units) for all the UK Various table types –Key Statistics –Univariate –Standard –Multivariate –Themed

21 2001 UK Human Population Census: DCMs again Disclosure control measures (DCMs) on CAS add additional and unknown levels of error to the data –The Small Cell Adjustment Measure (SCAM) ensures that no count in any aggregate table that is disseminated is 1 or 2. This DCM is notorious for adding unwanted error (making the census very difficult to use). Among other issues it raises, it has the undesirable effect that counts from different tables that represent the same thing will not necessarily match.
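To make the mismatch between tables concrete, here is a minimal sketch of a SCAM-like rule. The slides only say that no published count is 1 or 2; the specific rule used here (a small count becomes 0 or 3, chosen at random so that the expected value is unchanged) is an assumption for illustration, and the function names and toy tables are hypothetical.

```python
import random

def small_cell_adjust(count, rng):
    """Hypothetical SCAM-like rule: a count of 1 or 2 becomes 0 or 3."""
    if count not in (1, 2):
        return count
    # Assumed rule: move up to 3 with probability count/3, otherwise down to 0,
    # so the expected adjusted count equals the original count.
    return 3 if rng.random() < count / 3 else 0

def adjust_table(counts, rng):
    """Apply the adjustment independently to every cell of a table."""
    return [small_cell_adjust(c, rng) for c in counts]

rng = random.Random(0)
# Two toy tables describing the same 12 people, split by different variables.
by_age = [1, 2, 9]
by_tenure = [2, 2, 8]
adj_age = adjust_table(by_age, rng)
adj_tenure = adjust_table(by_tenure, rng)
# After adjustment the two totals need not agree with each other or with 12.
print(sum(adj_age), sum(adj_tenure))
```

Because each cell is adjusted independently, two tables that count the same people can publish different totals, which is the effect described on this slide.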

22 2 ways to reconstruct individual level data 1. Take the CAS and create synthetic individuals that match the aggregate characteristics 2. Select from the Individual and Household SAR populations such that the aggregate characteristics closely match those in the CAS

23 General limitations It is not possible to be sure that the data for individuals assigned to any location exactly matches the characteristics of the individuals that were there at the time of the census. In doing 1 it is possible to make a perfect match for every area, but in doing 2, it might not be possible for any area.

24 Option 1 (Synthetic Individuals) Constraints can be added to try to make the data reasonable –(e.g. someone aged 85 and with limiting long term illness probably does not work). –This is either arbitrary or non-trivial. There is no census data that can be used to inform if there exist individuals with the synthetically assigned characteristics (combination of age group, ethnicity, socio-economic group, educational attainment, health status etc...) except for the SAR, which is Option 2. Scales well in that it is not much more work to produce outputs for regions containing much larger populations.

25 Option 2 Selecting from the SARs It is too much to consider every combination of individuals from the SARs for the average Output Area (and there are 223,060 OAs). Indeed, the number of combinations increases for regions with larger populations and greater numbers of households. –NAreas * (NRecords in SAR)^(Population of area) –Some heuristic or strategy is needed to help select a good solution.
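A rough worked calculation shows why exhaustive search is out of the question. It reads the count on this slide as the number of SAR records raised to the power of the number of slots to fill in an area (an assumption, since the superscript appears to have been lost in extraction), and uses the HSAR size and the "about a hundred households" figure quoted earlier in the talk. The function name is hypothetical, and logarithms are used only to keep the numbers readable.

```python
import math

def log10_selections(n_records, n_slots):
    """log10 of n_records ** n_slots: ordered selections with replacement."""
    return n_slots * math.log10(n_records)

HSAR_HOUSEHOLDS = 225436       # household records in the 2001 Household SAR
TYPICAL_OA_HOUSEHOLDS = 100    # "about a hundred households" in a typical OA
N_OAS = 223060                 # number of Output Areas quoted on this slide

# Candidate household selections for a single typical Output Area.
per_area = log10_selections(HSAR_HOUSEHOLDS, TYPICAL_OA_HOUSEHOLDS)
# Evaluations needed if every Output Area were searched exhaustively.
all_areas = math.log10(N_OAS) + per_area

print(f"one typical OA: about 10^{per_area:.0f} selections")
print(f"all OAs:        about 10^{all_areas:.0f} evaluations")
```

Even before the communal establishment population is considered, the search space for one area has hundreds of digits, which is why a guided search is needed.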

26 Option 2 using a genetic algorithm to guide the search. Various ways to do this. An algorithm 1. Select Household Population (HP) from Household SAR records and Communal Establishment Population (CEP) from the Individual SAR a number of times 2. Measure performance 3. Select a number of the best performing sets 4. Breed these sets by swapping some HP and CEP 5. Repeat Steps 2 to 4 until convergence
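The steps above can be sketched on toy data as follows. This is not the MoSeS code: it uses a single population list rather than separate HP and CEP, one categorical variable, and a breeding step that copies one parent and swaps in positions from another. All names, parameters and data are hypothetical.

```python
import random
from collections import Counter

def error(selection, records, target):
    """Sum of absolute differences between selected and target category counts."""
    counts = Counter(records[i] for i in selection)
    return sum(abs(counts[k] - target.get(k, 0)) for k in set(counts) | set(target))

def breed(parent_a, parent_b, rng):
    """Copy parent_a and swap in a random number of positions from parent_b."""
    child = list(parent_a)
    n_swap = rng.randint(1, max(1, len(child) // 3))
    for pos in rng.sample(range(len(child)), n_swap):
        child[pos] = parent_b[pos]
    return child

def reconstruct(records, target, size, rng, n_sets=40, n_best=10, max_iter=200):
    # Step 1: draw several random candidate populations from the sample.
    pool = [[rng.randrange(len(records)) for _ in range(size)] for _ in range(n_sets)]
    history = []
    for _ in range(max_iter):
        # Steps 2 and 3: measure performance and keep the best sets.
        pool.sort(key=lambda s: error(s, records, target))
        best = pool[:n_best]
        history.append(error(best[0], records, target))
        if history[-1] == 0:
            break
        # Step 4: breed the best sets. The parents are kept unchanged,
        # so the best error can never get worse between iterations.
        children = [breed(*rng.sample(best, 2), rng) for _ in range(n_sets - n_best)]
        pool = best + children
    return pool[0], history

rng = random.Random(42)
# Toy "SAR": 300 records with one categorical variable each.
records = [rng.choice(["young", "adult", "old"]) for _ in range(300)]
# Toy "CAS" counts for an area of 100 people.
target = {"young": 20, "adult": 50, "old": 30}
selection, history = reconstruct(records, target, size=100, rng=rng)
print(history[0], history[-1])
```

The returned selection is a list of record indices, so the result for an area is just a list of sample IDs. The loop stops early on a perfect match and otherwise runs for a fixed number of iterations, which stands in for a proper convergence test.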

27 Enhancements: Constraints 2 types of constraint –Control constraints: these must be met for a solution to be viable. From CAS003, constrain by age of HRP for HP. From CAS001, constrain by age for CEP. –Optimisation constraints: can be any number of variables from the 60 or so in the SARs that are also in CAS. Done in the performance measure. Some are household population based, some total population based.
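The distinction between the two constraint types can be sketched as a hard check plus a soft score. The field names (`hrp_age_band`, `tenure`), the age bands and the target counts below are hypothetical; only the idea that the HRP age counts must match exactly while other variables are scored comes from the slide.

```python
from collections import Counter

def meets_control(households, control_target):
    """Control constraint: HRP age-band counts must equal the target exactly."""
    counts = Counter(h["hrp_age_band"] for h in households)
    bands = set(counts) | set(control_target)
    return all(counts[b] == control_target.get(b, 0) for b in bands)

def optimisation_error(households, targets):
    """Optimisation constraints: total absolute error over the chosen variables."""
    total = 0
    for variable, target in targets.items():
        counts = Counter(h[variable] for h in households)
        total += sum(abs(counts[k] - target.get(k, 0)) for k in set(counts) | set(target))
    return total

households = [
    {"hrp_age_band": "25-34", "tenure": "owned"},
    {"hrp_age_band": "65+", "tenure": "rented"},
    {"hrp_age_band": "25-34", "tenure": "rented"},
]
control_target = {"25-34": 2, "65+": 1}                 # stands in for a CAS003-style table
targets = {"tenure": {"owned": 2, "rented": 1}}         # stands in for another CAS table

viable = meets_control(households, control_target)
score = optimisation_error(households, targets)
print(viable, score)
```

A candidate that fails the control check would be discarded outright, whereas the optimisation error is only used to rank viable candidates.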

28 Swapping records in breeding This becomes harder the more control constraints are applied –The aggregate constraint characteristics from the set being swapped must match those selected –Being able to swap multiple records is a big advantage: more breadth of search, less chance of getting stuck in a local minimum
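One simple way to keep control constraints satisfied during a swap is to replace a record only with another record that has the same control characteristic. The sketch below does this one record at a time against the whole sample, which is a simplification of swapping matched groups of records between candidate sets; the function name, field names and data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def swap_preserving_control(selection, sample, control_key, n_swaps, rng):
    """Replace up to n_swaps selected records with sample records that share
    the same control characteristic, so control totals are unchanged."""
    by_control = defaultdict(list)
    for record in sample:
        by_control[record[control_key]].append(record)
    result = list(selection)
    for pos in rng.sample(range(len(result)), min(n_swaps, len(result))):
        candidates = by_control[result[pos][control_key]]
        if candidates:  # leave the record alone if nothing matches
            result[pos] = rng.choice(candidates)
    return result

rng = random.Random(1)
sample = [{"id": i, "hrp_age_band": band}
          for i, band in enumerate(["25-34", "65+", "25-34", "35-44", "65+", "35-44"])]
selection = [sample[0], sample[1], sample[3]]
swapped = swap_preserving_control(selection, sample, "hrp_age_band", n_swaps=2, rng=rng)

before = Counter(r["hrp_age_band"] for r in selection)
after = Counter(r["hrp_age_band"] for r in swapped)
print(before == after)
```

With more control variables the groups of interchangeable records get smaller, which is one way to see why swapping becomes harder as control constraints are added.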

29 [Figure labels: HSAR; ISAR; Aggregate HP Control Characteristics; Aggregate CEP Control Characteristics]

30 Breeding parameters Need to not swap too much HP or CEP –Else optimisation is slow –Swapping a random amount each time is good, and swapping up to about a third of the HP and CEP seems OK Good to keep a diversity in the breeding population of solutions –Especially in the early iterations

31 Re-constraining There is a limit to the number of control constraints that can be used. New optimisation constraints can be added and others removed by modifying the fitness function –e.g. For some applications it might be more important to get household composition right rather than socio-economic group
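One way to picture modifying the fitness function is as a weighted sum of per-variable errors, where a weight of zero removes a constraint and a larger weight makes it matter more. The weighting scheme, names and numbers below are assumptions for illustration, not the MoSeS fitness function.

```python
def weighted_error(errors, weights):
    """Combine per-variable errors; variables without a weight are ignored."""
    return sum(weights.get(name, 0.0) * err for name, err in errors.items())

# Hypothetical per-variable errors for one candidate population.
errors = {"household_composition": 4, "socio_economic_group": 10}

equal = weighted_error(errors, {"household_composition": 1.0, "socio_economic_group": 1.0})
# An application that cares more about household composition.
housing_focus = weighted_error(errors, {"household_composition": 3.0, "socio_economic_group": 0.5})
# Dropping socio-economic group from the optimisation altogether.
composition_only = weighted_error(errors, {"household_composition": 1.0})
print(equal, housing_focus, composition_only)
```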

32 Results Sorry, no results to show here! Results for Leeds were produced by optimising with constraints on household composition, employment, health, age and gender. –The same type of result for the UK is nearly available. A week away… I have produced graphs that indicate how well the results perform. Maps of the residuals can also be produced and any spatial patterns may provide clues for improvement

33 Reconstructing the reconstructions Each HSAR record and ISAR record and Output Area have unique IDs and these can be publicly disseminated. Using a simple structure of two lists, one for the HP (either all records or just the HRP), the other for the CEP for each OA it is straightforward to recreate the result.
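The two-list structure can be sketched as a mapping from Output Area ID to a list of HSAR record IDs and a list of ISAR record IDs. All IDs, field names and records below are made up for illustration; anyone holding their own copy of the SARs would look the records up by ID in the same way.

```python
def rebuild_area(result_for_oa, hsar_by_id, isar_by_id):
    """Look up the full records for one Output Area from its two ID lists."""
    households = [hsar_by_id[i] for i in result_for_oa["hp"]]
    communal = [isar_by_id[i] for i in result_for_oa["cep"]]
    return households, communal

# Hypothetical extracts of the licensed SAR files, keyed by record ID.
hsar_by_id = {101: {"hrp_age_band": "25-34"}, 102: {"hrp_age_band": "65+"}}
isar_by_id = {9001: {"age": 82, "establishment": "hospital"}}

# Hypothetical published result: only IDs, no census variables.
result = {"OA_EXAMPLE_1": {"hp": [101, 102, 101], "cep": [9001]}}

households, communal = rebuild_area(result["OA_EXAMPLE_1"], hsar_by_id, isar_by_id)
print(len(households), len(communal))
```

The same record ID can appear more than once in a list, since a sample record may be selected several times for the same area.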

34 Plans in the near future Archive what we have done (results and code) and run for the UK again with some additional transport variables included in the optimisation. –Can be done by restarting from the previous best optimisation –Do some experiments with modifying the optimisation function during training.

35 MoSeS meets NEC 10th March 2008 Acknowledgements and Thanks Thanks to MoSeS researchers, collaborators and funders. Thanks to all involved in eResearch for improving our hardware, software and data resources so that we can all do our bit to better understand and plan our future. Thank you for listening!

36 More Background on MoSeS follows in the next 6 slides

37 Initial Tasks Develop methods to generate individual human population data for the UK from 2001 UK human population census data Develop a Toy Model –Dynamic agent based microsimulation modelling toolkit and apply it to simulate change in the UK Develop applications for –Health –Business –Transport

38 Challenges Grid enabling the data and tools Visualisation –Google Earth –Computer Games Collaboration Retaining a problem focus Design and Development

39 Generic MoSeS Approach MoSeS to date has approached Modelling and Simulation from a specific angle –Geographic –Demographic –Contemporary –About the UK –Targeted towards supporting a developing set of applications It is not a requirement to make it clear what steps can be followed by other Social Scientists wanting to Model and Simulate something different –However, the generic work of MoSeS should be relevant and we are working towards this

40 MoSeS Vision Suppose that computational power and data storage were not an issue, what would you build? –SimCity http://en.wikipedia.org/wiki/SimCity For real on a national scale

41 MoSeS Rationale The idea is to provide planners, policy makers and the public with a tool to help them analyse the potential impacts and the likely effect of planning and policy changes. Example Application: –There may be a housing policy to do with joint ownership, taxation and planning restriction legislation that can be developed to alleviate problems to do with lack of affordable housing and workers without precipitating a crash in the housing market and economy as a whole –A balanced policy may be easier to develop by running a large number of simulations within a system like SimCity for real to understand the sensitivities involved

42 MoSeS First Steps The development of a national demographic model The development of 3 applications –Health care –Transport –Business The development of a portal interface to support the development and resulting applications by providing access to the data, models and simulations and presenting information to users (application developers) in a secure way
