InFuse: Data Feeds for the UK 2001 and 2011 Censuses and Beyond
Justin Hayes
Census Dissemination Unit (CDU), Mimas, The University of Manchester
Where are we going?
»CDU background
»Recent work on CAIRD Project
»Current work on InFuse Project
»Forthcoming work in collaboration with ONS
»Future ideas
Data Feed?
Structure, Describe, Interoperable, Open Standards, Expose, Consolidate, Understandable, Usable, Transferable, Comprehensive, Comparable, Online, Integrate, Flexible
A data feed consolidates information relating to a dataset and integrates it by enforcing a structure, which it describes using open standards, so that comprehensive and comparable information can be exposed and transferred online in ways that make it understandable, interoperable, flexible and, most importantly, usable.
Dimensions, Codelists and Codes General Health
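
The relationship between a dimension, its codelist and the individual codes can be sketched as a small data structure. This is a minimal illustration only, not the InFuse data model: the class names, the code identifiers (`GH1` and so on) and the three category labels are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Code:
    """One category within a codelist (identifier and label are illustrative)."""
    code_id: str
    label: str


@dataclass
class Codelist:
    """An ordered set of codes that a dimension can take."""
    name: str
    codes: List[Code] = field(default_factory=list)

    def labels(self) -> List[str]:
        return [code.label for code in self.codes]


@dataclass
class Dimension:
    """A census variable, described by exactly one codelist in this sketch."""
    name: str
    codelist: Codelist


# Hypothetical codelist for the General Health dimension.
general_health = Dimension(
    name="General Health",
    codelist=Codelist(
        name="General Health (illustrative categories)",
        codes=[
            Code("GH1", "Good"),
            Code("GH2", "Fairly good"),
            Code("GH3", "Not good"),
        ],
    ),
)

print(general_health.name, general_health.codelist.labels())
```

Keeping codes in a shared codelist, rather than in each table, is what lets the same category be recognised wherever the dimension is used.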
CDU Background
»Dissemination of aggregate outputs from recent UK censuses to UK academics
»Small team funded by ESRC
»Service, research and engagement roles
»Two decades of pioneering work
›Casweb
›Retrieval and reprocessing of UK 1971 Census
›GeoConvert
Barriers to Effective Dissemination
»Large and complex dataset
»Lack of global structures
›‘Hand crafted’ tables as primary instrument
›Inconsistent structures
›‘Age’ particularly problematic example
»No comprehensive description
»Scattered information
›Poor connection of data and metadata
›Approximately 300 tables with many inconsistencies
›Metadata in multiple locations with varying access
Age Bands
99 age bandings, of which 76 are unique to a single table
223 Age Codes
Standard Table 13 Framework
Standard Table 13 Data
Text String Cell Descriptions
S013:37 (AGE OF HRP 24 OR UNDER - Rented from council : ALL HOUSEHOLDS)
S013:38 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Total)
S013:39 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Employee)
S013:40 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Self-employed)
S013:41 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Unemployed)
S013:42 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Full-time students)
S013:43 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Total)
S013:44 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Retired)
S013:45 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Student)
S013:46 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Looking after home/family)
S013:47 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Permanently sick or disabled)
S013:48 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Other)
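
Cell descriptions of this kind pack several dimensions into one text string. The sketch below shows one way such a string might be split back into separate fields. It assumes the layout seen in the Standard Table 13 examples (`table:cell (age - tenure : activity - detail)`); the function name and the field names are hypothetical, and other tables may not follow the same pattern.

```python
import re
from typing import Dict, Optional, Union

# Assumed layout: "S013:39 (ROW PART - ROW PART : COLUMN PART - COLUMN PART )"
CELL_PATTERN = re.compile(
    r"^(?P<table>[A-Z]+\d+):(?P<cell>\d+)\s*\((?P<desc>.*)\)\s*$"
)

Field = Union[str, int, None]


def parse_cell(line: str) -> Dict[str, Field]:
    """Split one text-string cell description into dimension-like fields."""
    match = CELL_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognised cell description: {line!r}")

    # " : " separates the row description from the column description.
    rows, _, cols = match["desc"].partition(" : ")
    # " - " separates the two parts within each description.
    age, _, tenure = rows.partition(" - ")
    activity, _, detail = cols.partition(" - ")

    detail_clean: Optional[str] = detail.strip() or None
    return {
        "table": match["table"],
        "cell": int(match["cell"]),
        "age_of_hrp": age.strip(),
        "tenure": tenure.strip(),
        "economic_activity": activity.strip(),
        "detail": detail_clean,
    }


example = parse_cell(
    "S013:39 (AGE OF HRP 24 OR UNDER - Rented from council : "
    "Economically Active - Employee )"
)
print(example)
```

Splitting on the spaced separators (`" - "` and `" : "`) rather than on bare hyphens keeps labels such as "Self-employed" and "Full-time students" intact.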
Effects of Barriers
»Incomplete and unconnected information
»Poor exploration
»Potential for misinterpretation and misuse
»Not interoperable
»Applications must provide specific metadata
»Frustrating for users and service providers
Challenges to Improve Services
»Consolidate all related information
»Extract and apply consistent structures
»Describe to make understandable and transferable
»Publish data via web service and API
»Build our own user applications
»Use open standards wherever possible
»Take advantage of external development
»Encourage ONS to do the same for 2011
»Find money to do all this!
CAIRD Project
»Additional funding from ESRC
»One researcher for one year from June 2008
»Feasibility project
›Dimensionalised sample of 40 tables
›Conceptual structure based on SDMX
›SOAP-based web service and API
»CAIRD application
›Codelist-based data selector
›CSV and SDMX outputs
CAIRD Geography Selector
CAIRD Data Selector
CAIRD SDMX output
InFuse
»Mimas strategic funding to take results of CAIRD Project into service
»One researcher from August 2009 to present
»Initial application launch September 2010
»2001 Census for England and Wales
»Tangible outputs just commencing
InFuse User Requirements
»Initial phase of work
›Workshop for expert academic census users
›Questionnaire
›Functional and requirements specifications
›IASSIST 2010
Structuring the 2001 Census
»Restructuring and parsing of output tables
»Information from Census Definitions Volume
»Development of master set of codelists
»Creation of geography codelists
»De-universification
»Encoding of hierarchies
»Incorporation of core set of metadata
»Multiple value counts problem
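
As an illustration of what encoding a hierarchy can involve, the sketch below stores each code with a pointer to its parent and rolls counts up from the lowest-level codes. The code labels are taken from the Standard Table 13 cell descriptions; the parent-pointer representation, the function names and the counts are assumptions made for the example, not the structures or figures used in InFuse.

```python
from typing import Dict, List, Optional

# Each code maps to its parent code; None marks the top of the hierarchy.
PARENT: Dict[str, Optional[str]] = {
    "All households": None,
    "Economically Active": "All households",
    "Employee": "Economically Active",
    "Self-employed": "Economically Active",
    "Unemployed": "Economically Active",
    "Full-time students": "Economically Active",
    "Economically Inactive": "All households",
    "Retired": "Economically Inactive",
    "Student": "Economically Inactive",
    "Looking after home/family": "Economically Inactive",
    "Permanently sick or disabled": "Economically Inactive",
    "Other": "Economically Inactive",
}


def children(code: str) -> List[str]:
    """Return the codes whose immediate parent is `code`."""
    return [child for child, parent in PARENT.items() if parent == code]


def leaves(code: str) -> List[str]:
    """Return the lowest-level codes beneath `code` (or `code` itself)."""
    kids = children(code)
    if not kids:
        return [code]
    result: List[str] = []
    for kid in kids:
        result.extend(leaves(kid))
    return result


def rollup(code: str, leaf_counts: Dict[str, int]) -> int:
    """Total for `code`, assuming it equals the sum of its leaf codes."""
    return sum(leaf_counts[leaf] for leaf in leaves(code))


# Made-up counts, for illustration only.
counts = {
    "Employee": 10, "Self-employed": 2, "Unemployed": 3,
    "Full-time students": 1, "Retired": 0, "Student": 4,
    "Looking after home/family": 2, "Permanently sick or disabled": 1,
    "Other": 1,
}
print(rollup("Economically Active", counts), rollup("All households", counts))
```

With the hierarchy held explicitly, "Total" cells no longer need to be stored as separate categories; they can be derived from the leaf codes on request.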
InFuse Features
»Theme based exploration
»Handling sparsity through guided exploration
»Text search
›Thesaurus and gazetteer
»Move to RESTful web service with private API
»URI schema for RDF development
»Encoding of, and operation on hierarchies
»Modular, open source design for re-use
»Integration of digital boundary data
»Initial text output
Initial InFuse Outputs
»InFuse URI schema
›http:// /InFuseWS/InFuseWS.svc/data/contenttype/datasets?format=html
»InFuse text search with thesaurus
›Search targets: codelists, codes, glossary, areas, areatypes
›http:// /InFuseWS/InFuseWS.svc/data/contenttype/datasets/dsid/1/glossary/search?keywords=race
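
The two example URIs follow a regular pattern that a client could assemble programmatically. The sketch below is illustrative only: the host is not present in the slide text, so `example.invalid` stands in as a placeholder, and the function names are hypothetical. Only the path segments and the `format` and `keywords` parameters come from the examples above.

```python
from urllib.parse import urlencode, urlunsplit

HOST = "example.invalid"  # placeholder: the real host is not given in the slides
SERVICE_ROOT = "/InFuseWS/InFuseWS.svc/data/contenttype/datasets"


def build_uri(path_suffix: str = "", **query: str) -> str:
    """Join the service root, an optional path suffix and query parameters."""
    return urlunsplit(
        ("http", HOST, SERVICE_ROOT + path_suffix, urlencode(query), "")
    )


def datasets_uri(fmt: str = "html") -> str:
    """URI listing the available datasets in the requested format."""
    return build_uri(format=fmt)


def glossary_search_uri(dsid: int, keywords: str) -> str:
    """URI for a glossary text search within one dataset."""
    return build_uri(f"/dsid/{dsid}/glossary/search", keywords=keywords)


print(datasets_uri())
print(glossary_search_uri(1, "race"))
```

Building the query string with `urlencode` means that search terms containing spaces or punctuation are escaped rather than breaking the URI.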
InFuse URI Schema
InFuse URI Schema: Codelists
InFuse Thesaurus Text Search
CDU/ESRC/ONS Collaboration for 2011
»Data feed influence on ONS 2001 plans
›Data Feed Network
›Census Web Services Working Group (CWSWG)
›ONS commitment to disseminate via API
»Collaborative funding
›Two researchers for one year!
›Test datasets for ONS API
›Work on 2001 to 2011 comparability
›Application development for testing of ONS API
In the InFuse Pipeline
»More datasets
»More metadata
»Work on definitional and geographical comparability
»Further application development
»SDMX and RDF interaction
»Release of a public API
»GeoConvert module
»Linkage of unit and aggregate data
Summary
»It’s possible to retrospectively structure and disseminate complex datasets via data feeds, but much easier to do at source.
»Potential for improved and expanded secondary usability of datasets will act as a stimulus for the development and use of open-standards methods and structures in dataset creation.
» Contact