InFuse: Data Feeds for the UK 2001 and 2011 Censuses and Beyond
Justin Hayes, Census Dissemination Unit (CDU), Mimas, The University of Manchester

2 Where are we going?
»CDU background
»Recent work on CAIRD Project
»Current work on InFuse Project
»Forthcoming work in collaboration with ONS
»Future ideas

3 Data Feed?
Keywords: Structure, Describe, Interoperable, Open Standards, Expose, Consolidate, Understandable, Usable, Transferable, Comprehensive, Comparable, Online, Integrate, Flexible
A data feed consolidates information relating to a dataset and integrates it by enforcing a structure, which it describes using open standards, so that comprehensive and comparable information can be exposed and transferred online in ways that make it understandable, interoperable, flexible and, most importantly, usable.

4 Dimensions, Codelists and Codes (example: General Health)
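
The relationship between dimensions, codelists and codes can be sketched as a small data model. This is a minimal illustration, not the InFuse schema: the class names, the codelist identifier and the code identifiers ("GH1" and so on) are hypothetical, and the labels are only illustrative. The point it shows is that an observation is identified by exactly one valid code per dimension.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    id: str     # short identifier (hypothetical, e.g. "GH1")
    label: str  # label shown to users


@dataclass
class Codelist:
    id: str
    codes: list[Code] = field(default_factory=list)

    def label_for(self, code_id: str) -> str:
        """Return the label for a code, raising KeyError if it is not in this codelist."""
        for code in self.codes:
            if code.id == code_id:
                return code.label
        raise KeyError(code_id)


@dataclass
class Dimension:
    name: str
    codelist: Codelist


# Illustrative dimension; all identifiers are invented for this sketch.
general_health = Dimension(
    name="General Health",
    codelist=Codelist(
        id="CL_GENERAL_HEALTH",
        codes=[Code("GH1", "Good"), Code("GH2", "Fairly good"), Code("GH3", "Not good")],
    ),
)


def observation_key(dimensions: list[Dimension], code_ids: list[str]) -> tuple[str, ...]:
    """Check that exactly one valid code is given per dimension and return them as a key."""
    if len(dimensions) != len(code_ids):
        raise ValueError("expected exactly one code per dimension")
    for dimension, code_id in zip(dimensions, code_ids):
        dimension.codelist.label_for(code_id)  # raises KeyError for an unknown code
    return tuple(code_ids)
```

Keeping labels in the codelist rather than in each data cell is what lets an application display, search and translate them without table-specific metadata.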

5 CDU Background
»Dissemination of aggregate outputs from recent UK censuses to UK academics
»Small team funded by ESRC
»Service, research and engagement roles
»Two decades of pioneering work
 ›Casweb
 ›Retrieval and reprocessing of UK 1971 Census
 ›GeoConvert

6 Barriers to Effective Dissemination
»Large and complex dataset
»Lack of global structures
 ›‘Hand-crafted’ tables as primary instrument
 ›Inconsistent structures
 ›‘Age’ a particularly problematic example
»No comprehensive description
»Scattered information
 ›Poor connection of data and metadata
 ›Approximately 300 tables with many inconsistencies
 ›Metadata in multiple locations with varying access

7 Age Bands: 99 age bandings, 76 of them unique to a single table
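
Why so many bandings hurt comparability can be shown with a small check: two tables can only be compared on age if one banding collapses exactly into the other. The sketch below assumes a banding is a sorted list of band start ages, each band running up to the next start and the last one open-ended; the example bandings in the usage are hypothetical, not taken from actual 2001 tables.

```python
from bisect import bisect_right


def can_aggregate(source_starts: list[int], target_starts: list[int]) -> bool:
    """True if every target band is a union of whole source bands.

    Each banding is a sorted list of band start ages; a band runs up to the
    next start, and the last band is open-ended (e.g. "65 and over").
    """
    return source_starts[0] == target_starts[0] and set(target_starts) <= set(source_starts)


def aggregation_map(source_starts: list[int], target_starts: list[int]) -> dict[int, int]:
    """Map each source band (by start age) to the target band that contains it."""
    if not can_aggregate(source_starts, target_starts):
        raise ValueError("source banding cannot be collapsed into target banding")
    return {
        start: target_starts[bisect_right(target_starts, start) - 1]
        for start in source_starts
    }
```

For example, a hypothetical banding starting at 0, 5, 10, 16, 25, 45 and 65 collapses into 0, 16, 65, but not into 0, 18, 65, because no source band boundary falls at 18.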

8 223 Age Codes

9 Standard Table 13 Framework

10 Standard Table 13 Data

11 Text String Cell Descriptions
S013:37 (AGE OF HRP 24 OR UNDER - Rented from council : ALL HOUSEHOLDS )
S013:38 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Total )
S013:39 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Employee )
S013:40 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Self-employed )
S013:41 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Unemployed )
S013:42 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Active - Full-time students )
S013:43 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Total )
S013:44 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Retired )
S013:45 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Student )
S013:46 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Looking after home/family )
S013:47 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Permanently sick or disabled )
S013:48 (AGE OF HRP 24 OR UNDER - Rented from council : Economically Inactive - Other )
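
Cell descriptions like these have to be parsed before the cells can be mapped onto dimensions and codelists. A minimal sketch, assuming the splitting rules visible in this sample hold (table and cell number before the parenthesis, " : " between row and column headings, " - " between nesting levels); real tables will have exceptions, and mapping the extracted labels to codes is a separate, table-specific step.

```python
import re

# Pattern inferred from the sample lines only: "S013:37 (... : ... )".
CELL_PATTERN = re.compile(r"^(?P<table>S\d+):(?P<cell>\d+)\s*\((?P<desc>.*)\)\s*$")


def parse_cell(line: str) -> dict:
    """Split a text cell description into table id, cell number and heading parts."""
    match = CELL_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised cell description: {line!r}")
    row, _, column = match.group("desc").strip().partition(" : ")
    # " - " separates levels; hyphens inside words (e.g. "Self-employed") are kept.
    parts = [p.strip() for p in row.split(" - ") + column.split(" - ") if p.strip()]
    return {"table": match.group("table"), "cell": int(match.group("cell")), "parts": parts}
```

Applied to S013:40, this yields the parts "AGE OF HRP 24 OR UNDER", "Rented from council", "Economically Active" and "Self-employed", each of which is a candidate code on a separate dimension.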

12 Effects of Barriers
»Incomplete and unconnected information
»Poor exploration
»Potential for misinterpretation and misuse
»Not interoperable
»Applications must provide specific metadata
»Frustrating for users and service providers

13 Challenges to Improve Services
»Consolidate all related information
»Extract and apply consistent structures
»Describe to make understandable and transferable
»Publish data via web service and API
»Build our own user applications
»Use open standards wherever possible
»Take advantage of external development
»Encourage ONS to do the same for 2011
»Find money to do all this!

14 CAIRD Project
»Additional funding from ESRC
»One researcher for one year from June 2008
»Feasibility project
 ›Dimensionalised sample of 40 tables
 ›Conceptual structure based on SDMX
 ›SOAP-based web service and API
»CAIRD application
 ›Codelist-based data selector
 ›CSV and SDMX outputs
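
A codelist-based data selector with CSV output can be sketched as follows. This is not CAIRD's SOAP-based implementation: it assumes, hypothetically, that the dimensionalised data sit in an in-memory dict keyed by one code per dimension, and that the user's selection is a list of codes for each dimension. Every combination of selected codes becomes one CSV row; combinations with no data are written with an empty value.

```python
import csv
import io
from itertools import product


def select_to_csv(cube: dict, dimension_names: list[str], selections: dict) -> str:
    """Return CSV text with one row per combination of the selected codes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(dimension_names) + ["value"])
    # Iterate over the cross-product of selected codes in dimension order.
    for key in product(*(selections[name] for name in dimension_names)):
        writer.writerow(list(key) + [cube.get(key, "")])
    return buffer.getvalue()
```

Because the selector works only in terms of dimensions and codes, the same function serves any table once it has been dimensionalised, which is the main benefit the feasibility project set out to demonstrate.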

15 CAIRD Geography Selector

16 CAIRD Data Selector

17 CAIRD SDMX output

18 InFuse
»Mimas strategic funding to take results of CAIRD Project into service
»One researcher from August 2009 to present
»Initial application launch September 2010
»2001 Census for England and Wales
»Tangible outputs just commencing

19 InFuse User Requirements
»Initial phase of work
 ›Workshop for expert academic census users
 ›Questionnaire
 ›Functional and requirements specifications
 ›IASSIST 2010

20 Structuring the 2001 Census
»Restructuring and parsing of output tables
»Information from Census Definitions Volume
»Development of master set of codelists
»Creation of geography codelists
»De-universification
»Encoding of hierarchies
»Incorporation of core set of metadata
»Multiple value counts problem
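
Encoding hierarchies means each code records its parent, so totals can be derived from the most detailed codes and published totals can be checked. A minimal sketch using the economic activity categories from the S013 cell descriptions; the code identifiers and the counts in the usage are invented, and treating the two subtotals as children of "all households" is an assumption of this illustration.

```python
# Hypothetical parent links; None marks the root of the hierarchy.
PARENT = {
    "ALL": None,
    "EA_TOTAL": "ALL",
    "EA_EMPLOYEE": "EA_TOTAL",
    "EA_SELF_EMPLOYED": "EA_TOTAL",
    "EA_UNEMPLOYED": "EA_TOTAL",
    "EA_FT_STUDENT": "EA_TOTAL",
    "EI_TOTAL": "ALL",
    "EI_RETIRED": "EI_TOTAL",
    "EI_STUDENT": "EI_TOTAL",
    "EI_HOME_FAMILY": "EI_TOTAL",
    "EI_SICK_DISABLED": "EI_TOTAL",
    "EI_OTHER": "EI_TOTAL",
}


def leaf_codes(parent: dict) -> set:
    """Codes that are not the parent of any other code."""
    parents = set(parent.values())
    return {code for code in parent if code not in parents}


def roll_up(leaf_counts: dict, parent: dict) -> dict:
    """Derive a count for every code by adding each leaf count to all its ancestors."""
    totals = dict.fromkeys(parent, 0)
    for leaf, count in leaf_counts.items():
        node = leaf
        while node is not None:
            totals[node] += count
            node = parent[node]
    return totals


def inconsistent_totals(published: dict, parent: dict) -> list:
    """Codes whose published count differs from the total derived from the leaves."""
    derived = roll_up({c: published[c] for c in leaf_codes(parent)}, parent)
    return sorted(c for c in parent if published[c] != derived[c])
```

With the hierarchy held as data rather than implied by text labels, the same roll-up works for any dimension, and users can move between levels without the application knowing anything about the table.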

21 InFuse Features
»Theme-based exploration
»Handling sparsity through guided exploration
»Text search
 ›Thesaurus and gazetteer
»Move to RESTful web service with private API
»URI schema for RDF development
»Encoding of, and operations on, hierarchies
»Modular, open source design for re-use
»Integration of digital boundary data
»Initial text output
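
Thesaurus-backed text search lets a user's vocabulary find census terminology that differs from it, for instance a search for "race" reaching material labelled "ethnic group". A minimal sketch, assuming a hypothetical thesaurus dict and simple case-insensitive substring matching over (target type, text) pairs; the entries shown are illustrative, not InFuse's thesaurus.

```python
# Hypothetical thesaurus: user terms mapped to alternative terms used in the metadata.
THESAURUS = {
    "race": ["ethnic group", "ethnicity"],
    "job": ["occupation", "employment"],
}


def expand_terms(keywords: str, thesaurus: dict) -> set:
    """Lower-case each keyword and add any thesaurus alternatives for it."""
    terms = set()
    for word in keywords.lower().split():
        terms.add(word)
        terms.update(alt.lower() for alt in thesaurus.get(word, []))
    return terms


def search(keywords: str, targets: list, thesaurus: dict) -> list:
    """Return (target_type, text) pairs whose text contains any expanded term."""
    terms = expand_terms(keywords, thesaurus)
    return [
        (target_type, text)
        for target_type, text in targets
        if any(term in text.lower() for term in terms)
    ]
```

A gazetteer could be handled the same way, with place-name variants as the alternatives and areas as the search targets.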

22 Initial InFuse Outputs
»InFuse URI schema
 ›http://130.88.120.139/InFuseWS/InFuseWS.svc/data/contenttype/datasets?format=html
»InFuse text search with thesaurus
 ›Search targets: codelists, codes, glossary, areas, areatypes
 ›http://130.88.120.139/InFuseWS/InFuseWS.svc/data/contenttype/datasets/dsid/1/glossary/search?keywords=race
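
The two example URIs appear to share one pattern: a service base, then path segments naming resources and identifiers, then query parameters. A minimal sketch of composing such URIs, assuming that reading of the schema; the helper name is hypothetical, and the base address is copied from the slide and may no longer be reachable. No request is made.

```python
from urllib.parse import quote, urlencode

# Base address taken from the slide examples.
BASE = "http://130.88.120.139/InFuseWS/InFuseWS.svc"


def infuse_uri(path_segments: list, **query: str) -> str:
    """Join escaped path segments onto the base and append any query parameters."""
    path = "/".join(quote(str(segment), safe="") for segment in path_segments)
    uri = f"{BASE}/{path}"
    if query:
        uri += "?" + urlencode(query)
    return uri
```

For example, `infuse_uri(["data", "contenttype", "datasets", "dsid", 1, "glossary", "search"], keywords="race")` rebuilds the glossary search URI listed above.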

23 InFuse URI Schema

24 InFuse URI Schema: Codelists

25 InFuse Thesaurus Text Search

26 CDU/ESRC/ONS Collaboration for 2011
»Data feed influence on ONS 2011 plans
 ›Data Feed Network
 ›Census Web Services Working Group (CWSWG)
 ›ONS commitment to disseminate via API
»Collaborative funding
 ›Two researchers for one year!
 ›Test datasets for ONS API
 ›Work on 2001 to 2011 comparability
 ›Application development for testing of ONS API

27 In the InFuse Pipeline
»More datasets
»More metadata
»Work on definitional and geographical comparability
»Further application development
»SDMX and RDF interaction
»Release of a public API
»GeoConvert module
»Linkage of unit and aggregate data

28 Summary
»It is possible to structure complex datasets retrospectively and disseminate them via data feeds, but it is much easier to do so at source.
»The potential for improved and expanded secondary use of datasets will act as a stimulus for the development and use of open-standards methods and structures in dataset creation.

29 Contact
»justin.hayes@manchester.ac.uk
»census@mimas.ac.uk
»0044 161 275 6109

