Linked Data for SDG Reporting. Bill Roberts, Swirrl. 23 January 2018
What is Linked Data?
“Data you can link to”. Use the mechanisms of the web to give fine-grained access to data. You can link to a file on the web, but only at the level of the whole file. If you can link to individual things within the data (specific countries or regions, specific indicators, specific data points), you can be more precise and more selective. It also gives you a mechanism for attaching all kinds of metadata and for making relationships between the topics of interest, for example combining complex geospatial data with statistical data.
Needs for SDG National Reporting Platforms: different presentation for different groups of users (analysts, ministers/managers, the public); easy to find (available on the web); API access to data; automated data preparation and publishing processes; interoperable data; and flexibility, because one size does not fit all (individual countries have their own needs and constraints).
What does ‘interoperable’ mean? We need to agree on what data means and how to get it: shared identifiers, a data transfer protocol, a data format and data models. Data models and systems of identifiers still need to be agreed, and that is a hard organisational problem; but once that is done, Linked Data gives you a mechanism for encoding them systematically.
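A minimal sketch of why shared identifiers matter: two sources keyed by the same URI combine with an ordinary join, while a source that names the same area with a local label matches nothing. The Scotland URI and the 2016 population count come from the statistics.gov.scot example in this deck; the boundary records and the `join_on_identifier` helper are hypothetical.

```python
# URI for Scotland, taken from the statistics.gov.scot example in this deck.
SCOTLAND = "http://statistics.gov.scot/id/statistical-geography/S92000003"


def join_on_identifier(left: dict[str, dict], right: dict[str, dict]) -> dict[str, dict]:
    """Merge records keyed by identifier; only identifiers present in both sources survive."""
    shared = left.keys() & right.keys()
    return {key: {**left[key], **right[key]} for key in shared}


# Statistical data: population count for Scotland, 2016.
population = {SCOTLAND: {"population_2016": 5404700}}

# Hypothetical geospatial source that uses the same URI for the same area.
boundaries = {SCOTLAND: {"boundary_file": "scotland.geojson"}}

# Hypothetical source that uses a local label instead of the shared URI.
local_labels = {"Scotland": {"boundary_file": "scotland.geojson"}}

combined = join_on_identifier(population, boundaries)   # one merged record
unmatched = join_on_identifier(population, local_labels)  # nothing matches
```

The join itself is trivial; the work lies in agreeing the identifiers, which is the organisational problem the slide describes.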
Linked Data principles (Berners-Lee, 2006): (1) use URIs as names for things; (2) use HTTP URIs so that people can look up those names; (3) when someone looks up a URI, provide useful information, using the standards (RDF and SPARQL); (4) include links to other URIs, so that they can discover more things. In other words, directly exploit the plumbing of the web as a way of making data available. Identifiers: globally unique identifiers for things of interest. Protocol: a mechanism for looking up information about those things. Format and data models: a standards-based, machine-readable way of representing that information. Connections: a way of describing relationships between things.
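Principles 2 and 3 in practice: a sketch, using only Python's standard library, of looking up a URI while asking for RDF rather than HTML through the HTTP `Accept` header (content negotiation). It assumes the server honours `text/turtle` for that URI; `lookup_request` is a hypothetical helper, and the request is built but not sent.

```python
from urllib.request import Request

# Turtle is one RDF serialisation a Linked Data server may offer (assumed here).
TURTLE = "text/turtle"


def lookup_request(uri: str, media_type: str = TURTLE) -> Request:
    """Build an HTTP GET asking the server for a machine-readable description of `uri`."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


scotland = "http://statistics.gov.scot/id/statistical-geography/S92000003"
request = lookup_request(scotland)
# Sending it would be urllib.request.urlopen(request).read();
# that step is left out so the sketch runs without network access.
```

The same URI can then serve a web page to a browser and RDF to a program, which is what lets one identifier work for both people and machines.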
Resource Description Framework
What is RDF? Each statement is a triple: a subject, a property and an object. This is a “graph” representation of data, the same kind used in social graphs (e.g. Facebook, LinkedIn) and in all kinds of enterprise databases. Its strength is its flexibility in dealing with very diverse data, and its ability to highlight the connections between things.
What is RDF? A single triple: the subject United Kingdom is linked by the property “is a” to the object Country.
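The triple idea can be sketched with plain Python tuples. The `ex:` names are hypothetical shorthand for full URIs, and `match` is an illustrative helper in which `None` acts as a wildcard, roughly the way a variable does in a SPARQL triple pattern.

```python
from typing import Optional

# Each RDF statement is a (subject, property, object) triple.
Triple = tuple[str, str, str]

# Hypothetical "ex:" names stand in for full URIs.
graph: set[Triple] = {
    ("ex:UnitedKingdom", "rdf:type", "ex:Country"),
    ("ex:UnitedKingdom", "ex:capital", "ex:London"),
    ("ex:London", "rdf:type", "ex:City"),
}


def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples that fit a pattern, sorted; None matches anything."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


typed = match(graph, p="rdf:type")          # every statement of the form "X is a Y"
about_uk = match(graph, s="ex:UnitedKingdom")  # everything said about the UK
```

Because the graph is just a set of statements, adding a new kind of relationship needs no table redesign: it is one more triple.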
What is RDF? Statistical data can be represented in RDF too. Example: Observation123 has a death rate of 4.29, with refArea United Kingdom, refPeriod 2015, unit “number per 1000”, age range 0-5 years and indicator 3.2.1 Under-5 mortality.
What is RDF? Any data can be represented in RDF, because the ‘schema’ is part of the data. That makes it a flexible system for combining statistical data with other contextual data.
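A sketch of the under-5 mortality observation from the diagram as triples: Observation123 has a death rate of 4.29, refArea United Kingdom, refPeriod 2015, unit number per 1000, age range 0-5 years and indicator 3.2.1. The `sdmx-dimension:` and `sdmx-attribute:` property names follow the SDMX-derived terms used with the RDF Data Cube Vocabulary; every `ex:` name and the `observation_triples` helper are hypothetical.

```python
Triple = tuple[str, str, str]


def observation_triples(obs: str, measure: str, value: str,
                        dimensions: dict[str, str]) -> list[Triple]:
    """Describe one statistical observation as triples: type, measured value, dimensions."""
    triples = [(obs, "rdf:type", "qb:Observation"), (obs, measure, value)]
    triples += [(obs, prop, val) for prop, val in dimensions.items()]
    return triples


# "ex:" names are hypothetical placeholders for a publisher's own URIs.
obs = observation_triples(
    "ex:Observation123",
    "ex:deathRate",
    "4.29",
    {
        "sdmx-dimension:refArea": "ex:UnitedKingdom",
        "sdmx-dimension:refPeriod": "ex:year2015",
        "sdmx-attribute:unitMeasure": "ex:numberPer1000",
        "sdmx-dimension:age": "ex:age0to5",
        "ex:indicator": "ex:indicator-3.2.1",
    },
)

# Contextual data uses the same structure, because properties are themselves just identifiers.
context = [
    ("ex:UnitedKingdom", "rdf:type", "ex:Country"),
    ("ex:indicator-3.2.1", "rdfs:label", "Under-5 mortality"),
]
graph = set(obs) | set(context)
```

Nothing about the statistical triples had to change to admit the contextual ones; that is the flexibility the slide refers to.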
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
# The sg-measure namespace below is assumed; the original example did not show it.
@prefix sg-measure: <http://statistics.gov.scot/def/measure-properties/> .

<http://statistics.gov.scot/data/population-estimates-current-geographic-boundaries/year/2016/S92000003/age/all/sex/all/people/count>
    a <http://purl.org/linked-data/cube#Observation> ;
    sg-measure:count 5404700 ;
    sdmx-dimension:refArea <http://statistics.gov.scot/id/statistical-geography/S92000003> ;
    sdmx-attribute:unitMeasure <http://statistics.gov.scot/def/concept/measure-units/people> ;
    sdmx-dimension:refPeriod <http://reference.data.gov.uk/id/year/2016> ;
    qb:measureType sg-measure:count ;
    qb:dataSet <http://statistics.gov.scot/data/population-estimates-current-geographic-boundaries> ;
    sdmx-dimension:age <http://statistics.gov.scot/def/concept/age/all> ;
    sdmx-dimension:sex <http://statistics.gov.scot/def/concept/sex/all> .

You can link directly to this observation and get it in various machine-readable formats. You can add data markers or annotations to it, or state that it has been revised and replaced by another observation.
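The observation URI above spells its dimension values out in the path, so every data point gets its own link. This sketch rebuilds that URI from its parts; the path layout is inferred from this single example and is an assumption, not documented statistics.gov.scot behaviour. For revisions it uses the Dublin Core term `dcterms:isReplacedBy`, which is one reasonable choice; which property a given publisher actually uses is not stated here, and any replacement URI passed in is hypothetical.

```python
from urllib.parse import quote

# Base and path layout inferred from one observation URI; treat both as assumptions.
BASE = "http://statistics.gov.scot/data/"

# Dublin Core term that lets one resource point at the resource that supersedes it.
IS_REPLACED_BY = "http://purl.org/dc/terms/isReplacedBy"


def observation_uri(dataset: str, period_type: str, period: str, area: str,
                    age: str, sex: str, unit: str, measure: str) -> str:
    """Mint one URI per data point from its dimension values (percent-encoding each segment)."""
    parts = [dataset, period_type, period, area, "age", age, "sex", sex, unit, measure]
    return BASE + "/".join(quote(part, safe="") for part in parts)


def replacement_triple(old_uri: str, new_uri: str) -> tuple[str, str, str]:
    """Record that a revised observation supersedes an earlier one."""
    return (old_uri, IS_REPLACED_BY, new_uri)


uri = observation_uri("population-estimates-current-geographic-boundaries",
                      "year", "2016", "S92000003", "all", "all", "people", "count")
```

Minting URIs deterministically from dimension values means the same data point always gets the same link, which is what makes it safe for others to link to.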
What does linking enable? Connecting datasets, indicators, features of interest and data points to: other data and other features; definitions; context; provenance; annotations and feedback.
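Following links is what lets a consumer get from a single data point to its definitions and provenance. In this sketch a local list of triples stands in for HTTP lookups, and a breadth-first walk records how many hops away each linked resource is. All `ex:` names are hypothetical; `qb:dataSet`, `skos:prefLabel` and `prov:wasDerivedFrom` are real terms from the Data Cube Vocabulary, SKOS and PROV.

```python
from collections import deque

# Hypothetical triples: an observation linked to its dataset, indicator and provenance.
# In practice each hop would be an HTTP lookup of the object URI.
TRIPLES = [
    ("ex:obs1", "qb:dataSet", "ex:mortalityDataset"),
    ("ex:obs1", "ex:indicator", "ex:indicator-3.2.1"),
    ("ex:indicator-3.2.1", "skos:prefLabel", "Under-5 mortality rate"),
    ("ex:mortalityDataset", "prov:wasDerivedFrom", "ex:registrationData"),
]


def reachable(start: str, triples: list[tuple[str, str, str]], max_hops: int) -> dict[str, int]:
    """Breadth-first walk over outgoing links; returns each resource's hop distance from start."""
    links: dict[str, list[str]] = {}
    for subject, _, obj in triples:
        links.setdefault(subject, []).append(obj)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not look further than max_hops links away
        for neighbour in links.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance
```

This walk only follows outgoing links. Annotations or feedback that point *at* an observation are incoming links, so finding them needs a query (for example SPARQL) or an index rather than simple link-following.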
SPARQL: the standardised query language for RDF. It is flexible and powerful, but can be complex. Use it directly, or build simpler APIs or user interfaces on top of it.
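A sketch of a simpler API on top of SPARQL: a hypothetical `population_query` helper fills a fixed query template for one area and year, and `query_url` encodes it as a GET request in the form the SPARQL 1.1 Protocol defines (a `query` parameter). The endpoint address is an assumption. The query relies on the Data Cube pattern in the observation shown earlier, where `qb:measureType` names the property that carries the value. Without age and sex filters it returns every breakdown for that area and year.

```python
from urllib.parse import urlencode

# Assumed endpoint address; check the publisher's documentation for the real one.
ENDPOINT = "http://statistics.gov.scot/sparql"
DATASET = "http://statistics.gov.scot/data/population-estimates-current-geographic-boundaries"

# Doubled braces are literal SPARQL braces; single braces are Python placeholders.
QUERY_TEMPLATE = """PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
SELECT ?obs ?value WHERE {{
  ?obs qb:dataSet <{dataset}> ;
       sdmx-dimension:refArea <{area}> ;
       sdmx-dimension:refPeriod <{period}> ;
       qb:measureType ?measure ;
       ?measure ?value .
}}"""


def population_query(area_uri: str, year: int) -> str:
    """Fill the template for one area and one year (hypothetical simple-API helper)."""
    period = f"http://reference.data.gov.uk/id/year/{year}"
    return QUERY_TEMPLATE.format(dataset=DATASET, area=area_uri, period=period)


def query_url(query: str) -> str:
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return ENDPOINT + "?" + urlencode({"query": query})


url = query_url(population_query(
    "http://statistics.gov.scot/id/statistical-geography/S92000003", 2016))
```

A client would normally also send an `Accept` header such as `application/sparql-results+json` to get machine-readable results. Callers of `population_query` never see SPARQL, which is the point of layering a simpler API on top.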
Sustainable Development Goals. Where are we now? What is most urgent? What should we do about it? Is it working? Answering these questions needs a lot of information about context. Where are we now? For one indicator in one country, how does it look compared to the target, to other countries, to other ‘similar’ countries (similar perhaps in size, income, demographic profile, economic activities, climate…), to other indicators, and to previous years?
Challenges and data sources. Dealing with the big challenges of today: changing demographics (people living longer, but also spending more years in poor health); the changing world of work (different kinds of jobs); climate change (the real challenge for a smart city); limited environmental resources (balancing the economy, health and the natural environment). These feed into crucial business-as-usual decisions for government: where to allocate limited resources for the biggest societal benefit. Local government strategies must ask what the special constraints and opportunities are where you live and work, and how to coordinate and balance all the aspects of the community. All of these problems are complex; all depend on the interaction of many different aspects of society and many different strands of government policy. All of them need diverse sources of data. Understanding context is essential for deciding how to act on a particular piece of information.
Linked Data works behind the scenes. Its strength is in the underlying data representation and integration: automatic import of data from CSV, XML, JSON, Shapefile…; selecting, filtering and exporting data as CSV, XML, JSON, Shapefile…; and the W3C ‘Tabular Data on the Web’ standards, which pair CSV with JSON metadata.
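A much-simplified sketch of the CSV-plus-JSON-metadata idea: the metadata says how to mint a subject URI for each row (`aboutUrl`) and which property each column becomes (`propertyUrl`, plus `valueUrl` when a cell should become a link). The key names follow the W3C CSVW metadata vocabulary, but only bare `{column}` templates are handled, so this is an illustration, not a conforming CSVW processor. The file name, the `http://example.org/...` URIs and the population property are hypothetical.

```python
import csv
import io
import json
import re

# Hypothetical CSV: one row per observation.
CSV_TEXT = "area,year,population\nS92000003,2016,5404700\n"

# Hypothetical metadata using CSVW key names; only simple {column} templates are supported.
METADATA = json.loads("""
{
  "url": "population.csv",
  "tableSchema": {
    "aboutUrl": "http://example.org/obs/{area}/{year}",
    "columns": [
      {"name": "area",
       "propertyUrl": "http://purl.org/linked-data/sdmx/2009/dimension#refArea",
       "valueUrl": "http://statistics.gov.scot/id/statistical-geography/{area}"},
      {"name": "year",
       "propertyUrl": "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod",
       "valueUrl": "http://reference.data.gov.uk/id/year/{year}"},
      {"name": "population",
       "propertyUrl": "http://example.org/def/population"}
    ]
  }
}
""")


def expand(template: str, row: dict[str, str]) -> str:
    """Replace each {column} placeholder with that column's value in the row."""
    return re.sub(r"\{(\w+)\}", lambda m: row[m.group(1)], template)


def csv_to_triples(csv_text: str, metadata: dict) -> list[tuple[str, str, str]]:
    """Turn every CSV row into triples about one subject URI."""
    schema = metadata["tableSchema"]
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = expand(schema["aboutUrl"], row)
        for column in schema["columns"]:
            value = row[column["name"]]
            if "valueUrl" in column:
                value = expand(column["valueUrl"], row)  # the cell becomes a link
            triples.append((subject, column["propertyUrl"], value))
    return triples


triples = csv_to_triples(CSV_TEXT, METADATA)
```

The publisher keeps working in CSV; the metadata file carries the knowledge needed to turn each cell into a linked statement.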
Building on existing standard data models: for multidimensional statistical data, the RDF Data Cube Vocabulary (a Linked Data version of SDMX); for metadata, Dublin Core and DCAT; for provenance, PROV; for annotation, the Web Annotation ontology; for data quality, the Data Quality Vocabulary.
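To show how several of these vocabularies sit side by side, this sketch describes the statistics.gov.scot population dataset with Data Cube, DCAT, PROV and Web Annotation terms and serialises the result as N-Triples. The term URIs are the published vocabulary namespaces; the source dataset, the annotation URI and its text are hypothetical, and the `term` helper only distinguishes URIs from plain string literals (no datatypes or language tags).

```python
# Term URIs from the vocabularies listed above.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB_DATASET = "http://purl.org/linked-data/cube#DataSet"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"
OA_ANNOTATION = "http://www.w3.org/ns/oa#Annotation"
OA_HAS_TARGET = "http://www.w3.org/ns/oa#hasTarget"
OA_BODY_VALUE = "http://www.w3.org/ns/oa#bodyValue"

DATASET = "http://statistics.gov.scot/data/population-estimates-current-geographic-boundaries"
SOURCE = "http://example.org/source/1"      # hypothetical upstream source
NOTE = "http://example.org/annotation/1"    # hypothetical annotation

TRIPLES = [
    (DATASET, RDF_TYPE, QB_DATASET),
    (DATASET, RDF_TYPE, DCAT_DATASET),
    (DATASET, PROV_WAS_DERIVED_FROM, SOURCE),
    (NOTE, RDF_TYPE, OA_ANNOTATION),
    (NOTE, OA_HAS_TARGET, DATASET),
    (NOTE, OA_BODY_VALUE, "Hypothetical user feedback on this dataset"),
]


def term(value: str) -> str:
    """N-Triples form: URIs in angle brackets, anything else as an escaped string literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """One statement per line, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


document = to_ntriples(TRIPLES)
```

Because every vocabulary is just a set of URIs, statistical structure, catalogue metadata, provenance and feedback can describe the same dataset without any one standard having to cover them all.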
Making it work for SDGs: choose or create agreed identifiers and definitions for the SDG indicators; for dimensions, measures and units; and for the concept schemes that hold dimension values.
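A sketch of agreed identifiers for the SDG framework as a SKOS concept scheme: each goal, target and indicator code gets a URI, and `skos:broader` links indicator 3.2.1 to target 3.2 and target 3.2 to goal 3. The base URI and scheme URI are hypothetical (agreeing the real ones is the organisational step this slide describes), and deriving the parent by dropping the last dotted component is an assumption that fits codes such as 3.2.1.

```python
from typing import Optional

BASE = "http://example.org/sdg/"              # hypothetical base URI
SCHEME = BASE + "indicator-framework"         # hypothetical concept scheme URI
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def concept_uri(code: str) -> str:
    """Mint one URI per goal, target or indicator code."""
    return BASE + code


def parent_code(code: str) -> Optional[str]:
    """Assumed rule: drop the last dotted component (3.2.1 -> 3.2 -> 3)."""
    head, sep, _ = code.rpartition(".")
    return head if sep else None


def concept_triples(code: str, label: Optional[str] = None) -> list[tuple[str, str, str]]:
    """Describe one code as a SKOS concept, linked to its parent with skos:broader."""
    uri = concept_uri(code)
    triples = [
        (uri, RDF_TYPE, SKOS + "Concept"),
        (uri, SKOS + "inScheme", SCHEME),
        (uri, SKOS + "notation", code),
    ]
    if label is not None:
        triples.append((uri, SKOS + "prefLabel", label))
    parent = parent_code(code)
    if parent is not None:
        triples.append((uri, SKOS + "broader", concept_uri(parent)))
    return triples


indicator = concept_triples("3.2.1", "Under-5 mortality rate")
```

Any dataset that uses `http://example.org/sdg/3.2.1` (or whatever URI is actually agreed) as its indicator value can then be related to the target and goal without further mapping.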
Gartner hype cycle. Swirrl operates production systems for the Scottish Government, MHCLG and the NHS, and is working with ONS on applying this technique to sharing data across UK official statistics publishers. Eurostat is also quite active in this area.
@billroberts http://www.swirrl.com