GISCO Working Party Mirosław Migacz Chief GIS Specialist

Slides:



Advertisements
Similar presentations
Status on the Mapping of Metadata Standards
Advertisements

A multi-level metadata approach for a Public Sector Information data infrastructure Nikos Houssos 1,2, Brigitte Jörg 1,3, Brian Matthews 4 1 euroCRIS 2.
OneGeology-Europe - the first step to the European Geological SDI INSPIRE Conference 2010, Session Thematic Communities: Geology Krakow, June 24 th 2010.
Update on INSPIRE: INSPIRE maintenance and implementation and INSPIRE related EEA activities on biodiversity CDDA/European protected areas technical meeting.
Data on the Web Life Cycle Bernadette Farias Lóscio March, 2014.
North American Profile: Partnership across borders. Sharon Shin, Metadata Coordinator, Federal Geographic Data Committee Raphael Sussman; Manager, Lands.
SDMX Standards Relationships to ISO/IEC 11179/CMR Arofan Gregory Chris Nelson Joint UNECE/Eurostat/OECD workshop on statistical metadata (METIS): Geneva.
Interoperable Visualization Framework towards enhancing mapping and integration of official statistics Haitham Zeidan Palestinian Central.
Eurostat SDMX and Global Standardisation Marco Pellegrino Eurostat, Statistical Office of the European Union Bangkok,
Eurostat 4. SDMX: Main objects for data exchange 1 Raynald Palmieri Eurostat Unit B5: “Central data and metadata services” SDMX Basics course, October.
Eurostat Standardisation within the ESS: SDMX present and future Luxembourg, October 2015 Marco Pellegrino Eurostat, Statistical Office of the European.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
Toward a framework for statistical data integration Ba-Lam Do, Peb Ruswono Aryan, Tuan-Dat Trinh, Peter Wetz, Elmar Kiesling, A Min Tjoa Linked Data Lab,
Differences and distinctions: metadata types and their uses Stephen Winch Information Architecture Officer, SLIC.
Linked Open Data for European Earth Observation Products Carlo Matteo Scalzo CTO, Epistematica epistematica.
Bavarian Agency for Surveying and Geoinformation AAA - The contribution of the AdV in an increasing European Spatial Data Infrastructure - the German Way.
Linked Open Data Dataset from Related Documents Petya Osenova and Kiril Simov IICT-BAS LDL-2016, LREC, Portoroz.
Geospatial metadata Prof. Wenwen Li School of Geographical Sciences and Urban Planning 5644 Coor Hall
Implementing ModernStats Standards Linked Open Metadata
Linking Big Data from Space to Apps on Earth
Piloting unique identifiers for geospatial data
Metadata Issues in Long-term Management of Data and Metadata
European Monitoring Platform for Mapping of QoS and QoE
Prepared by: Galya STATEVA, Chief expert
Highlighting the added value of Statistical Linked Open Data
CUAHSI HIS Sharing hydrologic data
Geospatial Knowledge Base (GKB) Training Platform
Integration of INSPIRE & SDMX data infrastructures for the 2021 Census
Progress Update MSIS: Bratislava, April 2005
SDMX Information Model
SISAI STATISTICAL INFORMATION SYSTEMS ARCHITECTURE AND INTEGRATION
The Re3gistry software and the INSPIRE Registry
Linked Data for SDG Reporting
Geographical Information Systems for Statistics Luxembourg March 2008
Eurostat activities update
Generic Statistical Business Process Model (GSBPM)
Application of Dublin Core and XML/RDF standards in the KIKERES
Working Group on regional, urban and rural development statistics
11. The future of SDMX Introducing the SDMX Roadmap 2020
Indicator structure and common elements for information flow
Demography, GISCO and Regional Statistics
Enabling direct data access to social science research data
How can DDI make the most of RDF?
2. An overview of SDMX (What is SDMX? Part I)
The Data Cube Vocabulary: Deploying SDMX as RDF from Existing Systems
Management of NUTS change in Poland
Working Party on Regional Statistics 1-2 October 2012
2. An overview of SDMX (What is SDMX? Part I)
Session 2: Metadata and Catalogues
SDMX Information Model: An Introduction
LOD reference architecture
Developing a Data Model
Rural development statistics
Working Group on Population and Housing Censuses
SDMX Tools Overview and architecture
RAMON Re-engineering An Update
ESS VIP ICT Project Task Force Meeting 5-6 March 2013.
Proposal of a Geographic Metadata Profile for WISE
The Next Generation of the Microdata Information System MISSY: An Integrated Solution for the Documentation of European Microdata European DDI User Conference,
TOOLS & Projects overview
The role of metadata in census data dissemination
NTTS 2019 Conference / Brussels / Belgium
Access to Base Registries ISA2 action
Item 9.6 Mainstreaming geo-statistics
Introduction to reference metadata and quality reporting
The Role of Metadata in Census Data Dissemination
Geo-enabling the SDG indicators – experiences from the UN Global Geospatial Management and the GEOSTAT 3 project Agenda item 12 Ekkehard PETRI – Eurostat,
Classifications and Linked Open Data Formalizing the structure and content of statistical classifications Item 9.1 Standards Working Group Luxembourg,
SDMX Global Conference Francesco Rizzo – ISTAT, Italy
Presentation transcript:

Publishing georeferenced statistical data using linked open data technologies GISCO Working Party Mirosław Migacz Chief GIS Specialist Statistics Poland 17.04.2018 Luxembourg

The project Title: „Development of guidelines for publishing statistical data as linked open data” „Merging statistics and geospatial information” grant series 2016 – 2017 main goal: prepare a background for LOD implementation in official statistics

Before 3218 4.4.32.64.18 powiat łobeski (LAU 1) lobeski 4326418

After powiat łobeski http:// nts.stat.gov.pl/4/4/32/64/18

Specific objectives identify data sources identify statistical units harmonize, generalize and build URIs for statistical units transform statistical data, geospatial data and metadata into RDF (pilot) conclude the pilot transformation and fomulate recommendations for a full-on implementation

Development monitoring system STRATEG Primary data sources Local Data Bank biggest set of statistical information available for a wide range of years updated monthly Demography Database integrated data source for state and structure of population, vital statistics and migrations Development monitoring system STRATEG a system for facilitating and monitoring the development policy key measures to monitor execution of strategies at local, regional, transregional and EU level.

Identification of data sources Other data sources: publications tables communiques announcements articles

Data sources - inventory Metadata: thematic category, format (PDF, DOC, XLS, CSV), spatial reference (country, NUTS, LAU, functional areas, urban areas), temporal reference (years) presence of identifiers (TERYT, NTS, NUTS) update cycle Preliminary analysis of data sources: openness redundance of information popularity (based on view / download stats)

Statistical units inventory administrative boundaries: administrative units NUTS Non-standard statistical units: functional areas / urban areas Groups of administrative / statistical units Derive mostly from strategic documents gmina (LAU 2) powiat (LAU 1) subregion (NUTS 3) region (NUTS 2) voivodship macroregion (NUTS 1) NUTS ADMINISTRATIVE

Statistical units harmonization – KTS KTS – classification combining administrative and statistical units introduced last year to comply with NUTS 2016 14-digit code symbol name 10000000000000 Poland 10020000000000 macroregion 10023200000000 voivodship 10023210000000 region 10023216400000 subregion 10023216418000 powiat 10023216418053 gmina

Geometry harmonization/generalization Input data: administrative boundaries since 2002 for LAU 2 (gmina), excluding 2007 Harmonization process: structure standardization standardization of identifiers (creating KTS identifiers) aggregation to higher level units (LAU 1 -> NUTS 1) Generalization: several generalization scenarios tested for purposes of choosing an optimal one datasets with generalized and non-generalized geometries prepared for 2002-2016

data Linked open data pilot statistical data geospatial data demographic classifications geospatial data statistical unit geometries data sources catalogue metadata

LOD pilot – statistical data demographic data for 2016 from three major databases (Local Data Bank, Demography Database, STRATEG system), ontologies for classifications: age codelist defined using SKOS (skos) & Dublin Core (dct), sex codelist re-used from SDMX, added Polish translation, definining metadata for statistical values (observations): based primarily on SDMX ontologies (attribute, code, measure, dimension), qb:Observation class from Data Cube.

LOD pilot – geospatial data input geometries: voivodship geometries for 2016, ontologies: ontology for the KTS classification defined using RDF Schema (rdfs) & GeoSPARQL (geo) vocabularies, geometry encoding: separate geo:Geometry entities with geometry encoded in WKT (Well Known Text) format (geo:wktLiteral).

LOD pilot – data sources catalogue DCAT-AP (dcat) application profile for data portals in Europe, data sources as dcat:Dataset classes, links to other vocabularies: EuroVoc (for thematic categories), EU Publication Office continent / country codelist (for spatial reference) Internet Media Type (MIME)

spatial domain for datasets LOD pilot – linking dataset catalogue statistical data geospatial data spatial domain for datasets dataset definitions for statistical data geometries for observations

Data transformation into RDF 1. Source files in CSV

Data transformation into RDF 2. Python script using RDFlib module for transformation:

Data transformation into RDF 3a. Results in any desired format (RDF-XML):

Data transformation into RDF 3b. Results in any desired format (Turtle):

LOD pilot – triple store Apache Jena Fuseki used as a SPARQL server, 71717 triples loaded, single Fuseki dataset (STAT_LOD) to allow cross- querying and cross-browsing data created initially in separate files SPARQL endpoint for querying

LOD pilot – SPARQL endpoint

LOD pilot – Pubby frontend (catalogue)

LOD pilot – Pubby frontend (dataset)

LOD pilot – Pubby frontend (value)

LOD pilot – Pubby frontend (geometry)

LOD pilot – conclusions No reference implementation for statistical linked open data: lack of integrity between RDF metadata sets published by one authority, links to non-existing entities, lack of maintenance, Lack of pan-European guidelines for statistical linked open data: common vocabularies, recommended or dedicated software components, DIGICOM ESSNet LOD project.

LOD pilot – conclusions Some software / programming components not being developed anymore, implementations might become unstable, Python-based implementation seem sustainable at this point, Semantic harmonization of statistical classifications: different meanings for supposedly the same classification elements, e.g. 0-5 can be “0 to 5” or “0 to less than five”, not only a pan-European issue, may exist at country level,

LOD pilot – conclusions Methodology for publishing spatial data as linked open data: single entity per single geometry: inventory of boundary changes, geometry instances with non-meaningful identifiers (UUIDs), separate geometries for respective years: a complete set of geometries each year, regardless of changes, geometry instances with meaningful identifiers (KTS + year).

LOD pilot – conclusions Most linked open data implementations are technically correct: it is nearly impossible to produce incorrect RDF metadata files, you can put anything in the RDF graph, but does it make sense semantically? Linked open data implementations based on Python scripts are easy to amend in the future, RDF vocabulary specifications are easier to interpret with a UML model provided (Thank you, Captain Obvious )

Publishing georeferenced statistical data using linked open data technologies GISCO Working Party Mirosław Migacz www.linkedin.com/in/migacz m.migacz@stat.gov.pl