Linked Environment Data and how we are implementing SEIS Søren Roug
The current situation: Find dataset. Download it. Import it. Clean it. Create chart.
Vision statement: Too much manual work. We want to eliminate all steps but the last! And we’re going to use Linked Data technology to do it.
Solution to the data format problem: In addition to the HTML for human eyes, we’re asking for a new format called RDF that machines can understand. It is a modernisation of CSV, Excel and all the other data dump formats. This is all we ask a producer to provide... and some metadata. No Web Services, just files.
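To make the "modernisation of CSV" idea concrete, here is a minimal sketch of how a producer might turn a CSV dump into RDF in the line-based N-Triples syntax. The namespaces `BASE` and `VOCAB`, the sample columns and the function names are assumptions for illustration, not part of any EEA tooling.

```python
import csv
import io
import urllib.parse

# Hypothetical namespaces; a real producer would use its own stable URLs.
BASE = "http://example.org/stations/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value):
    """Escape a string for use inside a double-quoted N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """Turn each non-empty CSV cell into one triple: <row URL> <column URL> "value" ."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The identifier column becomes the subject URL of the row.
        subject = "<" + BASE + urllib.parse.quote(row[id_column], safe="") + ">"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the id itself and empty cells
            predicate = "<" + VOCAB + urllib.parse.quote(column, safe="") + ">"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up sample data, for illustration only.
SAMPLE = "id,name,country\nS1,Station One,DK\nS2,Station Two,SE\n"

for line in csv_to_ntriples(SAMPLE, "id"):
    print(line)
```

Every cell becomes a self-describing statement, so a consumer no longer has to guess what a column header means: the column is a URL that can itself be looked up.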
No more searching on foreign sites: The remote nodes provide lists of their datasets, called manifests or semantic sitemaps, also in RDF format. Controlled vocabulary URLs in metadata. Use any identifier; we create equivalence links between them.
How to create equivalence links: We set up correspondence tables between the URLs. This is called an ontology. Some RDF databases handle ontologies transparently: when you use one identifier, you get the data for the other too.
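What "handling equivalence transparently" could mean in practice is sketched below with a small union-find index over a correspondence table. The URLs, the facts and the names `EquivalenceIndex` and `describe` are hypothetical; an actual triple store would do this internally and likely differently.

```python
# Hypothetical identifiers and facts, for illustration only.
EUNIS_URL = "http://example.org/eunis/species/42"
ITIS_URL = "http://example.org/itis/taxon/7"

EQUIVALENCES = [(EUNIS_URL, ITIS_URL)]  # the correspondence table
FACTS = [
    (EUNIS_URL, "binomialName", "Lutra lutra"),
    (ITIS_URL, "nameUsage", "valid"),
]


class EquivalenceIndex:
    """Union-find over URLs: equivalent URLs share one representative."""

    def __init__(self, pairs):
        self.parent = {}
        for a, b in pairs:
            self.link(a, b)

    def find(self, url):
        self.parent.setdefault(url, url)
        root = url
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited URL straight at the root.
        while self.parent[url] != root:
            self.parent[url], url = root, self.parent[url]
        return root

    def link(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def describe(url, index, facts):
    """Return all facts recorded under any identifier equivalent to url."""
    root = index.find(url)
    return [(p, o) for s, p, o in facts if index.find(s) == root]


index = EquivalenceIndex(EQUIVALENCES)
print(describe(EUNIS_URL, index, FACTS))
print(describe(ITIS_URL, index, FACTS))
```

Both calls return the same two facts, which is the behaviour the slide describes: ask with either identifier and get the data for both.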
Remember this?
Now we can make the join
Downloading made easy! Click on the title to see if it is in the database
Downloading made easy Seconds later...
Status: EEA has deployed two triple stores, called Content Registry and Semantic Data Service, that import all lists and all data. Content Registry is for Reportnet deliveries. Semantic Data Service is for published datasets. We have created RDF of several data sets: Reportnet, GEMET, EUNIS, ITIS, NUTS, NACE etc. We can also load Eurostat SDMX data via the LATC project.
SDS and CR’s role [Diagram. Harvesting: ITIS, Reportnet, PRTR, EUNIS, other... into the Content Registry. Querying: SPARQL, JSON, RDF, XML, other. Further labels: visualisation, EUNIS, Reportnet QA system.]
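Querying a triple store over SPARQL usually means an HTTP request as defined by the SPARQL Protocol. A minimal sketch with the Python standard library follows; the endpoint URL is a placeholder assumption, not the address of the Content Registry or the Semantic Data Service, and `build_request` and `run_query` are illustrative names.

```python
import json
import urllib.parse
import urllib.request

# Assumed placeholder; substitute the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint, query):
    """Send the query and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return json.load(resp)


# Only the request is built here; nothing is sent over the network.
request = build_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

The `Accept` header is how the client chooses between result formats such as JSON and XML, assuming the endpoint supports content negotiation.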
Queries
Comparing data: Where do EUNIS and ITIS not agree on naming?
    PREFIX e:
    PREFIX itis:
    PREFIX dwc:
    SELECT ?eunisname ?eunisauthor ?itisname ?itisauthor ?usage
    WHERE {
      ?eunisurl e:validName 1;
                e:sameSynonym ?itisurl;
                e:binomialName ?eunisname;
                dwc:scientificNameAuthorship ?eunisauthor.
      ?itisurl itis:nameUsage "invalid",?usage;
               itis:completename ?itisname;
               itis:hasAuthor ?auurl.
      ?auurl itis:shortAuthor ?itisauthor
    }
Results
    eunisname                 eunisauthor      itisname                  itisauthor     usage
    Chondrocladia alaskensis  Lambe,1900       Chondrocladia alaskensis  Lambe 1895     invalid
    Myxilla parasitica        (Lambe,1900)     Myxilla parasitica        Lambe 1893     invalid
    Hymedesmia primitiva      Lundbeck,1910    Hymedesmia primitiva      Lundbeck 1910  invalid
    Asbestopluma lycopodium   (Levinsen,1886)  Asbestopluma lycopodium   Levinsen 1886  invalid
    Esperiopsis rigida        Lambe,1900       Esperiopsis rigida        Lambe 1893     invalid
    Cordylophora lacustris    Allman, 1844     Cordylophora lacustris    Allman 1844    invalid
Example of SPARQL query: Future prospects for the European otter (from Reportnet)
    PREFIX art17:
    PREFIX eea:
    SELECT ?country ?region ?future
    WHERE {
      [] art17:forSpecies ;
         art17:hasRegionalReport ?report.
      ?report art17:conclusion_future ?future;
              art17:forCountry ?curl;
              art17:region ?bgregion.
      ?bgregion eea:name ?region.
      ?curl eea:name ?country
    }
    ORDER BY ?country ?region
Result: Future of the European otter
    country         region       future
    Austria         Alpine       Inadequate (U1)
    Austria         Continental  Inadequate (U1)
    Belgium         Atlantic     Bad (U2)
    Belgium         Continental  Bad but improving (U2+)
    Czech Republic  Continental  Favourable (FV)
    Czech Republic  Pannonian    Favourable (FV)
    Estonia         Boreal       Favourable (FV)
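A result table like this one typically reaches a client program in the standard SPARQL JSON results layout. The sketch below flattens that layout into plain rows. The sample is hand-written from two rows of the table above, and it assumes the values come back as plain literals; `to_rows` is an illustrative name.

```python
import json

# Hand-written sample in the SPARQL JSON results layout (assumed plain literals).
SAMPLE = """
{"head": {"vars": ["country", "region", "future"]},
 "results": {"bindings": [
   {"country": {"type": "literal", "value": "Austria"},
    "region": {"type": "literal", "value": "Alpine"},
    "future": {"type": "literal", "value": "Inadequate (U1)"}},
   {"country": {"type": "literal", "value": "Belgium"},
    "region": {"type": "literal", "value": "Atlantic"},
    "future": {"type": "literal", "value": "Bad (U2)"}}
 ]}}
"""


def to_rows(results):
    """Flatten SPARQL JSON results into one dict per row, keyed by variable name."""
    variables = results["head"]["vars"]
    return [
        # A variable may be unbound in a given row; represent that as None.
        {var: (binding[var]["value"] if var in binding else None)
         for var in variables}
        for binding in results["results"]["bindings"]
    ]


for row in to_rows(json.loads(SAMPLE)):
    print(row["country"], row["region"], row["future"], sep=" | ")
```

Rows in this shape can be handed directly to a charting or table component, which is the "create chart" step the vision statement wants to keep.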
Queries on EUNIS
Visualisations
Water use per NUTS level 2 in 2007: Top 20. Combination of two Eurostat SDMX datasets.
Linked Data in map views
GHG per capita
Søren Roug European Environment Agency