Web Standards and Technical Challenges for Publishing and Processing Open Data Axel Polleres web:

Presentation transcript:


Outline
1. Open Data != Big Data... What is Open Data?
2. What is Linked (Open) Data?
3. Why do standards matter?
4. Challenges in Consuming Open Data

What is Open Data?
• Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
• Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets. The data must be machine-readable.
• Universal Participation: everyone must be able to use, reuse and redistribute; there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
See more at: Open Knowledge Foundation

Open Data vs. Big Data

Open Data Providers & Motivations, examples:
• “Bottom-up”: UN, World Bank, Wikipedia, cities, governments
• “Top-down”: e.g. EU INSPIRE directive, PSI directive, Eurostat, EEA, …
EU directives:
• Directive 2003/4/EC: Public Access to Environmental Information
• Directive 2007/2/EC: INSPIRE
• Directive 2003/98/EC: PSI Directive

Example Open Data Sources: it’s not only governmental data… but also user-generated content! E.g.:
• Structured information on most cities and points of interest in the world (location, population, economy, weather, climate, ...)
• Free GIS data for most countries & cities in the world (base information: area, land-use, administrative districts, …)
• Open Government Data

Domains and Types of Data:

Open Data Portals
• CKAN... almost a „de facto“ standard for Open Data Portals
• Facilitates search and metadata (publisher, format, publication date, license, etc.) for datasets
• Machine-processable? Partially.

Still... Challenges regarding machine-readability:
• Missing/wrong metadata
• Related datasets are not linked
• Searching for the right dataset is difficult
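The "missing metadata" problem can be illustrated with a minimal sketch. The field names below are assumptions that mirror the metadata listed on the portal slide (publisher, format, publication date, license); they are not the actual CKAN schema, and the record is invented for illustration.

```python
from typing import List

# Hypothetical required fields, mirroring the metadata named on the slide.
# These are NOT the official CKAN field names.
REQUIRED_FIELDS = ("publisher", "format", "publication_date", "license")


def missing_metadata(dataset: dict) -> List[str]:
    """Return the required fields that are absent or empty in a dataset record."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(dataset.get(field) or "").strip()
    ]


# Illustrative record, not taken from a real portal.
record = {"publisher": "Example City", "format": "CSV", "license": ""}
print(missing_metadata(record))
```

A check like this only detects missing values; wrong metadata (for instance a declared format that does not match the file) would need content inspection.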

Standards to the rescue: Towards more machine-processable Data publishing: Linked Data!

Data on the Web: the Web is not only a place for documents!
• Most Web pages are created dynamically... from Data
• Data from user-generated content...
• Data from public administration...
• Data from companies...
• In the course of the trend for „Open Data“ a lot of this Data is being published directly on the Web, but rarely interlinked

The Web 1989…
“This proposal concerns the management of general information about accelerators and experiments at CERN […] based on a distributed hypertext system.”
• Globally unique identifiers (URIs)
• Links between documents (href), e.g. “I work here”
• A common protocol (HTTP)

The Web of Data…
Web of documents:
• Globally unique identifiers (URIs)
• Links between documents (href), e.g. “I work here”
• A common protocol (HTTP)
Web of Data:
• Globally unique identifiers (URIs)
• Typed links between entities (RDF)
• A common protocol (HTTP)
Example: polleres.net#me (Person) xmlns.com/foaf/0.1/workplaceHomepage wu.ac.at (University)
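The typed link in the example can be written down as a single RDF triple. The following is a minimal sketch in plain Python, not a full RDF library: it only handles triples whose three terms are URIs. The `http://` schemes on the subject and object are an assumption, since the transcript shows the identifiers without a scheme; `workplaceHomepage` is the FOAF property name.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate (the link type), object."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose three terms are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


FOAF = "http://xmlns.com/foaf/0.1/"

# Assumption: "http://" prefixes are added here; the slide omits the scheme.
works_at = Triple(
    subject="http://polleres.net#me",
    predicate=FOAF + "workplaceHomepage",
    obj="http://wu.ac.at",
)
print(to_ntriples(works_at))
```

Compared with a plain href, the predicate URI carries the meaning of the link, which is what makes it machine-processable.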

What is the idea of Linked Data?
• Standards to publish data on the Web
• Machine-readable
• Machine-processable
• Make data interlinked, just as Web pages are!

Linked Data on the Web: Adoption
[Figure: snapshots from March 2008, March 2009, July 2009, Sep, Sep] Image from:

Linked Data is moving from academia to industry

In the last few years, we have seen many successes, e.g. … Knowledge Graph, Watson

Google Knowledge Graph

5-Star Schema for Open Data:
• Still, full Linked Data might be asking „too much“ of Open Data providers...
★ Make data/documents available on the Web
★★ Make it available as structured data (e.g., an Excel sheet instead of an image scan of a table)
★★★ Use a non-proprietary format (e.g., a CSV file instead of an Excel sheet)
★★★★ Use a linked data format (i.e., URIs to identify things, and RDF to represent data)
★★★★★ Link your data to other people’s data to provide context
Source:
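The cumulative nature of the star scheme (each level presupposes the ones below it) can be sketched as a small function. The parameter names are hypothetical labels for the five criteria; the scheme itself does not define any such API.

```python
def star_rating(
    on_web: bool,
    structured: bool,
    non_proprietary: bool,
    uses_uris_and_rdf: bool,
    linked_to_other_data: bool,
) -> int:
    """Count how many consecutive criteria are met, starting from one star.

    A criterion only counts if all lower ones are met, so a dataset that
    links to other data but uses a proprietary format still stops at two stars.
    """
    criteria = (
        on_web,
        structured,
        non_proprietary,
        uses_uris_and_rdf,
        linked_to_other_data,
    )
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file published on the Web: three stars.
print(star_rating(True, True, True, False, False))
# An Excel sheet published on the Web: two stars.
print(star_rating(True, True, False, False, False))
```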

Open Data Trends, Future & Challenges
• Open Data: typically very liberal licenses (variants of CC), but still mixed
• Many formats, varying quality; harmonization is starting
• Published mostly by online communities or public bodies (cities, communities, governments, UN, …)
• Currently it is mostly SMEs that take advantage of this data
• vs. publicly available data: e.g. NYT is public but not free/not license-free
• vs. Enterprise (Linked) Data
(Directive 2007/2/EC, INSPIRE)

Open Data Status:
• Mostly 3-star Open Data...
• ... RDF and Linked Data are starting to be adopted by Open Government Data.
• Some exceptions: US, UK, EU

Open Government Data Austria:
• Mostly 3-star
• Various interesting aspects:
• Standard metadata catalog
• „Grass-roots“ effort by various public bodies (as opposed to e.g. UK)
• Parallel (non-government) Open Data platform underway
• Unique license
• Community meetings („BarCamps“)
• E.g. transformation to 4/5-star discussed
The portal just won the UN Public Service Award 2014!

Can Open Data be used by industry?
• Use Case: Building an Open City Data Pipeline...

City Data Pipeline: Overview
Dynamic calculation of KPIs at variable granularity (city, district, neighbourhood, building):
1. Periodic data gathering from registered sources (“Focused Crawler”): various formats (CSV, HTML, XML, …) and granularities (monthly, annual, daily)
2. Semantic integration: unified data model, data consolidation
3. Analysis/statistical correlation/aggregation: statistical methods, semantic technologies, constraints
Extensible CityData Model. Cities + Open Data: Berlin, Vienna, London, … (e.g. Donaustadt, Aspern)
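The integration and aggregation steps can be sketched as follows. This is an illustrative toy, not the pipeline described in the talk: the tuple layout, the indicator name `co2_tonnes`, and all numbers are invented, and summing over periods is only appropriate for additive indicators such as emissions (not for stock values such as population).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed unified data model: (city, period, indicator, value), where the
# period is 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' depending on the source.
Observation = Tuple[str, str, str, float]


def to_year(period: str) -> int:
    """Map an annual, monthly or daily period string to its calendar year."""
    return int(period[:4])


def aggregate_annual(
    observations: Iterable[Observation],
) -> Dict[Tuple[str, int, str], float]:
    """Sum values per (city, year, indicator), regardless of source granularity."""
    totals: Dict[Tuple[str, int, str], float] = defaultdict(float)
    for city, period, indicator, value in observations:
        totals[(city, to_year(period), indicator)] += value
    return dict(totals)


# Invented example values: one source reports monthly, the other annually.
observations = [
    ("Vienna", "2012-01", "co2_tonnes", 10.0),
    ("Vienna", "2012-02", "co2_tonnes", 12.0),
    ("Berlin", "2012", "co2_tonnes", 30.0),
]
kpis = aggregate_annual(observations)
print(kpis)
```

Bringing every source to a common temporal granularity first is what makes the resulting KPIs comparable across cities.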

Collected Data vs. Green City Index Data: Overlaps
• We identified 20 quantitative raw data indicators that overlap between Siemens’ “Green City Index” (GCI) and our current data sources. The picture below visualizes the availability of data for these indicators for the cities of the European GCI: more than 65% of the raw data could be covered by publicly available data that we collected automatically.
Data quality?
• Not all indicators are 100% comparable (different scales, units, etc.; sources of different quality)
• For some indicators (e.g. Population), the median error is already below 2%.
• The more data we collect, the better the quality!
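One way to quantify such a "median error" is the median of the relative deviations between collected values and a reference source. The talk does not state which definition was used, so the function below is an assumed definition, and the numbers are invented for illustration.

```python
from statistics import median
from typing import Dict


def median_relative_error(
    collected: Dict[str, float], reference: Dict[str, float]
) -> float:
    """Median of |collected - reference| / |reference| over shared keys.

    Keys missing from either side, and reference values of zero, are skipped.
    """
    errors = [
        abs(collected[key] - reference[key]) / abs(reference[key])
        for key in collected.keys() & reference.keys()
        if reference[key] != 0
    ]
    if not errors:
        raise ValueError("no comparable values")
    return median(errors)


# Invented values for three hypothetical cities.
collected = {"A": 102.0, "B": 99.0, "C": 100.0}
reference = {"A": 100.0, "B": 100.0, "C": 100.0}
print(median_relative_error(collected, reference))
```

The median is less sensitive than the mean to a few badly mismatched sources, which matters when sources differ in quality.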

City Data Pipeline: Web Interface (7 September 2012)
• Our Web interface allows users to browse data and download complex composed KPIs as Excel sheets (e.g. “Transport related CO2 emissions for Berlin”)
• Browse available Open Data sources that contain the requested indicators

Challenges & Lessons Learnt: Is Open Data fit for industry?
Base assumption (for our use case): added value comes from comparable Open datasets being combined.

Incomplete Data: can be partially overcome
• By ontological reasoning (RDF & OWL) = formalizing "background knowledge"
• By statistical methods and data mining, e.g. multi-dimensional matrix decomposition
Incomparable Data: e.g. dbpedia:populationTotal vs. dbpedia:populationCensus
• Heterogeneity across Open Government Data efforts:
• Different indicators, different temporal and spatial granularity
• Different licenses of Open Data: e.g. CC-BY, country-specific licences, etc.
• Heterogeneous formats (CSV != CSV)... (Maybe the W3C CSV on the Web WG will solve this issue.)
• Open Data needs strong standards to be useful
• Gaining knowledge from Open Data has high potential, but still needs research!
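The "CSV != CSV" point can be made concrete with two files that carry the same kind of information but differ in delimiter and decimal mark. The sketch below assumes the consumer already knows each source's conventions and passes them in explicitly; the `CsvProfile` class, the column names, and the sample data are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CsvProfile:
    """Per-source parsing conventions that plain CSV does not declare itself."""
    delimiter: str
    decimal_mark: str


def read_population(text: str, profile: CsvProfile) -> Dict[str, float]:
    """Parse 'city' and 'population_millions' columns using the given profile."""
    reader = csv.DictReader(io.StringIO(text), delimiter=profile.delimiter)
    result = {}
    for row in reader:
        raw = row["population_millions"].replace(profile.decimal_mark, ".")
        result[row["city"]] = float(raw)
    return result


# Invented sample data: comma-separated with decimal point vs.
# semicolon-separated with decimal comma.
SOURCE_A = "city,population_millions\nA,1.7\n"
SOURCE_B = "city;population_millions\nB;3,5\n"

merged = {
    **read_population(SOURCE_A, CsvProfile(delimiter=",", decimal_mark=".")),
    **read_population(SOURCE_B, CsvProfile(delimiter=";", decimal_mark=",")),
}
print(merged)
```

Without such out-of-band knowledge the second file would be misread, which is why machine-readable descriptions of tabular data are a standardisation topic.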

Open Data vs. Big Data
1) Aggregated Open Data from various, heterogeneous sources and different portals will potentially become "Big Data" over time
2) Serving Open Data "at scale" might become a challenge the more Open Data is being used!
We need big data technologies to avoid creating yet another data graveyard.

EU is pushing Linked Data Standards

Recent Activities in Standardisation: W3C
• W3C Data Activity launched (December 2013!!!)
• Data on the Web Best Practices Group
• CSV on the Web Group
• Provenance WG (PROV)
• Government Linked Data Group
• etc....
Also just founded: a data quality working group!

Open your data!
• A "sister" portal for non-governmental open data is launching soon. http://data.gv.at 1 July
Thank you!