Metadata Quality: Learning from Open Data Portalwatch

Slides:



Advertisements
Similar presentations
A multi-level metadata approach for a Public Sector Information data infrastructure Nikos Houssos 1,2, Brigitte Jörg 1,3, Brian Matthews 4 1 euroCRIS 2.
Advertisements

(1) Standardizing for Open Data Ivan Herman, W3C Open Data Week Marseille, France, June Slides at:
Web Standards and Technical Challenges for Publishing and Processing Data on the Web Axel Polleres web:
Do not use fonts other than Arial for your presentations ‘From A2A to Web 3.0’: local authority archives and the challenges in working across sectors in.
Provenance in Open Distributed Information Systems Syed Imran Jami PhD Candidate FAST-NU.
OneGeology-Europe - the first step to the European Geological SDI INSPIRE Conference 2010, Session Thematic Communities: Geology Krakow, June 24 th 2010.
Ocean data dissemination Jon Blower, University of Reading, UK Steve Hankin, Bob Keeley, Sylvie Pouliquen, Jeff de la Beaujardière, Edward Vanden Berghe,
Issues and challenges Stakeholder workshop, 29 Jan 2003.
ÆKOS: A new paradigm for discovery and access to complex ecological data David Turner, Paul Chinnick, Andrew Graham, Matt Schneider, Craig Walker Logos.
LINKED DATA AS A SERVICE WITH THE INFORMATION WORKBENCH SEMTECHBIZ San Francisco 2012 Peter Haase fluid Operations AG.
Connecting Repositories Zdenek Zdrahal Knowledge Media Institute The Open University, UK UNESCO, Paris, 26 February 2013.
How can you use Open Data? ... And why you should!
Semantic Web outlook and trends May The Past 24 Odd Years 1984 Lenat’s Cyc vision 1989 TBL’s Web vision 1991 DARPA Knowledge Sharing Effort 1996.
Michalis Vafopoulos NTUA, GFOSS & The transformers GREEN CITY HACKATHON.
Web Standards and Technical Challenges for Publishing and Processing Open Data Axel Polleres web:
First they have to find it: Getting Government Data Discovered and Used Adapted from: John S. Erickson, Ph.D. Tetherless World Constellation Rensselaer.
Meta Tagging / Metadata Lindsay Berard Assisted by: Li Li.
Bridging Communities and Data with ArcGIS Open Data Courtney Claessens, Product Engineer Daniel Fenton, Product Engineer.
Boris Villazón-Terrazas, Ghislain Atemezing FI, UPM, EURECOM, Introduction to Linked Data.
Publications Office Metadata Registry (MDR) INSPIRE Registry and Registers Workshop Willem van Gemert Publications Office of the EU Dissemniation and Reuse.
Data Integration Hanna Zhong Department of Computer Science University of Illinois, Urbana-Champaign 11/12/2009.
Eurostat SDMX and Global Standardisation Marco Pellegrino Eurostat, Statistical Office of the European Union Bangkok,
Find Research Data b2find.eudat.eu B2FIND User Training How to find data objects and collections using EUDAT’s B2FIND This work is licensed.
DANIELA KOLAROVA INSTITUTE OF INFORMATION TECHNOLOGIES, BAS Multimedia Semantics and the Semantic Web.
Introduction to the VO ESAVO ESA/ESAC – Madrid, Spain.
Open Science (publishing) as-a-Service Paolo Manghi (OpenAIRE infrastructure) Institute of Information Science and Technologies Italian Research Council.
Linked Open Data for European Earth Observation Products Carlo Matteo Scalzo CTO, Epistematica epistematica.
BG 5+6 How do we get to the Ideal World? Tuesday afternoon What gaps, challenges, obstacles prevent us from attaining the vision now? What new research.
LINKED DATA what you need to know to understand, produce, and work with Linked Data Robert Chavez, PhD. Senior Content Solutions Architect, NEJMGroup NETSL.
UNEP Live. What is UNEP Live? - An on-line knowledge management platform - Focuses on open access to global, regional and national data and knowledge.
Online Information and Education Conference 2004, Bangkok Dr. Britta Woldering, German National Library Metadata development in The European Library.
GCI Architecture GEOSS Information System Meeting 20 September 2013, ESA/ESRIN (Frascati, Italy) M.Albani (ESA), D.Nebert (USGS/FGDC), S.Nativi (CNR)
Linking Big Data from Space to Apps on Earth
Linked Open Data Approaches within the ARIADNE project
<Panel: The Art & Science of Data Visualization>
Activities in a nutshell
Sebastian Neumaier Advisor: Univ.Prof. Dr. Axel Polleres Co-Advisor:
Designing a better future: Active, actionable DMPs
Department of Geography Jeon-Young Kang · Yi Yang
Justin Buck OceanSITES data Incentives for participation: Data citation & data services Justin Buck
14.00 – The common EURES IT platform & the mapping process - workshop Martin Le Vrang, DG EMPL Kornelia Kozovska, DG EMPL Zoltan Patkai, DG EMPL.
Integrating Data for Archaeology
Flanders Marine Institute (VLIZ)
Kadaster Data Platform
ACS 2016 Moving research forward with persistent identifiers
Yesterday in a talk this slide was presented.
Lifting Data Portals to the Web of Data
MIG-T relevant activities in the ELISE ISA2 Action
OpenAIRE Services for Open Science
Axel Polleres Technical aspects vs. Innovation challenges of Enabling and Enhancing Privacy Axel Polleres
Digital transformation of democracy?
Spatial Data on the Web :
Building an Open Knowledge Graphs for and from Open Data
<Panel: The Art & Science of Data Visualization>
Linked Data for SDG Reporting
EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal
Geospatial Data Use and sharing Concepts
How can DDI make the most of RDF?
The Global Digital Library will increase the availability of high quality learning resources in underserved languages worldwide.
Google Dataset Search Evaluation
LOD reference architecture
The concept.
W3C Recommendation 17 December 2013 徐江
MSDI training courses feedback MSDIWG10 March 2019 Busan
TOOLS & Projects overview
September 12-14, 2018 Raleigh, NC.
Journal of Web Semantics 55 (2019)
Linked Data Ryan McAlister.
Data & Software Support – Why you probably don’t know us!
Australian and New Zealand Metadata Working Group
Presentation transcript:

Metadata Quality: Learning from Open Data Portalwatch Axel Polleres twitter: @AxelPolleres web: polleres.net web: http://polleres.net twitter: @AxelPolleres

Linked Data SPARQL Open Data 2 Latest release 04-30-2018- 1184 Datasets

Linked Data But: Open Data is more than Linked Open Data… 3 Latest release 04-30-2018- 1184 Datasets

Open Data is a Global Trend! EU & Austria, but also the (previous) US and UK administration are/were pushing Open Data! DIRECTIVE 2007/2/EC INSPIRE 4

A lot of Open Data is not Linked Data Cities, International Organizations, National and European Portals, Int'l. Conferences: We are aware of currently 260 active such portals worldwide. 5

Different portals…

What’s the problem(s)? Metadata is heterogeneous and (partially) messy Software-specific metadata (CKAN vs Socrata vs …) Portal-specific metadata Missing metadata (file formats, API descriptions, …) Metadata not available as Linked Data Only partially in DCAT vocabulary No mappings for additional metadata fields Poor discoverability of datasets No content information in metadata (e.g., CSV headers) Datasets’ metadata not optimized for search engines (Meta-)data becomes stale/offline/outdated Metadata still in silos -> bring it to the Web Different portals use different metadata Data sources become offline or stale Meta-data doesn’t reflect reality Data is hardly searchable

Open Data Portal Software CKAN ... http://ckan.org/ almost „de facto“ standard for Open Data Portals facilitates search, metadata (publisher, format, publication date, license, etc.) for datasets http://data.gov/ http://data.gv.at/ machine-processable? ... ... partially

Our solution: Open Data Portalwatch Monitoring Metadata quality Mapping to standard vocabularies Enriching Metadata to improve search

1) Monitoring and QA over evolving data portals 3/2015 [1]: - 90 portals - Only CKAN 8/2015 [2]: - 6 quality metrics - QA 6/2016 [3]: - 260 portals - CKAN, Socrata, OpenDataSoft - 18 metrics [1] Towards assessing the quality evolution of open data portals. In ODQ2015: Open Data Quality Workshop, Munich, Germany [2] Quality assessment & evolution of open data portals. In: International Conference on Open and Big Data, Rome, Italy (2015) [3] Automated quality assessment of metadata across open data portals. ACM Journal of Data and Information Quality (2016)

Demo: http://data.wu.ac.at/portalwatch/portal/data_gov/1818

2) Mapping to Standard vocabularies & Linked Data Mapping & Heuristic Enrichment DCAT PROV CSVW Schema.org Enable uniform access: SPARQL endpoint  Linked Data & Memento Protocol [1] http://data.wu.ac.at/portalwatch/sparql [2] http://data.wu.ac.at/odso/

Finally: 3) Why is Search in Open Data a problem?

Why is Search in Open Data a problem? https://www.youtube.com/watch?v=kCAymmbYIvc Structured Data in Web Search by Alon Halevy vs. Open Data Search is hard... No natural language „cues“ like in Web tables... Existing knowledge graphs don‘t cover the domain of "Open Data“ Open Data is not properly geo-referenced

Some starting points: First baby steps on building an Open Data Knowledge Graph: Ongoing work to enable spatio- temporal search in Open Data e.g. in our project communidata.at (just submitted to JWS) International Semantic Web conference 2016:

Demo: http://data.wu.ac.at/odgraphsearch/ Idea: Link OD to geospatial and temporal Knowledge bases: Geonames OpenStreetMap Perio.do Wikidata …

Take-home messages: Scalable monitoring of metadata quality and evolution is possible and useful Meta-Data Quality can be improved by IE and looking into the data Linked Data and standard protocols (SPARQL, Memento) helps to provide an integrated view Enable combination with other sources to improve search

Other Ongoing Projects (data.wu.ac.at)