How can you use Open Data? ... And why you should!

Slides:



Advertisements
Similar presentations
Improving Learning Object Description Mechanisms to Support an Integrated Framework for Ubiquitous Learning Scenarios María Felisa Verdejo Carlos Celorrio.
Advertisements

The DRIVER Infrastructure (Digital Repository Infrastructure Vision for European Research) Paolo Manghi ISTI - National Research Council, Italy.
Mobile Learning Resource Exchange (mLRE) David Massart, EUN EdReNe Meeting Chasseneuil, France - Oct. 26, 2011.
What is a Database By: Cristian Dubon.
Management Information Systems, Sixth Edition
Big Data and Predictive Analytics in Health Care Presented by: Mehadi Sayed President and CEO, Clinisys EMR Inc.
Web Standards and Technical Challenges for Publishing and Processing Data on the Web Axel Polleres web:
CPSC 695 Future of GIS Marina L. Gavrilova. The future of GIS.
Integrated Scientific Workflow Management for the Emulab Network Testbed Eric Eide, Leigh Stoller, Tim Stack, Juliana Freire, and Jay Lepreau and Jay Lepreau.
Navigating and Sharing in a Decentralized World Francisco Matias Cuenca-Acuna
Major Tasks in Data Preprocessing(Ref Chap 3) By Prof. Muhammad Amir Alam.
This presentation will guide you though the initial stages of installation, through to producing your first report Click your mouse to advance the presentation.
Case Studies: Statistics Canada (WP 11) Alice Born Statistics UNECE Workshop on Statistical Metadata.
Michalis Vafopoulos NTUA, GFOSS & The transformers GREEN CITY HACKATHON.
Keeping an Open Mind OPEN DATA SUZANNE VAN DEN HOOGEN, MLIS DLI WORKSHOP FREDERICTON, NB APRIL 28, 2015.
CONTENTS Arrival Characters Definition Merits Chararterstics Workflows Wfms Workflow engine Workflows levels & categories.
Web Standards and Technical Challenges for Publishing and Processing Open Data Axel Polleres web:
BUFFALO 311 CRM-CM Overview CITY OF BUFFALO Division of Citizen Services.
Definition of Computational Science Computational Science for NRM D. Wang Computational science is a rapidly growing multidisciplinary field that uses.
Per Møldrup-Dalum State and University Library SCAPE Information Day State and University Library, Denmark, SCAPE Scalable Preservation Environments.
Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.
1 Adapted from Pearson Prentice Hall Adapted form James A. Senn’s Information Technology, 3 rd Edition Chapter 7 Enterprise Databases and Data Warehouses.
File Processing - Database Overview MVNC1 DATABASE SYSTEMS Overview.
U.S. Department of the Interior U.S. Geological Survey CDI Webinar Sept. 5, 2012 Kevin T. Gallagher and Linda C. Gundersen September 5, 2012 CDI Science.
The aim / learning outcome of this module is to understand how to gather and use data effectively to plan the development of recycling and composting.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
The Value of Geospatial Metadata Metadata has tremendous value to Individuals within your organization, as well as to individuals outside of your organization.
Modernisation of ESS infrastructure: The ESS instruments - a review E. di Meglio – P. Jacques – J.M. Museux.
Database Design Part of the design process is deciding how data will be stored in the system –Conventional files (sequential, indexed,..) –Databases (database.
1 How to make the GEOSS Data CORE a reality Part 1 Max Craglia Presented on behalf of Max by: Alan Edwards, EC Stefano Nativi, CNR.
Innovations in Data Dissemination Thomas L. Mesenbourg, Jr. Acting Director U.S. Census Bureau United Nations Seminar on Innovations in Official Statistics.
IAEA International Atomic Energy Agency Open Data at NIS United Nations Library and Information Network for Knowledge Sharing (UN-LINKS) October.
Introducing HingX now with Capacity Development Network.
Directorate for Food, Agriculture, and Fisheries 1 ORGANISATION FOR ECONOMIC CO-OPERATION AND DEVELOPMENT ORGANISATION DE COOPÉRATION ET DE DEVELOPMENT.
Copyright © 2012, SAS Institute Inc. All rights reserved. ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY,
1 DSARCH OVERVIEW Dataset Archiving Utility Overview By Zaihua Ji.
 Knowledge management is managing the organization’s knowledge by means of systematic and organized processes.  For acquiring, organizing, sustaining,
| 1 Open Access Advancing Text and Data Mining Libraries & Publishers working together to support Researchers What is Text Mining?
Scalable Hybrid Keyword Search on Distributed Database Jungkee Kim Florida State University Community Grids Laboratory, Indiana University Workshop on.
David Adams ATLAS DIAL: Distributed Interactive Analysis of Large datasets David Adams BNL August 5, 2002 BNL OMEGA talk.
29 March 2004 Steven Worley, NSF/NCAR/SCD 1 Research Data Stewardship and Access Steven Worley, CISL/SCD Cyberinfrastructure meeting with Priscilla Nelson.
Modernization of official statistics Eric Hermouet Statistics Division, ESCAP
Cyberinfrastructure Overview Russ Hobby, Internet2 ECSU CI Days 4 January 2008.
Status Report on the Validation Framework S. Banerjee, D. Elvira, H. Wenzel, J. Yarba Fermilab 15th Geant4 Collaboration Workshop 10/06/
The Research Data Archive at NCAR: A System Designed to Handle Diverse Datasets Bob Dattore and Steven Worley National Center for Atmospheric Research.
A Portrait of the Semantic Web in Action Jeff Heflin and James Hendler IEEE Intelligent Systems December 6, 2010 Hyewon Lim.
A Cyberinfrastructure for Drought Risk Assessment An Application of Geo-Spatial Decision Support to Agriculture Risk Management.
Santi Thompson - Metadata Coordinator Annie Wu - Head, Metadata and Bibliographic Services 2013 TCDL Conference Austin, TX.
United Nations Statistics Division Overview of handbook on rapid estimates Expert Group Meeting on Short-Term Economic Statistics in Western Asia
BG 5+6 How do we get to the Ideal World? Tuesday afternoon What gaps, challenges, obstacles prevent us from attaining the vision now? What new research.
© 2007 IBM Corporation IBM Software Strategy Group IBM Google Announcement on Internet-Scale Computing (“Cloud Computing Model”) Oct 8, 2007 IBM Confidential.
© CGI Group Inc. EGI-InSPIRE Open Data and Business Modelling for Open Science John van Echtelt Business Model Innovator Madrid, 18 September 2013.
Grid Services for Digital Archive Tao-Sheng Chen Academia Sinica Computing Centre
James A. Senn’s Information Technology, 3rd Edition
Food and Agriculture Organization of the United Nations
Sebastian Neumaier Advisor: Univ.Prof. Dr. Axel Polleres Co-Advisor:
Steering Group Member, Link Digital
Trevor Taylor, Director, Member Services, Asia and the Americas,
Lifting Data Portals to the Web of Data
Metadata Quality: Learning from Open Data Portalwatch
DIME ITDG, Luxembourg 28 June 2016
Generic Statistical Business Process Model (GSBPM)
(VIP-EDC) Point 6 of the agenda
OpenML Workshop Eindhoven TU/e,
Water Quality Portal Data Tools
Featured Presentation
C.U.SHAH COLLEGE OF ENG. & TECH.
Big Data ESSNet WP 1: Web scraping / Job Vacancies Pilot
Data Warehousing Data Mining Privacy
Presentation transcript:

How can you use Open Data? ... And why you should! Axel Polleres web: http://polleres.net twitter: @AxelPolleres

What is Open Data? Availability and Access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form. Reuse and Redistribution: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine- readable. Universal Participation: everyone must be able to use, reuse and redistribute – there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed. See more at: http://opendefinition.org/okd/ Open Knowledge Foundation

Open Data vs. Big Data http://www.opendatanow.com/2013/11/new-big-data-vs-open-data-mapping-it-out/

Open Data is a global trend: Cities, International Organizations, National and European Portals, Int'l. Conferences: 4

In today's talk: 2 projects on recent projects at WU: What is the status of Open Data and what are the challenges using Open Data? OpenData PortalWatch – a project at WU How can Open Data be used? Open City Data Pipeline – a joint project with Siemens on using Open Data What's next? Improving Open Data Quality: ADEQUATE (FFG - project) Why should you care? WU can help you in your Open Data Strategy!

Challenges: Open Data also has the "Vs" Volume: It's growing! (we currently monitor 90 CKAN portals, 512543 resources/ 160069 datasets, at the moment (statically) ~1TB only CSV files... Variety: different datasets (from different cities, countries, etc.), only partially comparable, partially not. Different metadata Different data formats Velocity: Open Data changes regularly (fast and slow) New datasets appear, old ones disappear

OPEN DATA PORTAL WATCH http://data.wu.ac.at/portalwatch/ Periodically monitoring a list of Open Data Portals 90 CKAN powered Open Data Portals Quality assessment Evolution tracking Meta data Data

Open Data Portals CKAN ... http://ckan.org/ almost „de facto“ standard for Open Data Portals facilitates search, metadata (publisher, format, publication date, license, etc.) for datasets http://datahub.io/ http://data.gv.at/ machine-processable? ... ... partially

Open Data Portal list

QUALITY DIMENSIONS DIMENSION DESCRIPTION Retrievability The extent to which meta data and resources can be retrieved. Usage The extent to which available meta data keys are used to describe a dataset. Completeness The extent to which the used meta data keys are non empty. Accuracy The extent to which certain meta data values accurately describe the resources. Openness The extent to which licenses and file formats conform to the open definition. Contactability The extent to which the data publisher provide contact information. Objective measures which can be automatically computed in a scalable way

Portal Overview

Portal Details

ODP Evolution

ODP CHANGES

Data Dumps OPEN DATA PORTAL WATCH provides an archive of Open Data portal crawls (weekly snapshots/dynamic crawling framework):

Open Data Portal Watch http://data.wu.ac.at/portalwatch/ Key findings: ODQ2015 http://data.wu.ac.at/portalwatch/ Key findings: Varying quality acrosss portals Rapid growth for some portals Huge variety and range of datasets

But: How to use all that Open Data? More challenges: How to find the right datasets? How to integrate related datasets? How to deal with heterogeneous/missing data

Use Case: City Data – Important for Infrastructure Providers & for City Decision Makers City Assessment and Sustainability reports Tailored offerings by Infrastructure Providers … however, these are often outdated before even published! Goal (short term): Leverage Open Data for calculating a city’ performance from public sources on the Web automatically Needs up-to-date City Data and calculates City KPIs in a way that allows to display the current state and run scenarios of different product applications. e.g. towards a “Dynamic” Green City Index: Goal (long term): Define and Refine KPI models to assess specific impact of infrastructural investments and gather/check input automatically 18

City Data Pipeline http://citydata.wu.ac.at/

City Data Pipeline: Architecture Crawl and integrate Open data sources with city data: Wikipedia Eurostat UNData USCencus Etc. Use statistical methods to approximate missing values Re-publish as Linked Data (SPARQL endpoint, provenance information, etc.

Challenges – Missing values Found a large amount of missing values Two Reasons: Incomplete data published by providers (Tables 1+2) The combination of different data sets with disjoint cities and indicators (later)

Challenges – Missing values Individual datasets (e.g. from Eurostat) have missing values Merging together datasets with different indicators/cities adds sparsity

Missing Values – Hybrid approach choose best prediction method per indicator: Our assumption: every indicator has its own distribution and relationship to others. Basket of „standard“ regression methods: K-Nearest Neighbour Regression (KNN) Multiple Linear Regression (MLR) Random Forest Decision Trees (RFD)

Missing Values – Hybrid approach choose best prediction method per indicator: Instead of using indicators directly we use PCs, built from the indicators For buidling the PCs, fill in missing data points with neutral values → predict all rows

Open Data: The more, the merrier! City Data Pipeline citydata.wu.ac.at Search for indicators & cities obtain results incl. sources Integrated data served as Linked Data Predicted values for missing data... Open Data: The more, the merrier! ...assumption: Predictions get better, the more Open data we integrate...

More Details: Stefan Bischof, Christoph Martin, Axel Polleres, and Patrik Schneider. Open City Data Pipeline: Collecting, Integrating, and Predicting Open City Data. In 4th Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data (Know@LOD), co-located with ESWC2015, Portoroz, Slovenia, May 2015.

What's next? Collaborations to make Open Data usage more effective: Improving Open Data Quality https://www.data.gv.at/wp-content/uploads/2012/03/Mission-Statement- AG-Qualitaetssicherung-OpenData-Portale.pdf Upcoming: ADEQUATe: Analytics & Data Enrichment to improve the QUAliTy of Open Data Project Start: Fall 2015

Open your data & include Open Data in your Data Strategy! A "sister" portal for http://data.gv.at for non- governmental open data launched in 2014 http://www.opendataportal.at/ We can help you to use and publish Open Data! WU, TU, SWC, DUK have just founded a network node of the Official Launch: 4. OGD D-A-CH-LI - Konferenz - Open X 24. Juni 2015, Wiener Rathaus

Want to learn more?  Talk to me about Your Open Data Strategy! Axel.Polleres@wu.ac.at Twitter: @AxelPolleres http://wu.ac.at/infobiz/ Maybe see you at one of the following events: http://2015.data-forum.eu/ http://semantics.cc/