CESSDA Workplan: Metadata Harvesting Tool


CESSDA Workplan: Metadata Harvesting Tool
Ørnulf Risnes (NSD), John Shepherdson (UKDS)
EDDI 15, Copenhagen, 3 December 2015

CESSDA: Consortium of European Social Science Data Archives
- Bring together social science data archives across Europe (http://cessda.net/National-Data-Services/CESSDA-Members)
- Developing a pan-European Research Infrastructure (RI)
- Facilitate researcher access to important resources of relevance to the European social science research agenda, regardless of the location of either researcher or data

Abstract
The UK Data Service is providing access to its data and metadata holdings by making public some of its web service APIs. These REST APIs facilitate a self-service approach for our data producers and researchers alike, whilst also enabling third-party developers to write applications that consume our APIs and present novel and exciting ways of accessing and viewing some of the data collections that we hold. We have put new infrastructure in place to enable the provision of these APIs and have already run an App Challenge (for external developers to build mobile applications against our APIs) and added a data collection usage 'leader board' as initial tests of the functionality, capacity, account management, developer documentation and performance aspects of our public APIs. The main infrastructure elements are an API management service, HTTP caching and routing, and various API endpoints. The other major consideration was a set of design principles for the APIs so that developers have a consistent and predictable experience. This presentation will elaborate on the key components of the infrastructure and the API design guidelines.

Commissioned by CESSDA
- 2015 work plan launched RI development
- Metadata Harvester task is part of the work plan
- "The objective of this Task is to select and/or develop and implement into CESSDA service a metadata harvesting tool that will enable the efficient compilation and operation of the CESSDA Product and Service Catalogue, the CESSDA Secure Access Portal, and other data management tasks and data supply services."
- Open Source bundle due Q2 2016

Task Objectives
- Produce an easy-to-use metadata harvesting service
- Extensible design: use a plugin architecture for inputs and outputs
- A wide range of metadata sources must be harvested
- Data to be emitted in a variety of metadata standards (more to life than DDI!)
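The plugin idea for inputs and outputs can be illustrated with a minimal sketch. This is not the CESSDA harvester's code: the registry names, the decorator functions, the "static-demo" source type, the "plain-text" output format and the record shape are all assumptions made up for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registries (assumption): one for input plugins that read a
# metadata source, one for output plugins that emit a metadata format.
INPUT_PLUGINS: Dict[str, Callable[[str], Iterable[dict]]] = {}
OUTPUT_PLUGINS: Dict[str, Callable[[dict], str]] = {}


def input_plugin(source_type: str):
    """Register a function that reads records from one kind of source."""
    def register(func):
        INPUT_PLUGINS[source_type] = func
        return func
    return register


def output_plugin(fmt: str):
    """Register a function that serialises one record into one format."""
    def register(func):
        OUTPUT_PLUGINS[fmt] = func
        return func
    return register


@input_plugin("static-demo")
def read_static(uri: str) -> Iterable[dict]:
    # Stand-in for a real source reader; returns a single made-up record.
    yield {"id": uri + "/1", "title": "Example study"}


@output_plugin("plain-text")
def to_text(record: dict) -> str:
    # Stand-in for a real metadata standard serialiser.
    return f"{record['id']}: {record['title']}"


def harvest(source_type: str, uri: str, fmt: str) -> List[str]:
    """Look up the input and output plugins by name and chain them."""
    reader = INPUT_PLUGINS[source_type]
    writer = OUTPUT_PLUGINS[fmt]
    return [writer(record) for record in reader(uri)]
```

Under these assumptions, supporting a new source type or output standard means registering one more function; the `harvest` function itself does not change.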

Delivery Partners
NSD and UKDS
- Design, implement, quality assure and document the metadata harvester to produce an Open Source bundle
- NSD leads the Task
FSD, SND, DDA
- Test the Open Source project: build additional input/output plugins and harvest a variety of metadata source types and languages

Foundations
- Build on outputs of Data without Boundaries WP12: Prototype Resource Discovery Portal (DwB-RDP)
- Harvests DDI 1.2 from CESSDA SPs' Nesstar servers and converts to DwB-Disco format
- See the harvesting ingest system report and the DwB Resource Discovery Portal description for more details
- Needs to interoperate with the CESSDA metadata model, which is yet to be defined

Harvester as complex system
[Diagram: Consumers, Metadata sources]

Harvester as complex stateful system
[Diagram: Data store, Consumers, Metadata sources. Images sourced via Google Images.]

Metadata Model
- Based on RDF-Disco
- How is it different? Will perform gap analysis
- Harvesting challenges and normalisation
Simplifying assumption
- Normalisation is handled by the Consumer
- Resumption and completeness are the responsibility of the Consumer
=> Harvester as stateless system
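One way to read the simplifying assumption is that any bookkeeping lives on the Consumer side. The sketch below shows a consumer that remembers which record URIs it has already fetched and resumes from there, while the harvester calls stay stateless. The function name, the callables and the demo records are hypothetical, not part of the presented design.

```python
from typing import Callable, Dict, Iterable, Set


def resume_harvest(
    list_records: Callable[[], Iterable[str]],
    get_record: Callable[[str], dict],
    already_fetched: Set[str],
) -> Dict[str, dict]:
    """Consumer-side loop: fetch only records whose URI was not seen before.

    The two callables stand in for stateless harvester queries (assumption);
    the 'already_fetched' set is state owned and persisted by the consumer.
    """
    fetched: Dict[str, dict] = {}
    for uri in list_records():
        if uri in already_fetched:
            continue  # resumption: skip what an earlier run already got
        fetched[uri] = get_record(uri)
        already_fetched.add(uri)
    return fetched


# Made-up records standing in for a remote repository.
DEMO_RECORDS = {
    "http://example.org/study/1": {"title": "Study one"},
    "http://example.org/study/2": {"title": "Study two"},
}


def demo_resume():
    # Pretend an earlier, interrupted run already fetched study 1.
    seen = {"http://example.org/study/1"}
    new = resume_harvest(
        lambda: sorted(DEMO_RECORDS), DEMO_RECORDS.__getitem__, seen
    )
    return new, seen
```

Completeness can be checked the same way: the consumer compares its own set of fetched URIs against a fresh listing, without the harvester storing anything between calls.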

Functional description

Harvester as simple adaptor
[Diagram: Metadata sources, Harvester plugins, Consumers. Images sourced via Google Images.]

Harvester Extension Mechanism
- Webhooks
- Harvester calls a provided URI
- URI handles the call
[Images sourced via Google Images, from pusher.com]
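The webhook mechanism (the harvester calls a provided URI, and whatever sits behind that URI handles the call) could look roughly like the following sketch, which uses only the Python standard library. The JSON payload shape, the `TypeHandler` class and the `/handle` path are assumptions for illustration; the slides do not specify a wire format.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class TypeHandler(BaseHTTPRequestHandler):
    """Hypothetical handler behind the webhook URI: tidies a raw title."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"title": payload["raw"].strip().title()}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Bypass any configured proxy, because the demo handler runs locally.
_opener = build_opener(ProxyHandler({}))


def call_webhook(handler_uri: str, raw: str) -> dict:
    """Harvester side: POST raw metadata to the provided URI, return its reply."""
    request = Request(
        handler_uri,
        data=json.dumps({"raw": raw}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request, timeout=5) as response:
        return json.loads(response.read())


def demo() -> dict:
    # Start the handler on a free local port, call it once, then shut down.
    server = HTTPServer(("127.0.0.1", 0), TypeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        uri = f"http://127.0.0.1:{server.server_port}/handle"
        return call_webhook(uri, "  example study  ")
    finally:
        server.shutdown()
        server.server_close()
```

In this reading, the harvester needs to know nothing about a new repository type beyond the URI of its handler, which is what makes the extension point language-neutral.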

Functional description
Queries:
- ListRecordsForRepository
- GetRecordFromRepository
Arguments:
- Repository/Record URI
- Repository type
- Type handler URI (optional)
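The two queries and their three arguments can be sketched as plain functions. The query and argument names come from the slide; everything else (the `HarvestRequest` type, the in-memory demo repository, the "demo" repository type and the return shapes) is a hypothetical stand-in, since the slides do not define signatures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HarvestRequest:
    """The three arguments named on the slide, as one value object."""
    uri: str                                 # repository URI or record URI
    repository_type: str                     # selects the handling logic
    type_handler_uri: Optional[str] = None   # optional webhook for unknown types


# Made-up data standing in for a remote repository (assumption).
_DEMO_REPOSITORY: Dict[str, Dict[str, dict]] = {
    "http://example.org/repo": {
        "http://example.org/repo/study/1": {"title": "Example study 1"},
        "http://example.org/repo/study/2": {"title": "Example study 2"},
    }
}


def list_records_for_repository(request: HarvestRequest) -> List[str]:
    """ListRecordsForRepository: return the record URIs in a repository."""
    return sorted(_DEMO_REPOSITORY[request.uri])


def get_record_from_repository(request: HarvestRequest) -> dict:
    """GetRecordFromRepository: return one record, addressed by its URI."""
    for records in _DEMO_REPOSITORY.values():
        if request.uri in records:
            return records[request.uri]
    raise KeyError(request.uri)
```

Both functions take everything they need in the request and keep nothing between calls, which matches the stateless design described earlier in the talk.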

Success Criteria
Impact Analysis
- How will it affect CESSDA Service Providers, CESSDA ERIC, EU Researchers?
Quality/Usability of OS bundle
- Establish maturity rating using NASA Reuse Readiness Levels, prior to testing
- Testing undertaken by FSD, SND, DDA against the System Usability Scale

NASA Reuse Readiness criteria
Ten levels for each of the following: Documentation, Extensibility, Intellectual Property, Modularity, Packaging, Portability, Standards Compliance, Support, Verification and Testing, Security, Internationalisation and Localization

Deliverables
Metadata harvester as a service
- Provides an API for clients to consume it
Administration tool
- Used to monitor and manage the harvester service
- May be readable by many, but will be writable by few
Publicly available Open Source bundle
- Code base and documentation
- Facilitates creation of new harvesters and output formatters

How will it be developed? Where will it run?

Development Environment
Common, cloud-based tool chain
- Lower barriers to entry for all Service Providers
- No need to install and configure locally
- Code repositories
- Automated build and test
- Documentation area
Ensure CESSDA has access to the source code, configuration files and technical documentation that underpin its products and services

Production Environment
- Short-term cloud-based hosting
- Experience will feed into CESSDA's requirements for compute and storage in order to host and run the components of the Research Infrastructure

Thanks for your attention