APARSEN-EGI Community Workshop on Managing, Computing and Preserving Big Data for Research Core services across e-infrastructures Persistent Identifiers.


APARSEN-EGI Community Workshop on Managing, Computing and Preserving Big Data for Research
Core services across e-infrastructures
Persistent Identifiers Interoperability e-Infrastructure
Prof. Paolo Bouquet, University of Trento and OKKAM srl
Amsterdam, 4-6 March 2014

A core service...
Development and promotion of the uptake of a Digital Identifier e-infrastructure for digital objects (articles, datasets, collections, software, nomenclature, etc.), contributors and authors which cuts across geographical, temporal, disciplinary, cultural, organisational and technological boundaries, without relying on a single centralised system but rather federating locally operated systems to ensure interoperability. The requirements of all relevant stakeholder groups (researchers, libraries, data centres, publishers, etc.) will be addressed.
(EINFRA, Provision of core services across e-infrastructures)

Why do (research) (big) data need PIs?
PI management is an essential building block for enabling value-added services on top of scientific data and content, such as:
- Ensuring persistence of access to data and content
- Fast, large-scale and decentralized data sharing and reuse
- Effective data linkage across repositories
- Fine-grained access control
- Data and information quality assessment
- Reputation assessment and citation indexes
- Impact and ROI assessment (reliable research outputs beyond the scope of published literature)
- Ownership management for data and scholarly content (citability)
- ...
(DIGOIDUNA report, 2011)

Not only digital objects though...
Other types of entities are crucial in building value-added services:
- People (researchers, authors/contributors, ...)
- Organizations (research institutions, funding bodies, companies, libraries, repositories, ...)
- Events (conferences, experiment runs, projects/grants, publications, ...)
- Geographical locations
- Artifacts (instruments, sensors, products, ...)

Current situation

The need for an interop PI infrastructure
- Fragmentation: several PI systems already exist for the same categories of objects (mainly digital objects and authors)
- Heterogeneity: different systems provide different information (metadata) and services about the same entity
- Scalability: interoperability is very often defined point-to-point, which is not very effective on a large scale
- Openness: interoperability should not be managed in closed silos
- Business apps: interoperability enables other (more complex) business services, and should therefore be managed separately

PI Interoperability Infrastructure
[Figure: APARSEN Interoperability Framework for PIs. Panels: Isolated PI Systems; Point-to-point Interoperability Solutions; PI Interoperability Infrastructure. Systems shown: DOI (DataCite), URN-NBN, Cool URI, PURL, ARK, ISNI, ORCID, VIAF.]

PI Interoperability Infrastructure: examples of key services
- PI Service Registration: existing PI systems can register as data and service providers for a set of entities
- Entity Matching: supports the alignment between two (or more) PI systems
- Lifecycle Management: supports interoperability through time (changing mappings, adding entities, etc.)
- PI Service Lookup: given a PI, returns all known PIs and resolvers for the same entity
- Vocabulary Mapping: given an attribute in a PI system, returns the equivalent attributes in different PI systems
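As a rough illustration of how three of the key services above (PI Service Registration, Entity Matching and PI Service Lookup) could fit together, here is a minimal in-memory sketch. It is not the APARSEN or ENS implementation: the class `PIRegistry`, its method names, the `scheme:local-id` notation, the identifiers and the `example.org` resolver templates are all assumptions made for this example. Equivalence between identifiers is tracked with a union-find structure, so a lookup on any one PI returns every PI asserted to denote the same entity.

```python
class PIRegistry:
    """Toy registry of PI systems and identifier equivalences (hypothetical API)."""

    def __init__(self):
        self.resolvers = {}  # scheme -> resolver URL template (illustrative)
        self.parent = {}     # union-find parent pointers over identifiers

    def register_system(self, scheme, url_template):
        # "PI Service Registration": a PI system announces how its IDs resolve.
        self.resolvers[scheme] = url_template

    def _find(self, pi):
        # Return the representative of pi's equivalence class (path halving).
        self.parent.setdefault(pi, pi)
        while self.parent[pi] != pi:
            self.parent[pi] = self.parent[self.parent[pi]]
            pi = self.parent[pi]
        return pi

    def assert_same(self, pi_a, pi_b):
        # Record an "Entity Matching" result: both PIs denote the same entity.
        root_a, root_b = self._find(pi_a), self._find(pi_b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def lookup(self, pi):
        # "PI Service Lookup": all known PIs for the entity, each with its
        # resolver URL, or None when no resolver is registered for its scheme.
        root = self._find(pi)
        result = {}
        for other in sorted(self.parent):
            if self._find(other) == root:
                scheme, _, local_id = other.partition(":")
                template = self.resolvers.get(scheme)
                result[other] = template.format(id=local_id) if template else None
        return result


# Usage with made-up identifiers and placeholder resolver URLs.
registry = PIRegistry()
registry.register_system("orcid", "https://example.org/orcid/{id}")
registry.register_system("isni", "https://example.org/isni/{id}")
registry.assert_same("orcid:0000-0000-0000-0001", "isni:0000000000000001")
registry.assert_same("isni:0000000000000001", "viaf:1")
for identifier, url in registry.lookup("viaf:1").items():
    print(identifier, "->", url)
```

Because only identifiers and equivalence links are stored, metadata stays with the individual PI systems; the registry in this sketch acts purely as an index.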

APARSEN interop framework
[Figure: the interoperability framework at the centre, surrounded by the labels Handle, NBN, ARK, FRD, DNB, DANS, STM, GLOBIT, CERN, ORCID, VIAF, ISNI, DOI.]
- New cross-domain services for user requirements
- Making data and content from different PI platforms accessible in a seamless way

THE ENTITY NAME SYSTEM (ENS): A TECHNICAL PLATFORM FOR IMPLEMENTING THE APARSEN INTEROPERABILITY FRAMEWORK

What is the ENS?
- An open, public infrastructure for supporting PI interoperability
- Outcome of an FP7 project
- Maintained by a company (OKKAM) which was created in 2010 as a follow-up of the research project
- Live since 2007
- Already provides the technical support for most key services which are needed in an interoperability infrastructure
- Constantly updated and improved

ENS in Numbers (March 2014)
- 8.6 million entities: the largest part are Locations, Persons and Organizations
- 6 top-level categories: person, location, organization, event, artifact type and artifact instance
- 5 general-purpose matching modules: Fingerprint, FBEM, Jolly Matcher, Group Linkage, Eureka
- 3 different persistence layers available: Apache HBase 0.9x (Hadoop), Oracle MySQL 5.5, Apache Solr (3.6 and 4.5)
- ~0.2-0.4 seconds per query (matching), deployed on an 8-server cluster (144 cores, AMD Opteron 2.6 GHz) at Unitn
- 51+ million entity matching requests served since January 2011

Benefits of the ENS as a basis for interop infrastructure
- Scalable, Big Data-size architecture already available
- Key services already available
- Implements a thin, neutral layer which is not in competition with the current PI platforms
- Access control, preservation policies and trust levels are fully decentralized (as they are delegated to each PI system)
- Fully distributed architecture, with only a very lightweight form of logical centralization (see the analogy with DNS)
- Sustainable model, no big costs involved
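The "thin layer" idea and the DNS analogy can be sketched in a few lines: the central component only knows which PI system is authoritative for an identifier scheme, and every decision about access or metadata is taken by that local system. The code below is an illustrative assumption, not the ENS architecture; `ThinLayer`, `make_local_system` and the `scheme:local-id` notation are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Optional, Set

# A local PI system is modelled as a function:
# (local_id, requester) -> metadata dict, or None if unknown or access is denied.
LocalSystem = Callable[[str, str], Optional[dict]]


def make_local_system(records: Dict[str, dict], allowed: Set[str]) -> LocalSystem:
    """Build a toy PI system that enforces its own access-control policy."""
    def resolve(local_id: str, requester: str) -> Optional[dict]:
        if requester not in allowed:
            return None  # the policy is decided locally, not by the central layer
        return records.get(local_id)
    return resolve


class ThinLayer:
    """Logically central index: maps a scheme to its authoritative PI system."""

    def __init__(self) -> None:
        self.delegates: Dict[str, LocalSystem] = {}

    def delegate(self, scheme: str, system: LocalSystem) -> None:
        self.delegates[scheme] = system

    def resolve(self, pi: str, requester: str) -> Optional[dict]:
        # Like a DNS referral: find the authoritative system, then hand over.
        scheme, _, local_id = pi.partition(":")
        system = self.delegates.get(scheme)
        if system is None:
            raise KeyError(f"no PI system registered for scheme {scheme!r}")
        return system(local_id, requester)


# Usage with made-up data.
layer = ThinLayer()
layer.delegate("demo", make_local_system({"42": {"title": "Sample dataset"}}, {"alice"}))
print(layer.resolve("demo:42", "alice"))
print(layer.resolve("demo:42", "bob"))
```

The central layer holds no metadata and no policies here, which is the property the slide attributes to delegation: a PI system can change its rules without any change to the shared index.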

Conclusions
- The fundamental trade-off between centralization and decentralization (the DNS analogy)
- Separation of concerns: ID management should be kept independent from the implementation of other value-added community services
- Overcoming organizational barriers: PI interoperability as a basis for implementing services and business solutions
- The ENS is fully compatible with existing PI initiatives (e.g. ORCID for authors) and with EC recommendations; it can be a technical building block for a PI interoperability e-infrastructure