Data Fabric IG Use Case Analysis. 2 Data Fabric Analysis how to come to essential components & services? Analyze Data Practices.


Data Fabric IG Use Case Analysis

2 Data Fabric Analysis: how to arrive at essential components & services? Analyze Data Practices

3 Data Practices I (120 interviews etc.)

4 Data Practices II – EUDAT federation (diagram labels): Community Centers; Common Data Centers; projects to push limits and raise awareness

5 Data Practices II – split of functions
• physical layer operations are trivial – know how to do it
• “logical layer” (LL) operations are complex due to relations, etc.
• all LL information needs to be aggregated and we need to have a secure access layer around it
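
The split described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the field names and the placeholder identifier `hdl:0000/example-1` are hypothetical and do not come from the presentation. The sketch shows logical-layer information (metadata, relations, rights) aggregated per object in one catalog, with a single access check wrapped around it.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalRecord:
    """Logical-layer information aggregated for one digital object (hypothetical fields)."""
    pid: str
    metadata: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)    # PIDs of related objects
    allowed_users: set = field(default_factory=set)  # stand-in for rights information


class SecureCatalog:
    """Aggregates logical-layer records behind one access check (illustrative only)."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        # Store the record under its PID.
        self._records[record.pid] = record

    def get(self, pid, user):
        # Every read goes through the same access check.
        record = self._records[pid]
        if user not in record.allowed_users:
            raise PermissionError(f"{user} may not access {pid}")
        return record


catalog = SecureCatalog()
catalog.register(LogicalRecord(
    pid="hdl:0000/example-1",                 # made-up placeholder identifier
    metadata={"title": "example recording"},
    relations=["hdl:0000/example-2"],
    allowed_users={"alice"},
))
```

Where the bytes themselves live (file system, cloud, database) is deliberately absent from the sketch, since the slide treats physical-layer operations as the solved part.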

6 Data Fabric Analysis: how to arrive at essential components & services? Analyze Use Cases

7 10 (+5) Use Cases so far (2 in development, others mature)
legend: environmental science; natural science; life science; humanities, soc. sciences; IT, various
all indicated nodes are centers of national, regional and even worldwide federations

8 10 (+5) Use Cases so far (2 in development, others mature)
all indicated nodes are centers of national, regional and even worldwide federations

No.  Name                                  Institute                 State
1    Language Archive                      Max Planck Institute NL   in operation
2    Geodata Sharing Platform              Academy of China          in operation
3    Datanet Federation Consortium         RENCI US                  in operation
4    ADCIRC Storm Forecasting              RENCI US                  in operation
5    EPOS Plate Observation                INGV/CINECA Italy         in operation
6    ENVRI Environment Observation         U Helsinki, Finland       in design
7    Nanoscopy Repository Cell structures  KIT, Germany              in design
8    Human Brain Neuroinformatics          EPFL Switzerland          in testing
9    ENES Climate Modeling                 DKRZ Germany              in operation
10   LIGO Gravitation Physics              NCSA US                   in operation
11   ECRIN Medical Trial Interoperation    U Düsseldorf Germany      in testing
12   VPH Physiology Simulation             U London UK               in operation
13   Species Archive                       Nature Museum Germany     in operation
14   International NeuroI Facility         INCF Sweden               in operation
15   Molecular Genetics                    MPI Germany               in operation
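
As a quick cross-check of the State column on this slide, the fifteen entries can be tallied. The sketch below only restates the column in row order, normalized to lower case; the variable names are illustrative.

```python
from collections import Counter

# State column of the use-case table on slide 8, rows 1..15, lower-cased.
states = [
    "in operation", "in operation", "in operation", "in operation", "in operation",
    "in design", "in design", "in testing", "in operation", "in operation",
    "in testing", "in operation", "in operation", "in operation", "in operation",
]

tally = Counter(states)
for state, count in tally.most_common():
    print(f"{state}: {count}")
```

The tally gives 11 use cases in operation, 2 in design and 2 in testing, which matches the later side remark that the description of state is only a very vague indication.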

9 10 (+5) Use Cases so far (2 in development, others mature)
(same table of 15 use cases as on slide 8)
a few side remarks
• these are all federated approaches
• some have various use cases (one selected)
• 3 is more of an IT framework applied by many
• description of state: very vague indication
• 5 marked red need another round of interaction

10 Issues of Relevance (diagram labels): sensors, simulations, crowd etc.; PID, Metadata; Rights; Syntax, Types; Semantics; Relations; FS, Cloud, DB; Repository System; virtual collection builder; management, analytics, conversion; provenance – reproducibility; workflows, policies, deployment; new collection; new metadata; temp store; highly distributed in federations; AAI/FIM

11 How do WGs/IGs fit? (diagram labels): CIT, DD, PROV, BROK, CERT, BDA, REP, REPRO, DMP, DOM, FIM, PP

12 Components I
• domain of registered digital objects (DO) incl. basic organization principles (data, code, knowledge) -> worldwide PID system (Handles/DOI)
• domain of registered actors -> worldwide ID system (ORCID)
• domain of trusted repositories for DOs -> worldwide Rep Registry
• proper DFT/DSA/WDS compliant repository systems
• accepted policy commons (proper organization support, self-documenting, tested/certified, etc.) -> policy component registry
• policy/services -> service registry
• authentication system -> various in place (ORCID just number)
• authorization system -> authorization registry

13 Components II
• MD components/schemas -> metadata schema registry
• data types/schemas/formats -> data type registry
• semantic categories -> category registry
• vocabularies -> vocabulary registry
• what about complex ontologies (thesauri, ontologies, etc.)?
• what about mapping relations?

14 Components II (same list as on slide 13), with added remarks:
• much is already out there, but why does it take months to federate and integrate data to make data interoperable?
• need to harmonize, raise trust & value
• make it ready for machines
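
The component-to-registry pairs listed on the two Components slides can be written down as a simple lookup table, which is one way to make such a list "ready for machines". This is a minimal sketch: the dictionary layout, the key wording and the function name `registry_for` are assumptions made for illustration, and only the pairs themselves come from the slides.

```python
# Component categories and the registry each one points to, as listed on the
# Components I and II slides. Layout and names are illustrative assumptions.
COMPONENT_REGISTRIES = {
    "registered digital objects": "worldwide PID system (Handles/DOI)",
    "registered actors": "worldwide ID system (ORCID)",
    "trusted repositories": "worldwide repository registry",
    "policy commons": "policy component registry",
    "policy/services": "service registry",
    "authorization system": "authorization registry",
    "metadata components/schemas": "metadata schema registry",
    "data types/schemas/formats": "data type registry",
    "semantic categories": "category registry",
    "vocabularies": "vocabulary registry",
}

# Items the slides pose as open questions, with no registry named.
OPEN_QUESTIONS = [
    "complex ontologies (thesauri, ontologies, etc.)",
    "mapping relations",
]


def registry_for(component):
    """Return the registry named for a component, or None if the slides name none."""
    return COMPONENT_REGISTRIES.get(component)


print(registry_for("vocabularies"))
print(registry_for("mapping relations"))
```

The two open questions are kept apart from the table on purpose, since the slides name no registry for them.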

15 What to do today
• 4 use cases (max 10 min) with the following goals
• understand whether we get what we want to get (common components/services)
• discuss whether we need to adapt the template
• Zhu
• Dieter
• Sean
• Giuseppe
• Ed
• discuss how to move on with use cases & analysis
• discuss my first look on C/S (?)
• update of existing and appearance on wiki (deadline)
• deadline for first round (when, whom to motivate, ?)
• virtual meeting for a discussion on analysis (when?)
• at P6 (September) a first document with analysis

16 Did we forget something?

17 Data Practices I – Survey
• ~120 Interviews/Interactions
• 2 Workshops with Leading Scientists (EU, US)
• too much done manually or via ad hoc scripts
• too much in legacy formats (no PID & MD)
• there are lighthouse projects etc. but...
• DM and DP not efficient and too expensive (biologist acting as data manager for 75% of his time)
• federating data incl. logical information much too expensive
• hardly any use of automated workflows and lack of reproducibility

18 Data Practices I – Survey (same bullets as on slide 17), with added remarks:
• is DI research only available to power institutes?
• pressure towards DI research is high, but only some departments are fit for the challenges
• senior researchers: can’t continue like this!
• the need to move towards proper data organization and automated workflows is evident
• but changes now are risky: lack of trained experts, guidelines and support