Position Paper for Data Fabric IG: Interoperability, Infrastructures and Virtuality. https://rd-alliance.org/group/data-fabric-ig.html. Gary Berg-Cross, Keith Jeffery, Reagan Moore.

Position Paper for Data Fabric IG: Interoperability, Infrastructures and Virtuality
Gary Berg-Cross, Keith Jeffery, Reagan Moore
- What principles & methods are needed to guide the interaction between services (interface, protocol)?
- What managed processes & services?
- What basic & flexible infrastructure machinery?
See datafabric-ig-data-fabric-position-paper-broaden-discussion-berg-1 for discussion & the attached file.

Slide 2: Key Points
1. Much of the current DF discussion focuses on a data management & lifecycle view. It lacks a focus on other important topics, such as the standards & federation mechanisms needed to assemble collaborations that span institutions and data management environments.
2. Interoperability needs to be a first-class concept in the DF conversation: it is fundamentally important for federation & for overcoming the problems generated by data silos.
3. There are multiple benefits in developing federated & virtualized mechanisms, and mathematical descriptions of them, to assist the sharing of DOs & (digital) knowledge procedures.

Slide 3: Initial ideas for DF IG: the implied framework is the data lifecycle
- New groups emerge, so publications are part of the Data Fabric & related analysis.
- A view with a data management focus emerged from discussions amongst various RDA WG chairs.
- Where is interoperability in a raw-to-citable data view?

Slide 4: Data Fabric Analysis, e.g. components & services in the lifecycle (LC) via use cases
- How do we arrive at the essential components & services? (Guided by use scenarios that need to include collaboration.)
- This scenario doesn't show analysis or data sharing via DO manipulation!
[Figure label: Sharing]

Slide 5: Make Interoperability a First-Class View
- Concept of interoperability:
  - The extent to which systems and devices can routinely exchange data and services, and interpret that shared data through the shared services.
  - A stronger type of data exchange can include knowledge of the meaning of the data content, usage constraints, and the underlying assumptions.
- Bring interoperability aspects (within and across domains) into the DF discussions as a "first-class citizen" alongside all the other aspects of the research data lifecycle in a domain.
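To make the "stronger type of data exchange" above concrete, here is a minimal Python sketch, assuming a hypothetical envelope format: the sender ships the values together with what they measure, their unit, usage constraints and assumptions, and the receiving service converts or refuses the data according to that declared meaning. The field names, the `sea_surface_temperature` term and the unit table are illustrative assumptions, not part of any DF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataEnvelope:
    """Hypothetical envelope: the values plus the context a receiver needs to interpret them."""
    values: tuple[float, ...]
    quantity: str                             # what is measured (hypothetical vocabulary term)
    unit: str                                 # declared unit of the values
    usage_constraints: tuple[str, ...] = ()   # e.g. licence terms
    assumptions: tuple[str, ...] = ()         # e.g. calibration notes


# Unit conversions the receiving service understands (assumed, illustrative only).
TO_KELVIN = {
    "K": lambda v: v,
    "degC": lambda v: v + 273.15,
}


def interpret(env: DataEnvelope, expected_quantity: str) -> list[float]:
    """Return the values in kelvin, refusing data whose declared meaning does not match."""
    if env.quantity != expected_quantity:
        raise ValueError(f"expected {expected_quantity!r}, got {env.quantity!r}")
    if env.unit not in TO_KELVIN:
        raise ValueError(f"no conversion known for unit {env.unit!r}")
    convert = TO_KELVIN[env.unit]
    return [convert(v) for v in env.values]


env = DataEnvelope((20.0, 21.5), "sea_surface_temperature", "degC",
                   usage_constraints=("CC-BY-4.0",))
print(interpret(env, "sea_surface_temperature"))
```

The point of the design is that the receiver never guesses: a mismatched quantity or an unknown unit is an explicit error rather than silently misread numbers.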

Slide 6: Multiple Types of DF Infrastructure
When an enterprise implements a data management solution, one of several types of DF infrastructure is typically chosen:
- Data management: the enterprise builds a data repository, manages an information catalog, & enforces management & curation policies (but also...)
- Data analysis: the enterprise processes a data collection, applies analysis & visualization tools, and automates a processing pipeline (but also...)
- Data preservation: the enterprise builds reference collections and knowledge bases that comprise its intellectual capital, while managing technology evolution.
- Data publication: the enterprise provides descriptive information and arrangement for discovery of and access to data collections.
- Data sharing: controlled sharing of a data collection, shared analysis workflows, and information catalogs (interoperability).

Slide 7: Interoperability mechanisms required for sharing data, information & knowledge
- Composition: how components developed separately can be made to work together.
- Minimal set of infrastructure mechanisms & service requirements.
- Gaps, obstacles and possible incompatibilities.
- Different suites of components will have different data fabrics.
- Enable reproducible research.
[Figure label: Brokers]
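The composition and broker points above can be illustrated with a small Python sketch. It assumes a hypothetical broker in which each separately developed component declares the format it consumes and the format it produces; the broker searches for a chain of components between two formats and reports a gap when none exists. The class, method and format names are invented for illustration; real brokering services negotiate far more than file formats (protocols, semantics, access rights).

```python
from collections import deque
from typing import Any, Callable


class Broker:
    """Hypothetical broker: each separately developed component registers the format it
    consumes and the format it produces; the broker composes a chain between them."""

    def __init__(self) -> None:
        # (input format, output format) -> component implementing that conversion
        self._steps: dict[tuple[str, str], Callable[[Any], Any]] = {}

    def register(self, src: str, dst: str, component: Callable[[Any], Any]) -> None:
        self._steps[(src, dst)] = component

    def compose(self, src: str, dst: str) -> list[tuple[str, str]]:
        """Breadth-first search for the shortest chain of registered steps from src to dst."""
        frontier = deque([(src, [])])
        seen = {src}
        while frontier:
            fmt, path = frontier.popleft()
            if fmt == dst:
                return path
            for a, b in self._steps:
                if a == fmt and b not in seen:
                    seen.add(b)
                    frontier.append((b, path + [(a, b)]))
        # No chain exists: a gap or incompatibility between the registered components.
        raise LookupError(f"no composition from {src!r} to {dst!r}")

    def run(self, data: Any, src: str, dst: str) -> Any:
        """Apply each component in the composed chain, in order."""
        for step in self.compose(src, dst):
            data = self._steps[step](data)
        return data


broker = Broker()
broker.register("csv", "rows", lambda text: [line.split(",") for line in text.strip().splitlines()])
broker.register("rows", "tsv", lambda rows: "\n".join("\t".join(r) for r in rows))
print(broker.run("a,1\nb,2", "csv", "tsv"))
```

Because neither component knows about the other, the broker is the only place where the composition (and any missing link) becomes visible.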

Slide 8: Data Sharing Use Cases
- The EUDAT & DataNet Federation Consortium use cases help clarify:
  - Interoperability mechanisms for sharing DOs & (digital) knowledge procedures.
  - An implication is that a researcher can re-execute trusted procedures to obtain identical results, making reproducible data-driven research possible.
- Community-driven research collaborations:
  - Seismology: share seismic data and tsunami prediction workflows between research groups.
  - Climate change: share oceanography environmental data, coastal storm surge analyses, hydrology flood analyses, and satellite environmental data.
  - Genomics: build a cohort of genomes and predictive models for humans, plants, animals, and diseases.
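Re-executing trusted procedures to obtain identical results, as described above, needs some way to check that inputs and outputs really are the same. A minimal Python sketch, assuming procedures are registered under versioned names and that a run is recorded by SHA-256 digests of canonical JSON encodings of its input and output; the registry, the `trusted` decorator and the `mean-abs-amplitude/v1` procedure are hypothetical.

```python
import hashlib
import json
from typing import Callable

Procedure = Callable[[dict], dict]

# Hypothetical registry of trusted procedures, keyed by a versioned name.
TRUSTED: dict[str, Procedure] = {}


def trusted(name: str) -> Callable[[Procedure], Procedure]:
    """Decorator that registers a deterministic procedure under a versioned name."""
    def register(fn: Procedure) -> Procedure:
        TRUSTED[name] = fn
        return fn
    return register


def digest(obj: object) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, fixed separators)."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


@trusted("mean-abs-amplitude/v1")
def mean_abs_amplitude(inp: dict) -> dict:
    samples = inp["samples"]
    return {"mean_abs": sum(abs(s) for s in samples) / len(samples)}


def run_and_record(name: str, inp: dict) -> dict:
    """Execute a trusted procedure and keep a provenance record of what went in and out."""
    out = TRUSTED[name](inp)
    return {"procedure": name, "input": digest(inp), "output": digest(out)}


def reproduce(record: dict, inp: dict) -> bool:
    """Re-execute the recorded procedure; True only if input and output digests both match."""
    if digest(inp) != record["input"]:
        return False
    return digest(TRUSTED[record["procedure"]](inp)) == record["output"]


record = run_and_record("mean-abs-amplitude/v1", {"samples": [1.0, -2.0, 3.0]})
print(reproduce(record, {"samples": [1.0, -2.0, 3.0]}))
```

Digest comparison only establishes identical results for deterministic procedures; anything that depends on random seeds, wall-clock time or platform-specific numerical behaviour would need those factors recorded as well.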

Slide 9: Expanded Ideas of Federation: 3 Versions of Federated Systems
1. Shared name spaces for users, files, and services.
   - Besides single sign-on, a shared name space that provides users with federated services should also support virtual collections that span administrative domains.
   - A shared name space for services enables re-use of procedures across researchers' resources.
2. Shared services for manipulating digital objects.
   - For example, a shared service reached through a broker, accessed either through its access protocol or as a service encapsulated in a virtual machine environment that is moved to the local research resources for execution.
3. Third-party (service) access.
   - Posting requests to a third party, such as a message queue, eliminates direct communication between the federated system components.
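The first and third federation variants above can be combined in a short Python sketch: a shared name space resolves a logical name in a virtual collection to the administrative domain holding the object, and requests are posted to a third party (here, in-process `queue.Queue` inboxes standing in for a message queue) rather than sent directly to the other domain. The name space entries, domain names and the `checksum` operation are hypothetical; a production federation would use a networked message broker and authenticated identities.

```python
import queue
import threading

# Hypothetical shared name space: a logical name in a virtual collection resolves to
# the administrative domain that holds the object and its physical path there.
NAMESPACE = {
    "/collab/seismic/event-001": ("domainA", "/archive/evt001.mseed"),
    "/collab/seismic/event-002": ("domainB", "/data/e2.mseed"),
}


def domain_worker(domain: str, inbox: queue.Queue, replies: queue.Queue) -> None:
    """Serve requests posted to this domain's inbox; never talks to the requester directly."""
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown sentinel
            break
        # A real domain would act on its local storage here; this sketch only reports back.
        replies.put({"name": msg["name"], "op": msg["op"],
                     "served_by": domain, "path": msg["path"]})


def run_federated(names: list[str], op: str = "checksum") -> list[dict]:
    """Post one request per logical name to the third party (queues) and collect replies."""
    inboxes = {domain: queue.Queue() for domain, _ in NAMESPACE.values()}
    replies: queue.Queue = queue.Queue()
    workers = [threading.Thread(target=domain_worker, args=(d, q, replies))
               for d, q in inboxes.items()]
    for w in workers:
        w.start()
    for name in names:
        domain, path = NAMESPACE[name]  # resolve through the shared name space
        inboxes[domain].put({"name": name, "path": path, "op": op})
    for q in inboxes.values():
        q.put(None)  # tell each worker to stop once its queued requests are handled
    for w in workers:
        w.join()
    results = []
    while not replies.empty():
        results.append(replies.get())
    return sorted(results, key=lambda r: r["name"])


for reply in run_federated(list(NAMESPACE)):
    print(reply["name"], "->", reply["served_by"], reply["path"])
```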

Slide 10: Enhanced Use of Virtualization
- Virtual machines, such as in a CLOUD or GRID environment:
  - Required to manage dynamic resource allocation, scalability, distributed parallelism, energy efficiency and other aspects.
- Virtual collections:
  - Required to build research collaboration environments.
- We need the appropriate level of abstraction for optimal computing environment/middleware behavior:
  - too low or prescriptive a level constrains the environment;
  - too high or abstract a level does not clearly indicate the user's requirements.
- See Triple-I Computing (the Information-Intention-Incentive model proposed by [Schubert and Jeffery, 2014]); research projects are already addressing the challenges therein.

Slide 11: Analysis and Preliminary Conclusions
- We have made a useful start, but the DF vision needs to be expanded (and also focused, for maximum benefit):
  - beyond a domain of registered DOs stored in well-managed repositories;
  - to frame DF & its services broadly as data use & applications, taking into account the available context or environment.
- This doesn't minimize good data management practices and services, which are necessary and deserve support, but they are not sufficient to address the interoperability challenge.
- For enhanced, semi-automated interoperability we need to consider:
  - improved metadata for data in context, with enhanced semantics;
  - leveraging the emerging mathematical foundation for federation of data management systems (e.g. work by Hao Xu).

Slide 12: So... DF & VRE Must Be Integrated
- Including datasets, software services, resources (computers, detectors…), users.
- Composed as workflows, documented mathematically and ideally created autonomically.
- Achieved through metadata describing the elements of the first bullet:
  - Discovery
  - Contextualisation (relevance, quality, … through relations to organisations, persons, projects, publications etc., and provenance, rights)
  - Detailed, application-specific (to connect software to data at a resource for a user), i.e. schema-level
- The key technologies for achieving interoperability, as recognised by researchers, are:
  - AAAI
  - PID
  - Metadata with formal syntax and declared semantics
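The last bullet above (metadata with formal syntax and declared semantics, together with PIDs) can be sketched as a schema in which every field has both a type and a stated meaning, grouped under the discovery, contextualisation and application-specific purposes listed on the slide. This is a minimal Python sketch under stated assumptions: the schema, the Handle-style `prefix/suffix` PID pattern and the example record are all invented for illustration; a real deployment would rely on a standard metadata schema and PID system rather than this hand-written check.

```python
import re

# Hypothetical declared schema: every field has a formal type and a stated meaning,
# grouped by the three metadata purposes listed on the slide.
SCHEMA = {
    "pid":     (str, "discovery: persistent identifier of the digital object"),
    "title":   (str, "discovery: human-readable name"),
    "creator": (str, "contextualisation: responsible person or organisation"),
    "rights":  (str, "contextualisation: licence or access conditions"),
    "format":  (str, "application-specific: media type used to connect software to data"),
}

# Assumption: PIDs follow a Handle-style "prefix/suffix" pattern.
PID_PATTERN = re.compile(r"^[0-9]+(\.[0-9A-Za-z]+)*/\S+$")


def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record conforms to SCHEMA."""
    problems = []
    for key, (expected_type, _meaning) in SCHEMA.items():
        if key not in record:
            problems.append(f"missing {key}")
        elif not isinstance(record[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    problems += [f"undeclared field {key}" for key in record if key not in SCHEMA]
    pid = record.get("pid")
    if isinstance(pid, str) and not PID_PATTERN.match(pid):
        problems.append("pid is not in prefix/suffix form")
    return problems


# Invented example record, for illustration only.
record = {
    "pid": "12345/event-001",
    "title": "Seismic event 001",
    "creator": "Example Seismology Lab",
    "rights": "CC-BY-4.0",
    "format": "application/octet-stream",
}
print(validate(record))
```

Rejecting undeclared fields is a deliberate choice: a field without a declared meaning cannot be interpreted by a machine on the receiving side, which is exactly the interoperability gap the slide points at.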