Position Paper for Data Fabric IG Interoperability, Infrastructures and Virtuality
https://rd-alliance.org/group/data-fabric-ig.html
Gary Berg-Cross, Keith Jeffery, Reagan Moore


Slide 1: Position Paper for Data Fabric IG Interoperability, Infrastructures and Virtuality
https://rd-alliance.org/group/data-fabric-ig.html
Gary Berg-Cross, Keith Jeffery, Reagan Moore
- What principles & methods are needed to guide the interaction between services (interface, protocol)?
- What managed processes & services?
- What basic & flexible infrastructure machinery?
See https://rd-alliance.org/group/data-fabric-ig/post/re-rda-datafabric-ig-data-fabric-position-paper-broaden-discussion-berg-1 for discussion & the attached file.

Slide 2: Key Points
1. Much of the current DF discussion focuses on a data management & lifecycle view. It lacks a focus on other important topics: the standards & federation mechanisms needed to assemble collaborations spanning institutions and data management environments.
2. Interoperability needs to be a 1st-class concept in the DF conversation: it is fundamentally important for federation & for overcoming the problems generated by data silos.
3. There are multiple benefits to developing federated & virtualized mechanisms & mathematical descriptions to assist sharing DOs & (digital) knowledge procedures.

Slide 3: Initial ideas for DF IG - Implied Framework is Data Lifecycle
- New groups emerge, so Pubs are part of the Data Fabric & related analysis.
- A view with a data management focus that emerged from the discussions amongst various RDA WG chairs.
- Where is interoperability in a raw-to-citable data view?

Slide 4: Data Fabric Analysis, e.g. components & services in LC via Use Cases
- How do we arrive at the essential components & services? (Guided by use scenarios, which need to include collaboration.)
- This scenario doesn't show analysis or data sharing via DO manipulation!
[Figure label: Sharing]

Slide 5: Make Interoperability a First Class View
- Concept of interoperability:
  - The extent to which systems and devices can routinely exchange data and services, and interpret that shared data through the shared services.
  - A stronger type of data exchange can include knowledge of the meaning of the data content, usage constraints, and the underlying assumptions.
- Bring interoperability (within and cross-domain) aspects into the DF discussions as a 'first class citizen' alongside all the other aspects of the research data lifecycle in a domain.

Slide 6: Multiple types of DF infrastructure
When an enterprise implements a data management solution, one of multiple types of DF infrastructure is typically chosen:
- Data management: enterprise to build a data repository, manage an information catalog, & enforce management & curation policies (but also)
- Data analysis: enterprise to process a data collection, apply analysis & visualization tools, and automate a processing pipeline (but also)
- Data preservation: enterprise to build reference collections and knowledge bases that comprise their intellectual capital, while managing technology evolution
- Data publication: enterprise to provide descriptive information and arrangement for discovery and access of data collections
- Data sharing: controlled sharing of a data collection, shared analysis workflows, and information catalogs, i.e. interoperability

Slide 7: Interoperability mechanisms required for sharing data, information, & knowledge
- Composition: how the separate components, developed separately, can be made to work together.
- Minimal set of infrastructure mechanisms & service requirements
- Gaps, obstacles and possible incompatibilities
- Different suites of components will have different data fabrics.
[Figure labels: Enable reproducible research; Brokers]

Slide 8: Data Sharing Use Cases
- EUDAT & the DataNet Federation Consortium use cases provide some view to help:
  - Interoperability mechanisms for sharing DOs & (digital) knowledge procedures
  - An implication is that researchers can re-execute trusted procedures to obtain identical results, making reproducible data-driven research possible
- Community driven research collaborations
  - Seismology: share seismic data, tsunami prediction workflows between research groups
  - Climate change: share oceanography environmental data, coastal storm surge analyses, hydrology flood analyses, satellite environmental data
  - Genomics: build a cohort of genomes, predictive models for humans, plants, animals, diseases
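The reproducibility implication on this slide can be illustrated with a minimal sketch. It assumes that "identical results" can be checked by comparing a digest of the output; the `procedure` function, the sample values, and the idea of publishing a fingerprint are hypothetical stand-ins, not something the position paper specifies.

```python
import hashlib
import json


def procedure(values):
    """Stand-in for a shared, trusted analysis procedure (hypothetical)."""
    return {"n": len(values), "mean": sum(values) / len(values)}


def fingerprint(result):
    """Hash a canonical JSON encoding so equal results give equal digests."""
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Both groups hold the same shared digital object (here just a list).
shared_data = [2.0, 4.0, 6.0]

published = fingerprint(procedure(shared_data))   # recorded by group A
reproduced = fingerprint(procedure(shared_data))  # recomputed by group B

print(published == reproduced)
```

The comparison only works if both the data and the procedure are shared unchanged, which is the point the slide makes about sharing DOs together with knowledge procedures.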

Slide 9: Expanded Ideas of Federation: 3 versions of federated systems
1. Shared name spaces for users, files, and services.
   1. Besides single sign-on, with a shared name space providing users with federated services, we want to offer a service for virtual collections that span administrative domains.
   2. A shared name space for services enables re-use of procedures across researcher resources.
2. Shared services for manipulating digital objects.
   1. Such as a shared service through a broker, accessing the service through its access protocol, or an encapsulated service in a virtual machine environment, moved to the local research resources for execution.
3. Third-party (service) access.
   1. Posting requests to a 3rd party, such as a message queue, eliminating direct communication between the federated system components.
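The third federation style, third-party access, might look like the following minimal sketch. It uses Python's in-process `queue.Queue` as a stand-in for a real message queue; the service name `word_count`, the logical object name, and both system functions are hypothetical and chosen only for illustration.

```python
import queue

# Hypothetical shared name space for services: logical name -> callable.
SERVICES = {"word_count": lambda text: len(text.split())}

# Hypothetical digital objects held by system B, keyed by logical name.
OBJECTS_B = {"/zoneB/notes.txt": "data fabric needs interoperability"}

requests = queue.Queue()  # the third party that both systems talk to
replies = queue.Queue()


def system_a_post(service, object_name):
    """System A posts a request; it never calls system B directly."""
    requests.put({"service": service, "object": object_name})


def system_b_poll():
    """System B drains the request queue and answers via the reply queue."""
    while not requests.empty():
        req = requests.get()
        result = SERVICES[req["service"]](OBJECTS_B[req["object"]])
        replies.put({"object": req["object"], "result": result})


system_a_post("word_count", "/zoneB/notes.txt")
system_b_poll()
answer = replies.get()
print(answer)
```

Neither system holds a reference to the other: each knows only the queue and the shared names, which is what removes direct communication between the federated components.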

Slide 10: Enhanced Use of Virtualization
- Virtual machines, such as in a CLOUD or GRID environment
  - Required to manage dynamic resource allocation, scalability, distributed parallelism, energy efficiency and other aspects.
- Virtual collections
  - Required to build research collaboration environments
- We need the appropriate level of abstraction for optimum computing environment/middleware behavior.
  - Too low or prescriptive a level constrains the environment;
  - too high or abstract a level does not clearly indicate the requirement of the user.
- See Triple-I Computing as a concept (Information-Intention-Incentive model proposed by [Schubert and Jeffery, 2014]); research projects are already addressing the challenges therein.

Slide 11: Analysis and Preliminary Conclusions
- We have made a useful start, but the DF vision needs to be expanded (and also focused for maximum benefit) to
  - more than a domain of registered DOs stored in well-managed repositories
  - Frame DF & its services broadly as data use & applications, taking into account the available context or environment.
- This doesn't minimize good data management practices and services, which are necessary and deserve support but are not sufficient to address the challenge of interoperability.
- For enhanced, semi-automated interoperability we need to consider:
  - Improved metadata for data in context with enhanced semantics
  - Leveraging the emergence of a mathematical foundation for federation of data management systems (e.g. work by Hao Xu).

Slide 12: So… DF, VRE must be integrated
- Including datasets, SW services, resources (computers, detectors…), users
- Composed as workflows documented mathematically and ideally created autonomically
- Achieved through metadata describing the elements of bullet 1
  - Discovery
  - Contextualisation (relevance, quality, … through relations to organisations, persons, projects, publications etc. and provenance, rights)
  - Detailed application-specific (to connect software to data at a resource for a user), i.e. schema-level
- The key technologies to achieve interoperability as recognised by researchers are:
  - AAAI
  - PID
  - Metadata with formal syntax and declared semantics
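The three metadata levels named on this slide (discovery, contextualisation, schema-level) can be sketched as one record per digital object, keyed by a PID. Everything in this sketch is an assumption made for illustration: the `MetadataRecord` fields, the `example:` identifiers, and the sample values do not come from the position paper or from any RDA specification.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    pid: str         # persistent identifier (hypothetical format)
    discovery: dict  # what a search would match on
    context: dict    # relations to projects, rights, provenance
    schema: dict     # detail needed to connect software to the data


records = [
    MetadataRecord(
        pid="example:0001",
        discovery={"title": "Coastal storm surge runs",
                   "keywords": ["climate", "storm surge"]},
        context={"project": "demo-project", "rights": "open"},
        schema={"columns": {"time": "datetime", "surge_m": "float"}},
    ),
    MetadataRecord(
        pid="example:0002",
        discovery={"title": "Seismic traces",
                   "keywords": ["seismology"]},
        context={"project": "demo-project", "rights": "restricted"},
        schema={"columns": {"time": "datetime", "amplitude": "float"}},
    ),
]


def discover(keyword):
    """Discovery level: find records whose keywords include the term."""
    return [r for r in records if keyword in r.discovery["keywords"]]


def compatible(record, required_columns):
    """Schema level: check the data offers the columns a tool expects."""
    columns = record.schema["columns"]
    return all(columns.get(name) == kind
               for name, kind in required_columns.items())


hits = discover("climate")
usable = [r.pid for r in hits if compatible(r, {"surge_m": "float"})]
print(usable)
```

Discovery narrows the candidates, the context block carries what a policy check would need (rights, provenance), and only the schema-level check decides whether a given piece of software can actually consume the data.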

