Workshop on Brokering in Data Fabrics: community perspectives
Garching, 2016/11/22
Participants: Wouter Addink, Stefano Nativi, Abraham Nieva De La Hidalga, Francoise Pearlman, Jay Pearlman, Tobias Weigel, Peter Wittenburg
Remote: Bridget Almas, Donald Hobern, Larry Lannom
Communities involved: Biodiversity, Natural Museums, Environmental Science, Climate Modeling, Linguistics, Cultural Heritage, Geo-Sciences
Hosted by Peter Wittenburg and Kathrin Beck
Workshop Agenda
1. Introductions
2. Needs for data fabric and interoperability
3. Technical status of fabric and brokering
4. Key gaps to be addressed
5. Addressing the gaps
6. Program approach and system level concept
7. Demonstration cases
8. Decision on steps forward
9. Funding possibilities
11/22/16
Intro to Data Fabrics
Provided by Peter Wittenburg
Interoperability Challenge
Credit: Data Fabric IG
ENVRI+ use case
Provided by Stefano Nativi
Service Brokering Pattern
Traditional SOA (NxM connections) vs. Advanced B-SOA (N+M connections)
Provided by Stefano Nativi
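The arithmetic behind the pattern can be made concrete with a short sketch. It assumes that N counts consumers (clients) and M counts providers (services); the slide does not say which side is which, and the function names are illustrative only. In the traditional pattern each consumer integrates with each provider; with a broker, each party integrates once, with the broker.

```python
def point_to_point_links(n_clients: int, m_services: int) -> int:
    """Traditional SOA: every client integrates directly with every service."""
    return n_clients * m_services


def brokered_links(n_clients: int, m_services: int) -> int:
    """Brokered SOA (B-SOA): each client and each service connects only to the broker."""
    return n_clients + m_services


for n, m in [(5, 10), (20, 50)]:
    print(f"N={n}, M={m}: point-to-point={point_to_point_links(n, m)}, "
          f"brokered={brokered_links(n, m)}")
# N=5, M=10: point-to-point=50, brokered=15
# N=20, M=50: point-to-point=1000, brokered=70
```

The gap widens as either side grows, which is the usual argument for putting mediation logic in one place rather than in every client.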
Intro to Broker
Provided by Stefano Nativi
Elements of Fabric-Broker constructs
PID: points to and locates information (contains current state information).
Types: describe the characteristics of a class of objects, noting that objects can include services.
Broker: alternative descriptions:
1. mediating data to make collections fit for a use case;
2. “brokers are structured, understood and auditable processes for working across heterogeneous data”;
3. “Actualizing the capability of resource utilization (in a broad sense)”.
Collections of digital objects: aggregations that can be coupled to services for a specific purpose or goal (see next slide).
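One way to picture how PIDs, Types and Collections relate is a minimal in-memory data model. This is a sketch under stated assumptions, not a specification of any PID or type registry: the class names, fields, the `PIDRegistry` resolver and all identifiers below are hypothetical. A PID record locates an object and carries its current state, a type lists the properties its instances must carry, and a collection aggregates PIDs together with the services it is coupled to.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDefinition:
    """Type (hypothetical): characteristics shared by a class of objects, which may be services."""
    name: str
    required_properties: tuple

    def conforms(self, properties: dict) -> bool:
        # An object conforms if it carries every property the type requires.
        return all(key in properties for key in self.required_properties)


@dataclass
class PIDRecord:
    """PID record (hypothetical): points to and locates an object and holds its current state."""
    pid: str
    location: str
    type_name: str
    state: dict = field(default_factory=dict)


@dataclass
class Collection:
    """Collection (hypothetical): an aggregation of PIDs coupled to services for a goal."""
    pid: str
    member_pids: list = field(default_factory=list)
    coupled_services: list = field(default_factory=list)


class PIDRegistry:
    """Toy in-memory resolver standing in for a real PID service."""

    def __init__(self) -> None:
        self._records = {}

    def register(self, record: PIDRecord) -> None:
        self._records[record.pid] = record

    def resolve(self, pid: str) -> PIDRecord:
        return self._records[pid]


# Usage with made-up identifiers and values.
registry = PIDRegistry()
specimen = TypeDefinition("Specimen", ("taxon", "lat", "lon"))
registry.register(PIDRecord("example/obj-1", "https://example.org/obj/1", "Specimen",
                            {"taxon": "Quercus robur", "lat": 48.25, "lon": 11.65}))
oaks = Collection("example/coll-1", member_pids=["example/obj-1"], coupled_services=["mapping"])
record = registry.resolve(oaks.member_pids[0])
print(record.location, specimen.conforms(record.state))  # https://example.org/obj/1 True
```

Keeping the collection as a list of PIDs, rather than copies of the objects, is what lets a collection be virtual: members are resolved on demand.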
Use case for Museum collections (Use Case 1)
Donald, Dimitris, Wouter and team will provide a detailed description.
Define the criteria and rationale for a new collection (defined by a query) that would draw from existing collections.
Build a new (virtual) collection for particular applications and outcomes.
Combine very different collections (two or more), leading to a new collection with common ontologies and geospatial distributions? The original collections are not changed.
Combine cross-collection slices into a new collection and describe the benefits of the outcome.
Objects (virtual objects) in the collections would be examined by the broker and mediated to enable their use in one or more processes with outputs in a selected format; this would allow comparison or processing of multiple collections.
A controller orchestrates and monitors the entire process. The controller also maintains provenance, metadata, audit trails, attribution, etc. throughout the processing of the new collection, drawing on the broker, which responds to the controller.
Demonstration: PID service, type registries, collection service, mediation
Impact of the data fabric construct (including the broker) and its capabilities
The time frame for the initial use case description is one week.
Provided by Donald Hobern
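The query-driven flow in this use case can be sketched in a few lines. Everything here is an illustrative assumption: the collection contents, the per-source `field_map` that renames source fields to a shared vocabulary, and the `Controller` class are hypothetical. The broker step (`mediate`) returns a mapped copy of each record, so the original collections stay unchanged, and the controller keeps a simple provenance log of which source object contributed to the new virtual collection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]  # metadata of one digital object


@dataclass
class SourceCollection:
    name: str
    records: Tuple[Record, ...]
    field_map: Dict[str, str]  # hypothetical: source field name -> common field name


def mediate(record: Record, field_map: Dict[str, str]) -> Record:
    """Broker step: return a copy with source-specific field names mapped to a
    common vocabulary. The source record itself is left untouched."""
    return {field_map.get(key, key): value for key, value in record.items()}


@dataclass
class Controller:
    """Orchestrates the build and keeps a simple provenance log."""
    provenance: List[Tuple[str, object]] = field(default_factory=list)

    def build_virtual_collection(self, sources: List[SourceCollection],
                                 query: Callable[[Record], bool]) -> List[Record]:
        selected = []
        for source in sources:
            for record in source.records:
                common = mediate(record, source.field_map)
                if query(common):
                    selected.append(common)
                    # Note which source object contributed to the new collection.
                    self.provenance.append((source.name, record.get("id")))
        return selected


museum_a = SourceCollection("museum_a", (
    {"id": "A1", "sciName": "Quercus robur", "country": "DE"},
    {"id": "A2", "sciName": "Fagus sylvatica", "country": "NL"},
), {"sciName": "taxon"})
museum_b = SourceCollection("museum_b", (
    {"id": "B1", "species": "Quercus robur", "country": "NL"},
), {"species": "taxon"})

controller = Controller()
oaks = controller.build_virtual_collection(
    [museum_a, museum_b], lambda r: r.get("taxon") == "Quercus robur")
print([r["id"] for r in oaks])   # ['A1', 'B1']
print(controller.provenance)     # [('museum_a', 'A1'), ('museum_b', 'B1')]
```

A real broker would map to shared ontologies rather than rename fields, but the shape is the same: mediation happens on copies, and the query runs against the common form.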
Use Case for the Coupled Model Intercomparison Project (CMIP6) (Use Case 2)
Tobias, Stephan and team will provide a detailed description.
Define the criteria and rationale for a new collection (defined by a query) that would draw from existing collections.
Build a new (virtual) collection for particular applications and outcomes.
Combine very different collections (two or more), leading to a new collection with common ontologies and geospatial distributions? The original collections are not changed.
Combine cross-collection slices into a new collection and describe the benefits of the outcome.
Model data and observation data (Copernicus) would be examined by the broker and mediated so that processes can produce outputs in a selected format, allowing comparison or processing of multiple collections. There may be multiple processes.
Each process is a WPS (Web Processing Service) with a scientific payload: data and a model that together yield new scientific information.
A controller orchestrates and monitors the entire process. The controller also maintains provenance, metadata, audit trails, attribution, etc. throughout the processing of the new collection, drawing on the broker, which responds to the controller.
The output collection consists of the output objects plus the provenance and audit trail.
Demonstration: PID service, type registries, collection service, mediation
Impact of the data fabric construct (including the broker) and its capabilities
The time frame for the initial use case description is one week.
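How a controller might chain several processes and return the output objects together with their audit trail is sketched below. This does not use any WPS client library: plain Python callables stand in for WPS processes, and the step names, the `OutputCollection` class and the model/observation values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Tuple


@dataclass
class OutputCollection:
    """Output objects plus the provenance/audit trail of how they were produced."""
    objects: list
    provenance: list = field(default_factory=list)


def run_pipeline(inputs: list,
                 processes: List[Tuple[str, Callable[[list], list]]]) -> OutputCollection:
    """Hypothetical controller: runs each process in order on the previous output
    and logs one audit entry per step."""
    data = list(inputs)
    trail = []
    for name, process in processes:
        data = process(data)
        trail.append({"step": name, "n_out": len(data),
                      "at": datetime.now(timezone.utc).isoformat()})
    return OutputCollection(objects=data, provenance=trail)


# Made-up paired values standing in for mediated model and observation data.
pairs = [{"model": 15.5, "obs": 15.0}, {"model": 14.25, "obs": 14.0}]
processes = [
    ("difference", lambda rows: [r["model"] - r["obs"] for r in rows]),
    ("mean_bias", lambda diffs: [sum(diffs) / len(diffs)]),
]
result = run_pipeline(pairs, processes)
print(result.objects)                              # [0.375]
print([p["step"] for p in result.provenance])      # ['difference', 'mean_bias']
```

Returning the trail inside the output collection, rather than logging it elsewhere, matches the slide's point that provenance and audit information travel with the results.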
Possible Activities at RDA
Define a discussion at the Barcelona Plenary (session proposal).
Support a discussion sponsored by the Data Fabric and Broker Interest Groups as a joint session.
Blueprint/patterns for a Data Fabric with brokering, also addressing technical, organizational and business perspectives: changing the paradigm (like defining the operation of clouds versus HPC).
Open discussion for domains that might be interested in participating.
Working Group (WG) formation? What are the goals? To maintain a dialogue? If so, do we need a working group at this point?
The WG could be a focal point for examining best practices from the two use cases and broader applications beyond them.
It would be of more value if the effort matures before a WG is formed. Wait to proceed.
Tobias Weigel (DKRZ)
Actions
Donald et al. will address use case 1 (Nov 30).
Tobias et al. will address use case 2 (Nov 30).
Abraham will bring the topic to ENVRI+ (Nov 29).
Other communities need to spell out their interests (linguistics, cultural heritage, etc.).
Stefano will address broker structure options (see slide 4) (Nov 30).
Jay and Bridget will look into a joint session at the RDA Barcelona meeting (Dec 15).
Next meeting (virtual): January 20 at 16:00 CET
Resources for pilot demonstrations
Additional steps
Logistics: Google Docs, email distribution list, meeting information
Resources for pilot demonstrations: Wouter, Peter, Dimitris