
Presentation transcript:

1 Data Fabric Cycle. This slide illustrates the continuous cycle of creating raw data (from observations, experiments, simulations, etc.) or derived data based on collections of existing data. The aim is to identify components that could improve the cycle, step by step.

2 Dimensions of Testing: getting to Data Fabrics. RDA produces RDA Recommendations (i.e., outputs). Some technically oriented RDA Recommendations come with reference software; these are a starting point. Many technically oriented RDA Recommendations have no reference software, yet are machine-actionable (a schema, for instance); these are also of immediate interest for data fabric composition. Other recommendations play important but background roles in early composition. [Diagram: RDA Recommendations arranged by type: reference software; machine-actionable reference; human consumption needed before becoming actionable; purely human consumption and action]

3 Compositions of components [Diagram: Core Components & Services and Specific Components & Services combine into Composition (or Fabric) A and Composition B]

4 Compositions of components [Diagram: Composition (Fabric) B]
Given the nature of data (little can be done without understanding it), a successful data fabric will likely:
1. Run on possibly distributed e-infrastructure (EUDAT, NDS, …)
2. Serve a scholarly domain as domain infrastructure
3. Support multiple projects within that domain
4. Eventually enable cross-domain research
For 3 and 4 to be realized, the composition in 2 must be shared across projects.

5 Testbed* (US) to facilitate community evaluation of the DTR (Data Type Registry) and PIT (PID Information Types) services, and to test complex types. *Funding proposal under review.
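To make "testing complex types" concrete, here is a minimal sketch, assuming a complex type is simply a named set of required properties, each with a basic value check. The type name, property names, and checks are hypothetical and are not taken from the actual DTR or PIT interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ComplexType:
    """Hypothetical complex type: a name plus required properties with per-property checks."""
    name: str
    properties: Dict[str, Callable[[str], bool]]

    def validate(self, record: Dict[str, str]) -> List[str]:
        """Return a list of problems; an empty list means the record conforms."""
        problems = []
        for prop, check in self.properties.items():
            if prop not in record:
                problems.append(f"missing property: {prop}")
            elif not check(record[prop]):
                problems.append(f"invalid value for {prop}: {record[prop]!r}")
        return problems


# Hypothetical type for a file-level PID record; property names are illustrative only.
FILE_RECORD = ComplexType(
    name="example-file-record",
    properties={
        "checksum": lambda v: len(v) == 64 and all(c in "0123456789abcdef" for c in v),
        "file_size": lambda v: v.isdigit(),
        "format": lambda v: v in {"netcdf"},
    },
)

if __name__ == "__main__":
    good = {"checksum": "a" * 64, "file_size": "1024", "format": "netcdf"}
    bad = {"checksum": "xyz", "format": "csv"}
    print(FILE_RECORD.validate(good))  # []
    print(FILE_RECORD.validate(bad))   # three problems: checksum, file_size, format
```

Keeping the checks as plain functions makes a type machine-actionable without any reference software, which is the category of recommendation the testbed is meant to exercise.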

6 WCRP CMIP6 data scenario
• WCRP: World Climate Research Programme
• CMIP6: Coupled Model Intercomparison Project, Phase 6
• Results ultimately inform the IPCC assessment reports
• Big community effort: Earth system simulations run internationally at multiple sites
• Estimated data volume of ca. 50 PB over the next 5 years
• A PID for every file: 250 million Handles
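The slide's figures can be combined into a rough sanity check. This assumes decimal units (1 PB = 10^15 bytes), exactly one Handle per file, and registrations spread evenly over the five years. In practice they will not be spread evenly, so the average understates the peak load during bursts.

```python
# Back-of-the-envelope numbers from the slide: ~50 PB over ~5 years, ~250 million Handles.
TOTAL_BYTES = 50 * 10**15        # 50 PB, decimal units (assumption)
NUM_HANDLES = 250 * 10**6        # 250 million PIDs, one per file
YEARS = 5
SECONDS_PER_YEAR = 365 * 24 * 3600

# Average file size implied by the two estimates.
avg_file_size_mb = TOTAL_BYTES / NUM_HANDLES / 10**6
# Average registration rate if registrations were perfectly uniform over time.
avg_registrations_per_second = NUM_HANDLES / (YEARS * SECONDS_PER_YEAR)

print(f"average file size: {avg_file_size_mb:.0f} MB")                              # 200 MB
print(f"average registration rate: {avg_registrations_per_second:.2f} Handles/s")  # 1.59
```

The average rate of under two registrations per second is modest; the difficulty lies in the uneven arrival of files, not in the total count.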

7 Requirement: put a Handle in every file header, but files may not be changed after the production phase. The publishing process is elaborate and takes a lot of real time.
[Diagram: publication workflow across the Production, Project/Community, and Bibliometric phases; labelled steps include CMOR, CMIP, Data Check (D1-D3), MD Check (M1-M4), ESGF publication, ESGF replication, project storage, LTA archive, DOI, and IPCC DDC]
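One consequence of this requirement is that the identifier has to be chosen before the file is frozen, independently of when it is actually registered. A minimal sketch, assuming the identifier suffix is a locally generated UUID and using a hypothetical placeholder prefix; the class and function names are illustrative and are not CMOR's or ESGF's actual interfaces.

```python
import uuid
from dataclasses import dataclass, FrozenInstanceError

HYPOTHETICAL_PREFIX = "00.000"  # placeholder, not a real Handle prefix


def mint_tracking_id(prefix: str = HYPOTHETICAL_PREFIX) -> str:
    """Create an identifier locally, without contacting the Handle server."""
    return f"hdl:{prefix}/{uuid.uuid4()}"


@dataclass(frozen=True)
class ProducedFile:
    """Hypothetical stand-in for a file header that is read-only after production."""
    path: str
    tracking_id: str


def produce(path: str) -> ProducedFile:
    # The identifier is embedded at creation time, while the file is still being produced.
    return ProducedFile(path=path, tracking_id=mint_tracking_id())


if __name__ == "__main__":
    f = produce("tas_example.nc")
    print(f.tracking_id)
    try:
        f.tracking_id = "hdl:changed"  # attempting to change the header afterwards fails
    except FrozenInstanceError:
        print("file is frozen after production")
```

Because the identifier is fixed locally, registration with the Handle service can happen later without touching the file.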

8 More details on the registration workflow
• Requirement: put a Handle in every file header, but files may not be changed after the production phase
• tracking_id = hdl: /
• A lot of time was spent on agreements that ensure the integrity of the PID records
• Every object gets a PID, and no object outside our control carries an embedded PID
• PIDs are not citable: the required metadata is not yet available
• Still, some file header fields are extracted and put into the PID record
• PIDs are a new development, so Handle registration must not interrupt the publication process
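Copying some header fields into the PID record can be sketched as follows. Which fields are copied is an assumption here (the field names are hypothetical examples), and the tracking_id value is taken from the header as given, since the slide leaves its exact prefix unspecified.

```python
from typing import Dict

# Hypothetical subset of header attributes copied into the PID record.
FIELDS_FOR_PID_RECORD = ("experiment_id", "variable_id", "source_id")


def build_pid_record(header: Dict[str, str]) -> Dict[str, str]:
    """Build a PID record from a file header; the PID itself is the header's tracking_id."""
    if "tracking_id" not in header:
        raise ValueError("header has no tracking_id; the file cannot be registered")
    record = {"pid": header["tracking_id"]}
    for field in FIELDS_FOR_PID_RECORD:
        if field in header:  # copy only what is present; never invent values
            record[field] = header[field]
    return record


if __name__ == "__main__":
    header = {"tracking_id": "hdl:00.000/abc", "experiment_id": "historical",
              "variable_id": "tas", "title": "not copied"}
    print(build_pid_record(header))
```

The record deliberately stays small: it is a registration aid, not the citable metadata, which the slide notes is not yet available at this stage.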

9 Make it scalable: handle burst events and provide high availability.
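One common way to keep Handle registration from blocking publication and to absorb bursts is to put registration requests on a queue and drain them in batches, retrying failed batches later. This is a single-process sketch under that assumption; `register_batch` is a hypothetical stand-in for whatever call the Handle service actually offers, and a production setup would use a durable message broker rather than an in-memory queue.

```python
import queue
from typing import Callable, Dict, List

PidRecord = Dict[str, str]


class RegistrationQueue:
    """Decouples publication (submit never waits on the Handle service) from registration."""

    def __init__(self, register_batch: Callable[[List[PidRecord]], bool], batch_size: int = 100):
        self._pending: "queue.Queue[PidRecord]" = queue.Queue()
        self._register_batch = register_batch  # hypothetical Handle-service call
        self._batch_size = batch_size

    def submit(self, record: PidRecord) -> None:
        """Called by the publisher; returns immediately, even during a burst."""
        self._pending.put(record)

    def drain_once(self) -> int:
        """Register up to one batch; on failure, re-queue the records for a later retry."""
        batch: List[PidRecord] = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        if batch and not self._register_batch(batch):
            for record in batch:
                self._pending.put(record)
            return 0
        return len(batch)

    def pending(self) -> int:
        return self._pending.qsize()
```

A burst then only lengthens the queue; publication proceeds, and registration catches up at whatever rate the Handle service sustains.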

10 Does the idea of common components work in this case? Yes.

11 What are the common components in this case?
• Object registration
• Object management (copy, move, delete)
• Collection builder
• Classifier (DTR)
• Registration of a community-specific PID profile
• Registration of file data types (netCDF, variables)
• Object information (end user)
• Landing page: links to further information, enables browsing
• Information tool
• External addendum (processing, feedback)
• Policy provider and enforcer (preliminary objects, LTA)
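Of the components listed, the collection builder is perhaps the simplest to illustrate. A minimal sketch, assuming a collection is just a group of file PIDs that share the same values for a chosen set of attributes; the attribute names and the grouping rule are hypothetical, not taken from ESGF.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def build_collections(records: Iterable[Record],
                      keys: Tuple[str, ...]) -> Dict[Tuple[str, ...], List[str]]:
    """Group object PIDs into collections keyed by the values of `keys`.

    Records lacking any of the keys are left out of every collection.
    """
    collections: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for record in records:
        if all(k in record for k in keys):
            collection_key = tuple(record[k] for k in keys)
            collections[collection_key].append(record["pid"])
    return dict(collections)


if __name__ == "__main__":
    records = [
        {"pid": "a", "experiment_id": "historical", "variable_id": "tas"},
        {"pid": "b", "experiment_id": "historical", "variable_id": "tas"},
        {"pid": "c", "experiment_id": "historical", "variable_id": "pr"},
    ]
    print(build_collections(records, ("experiment_id", "variable_id")))
```

Because the grouping key is a parameter, the same builder could serve different projects, which is the point of treating it as a common rather than a specific component.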

12 Address every object on the Net. Two documents in preparation: a report from the workshop, and the Digital Object Cloud.

13 The DO Cloud: end users, developers, and automated processes deal with persistently identified, consistently structured digital objects, which are securely and redundantly managed and stored on the Internet. The DO Cloud is an overlay on existing or future information storage systems. [Diagram: digital objects A, F, and G with identifiers (ID: 123…, ID: 987/…, ID: 843…), held in repositories and located via identifier services]
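The overlay idea can be sketched as an identifier service that maps each PID to one or more storage locations, with clients trying the redundant copies in turn. The PIDs, URLs, and availability check are hypothetical; a real deployment would resolve identifiers through the Handle System rather than an in-memory dictionary.

```python
from typing import Callable, Dict, List, Optional


class IdentifierService:
    """Hypothetical identifier service: PID -> redundant storage locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, pid: str, locations: List[str]) -> None:
        self._locations[pid] = list(locations)

    def resolve(self, pid: str, is_available: Callable[[str], bool]) -> Optional[str]:
        """Return the first reachable copy, or None if the PID is unknown or no copy is reachable."""
        for location in self._locations.get(pid, []):
            if is_available(location):
                return location
        return None


if __name__ == "__main__":
    svc = IdentifierService()
    svc.register("pid/1", ["https://repo-a.example/obj1", "https://repo-b.example/obj1"])
    # Simulate repository A being down: the client transparently gets the copy in B.
    print(svc.resolve("pid/1", lambda url: "repo-b" in url))
```

Clients only ever hold the PID; which storage system actually serves the bytes can change underneath without breaking references.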

14 From abstract fabrics to solutions: closing urgent gaps. [Diagram: Common Components & Services and Specific Components & Services, with examples t-repositories, PID system, MD schemas, MD editors, vocabularies, etc., and a Global Digital Object Cloud]