CMIP6 / ENES Data TF Meeting: DKRZ

Slides:



Advertisements
Similar presentations
NGAS – The Next Generation Archive System Jens Knudstrup NGAS The Next Generation Archive System.
Advertisements

Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint N.P. Drakenberg, H. Höck, M. Lautenschlager, H. Luthardt,
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
Brief Overview of Major Enhancements to PAWN. Producer – Archive Workflow Network (PAWN) Distributed and secure ingestion of digital objects into the.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
What is it? Hierarchical storage software developed in collaboration with five US department of Energy Labs since 1992 Allows storage management of 100s.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall.
M. Stockhause et al. Martina Stockhause, Michael Lautenschlager, Frank Toussaint Deutsches Klimarechenzentrum (DKRZ) World Data Centre for Climate (WDCC)
Preservation and Long Term Access of Data at the World Data Centre for Climate Frank Toussaint N.P. Drakenberg, H. Höck, S. Kindermann, M. Lautenschlager,
Platform as a Service (PaaS)
Z EGU Integration of external metadata into the Earth System Grid Federation (ESGF) K. Berger 1, G. Levavasseur 2, M. Stockhause 1, and M. Lautenschlager.
1 1 Hybrid Cloud Solutions (Private with Public Burst) Accelerate and Orchestrate Enterprise Applications.
Evolution to CIMI Charles (Cal) Loomis & Mohammed Airaj LAL, Univ. Paris-Sud, CNRS/IN2P3 29 August 2013.
DATA FOUNDATION TERMINOLOGY WG 4 th Plenary Update THE PLUM GOALS This model together with the derived terminology can be used Across communities and stakeholders.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
Data Publication and Quality Control Procedure for CMIP5 / IPCC-AR5 Data WDC Climate / DKRZ:
Technical Update 2008 Sandy Payette, Executive Director Eddie Shin, Senior Developer April 3, 2008 Open Repositories 2008, Fedora User Group.
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No The pan-European.
6/23/2005 R. GARDNER OSG Baseline Services 1 OSG Baseline Services In my talk I’d like to discuss two questions:  What capabilities are we aiming for.
1 e-Science AHM st Aug – 3 rd Sept 2004 Nottingham Distributed Storage management using SRB on UK National Grid Service Manandhar A, Haines K,
System/SDWG Update Management Council Face-to-Face Flagstaff, AZ August 22-23, 2011 Sean Hardman.
XROOTD AND FEDERATED STORAGE MONITORING CURRENT STATUS AND ISSUES A.Petrosyan, D.Oleynik, J.Andreeva Creating federated data stores for the LHC CC-IN2P3,
M. Stockhause 1, G. Levavasseur 2, K. Berger 1 1 Deutsches Klimarechenzentrum (DKRZ) 2 Institute Pierre Simon Laplace (IPSL) ESGF-QCWT Quality Control.
DSpace System Architecture 11 July 2002 DSpace System Architecture.
Yaroslav Pentsarskyy Involved in SharePoint since 2003 SharePoint MVP (2009- Present) Blog: sharemuch.com.
Globus and ESGF Rachana Ananthakrishnan University of Chicago
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No Data Preservation.
CAS2K11 in Annecy, France September 11 – 14, 2011 Data Infrastructures at DKRZ Michael Lautenschlager.
Cloud Computing from a Developer’s Perspective Shlomo Swidler CTO & Founder mydrifts.com 25 January 2009.
Simulation Production System Science Advisory Committee Meeting UW-Madison March 1 st -2 nd 2007 Juan Carlos Díaz Vélez.
Breaking the frontiers of the Grid R. Graciani EGI TF 2012.
Get Data to Computation eudat.eu/b2stage B2STAGE How to shift large amounts of data Version 4 February 2016 This work is licensed under the.
DIRAC for Grid and Cloud Dr. Víctor Méndez Muñoz (for DIRAC Project) LHCb Tier 1 Liaison at PIC EGI User Community Board, October 31st, 2013.
Store and exchange data with colleagues and team Synchronize multiple versions of data Ensure automatic desktop synchronization of large files B2DROP is.
Weigel, Berger, Kindermann, Lautenschlager EGU Versioning for CMIP6 in the Earth System Grid Federation Data preparation Initial registration.
XSEDE GLUE2 Update 1. Current XSEDE Usage Using legacy TeraGrid information services Publishing compute information about clusters – Subset of XSEDE clusters.
Docker for Ops: Operationalize Your Apps in Production Vivek Saraswat Sr. Product Evan Hazlett Sr. Software
Scientific Data Processing Portal and Heterogeneous Computing Resources at NRC “Kurchatov Institute” V. Aulov, D. Drizhuk, A. Klimentov, R. Mashinistov,
1 This slide indicated the continuous cycle of creating raw data or derived data based on collections of existing data. Identify components that could.
Intentions and Goals Comparison of core documents from DFIG and Publishing Workflow IG show that there is much overlap despite different starting points.
INFSO-RI Enabling Grids for E-sciencE ESR Database Access K. Ronneberger,DKRZ, Germany H. Schwichtenberg, SCAI, Germany S. Kindermann,
Platform as a Service (PaaS)
CernVM-FS vs Dataset Sharing
Organizations Are Embracing New Opportunities
Platform as a Service (PaaS)
Simulation Production System
Approaches and Challenges in Managing Persistent Identifiers
AP7/AP8: Long-Term Archival of CMIP6 Data
Platform as a Service (PaaS)
Data Citation Service for CMIP6 and IPCC DDC Aspects
Information Collection and Presentation Enriched by Remote Sensor Data
Data Ingestion in ENES and collaboration with RDA
Software infrastructure for a National Research Platform
Joseph JaJa, Mike Smorul, and Sangchul Song
Research Data Collections WG Plenary 9 Barcelona
IP Publishing From IP Data Base to IP list to IP catalog
Research Data Archive - technology
GGF15 – Grids and Network Virtualization
Future Data Architecture Cloud Hosting at USGS
Climate Data Analytics in a Big Data world
A Web-Based Data Grid Chip Watson, Ian Bird, Jie Chen,
Hydrographic Data as a Service
Cloud Web Filtering Platform
Technical Capabilities
FDA-08 FDA Whitepaper Update
Enterprise Integration
Bird of Feather Session
RDA uptake activities and plans: ESGF
Digital Object Management for ENES: Challenges and Opportunities
Leveraging PIDs for object management in data infrastructures RDA UK Node Workshop, July Tobias Weigel (DKRZ)
Presentation transcript:

CMIP6 / ENES Data TF Meeting: DKRZ DKRZ RM-ODP Views Status Developments: CMIP6 PID Integration CMIP6+ Data Ingest QA Collection of open issues Large Scale Data Project meet RDA 5th Plenary Session RDA adoption benefits from perspective of a national service provider, embedded into a set of international climate data management e-infrastructure projects DKRZ  broader context (ESGF, EUDAT, IS-ENES, WDCs)

DKRZ RM-ODP views Enterprise view Computational view Engineering view purpose, scope, policy, ..  strategic view Computational view Functional decomposition (objects, interfaces,..)  application designer view Engineering view Communication and resource control Information view Technology view Overview of DKRZ: some info, history, compute and storage facilities, stakeholders WDCC and international climate data management infrastructures: IS-ENES, ESGF, EUDAT, .. RDA adoption plans at DKRZ: Early pid assignment (data ingest time) in life cylce Back-linking of (late coarse granular) data citation to individual data entities 17.11.2018

DKRZ computational viewpoint DKRZ HPC Cluster Visuali- sation VM cluster (XEN) Service Nodes Interactive Nodes Data Nodes (Test) Index Nodes (Test) WDCC Data Node ESGF Data Nodes DTNs Index Node Processing Nodes Project VMs PID VMs WDCC Portal Fuse (Batch) Compute Nodes WDCC FUSE Oracle DB NFS Lustre I/O NFS WDCC / CERA LTA Lustre File System WDCC / CERA Cache National CMIP Pool HPSS Tape Backend DKRZ openstack cloud WDCC / CERA Long Term Archive Data 17.11.2018

DKRZ enterprise viewpoint (data centric) stakeholder MPI-M Uni HH Resource usage (storage / compute) Helmholtz Association AWI Geesthacht „system funders“ Data and compute resources DKRZ sys admin group BMBF (current and prior systems – not next generations) Data Production Data Postprocessing „Modelers“ Data Analysis Data Ingest / Replication „Climate data Researchers“ Data Management External data managers, data federations, .. DKRZ data managers Data Distribution „Climate data federations“ DKRZ data sys admins QA + Long Term Archival Data service development / depoyment / operation National Initiatives / Projects / .. (CMIP6, Reklies,...) International Initiatives / Projects / .. (ESGF, EUDAT, ENVRI+,..) 17.11.2018

CMIP6 PID integration requirements CMIP6: single handle prefix  Central PID registry PID registry / Handle server handle v8 REST API Need to cope with CMIP6 load (millions of registrations, parallel bursts from multiple clients)  Highly scalable PID registration process !? Minimal impact on standard esgf publication process  Non blocking, asynchronous handle registration necessary ESGF data nodes ESGF publisher Esgf-PID client Easy installation as part of ESGF data node  ESGF PID client package as part of ESGF publisher 17.11.2018

ESGF PID curation tools CMIP6 PID integration PID registry / Handle server Hosted at DKRZ Tests: ~ 130 PID reg. requests per second on VM (5 Mio requ.) Optimization potential with physical machine + SSD: > 1000 requ. per second reported by CNRI handle v8 rest interface pyhandle client ESGF PID consumer Rabbit exchange Highly scalable, persistent message queuing infrastructure Guaranteed delivery of ESGF PID reg. requests Fail-over endpoints for ESGF PID client Hosted at DKRZ + one other ENES partner (IPSL ?) DKRZ exchange RZG exchange Rabbitmq message interface (amqp) ESGF pyhandle client esgf-pid client ESGF PID curation tools ESGF data nodes ESGF publisher esgf-pid client ESGF PID part of CMIP6 publisher (PyPi package) Configuration via CMIP6 ini 17.11.2018

CMIP6 citation info + DOI landing page CMIP5 example: http://esgf-data.dkrz.de/search/esgf-dkrz/?project=CMIP5&model=MPI-ESM-P&experiment=1pctCO2&time_frequency=fx&realm=ocean 17.11.2018

CMIP6 PID landing page landing page service at DKRZ Obs4Mips example: http://esgf-data.dkrz.de/search/esgf-dkrz/?project=obs4MIPs&institute=FUB-DWD&source_id=SSMI-MERIS&variable=prw&time_frequency=mon landing page service at DKRZ CMIP6 prototype based on django python web framework 17.11.2018

PID system clients pyhandle client: python client to interact with handle system v8 API Generalization of b2handle client (https://github.com/EUDAT-B2SAFE/B2HANDLE) EUDAT collaboration – used in RDA PID Trainingsworkshop at DKRZ May 2016 esgf-pid-client python client to asychronously interact with the esgf-pid message queuing system (rabbitmq) 17.11.2018

DKRZ data ingest and national CMIP6+ data pool Tickting System Data provider Submission form --- Form validation CMIP6+ Data pool Data ingest / removal json repo /db Data quality assurance CMIP6+ Data pool ingest requestor Data publication / registration Request form --- Data archival Status: Core system in place (repo, form system, different APIs) Next: stepwise integrate in operational workflow – parallel development of missing integration pieces Complete data ingest provenance available for data in CMIP6+ data pool  (W3C PROV representation) 17.11.2018

Open Issues Collection IS-ENES User Support for CMIP6 (general + scientific) CDNOT related: CA signing also in USA, ..? Update and Organization of COG project pages CMIP5,CORDEX  -- do better for CMIP6 !? How ? Data ingest for IS-ENES how to ? By now site specific intransparent procedures .. Who publishes to whom, who takes data from whom, .. Who replicates what .. To be filled ..  17.11.2018