Data Infrastructures at DKRZ
Michael Lautenschlager
CAS2K11 in Annecy, France, September 11 – 14, 2011



Content
- Overview DKRZ
- Climate research as data intensive science
- Data life cycle and services at DKRZ
- Data infrastructure development
© DKRZ, M. Lautenschlager (DKRZ), CAS2K11

Mission
DKRZ: to provide high performance computing platforms, sophisticated and high capacity data management, and superior service for premium climate science.

Our Competences
- High performance compute, storage, and visualization systems optimized for climate research
- Parallelization and optimization of climate models and workflows
- Efficient management of very large data volumes
- 3D visualization to communicate research results
- Support of current projects on climate research

Building

Computer Hall
[Figure labels: Compute Nodes, Disk Subsystem, Air Conditioning]

Compute Service
IBM Power6 system:
- 264 nodes with 8448 cores
- Clock rate 4.7 GHz
- Compute power per core 18.8 GFLOPS
- Maximum compute power 159 TFLOPS
- Linpack 110 TFLOPS and rank 72 in TOP500 of 2011
- Main memory more than 20 TB
- Hard disk storage 7 PB
- Interconnect 8x DDR Infiniband
- Cooling 75% water, 25% air
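The compute figures on this slide are consistent with each other, which a few lines of arithmetic can show. The sketch below is only a plausibility check: the value of 4 floating-point operations per core per clock cycle is an assumption that is not stated on the slide, and all variable names are illustrative.

```python
# Figures quoted on the slide.
NODES = 264
CORES = 8448
CLOCK_GHZ = 4.7
LINPACK_TFLOPS = 110.0

# Assumption (not on the slide): 4 floating-point operations per core per cycle.
FLOPS_PER_CYCLE = 4

# Peak rate of one core in GFLOPS: clock rate times operations per cycle.
per_core_gflops = CLOCK_GHZ * FLOPS_PER_CYCLE

# Peak rate of the whole system in TFLOPS (1 TFLOPS = 1000 GFLOPS).
peak_tflops = CORES * per_core_gflops / 1000.0

# Derived quantities that the slide does not state explicitly.
cores_per_node = CORES // NODES
linpack_efficiency = LINPACK_TFLOPS / peak_tflops

print(f"per core : {per_core_gflops:.1f} GFLOPS")
print(f"peak     : {peak_tflops:.1f} TFLOPS")
print(f"cores per node     : {cores_per_node}")
print(f"Linpack efficiency : {linpack_efficiency:.0%}")
```

Under that assumption the per-core and system peak values reproduce the 18.8 GFLOPS and roughly 159 TFLOPS quoted above, and the Linpack result corresponds to about 69% of peak.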

Tape Library

Tape Library
- HPSS – High Performance Storage System
- 7x Sun StorageTek SL8500
- In total 67,000 media slots
- More than 100 PB storage capacity
- 90 tape drives
  - LTO-5, LTO-4, T10000A/B
  - 9940B, 9840C
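As a rough cross-check of the tape figures, the quoted capacity can be divided by the number of media slots. This is a back-of-the-envelope sketch, assuming decimal units (1 PB = 1000 TB) and treating "more than 100 PB" as exactly 100 PB; the variable names are illustrative.

```python
# Figures quoted on the slide.
LIBRARIES = 7
MEDIA_SLOTS = 67_000
CAPACITY_PB = 100  # the slide says "more than 100 PB", so this is a lower bound

# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000

avg_tb_per_slot = CAPACITY_PB * TB_PER_PB / MEDIA_SLOTS
slots_per_library = MEDIA_SLOTS / LIBRARIES

print(f"average capacity per slot: {avg_tb_per_slot:.2f} TB")
print(f"average slots per library: {slots_per_library:.0f}")
```

The result, about 1.5 TB per slot, is close to the native capacity of a single LTO-5 cartridge, so the stated total is plausible if most slots hold media of that generation.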

World Data Center for Climate (approved by ICSU in 2003)
- Long-term data archive
- Approx. 500 TB climate data
- Fully documented
- Search engine
- Field-based data access
- Server side data processing (sub-setting, format conversion)
- Data download free of charge
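To make the idea of server-side sub-setting concrete, the following minimal sketch cuts a latitude/longitude box out of a small two-dimensional field before it would be delivered to a user. It is not the WDCC implementation: the function name `subset_field`, the grid, and the field values are all hypothetical, and plain Python lists stand in for real climate data.

```python
def subset_field(lats, lons, field, lat_range, lon_range):
    """Return the part of a 2-D field (rows = lats, columns = lons) inside a box."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    # Indices of the grid rows and columns that fall inside the requested box.
    rows = [i for i, lat in enumerate(lats) if lat_min <= lat <= lat_max]
    cols = [j for j, lon in enumerate(lons) if lon_min <= lon <= lon_max]
    sub_lats = [lats[i] for i in rows]
    sub_lons = [lons[j] for j in cols]
    sub_field = [[field[i][j] for j in cols] for i in rows]
    return sub_lats, sub_lons, sub_field


# Hypothetical 5 x 4 grid; each value encodes its own row and column index.
lats = [40.0, 45.0, 50.0, 55.0, 60.0]
lons = [0.0, 5.0, 10.0, 15.0]
field = [[10 * i + j for j in range(len(lons))] for i in range(len(lats))]

sub_lats, sub_lons, sub_field = subset_field(
    lats, lons, field, lat_range=(45.0, 55.0), lon_range=(5.0, 10.0)
)
print(sub_lats, sub_lons)
print(sub_field)
```

Only the 3 x 2 block inside the box is returned, which is the point of doing the selection on the server: the user downloads the requested region rather than the whole field.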

Data Volume Increase: small to high PB
[Figure: data volume in TB; labels: DKRZ, IPCC GCM Data; source: Overpeck et al., Science 2011]

CMIP5 Data Federation

Data estimates 2010:
- 10 PB in total
- 2.5 PB WCRP requested
- 1 PB IPCC-AR5 core

CMIP5 Archive Status, Friday, 09. September :34AM (UTC)
Summary:
- Modeling centers: 13
- Models: 17
- Data nodes: 13
- Gateways: 5
- Datasets: 11051
- Size TB

ESG infrastructure for CMIP5 provided by:
- NCAR (ESG Portal)
- PCMDI (ESG Data Node)

Climate Research as Data Intensive Science

Hey, Tansley and Tolle (2009), "The Fourth Paradigm":
– Data-intensive science consists of three basic activities: capture, curation, and analysis. Data comes in all scales and shapes, covering large international experiments; cross-laboratory, single-laboratory, and individual observations; and potentially individuals' lives. The discipline and scale of individual experiments and especially their data rates make the issue of tools a formidable problem. (Page XIII)

Climate Modeling:
– In international experiments like CMIP5, data are produced without knowing all applications beforehand, and these data are intended for interdisciplinary use (impact). This broad range of applications increases the volume of archived data and adds requirements beyond those of community-specific data applications.

Data Management Requirements for Data Intensive Science
- Complete data description for browsing, discovering and using research data
- Efficient data access via common interfaces in standard formats
- Efficient data processing workflows even in data federations (data mining might provide new methods for information discovery)
- Common security management across data federations in order to offer unique access to individual archives
- Data replication for security and access performance
- Agreed quality assurance workflow and documentation of data processing and quality level in metadata in order to assign accepted quality levels
- Transparent data federation management
- ……….
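Two of these requirements, data replication and documenting the quality level in metadata, can be illustrated together. The sketch below is one possible approach, not a description of any federation's actual procedure: it stores a SHA-256 checksum and a quality label in a metadata record and uses the checksum to verify a replica. The record layout, the function names, the sample payload, and the use of the label "QC-L3" here are all illustrative assumptions.

```python
import hashlib


def make_record(name, payload, quality_level):
    """Describe an archived object: checksum plus an assigned quality level."""
    return {
        "name": name,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "quality_level": quality_level,  # hypothetical field name
    }


def replica_matches(record, replica_payload):
    """True if the replica's checksum equals the one stored in the record."""
    return hashlib.sha256(replica_payload).hexdigest() == record["sha256"]


# Hypothetical payload standing in for an archived data file.
original = b"tas 2001-01 287.4\ntas 2001-02 288.1\n"
record = make_record("example_dataset", original, quality_level="QC-L3")

good_replica = bytes(original)
bad_replica = original.replace(b"288.1", b"288.7")

print(replica_matches(record, good_replica))
print(replica_matches(record, bad_replica))
```

Keeping the checksum next to the quality level means a replica site can confirm that it holds exactly the bytes to which the quality statement applies.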

My Essentials
Starting today, climate data archives will need:
– Sufficient information to find and select data properly
– Sufficient standardization for automatic data processing
– Transparent data quality flags to convince people to trust the archive federation
– New methods to identify new information in federated data archives (data mining)
– Complete data life cycle support for seamless management of very large data volumes

Data Life Cycle Management
DKRZ distinguishes two layers:
a) Virtual research environments integrate community-based scientific research
b) Long-term archiving supports interdisciplinary data utilization

Services at DKRZ: Creation

Services at DKRZ: Evaluation
[Figure labels: Code Optimization, CMIP5]

Services at DKRZ: Archiving
[Figure labels: CERA, CIM (EU-METAFOR), WDCC (CERA)]

Services at DKRZ: Dissemination
[Figure labels: IS-ENES, C3-Grid, CMIP5 / ESGF]

International Cooperation in Data Infrastructure Development
- IS-ENES: Infrastructure for the European Network for Earth System Modeling
- ExArch: Climate analytics on distributed exascale data archives (G8 project)
- EUDAT: EUropean DATa (EU-FP7 project starting on October 1st)
- ESGF: Earth System Grid Federation
- GO-ESSP: Global Organization for Earth System Science Portals
Target infrastructure:

Digital Object Architecture of Climate Model Data
- Data Objects: NetCDF/CF including use metadata
- Metadata Objects: CIM metadata for browse + discovery
- Information Objects: related more general information
- Transaction Record: dissemination info. of digital objects

DataCite scientific data publication entity:
- DOI has been assigned
- Digital objects are frozen and approved by author
- Citation reference is assigned for direct use in scientific literature
- Realized with QC-L3 in the CMIP5 data quality assessment

Future Development: Identification of distinct data objects in data federations with PID and handle system (cooperation with European Persistent Identifier Consortium (EPIC))
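The four components of the digital object and the publication rules on this slide can be sketched as a small data structure. This is an illustrative model only, not DKRZ's implementation: the class name `DigitalObject`, its fields and methods, the file names, and the identifier strings are hypothetical placeholders rather than real PIDs or DOIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DigitalObject:
    """Hypothetical model of a digital object with the four parts from the slide."""
    pid: str                                  # persistent identifier (placeholder)
    data_objects: List[str]                   # e.g. NetCDF/CF files
    metadata_objects: List[str]               # e.g. CIM documents
    information_objects: List[str] = field(default_factory=list)
    transaction_record: List[str] = field(default_factory=list)
    approved_by_author: bool = False
    frozen: bool = False
    doi: Optional[str] = None

    def add_data_object(self, name: str) -> None:
        # A published object is frozen, so its content must not change.
        if self.frozen:
            raise RuntimeError("digital object is frozen")
        self.data_objects.append(name)

    def publish(self, doi: str) -> None:
        # Publication requires author approval, then freezes and assigns the DOI.
        if not self.approved_by_author:
            raise ValueError("author approval is required before publication")
        self.frozen = True
        self.doi = doi
        self.transaction_record.append(f"published as {doi}")


obj = DigitalObject(
    pid="hdl:EXAMPLE/0001",                   # placeholder, not a real handle
    data_objects=["tas_example.nc"],
    metadata_objects=["cim_example.xml"],
)
obj.approved_by_author = True
obj.publish("doi:10.EXAMPLE/demo")            # placeholder, not a real DOI
print(obj.frozen, obj.doi, obj.transaction_record)
```

Freezing the object at publication is what makes the citation reference stable: any later attempt to change the data raises an error instead of silently altering a cited dataset.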

Planned DKRZ extension in
- Peak compute performance 150 TFLOPS -> 3 PFLOPS (x20)
- Disk capacity 7 PB -> 150 PB (x20)
- Tape capacity 100 PB -> 1 EB (x10)

Are we ready for the data tsunami?
Are the products ready for the data tsunami?
We will be happy to discuss these issues with you, before the data sweeps us away.
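The growth factors quoted for the planned extension can be recomputed from the before and after values. The sketch below assumes decimal prefixes (1 PFLOPS = 1000 TFLOPS, 1 EB = 1000 PB); the dictionary layout and names are illustrative.

```python
# (current, planned) pairs from the slide, converted to a common unit per row.
# Assumption: decimal prefixes, so 3 PFLOPS = 3000 TFLOPS and 1 EB = 1000 PB.
extension = {
    "peak compute (TFLOPS)": (150, 3000),
    "disk capacity (PB)": (7, 150),
    "tape capacity (PB)": (100, 1000),
}

# Growth factor = planned value divided by current value.
factors = {name: planned / current for name, (current, planned) in extension.items()}

for name, factor in factors.items():
    print(f"{name}: x{factor:.1f}")
```

Compute and tape come out at exactly 20 and 10, while disk capacity grows by a factor of about 21.4, which the slide rounds to x20.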

Thank you for your attention!