
Michael Lautenschlager, Hannes Thiemann, Frank Toussaint
WDC Climate / Max-Planck-Institute for Meteorology, Hamburg
Joachim Biercamp, Ulf Garternicht, Stephan Kindermann, Wolfgang Stahl
German Climate Computing Centre (DKRZ), Hamburg
CAS2K9, September 13th – 16th, 2009, Annecy, France

[Diagram: DKRZ system overview. The IBM Power6 system "blizzard" (2 login nodes, 250 compute nodes, 150 TFlops peak) is attached to a GPFS disk pool (3 PByte) holding the file systems /work, /pf and /scratch. The HPSS tape archive (10 PByte/year) contains /hpss/arch, /hpss/doku, /dxul/ut, /dxul/utf and /dxul/utd and is reached via pftp or via xtape (ssh blizzard, sftp xtape.dkrz.de, "get /hpss/arch/…"). The StorageTek silos provide a total tape capacity of approximately 60 PB (LTO and Titan).]
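The access route shown in the diagram (sftp against xtape.dkrz.de, reached via the blizzard login nodes) can be illustrated with a minimal Python sketch using the third-party paramiko library. Only the host name comes from the slide; the user name, key handling and archive path are hypothetical placeholders, and the sketch assumes the SFTP front end is reachable from where the script runs.

```python
# Minimal sketch, not the official DKRZ client: retrieve one file from the
# HPSS archive over SFTP. Host name from the slide; user name and path are
# hypothetical placeholders (the path on the slide is truncated).
import paramiko

HOST = "xtape.dkrz.de"                      # SFTP front end to the HPSS archive
REMOTE = "/hpss/arch/<project>/<file>"      # hypothetical placeholder path
LOCAL = "retrieved_file"

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, username="myaccount")  # hypothetical account name

sftp = client.open_sftp()
sftp.get(REMOTE, LOCAL)                     # equivalent of the interactive "get /hpss/arch/..."
sftp.close()
client.close()
```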

• Data production on IBM-P6: PB/year
• Limit for long-term archiving: 10 PB/year
◦ A complete data catalogue entry in WDCC (metadata) is required, but the decision procedure for transition into the long-term archive has not been finalized (data storage policy).
• Limit for field-based data access: 1 PB/year
◦ Oracle BLOB tables are being replaced by the CERA container file infrastructure, which is developed by DKRZ/M&D.

[Chart: archive growth over time; mid-2009: 10 PB.]

[Chart: growth over time; mid-2009: 400 TB.]


Data system (HPSS) (information on the DKRZ web server):
• DXUL/UniTree will be replaced by HPSS (High Performance Storage System). The existing DXUL-administered data – about 9 PetaByte – will be transferred.
• 6 robot-operated silos with slots for T10000 A/B, LTO4, 9940B and 9840C magnetic cartridges provide a primary capacity of 60 PetaByte with 75 tape drives.
• The average bandwidth of the data server is at least 3 GigaByte/s for simultaneous reading and writing, with peak flow rates of up to 5 GigaByte/s.
• 390 TB of Oracle BLOB data are transferred into CERA container files.

9 PB of DXUL/UniTree data have to be transferred to HPSS without copying the data:
◦ The 9 PB of DXUL data are stored on 25,000 cartridges in 25 × 10⁶ files.
◦ It was not feasible to run two systems in parallel for 3–5 years, which is the estimated time for copying from DXUL/UniTree to HPSS at DKRZ.
Challenges of the physical movement from Powderhorn (UniTree) into SL8500 (HPSS):
◦ Technical aspects
◦ Legal aspects
◦ Quality assurance

Challenges of the physical movement from Powderhorn (UniTree) into SL8500 (HPSS):
◦ Technical aspects: In principle it is possible to read UniTree cartridges with HPSS, but this had only been tested with old systems and with less complex name spaces (17 name spaces on 3 servers had to be consolidated into 1 HPSS name space).
◦ Legal aspects: An unexpected license problem appeared with the proprietary UniTree library data format. The solution was to write the library information, after consolidation, into one large text file (10 GB).
◦ Quality assurance: complete comparison of the metadata and checksum comparison of a subset of 1% of the data files (see the sketch below).
The transfer to HPSS has been completed successfully; the new system is up and running with the old data.
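The quality-assurance step could look roughly like the following Python sketch, which samples about 1% of the transferred files and compares checksums between the old and the new archive trees. The mount points, the hash algorithm and the sampling logic are assumptions for illustration, not the actual DKRZ procedure.

```python
# Sketch only: compare checksums for a ~1% random sample of files.
# The mount points below are hypothetical, not the real DKRZ paths.
import hashlib
import random
from pathlib import Path

OLD_ROOT = Path("/mnt/unitree")   # hypothetical mount of the old archive
NEW_ROOT = Path("/mnt/hpss")      # hypothetical mount of the migrated archive
SAMPLE_RATIO = 0.01               # 1% of the files, as stated on the slide

def md5sum(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large archive files fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

mismatches = []
for old_file in OLD_ROOT.rglob("*"):
    if not old_file.is_file() or random.random() > SAMPLE_RATIO:
        continue
    new_file = NEW_ROOT / old_file.relative_to(OLD_ROOT)
    if not new_file.exists() or md5sum(old_file) != md5sum(new_file):
        mismatches.append(old_file)

print(f"sample checked, {len(mismatches)} mismatching or missing files")
```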

[Photos: 3 of 6 StorageTek SL8500 silos under construction; room for magnetic cartridges in each silo.]

• CERA-2 data model left unchanged
◦ Metadata model modifications are planned in relation to the outcome of the EU project METAFOR and CIM (Common Information Model).
• WDCC metadata still reside in Oracle database tables, which form the searchable data catalogue.

[Diagram: CERA-2 data model blocks – Entry, Reference, Status, Distribution, Contact, Coverage, Parameter, Spatial Reference, Local Adm., Data Access, Data Org – unchanged since 1999. METAFOR / CIM contributes data provenance information and a searchable Earth system model description.]
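For illustration only, the block structure in the diagram might be written down roughly as follows. Only the block names come from the slide; the attribute types and defaults are hypothetical placeholders.

```python
# Rough illustration of the CERA-2 metadata blocks shown in the diagram.
# Block names from the slide; field types are invented placeholders.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CeraEntry:
    entry: dict = field(default_factory=dict)          # core description of the dataset
    reference: list = field(default_factory=list)      # related publications
    status: Optional[str] = None                        # e.g. completeness / QC state
    distribution: Optional[str] = None                  # distribution conditions
    contact: list = field(default_factory=list)        # responsible persons / institutions
    coverage: Optional[dict] = None                      # temporal and geographic coverage
    parameter: list = field(default_factory=list)       # geophysical variables
    spatial_reference: Optional[dict] = None             # grid / projection information
    local_adm: Optional[dict] = None                      # local administration block
    data_access: Optional[dict] = None                   # how to reach the stored data
    data_org: Optional[dict] = None                        # data organisation (files, tables)
```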

• Field-based data access is changing from Oracle BLOB data tables to CERA container files for two reasons:
◦ Financial aspect: Oracle license costs for an Internet-accessible database system of PB size are beyond DKRZ's scope.

◦ Technical aspect: A BLOB data concept in the TB and PB range requires seamless data transition between disk and tape in order to keep the RDBMS restartable. This worked for Oracle and UniTree, but it could not be guaranteed for the future by either Oracle or HPSS.
◦ Requirement for the BLOB data replacement: The transfer to CERA container files has to be transparent for CERA-2 and for user data access.

[Diagram: model variables versus model run time; 2-D fields are small LOBs (180 KB), 3-D fields are large LOBs (3 MB); each column is one data table in CERA-2.]
CERA Container Files
• are LOBs plus an index for random data access,
• are transparent for field-based data access in WDCC,
• include the basic security mechanisms of Oracle BLOBs.
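As a minimal sketch of the "LOBs plus index" idea (not the actual CERA container format, whose on-disk layout is not given here), the following Python code appends binary field records to a container file and keeps a separate index mapping (variable, time step) to byte offsets, so a single field can be read back with one seek instead of scanning the whole container.

```python
# Toy illustration of "LOBs plus an index for random data access".
# The layout is invented for this sketch and is NOT the CERA container format.
import json
from pathlib import Path

class ToyContainer:
    def __init__(self, path: str):
        self.data_path = Path(path)
        self.index_path = self.data_path.with_suffix(".idx.json")
        self.index = {}
        if self.index_path.exists():
            self.index = json.loads(self.index_path.read_text())

    def append(self, variable: str, timestep: int, field_bytes: bytes) -> None:
        """Append one field record (e.g. a 180 KB 2-D or 3 MB 3-D LOB) and index it."""
        with self.data_path.open("ab") as f:
            offset = f.tell()
            f.write(field_bytes)
        self.index[f"{variable}:{timestep}"] = (offset, len(field_bytes))
        self.index_path.write_text(json.dumps(self.index))

    def read(self, variable: str, timestep: int) -> bytes:
        """Random access: one seek per field, no scan of the whole container."""
        offset, length = self.index[f"{variable}:{timestep}"]
        with self.data_path.open("rb") as f:
            f.seek(offset)
            return f.read(length)

# Usage: store and retrieve a single field without touching the rest of the file.
c = ToyContainer("temperature.container")
c.append("tas", 0, b"\x00" * 180_000)
record = c.read("tas", 0)
```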

• Motivated by the long-term archive strategy and by scientific applications such as CMIP5/AR5, WDCC data access is extended:
◦ CERA container files: transparent field-based data access from tapes and disks (substitution of the Oracle BLOB data tables)
◦ Oracle B-Files: transparent file-based data access from disks and tapes
◦ THREDDS Data Server: field-based data access from files on disks (CMIP5/AR5)
◦ Non-transparent data access: URLs provide links to data that are not directly/transparently accessible through WDCC/CERA (e.g. remote data archives)
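A hedged sketch of how a client might dispatch over these four access paths; only the four categories come from the slide, while the catalogue fields, return values and example values are hypothetical.

```python
# Sketch: dispatch over the four WDCC/CERA access paths listed above.
# The catalogue record fields and example values are hypothetical.
from urllib.request import urlopen

def open_dataset(entry: dict):
    kind = entry["access_kind"]                 # hypothetical catalogue field
    if kind == "cera_container":
        # field-based access from tape or disk via the container infrastructure
        return ("container", entry["container_path"])
    if kind == "oracle_bfile":
        # file-based access: the database keeps only a pointer to the archived file
        return ("file", open(entry["file_path"], "rb"))
    if kind == "thredds":
        # field-based access to files on disk, served over HTTP (CMIP5/AR5 use case)
        return ("stream", urlopen(entry["thredds_url"]))
    if kind == "external_url":
        # non-transparent access: hand the remote link back to the user
        return ("url", entry["url"])
    raise ValueError(f"unknown access kind: {kind}")

# Example with hypothetical values:
print(open_dataset({"access_kind": "external_url", "url": "https://example.org/data"}))
```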

[Architecture diagram: an application server with a TDS (or the like) and a LobServer form the mid-tier; the CERA DB layer answers the what / where / who / when / how questions; the HPSS back end holds the archive as files and the containers as LOBs.]

Three major decisions were made in connection with long-term archiving in the transition to HLRE2 and HPSS:
• Limitation of the annual growth rates
◦ File archive: 10 PB/year
◦ CERA container files: 1 PB/year
• Development of the CERA container file infrastructure with emphasis on field-based data access from tapes
• Integration of transparent file-based data access into WDCC/CERA, in addition to the traditional field-based data access