Update on Bit Preservation, HEPiX WG and Beyond
Germán Cancio, IT-DSS-TAB, CERN
DPHEP Collaboration Workshop, CERN, 8/9 June 2015

Outline
- CERN Archive: current numbers
- Large-scale media migration (repack) and outlook
- Environmental hazards
- Current reliability and improvements
- Client access evolution (aka The Demise Of RFIO)
- HEPiX WG status

CERN Archive: current numbers
- Data: ~105 PB physics data (CASTOR), ~7 PB backup (TSM)
- Tape libraries: IBM TS3500 (3+2), Oracle SL8500 (4)
- Tape drives: ~100 archive drives
- Capacity: ~ slots, ~ tapes
[Chart: Data to CASTOR tape]

Large-scale media migration
Challenge: ~85 PB of data (2013: ~ tapes; 2015: ~ tapes)
- Verify all data after write
- 3x (255 PB!) pumped through the infrastructure (read -> write -> read)
- Liberate library slots for new cartridges
- Decommission ~ obsolete tape cartridges
Constraints:
- Be transparent for user/experiment activities
- Preserve temporal collocation
- Finish before the LHC Run 2 start

Large media migration: Repack
- Part 1: Oracle, 5 TB -> 8 TB, then empty the 1 TB media. Completed last week!
- Part 2: IBM, 4 TB -> 7 TB, then the 1 TB media.
[Chart: LHC Run 1 data taking vs. repack]

Future…
- Run 2 ( ): expecting ~50 PB/year of new data (LHC + non-LHC); +7K tapes/year (~35'000 free library slots)
- Run 3 (-2022): ~150 PB/year
- Run 4 (2023 onwards): 600 PB/year
… tape technology grows faster:
- Tape roadmaps at 30% CAGR for at least 10 years
- Demo for a 220 TB tape by IBM/Fujifilm in April
… but: market evolution is difficult to predict:
- Tape media: monopoly in a shrinking market; disk: "duopoly"
- Cloud storage solutions
- Disk capacity slowdown (HAMR)… may slow down tape products!
- Storage slowdown == higher archiving costs
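To make the interplay between ingest growth and the media CAGR concrete, here is a back-of-the-envelope sketch (not from the slides): it projects how many new cartridges per year an archive would need if cartridge capacity grows at an assumed 30% CAGR while the yearly ingest steps up roughly as above. The starting capacity (8.5 TB, the T10KD figure cited on the technology-forecast slide) and the year-to-run mapping are illustrative assumptions.

# Back-of-the-envelope projection: new cartridges needed per year if
# cartridge capacity grows at ~30% CAGR while yearly ingest increases.
# All figures are illustrative assumptions, not official CERN numbers.

CAGR = 0.30                 # assumed tape capacity growth per year
capacity_tb = 8.5           # assumed starting cartridge capacity (TB), ~T10KD class

# assumed ingest per year in PB (roughly following the Run 2/3/4 figures above)
ingest_pb = {2015: 50, 2016: 50, 2017: 50, 2018: 50,
             2019: 150, 2020: 150, 2021: 150, 2022: 150,
             2023: 600, 2024: 600}

for year, pb in sorted(ingest_pb.items()):
    tapes = (pb * 1000) / capacity_tb          # 1 PB = 1000 TB
    print(f"{year}: {pb:>4} PB ingest, ~{capacity_tb:6.1f} TB/cartridge "
          f"-> ~{tapes:,.0f} new cartridges")
    capacity_tb *= 1 + CAGR                    # next-generation media each year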

… and the past
LEP-era data: ~370 TB
- 2000: ~15'000 tapes
- 2007: ~1'500 tapes
- 2015: 30 tapes… x 2 (replicated in separate buildings)
Cost:

Tape contamination incident
- Identified 13 tapes in one library affected by concrete (or foam) particles
- Isolated the incident by verifying all other tapes in the building
- Recovered 94% of the files with custom low-level tools and vendor recovery; 113 files lost

Airflows in tape libraries
- (Our) tape libraries are neither sealed nor filtered
- Over 30 m³/min of airflow per library (home vacuum cleaner: ~2 m³/min), on top of the already existing strong computer-centre airflows
- Operating environment required for new-generation drives: ISO Class 8 (particles/m³)
- Environmental sensitivity will continue to increase with newer drives as tape bit density grows exponentially

Environmental protection
- Fruitful exchanges with other HEP tape sites on computer-centre protective measures (access and activity restrictions, special clothing, air filters, etc.)
- Sampling by an external company and corrective actions taken at CERN-CC (air filters)
- Library cleaning by a specialist company in June
- Prototyped a set of environmental sensors (RPi/Arduino) to be installed inside libraries, using cheap commodity components, achieving industrial precision and reaction time; see the sketch below:
  - Measure and correlate dust, temperature, humidity
  - Raise an alert in case of anomalies
  - Can be integrated inside libraries
  - Done in coordination with the vendor; potential for built-in solutions
- Details: HEPiX Spring 2015 presentation
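As an illustration of the alerting logic described above (a minimal sketch, not the actual prototype code), the following assumes a read_sensors() readout function and a notify() hook, both placeholders, and raises an alert when a reading drifts outside a configured band; the band values are illustrative assumptions.

# Illustrative thresholds; real limits would follow ISO Class 8 and
# drive-vendor environmental specifications (assumed values, not CERN's).
LIMITS = {
    "dust_particles_per_m3": (0, 3_520_000),   # ISO 14644-1 Class 8 limit for >=0.5 um particles
    "temperature_c":         (16, 25),
    "humidity_pct":          (20, 80),
}

def read_sensors():
    """Placeholder for the RPi/Arduino readout (hypothetical values)."""
    return {"dust_particles_per_m3": 5_000_000, "temperature_c": 21.5, "humidity_pct": 45}

def notify(message):
    """Placeholder alert hook (e-mail, monitoring system, ...)."""
    print("ALERT:", message)

def check_once():
    """Compare one round of readings against the configured bands."""
    readings = read_sensors()
    for name, value in readings.items():
        low, high = LIMITS[name]
        if not (low <= value <= high):
            notify(f"{name}={value} outside [{low}, {high}]")

if __name__ == "__main__":
    check_once()   # a real deployment would run this in a loop, e.g. once per minute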

CERN Archive reliability
Ongoing activity to improve archive reliability:
- Continued systematic verification of freshly written and "cold" tapes
- Less physical strain on tapes (HSM access, buffered tape marks)
- With new hardware/media, differences between vendors are getting small
- For smaller experiments, created dual copies in separated libraries/buildings
[Chart annotation: "No losses"]

Reliability improvements (1)
- New CASTOR tape software developed and deployed in production:
  - Completely redesigned architecture, moved from C to C++
  - Improved error detection/handling, full support for SCSI tape alerts
- Re-engineered Tape Incident System:
  - Takes advantage of full SCSI tape alerts
  - Automated problem identification: tape vs. drive vs. library (see the sketch below)
  - Better detection of the root cause -> catch problems and disable faulty elements earlier
- Enhanced low-level media repair tools
- Still much unexploited systems-level information (transient/internal drive read/write/mount events at SCSI level; library low-level logs): work area in 2015
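To illustrate the kind of automated triage such an incident system performs (a hypothetical sketch, not CERN's actual Tape Incident System), one common heuristic is to correlate errors across mounts: a tape that fails on several different drives is likely a bad cartridge, while a drive that fails with several different tapes is likely a bad drive. The log records and threshold below are invented.

from collections import defaultdict

# Each record: (tape_id, drive_id, error_flag) derived from SCSI tape alerts /
# mount logs. Data and thresholds are illustrative only.
mount_log = [
    ("T01", "D1", True), ("T01", "D2", True), ("T01", "D3", True),
    ("T02", "D4", True), ("T03", "D4", True), ("T04", "D4", False),
]

FAULT_THRESHOLD = 2  # distinct counterparts seen failing before flagging an element

def classify(mounts):
    tape_failures = defaultdict(set)   # tape  -> drives it failed on
    drive_failures = defaultdict(set)  # drive -> tapes that failed on it
    for tape, drive, failed in mounts:
        if failed:
            tape_failures[tape].add(drive)
            drive_failures[drive].add(tape)

    bad_tapes = {t for t, drives in tape_failures.items()
                 if len(drives) >= FAULT_THRESHOLD}
    # A drive is only suspect if its failures are not explained by bad tapes.
    bad_drives = {d for d, tapes in drive_failures.items()
                  if len(tapes) >= FAULT_THRESHOLD and not (tapes <= bad_tapes)}
    return bad_tapes, bad_drives

tapes, drives = classify(mount_log)
print("disable tapes:", tapes)    # -> {'T01'}
print("disable drives:", drives)  # -> {'D4'}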

Reliability improvements (2)
Working on support for SCSI-4 Logical Block Protection:
- Protects against link-level errors, e.g. bit flips
- Data blocks are shipped to the tape drive with a pre-calculated CRC
- The CRC is re-calculated by the drive (read-after-write) and stored on media; the CRC is checked again on reading
- Minimal overhead (<1%)
- The tape drive can do fast media verification autonomously
- Supported by newer LTO and enterprise tape drives
- To be integrated in CASTOR in H2 2015
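A minimal sketch of the end-to-end CRC idea (illustrative only: the real protection is implemented inside the drive as part of Logical Block Protection, and the CRC algorithm is defined by the SCSI standard rather than chosen by the application; zlib's CRC-32 is used here just as a stand-in): the host computes a CRC per block before sending it, and the same value is re-checked after read-back.

import zlib

BLOCK_SIZE = 256 * 1024  # illustrative tape block size

def blocks_with_crc(data):
    """Split data into blocks and attach a CRC to each (stand-in for the
    pre-calculated CRC shipped with every block under Logical Block Protection)."""
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        yield block, zlib.crc32(block)

def verify(block, expected_crc):
    """Re-compute the CRC after read-back and compare (read-after-write check)."""
    return zlib.crc32(block) == expected_crc

payload = b"some archive file contents" * 10000
protected = list(blocks_with_crc(payload))

# Simulate a single bit flip in the first block during transfer/storage.
corrupted = bytearray(protected[0][0])
corrupted[0] ^= 0x01

print(verify(protected[0][0], protected[0][1]))    # True: intact block
print(verify(bytes(corrupted), protected[0][1]))   # False: bit flip detected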

Data access evolution
We migrate and protect your bit streams… but can you still access them?
- The venerable RFIO client/server protocol is reaching its end of life and will pass away "soon"; consolidate on the de-facto standard (XROOT)
  - RFIO: outdated code base, flaky security model
- Dependencies on RFIO (and other CASTOR commands) within LEP software need to be understood, isolated and removed
  - Direct and indirect dependencies (e.g. via CERNLIB/ZEBRA, BOS, …)
- Replace HSM access by local file access (see the sketch below):
  - First copy files from CASTOR to a local file system
  - Then access/process them from there via standard open()/read()/close()… POSIX calls
- Should a read-only replica of the LEP-era data be made available as locally accessible files on PLUS/BATCH?
  - RO replica provided by EOS+FUSE
  - All data would continue to be kept in the CASTOR archive
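A minimal sketch of the "copy first, then POSIX" pattern described above. The server name and file path are hypothetical, and xrdcp is used simply as the XROOT copy client; the original RFIO-based workflow would instead have read directly from the HSM.

import os
import subprocess
import tempfile

# Hypothetical XROOT URL of an archived file; real endpoints/paths will differ.
SOURCE = "root://castorpublic.cern.ch//castor/cern.ch/lep/experiment/run1234.data"

def stage_and_read(xroot_url):
    """Copy a file out of the archive with xrdcp, then read it via plain POSIX I/O."""
    local_path = os.path.join(tempfile.mkdtemp(), os.path.basename(xroot_url))
    subprocess.run(["xrdcp", xroot_url, local_path], check=True)  # copy step
    with open(local_path, "rb") as f:   # standard open()/read()/close(), no HSM protocol
        header = f.read(1024)           # process locally
    return header

if __name__ == "__main__":
    print(stage_and_read(SOURCE)[:32])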

HEPiX bit-preservation WG (1)
- Set up during summer 2013
- Co-chairs: D. Ozerov/DESY and myself
- Mandate:
  - Collect/share bit-preservation knowledge across HEP (and beyond)
  - Provide technical advice to DPHEP
  - Recommendations for sustainable HEP archival storage
- w3.hepix.org/bit-preservation

HEPiX bit-preservation WG (2)
- Survey of large HEP archive sites:
  - 19 sites; areas such as archive lifetime, reliability, access, verification, migration
  - Overall positive status, but lack of SLAs, metrics, common best practices, long-term costing impact
  - Presented at HEPiX Fall 2013
- Defined a simple, customisable model to help establish the long-term cost of bit-level preservation storage (see the sketch below):
  - Approximate cost of a generic (tape-based) data archive over the years
  - Factors such as media, hardware and maintenance cost, media capacity growth rate, etc.
  - Different base scenarios ("frozen" to exponentially growing); presented at HEPiX Spring 2014 (and the DPHEP cost-of-curation workshop)
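For illustration, a heavily simplified cost model in the same spirit (a sketch with made-up parameters, not the WG's actual model): each year the archive buys media for the newly ingested data plus any repack of old data, the media price per TB falls in line with an assumed capacity growth rate, and a flat infrastructure/maintenance charge is added per stored PB.

def archive_cost(years=10, initial_pb=100, yearly_growth_pb=50,
                 media_cost_per_tb=10.0, media_price_decline=0.20,
                 infra_cost_per_pb_year=2000.0, repack_every=5):
    """Toy long-term cost model for a tape archive (illustrative parameters only).

    media_cost_per_tb      -- year-0 media price in currency units per TB
    media_price_decline    -- assumed yearly price drop per TB (mirrors capacity growth)
    infra_cost_per_pb_year -- flat hardware/maintenance charge per stored PB per year
    repack_every           -- every N years the whole archive is rewritten to new media
    """
    stored_pb, total = float(initial_pb), 0.0
    for year in range(1, years + 1):
        stored_pb += yearly_growth_pb
        new_media_pb = yearly_growth_pb + (stored_pb if year % repack_every == 0 else 0)
        media = new_media_pb * 1000 * media_cost_per_tb   # 1 PB = 1000 TB
        infra = stored_pb * infra_cost_per_pb_year
        total += media + infra
        media_cost_per_tb *= 1 - media_price_decline      # cheaper media next year
        print(f"year {year:2d}: {stored_pb:6.0f} PB stored, "
              f"media {media:12,.0f}, infra {infra:12,.0f}")
    return total

print(f"total 10-year cost: {archive_cost():,.0f} (arbitrary currency units)")

Setting yearly_growth_pb=0 gives the "frozen archive" scenario; larger growth values approximate the exponentially growing cases mentioned above.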

HEPiX bit-preservation WG (3)
- WG "frozen" since ~summer 2014:
  - D. Ozerov left DESY, no replacement nominated
  - Reduced interest from the HEPiX community in active collaboration at WG level…
  - … but good exchanges between tape sites elsewhere (e.g. workshops such as LTUG, a dedicated HEP mailing list, HEPiX!) covering broader topics; MSS-specific forums/workshops for technical implementations (CASTOR, HPSS, etc.)
- Still some potential work in the pipeline:
  - Complete the (unfinished) recommendations for bit-preservation best practices: archive protection/verification/auditing/migration, reliability definition, etc.; concentrate on "what" rather than "how" to do it, align with the OAIS reference model
  - Expand and refine the cost model: active I/O, LTO tapes (non-reusable media), disk-based archiving, manpower, etc.; compare to other models (such as LIFE, KRDS, the California Digital Library model)
  - Exchanges with non-HEP sites (and with related groups, e.g. the NDSA Infrastructure WG)

Summary
- CERN's physics data archive has completed a large media migration during the Long Shutdown
- Assuming a continued market presence, tape is still a perfect match for HEP/XXL-scale archiving
- Ensuring the longevity of data has become a key and long-term activity: improve reliability, perform bit-level data preservation, ensure environmental conditions
- Applications need to adapt to changing client access protocols
- Collaboration with HEP sites continues, also outside the HEPiX WG; look outside HEP as well

Technology forecast
- +30%/yr tape capacity per $ (+20%/yr I/O increase)
- +20%/yr disk capacity per $
[Chart: areal density roadmap. Source: INSIC. Annotations: 86 Gbit/in² demo (~154 TB), 123 Gbit/in² demo (~220 TB), Oracle T10KD (8.5 TB), IBM TS1150 (10 TB)]

Tape market
[Chart: tape cartridge revenue in M USD (source: Santa Clara Consulting Group)]

CERN tape read mounts vs. volume
[Chart]

Bit preservation best practices (1)
R1: Define the list of offered storage classes and their characteristics with regard to supported access protocols, throughput, and access latency.
Related OAIS function: Receive Data (Archival Storage)
The chosen technology for implementing the archive should be transparent to the customer (disk, tape, or a combination of both; number and physical location of replicas, etc.). The supported access protocols for read/write access to the archive and for metadata lookup (such as namespace lookups) should be explicitly listed. Average and peak throughput and access latency rates should be defined for both reading and writing.

R2: Define the lifetime of the archived data, including the data retention period and/or expiration policy.
Related OAIS function: Manage Storage Hierarchy (Archival Storage)
For each archive storage class or category, include a specific definition of how long the data is to be stored and under what circumstances archive data is deleted (for example: the experiment is discontinued; the user leaves the organization; etc.). Budget constraints and planning should be taken into account when establishing these definitions.
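To make R1/R2 concrete, here is a hypothetical storage-class catalogue entry; the class names, protocols and figures are invented for illustration, and an archive site would publish its own definitions.

# Hypothetical catalogue of offered storage classes (R1) with lifetime policy (R2).
# All names and numbers are illustrative, not an actual site's service definition.
STORAGE_CLASSES = {
    "tape-archive-2copy": {
        "implementation_hint": "dual tape copies in separate buildings",  # transparent to the user
        "access_protocols": ["xroot", "gridftp"],
        "avg_read_throughput_MBps": 250,
        "peak_read_throughput_MBps": 1000,
        "access_latency": "minutes (tape mount + positioning)",
        "retention": "lifetime of the experiment + 10 years",
        "deletion_policy": "only on explicit request by the data owner",
    },
    "disk-cache": {
        "implementation_hint": "replicated disk pool",
        "access_protocols": ["xroot", "https"],
        "avg_read_throughput_MBps": 1000,
        "peak_read_throughput_MBps": 5000,
        "access_latency": "milliseconds",
        "retention": "garbage-collected; not a preservation class",
        "deletion_policy": "automatic eviction when space is needed",
    },
}

for name, spec in STORAGE_CLASSES.items():
    print(f"{name}: protocols={spec['access_protocols']}, "
          f"latency={spec['access_latency']}, retention={spec['retention']}")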

Bit preservation best practices (2)
R3: Define expected data loss rates and associated metrics.
Related OAIS function: Error Checking (Archival Storage)
For the sake of uniformity between HEP sites, we recommend defining two data loss metrics for each archive storage class/category: a) the number of bytes lost divided by the number of bytes written per year, and b) the number of bytes lost per year divided by the total number of bytes stored. These metrics should reflect data loss from the customer's perspective (not including failures which can be recovered from, such as failures on redundant media/storage). For each archive storage class/category, historical evidence/statistics should be collected and made available to customers.

R4: Provide mechanisms for file-level integrity checking via checksums.
Related OAIS function: Error Checking (Archival Storage)
All files stored in the archive should have an associated checksum such as MD5, SHA-256 or Adler32. This checksum should be calculated, stored and verified by the archive system; customers may also provide a pre-set checksum value against which the calculated checksum should be compared.
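As a concrete illustration of R4 (a minimal sketch; a real archive system implements this internally, and Adler32/SHA-256 are just two of the algorithms listed above): compute and store checksums at ingest, then recompute and compare them during verification, optionally also against a customer-supplied value.

import hashlib
import os
import tempfile
import zlib

def file_checksums(path, chunk_size=1024 * 1024):
    """Compute Adler32 and SHA-256 for a file, streaming it in chunks."""
    adler, sha = 1, hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            adler = zlib.adler32(chunk, adler)
            sha.update(chunk)
    return {"adler32": f"{adler & 0xffffffff:08x}", "sha256": sha.hexdigest()}

def verify(path, stored, customer_adler32=None):
    """Re-compute checksums and compare with stored (and optional customer) values."""
    current = file_checksums(path)
    ok = current == stored
    if customer_adler32 is not None:
        ok = ok and current["adler32"] == customer_adler32
    return ok

# Example: record checksums at "ingest", verify later (using a temporary file here).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example archive payload")
stored = file_checksums(tmp.name)
print(stored, verify(tmp.name, stored))
os.unlink(tmp.name)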

Bit preservation best practices (3)
R5: Provide mechanisms for regularly verifying the validity of the stored files within the archive.
Related OAIS function: Error Checking (Archival Storage)
Define the frequency and policy associated with archive verification, such as its scope (random sampling, complete archive scans, checking of tape cartridge beginning/end, verification after filling a cartridge, non-recently accessed files/tapes, etc.). Verification results should be used for determining the archive reliability as in R3.

R6: Provide a workflow for contacting file owners in case of corruptions leading to data loss.
Related OAIS function: Error Checking (Archival Storage)
Keep up-to-date contact details for customers owning archive files. Provide a workflow to inform users in case of temporary data unavailability (e.g. caused by broken tapes sent for repair) or persistent data loss. Allow users to delete or re-populate corrupted/lost files.

R7: Provide a migration policy for media refreshments and/or storage technology upgrades.
Related OAIS function: Replace Media (Archival Storage)
Upgrades to newer-generation storage technology (such as migration to new disk arrays, or tape repacking to higher-density media or cartridges) should in principle be transparent to the customer. In particular, data access and archive contents should not be impacted by such migrations.
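A minimal sketch of one possible verification policy for R5 (random daily sampling plus re-checking of freshly written tapes; the sampling fraction and tape identifiers are invented, and a real site would tune the fraction so the whole archive is cycled through within its target period):

import random

def select_for_verification(all_tapes, recently_written, daily_fraction=0.002, seed=None):
    """Pick tapes to verify today: every freshly written tape plus a random
    sample of the full archive (illustrative policy only)."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(all_tapes) * daily_fraction))
    sampled = rng.sample(sorted(all_tapes), sample_size)
    return sorted(set(recently_written) | set(sampled))

all_tapes = {f"T{i:05d}" for i in range(20000)}
today = select_for_verification(all_tapes, recently_written={"T19998", "T19999"}, seed=42)
print(len(today), "tapes queued for verification, e.g.", today[:5])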

NDSA levels of digital preservation
[Graphic: NDSA Levels of Digital Preservation]