Data Management Evolution and Strategy at CERN
G. Cancio, D. Duellmann, A. Pace – CHEP'09, March 2009
With input from several IT-DM developers
CERN IT Department, CH-1211 Genève 23, Switzerland – www.cern.ch/it

Software Status for the Tier-0
For all Tier-0 data operations, CERN is using CASTOR for:
– Migration of LHC data to tape
– Replication of the LHC data to Tier-1 sites
The CCRC'08 data challenges have validated this architecture, which is now in production... but a few improvements are necessary to support analysis at CERN and to reduce operational effort.
[Diagram: the CERN Tier-0 distributes data from the four LHC experiments (ALICE, ATLAS, CMS, LHCb) to the Tier-1 sites – ASGC (Taiwan), BNL (USA), CNAF (Italy), FNAL (USA), FZK-GridKA (Germany), IN2P3 (France), NDGF (Nordic countries), NIKHEF (The Netherlands), PIC (Spain), RAL (United Kingdom), TRIUMF (Canada) – at an aggregated rate of 1 GB/s.]

Ongoing developments for Tier-0
Several improvements were made to CASTOR in the following areas:
– Monitoring
– Security
– SRM
– Tape handling
– Reducing the latency to access files on disk and improving concurrent access

Monitoring
New CASTOR monitoring:
– Key performance indicators calculated in real time (for end users and operations)
– New indicators: disk cache hit/miss (see the sketch below), garbage collection statistics, inter-pool transaction statistics, ...
– Immediate detection of performance issues, triggering of alarms
– Not specific to the CERN monitoring infrastructure
Web dashboard with real-time alarms and drill-down options for incident diagnostics.
See the poster from Pokorski, Rekatsinas, Waldron, Duellmann, Ponce, Rabaçal, Wojcieszuk.
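As an illustration of how such an indicator can be computed, below is a minimal Python sketch that derives a per-pool disk-cache hit ratio from a batch of access records; the record fields, pool names and sample values are hypothetical, and this is not the actual CASTOR monitoring code.

# Illustrative only: aggregate a disk-cache hit/miss indicator from access
# records. The field names ("pool", "cache_hit") are assumptions, not CASTOR's.
from collections import defaultdict

def hit_ratio_per_pool(access_records):
    """Return {pool: hit ratio} for a batch of file-access records."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for rec in access_records:
        total[rec["pool"]] += 1
        if rec["cache_hit"]:          # True if the file was already on disk
            hits[rec["pool"]] += 1
    return {pool: hits[pool] / total[pool] for pool in total}

sample = [
    {"pool": "atlas_analysis", "cache_hit": True},
    {"pool": "atlas_analysis", "cache_hit": False},
    {"pool": "cms_prod", "cache_hit": True},
]
print(hit_ratio_per_pool(sample))   # {'atlas_analysis': 0.5, 'cms_prod': 1.0}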

Castor Monitoring [figure-only slide: monitoring dashboard]

Security
Strong authentication is now available (v2.1.8):
– No problems identified for secure access using the XROOT protocol
– GSI/PKI (Grid certificates) and Kerberos authentication implemented in RFIO, the Name Server and the Request Handler
However, RFIO, the Name Server and the Request Handler use stateless protocols:
– GSI/PKI authentication at every request does not scale to request-intensive physics analysis activity. It does not make sense to add server power to compensate for this inefficiency. Currently unusable.
– The CASTOR server supporting Kerberos must run SL5 to have a multithreaded server (otherwise requests will be serialized). SL4 servers are currently deployed, therefore unusable; an upgrade to SL5 servers is necessary. Even with SL5, the Kerberos replay cache will need to be disabled to avoid request serialization.
Options to enforce security:
– Limit RFIO and direct Name Server access to a small set of trusted production cases (analysis made with XROOT only)
– Implement the possibility that an initial GSI/PKI authentication is used to obtain a Kerberos ticket in RFIO. A CERN account would be needed, and a lot of development.

SRM CASTOR interface
SRM interface improved:
– Collaboration with RAL
– Required robustness being reached
Future plans:
– Improve logging, including tracing of requests through their lifetime, and time-ordered processing
– Implement the "server busy" protocol improvement (waiting for the deployment of new clients)
– Better integrate the SRM database with the stager database

Increasing Tape Efficiency
In production with 2.1.8:
– New tape queue management supporting recall/migration policies, access control lists, and user/group-based priorities
Planning a new tape format with data aggregation (see the sketch below):
– Increases the current write efficiency, which today drops on small files
– Allows managed tape migration/recall to match native drive speed
– No more performance drop when handling a large number of aggregated small files
See the poster from Bessone, Cancio Melia, Murray, Taurelli.
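To make the aggregation idea concrete, here is a minimal Python sketch that packs small files into fixed-size aggregates before they are migrated to tape; the 4 GB target, the file list and the packing strategy are illustrative assumptions, not the actual CASTOR tape format.

# Illustrative only: group small files into fixed-size aggregates before tape
# migration, so that each tape write is large enough to keep the drive streaming.
TARGET_AGGREGATE_BYTES = 4 * 1024**3   # assumed target size, not CASTOR's value

def build_aggregates(files, target=TARGET_AGGREGATE_BYTES):
    """files: list of (name, size_in_bytes). Returns a list of aggregates,
    each a list of files whose combined size stays just under the target."""
    aggregates, current, current_size = [], [], 0
    for name, size in sorted(files, key=lambda f: f[1]):
        if current and current_size + size > target:
            aggregates.append(current)
            current, current_size = [], 0
        current.append((name, size))
        current_size += size
    if current:
        aggregates.append(current)
    return aggregates

small_files = [("evt_%04d.root" % i, 50 * 1024**2) for i in range(200)]  # 200 x 50 MB
print(len(build_aggregates(small_files)))   # 3 aggregates of roughly 4 GB each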

Calculating repacking times [figure-only slide]

CASTOR file access latency
Not a problem for Tier-0 operation:
– Scheduled I/O guarantees the Tier-0 data streams to tape and to the Tier-1s, which are not affected by the high LSF latency
... but a concern for interactive data access and analysis:
– It is unacceptable to wait seconds to open a file for reading when it is already on disk

Reducing access latency
Removal of the LSF job scheduling:
– Only for XROOT clients reading files from disk. File writes and tape access are still scheduled, as are all RFIO requests.
– Latency drops from seconds to milliseconds
– Plan to remove job scheduling also for XROOT write operations in a future CASTOR release
Name Server performance improvements:
– Planned for the next CASTOR release
– No more multiple network round trips to resolve names
– Additional database optimization
– Performance of the "stat" call to obtain file information improved by a factor of 20: from 400–500 requests/sec to more than 10K requests/sec (a client-side measurement sketch follows below)
– Further improvements possible by using the XROOTD cache (but limited to XROOT clients)
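For illustration only, a client-side measurement of metadata throughput could look like the following sketch; the mount point is a hypothetical placeholder, and this is not the benchmark behind the numbers quoted above.

# Illustrative only: measure client-side metadata ("stat") throughput against
# a mounted namespace. The mount point below is an assumed example path.
import os, time

MOUNT_POINT = "/castor/cern.ch/user/example"   # assumed FUSE mount, placeholder path

def stat_rate(paths, repeat=10):
    """Return the observed number of stat() calls per second over the given paths."""
    start = time.time()
    count = 0
    for _ in range(repeat):
        for p in paths:
            try:
                os.stat(p)
            except OSError:
                pass                      # missing files still count as a request
            count += 1
    return count / (time.time() - start)

if os.path.isdir(MOUNT_POINT):
    sample = [os.path.join(MOUNT_POINT, n) for n in os.listdir(MOUNT_POINT)[:100]]
    print("%.0f stat requests/sec" % stat_rate(sample))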

More details on the changes made
For disk-based analysis, the XROOT protocol becomes strategic to benefit from the increased performance:
– Excellent integration with ROOT for physics analysis (see the example below)
– Connection-based, efficient protocol for direct access to storage on disk pools, with strong authentication built in
– Can easily leverage functionality of the underlying file systems: security, authorization (ACLs), quotas, audit and logging
This opens several possibilities:
– Mountable file system, usable with FUSE in interactive mode, with a global namespace
– Additional access protocols can be added (e.g. NFS 4.1)
Separation of access protocols from storage technology:
– Access protocols: RFIO, XROOT, MFS, POSIX, NFS 4.1, ...
– Storage technology: CASTOR file servers, Lustre, Xrootd/Scalla, GPFS, ...
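As an example of the ROOT integration, a file on an xroot-enabled endpoint can be opened directly by URL from PyROOT; the host name and path below are placeholders, and the snippet assumes a PyROOT installation and valid credentials.

# Illustrative only: open a file over the XROOT protocol from ROOT (PyROOT).
# The host name and file path are hypothetical placeholders; any xroot-enabled
# endpoint the user is authorized for would work the same way.
import ROOT

url = "root://castor.example.cern.ch//castor/cern.ch/user/a/auser/analysis.root"
f = ROOT.TFile.Open(url)          # connection-based access, no job scheduling
if f and not f.IsZombie():
    f.ls()                        # list the objects stored in the file
    f.Close()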

Conclusion
Data management at CERN is ready for the LHC startup.
Fully committed to supporting all major Grid / experiment interfaces:
– GridFTP, RFIO, SRM, XROOT, ...
Major effort in improving efficiency, stability and operational cost:
– Tape, SRM, monitoring, ...
But it is impossible to overcome the intrinsic limitations of some technologies:
– Tapes will never support random access
– Chaotic analysis requires an efficient, connection-based, secure access protocol

Agenda
Part 1
– Tier-0 data management software status
– Ongoing developments for Tier-0 activities
Part 2
– Ideas for future developments

Questions / Discussion

Additional ideas being discussed
– Role of file systems and mounted file systems for physics analysis
– Role of the data "repack" process
– Role of tapes
– Hardware and storage reliability

Requirements for analysis
Working groups on analysis requirements: led by B. Panzer (Jul – Oct '08) for Tier-0 and by M. Schulz (Nov '08 – ) for Tiers-1/2.
(Simplified) requirements collected so far:
– Analysis made on disk pools only: tape access limited to managed and aggregated recall/migration requests from the experiments; no end-user recalls or direct access to tapes
– Demand for direct file access: no unnecessary latencies; a mountable file system accessible from both interactive desktops and batch systems
– Integration with the existing physics analysis tools (grid middleware and ROOT)
– Secure / manageable (authentication, ACLs, quotas, accounting, etc.)
– File catalogue and name server consistency

Does analysis require file system access?
File system advantages:
– High-performance file system access
– Mainstream software and services; no end-client software necessary (POSIX) for basic file operations
File system drawbacks:
– File catalogue and name server consistency
– Tools for data management need to be re-thought
– Long-term dependencies if advanced (specific) file operation features are exposed to end users

File system requirements
Areas of research:
– Integration with the existing infrastructure
– Wide-area transfer / global namespace
– Data replication (replicated pools / hot spots)
– Tape integration
– Integration with the file catalogue
– SRM interface (based on StoRM? BeStMan?)
Realistic timeline for a complete solution:
– End 2010 for the file system
A mountable file system is considered a strategic direction:
– Either with CASTOR or with a new file system

The CASTOR XROOT approach
Direct file system access to CASTOR files with similar performance in terms of requests/sec and latency:
– The name server lookup generates less than 10% performance overhead
Standard grid access protocols supported, compatibility ensured:
– SRM, RFIO, GridFTP (available now)
– Mountable file system possible using FUSE (available now)
– NFS 4.1 (as a future option)
Name server / file catalogue consistency ensured:
– Global namespace
– Tape integration, with the same logical filename across disk pools and tapes, so that managed data handling remains possible
Separation between access protocols and the underlying storage technologies:
– No long-term dependencies

The options under investigation
[Diagram comparing two options]
Option 1 – CASTOR with multiple access protocols (RFIO, GridFTP, XROOT, a mountable file system using FUSE, a POSIX MFS such as NFS 4.1) on top of a database name server, with access control and managed migrations:
– Technology-independent data protocol
– Centrally managed data solution
– Storage technology changes are transparent (Lustre / GPFS / ...)
– Internal architecture changes are transparent (tape / disk / flash ...)
– End-to-end consistency (e.g. checksums)
Option 2 – a POSIX mountable file system accessed directly, with its own database name server:
– Higher performance
– Data managed by the experiments
– The namespace on tape differs from the namespace on disk: two files with the same name can have different content, and two files with different names can have the same content
– IT-organized data migrations are heavier

Role of the data "repack" process
Traditionally, the "repack" process has been run whenever there was a "media change":
– Triggered by hardware changes; otherwise data were only read back on (user) request
– Repack could be run with a direct tape-to-tape copy
In the recent CERN repack exercise, repack has been "the" tool that uncovered several issues unrelated to the tape technology itself:
– Software errors appearing only under heavy load when recalling data
– Errors in the tape data format, dating back several years, that were not noticed until the tapes were read back
Repack is a measure of our ability to read data and must be constantly exercised (a policy sketch follows below):
– Plan to run repack constantly, independently of hardware changes
– Hardware changes happen asynchronously from the repack process, which populates new hardware as it becomes available
– The repack exercise applies to all storage types, disk in particular
– Ideally, no data should be left on a medium for more than one year; ideally, 3 months
– Software issues and (slow) data corruption are then detected in time
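A minimal Python sketch of such a continuous repack policy, assuming a simple media inventory that records a last-repack timestamp (the field names and the 90-day threshold are illustrative):

# Illustrative only: a continuous repack policy that selects media whose last
# repack is older than a threshold, regardless of hardware changes.
from datetime import datetime, timedelta

REPACK_INTERVAL = timedelta(days=90)   # "ideally, 3 months"

def media_due_for_repack(media, now=None):
    """media: list of dicts with 'id' and 'last_repack' (datetime).
    Returns the ids to queue for repack, oldest first."""
    now = now or datetime.utcnow()
    due = [m for m in media if now - m["last_repack"] > REPACK_INTERVAL]
    return [m["id"] for m in sorted(due, key=lambda m: m["last_repack"])]

inventory = [
    {"id": "T00042", "last_repack": datetime(2008, 6, 1)},
    {"id": "T00043", "last_repack": datetime(2009, 2, 15)},
]
print(media_due_for_repack(inventory, now=datetime(2009, 3, 23)))   # ['T00042']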

Role of tapes
Tapes are a "forever dying" technology:
– This means that we will have tapes for the next 10–15 years, i.e. for the whole lifetime of the LHC project
Tapes are more reliable than disks:
– True, but tapes also fail; recovery procedures must be foreseen for both tapes and disks
Tapes are cheaper than disks:
– The media itself is becoming more expensive than disk
– Tapes remain cheaper if the reader infrastructure is shared among a large number of tapes
Tapes do not consume electrical power:
– Disks also do not consume power when powered off

Disks as tapes?
Why not use the same architecture for disks that we use for tapes?
– The "drive" to "tape" ratio is 1:300, and a drive can read only one tape at a time
Why not an architecture of 1 server for 300 disks?
– Faster to power on a disk than to mount and seek a file on a tape
– 1 server can read 20–30 disks simultaneously

Hardware and storage reliability
Traditionally, reliability is provided by the hardware: hardware can fail, so we duplicate the hardware.
– This doubles the cost to compensate for failing hardware
Some ideas:
– Implement "arbitrary reliability" in software: data redundancy can be dynamically reconfigured to compensate for the measured hardware failure rate
– Have an infrastructure resilient to hardware failures
– This requires separating CPUs from storage: SAN, iSCSI, …

Arbitrary reliability
Every file is split into N chunks, which are encoded with redundant information so that only M < N chunks are necessary to reconstruct the original data:
– Similar to RAID-6 with Reed–Solomon error correction
– All chunks are saved on different media
– You can lose N − M devices at any time without losing data, and N − M is arbitrary
An ongoing "repack" process allows the parameters to be re-tuned to match the measured hardware reliability. A numeric sketch follows below.
[Diagram: a document is split into chunks spread over the physical storage]
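A minimal Python sketch of the trade-off, assuming independent device failures and example values for N, M and the per-device failure probability (only the N/M chunk scheme itself comes from the slide; the numbers are illustrative):

# Illustrative only: for a file split into N chunks of which any M suffice to
# rebuild it (Reed-Solomon style), compute the storage overhead and the chance
# of data loss given an assumed independent per-device failure probability.
from math import comb

def overhead(n, m):
    """Extra storage relative to the original data: N/M - 1."""
    return n / m - 1

def loss_probability(n, m, p_fail):
    """Probability that more than N-M of the N devices fail, i.e. fewer than
    M chunks survive and the file cannot be reconstructed."""
    return sum(comb(n, k) * p_fail**k * (1 - p_fail)**(n - k)
               for k in range(n - m + 1, n + 1))

# Example: N=16 chunks, any M=12 rebuild the file, 2% per-device failure rate.
n, m, p = 16, 12, 0.02
print("overhead: %.0f%%" % (100 * overhead(n, m)))        # 33% extra storage
print("P(data loss): %.2e" % loss_probability(n, m, p))   # about 1e-5 with these assumptions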