Storage @ CC-IN2P3
Pierre-Emmanuel Brinette, IN2P3-CC Storage Team
HEPiX Fall 2010, 2 November 2010

Storage services overview
- Mass storage systems: dCache and Xrootd (5 PB of disk) with an HPSS tape backend (9 PB)
- Semi-permanent storage: GPFS (1 PB)
- $HOME: AFS (0.3 PB)
- Backup: TSM (0.25 PB)
- Data management: SRB and iRods
- 90 groups (VOs) from HEP, astrophysics, biology and humanities & social sciences; 3000+ users

The storage team
- Designs and manages the storage services of IN2P3-CC
- 8 people for service management:
  - 2 FTE: dCache, FTS (+1 new)
  - 2.5 FTE: HPSS
  - 0.75 FTE: GPFS
  - 1.5 FTE: TSM
  - 1 FTE: SRB, Xrootd, iRods
- Storage infrastructure is deployed by the system team (1.5 FTE)
- AFS services are managed by the system team (1.5 FTE)
- Robotic libraries are managed by the operations team (2 FTE)

Tape backend: HPSS
- HPSS v6.2.3 (HSM), repository for experimental data: 9.5 PB (30% LHC), growing by ~4 PB/year
- Access with RFIO, a fork of CASTOR 1.3.5 (2002)
  - Simple and lightweight clients (rfcp, rfdir, ...)
  - Good performance (direct transfers from disk servers to clients)
- Staging from dCache/Xrootd is controlled by TReqS [1] (see the sketch below)
  - Optimizes read operations on tape (sorts requests by tape ID and position)
  - Efficient control of tape drive allocation
  - Better performance (fewer mounts/dismounts)
- Migration to HPSS v7.3 soon: small-file aggregation
[Diagram: dCache/Xrootd issue GET requests through TReqS, which queries HPSS for the tape and position of each file and triggers the stage; clients then copy files with rfcp.]
[1] http://indico.cern.ch/contributionDisplay.py?contribId=45&sessionId=10&confId=61917
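The key idea behind TReqS is simple: group pending stage requests per tape and read each tape in order of position, so a cartridge is mounted once and the drive does not seek back and forth. The sketch below is a minimal illustration of that scheduling step, not TReqS itself; the request fields ('tape_id', 'position') and example values are hypothetical.

```python
from collections import defaultdict

def order_stage_requests(requests):
    """Group pending stage requests by tape and sort each group by position
    on the tape, so each tape is mounted once and read sequentially.

    `requests` is an iterable of dicts with hypothetical keys
    'path', 'tape_id' and 'position' (as an HSM metadata lookup might return)."""
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req['tape_id']].append(req)
    schedule = []
    for tape_id, reqs in by_tape.items():
        reqs.sort(key=lambda r: r['position'])   # read in tape order
        schedule.append((tape_id, reqs))         # one mount per tape
    return schedule

# Example: three requests touching two tapes (invented values)
pending = [
    {'path': '/hpss/exp/a.root', 'tape_id': 'JT1042', 'position': 318},
    {'path': '/hpss/exp/b.root', 'tape_id': 'JT0007', 'position': 12},
    {'path': '/hpss/exp/c.root', 'tape_id': 'JT1042', 'position': 5},
]
for tape, reqs in order_stage_requests(pending):
    print(tape, [r['path'] for r in reqs])
```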

SRB / Xrootd / iRods
- SRB: data management middleware
  - Used by HEP / astro / bio experiments
  - Connected to HPSS for the tape backend
  - Now used for inter-site transfers and data management
- Xrootd: intensive I/O server
  - Mainly for local read operations from the computing farm
  - Acts as a disk cache for files in HPSS
  - Main SE for ALICE, R/W permitted
- SRB → iRods migration (see the sketch below)
  - Metadata migration only; the data remains in place
  - Clients are not compatible: user applications need to be rewritten for iRods
  - Migration planning: 2 years
[Diagram: SRB and iRods handle data import and inter-site transfers, Xrootd serves local access and the ALICE SE; all reach HPSS through TReqS and RFIO.]
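Since only the catalog moves while the files stay on the same servers, the migration is essentially a metadata transformation. The sketch below is purely illustrative of that point: the record layout, paths and helper name are invented for the example and do not reflect the real MCAT or iCAT schemas.

```python
# Hypothetical illustration of a metadata-only migration: an SRB catalog
# entry is rewritten as an iRods-style record while the physical file
# stays where it is. All field names and values are invented.

def srb_to_irods_record(srb_entry, zone="in2p3"):
    """Map one (made-up) SRB catalog entry to an iRods-style record."""
    return {
        "logical_path": f"/{zone}/home/{srb_entry['owner']}/{srb_entry['name']}",
        "physical_path": srb_entry["physical_path"],   # the data does not move
        "resource": srb_entry["resource"],
        "size": srb_entry["size"],
        "checksum": srb_entry.get("checksum"),
    }

srb_entry = {
    "owner": "opera",
    "name": "run1234.dat",
    "physical_path": "/srbvault/opera/run1234.dat",
    "resource": "thor-disk-03",
    "size": 2_147_483_648,
    "checksum": "ad0c...",
}
print(srb_to_irods_record(srb_entry))
```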

dCache
- 2 instances, both accessed via dcap, gsidcap, GridFTP and xrootd:
  - LCG: T1 for ATLAS & LHCb, T1/T2 for CMS
  - EGEE (10+ VOs)
- Server pools are shared between LHC VOs: side effects on the other VOs when one VO overloads the pools
- Troubles:
  - Transfer problems (T1 → T2 exports), cause still undetermined (Solaris machines? network configuration? latest dCache version (v1.9.5-22)?)
  - Scalability: too many simultaneous SRM requests (misbehaving jobs from the computing farms), see the throttling sketch below
- Future:
  - Discussions about creating a dedicated dCache instance for each LHC VO
  - Discussions about dedicating server pools to a single VO
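One mitigation for the SRM overload is to cap the number of requests a client (or a set of farm jobs) issues concurrently. The sketch below shows that kind of client-side throttle under assumed names: `submit_srm_request` is a placeholder, not a real dCache or SRM client call, and the limits are arbitrary.

```python
import concurrent.futures
import threading

MAX_CONCURRENT_SRM = 20            # assumed per-client limit, tune to the site
_slots = threading.Semaphore(MAX_CONCURRENT_SRM)

def submit_srm_request(surl):
    """Placeholder for one real SRM call (e.g. a prepare-to-get on one SURL).
    The function name is hypothetical; replace with the actual client."""
    print("requesting", surl)

def throttled_request(surl):
    # Block until one of the MAX_CONCURRENT_SRM slots is free, so a burst
    # of farm jobs cannot flood the SRM door with simultaneous requests.
    with _slots:
        return submit_srm_request(surl)

def fetch_all(surls):
    with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
        return list(pool.map(throttled_request, surls))

fetch_all([f"srm://example.in2p3.fr/pnfs/vo/file{i}" for i in range(5)])
```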

Semi-permanent storage: GPFS
- In production since 2006 as a replacement for NFS
- Used for medium-term disk storage
- Automatic cleanup policies (see the sketch below)
- IBM support contract renewed for the next 3 years, covering CC-IN2P3 and several IN2P3 laboratories
- HSM interface? Not yet: too many small files
- Why not Lustre?
  - Data placement policies (different QoS within the same namespace)
  - Reliable, transparent online data migration (decommissioned servers, filesystem size changes)
  - Oracle...
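GPFS expresses such rules in its own SQL-like ILM policy language; the Python sketch below only illustrates what an automatic cleanup policy on semi-permanent space amounts to (expire files not accessed for N days). The retention window and the example path are assumptions, not the site's actual policy.

```python
import os
import time

MAX_IDLE_DAYS = 90        # assumed retention window, not the site's real value

def cleanup(root, max_idle_days=MAX_IDLE_DAYS, dry_run=True):
    """Remove files whose last access time is older than the retention window,
    in the spirit of an automatic cleanup policy on semi-permanent storage."""
    cutoff = time.time() - max_idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    print("expired:", path)
                    if not dry_run:
                        os.remove(path)
            except OSError:
                pass   # file vanished or is unreadable; skip it

# cleanup("/sps/experiment_scratch")   # hypothetical semi-permanent area
```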

OpenAFS: /afs/in2p3.fr
- Used for $HOME and $GROUP directories, experimental data & VO software toolkits
  - ATLAS software release issues fixed: 3 RO servers sustain 6K jobs
- Migrating file servers from SAN to DAS
  - From 25 V24x (SPARC) to 34 Fire X42xx (x86), 2 TB SAS, Solaris 10 + ZFS
- Client performance issues
  - Particularly on new hardware (Dell C6100, 24 cores with HT)
  - Linux kernel tuning
  - OpenAFS Linux client tuning on SL5 (cache size increase, number of daemons, ...)

Backup: IBM TSM
- User data (AFS), still using the old TSM/AFS client
- Experimental data (astro)
- 19 IN2P3 laboratories all over France, backed up over the WAN on 1/10 Gb links
- 1 billion files / 700 TiB / 1500 LTO-4 tapes
- Future:
  - Migrate to v6.x (end of 2010): metadata in a DB2 database, 2 → 4 servers
  - TSM/AFS client: internal tools based on Anders Magnusson's client
  - Waiting for LTO-6

Hardware: disk, from high end to LEGO® bricks
- High end: 2 IBM DS8300, ~256 TiB
  - SAN, FC 12k disks, direct FC attachment to servers
  - Usage: intensive IO/s and reliability (AFS, metadata, web cluster, TSM, ...)
- High capacity: 7 DDN DCS9550, ~1.2 PiB
  - SAN, SATA 7.2k disks, direct FC attachment to servers
  - Usage: high-throughput storage (GPFS, HPSS)
- LEGO® bricks: 250 Thumper/Thor, ~8 PiB
  - DAS, SATA 7.2k disks, standalone servers with 2/4 Gb/s network, Solaris 10 / ZFS
  - Usage: distributed storage software (dCache, Xrootd, SRB, iRods)

Hardware: AFS & Oracle cluster disk
- Pillar Data Systems Axiom 500
  - OPERA experimental data: 40 TiB → 120 TiB
  - Mixed SATA/FC disks, SAN attached
- 34 Sun (Oracle) Fire X4240
  - 16 SAS 10k disks per server

Hardware: robotics & tape
- Retired: StorageTek Powder Horn silo, STK 9840 and 9940 drives
- 4 SL8500 libraries / 10,000 slots, redundant multi-host robots
  - Theoretical capacity: 40 PB
- 3 types of drives:
  - 94 T10K-A/B drives: T10K-A (120/500 GB) and T10K-B (1 TB) for HPSS
  - 16 LTO-4 drives for TSM
- SAN: Brocade 48000 director
- Monitoring with StorSentry [2]
  - Preventive recommendations ("change drive", "copy & replace tape", ...)
  - Good results, simplifies robotics operations
- Short-term plan: LTO-4 for HPSS (archiving of BaBar data from SLAC)
- Mid-term plan: T10K-C for HPSS, remove T10K-A support
[2] http://indico.cern.ch/contributionDisplay.py?contribId=23&confId=61917

Hardware: next purchases
- Financial crisis! Budget reductions in the coming years (2012, ...)
- Sun X45xx (Thor) now sold by Oracle → price increases (>> 1 €/GiB)
- Candidate replacement: Dell PowerEdge R510 (12 × 2 TB 3.5" 7.2k SAS) plus either
  - Dell PowerVault MD1200, or
  - Dell PowerVault MD1220 (24 × 600 GB 2.5" 10k SAS)
- 0.27 €/GiB, 54 TiB usable (with 2 MD1200), 4 × 1 Gbit/s, 5-year support (rough cost check below)
- Benchmark results:
  - MD1200: OK for dCache but not for Xrootd
  - MD1220: to be tested
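As a rough consistency check of the quoted figures: 54 TiB usable at 0.27 €/GiB implies a bundle price of roughly 15 k€ for the server plus two enclosures. This price is derived here from the two numbers on the slide, not stated in the presentation.

```python
# Rough consistency check of the quoted 0.27 €/GiB figure.
# The implied bundle price is derived, not stated on the slide.
usable_tib = 54
eur_per_gib = 0.27

usable_gib = usable_tib * 1024              # 55,296 GiB
implied_price = usable_gib * eur_per_gib    # ~14,930 € for R510 + 2 x MD1200
print(f"{usable_gib} GiB usable -> implied price ≈ {implied_price:.0f} €")
```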

Disk benchmarks
- XIO: simple and lightweight disk benchmark
  - Multithreaded, multi-platform (Linux, AIX, SunOS)
  - Works on filesystems & raw devices
  - Can define several simultaneous I/O workloads
  - Application profiles (see the sketch below):
    - dCache: 128 threads, R&W of 1 MiB sequential blocks, 60% read operations
    - Xrootd: 128 threads, writes of 1 MiB sequential blocks, reads of 16 KiB random blocks, 95% read operations
  - Outputs statistics in CSV
  - Used for hardware purchases/tests to evaluate performance over the last 5 years
- mfiles/cfiles: simple filesystem benchmark / disk stress tools
  - Small configuration file for creating large filesystems (millions of files)
  - Reads & verifies files (computes checksums) → CPU & disk burning
  - Used to measure the real power usage under load
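XIO is an in-house tool, so the sketch below only mimics the spirit of its "dCache" profile: many threads issuing 1 MiB operations with about 60% reads, then a CSV summary. Thread count, operation count and file names are arbitrary, and a real benchmark would use files much larger than the page cache.

```python
import csv
import os
import random
import threading
import time

BLOCK = 1 << 20          # 1 MiB blocks
OPS_PER_THREAD = 100     # keep the demo short; XIO runs far longer
READ_RATIO = 0.60        # 60% reads, 40% writes (dCache-like profile)
THREADS = 8              # XIO would use 128

def worker(path, results, tid):
    data = os.urandom(BLOCK)
    start = time.time()
    with open(path, "wb+") as f:
        f.write(data)                      # make sure there is something to read
        for _ in range(OPS_PER_THREAD):
            f.seek(0)
            if random.random() < READ_RATIO:
                f.read(BLOCK)
            else:
                f.write(data)
    elapsed = time.time() - start
    results[tid] = (OPS_PER_THREAD, OPS_PER_THREAD * BLOCK / elapsed / 2**20)

results = {}
threads = [threading.Thread(target=worker, args=(f"xio_{i}.dat", results, i))
           for i in range(THREADS)]
for t in threads: t.start()
for t in threads: t.join()

with open("xio_stats.csv", "w", newline="") as out:   # CSV summary, as XIO does
    w = csv.writer(out)
    w.writerow(["thread", "operations", "MiB_per_s"])
    for tid, (ops, rate) in sorted(results.items()):
        w.writerow([tid, ops, f"{rate:.1f}"])

for i in range(THREADS):
    os.remove(f"xio_{i}.dat")      # clean up scratch files
```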

Data lifecycle
- 50% of the data is never read! (observation from Todd Herr / LLNL, HPSS User Forum 2010)
- Reasons:
  - Personal production, archiving, log files stored alongside data files, ...
  - Tape backend behind the storage middleware (dCache, ...): storage looks infinite from the user's point of view!
- Idea: involve the experiment (customer) representatives in managing user data, and define a data lifecycle with each experiment

Summary
- dCache: v1.9.5r22 (LCG) / v1.9.0r9 (EGEE); 300+ pools on 140 servers (X4540 Thor, Solaris 10/ZFS); 4.8 PiB, 16 M files
- SRB: v3.5; 250 TiB on disk + 3 TiB on HPSS; 8 Thor 32 TB, Solaris 10/ZFS + MCAT on Oracle
- Xrootd: March 2010 release; 30 Thor, 32/64 TB; 1.2 PiB
- iRods: v2.4.1; 9 Thor 32 TB
- HPSS: v6.2.3 on AIX p550; 9.5 PB / 26 M files, 2000+ different users; 60 T10K-B drives / 34 T10K-A drives on 39 AIX tape servers (p505/p510/p520); 12 IBM x3650 attached to 4 DDN disk arrays (4 × 160 TiB), 2 × 4 Gb FC / 10 Gb Ethernet
- AFS: v1.4.12.1; 71 TB on 50 servers
- GPFS: v3.2.1r22; ~900 TiB, 170 M files, 60 filesystems; 750 TiB on DCS9550 with 16 IBM x3650 I/O nodes; 100 TiB on DS8300 (metadata & small filesystems)
- TSM: 2 servers (AIX 5) with 4 TiB on DS8300; 950 M files; 16 LTO-4 drives, 1500 LTO-4 tapes; 3 TB/day