Optimisation of Grid Enabled Storage at Small Sites
Jamie K. Ferguson, University of Glasgow
Graeme A. Stewart, University of Glasgow
Greig A. Cowan, University of Edinburgh

Introduction
● Typical Tier 2 and purpose of the inbound transfer tests
● Details of the hardware/software configuration for the file transfers
● Analysis of results

LHC and the LCG
● LHC – the most powerful instrument ever built in the field of physics
● Generates huge amounts of data every second it is running
● Retention of 10PB annually, to be processed at sites
● Use case is typically files of size ~GB, many of which are cascaded down to be stored at T2s until analysis jobs process them

Typical Tier 2 - Definition
● Limited hardware resources
  – (In GridPP) using dCache or dpm as the SRM
  – Few (one or two) disk servers
  – Few terabytes of RAIDed disk
● Limited manpower
  – Not enough time to configure and/or administer a sophisticated storage system
  – Ideally want something that just works “out of the box”

Importance of Good Write (and Read) Rates
● Experiments desire good in/out rates
  – Write is more stressful than read, hence our focus
  – Expected data transfer rates (T1 ==> T2) will be directly proportional to the storage at a T2 site
  – Few hundred Mbps for small(ish) sites, up to several Gbps for large CMS sites
● The limiting factor could be one of many things
  – I know this from recently coordinating 24-hour tests between all 19 of the GridPP T2 member institutes
● These tests also yielded file transfer failure rates
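As a rough illustration of what these rates mean for the ~1GB files used in the tests below, a minimal Python sketch (the file size and rates come from the slides; the conversion itself is plain arithmetic):

    # Rough transfer-time arithmetic for a 1 GB file at the nominal
    # T1->T2 rates quoted above (illustrative figures only).
    FILE_SIZE_BITS = 1e9 * 8  # 1 GB expressed in bits

    for rate_mbps in (100, 300, 1000):  # few hundred Mbps up to ~1 Gbps
        seconds = FILE_SIZE_BITS / (rate_mbps * 1e6)
        print(f"{rate_mbps:>5} Mbps -> ~{seconds:.0f} s per 1 GB file")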

gLite File Transfer Service
● Used FTS to manage transfers
  – Easy-to-use file transfer management software
  – Uses SURLs for source and destination
  – The experiments shall also use this software
  – Able to set the channel parameters Nf and Ns
  – Able to monitor each job, and each transfer within each job
    ● Pending, Active, Done, Failed, etc.
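For context, a minimal sketch of how a single FTS job could be driven and monitored from Python, in the spirit of (but not taken from) the filetransfer.py script mentioned later; the gLite client command names glite-transfer-submit and glite-transfer-status, their output format, and the SURLs are assumptions for illustration:

    import subprocess
    import time

    # Hypothetical source and destination SURLs.
    SRC = "srm://source.example.org/dpm/example.org/home/dteam/file-0001"
    DST = "srm://dest.example.org/dpm/example.org/home/dteam/file-0001"

    # Submit one transfer; the gLite-era client is assumed to print a job ID.
    job_id = subprocess.check_output(
        ["glite-transfer-submit", SRC, DST], text=True).strip()

    # Poll the job state (Pending, Active, Done, Failed, ...) as FTS reports it.
    while True:
        state = subprocess.check_output(
            ["glite-transfer-status", job_id], text=True).strip()
        print(f"job {job_id}: {state}")
        if state in ("Done", "Failed", "Canceled"):
            break
        time.sleep(30)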

What Variables Were Investigated?
● Destination SRM
  – dCache (v )
  – dpm (v1.4.5)
● The underlying filesystem on the destination
  – ext2, ext3, jfs, xfs
● Two transfer-channel parameters
  – No. of parallel files (Nf)
  – No. of GridFTP streams (Ns)
  – Example => Nf=5, Ns=3

Software Components
● dcap and rfio are the transport layers for dCache and dpm respectively
● Under this software stack is the filesystem itself, e.g. ext2
● Above this stack was the filetransfer.py script
  – See sfer#filetransfer

Software Components - dpm
● All the daemons of the destination dpm were running on the same machine
● dCache had a similar setup, in that everything was housed in a single node

Hardware Components
● Source was a dpm
  – High performance machine
● Destination was a single node with a dual-core Xeon CPU
● Machines were on the same network
  – Connected via a 1Gb link which had negligible other traffic
  – No firewall between source and destination
    ● No iptables loaded
● Destination had three 1.7TB partitions
  – RAID 5
  – 64K stripe

Kernels and Filesystems
● A CERN-contributed rebuild of the standard SL kernel was used to investigate xfs
  – This differs from the first kernel only in the addition of xfs support
  – Instructions on how to install the kernel at
  – Necessary RPMs available from ftp://ftp.scientificlinux.org/linux/scientific/305/i386/contrib/RPMS/xfs/

Method
● 30 source files, each of size 1GB, were used
  – This size is typical of the LCG files that shall be used by the LHC experiments
● Both dCache and dpm were used during testing
● Each kernel/filesystem pair was tested - 4 such pairs
● Values of 1, 3, 5, 10 were used for the No. of Files and No. of Streams, giving a matrix of 16 test results
● Each test was repeated 4 times to attain a mean
  – Outlying results (~ < 50% of the other results) were retested
    ● This prevented failures in higher-level components, e.g. FTS, from adversely affecting results
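A minimal sketch of the test loop this slide implies (2 SRMs x 4 filesystems x a 4x4 grid of Nf/Ns values, 4 repeats each); run_transfer_test() is a hypothetical stand-in for the real FTS-driven transfer of the 30 x 1GB files, and the exact outlier-retest rule is an assumption:

    import random

    def run_transfer_test(srm, fs, nf, ns):
        # Stand-in for a real FTS transfer batch; returns a fake rate in Mbps.
        return random.uniform(100, 300)

    SRMS = ["dcache", "dpm"]
    FILESYSTEMS = ["ext2", "ext3", "jfs", "xfs"]
    VALUES = [1, 3, 5, 10]   # used for both Nf and Ns
    REPEATS = 4

    mean_rates = {}
    for srm in SRMS:
        for fs in FILESYSTEMS:
            for nf in VALUES:
                for ns in VALUES:
                    rates = [run_transfer_test(srm, fs, nf, ns)
                             for _ in range(REPEATS)]
                    # Retest outliers (roughly < 50% of the other runs) so that
                    # failures in higher-level components such as FTS do not
                    # drag the mean down.
                    mean = sum(rates) / len(rates)
                    rates = [r if r >= 0.5 * mean
                             else run_transfer_test(srm, fs, nf, ns)
                             for r in rates]
                    mean_rates[(srm, fs, nf, ns)] = sum(rates) / len(rates)

    print(f"{len(mean_rates)} (SRM, filesystem, Nf, Ns) combinations measured")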

Results – Average Rates
● All results are in Mbps

Average dCache rate vs. Nf

Average dCache rate vs. Ns

Average dpm rate vs. Nf

Average dpm rate vs. Ns

Results – Average Rates
● In our tests, dpm outperformed dCache for every averaged Nf and Ns value

Results – Average Rates
● Transfer rates are greater when using jfs or xfs rather than ext2 or ext3
● Rates for ext2 are better than for ext3 because ext2 does not suffer from journalling overheads

Results - Average Rates
● Having more than one parallel file (Nf) on the channel substantially improves the transfer rate for both SRMs and for all filesystems; for both SRMs, the average rate is similar for Nf = 3, 5, 10
● dCache
  – Ns = 1 is the optimal value for all filesystems
● dpm
  – Ns = 1 is the optimal value for ext2 and ext3
  – For jfs and xfs the rate seems independent of Ns
● For both SRMs, the average rate is similar for Ns = 3, 5, 10

Results – Error (Failure) Rates
● Failures in both cases tended to be caused by a failure to correctly call srmSetDone() in FTS, resulting from a high machine load
● Recommended to separate the SRM daemons and the disk servers, especially at larger sites

Results – Error (Failure) Rates
● dCache
  – Small number of errors for the ext2 and ext3 filesystems
    ● Caused by high machine load
  – No errors for the jfs and xfs filesystems
● dpm
  – All filesystems had errors
    ● As in the dCache case, caused by high machine load
  – Error rate for jfs was particularly high, but this was down to many errors in one single transfer

Results – FTS Parameters
● Nf
  – Initial tests indicate that setting Nf to a high value (15) causes a large load on the machine when the first batch of files completes; subsequent batches time out
  – Caused by post-transfer SRM protocol negotiations occurring simultaneously
● Ns
  – Ns > 1 caused slower rates for 3/4 of the SRM/filesystem combinations
  – Multiple streams cause a file to be split up and sent down different TCP channels
  – This results in “random writes” to the disk
  – A single stream causes the data packets to arrive sequentially, so they can also be written sequentially (see the sketch below)
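A minimal illustration of the random-write effect described above: with several streams, each stream carries a different slice of the same file, so the destination writes at scattered offsets instead of appending sequentially. The block size, stream count and file names are made up for illustration:

    import os

    BLOCK = 64 * 1024   # illustrative block size per write
    NUM_BLOCKS = 12
    STREAMS = 3

    # Multi-stream case: blocks arrive interleaved across streams, so the
    # destination seeks between offsets ("random writes").
    fd = os.open("striped-destination.dat", os.O_CREAT | os.O_WRONLY, 0o644)
    for stream in range(STREAMS):
        for block in range(stream, NUM_BLOCKS, STREAMS):
            os.pwrite(fd, b"x" * BLOCK, block * BLOCK)  # write at a scattered offset
    os.close(fd)

    # Single-stream case: the same data is one sequential append.
    with open("sequential-destination.dat", "wb") as f:
        for block in range(NUM_BLOCKS):
            f.write(b"x" * BLOCK)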

Future Work
● Use SL4 as the OS
  – Allows testing of the 2.6 kernel
● Different stripe sizes for the RAID configuration
● TCP read and write buffer sizes
  – Linux kernel networking tuning parameters (see the sketch below)
● Additional hardware, e.g. more disk servers
● More realistic simulation
  – Simultaneous reading/writing
  – Local file access
● Other filesystems?
  – e.g. reiser, but this filesystem is more applicable to holding small files, not the sizes that shall exist on the LCG
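As an example of the kernel networking tuning parameters meant above, a small sketch that reads the standard Linux TCP buffer sysctls via /proc; which values a site should actually set is not something the slides specify:

    # These /proc/sys paths correspond to the sysctls net.core.rmem_max,
    # net.core.wmem_max, net.ipv4.tcp_rmem and net.ipv4.tcp_wmem.
    SYSCTLS = [
        "/proc/sys/net/core/rmem_max",
        "/proc/sys/net/core/wmem_max",
        "/proc/sys/net/ipv4/tcp_rmem",
        "/proc/sys/net/ipv4/tcp_wmem",
    ]

    for path in SYSCTLS:
        try:
            with open(path) as f:
                print(f"{path}: {f.read().strip()}")
        except OSError as exc:  # e.g. not running on Linux
            print(f"{path}: unavailable ({exc})")

    # Raising these (via sysctl or /etc/sysctl.conf) enlarges the TCP windows
    # available to GridFTP transfers on high bandwidth-delay-product paths.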

Conclusions
● Choice of SRM application should be made at site level, based on the resources available
● Using a newer high performance filesystem, jfs or xfs, increases the inbound rate
  – Howto on moving to an xfs filesystem without losing data: matting_Howto
● High value for Nf
  – Although too high a value will cause other problems
● Low value for Ns
  – I recommended Ns=1 and Nf=8 for the GridPP inter-T2 tests that I am currently conducting