SLAC Experience on BeStMan and Xrootd Storage
Wei Yang, Alex Sim
US ATLAS Tier2/Tier3 meeting at Univ. of Chicago, Aug 19-20, 2009

Overview of what SLAC has

Storage Architecture
[Diagram: an Xrootd/cmsd redirector and a composite name space xrootd sit in front of Xrootd/cmsd data servers (each running XrdCnsd). BeStMan-Gateway with XrootdFS and GridFTP (with the Xrootd DSI module or XrootdFS) live on the SLAC network with access to the Internet and handle SURL/TURL traffic to and from remote sites. Batch nodes (TXNetFile or "cp") and end users (XrootdFS) access the data servers inside the SLAC Internet Free Zone.]

Storage Components
- BeStMan Gateway: for T2/T3g.
- XrootdFS: for users and a minimal T3g. Usage is like NFS. Based on the Xrootd POSIX library and FUSE. BeStMan, dq2 clients, and Unix tools need it.
- GridFTP for Xrootd: in use at WT2 for a while. Globus GridFTP plus a Data Storage Interface (DSI) module for Xrootd/POSIX.
- Xrootd core: redirector and data servers. All BaBar needed was this layer.
(The layers run from the bare Xrootd core at the bottom to more functionality at the top.)

Storage hardware at SLAC
- BeStMan Gateway and Xrootd redirector/CNS: dual AMD Opteron 244 (single core), 1.8 GHz, 2 GB RAM, 1 Gb NIC. DQ2 site services, LFC, and GUMS also use this type of machine.
- GridFTP requires CPU power and a fat network pipe: 2 hosts, each with two dual-core AMD Opteron 2218, 2.6 GHz, 8 GB RAM, 3x1 Gb NICs.
- Xrootd data servers on Thumpers: two dual-core AMD Opterons, 16 GB RAM, 4x1 Gb NICs, 48x 0.5-1 TB SATA drives. Solaris and ZFS prevent silent data corruption. Optimized for reliability and capacity, not for IO operations. Will move to Thors: 2x6 cores, 32 GB RAM, 4x1 Gb NICs, 48x 1-1.5 TB SATA drives.

Storage software deployment at SLAC
- OSG provides bestman-xrootd as an SE, including BeStMan-Gateway, Xrootd, the DSI module, and XrootdFS.
- The ATLAS production environment is not installed from OSG, to handle special needs from ATLAS; SLAC often runs newer releases before they are pushed to OSG.
- A small BeStMan-Xrootd installation from OSG is kept as well.

BeStMan at SLAC

Difference between BeStMan Full mode and BeStMan Gateway mode
Full mode:
- Full implementation of SRM v2.2
- Support for dynamic space reservation
- Support for request queue management and space management
- Plug-in support for mass storage systems
- Follows the SRM v2.2 specification
Gateway mode:
- Support for an essential subset of SRM v2.2
- Support for pre-defined static space tokens
- Faster performance without queue and space management
- Follows the SRM functionality needed by ATLAS and CMS

BeStMan-Gateway operation at SLAC
- Stable! We tuned a few parameters: Java heap size 1300 MB (on a 2 GB machine); recently increased the number of container threads from 5 to 25.
- Failure modes, seen when the Xrootd servers are under stress:
  - An Xrootd stat() call takes too long, resulting in HTTP timeouts or CONNECT timeouts.
  - The redirector cannot locate a file, resulting in "file not found".
  - Panda jobs (which do not go through the SRM interface) will also suffer.

Xrootd at SLAC

Accessing Xrootd data by ATLAS
- DDM site services use SRM and GridFTP; the ADLER32 checksum is calculated at the Xrootd data server (see the sketch below).
- Panda production jobs copy Xrootd data to the batch nodes. The expected access pattern is mostly sequential, processing all data in a file.
- Panda analysis jobs read ROOT files directly from the Xrootd servers (non-ROOT files are still copied to the batch nodes). Expect frequent random reads; a job may need only a small fraction of the data in a file.
- Interactive users at SLAC access data via XrootdFS, mounted on the ATLAS interactive machines. It is also required for running dq2 client tools into and out of the Xrootd servers.
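ADLER32 here refers to the Adler-32 algorithm as implemented in zlib. As a rough illustration only (not the server-side checksum plug-in actually used), the sketch below computes that checksum for a local file with zlib's adler32(); the file path is just an example.

```cpp
// Minimal sketch: compute an Adler-32 checksum as zlib defines it.
// Illustration of the algorithm whose value the data servers report;
// this is not the Xrootd server's own checksum implementation.
#include <zlib.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    const char* path = (argc > 1) ? argv[1] : "somefile.root";  // example path
    FILE* f = std::fopen(path, "rb");
    if (!f) { std::perror("fopen"); return 1; }

    uLong adler = adler32(0L, Z_NULL, 0);     // zlib's initial Adler-32 value
    std::vector<unsigned char> buf(1 << 20);  // 1 MB read buffer
    size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0)
        adler = adler32(adler, buf.data(), static_cast<uInt>(n));

    std::fclose(f);
    std::printf("adler32: %08lx\n", adler);
    return 0;
}
```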

XrootdFS
- NFS-like data access (a POSIX sketch follows below).
- Can implement its own read cache.
- Relatively expensive compared with accessing the Xrootd storage cluster directly.
[Diagram: on the client host, a user program calls into the kernel's file system layer; FUSE routes the calls back to the user-space XrootdFS daemon, which talks to the Xrootd storage cluster.]
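Because XrootdFS presents the cluster as an ordinary mounted file system, user programs and tools such as the dq2 clients need no Xrootd-specific code. A minimal sketch, assuming a hypothetical mount point /xrootd and an example file path, reads a file with plain POSIX calls; FUSE hands each call to the XrootdFS daemon, which forwards it to the Xrootd cluster.

```cpp
// Minimal sketch: ordinary POSIX I/O against a (hypothetical) XrootdFS mount
// point, e.g. /xrootd. The kernel's FUSE layer routes these calls to the
// user-space XrootdFS daemon, which talks to the Xrootd redirector and data
// servers on the program's behalf.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char* path = "/xrootd/atlas/user/somefile.root";  // example path under the mount
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    ssize_t total = 0, n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)  // looks like local or NFS I/O to the program
        total += n;

    close(fd);
    printf("read %zd bytes through XrootdFS\n", total);
    return 0;
}
```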

Failure modes and Disaster Recovery
- Xrootd servers are too busy: the Xrootd server itself is lightweight and normally not the bottleneck; the file system and hardware are.
- The redirector cannot find files: Panda jobs, SRM, and GridFTP get "file not found"; a simple stat() takes too long and FTS complains that the SRM connection timed out.
- Composite Name Space failure: results in SRM failure because the underlying XrootdFS fails. The CNS can be reconstructed from the data servers, and the new CNS mechanism is much better.
- A data server is down and data is lost: there is no tape backup, so data must be brought back from other sites. We added a table to the LFC database to record the data server name of each file/GUID. A new feature may allow us to continue operating while bringing data back.

Performance: ATLAS jobs' data reading pattern
[Plot of the actual read requests sent over the wire by an Xrootd client: time versus offset, length, and offset+length for each request.]
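The trace behind these plots records, for every read that goes over the wire, a timestamp, the file offset, the length, and offset+length. The sketch below shows one way such a per-read trace could be collected at the POSIX layer by wrapping pread(); it is an illustration only, not the Xrootd client/server monitoring actually used for these measurements, and the file name is made up.

```cpp
// Minimal sketch: log (time, offset, length, offset+length) per read request,
// the same columns plotted on this slide. Illustrative only.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <ctime>

// Wrap pread() and print one trace line per request.
ssize_t traced_pread(int fd, void* buf, size_t len, off_t offset) {
    ssize_t n = pread(fd, buf, len, offset);
    std::printf("%ld %lld %zu %lld\n",
                static_cast<long>(std::time(nullptr)),                          // time
                static_cast<long long>(offset),                                 // offset
                len,                                                            // len
                static_cast<long long>(offset) + static_cast<long long>(len));  // offset + len
    return n;
}

int main(int argc, char** argv) {
    const char* path = (argc > 1) ? argv[1] : "test.root";  // example file
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    char buf[1024];
    // A few small reads at scattered offsets, similar to the pattern on the plot.
    traced_pread(fd, buf, 200, 0);
    traced_pread(fd, buf, 300, 4096);
    traced_pread(fd, buf, 150, 1 << 20);
    close(fd);
    return 0;
}
```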

Small-block reads dominate in ATLAS jobs.

Reads generally go from head to tail, with large jumps.

A mixture of sequential and random reads; most are small reads.

A small (~8 KB) read-ahead may improve read performance.
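As an illustration of the idea only (not of the XrdClient code), the sketch below serves small reads from a single cached 8 KB aligned block: the first small read in a block triggers one block-sized pread(), and subsequent nearby small reads are answered from memory, so many tiny wire requests collapse into a few block-sized ones. The block size and file name are assumptions for the example.

```cpp
// Minimal sketch of an ~8 KB read-ahead: small, mostly-forward reads are
// served from one cached 8 KB aligned block. Illustration of the idea on
// this slide, not the XrdClient implementation.
#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <cstdio>

constexpr size_t kBlock = 8 * 1024;   // assumed read-ahead granularity (~8K)

struct BlockCache {
    off_t   start = -1;               // file offset of the cached block
    ssize_t size  = 0;                // valid bytes in the cache
    char    data[kBlock];
};

// Read 'len' bytes at 'offset'; refill the cached block on a miss. Like
// pread(), this may return fewer bytes than requested (short read).
ssize_t cached_read(int fd, BlockCache& c, void* buf, size_t len, off_t offset) {
    if (len > kBlock) return pread(fd, buf, len, offset);     // big reads bypass the cache
    off_t block = (offset / (off_t)kBlock) * (off_t)kBlock;   // aligned block start
    if (block != c.start) {                                   // cache miss: one wire request
        c.size = pread(fd, c.data, kBlock, block);
        if (c.size < 0) { c.start = -1; return -1; }
        c.start = block;
    }
    ssize_t avail = c.start + c.size - offset;                // bytes available in the block
    if (avail <= 0) return 0;
    ssize_t n = (avail < (ssize_t)len) ? avail : (ssize_t)len;
    std::memcpy(buf, c.data + (offset - c.start), n);
    return n;
}

int main() {
    int fd = open("test.root", O_RDONLY);                     // example file
    if (fd < 0) { perror("open"); return 1; }
    BlockCache cache;
    char buf[512];
    // Two small neighbouring reads: only the first triggers a block-sized read.
    cached_read(fd, cache, buf, 200, 100);
    cached_read(fd, cache, buf, 300, 400);
    close(fd);
    return 0;
}
```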

Performance: HammerCloud test 558, with no read-ahead: extra-long CPU time, low throughput.
[HammerCloud histograms; reported means: 4.4, 6667, 61.3.]

Performance: XrdClient
- Xrootd client read-ahead cache: the cache size needs to be big enough to hold a ROOT file's base tree. Set via XNet.ReadCacheSize in $HOME/.rootrc.
- Complicated read-ahead algorithm: the read-ahead size should be big enough to avoid being turned off; set via XNet.ReadAheadSize in $HOME/.rootrc. Do not follow this blindly: read-ahead may cause repeated data transfer (developer contacted). A macro sketch of these two settings follows below.
- Experimenting: inflate small reads to ~8 KB and turn read-ahead on/off.
- A rare but fatal bug exists in the Xrootd client used by all ATLAS releases. The problem is fixed in the latest Xrootd and ROOT releases; ATLAS releases can use libXrdClient.so from ROOT 5.24. There is no solution for earlier releases.
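The two .rootrc parameters named on this slide can also be set from a ROOT macro before the file is opened, which makes it easy to experiment with different cache and read-ahead sizes. A minimal sketch is below; the numeric values, redirector host name, file path, and tree name are illustrative assumptions, not recommendations from this talk.

```cpp
// Minimal sketch (ROOT macro): set the XrdClient cache parameters named on
// this slide before opening a file over xrootd. Sizes, host, path, and tree
// name are examples only.
#include "TEnv.h"
#include "TFile.h"
#include "TTree.h"

void xnet_tuning()
{
    // Equivalent to putting these in $HOME/.rootrc:
    //   XNet.ReadCacheSize:  40000000
    //   XNet.ReadAheadSize:  8192
    gEnv->SetValue("XNet.ReadCacheSize", 40000000);  // big enough for a ROOT file's base tree
    gEnv->SetValue("XNet.ReadAheadSize", 8192);      // small (~8 KB) read-ahead

    // TFile::Open with a root:// URL goes through TXNetFile / XrdClient.
    TFile* f = TFile::Open("root://redirector.example.org//atlas/user/some.file.root");
    if (!f || f->IsZombie()) return;

    TTree* t = 0;
    f->GetObject("CollectionTree", t);               // example tree name
    if (t) t->Print();
    f->Close();
}
```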

Performance: HammerCloud test 564. Improved; there may still be room to improve.
[HammerCloud histograms; final means: 56.6, 8.4, 3493; c/f = 6978/766.]

Wish List
- A BeStMan-Gateway framework allowing plug-in modules for storage/file systems. The BeStMan-Gateway configuration file is messy and some of the information in it is static; a module developed by the storage system's team could better handle the underlying file system / storage system. A reference implementation for a POSIX file system or ext3 would be very useful.
- Solid State Disk support in Xrootd. The current storage configuration is optimized for capacity and reliability at the expense of performance. SSD as a cache is not prohibitively expensive, but SSD used as a generic file-system cache might not be optimal for HEP data analysis.

Other interesting topics
- Improved CNS (in progress)
- Using other T1/T2s as a mass storage system (not the ALICE mode)
- A metric to determine whether the underlying IO system is busy
- Monitoring