Scalla/xrootd Introduction
Andrew Hanushevsky, SLAC
SLAC National Accelerator Laboratory, Stanford University
6-April-09
ATLAS Western Tier 2 User’s Forum

Outline
- File servers
  - NFS & xrootd
- How xrootd manages file data
  - Multiple file servers (i.e., clustering)
  - Considerations and pitfalls
- Getting to xrootd hosted file data
  - Available programs and interfaces

File Server Types
[Diagram: Application -> Linux NFS client (client machine) -> Linux NFS server (server machine) -> data files]
Alternatively, xrootd is nothing more than an application-level file server and client using another protocol.
[Diagram: Application + xroot client (client machine) -> xroot server (server machine) -> data files]

Why Not Just Use NFS?
- NFS V2 & V3 inadequate
  - Scaling problems with large batch farms
  - Unwieldy when more than one server needed
- NFS V4?
  - Relatively new
  - Multiple server support still being vetted
  - Still has single-point-of-failure problems

NFS & Multiple File Servers
[Diagram: an application on the client machine calls open("/foo") (e.g., cp /foo /tmp) through the Linux NFS client; data files live on Linux NFS server machines A and B. Which server?]
- NFS can’t naturally deal with this problem.
- Typical ad hoc solutions are cumbersome, restrictive and error prone!

xrootd & Multiple File Servers I
[Diagram: the client machine runs the application and the xroot client; server machines A and B run xroot servers with data files; server machine R runs the xroot server that acts as the redirector.]
xrdcp root://R//foo /tmp
  The client sends open("/foo") to redirector R.
  1. The redirector asks the servers: Who has /foo?
  2. A server answers: I do!
  3. The redirector tells the client: Try B
  4. The client sends open("/foo") to B
The xroot client does all of these steps automatically without application (user) intervention!
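
The four steps can be sketched as a toy in-memory model. This is not the xroot protocol or client API: the names `Server`, `Redirector` and `open_via_redirector` are hypothetical and exist only to show who talks to whom.

```python
class Server:
    """A toy data server that knows which paths it holds."""

    def __init__(self, name, paths):
        self.name = name
        self.paths = set(paths)

    def has(self, path):
        return path in self.paths

    def open(self, path):
        if path not in self.paths:
            raise FileNotFoundError(path)
        return f"{self.name}:{path}"


class Redirector:
    """A toy redirector: it holds no data, it only asks its servers."""

    def __init__(self, servers):
        self.servers = list(servers)

    def locate(self, path):
        # Step 1: "Who has /foo?"  Step 2: a server answers "I do!"
        for server in self.servers:
            if server.has(path):
                return server  # Step 3: "Try B"
        return None


def open_via_redirector(redirector, path):
    """What the client does on behalf of the application."""
    target = redirector.locate(path)
    if target is None:
        raise FileNotFoundError(path)
    return target.open(path)  # Step 4: open("/foo") on the chosen server
```

The application only ever calls the equivalent of `open_via_redirector`; it never needs to know that B was chosen.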

File Discovery Considerations I
- The redirector does not have a catalog of files
  - It always asks each server, and
  - Caches the answers in memory for a “while”
    - So, it won’t ask again when asked about a past lookup
- Allows real-time configuration changes
  - Clients never see the disruption
- Does have some side-effects
  - The lookup takes less than a microsecond when files exist
  - Much longer when a requested file does not exist!
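
A minimal sketch of "ask the servers, then remember the answer for a while". The class `LookupCache`, its `ask_servers` callback and the `lifetime_s` parameter are hypothetical; the slide does not say how long the "while" is, and caching only positive answers is a simplification made here.

```python
import time


class LookupCache:
    """Remembers which server answered for a path, for a limited time."""

    def __init__(self, ask_servers, lifetime_s, clock=time.monotonic):
        self.ask_servers = ask_servers  # callable: path -> server name or None
        self.lifetime_s = lifetime_s    # the "while"; real value not given here
        self.clock = clock
        self._entries = {}              # path -> (server, expiry time)

    def locate(self, path):
        now = self.clock()
        hit = self._entries.get(path)
        if hit is not None and hit[1] > now:
            return hit[0]                # past lookup: no need to ask again
        server = self.ask_servers(path)  # no catalog: ask the servers
        if server is not None:
            self._entries[path] = (server, now + self.lifetime_s)
        return server
```

Because entries expire, a server that is added or removed is picked up on a later lookup without any catalog update.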

xrootd & Multiple File Servers II
[Diagram: the client machine runs the application and the xroot client; server machines A and B run xroot servers with data files; server machine R runs the xroot server that acts as the redirector.]
xrdcp root://R//foo /tmp
  The client sends open("/foo") to redirector R.
  1. The redirector asks the servers: Who has /foo?
  2. The servers answer: Nope!
File deemed not to exist if there is no response after 5 seconds!
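
Why a missing file costs the full delay can be seen in a small sketch. Only positive answers are modeled: a server that holds the file puts its name on a queue, and the hypothetical function `wait_for_holder` gives up after the deadline. The 5 second default comes from the slide; everything else is an assumption.

```python
import queue


def wait_for_holder(answers, deadline_s=5.0):
    """Return the first server that claims the file, or None on timeout.

    `answers` is a queue.Queue onto which servers holding the file put
    their name. Servers without the file contribute nothing in this model.
    """
    try:
        return answers.get(timeout=deadline_s)
    except queue.Empty:
        return None  # no response in time: the file is deemed not to exist
```

An existing file returns as soon as one server answers; a missing file always waits out the whole deadline.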

File Discovery Considerations II
- System optimized for “file exists” case!
  - Penalty for going after missing files
- Aren’t new files, by definition, missing?
  - Yes, but that involves writing data!
    - The system is optimized for reading data
  - So, creating a new file will suffer a 5 second delay
  - Can minimize the delay by using the xprep command
    - Primes the redirector’s file memory cache ahead of time
- Can files appear to be missing any other way?

Missing File vs. Missing Server
- In xroot, files exist to the extent servers exist
  - The redirector cushions this effect for 10 minutes
  - Afterwards, the redirector cannot tell the difference
- This allows partially dead server clusters to continue
  - Jobs hunting for “missing” files will eventually die
- But jobs cannot rely on files actually being missing
  - xroot cannot provide a definitive answer to “no server s has file x”
  - This requires manual safety for file creation
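
The distinction can be sketched as a three-way lookup result. The 10 minute window is from the slide. How the redirector cushions the effect is not stated, so this sketch assumes it tells the client to wait; the names `Lookup` and `classify` are hypothetical.

```python
from enum import Enum


class Lookup(Enum):
    FOUND = "found"
    WAIT = "wait"            # assumed cushioning: a recently lost server may hold it
    NOT_FOUND = "not found"  # not definitive: a long-dead server may still hold it


CUSHION_S = 10 * 60  # the 10 minute window from the slide


def classify(found, offline_ages_s):
    """Classify a lookup.

    `offline_ages_s` lists, for each server currently offline, how many
    seconds ago it disappeared.
    """
    if found:
        return Lookup.FOUND
    if any(age < CUSHION_S for age in offline_ages_s):
        return Lookup.WAIT
    return Lookup.NOT_FOUND
```

Note that `NOT_FOUND` is returned both when no server is offline and when a server has been gone for more than 10 minutes, which is exactly why a job cannot treat it as proof that the file does not exist.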

Safe File Creation
- Avoiding the basic problem...
  - Today’s new file may be on yesterday’s dead server
- Generally, do not re-use output file names
- Otherwise, serialize file creation
  - Use temporary file names when creating new files
    - E.g., path/....root.temp
  - Remove the temporary to clean up any previous failures
    - E.g., -f xrdcp option or truncate option on open
  - Upon success, rename the temporary to its permanent name
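
The temporary-name recipe, sketched against an ordinary file system path as a stand-in for xrootd hosted storage. The function name `create_safely` and the `write_data` callback are hypothetical; the `.temp` suffix follows the example on the slide.

```python
import os


def create_safely(final_path, write_data):
    """Write to a temporary name, then rename to the permanent name."""
    temp_path = final_path + ".temp"
    # Opening with "wb" truncates any leftover temporary from a failed run.
    with open(temp_path, "wb") as f:
        write_data(f)
    # Only a completely written file ever appears under the permanent name.
    os.replace(temp_path, final_path)
```

If the writer fails part way, the permanent name is never created, so a rerun only has to deal with the stale temporary.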

Getting to xrootd hosted data
- Use the root framework
  - Automatically, when files named root://....
  - Manually, use TXNetFile() object
  - Note: identical TFile() object will not work with xrootd!
- xrdcp
  - The copy command
- xprep
  - The redirector seeder command
- Via fuse on atlint01.slac.stanford.edu
- POSIX preload library
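
File names of the form root://R//foo are easy to get wrong because of the double slash. A small helper, assuming the form used in the xrdcp example (host, then one separating slash, then the absolute path); the function name `xroot_url` is hypothetical.

```python
def xroot_url(host, path, port=None):
    """Build a root:// URL such as root://R//foo for an absolute path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /foo")
    authority = host if port is None else f"{host}:{port}"
    # One slash ends the host part; the path keeps its own leading slash.
    return f"root://{authority}/{path}"
```

For example, `xroot_url("R", "/foo")` gives the name used in the copy examples.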

Copying xrootd hosted data
xrdcp [options] source dest
- Copies data to/from xrootd servers
- Some handy options:
    -f     erase dest before copying source
    -s     stealth mode (i.e., produce no status messages)
    -S n   use n parallel streams (use only across WAN)
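
When xrdcp is driven from a script, it helps to build the argument list in one place. A sketch that uses only the three options listed on this slide; the function `xrdcp_command` and its parameter names are hypothetical, and the command is only constructed, not run.

```python
def xrdcp_command(source, dest, erase_dest=False, quiet=False, streams=None):
    """Return an argv list for xrdcp using only the options on the slide."""
    argv = ["xrdcp"]
    if erase_dest:
        argv.append("-f")             # erase dest before copying source
    if quiet:
        argv.append("-s")             # produce no status messages
    if streams is not None:
        argv += ["-S", str(streams)]  # parallel streams; use only across WAN
    return argv + [source, dest]
```

The result can be handed to `subprocess.run(argv, check=True)` on a machine where xrdcp is installed.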

Preparing xrootd hosted data
xprep [options] host[:port] [path [...]]
- Prepares xrootd access via redirector host:port
- Minimizes wait time if you are creating many files
- Some handy options:
    -w     file will be created or written
    -f fn  file fn holds a list of paths, one per line
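
For a job that creates many files, the path list for -f can be generated ahead of time. A sketch following the synopsis above; the helper names `write_path_list` and `xprep_command` are hypothetical, and the command is only constructed, not run.

```python
def write_path_list(filename, paths):
    """Write one path per line, the layout described for the -f option."""
    with open(filename, "w") as f:
        for path in paths:
            f.write(path + "\n")


def xprep_command(redirector, paths=(), list_file=None, for_write=False):
    """Return an argv list: xprep [options] host[:port] [path [...]]."""
    argv = ["xprep"]
    if for_write:
        argv.append("-w")           # files will be created or written
    if list_file is not None:
        argv += ["-f", list_file]   # file holding one path per line
    return argv + [redirector] + list(paths)
```

Running the resulting command before the job starts is what primes the redirector, so the per-file creation delay is not paid inside the job.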

Interactive xrootd hosted data
- Atlas xroot redirector mounted as a file system
  - “/xrootd” on atlint01.slac.stanford.edu
- Use this for typical operations
  - dq2-get
  - dq2-put
  - dq2-ls
  - rm

For Everything Else
- POSIX preload library (libXrdPosixPreload.so)
  - Works with any POSIX I/O compliant program
  - Provides direct access to xrootd hosted data
  - Does not need any changes to the application
    - Just run the binary as is
- Talk to Wei or Andy if you want to use it
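
The slide does not spell out how the library is activated. On Linux, preload libraries are normally enabled through the dynamic linker's LD_PRELOAD variable, so this sketch assumes that mechanism. The function `preload_env` is hypothetical, and the library location is site specific and must be supplied by the caller.

```python
import os


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with the library added to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # Put the new library first and keep anything that was already preloaded.
    env["LD_PRELOAD"] = library_path if not existing else f"{library_path}:{existing}"
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching an unmodified binary; any further settings the library needs are not covered here.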

Conclusion
- We hope that this is an effective environment
  - Production
  - Analysis
- But, we need your feedback
  - What is unclear
  - What is missing
  - What is not working
  - What can work even better

Future Directions
- More simplicity!
  - Integrating the cnsd into cmsd
    - Reduces configuration issues
  - Pre-linking the extended open file system (ofs)
    - Fewer configuration options
- Tutorial-like guides!
  - Apparent need as we deploy at smaller sites

Acknowledgements
- Software Contributors
  - Alice: Derek Feichtinger
  - CERN: Fabrizio Furano, Andreas Peters
  - Fermi: Tony Johnson (Java)
  - Root: Gerri Ganis, Bertrand Bellenot, Fons Rademakers
  - STAR/BNL: Pavel Jakl
  - SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger
  - BeStMan LBNL: Alex Sim, Junmin Gu, Vijaya Natarajan (BeStMan team)
- Operational Collaborators
  - BNL, FZK, IN2P3, RAL, UVIC, UTA
- Partial Funding
  - US Department of Energy Contract DE-AC02-76SF00515 with Stanford University