Presentation on theme: "Scalla In’s & Out’s (xrootd/cmsd). Andrew Hanushevsky, SLAC National Accelerator Laboratory. OSG Administrators’ Workshop, Stanford University/SLAC." — Presentation transcript:

1 Scalla In’s & Out’s: xrootd/cmsd. Andrew Hanushevsky, SLAC National Accelerator Laboratory. OSG Administrators’ Workshop, Stanford University/SLAC, 13-November-08. http://xrootd.slac.stanford.edu

2 Goals: a good understanding of xrootd structure; clustering and the cmsd; how configuration directives apply; cluster interconnections and how it really works; the oss storage system and the CacheFS; SRM and Scalla; the position of FUSE, xrootdFS, and cnsd; and the big picture.

3 What is Scalla? Structured Cluster Architecture for Low Latency Access: low latency access to data via xrootd servers, with a protocol that includes high performance features, and structured clustering provided by cmsd servers that is exponentially scalable and self-organizing.

4 What is xrootd? A specialized file server that provides access to arbitrary files and allows reads and writes with offset and length; think of it as a specialized NFS server. Then why not use NFS? NFS does not scale well and cannot map a single namespace across all the servers, whereas all xrootd servers can be clustered to look like “one” server.

5 The xrootd Server: an xrootd server runs an xrootd process comprising a process manager, the protocol implementation, a logical file system, a physical storage system, and a clustering interface.

6 How Is xrootd Clustered? By a management service provided by cmsd processes, which oversee the health and name space of each xrootd server, map file names to the servers that have the file, and inform the client, via an xrootd server, of the file’s location; all of this is done in real time without using any databases. Each xrootd server process talks to a local cmsd process over a Unix named (i.e., file system) socket, and the local cmsd’s communicate with a manager cmsd elsewhere over a TCP socket. Each process has a specific role in the cluster.

7 xrootd & cmsd Relationships: on each server, the xrootd process connects through its clustering interface to the local cmsd process, which in turn connects to a manager cmsd elsewhere.

8 How Are The Relationships Described? Relationships are described in a configuration file; you normally need only one such file for all servers, but every server needs it. The file tells each component its role and what to do via component-specific directives, one per line, of the form component_name.directive [parameters]: the component name says who it applies to (all | acc | cms | sec | ofs | oss | xrd | xrootd) and the directive says what to do, as in the sketch below.
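
A minimal illustrative configuration sketch; the directive names come from slides later in this deck, while the host name and library path are placeholders:

    # Who the manager is (read by all components)
    all.manager manager.example.edu:1213
    # This host's role in the cluster
    all.role server
    # Which path is made available to clients
    all.export /atlas r/o
    # Which file system plug-in the xrootd component loads (path is a placeholder)
    xrootd.fslib /opt/xrootd/lib/XrdOfs.so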

9 Directives versus Components: directives are placed in the configuration file and are prefixed by the component they configure: xrd.directive, xrootd.directive, ofs.directive, oss.directive, cms.directive, or all.directive (for example, xrootd.fslib /…/XrdOfs.so).

10 Where Can I Learn More? Start with the Scalla Configuration File Syntax (http://xrootd.slac.stanford.edu/doc/dev/Syntax_config.htm). System-related parts have their own manuals: the Xrd/XRootd Configuration Reference describes the xrd. and xrootd. directives; the Scalla Open File System & Open Storage System Configuration Reference describes the ofs. and oss. directives; and the Cluster Management Service Configuration Reference describes the cms. directives. Every manual tells you when you must use the all. prefix.

11 The Bigger Picture: a manager node (x.slac.stanford.edu) and data server nodes (a.slac.stanford.edu and b.slac.stanford.edu) each run an xrootd and a cmsd, all sharing one configuration file: all.role server; all.role manager if x.slac.stanford.edu; all.manager x.slac.stanford.edu 1213. Note that all processes can be started in any order! Which one do clients connect to?

12 Then How Do I Get To A Server? Clients always connect to the manager’s xrootd. The client thinks this is the right file server, but the manager only pretends to be a file server; clients really don’t know the difference. The manager finds out which server has the client’s file, and then magic happens…

13 The Magic Is Redirection! The client asks the manager node (x.slac.stanford.edu) to open(“/foo”); the manager’s cmsd asks the data server cmsd’s “Have /foo?”; node a.slac.stanford.edu replies “I have /foo!”; the client is told to go to a, re-issues open(“/foo”) against a.slac.stanford.edu, and accesses /foo there.

14 Request Redirection: most requests are redirected to the “right” server, providing point-to-point I/O. Redirection for an existing file takes a few milliseconds the first time; results are cached, so subsequent redirection is done in microseconds. Redirection allows load balancing (many options; see the cms.perf and cms.sched directives) and is cognizant of failing servers, automatically choosing another working server (see the cms.delay directive). A hypothetical tuning sketch follows.
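
This is a hypothetical tuning sketch only: the directive names cms.sched and cms.delay are taken from the slide, but the option names and values below are assumptions and should be checked against the Cluster Management Service Configuration Reference.

    # Assumed options: weight CPU and I/O load equally when selecting a server
    cms.sched cpu 50 io 50
    # Assumed option: give the cluster time to form before scheduling clients
    cms.delay startup 90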

15 Pause For Some Terminology. Manager: the processes whose assigned role is “manager” (all.role manager), typically on a distinguished node. Redirector: the xrootd process on the manager’s node. Server: the processes whose assigned role is “server” (all.role server); this is the end-point node that actually supplies the file data.

16 How Many Managers Can I Have? Up to eight, but usually you will want only two, which avoids single-point hardware and software failures. Redirectors automatically cross-connect to all of the manager cmsd’s, and servers automatically connect to all of the manager cmsd’s. Clients randomly pick one of the working manager xrootd’s, while redirectors algorithmically pick one of the working cmsd’s. This lets you load-balance the manager nodes if you wish (see the all.manager directive) and also allows you to do serial restarts, easing administrative maintenance. The cluster goes into safe mode if all the managers die or if too many servers die.

17 A Robust Configuration: two central manager (redirector) nodes, x.slac.stanford.edu and y.slac.stanford.edu, and data server nodes a.slac.stanford.edu and b.slac.stanford.edu, each running an xrootd and a cmsd. One configuration file serves them all: all.role server; all.role manager if x.slac.stanford.edu; all.manager x.slac.stanford.edu:1213; all.role manager if y.slac.stanford.edu; all.manager y.slac.stanford.edu:1213.

18 How Do I Handle Multiple Managers? Ask your network administrator to assign the manager IP addresses (x.domain.edu and y.domain.edu) to a common host name such as xy.domain.edu, and make sure that DNS load balancing does not apply! Then use xy.domain.edu everywhere instead of x or y: root://xy.domain.edu// rather than root://x.domain.edu,y.domain.edu//; the client will choose one of x or y. In the configuration file do one of the following: either list both, all.manager x.domain.edu:1213 and all.manager y.domain.edu:1213, or use the combined form all.manager xy.domain.edu+:1213 (don’t forget the plus!). A usage sketch follows.
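
For example, a client copy through the common host name might look like the following; xrdcp is the standard xrootd copy client, and the file path is a placeholder:

    # The client resolves xy.domain.edu, picks one of the managers,
    # and is then redirected to the data server that actually holds the file.
    xrdcp root://xy.domain.edu//atlas/somefile /tmp/somefile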

19 A Quick Recapitulation: the system is highly structured. Server xrootd’s provide the data, manager xrootd’s provide the redirection, and the cmsd’s manage the cluster, locating files and monitoring the health of all the servers. Clients initially contact a redirector and are then redirected to a data server. The structure is described by the configuration file, and usually the same one is used everywhere.

20 Things You May Want To Do. Automatically restart failing processes: best done via a crontab entry running a restart script; most people use root, but you can use the xrootd/cmsd uid. Renice the server cmsd’s: as root, renice -n -10 -p cmsd_pid, which allows the cmsd to get CPU even when the system is busy; this can be automated via the start-up script, and is one reason why most people use root for start/restart. A sketch follows.
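
A minimal sketch of such automation, assuming a hypothetical restart script at /opt/xrootd/etc/restart-scalla.sh (not part of the distribution) that starts any missing xrootd/cmsd processes:

    # root's crontab: run the restart script every five minutes
    */5 * * * * /opt/xrootd/etc/restart-scalla.sh

    # inside the (hypothetical) script, after the cmsd has been started:
    renice -n -10 -p $(pidof cmsd)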

21 Things You Really Need To Do. Plan for log and core file management (e.g., /var/adm/xrootd/core and /var/adm/xrootd/logs); log rotation can be automated via command line options. Override the default administrative path (see the all.adminpath directive): this is where the Unix named sockets are created, /tmp is the (bad) default, so consider using /var/adm/xrootd/admin. Plan on configuring your storage space and SRM; these are xrootd-specific ofs and oss options, and SRM requires that you run FUSE, cnsd, and BestMan. A sketch of the first two items follows.
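
A sketch of what this might look like, assuming that the -c and -l command line options select the configuration file and log file (check the option names against your version's manual); the directory paths are the examples from the slide, and the configuration file location is a placeholder:

    # In the configuration file: keep the admin sockets out of /tmp
    all.adminpath /var/adm/xrootd/admin

    # At start-up: write logs under /var/adm/xrootd/logs
    xrootd -l /var/adm/xrootd/logs/xrootd.log -c /opt/xrootd/etc/xrootd.cf &
    cmsd   -l /var/adm/xrootd/logs/cmsd.log   -c /opt/xrootd/etc/xrootd.cf &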

22 Server Storage Configuration (all.export, oss.cache, oss.usage). The questions to ask: What paths do I want to export (i.e., make available)? Will I have more than one file system on the server? Will I be providing SRM access? Will I need to support SRM space tokens?

23 Exporting Paths: use the all.export directive. It is used by xrootd to allow access to the exported paths and by the cmsd to search for files in the exported paths. Many options are available, r/o and r/w being the two most common; refer to the Scalla Open File System & Open Storage System Configuration Reference. A sketch follows.
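
For instance, a sketch with one read-only and one writable export; the paths are placeholders:

    # Read-only export for analysis data
    all.export /atlas/data r/o
    # Writable export for user files
    all.export /atlas/user r/w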

24 But My Exports Are Mounted Elsewhere! A common issue: say you need to mount your file system on /myfs but you want to export /atlas within /myfs. What to do? Use the oss.localroot directive; only the oss component needs to know about it. With oss.localroot /myfs and all.export /atlas, /atlas is the visible path but is internally always prefixed with /myfs, so open(“/atlas/foo”) actually opens “/myfs/atlas/foo”.

25 Multiple File Systems (the oss CacheFS): the oss allows you to aggregate partitions. Each partition is mounted as a separate file system, and an exported path can refer to all the partitions; the oss handles this automatically by creating symlinks, so a file name in /atlas is a symlink to an actual file in /mnt1 or /mnt2. The mounted partitions hold the file data, while the exported file system holds the exported file paths. Configuration: oss.cache public /mnt1 xa; oss.cache public /mnt2 xa; all.export /atlas.

26 OSS CacheFS Logic Example: a client creates a new file “/atlas/myfile”. The oss selects a suitable partition, searching for space in /mnt1 and /mnt2 in LRU order, and creates a null file in the selected partition, say /mnt1/public/00/file0001. It then creates two symlinks, /atlas/myfile -> /mnt1/public/00/file0001 and /mnt1/public/00/file0001.pfn -> /atlas/myfile, after which the client can write the data. The resulting layout might look like the sketch below.
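
Illustratively, listing the pieces after such a create could show something like the following (the generated data file name is internal to the oss and is shown only as in the slide's example):

    $ ls -l /atlas/myfile
    lrwxrwxrwx ... /atlas/myfile -> /mnt1/public/00/file0001
    $ ls -l /mnt1/public/00/file0001.pfn
    lrwxrwxrwx ... /mnt1/public/00/file0001.pfn -> /atlas/myfile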

27 Why Use The oss CacheFS? There is no need if you can have one file system: use the OS volume manager if you have one and are not worried about large logical partitions or fsck time. However, we use the CacheFS to support SRM space tokens, which is done by mapping tokens to virtual or physical partitions; the oss supports both.

28 SRM Static Space Token Refresher: a static space token encapsulates fixed space characteristics: the type of space (e.g., permanence, performance, etc.), an implied specific quota, and a particular arbitrary name (e.g., atlasdatadisk, atlasmcdisk, atlasuserdisk, etc.). It is typically used to create new files; think of it as a space profile.

29 Partitions as a Space Token Paradigm: disk partitions map well to SRM space tokens. A set of partitions embodies a set of space attributes (performance, quota, etc.), and a static space token defines a set of space attributes, so partitions and static space tokens are interchangeable. We take the obvious step: use oss CacheFS partitions for SRM space tokens by simply mapping space tokens onto a set of partitions. The oss CacheFS supports real and virtual partitions, so you really don’t need physical partitions here.

30 Virtual vs. Real Partitions: a simple two-step process. First, define your real partitions (one or more); these are file system mount points. Second, map virtual partitions on top of the real ones; virtual partitions can share real partitions. By convention, virtual partition names equal static token names, which yields implicit SRM space token support. For example, oss.cache atlasdatadisk /store1 xa and oss.cache atlasmcdisk /store1 xa define two virtual partitions sharing the same physical partition, and oss.cache atlasuserdisk /store2 xa defines a third on another (in each directive, the first argument is the virtual partition name and the path is the real partition mount).

31 Space Tokens vs. Virtual Partitions: partitions are selected by virtual partition name. With the configuration file entries oss.cache atlasdatadisk /store1 xa, oss.cache atlasmcdisk /store1 xa, and oss.cache atlasuserdisk /store2 xa, a new file is “cgi-tagged” with a space token name, as in root://host:1094//atlas/mcdatafile?cgroup=atlasmcdisk (the default is “public”). Because space token names equal virtual partition names, the file will be allocated in the desired real/virtual partition; see the usage sketch below.
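
As a usage sketch, creating a file tagged with the atlasmcdisk token from the command line might look like this; xrdcp is the xrootd copy client, the host name and local path are placeholders, and the ?cgroup= tag is the one shown on the slide:

    # Quote the URL so the shell does not interpret the '?'
    xrdcp /local/mcdatafile "root://host.example.edu:1094//atlas/mcdatafile?cgroup=atlasmcdisk"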

32 Virtual vs. Real Partitions: with non-overlapping virtual partitions (R = V), a real partition represents a hard quota, implying that the space token gets a fixed amount of space. With overlapping virtual partitions (R ≠ V), the hard quota applies to multiple virtual partitions, implying that a space token gets an undetermined amount of space; you then need usage tracking and external quota management.

33 Partition Usage Tracking: the oss tracks usage by partition, automatically for real partitions and configurably for virtual partitions via oss.usage {nolog | log dirpath}. Since virtual partitions correspond to SRM space tokens, usage is also automatically tracked by space token. POSIX getxattr() returns the usage information (see the Linux man page).

34 Partition Quota Management: quotas are applied by partition, automatically for real partitions; they must be enabled for virtual partitions via oss.usage quotafile filepath. Currently, quotas are not enforced by the oss; POSIX getxattr() returns the quota information, which FUSE/xrootdFS uses to enforce quotas. This is required to run a full-featured SRM.

35 The Quota File: lists the quota for each virtual partition, and hence also for each static space token. It has a simple multi-line format, vpname nnnn[k | m | g | t]\n, where the vpname’s are in 1-to-1 correspondence with space token names. The oss re-reads it whenever it changes. It is useful only for FUSE/xrootdFS, since quotas need to apply to the whole cluster. An example follows.
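
For example, a quota file matching the virtual partitions used earlier might look like the following; the sizes are arbitrary placeholders in the vpname nnnn[k | m | g | t] format given on the slide:

    atlasdatadisk 20t
    atlasmcdisk   10t
    atlasuserdisk  5t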

36 Considerations: files cannot easily be reassigned to a different space token; you must manually “move” the file across partitions. You can always get the original space token name via a file-specific getxattr() call. Quotas for virtual partitions are “soft”: time causality prevents a real hard limit, so use real partitions if a hard limit is needed.

37 SRM & Scalla: The Big Issue. Scalla implements a distributed name space, which is very scalable and efficient and sufficient for data analysis, but SRM needs a single view of the complete name space. This requires deploying additional components: the Composite Name Space Daemon (cnsd), which provides the complete name space, and FUSE/xrootdFS, which provides the single view via a file system interface and is compatible with all stand-alone SRM’s (e.g., BestMan and StoRM).

38 The Composite Name Space: a new xrootd instance is used to maintain the complete name space for the cluster; it holds only the full paths and file sizes, no more, and normally runs on one of the manager nodes. The cnsd needs to run on all the server nodes: it captures xrootd name space requests (e.g., rm) and re-issues them to the new xrootd instance. This is the cluster’s composite name space, composite because each server node adds to the name space; there is no pre-registration of names, it all happens on-the-fly.

39 Composite Name Space Implemented: the redirector runs xrootd@myhost:1094 and the name space instance runs xrootd@myhost:2094. Each data server runs a cnsd and is configured with ofs.forward 3way myhost:2094 mkdir mv rm rmdir trunc, ofs.notify closew create |/opt/xrootd/bin/cnsd, and xrootd.redirect myhost:2094 dirlist, so that create/trunc, mkdir, mv, rm, and rmdir are propagated and a client’s opendir() refers to the directory structure maintained at myhost:2094. No cnsd is needed on the redirector because it already has access. A consolidated sketch follows.
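
Pulling the slide’s directives together, the data-server side of the configuration might contain a fragment like this; myhost stands for the name space node and /opt/xrootd/bin/cnsd is the path used on the slide:

    # Forward name space mutations to the name space xrootd at myhost:2094
    ofs.forward 3way myhost:2094 mkdir mv rm rmdir trunc
    # Notify the local cnsd of file creations and closes-after-write
    ofs.notify closew create |/opt/xrootd/bin/cnsd
    # Send directory listings to the name space instance
    xrootd.redirect myhost:2094 dirlist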

40 Some Caveats: the name space is reasonably accurate and usually sufficient for SRM operations; the cnsd’s log events to circumvent transient failures, and the log is replayed when the name space xrootd recovers, but the log is not infinite, so inconsistencies will invariably arise. The composite name space can be audited, which means comparing and resolving multiple name spaces; this is time consuming in terms of elapsed time but can happen while the system is running. Tools to do this are still under development; consider contributing such software.

41 The Single View: now that there is a composite cluster name space, we need an SRM-compatible view of it. The easiest way is to use a file system view, which BestMan and StoRM actually expect. The additional component is FUSE.

42 What is FUSE? FUSE (Filesystem in Userspace) implements a file system as a user space program (Linux 2.4 and 2.6 only; refer to http://fuse.sourceforge.net/). FUSE can be used to provide xrootd access that looks like a mounted file system; we call it xrootdFS. Two versions currently exist: one by Wei Yang at SLAC (packaged with VDT) and one by Andreas Peters at CERN (packaged with Castor).

43 xrootdFS (Linux/FUSE/xrootd): on the client host, the SRM uses the POSIX file system interface, which goes through the kernel to FUSE and the FUSE/xroot interface (the xrootd POSIX client) in user space. Name space operations (opendir, create, mkdir, mv, rm, rmdir) go to the name space xrootd at port 2094 on the redirector host, while data access goes to the redirector xrootd at port 1094. You should still run the cnsd on the servers to capture non-FUSE events.

44 SLAC xrootdFS Performance. Server: Sun V20z, RHEL4, 2 x 2.2 GHz AMD Opteron, 4 GB RAM, 1 Gbit/sec Ethernet. Client: VA Linux 1220, RHEL3, 2 x 866 MHz Pentium 3, 1 GB RAM, 100 Mbit/sec Ethernet. Unix dd, globus-url-copy, and uberftp achieve 5-7 MB/sec with a 128 KB I/O block size; Unix cp achieves 0.9 MB/sec with a 4 KB I/O block size. Conclusion: do not use it for data transfers!

45 More Caveats: FUSE must be administratively installed, which requires root access; this is difficult on many machines (e.g., batch workers) but easier if it only involves an SE node (i.e., the SRM gateway). Performance is limited: kernel-FUSE interactions are not cheap, although the CERN-modified FUSE shows very good transfer performance. Rapid file creation (e.g., tar) is limited. We recommend that it be kept away from general users.

46 Putting It All Together: a basic xrootd cluster (a manager node plus data server nodes, each running xrootd and cmsd), plus the name space xrootd, plus a cnsd on each data server, plus an SRM node running BestMan, xrootdFS, and gridFTP, equals LHC Grid access.

47 Acknowledgements. Software contributors: CERN: Derek Feichtinger, Fabrizio Furano, Andreas Peters; Fermi: Tony Johnson (Java); Root: Gerri Ganis, Bertrand Bellenot; SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger. Operational collaborators: BNL, INFN, IN2P3. Partial funding: US Department of Energy contract DE-AC02-76SF00515 with Stanford University.

