Preparing for the Grid: Changes in Batch Systems at Fermilab
HEPiX Batch System Workshop, Karlsruhe, Germany, May 12, 2005
Ken Schumacher, Steven Timm

Introduction
All of the big experiments at Fermilab (CDF, D0, CMS) are moving to grid-based processing. This talk covers:
– Batch scheduling at Fermilab before the grid
– The change of the big Fermilab clusters to Condor, and why it happened
– Future requirements for batch scheduling at Fermilab

Before the Grid: FBSNG
Fermilab had four main clusters: the CDF Reconstruction Farm, the D0 Reconstruction Farm, the General Purpose Farm, and CMS. All used FBSNG (Farms Batch System Next Generation).
Most early activity on these farms was reconstruction of experimental data and generation of Monte Carlo, so they are all referred to generically as “Reconstruction Farms”.

FBSNG Scheduling in the Reconstruction Farms
Dedicated reconstruction farms (CDF, D0)
– Large cluster dedicated to one experiment
– A small team of experts submits all jobs
– Scheduling is trivial
Shared reconstruction farm (General Purpose)
– Small cluster shared by 10 experiments, each with one or more queues
– Each experiment has a maximum quota of CPUs it can use at once
– Each experiment has a maximum share of the farm it can use when the farm is oversubscribed
– Most queues have no time limits; priority is calculated taking into account the average time that jobs in the queue have been running (see the sketch below)
– Special queues exist for I/O jobs that run on the head node and move data to and from mass storage
– Guaranteed scheduling means that everything will eventually run: other queues may be manually held to let a job run, and some nodes may have to be idled temporarily to let a large parallel job start up
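
FBSNG's actual scheduling code is not shown in these slides; the following Python sketch only illustrates the kind of quota-and-share bookkeeping described above. The experiment names, quotas, and shares are made-up examples.

    # Illustrative sketch of quota-and-share scheduling of the kind used in the
    # shared General Purpose Farms.  This is NOT FBSNG code; the experiment
    # names, quotas, and shares below are made up.

    experiments = {
        # name: (cpu_quota, share_of_farm)
        "exp_a": (40, 0.40),
        "exp_b": (30, 0.35),
        "exp_c": (20, 0.25),
    }
    running = {"exp_a": 35, "exp_b": 10, "exp_c": 5}   # CPUs in use per experiment
    total_cpus = 100

    def can_start(exp):
        """An experiment may only start another job while under its CPU quota."""
        return running[exp] < experiments[exp][0]

    def pick_next():
        """When the farm is oversubscribed, favor the experiment that is
        furthest below its configured share of the farm."""
        candidates = [e for e in experiments if can_start(e)]
        if not candidates:
            return None
        return min(candidates,
                   key=lambda e: running[e] / total_cpus - experiments[e][1])

    print(pick_next())   # exp_b: furthest below its share in this example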

FBSNG Advantages and Disadvantages
Advantages
– Light resource consumption by the batch system daemons
– Simple design, based on resource counting rather than load measuring and balancing
– Cost: no per-node license fee
– Customized for Fermilab's strong authentication requirements (Kerberos)
– Quite reliable: the FBSNG software rarely, if ever, fails
Disadvantages
– Designed strictly for Fermilab Run II production
– Lacks grid-friendly features such as x509 authentication, although they could be added

The Grid Can Use Any Batch System, So Why Condor?
Free software (but support can be purchased).
Supported by a large team at the University of Wisconsin, not by Fermilab programmers.
Widely deployed on multi-hundred-node clusters.
New versions of Condor allow Kerberos 5 and x509 authentication.
Comes with Condor-G, which simplifies submission of grid jobs (see the sketch below).
Condor-C components allow independent Condor pools to interoperate.
Some of our grid-enabled users already take advantage of the extended Condor features, so Condor is the fastest way to get our users onto the grid.
The USCMS production cluster at Fermilab has switched to Condor, and the CDF reconstruction farm is switching. The General Purpose Farms, which are smaller, also plan to switch to Condor to be compatible with the two biggest compute resources on site.
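
As an illustration of how Condor-G simplifies grid submission, the sketch below writes a minimal "globus"-universe submit description and hands it to condor_submit. The gatekeeper contact string and file names are hypothetical, and the keywords shown are those of Condor 6.x-era Condor-G; the local Condor manual should be checked for the exact syntax.

    # Sketch: submit a job to a remote gatekeeper through Condor-G.
    # The gatekeeper contact string and file names are hypothetical.
    import subprocess, textwrap

    submit_description = textwrap.dedent("""\
        universe        = globus
        globusscheduler = gatekeeper.example.org/jobmanager-condor
        executable      = myjob.sh
        output          = myjob.out
        error           = myjob.err
        log             = myjob.log
        queue
        """)

    with open("myjob.sub", "w") as f:
        f.write(submit_description)

    # condor_submit hands the job to the local schedd, which then manages the
    # remote Globus submission on our behalf.
    subprocess.run(["condor_submit", "myjob.sub"], check=True)

The job can then be watched and removed with the usual condor_q and condor_rm commands, which is what makes this convenient for users already running local Condor jobs.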

Rise of Analysis Clusters
Experiments now use multi-hundred-node Linux clusters for analysis as well, replacing expensive central machines:
– The CDF Central Analysis Facility (CAF) originally used FBSNG and has now switched to Condor
– The D0 Central Analysis Backend (CAB) uses PBS/Torque
– The USCMS User Analysis Facility (UAF) used FBSNG as a primitive load balancer for interactive shells and will switch to a Cisco load balancer shortly
These clusters have a heterogeneous job mix, and many different users and groups have to be prioritized within each experiment.

CAF Software
In CDF terms, CAF refers to both the cluster and the software that makes it go.
CDF collaborators (UCSD and INFN) wrote a series of wrappers around FBSNG, referred to as “CAF”:
– The wrappers allow connecting to a running job to debug it, tailing files of a running job, and many other things
– They also added monitoring functions
– Users are tracked by Kerberos principal and prioritized with different batch queues, but all jobs run under just a few userIDs, making management easy (see the sketch below)
dCAF is the distributed CAF: the same setup replicated at dedicated CDF resources around the world. Info at
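
The CAF wrapper code itself is not part of these slides; the sketch below only illustrates the bookkeeping idea of the last bullet: run jobs under a handful of shared accounts, but log every submission against the submitter's Kerberos principal so the work stays traceable to a person. The pool account names, job ID, and log path are hypothetical.

    # Sketch of the CAF bookkeeping idea: jobs run under a few shared accounts,
    # but every submission is logged against the submitter's Kerberos principal.
    # Pool account names, job id, and log path are hypothetical, not real CAF code.
    import subprocess, time

    POOL_ACCOUNTS = ["cafuser1", "cafuser2", "cafuser3"]   # hypothetical shared uids

    def default_principal():
        """Read the default principal out of the Kerberos credential cache."""
        out = subprocess.run(["klist"], capture_output=True, text=True,
                             check=True).stdout
        for line in out.splitlines():
            if line.startswith("Default principal:"):
                return line.split(":", 1)[1].strip()
        raise RuntimeError("no Kerberos ticket found")

    def record_submission(jobid, principal, account):
        """Append an audit record mapping the batch job back to a real person."""
        with open("caf_submissions.log", "a") as log:
            log.write(f"{time.ctime()} job={jobid} principal={principal} uid={account}\n")

    # Example: pick a pool account round-robin by job id and log the mapping.
    jobid = 12345
    account = POOL_ACCOUNTS[jobid % len(POOL_ACCOUNTS)]
    record_submission(jobid, default_principal(), account)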

CondorCAF in Production
CDF changed the batch system in the analysis facility to Condor and also rewrote the monitoring software to work with Condor.
Condor's “computing on demand” capability allows users to list files, tail files, and debug on the batch nodes.
It took a lot of work from the Condor team to get Kerberos authentication and the large number of nodes (~700) going.
Half of the CDF reconstruction farm is now also running Condor; the rest will convert once validation is complete.
SAM is the data delivery and bookkeeping mechanism:
– Used to fetch data files, keep track of intermediate files, and store the results
– Replaces a user-written bookkeeping system that was high-maintenance
Next step: GlideCAF, to make the CAF work with Condor glide-ins across the grid on non-dedicated resources.

[Screenshot: CondorCAF monitoring display]

SAMGrid
D0 is using SAMGrid for all remote generation of Monte Carlo and for reprocessing at several sites worldwide. The D0 farms at FNAL are the biggest site.
Special job managers were written to handle production and Monte Carlo requests intelligently.
All job requests and data requests go through the head nodes to the outside network.
There are significant scalability issues, but it is in production.
The D0 reconstruction farms at Fermilab will continue to use FBSNG.

Open Science Grid
A continuation of the efforts begun in Grid3.
Integration testing has been ongoing since February; provisioning and deployment are occurring as we speak.
At Fermilab, the USCMS production cluster and the General Purpose Farms will be the initial presence on OSG.
Ten Virtual Organizations so far, mostly US-based:
– USATLAS (ATLAS collaboration)
– USCMS (CMS collaboration)
– SDSS (Sloan Digital Sky Survey)
– fMRI (functional Magnetic Resonance Imaging, based at Dartmouth)
– GADU (applied genomics, based at Argonne)
– GRASE (engineering applications, based at SUNY Buffalo)
– LIGO (Laser Interferometer Gravitational-Wave Observatory)
– CDF (Collider Detector at Fermilab)
– STAR (Solenoidal Tracker at RHIC, BNL)
– iVDGL (International Virtual Data Grid Laboratory)

Structure of the General Purpose Farms OSG Compute Element
One node runs the Globus gatekeeper and does all communication with the grid (see the test sketch below).
The software comes from the VDT (Virtual Data Toolkit).
In this configuration the gatekeeper is also the Condor master; the Condor software is part of the VDT. A separate Condor head node will be set up later, once the software configuration is stable.
All grid software is exported by NFS to the compute nodes, so no change to the compute node install is necessary.
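
A simple way to check that such a gatekeeper is handing work to its local Condor pool is to push a trivial job through the pre-WS GRAM job manager. The sketch below assumes a valid grid proxy already exists and uses a hypothetical gatekeeper hostname.

    # Sketch: verify that a Globus gatekeeper forwards jobs into its Condor pool
    # by running a trivial command through the pre-WS GRAM job manager.
    # The gatekeeper hostname is a hypothetical example; a valid grid proxy
    # (e.g. from grid-proxy-init) is assumed to exist already.
    import subprocess

    GATEKEEPER = "gatekeeper.example.org/jobmanager-condor"

    result = subprocess.run(["globus-job-run", GATEKEEPER, "/bin/hostname"],
                            capture_output=True, text=True, check=True)
    print("Job ran on worker node:", result.stdout.strip())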

Fermigrid
Fermigrid is an internal project at Fermilab to make the different Fermilab resources interoperate and be available to the Open Science Grid.
Fermilab will start with the General Purpose Farms and CMS being available to OSG and to each other.
All non-Fermilab organizations will send jobs through a common site gatekeeper.
The site gatekeeper will route jobs to the appropriate cluster, probably using Condor-C; the details are to be determined.
Fermigrid provides a VOMS server to manage all the Fermilab-based Virtual Organizations.
Fermigrid provides a GUMS server to map grid Distinguished Names to unix userIDs (see the sketch below).
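
GUMS itself is a Java web service and its configuration is not covered in this talk; the sketch below merely illustrates the kind of mapping it performs, using grid-mapfile-style entries that pair a certificate Distinguished Name with a local account. The DNs and account names are invented.

    # Sketch of the mapping GUMS performs: certificate DN -> local unix account.
    # The entries below are made-up examples in grid-mapfile style.
    GRID_MAPFILE = '''
    "/DC=org/DC=doegrids/OU=People/CN=Alice Example 12345" uscms01
    "/DC=org/DC=doegrids/OU=People/CN=Bob Example 67890" sdss01
    '''

    def parse_mapfile(text):
        mapping = {}
        for raw in text.strip().splitlines():
            line = raw.strip()
            if not line:
                continue
            # Each entry is:  "quoted DN"  local_account
            dn = line.rsplit('"', 1)[0].strip().strip('"')
            account = line.rsplit(None, 1)[1]
            mapping[dn] = account
        return mapping

    def map_dn(dn, mapping):
        """Return the local account a grid job should run as, or None to reject."""
        return mapping.get(dn)

    users = parse_mapfile(GRID_MAPFILE)
    print(map_dn("/DC=org/DC=doegrids/OU=People/CN=Alice Example 12345", users))  # uscms01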

[Diagram: Current Farms Configuration. FNSFO, the FBSNG head node, with NFS RAID and FBS submit; 102 General Purpose Farms FBSNG worker nodes; ENCP connection to ENSTORE mass storage.]

[Diagram: Configuration with Grid. FNPCSRV1, the FBSNG head node, with NFS RAID and FBS submit; 102 General Purpose Farms FBSNG worker nodes; FNGP-OSG gatekeeper with 14 Condor worker nodes currently and 40 new Condor worker nodes coming this summer; Fermigrid1 site gatekeeper with Condor submit; jobs arrive from OSG and from Fermilab; ENSTORE mass storage.]

Requirements
Scheduling
– The current FBSNG installation in the General Purpose Farms has complicated shares and quotas
– We have to find the best way to replicate this in Condor (see the configuration sketch below)
– The hardest case to handle: low-priority long jobs come into the farm while it is idle and fill it up. Do we preempt? Suspend?
Grid credentials and mass storage
– Need to verify that we can use the Storage Resource Manager and gridftp from the compute nodes, not just the head node
Grid credentials: authentication and authorization
– Condor has Kerberos 5 and x509 authentication
– We need a way to pass these credentials through the Globus GRAM bridge to the batch system
– Otherwise both local and grid jobs end up running unauthenticated and simply trusting the gatekeeper
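
How the FBSNG shares and quotas will map onto Condor is still an open question; the fragment below is only a sketch of condor_config settings that could express per-group quotas and a no-preemption policy (one possible answer to the preempt-or-suspend question). The group names and numbers are hypothetical, and the macro names should be checked against the Condor version actually deployed.

    # Sketch of condor_config settings that could express per-group quotas and a
    # "never preempt" policy.  Group names and numbers are hypothetical; macro
    # names should be checked against the deployed Condor version.
    import textwrap

    config_fragment = textwrap.dedent("""\
        # Accounting groups with fixed machine quotas (negotiator side).
        # Jobs would claim a group with +AccountingGroup = "group_exp_a.user"
        # in their submit files.
        GROUP_NAMES              = group_exp_a, group_exp_b, group_exp_c
        GROUP_QUOTA_group_exp_a  = 40
        GROUP_QUOTA_group_exp_b  = 30
        GROUP_QUOTA_group_exp_c  = 20

        # One possible answer to "preempt or suspend?": never kick a running
        # job and rely on the quotas above to contain long low-priority work.
        PREEMPT                  = False
        PREEMPTION_REQUIREMENTS  = False
        """)

    with open("condor_config.local.quotas", "w") as f:
        f.write(config_fragment)

The alternative, suspending low-priority jobs instead of letting them run to completion, would be expressed with the startd SUSPEND and CONTINUE expressions and needs more careful tuning before we would deploy it.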

Requirements 2
Accounting and auditing (see the accounting sketch below)
– Need features to track which groups and which users are using the resources
– VOs need to know who within the VO is using resources
– Site admins need to know who is crashing their batch system
Extended VO privilege
– It should be possible to set priorities in the batch system and the mass storage system by virtual organization and role
– In other words, a production manager should be able to jump ahead of Joe Graduate Student in the queue
Practical sysadmin concerns
– Some grid user-mapping scenarios envision hundreds of pool userIDs per VO
– All of these accounts need quotas, home directories, etc.
– It would be very nice to do as CondorCAF does and run with a few userIDs traceable back to a Kerberos principal or grid credential
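
Condor does not provide VO-level accounting out of the box; as a rough sketch of the per-user summaries that VOs and site admins would want, the script below totals wall-clock time per job owner from condor_history. The ClassAd attribute names are the standard ones, but their availability can vary with the Condor version.

    # Rough sketch of per-user accounting from Condor's job history: total
    # wall-clock seconds charged to each job owner.
    import subprocess
    from collections import defaultdict

    out = subprocess.run(
        ["condor_history",
         "-format", "%s ", "Owner",
         "-format", "%f\n", "RemoteWallClockTime"],
        capture_output=True, text=True, check=True).stdout

    usage = defaultdict(float)
    for line in out.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue                      # skip records missing either attribute
        owner, seconds = parts
        usage[owner] += float(seconds)

    for owner, seconds in sorted(usage.items(), key=lambda kv: -kv[1]):
        print(f"{owner:20s} {seconds / 3600.0:10.1f} wall-clock hours")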