Farms User Meeting April 27 2005--Steven Timm 1 Farms Users meeting 4/27/2005


1 Farms Users meeting 4/27/2005 http://www-oss.fnal.gov/scs/farms/farm_users

2 Agenda
Events on farm past two weeks
Scheduled downtimes
New Users
–M. Kostin, Accel. Division
–A. Lebedev, E907/MIPP
Existing User reports
Special Presentation: Upcoming Transition of General Purpose Farms to Condor and Grid

3 Issues in last 2 weeks
Thermal problems in LCC over the weekend; no nodes went down.
Down nodes on CDF farm: 1 out of 98 FBSNG, 1 out of 72 condor/CAF
Down nodes on D0 farm: 12 out of 444
Down nodes on GP farm: 0 out of 102
GP Farms networking was upgraded to gigabit on all nodes that are capable
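The down-node counts above are easier to compare as fractions of each farm. A minimal sketch of that arithmetic follows; the counts are taken from the slide, while the dictionary layout and function name are only illustrative.

```python
# Node counts as reported on the slide: (down, total) per farm.
farms = {
    "CDF FBSNG": (1, 98),
    "CDF condor/CAF": (1, 72),
    "D0": (12, 444),
    "GP": (0, 102),
}


def down_fraction(down, total):
    """Return the fraction of a farm's nodes that are down."""
    return down / total


for name, (down, total) in farms.items():
    percent = 100 * down_fraction(down, total)
    print(f"{name}: {down}/{total} nodes down ({percent:.1f}%)")
```

On these numbers the D0 farm has the largest fraction of nodes down, at roughly 2.7%.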

4 Downtimes
GP Farms: none scheduled
D0 farms: moving 3 racks of worker nodes to GCC, to be scheduled
CDF Farms: upgrade of condor/CAF nodes to SLF304, in progress

6 General Purpose Farms Allocations
Queue  Process type  Share  QPrio  Time (GHz-hr)  Quota 1CPU=100
Accel  Accel_Worker  305200
Auger  Auger_Worker  2.506400
Dark Energy  DES  2.504000
E898  E898_Worker  3.050010000
E898 Short  E898_Short  3.01000126400
E907  e907  2.501000
KTeV  Fast  (inf)9000n/a
KTeVLong  KTeV_Long  1.0010000
KTeV  KTeV_Medium  3.0100066400
Minos  3.050010000
MinosShort  Minos_Short  3.01000126400
Run2MC  1.502000
SDSS  Image  3.010001210000
SDSS  Spectro  3.010002400
Theory  1.505000

10 GRID on General Purpose Farms
Executive Summary:
–A 14-node test cluster is available for testing Condor and grid jobs now
–Plan tentatively to add new nodes to Condor/grid cluster this summer
–Hope to complete transition to Condor batch system by end of calendar year 2005
–Local and grid submissions will still be allowed on General Purpose Farms
–Existing GP Farms users will have same priority whether submitting via grid or locally
–We will make sure appropriate training, documentation and support are available to help users with the transition.
–Testing currently ongoing with first grid-enabled user SDSS/DES

11 Outline:
Why use the Grid?
Why use Condor?
Virtual Organizations
The Open Science Grid
GP Farms on the Open Science Grid
Fermigrid
Access to mass storage

12 Why the Grid?
General Purpose Farms have limited resources and equipment budget
All Fermilab CD resources have a mandate from the division to interoperate
Adding a grid interface to the farms enables us to interoperate with the larger clusters at Fermilab (specifically CMS, CDF) and make use of extra resources.
Negotiation to use resources of the Open Science Grid off-site is in progress as well.

13 Why Condor?
Free software (but you can buy support).
Supported by large team at U. of Wisconsin (and not by Fermilab programmers)
Widely deployed in multi-hundred node clusters at Fermilab (CDF, CMS).
New versions of Condor allow Kerberos 5 and x509 authentication
Comes with Condor-G which simplifies submission of grid jobs
Condor-C components allow for interoperation of independent Condor pools
Some of our grid-enabled users take advantage of the extended Condor features, so it is the fastest way to get our users on the grid.

14 Virtual Organizations
Each experiment is a Virtual Organization
Membership is managed by VOMS software (Virtual Organization Management Service) and VOMRS software (Virtual Organization Management Registration Service)
Virtual Organizations have already been created for all major user groups on the General Purpose Farms as part of Fermigrid project.
We need at least one responsible person from each user group that is using the farms to say who should be members of their virtual organization.
Groups we have identified:
–sdss, ktev, miniboone, hypercp, minos, numi, accelerator, ppd_astro, ppd_theory, patriot (run2mc), auger

15 Open Science Grid
Continuation of efforts that were begun in Grid3.
Integration testing has been ongoing since February
Provisioning and deployment is occurring as we speak.
General Purpose Farms and CMS will both be Fermilab presences on the Open Science Grid
10 Virtual Organizations so far, mostly US-based:
–USATLAS
–USCMS
–SDSS
–fMRI (functional Magnetic Resonance Imaging, based at Dartmouth)
–GADU (Applied Genomics, based at Argonne)
–GRASE (Engineering applications, based at SUNY Buffalo)
–LIGO
–CDF
–STAR
–iVDGL
http://www.opensciencegrid.org

16 Current Fermi GP farms OSG presence
Node fngp-osg as gatekeeper and condor master
–(Dell dual Xeon 3.6 GHz)
Software comes from the Virtual Data Toolkit
–http://www.cs.wisc.edu/vdt
14 worker nodes as condor pool (fnpc201-214)
Can successfully run batch jobs submitted locally via Condor and across the grid via Condor-G
Has passed all validation tests of the Open Science Grid
Using the extended privilege authorization from the VO Privilege Project
–Each group can define different roles for their users.
–We can map whole group to one userid, several userids, or a pool of userids.
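The three mapping options in the last bullet (one userid, several userids, or a pool of userids) can be illustrated with a small Python sketch. This is only a toy model of the idea, not how the VO Privilege Project software is implemented; every function name, userid, role and certificate DN below is hypothetical.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of the VO Privilege Project software or any Fermilab configuration.

def group_mapper(userid):
    """Map every member of a group to one shared local userid."""
    def mapper(dn, role=None):
        return userid
    return mapper


def role_mapper(role_to_userid, default):
    """Map members to one of several userids according to their role."""
    def mapper(dn, role=None):
        return role_to_userid.get(role, default)
    return mapper


def pool_mapper(accounts):
    """Lease each distinct certificate DN its own userid from a fixed pool."""
    free = list(accounts)
    leases = {}

    def mapper(dn, role=None):
        if dn not in leases:
            if not free:
                raise RuntimeError("userid pool exhausted")
            leases[dn] = free.pop(0)
        return leases[dn]
    return mapper


# Hypothetical group, roles, userids and certificate DNs.
alice = "/DC=org/DC=example/CN=Alice Example"
bob = "/DC=org/DC=example/CN=Bob Example"

whole_group = group_mapper("sdss")
by_role = role_mapper({"production": "sdssprod"}, default="sdss")
pooled = pool_mapper(["sdss001", "sdss002"])

print(whole_group(alice), whole_group(bob))
print(by_role(alice, role="production"), by_role(bob))
print(pooled(alice), pooled(bob), pooled(alice))
```

In this sketch the pool mapper remembers its leases, so a returning user keeps the same userid and their files keep a stable owner, whereas the single-userid mapper is simplest but gives no separation between group members.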

17 Current Architecture
All home directories and staging areas are served off of FNSFO and will be accessible as before
All OSG sites have $app and $data directories for applications and data transfer; here they are served off of fngp-osg by NFS
All VDT-related software (Globus, Condor, etc.) is served off of fngp-osg
Grid jobs come in directly to fngp-osg and are farmed out to the 14 condor nodes.

18 Goals for GP Farms Grid Deployment
GP Farms is very busy (> 90%)
Two big productions about to start
Need to preserve lion's share of CPU cycles for existing users
Jobs from groups that are not GP Farms users will have only opportunistic use of the farms.
–Run at lowest priority (10^-6 of regular priority)
–Limited in how many jobs they can start at once.
At the moment OSG jobs are confined to a condor pool of 14 slow nodes that weren't otherwise getting used at all.
GP Farms users will be able to access their allocated share of resources whether they come in via grid or not.
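To see why a 10^-6 priority leaves the lion's share with existing users, here is a rough Python sketch. It assumes the pool is divided in inverse proportion to each submitter's effective priority value (a larger value means a worse priority, as in Condor's user-priority scheme), and it models the job-start limit as a simple cap on running jobs. The submitter names, numbers and function names are hypothetical and are not the actual GP Farms settings.

```python
# Sketch under stated assumptions; not the actual GP Farms configuration.

def fair_shares(effective_priority):
    """Split a pool in inverse proportion to each submitter's effective
    priority value (a larger value means a worse priority)."""
    weights = {name: 1.0 / value for name, value in effective_priority.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def jobs_allowed_to_start(idle, running, max_running):
    """Number of idle jobs that may start under a cap on running jobs."""
    return max(0, min(idle, max_running - running))


# Hypothetical numbers: one regular user, and one opportunistic user whose
# effective priority value is 10**6 times larger (10^-6 of the priority).
shares = fair_shares({"gp_user": 1.0, "opportunistic": 1.0e6})
for name, share in shares.items():
    print(f"{name}: {share:.6%}")

# With a hypothetical cap of 5 running jobs and 3 already running,
# at most 2 of the 20 idle jobs may start.
print(jobs_allowed_to_start(idle=20, running=3, max_running=5))
```

Under these assumptions the opportunistic submitter gets about one millionth of a contended pool, so in practice its jobs run only on nodes that would otherwise sit idle.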

19 Current Farms Configuration
[Diagram labels: FNSFO FBSNG head node; NFS RAID; FBS Submit; GP Farms FBSNG worker nodes (102 currently); ENCP; ENSTORE]

20 Configuration with Grid
[Diagram labels: FNGP-OSG gatekeeper; FNPCSRV1 FBSNG head node; NFS RAID; FBS Submit; Condor submit; GP Farms FBSNG worker nodes (102 currently); Condor WN (14 currently); New Condor WN (40, coming this summer); ENSTORE; Fermigrid1 site gatekeeper; Job from OSG; Job from Fermilab]

21 Fermigrid Interface
Fermigrid is providing common site services for virtual organization management (VOMS) and user mapping (GUMS)
These services are expected to be online in the next month or two.
All non-Fermi jobs will eventually go through the site Fermigrid gatekeeper and be farmed out to the other clusters.

22 Access to mass storage
Study currently under way.
Encp access to Enstore will remain available from the head node.
Want to open dccp, gridftp, srmcp interfaces to dCache
Before this is done, more study is needed on
–Authentication mechanisms: can we access mass storage from the worker nodes?
–Resource load: public dCache would need to expand its disk pool if the demand increases significantly.

23 Support and Documentation
http://grid.fnal.gov/fermigrid
http://www-oss.fnal.gov/scs/public/farms/grid/
http://www.ivdgl.org/osg-int/
http://plone.opensciencegrid.org/
http://www.opensciencegrid.org/
http://www.cs.wisc.edu/vdt
http://www.cs.wisc.edu/condor

24 Things to watch and try
http://www-oss.fnal.gov/scs/public/farms/grid/ is being continuously updated as we learn more about what works.
Hope to add sample Condor jobs shortly
Those familiar with Condor can log into fngp-osg and try to submit local test jobs now.
–Source /export/osg/grid/setup.csh to get all the software set up
Grid job submission won't work until we get the virtual organizations populated (except for SDSS).
More presentations coming at these meetings in weeks to come
Hope to organize a workshop this summer.
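For those who want to try a local test job, the Python sketch below writes out a minimal Condor submit description. The keywords (universe, executable, arguments, output, error, log, queue) and the $(Cluster) and $(Process) macros are standard Condor submit-file syntax, but the executable, file names and job count are placeholders, and any site-specific settings needed on fngp-osg are not known here.

```python
# Minimal sketch: builds the text of a Condor submit description file.
# The executable, file names and job count used below are placeholders.

def submit_description(executable, arguments="", basename="test", count=1):
    """Return the text of a vanilla-universe Condor submit description."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        # $(Cluster) and $(Process) are expanded by Condor for each job.
        f"output = {basename}.$(Cluster).$(Process).out",
        f"error = {basename}.$(Cluster).$(Process).err",
        f"log = {basename}.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a description for three copies of a trivial job.
    print(submit_description("/bin/hostname", count=3), end="")
```

Saving the printed text to a file (for example test.sub, a hypothetical name) and running condor_submit test.sub after sourcing the setup script should queue the jobs; condor_q then shows their status.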

