
1 Caging the CCLRC Compute Zoo (Activities at CCLRC)
John Kewley, e-Science Centre, CCLRC Daresbury Laboratory
15th March 2005, Paradyn / Condor Week, Madison, WI
j.kewley@dl.ac.uk
http://www.e-science.clrc.ac.uk/web/staff/john_kewley

2 Outline
What is a Compute Zoo?
Caging Problems
A Trip to the Zoo
Uses for a Compute Zoo

3 What is a Compute Zoo?

4 Compute Farm
Homogeneous: large numbers of (near) identical resources
Often co-located physically: a training room, lab workstations or a large cluster
Centrally managed, often by dedicated staff
Typical of many Condor Pools: excellent for High Throughput Computing

5 Compute Farm

6 Compute Zoo
Heterogeneous: resources run many different operating systems and architectures
Located across a site
Individually or variously managed
Of minimal use for HTC

7 Compute Zoo

8 Caging Problems (Firewall Mirroring)

9 Firewalls within a Condor Pool
Some resource owners have firewalls on their personal workstations.
Since Condor needs each submit node to be able to talk to every potential execute node, every firewall in the pool must be opened to each new submit node when it is added.
Between the new node being added and the firewalls being updated, the firewalled nodes will be unavailable to jobs from the new node.
Or are they? Maybe someone should tell Condor!

10 Adding a new machine to the pool
If we add a new machine to the pool, the existing firewalls may not have anticipated this.
The firewalls will likely block this new machine.
A job submitted from the newly added machine may still match a firewalled resource.
This job will not be able to run.
Parts of the system can jam as a result:
  o condor_q on the submitting node
  o Subsequent parts of the submit script
  o (maybe also parts of the central node)

11 Private networks
Similar "jams" occur if part of your pool (or flock of pools) is on a network that is unavailable to some of the other nodes.
How can we permit jobs from submit nodes that can access the private network to run on these nodes, whilst preventing Condor from sending jobs there from other submit nodes?

12 How can we get round this?
1. Restrict the number of submit nodes
2. Automatically update the firewall files
3. Ensure everything is up-to-date
4. Permit the pool to evolve whilst persuading Condor to “avoid” going to nodes where the job can’t run

13 Firewall Mirroring (1)
1. Each machine with a firewall declares the fact in its ClassAd:
     HAS_FIREWALL = TRUE
2. It also declares which machines and/or subnets it permits to access its Condor ports (mirroring the firewall table settings):
     FW_ALLOWS_113 = TRUE
     FW_ALLOWS_rjavig6 = TRUE
3. Finally, it needs to export these settings:
     STARTD_EXPRS = HAS_FIREWALL, FW_ALLOWS_113, \
                    FW_ALLOWS_rjavig6
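The three steps on this slide could be generated from a list of permitted names rather than typed by hand. The following is a minimal sketch only: the function name `firewall_mirror_config` and the idea of feeding it a hand-maintained list are assumptions for illustration, not part of the talk. It emits exactly the attribute names shown on the slide.

```python
def firewall_mirror_config(allowed):
    """Return Condor config lines that mirror a firewall's allow list.

    `allowed` is a list of short host names or subnet labels (for example
    "113" or "rjavig6"), assumed here to be copied from the firewall table.
    """
    # One attribute says "I have a firewall", plus one per permitted name.
    attrs = ["HAS_FIREWALL"] + ["FW_ALLOWS_%s" % name for name in allowed]
    lines = ["%s = TRUE" % attr for attr in attrs]
    # Export every attribute so that it appears in the machine ClassAd.
    lines.append("STARTD_EXPRS = " + ", ".join(attrs))
    return lines


config = firewall_mirror_config(["113", "rjavig6"])
print("\n".join(config))
```

If STARTD_EXPRS is already set elsewhere in a machine's configuration, the generated names would need to be merged into that setting rather than replacing it.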

14 Firewall Mirroring (2)
To ensure that jobs can only go to resources they can reach:
1. Ensure that submit machines declare their subnet and hostname:
     MY_SUBNET = 113
     MY_HOST = condor
2. Use these values in the following expression, which is added to the REQUIREMENTS of all jobs from this machine:
     APPEND_REQUIREMENTS = ( \
       (HAS_FIREWALL =!= TRUE) || \
       (FW_ALLOWS_$(MY_HOST) == TRUE) || \
       (FW_ALLOWS_$(MY_SUBNET) == TRUE) )
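To see why the expression uses `=!=` for HAS_FIREWALL but `==` for the FW_ALLOWS attributes, it can help to model how an undefined attribute behaves. The sketch below is an assumption-laden model, not Condor code: it represents an undefined ClassAd attribute as Python's `None`, and the helper names (`is_not_identical`, `equals`, `logical_or`, `can_reach`) are invented for illustration. The macros `$(MY_HOST)` and `$(MY_SUBNET)` are taken to have been expanded already, so they appear here as ordinary string arguments.

```python
UNDEFINED = None  # stands in for an attribute missing from the machine ad


def is_not_identical(value, other):
    """Model of `=!=`: always TRUE or FALSE, even if an operand is undefined."""
    return value is not other


def equals(value, other):
    """Model of `==`: UNDEFINED if either operand is undefined."""
    if value is UNDEFINED or other is UNDEFINED:
        return UNDEFINED
    return value == other


def logical_or(*terms):
    """Model of `||`: any TRUE term wins; otherwise UNDEFINED is contagious."""
    if any(term is True for term in terms):
        return True
    if any(term is UNDEFINED for term in terms):
        return UNDEFINED
    return False


def can_reach(machine_ad, my_host, my_subnet):
    """Evaluate the slide's APPEND_REQUIREMENTS against one machine ad."""
    result = logical_or(
        is_not_identical(machine_ad.get("HAS_FIREWALL"), True),
        equals(machine_ad.get("FW_ALLOWS_" + my_host), True),
        equals(machine_ad.get("FW_ALLOWS_" + my_subnet), True),
    )
    # A requirement that is not TRUE (FALSE or UNDEFINED) does not match.
    return result is True


open_node = {}  # no firewall attributes advertised at all
firewalled = {"HAS_FIREWALL": True, "FW_ALLOWS_113": True,
              "FW_ALLOWS_rjavig6": True}

print(can_reach(open_node, "condor", "113"))    # machine without a firewall
print(can_reach(firewalled, "condor", "113"))   # admitted via subnet 113
print(can_reach(firewalled, "newnode", "114"))  # not yet in the firewall table
```

In this model a machine that never mentions HAS_FIREWALL still matches, because `=!=` yields TRUE for an undefined attribute, while a firewalled machine matches only submit nodes it explicitly lists.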

15 And Private Networks?
The same solution can be used for private networks, by pretending they have a firewall and declaring which other nodes have access to that network.
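As a rough illustration of the private-network case, the sketch below treats a node on a private network as if it were firewalled and lists the one submit node that can reach it. The machine names (`public01`, `private01`, `gateway`) are hypothetical, and the test is a simplified plain-boolean version of the requirements expression, not the ClassAd evaluation itself.

```python
def may_run_on(machine_ad, my_host, my_subnet):
    """Simplified form of the APPEND_REQUIREMENTS test using plain booleans."""
    if machine_ad.get("HAS_FIREWALL") is not True:
        return True  # no (pretend) firewall: any submit node may use it
    return (machine_ad.get("FW_ALLOWS_" + my_host) is True
            or machine_ad.get("FW_ALLOWS_" + my_subnet) is True)


def visible_from(pool, my_host, my_subnet):
    """Names of the machines that jobs from this submit node may be sent to."""
    return sorted(name for name, ad in pool.items()
                  if may_run_on(ad, my_host, my_subnet))


# Hypothetical pool: private01 sits on a private network that only the
# submit node "gateway" can reach, so it pretends to have a firewall.
pool = {
    "public01": {},
    "private01": {"HAS_FIREWALL": True, "FW_ALLOWS_gateway": True},
}

from_gateway = visible_from(pool, "gateway", "113")
from_outside = visible_from(pool, "condor", "113")
print(from_gateway)
print(from_outside)
```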

16 A Trip to the Zoo (Viewing the Pool)

17 The CCLRC Compute Zoo
2x Windows XP Professional
2x Windows 2000 Professional
1x Windows NT 4.0 Workstation
7x SuSE Linux 9.0
2x SuSE Linux 8.0
1x SuSE Linux 9.1
5x White Box Enterprise Linux 3.0
1x Red Hat Enterprise Linux AS release 3.0
1x Red Hat Enterprise Linux WS release 3.0
3x Red Hat Linux 9
2x Red Hat Linux 8.0
2x Red Hat Linux 7.3
1x Mandrake Linux 10.1
1x Gentoo Linux 1.4

18 Viewing the Pool
http://tardis.dl.ac.uk/Condor/cgi-bin/CondorStatus.cgi
http://tardis.dl.ac.uk/Condor/cgi-bin/WiscStatus.cgi

19 Uses of a Zoo

20 “Build and Test”
The CCLRC pool was part of the UK Grid Engineering Task Force “Build and Test” project.
Software bundles were distributed to a variety of OS types around the flocked pool for building and testing.
This type of (flocked) pool relies on heterogeneity; only small numbers of each type are required.
http://polaris.ecs.soton.ac.uk:65000/
http://wiki.nesc.ac.uk/read/sfct?HomePage

21 Other non-HTC Uses
I want to ensure my code compiles without warnings and/or runs its basic tests:
  o On as many OSs as possible
  o With as many different compilers as possible
I want to perform a release build of my product for platform X, but I only have accounts on A, B and C.
I have several server-licensed products and many potential occasional users. How can these be made available to them more easily (within the bounds of the licence, of course)?
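One way the "as many OSs as possible" use might look in practice is one Condor job per target platform, each pinned to that platform through its requirements. The sketch below only generates submit description text. The platform list, the executable name `build_and_test` and the output file names are hypothetical, and the OpSys/Arch values are examples that should be checked against what the machines in a given pool actually advertise.

```python
# Hypothetical targets as (OpSys, Arch) pairs; check these against the
# values the machines in your own pool advertise.
PLATFORMS = [
    ("LINUX", "INTEL"),
    ("WINNT51", "INTEL"),
]


def build_job(opsys, arch, executable="build_and_test"):
    """Return submit description text for one build-and-test job."""
    lines = [
        "universe = vanilla",
        "executable = %s" % executable,
        # Pin the job to exactly one platform in the zoo.
        'requirements = (OpSys == "%s") && (Arch == "%s")' % (opsys, arch),
        "output = build.%s.%s.out" % (opsys, arch),
        "error = build.%s.%s.err" % (opsys, arch),
        "log = build.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


jobs = {platform: build_job(*platform) for platform in PLATFORMS}
for text in jobs.values():
    print(text)
```

Each generated description would be passed to condor_submit separately. A real build would probably also need a different executable or wrapper script per platform, which this sketch glosses over.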

22 What other uses are there for a Compute Zoo?

