Condor Project Computer Sciences Department University of Wisconsin-Madison Grids and Condor Barcelona,


1 Condor Project Computer Sciences Department University of Wisconsin-Madison condor-admin@cs.wisc.edu http://www.cs.wisc.edu/condor Grids and Condor Barcelona, 2006

2 Agenda
- Extended user’s tutorial
- Advanced Uses of Condor
  - Java programs
  - DAGMan
  - Stork
  - MW
  - Grid Computing
- Case studies, and a discussion of your application’s needs

3 Resources
- There are many resources (machines) in the world, and many are or can be made available!
- Groups of machines may be labeled as grids
- Welcome to the power of the grid!

4 Condor and Grids
- Condor has always been a tool to harness grid computing
- Condor’s mechanisms have evolved as technologies have evolved. Roughly categorized:
  - Flocking
  - Glidein
  - The grid universe

5 Flocking
A way for jobs to run within a different, separate Condor pool
Condor runs here, and Condor runs there
[Diagram labels: here, there]

6 Connect Condor Pools with Flocking
- Flocking is a Condor-specific technology
- Flocking is enabled with configuration
- Jobs flock from here to there when they cannot be run here due to lack of available machines
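
The flocking rule can be pictured with a toy model in Python. This is only a sketch under stated assumptions: the function name choose_pool and its arguments are hypothetical, remote pools are assumed to be tried in the order listed, and the real decision is made by Condor's daemons through negotiation, not by a function like this.

```python
def choose_pool(local_free, flock_to, remote_free):
    """Return the pool where a job would run, or None if it must wait.

    local_free:  number of free machines in the local pool ("here")
    flock_to:    ordered list of remote pool names (hypothetical)
    remote_free: dict mapping remote pool name -> free machine count
    """
    # Jobs stay in the local pool whenever a machine is available.
    if local_free > 0:
        return "here"
    # Otherwise try each remote pool, in the order listed (an assumption).
    for pool in flock_to:
        if remote_free.get(pool, 0) > 0:
            return pool
    # No machine anywhere: the job stays idle in the queue.
    return None


print(choose_pool(0, ["there"], {"there": 3}))
```

The point of the model is that flocking is a fallback: a job only leaves the local pool when the local pool has nothing to offer.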

7 Configuration
- Configuration files contain lots of the administrative information used by Condor
- Format is like that in submit description files:
    AttributeName = Value

8 Configuration here
- For jobs to be able to flock from here to there
- In the configuration file on the pool where jobs flock from:
    FLOCK_TO =
    FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
    FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
    HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
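
The $(NAME) references in these settings reuse the value of another setting. A minimal Python sketch of that substitution follows, assuming one AttributeName = Value setting per line. The names parse_config and expand are hypothetical, the host names are made up for illustration, and this is not Condor's own parser, which handles more syntax than shown here.

```python
import re


def parse_config(text):
    """Parse 'AttributeName = Value' lines into a dict (toy parser)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not an 'AttributeName = Value' line
        settings[name.strip()] = value.strip()
    return settings


def expand(value, settings, depth=10):
    """Replace $(NAME) references with values from settings.

    Repeats up to `depth` times so that references to settings that
    themselves contain references are resolved too.
    """
    for _ in range(depth):
        new = re.sub(r"\$\((\w+)\)",
                     lambda m: settings.get(m.group(1), ""), value)
        if new == value:
            break
        value = new
    return value


# Hypothetical values, for illustration only.
example = """
COLLECTOR_HOST = cm.pool-a.example.org
FLOCK_TO = cm.pool-b.example.org
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
"""
settings = parse_config(example)
print(expand(settings["HOSTALLOW_NEGOTIATOR_SCHEDD"], settings))
```

Defining the remote central managers once in FLOCK_TO and referring to that value elsewhere keeps the dependent settings consistent when the list changes.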

9 Configuration there
- In the configuration file on the pool where jobs flock to:
    FLOCK_FROM =,...,
- To make security work:
    HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
    HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
    HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
    HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)

10 Submit Description File
Enable file transfer:
    universe = vanilla
    executable = myjob.exe
    input = myjob.input
    output = myjob.output
    log = myjob.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue

11 The Glidein Concept
- Assume: We need more machines, and we have permission to use a set of machines
- Glidein temporarily adds a set of machines to the local pool

12 Glidein
- In addition, Glidein solves the problem: “My job needs to run on that particular resource, and my job needs Condor.”
- For example: a job that must run under the standard universe

13 Glidein
- Condor sends and runs its own executables on the resource
- The needed resource appears to temporarily join the local Condor pool!

14 Glidein
Run condor_glidein to add the remote resource to the local pool
The master and startd daemons become grid universe jobs using gt2
[Diagram labels: local pool, remote resource]

15 Making Glidein Work
- Change the configuration to give access permission (HOSTALLOW_WRITE) to the remote resource
- No changes to jobs’ submit description files!
- But, do enable file transfer in the submit description file:
    universe = vanilla
    executable = myjob.exe
    input = myjob.input
    output = myjob.output
    log = myjob.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue

16 Force Job to Glidein Resource
In the submit description file:
    universe = standard
    executable = ajob.exe
    input = ajob.input
    output = ajob.output
    log = ajob.log
    requirements = \
        ( machine == "example.mcs.anl.gov" ) \
        && Arch != "" && OpSys != ""
    queue
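
One way to read the requirements expression is as a predicate over a machine's attributes. The Python sketch below is a simplified illustration only: the function name matches and the dictionary representation of a machine are hypothetical, missing attributes are treated as empty strings, and the host name comparison ignores case. Condor evaluates the real expression itself during matchmaking.

```python
def matches(machine_ad):
    """Toy reading of the requirements expression as a predicate.

    machine_ad is a plain dict standing in for a machine's attributes
    (a hypothetical simplification of what Condor actually uses).
    """
    name = machine_ad.get("Machine", "")
    return (name.lower() == "example.mcs.anl.gov"   # the one named machine
            and machine_ad.get("Arch", "") != ""    # any architecture
            and machine_ad.get("OpSys", "") != "")  # any operating system


glidein = {"Machine": "example.mcs.anl.gov", "Arch": "INTEL", "OpSys": "LINUX"}
other = {"Machine": "local1.example.org", "Arch": "INTEL", "OpSys": "LINUX"}
print(matches(glidein), matches(other))
```

Only the named machine satisfies the predicate, so the job can run only on the glidein resource.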

17 The Grid Universe
Most useful when
1. We want to send a job off to a far away machine
2. We want to hand a job to another batch processing system on the local machine
3. We want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine

18 The Grid Universe
- All handled in the submit description file
- Supports several back end types:
  - Globus: GT2, GT3, GT4
  - NorduGrid
  - UNICORE
  - Condor
  - PBS
  - LSF

19 Condor-G
- Condor-G describes jobs to be handed off to a machine that is running Globus middleware
  - gt2: Globus Toolkit 1 or 2 or the pre-web services GRAM
  - gt3: Globus Toolkit 3
  - gt4: Globus Toolkit 4 or WS GRAM

20 Submit Description File
For gt2:
    universe = grid
    input = job1.input
    output = job1.result
    log = job1.log
    grid_resource = gt2 example.wisc.edu/jobmanager
    queue
jobmanager is one of: jobmanager, jobmanager-condor, jobmanager-pbs, jobmanager-lsf, jobmanager-sge
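
As a small illustration of how the gt2 pieces fit together, the Python sketch below assembles the text of such a submit description file. The helper name gt2_submit and its parameters are hypothetical; only the attribute names and the list of job manager names come from the slide, and the output is plain text that would still need to be saved and submitted in the usual way.

```python
# Job manager names listed on the slide.
GT2_JOBMANAGERS = (
    "jobmanager",
    "jobmanager-condor",
    "jobmanager-pbs",
    "jobmanager-lsf",
    "jobmanager-sge",
)


def gt2_submit(host, jobmanager, name):
    """Return the text of a gt2 submit description file (toy helper)."""
    if jobmanager not in GT2_JOBMANAGERS:
        raise ValueError("unknown job manager: " + jobmanager)
    lines = [
        "universe = grid",
        "input = " + name + ".input",
        "output = " + name + ".result",
        "log = " + name + ".log",
        "grid_resource = gt2 " + host + "/" + jobmanager,
        "queue",
    ]
    return "\n".join(lines) + "\n"


print(gt2_submit("example.wisc.edu", "jobmanager-pbs", "job1"))
```

Checking the job manager name against the known list catches a misspelled name before the job is handed to the remote site.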

21 Submit Description File
For gt3:
    universe = grid
    input = job2.input
    output = job2.result
    log = job2.log
    grid_resource = gt3 http://198.51.254.40:8080/ogsa/services/base/gram/XXXManagedJobFactoryService
    queue
198.51.254.40:8080 is IP address:Port number
XXX is one of: Fork, Condor, PBS, LSF, SGE

22 Submit Description File
For gt4:
    universe = grid
    input = job3.input
    output = job3.result
    log = job3.log
    grid_resource = gt4 https://198.51.254.40:8080/wsrf/services/ManagedJobFactoryService XXX
    queue
198.51.254.40:8080 is IP address:Port number OR Host name:Port number
XXX is one of: Fork, Condor, PBS, LSF, SGE

23 Nordugrid and the Submit Description File
    universe = grid
    input = job4.input
    output = job4.result
    log = job4.log
    grid_resource = nordugrid ngexample.com
    queue

24 Unicore and the Submit Description File
    universe = grid
    input = job5.input
    output = job5.result
    log = job5.log
    grid_resource = unicore usite.example.com vsite
    keystore_file = /frieda/certificates/keystore
    keystore_alias = "frieda"
    keystore_passphrase_file = /frieda/private/passphrase
    queue
vsite is the name of the Unicore virtual resource

25 PBS and the Submit Description File
- Details of the PBS installation in $(GLITE_LOCATION)/etc/batch_gahp.config
    universe = grid
    input = job6.input
    output = job6.result
    log = job6.log
    grid_resource = pbs
    queue

26 LSF and the Submit Description File
- Details of the LSF installation in $(GLITE_LOCATION)/etc/batch_gahp.config
    universe = grid
    input = job7.input
    output = job7.result
    log = job7.log
    grid_resource = lsf
    queue

27 Condor-C
- Condor is running here, and Condor is running over there
- For the case where we want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine

28 Condor-C and the Submit Description File
    universe = grid
    input = job8.input
    output = job8.result
    log = job8.log
    grid_resource = condor joe@remotemachine.example.com remotecentralmanager.example.com
    +remote_jobuniverse = 5
    +remote_requirements = True
    +remote_ShouldTransferFiles = "YES"
    +remote_WhenToTransferOutput = "ON_EXIT"
    queue
joe@remotemachine.example.com is the schedd name
remotecentralmanager.example.com is the collector machine name
+remote_jobuniverse = 5 selects the vanilla universe

29 Credentials
- Not just anybody can use any resource at any time...
- Key concepts:
  - Authentication: verification of an identity
  - Authorization: permission to do something

30 Authentication
If Frieda says “I am Frieda.”, how do we distinguish this from Frieda saying “I am George Bush.”?

31 Authentication
- Bush can do whatever he pleases
- If Frieda claims to be Bush (and this is accepted), then Frieda can do whatever she pleases
- Authentication attempts to verify the identity of the entity that is communicating

32 Authorization
- Who is allowed (permitted) to do what
  - Frieda may run gt4 jobs on the Open Science Grid machines
  - Fred may write to files in /usr/bin
  - The Unix user root may do anything!
- Can be implemented with a list of those authorized
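
The idea of a list of those authorized can be sketched in a few lines of Python. This is a hedged illustration, not how Condor or any grid middleware stores permissions: the names AUTHORIZED and is_authorized and the action strings are hypothetical, and the entries simply mirror the slide's examples.

```python
# Hypothetical authorization list: (user, action) pairs that are allowed.
AUTHORIZED = {
    ("frieda", "run gt4 jobs on Open Science Grid"),
    ("fred", "write files in /usr/bin"),
}


def is_authorized(user, action):
    """Return True if an already authenticated user may perform the action."""
    if user == "root":
        return True  # the Unix user root may do anything
    return (user, action) in AUTHORIZED


print(is_authorized("frieda", "run gt4 jobs on Open Science Grid"))
print(is_authorized("frieda", "write files in /usr/bin"))
```

The lookup assumes the user name has already been verified; authorization answers only what that identity may do, not whether the identity is genuine.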

33 Condor and Authentication
Authentication within Condor comes in many forms. Here are three.
1. File system: Have the entity write a file. The OS attaches a name to the file owner. Condor checks that the entity’s claim is the same as the file owner.
2. GSI (Grid Security Infrastructure)
3. Kerberos
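
The file system method can be illustrated with a short Python sketch for Unix-like systems. It is a simplified model under stated assumptions, not Condor's implementation: the names fs_authenticate and entity_writes are hypothetical, the claim is expressed as a numeric user id instead of a name, and the entity and the checker run in the same process here, whereas in practice they are separate programs.

```python
import os
import tempfile


def fs_authenticate(claimed_uid, write_file):
    """Ask the entity to create a file, then compare its owner to the claim."""
    with tempfile.TemporaryDirectory() as challenge_dir:
        path = os.path.join(challenge_dir, "challenge")
        write_file(path)                  # the entity creates the file
        owner_uid = os.stat(path).st_uid  # owner as recorded by the OS
        return owner_uid == claimed_uid


def entity_writes(path):
    """Stand-in for the entity being authenticated: create an empty file."""
    with open(path, "w"):
        pass


# The current user claims its own identity, then a different one.
print(fs_authenticate(os.getuid(), entity_writes))
print(fs_authenticate(os.getuid() + 1, entity_writes))
```

The check relies on the operating system, not the entity, recording who owns the file, which is why it only works when both sides share a file system.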

34 Authentication Idea
A centralized certificate authority (CA) verifies an entity’s identity. When satisfied, the CA issues a signed certificate (also called a credential).
[Diagram labels: I am Frieda, CA]

35 Authentication
To authenticate, the entity presents the certificate
All is well, if we trust the CA and the remote machine
[Diagram labels: I am Frieda, CA]

36 GSI Authentication
- GSI uses X.509 certificates
- Grid universe jobs submitted to back end types that use Globus middleware (gt2, gt3, gt4), as well as to nordugrid and unicore, use X.509 certificates
- Condor can also use GSI

37 Revocation, Trust, and Proxies
- The CA may revoke a credential
- Frieda gives the signed credential to the remote machine. If the remote machine is malicious, it could impersonate Frieda. Therefore, a password protects the credential.
- A proxy is a credential that can be used without entering the password, but is valid only for a specific (short) time period.
- MyProxy software enables GSI proxy management
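
The short validity period of a proxy can be pictured as a time window check. The Python sketch below models only that window: the names make_proxy and is_valid are hypothetical, the 12-hour default lifetime is an assumption chosen for illustration, and no certificates or cryptography are involved, so this is not how GSI or MyProxy represent a proxy.

```python
from datetime import datetime, timedelta, timezone


def make_proxy(subject, issued_at, lifetime_hours=12):
    """Return a toy proxy record with a limited validity window.

    The 12-hour default is an assumption for illustration.
    """
    return {
        "subject": subject,
        "not_before": issued_at,
        "not_after": issued_at + timedelta(hours=lifetime_hours),
    }


def is_valid(proxy, now):
    """A proxy is usable only inside its validity window."""
    return proxy["not_before"] <= now <= proxy["not_after"]


issued = datetime(2006, 1, 1, 9, 0, tzinfo=timezone.utc)
proxy = make_proxy("Frieda", issued)
print(is_valid(proxy, issued + timedelta(hours=1)))
print(is_valid(proxy, issued + timedelta(hours=13)))
```

Because the window is short, a proxy that falls into the wrong hands is useful to an impersonator only until it expires.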

