Published by Clifford Cannon. Modified over 9 years ago.
1
Condor Project
Computer Sciences Department
University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor
Grids and Condor
Barcelona, 2006
2
Agenda
Extended user’s tutorial
Advanced Uses of Condor
  Java programs
  DAGMan
  Stork
  MW
  Grid Computing
Case studies, and a discussion of your application’s needs
3
Resources
There are many resources (machines) in the world, and many are or can be made available!
Groups of machines may be labeled as grids.
Welcome to the power of the grid!
4
Condor and Grids
Condor has always been a tool to harness grid computing.
Condor’s mechanisms have evolved as technologies have evolved. Roughly categorized:
  Flocking
  Glidein
  The grid universe
5
Flocking
A way for jobs to run within a different, separate Condor pool.
Condor runs here, and Condor runs there.
[Diagram: two Condor pools, labeled "here" and "there"]
6
Connect Condor Pools with Flocking
Flocking is a Condor-specific technology.
Flocking is enabled with configuration.
Jobs flock from here to there when they cannot be run here due to lack of available machines.
7
Configuration
Configuration files contain much of the administrative information used by Condor.
The format is like that in submit description files:
    AttributeName = Value
8
Configuration here
For jobs to be able to flock from here to there.
In the configuration file on the pool where jobs flock from:
    FLOCK_TO =
    FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
    FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
    HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
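The way the flocking settings refer to one another through $(NAME) references can be illustrated with a small sketch. This is a simplified illustration in Python, not Condor's own configuration parser; the host names cm.here.example.org and cm.there.example.org and the helper names parse_config and expand are hypothetical.

```python
import re

def parse_config(text):
    """Parse 'Name = Value' lines into a dict (later lines override earlier ones)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep:
            # Names are stored upper-cased so lookups are case-insensitive.
            config[name.strip().upper()] = value.strip()
    return config

_MACRO = re.compile(r"\$\(([A-Za-z0-9_]+)\)")

def expand(config, name, depth=0):
    """Expand $(NAME) references recursively; undefined names expand to ''."""
    if depth > 20:
        raise ValueError("macro recursion too deep")
    value = config.get(name.upper(), "")
    return _MACRO.sub(lambda m: expand(config, m.group(1), depth + 1), value)

# Hypothetical values: the host names below are placeholders.
EXAMPLE = """
COLLECTOR_HOST = cm.here.example.org
FLOCK_TO = cm.there.example.org
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
"""

config = parse_config(EXAMPLE)
allowed = expand(config, "HOSTALLOW_NEGOTIATOR_SCHEDD")
print(allowed)
```

With these placeholder values, the expanded HOSTALLOW_NEGOTIATOR_SCHEDD lists both the local central manager and the remote one, which is the point of the setting: negotiators from either pool may contact the local schedd.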
9
Configuration there
In the configuration file on the pool where jobs flock to:
    FLOCK_FROM =,...,
To make security work:
    HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
    HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
    HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
    HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)
10
Submit Description File
Enable file transfer:
    universe = vanilla
    executable = myjob.exe
    input = myjob.input
    output = myjob.output
    log = myjob.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue
11
The Glidein Concept
Assume: We need more machines, and we have permission to use a set of machines.
Glidein temporarily adds a set of machines to the local pool.
12
Glidein
In addition, Glidein solves the problem: “My job needs to run on that particular resource, and my job needs Condor.”
For example: a job that must run under the standard universe.
13
Glidein
Condor sends and runs its own executables on the resource.
The needed resource appears to temporarily join the local Condor pool!
14
Glidein
Run condor_glidein to add the remote resource to the local pool.
The master and startd daemons become grid universe jobs using gt2.
[Diagram: the local pool and the remote resource]
15
Making Glidein Work
Change the configuration to give access permission (HOSTALLOW_WRITE) to the remote resource.
No changes to jobs’ submit description files!
But, do enable file transfer in the submit description file:
    universe = vanilla
    executable = myjob.exe
    input = myjob.input
    output = myjob.output
    log = myjob.log
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    queue
16
Force Job to Glidein Resource
In the submit description file:
    universe = standard
    executable = ajob.exe
    input = ajob.input
    output = ajob.output
    log = ajob.log
    requirements = \
        ( machine == "example.mcs.anl.gov" ) \
        && Arch != "" && OpSys != ""
    queue
17
The Grid Universe
Most useful when
1. We want to send a job off to a far away machine
2. We want to hand a job to another batch processing system on the local machine
3. We want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine
18
The Grid Universe
All handled in the submit description file.
Supports several back end types:
  Globus: GT2, GT3, GT4
  NorduGrid
  UNICORE
  Condor
  PBS
  LSF
19
Condor-G
Condor-G describes jobs to be handed off to a machine that uses Globus middleware.
  gt2: Globus Toolkit 1 or 2, or the pre-web services GRAM
  gt3: Globus Toolkit 3
  gt4: Globus Toolkit 4, or WS GRAM
20
Submit Description File
For gt2:
    universe = grid
    input = job1.input
    output = job1.result
    log = job1.log
    grid_resource = gt2 example.wisc.edu/jobmanager
    queue
The part after the host name is one of: jobmanager, jobmanager-condor, jobmanager-pbs, jobmanager-lsf, jobmanager-sge.
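The gt2 pattern can be sketched as a small generator that fills in the host and jobmanager and rejects a jobmanager name not listed on the slide. This is a hypothetical helper in Python, not part of Condor; the function name gt2_submit_description and the convention of deriving file names from a job name are assumptions made for illustration. The fragment mirrors the slide, which does not show an executable line.

```python
# Jobmanager names taken from the slide.
GT2_JOBMANAGERS = (
    "jobmanager",
    "jobmanager-condor",
    "jobmanager-pbs",
    "jobmanager-lsf",
    "jobmanager-sge",
)

def gt2_submit_description(host, jobmanager, job_name):
    """Return submit description text for a gt2 grid universe job (hypothetical helper)."""
    if jobmanager not in GT2_JOBMANAGERS:
        raise ValueError(f"unknown jobmanager: {jobmanager!r}")
    lines = [
        "universe = grid",
        f"input = {job_name}.input",
        f"output = {job_name}.result",
        f"log = {job_name}.log",
        f"grid_resource = gt2 {host}/{jobmanager}",
        "queue",
    ]
    return "\n".join(lines) + "\n"

text = gt2_submit_description("example.wisc.edu", "jobmanager-pbs", "job1")
print(text)
```

Validating the jobmanager name before writing the file catches a misspelling locally, rather than after the job has been handed to the remote site.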
21
Submit Description File
For gt3:
    universe = grid
    input = job2.input
    output = job2.result
    log = job2.log
    grid_resource = gt3 http://198.51.254.40:8080/osga/services/base/gram/XXXManagedJobFactoryService
    queue
198.51.254.40:8080 is IP address:Port number.
XXX is one of: Fork, Condor, PBS, LSF, SGE.
22
Submit Description File
For gt4:
    universe = grid
    input = job3.input
    output = job3.result
    log = job3.log
    grid_resource = gt4 https://198.51.254.40:8080/wsrf/service/ManagedJobFactoryService XXX
    queue
198.51.254.40:8080 is IP address:Port number OR Host name:Port number.
XXX is one of: Fork, Condor, PBS, LSF, SGE.
23
Nordugrid and the Submit Description File
    universe = grid
    input = job4.input
    output = job4.result
    log = job4.log
    grid_resource = nordugrid ngexample.com
    queue
24
Unicore and the Submit Description File
    universe = grid
    input = job5.input
    output = job5.result
    log = job5.log
    grid_resource = unicore usite.example.com vsite
    keystore_file = /frieda/certificates/keystore
    keystore_alias = "frieda"
    keystore_passphrase_file = /frieda/private/passphrase
    queue
vsite is the name of the Unicore virtual resource.
25
PBS and the Submit Description File
Details of the PBS installation are in $(GLITE_LOCATION)/etc/batch_gahp.config
    universe = grid
    input = job6.input
    output = job6.result
    log = job6.log
    grid_resource = pbs
    queue
26
LSF and the Submit Description File
Details of the LSF installation are in $(GLITE_LOCATION)/etc/batch_gahp.config
    universe = grid
    input = job7.input
    output = job7.result
    log = job7.log
    grid_resource = lsf
    queue
27
Condor-C
Condor is running here, and Condor is running over there.
For the case where we want to send a job off to a far away machine, in order to hand that job to another batch processing system on that machine.
28
Condor-C and the Submit Description File
    universe = grid
    input = job8.input
    output = job8.result
    log = job8.log
    grid_resource = condor joe@remotemachine.example.com remotecentralmanager.example.com
    +remote_jobuniverse = 5
    +remote_requirements = True
    +remote_ShouldTransferFiles = "YES"
    +remote_WhenToTransferOutput = "ON_EXIT"
    queue
joe@remotemachine.example.com is the schedd name.
remotecentralmanager.example.com is the collector machine name.
+remote_jobuniverse = 5 selects the vanilla universe.
29
Credentials
Not just anybody can use any resource at any time...
Key concepts:
  Authentication: verification of an identity
  Authorization: permission to do something
30
Authentication
If Frieda says “I am Frieda,” how do we distinguish this from Frieda saying “I am George Bush”?
31
Authentication
Bush can do whatever he pleases.
If Frieda claims to be Bush (and this is accepted), then Frieda can do whatever she pleases.
Authentication attempts to verify the identity of the entity that is communicating.
32
Authorization
Who is allowed (permitted) to do what:
  Frieda may run gt4 jobs on the Open Science Grid machines
  Fred may write to files in /usr/bin
  The Unix user root may do anything!
Can be implemented with a list of those authorized.
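The idea of implementing authorization with a list of those authorized can be sketched in a few lines. This is an illustrative Python sketch, not how Condor or Unix stores permissions; the action strings and the function name is_authorized are made up for the example.

```python
# Each entry pairs an (already authenticated) user with one permitted action.
# The action strings are hypothetical labels for the slide's examples.
AUTHORIZED = {
    ("frieda", "run gt4 jobs on Open Science Grid"),
    ("fred", "write to /usr/bin"),
}

def is_authorized(user, action):
    """Return True if the user may perform the action."""
    if user == "root":
        return True  # the Unix user root may do anything
    return (user, action) in AUTHORIZED

print(is_authorized("frieda", "run gt4 jobs on Open Science Grid"))
print(is_authorized("frieda", "write to /usr/bin"))
print(is_authorized("root", "write to /usr/bin"))
```

The check only makes sense after authentication: the list answers what a user may do, and it relies on something else having established who the user is.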
33
Condor and Authentication
Authentication within Condor comes in many forms. Here are three.
1. File system: Have the entity write a file. The OS attaches a name to the file owner. Condor checks that the entity’s claim is the same as the file owner.
2. GSI (Grid Security Infrastructure)
3. Kerberos
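The file system check described in item 1 can be illustrated with a short sketch for Unix-like systems: the entity writes a file, and the checker compares the owner recorded by the OS against the claimed identity. This is a simplified illustration in Python, not Condor's actual protocol; it compares numeric user IDs rather than names, and the function name owner_matches_claim is hypothetical.

```python
import os
import tempfile

def owner_matches_claim(path, claimed_uid):
    """Return True if the OS reports claimed_uid as the owner of path."""
    return os.stat(path).st_uid == claimed_uid

# The checker picks a directory; the entity proves itself by writing a file there.
with tempfile.TemporaryDirectory() as challenge_dir:
    challenge_path = os.path.join(challenge_dir, "challenge")
    with open(challenge_path, "w") as f:
        f.write("proof of identity\n")

    my_uid = os.geteuid()
    # An honest claim matches the file owner recorded by the OS.
    honest = owner_matches_claim(challenge_path, my_uid)
    # A claim to be some other user does not match.
    forged = owner_matches_claim(challenge_path, my_uid + 1)

print(honest, forged)
```

The scheme works only when both parties share a file system, since the checker trusts the operating system's record of who created the file.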
34
Authentication Idea
A centralized certificate authority (CA) verifies an entity’s identity.
When satisfied, the CA issues a signed certificate (also called a credential).
[Diagram: Frieda, saying "I am Frieda", and the CA]
35
Authentication
To authenticate, the entity presents the certificate.
All is well if we trust the CA and the remote machine.
[Diagram: Frieda, saying "I am Frieda", and the CA]
36
GSI Authentication
GSI uses X.509 certificates.
Grid universe jobs submitted to back end types using Globus middleware (gt2, gt3, gt4), as well as to nordugrid and unicore, use X.509 certificates.
Condor can also use GSI.
37
Revocation, Trust, and Proxies
The CA may revoke a credential.
Frieda gives the signed credential to the remote machine. If the remote machine is malicious, it could impersonate Frieda. Therefore, a password protects the credential.
A proxy is a credential that includes the password, but is only valid for a specific (short) time period.
MyProxy software enables GSI proxy management.
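The short validity period of a proxy can be sketched as a time-window check. This is a conceptual illustration in Python, not the X.509 or GSI proxy format; the Proxy class, its fields, and the 12-hour lifetime are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Proxy:
    """A toy stand-in for a proxy credential with a limited lifetime."""
    subject: str
    issued_at: datetime
    lifetime: timedelta

    def is_valid(self, now):
        """Return True only while 'now' falls inside the validity window."""
        return self.issued_at <= now < self.issued_at + self.lifetime

issued = datetime(2006, 1, 1, 12, 0, tzinfo=timezone.utc)
proxy = Proxy(subject="frieda", issued_at=issued, lifetime=timedelta(hours=12))

one_hour_later = issued + timedelta(hours=1)
next_day = issued + timedelta(hours=24)
print(proxy.is_valid(one_hour_later))
print(proxy.is_valid(next_day))
```

The limited window bounds the damage if a remote machine misuses the proxy: once the lifetime has passed, the proxy no longer authenticates anyone.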