Progress Report
Barnett Chiu
Glidein Code Updates and Tests (1)
Major modifications to the condor_glidein code are as follows:
1. Command options:
   1a. An option "type" is added to select between schedd and startd glidein, with startd as the default.
   1b. An option "tcp" is added to force TCP connections.
   1c. Other options will be added for selecting GRAM services and supporting batch systems such as PBS and LSF.
2. DAEMON_LIST:
   2a. For a startd-based glidein, have the master spawn the startd.
       This is done by including master and startd in the DAEMON_LIST.
   2b. For a schedd-based glidein, have the master spawn the schedd.
       Similarly, include both master and schedd in the DAEMON_LIST.
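The two options and the DAEMON_LIST rule above can be illustrated with a short sketch. It is written in Python purely for illustration and does not reproduce condor_glidein's actual source; the helper names parse_options() and daemon_list_for() are hypothetical. The single-dash "-type" and "-tcp" spellings follow the report's option names.

```python
import argparse
from typing import List, Optional

VALID_TYPES = ("startd", "schedd")


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser for the new -type and -tcp options."""
    parser = argparse.ArgumentParser(prog="condor_glidein (sketch)")
    parser.add_argument("-type", choices=VALID_TYPES, default="startd",
                        help="kind of glidein to start (default: startd)")
    parser.add_argument("-tcp", action="store_true",
                        help="force TCP connections")
    return parser.parse_args(argv)


def daemon_list_for(glidein_type: str = "startd") -> str:
    """Return the DAEMON_LIST line for a startd- or schedd-based glidein.

    The master is always listed so that it can spawn the second daemon.
    """
    if glidein_type not in VALID_TYPES:
        raise ValueError(f"unknown glidein type: {glidein_type!r}")
    return f"DAEMON_LIST = MASTER, {glidein_type.upper()}"
```

For example, daemon_list_for(parse_options(["-type", "schedd"]).type) yields "DAEMON_LIST = MASTER, SCHEDD", while an empty argument list falls back to the startd default.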
Glidein Code Updates and Tests (2)
3. Added code to adjust $SERVER_URL based on the type of glidein.
   e.g. GLIDEIN_SERVER_URL can be set to:
   Roughly speaking, the way I distinguish between startd and schedd glidein is that the URL for a schedd-based glidein should contain a schedd_based directory …
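The URL rule can be sketched as follows. The example URLs from the original slide are not preserved, so the base URL used in the usage note (gsiftp://example.org/glidein) is hypothetical, and appending schedd_based as the last path component is an assumption about where the directory sits. The function name adjust_server_url() is likewise illustrative.

```python
def adjust_server_url(base_url: str, glidein_type: str = "startd") -> str:
    """Hypothetical sketch: pick the download URL for the glidein binaries.

    Assumption: schedd-based glideins are served from a "schedd_based"
    subdirectory of the base URL; startd-based glideins use the base URL.
    """
    base = base_url.rstrip("/")  # avoid a double slash when appending
    if glidein_type == "schedd":
        return base + "/schedd_based"
    if glidein_type == "startd":
        return base
    raise ValueError(f"unknown glidein type: {glidein_type!r}")
```

With the hypothetical base above, adjust_server_url("gsiftp://example.org/glidein/", "schedd") returns "gsiftp://example.org/glidein/schedd_based".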
Glidein Code Updates and Tests (3)
4. Added a function named gen_main_schedd_config() that sets up schedd-related configurations.
5. In do_remote_setup(), use a function pointer to choose either gen_main_schedd_config() or gen_main_config(), i.e. the functions that generate the necessary configurations for schedd glideins and startd glideins respectively. A function pointer offers the flexibility to choose among different types of glideins:
   e.g. Schedd glideins can be further categorized by the batch systems they support, such as LSF, PBS, or other batch systems as grid technology evolves…
   e.g. Other types of glideins as Condor evolves…
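The function-pointer dispatch in item 5 can be illustrated in Python, where functions are first-class values and a lookup table plays the role of the pointer. This is a sketch only: the function names mirror the slide, but their bodies here are placeholders that emit only the DAEMON_LIST line, and the table CONFIG_GENERATORS is a hypothetical name.

```python
from typing import Callable, Dict, List


def gen_main_config() -> List[str]:
    # Placeholder for the startd glidein configuration.
    return ["DAEMON_LIST = MASTER, STARTD"]


def gen_main_schedd_config() -> List[str]:
    # Placeholder for the schedd glidein configuration.
    return ["DAEMON_LIST = MASTER, SCHEDD"]


# Adding a new glidein flavor (e.g. a schedd glidein for a specific batch
# system) only requires registering one more generator here.
CONFIG_GENERATORS: Dict[str, Callable[[], List[str]]] = {
    "startd": gen_main_config,
    "schedd": gen_main_schedd_config,
}


def do_remote_setup(glidein_type: str) -> List[str]:
    gen_config = CONFIG_GENERATORS[glidein_type]  # select the function...
    return gen_config()                           # ...then call it
```

The design point is that do_remote_setup() never branches on the glidein type itself; extending the set of glideins means adding an entry to the table rather than editing the setup routine.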
Glidein Code Updates and Tests (4)
Authentication
   When condor_submit talks to the schedd, it needs to authenticate itself.
   Several authentication schemes can be chosen: FS, KERBEROS, GSI, CLAIMTOBE.
Configuration
   SEC_DEFAULT_AUTHENTICATION = OPTIONAL (or REQUIRED)
   SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI, KERBEROS, CLAIMTOBE
   Both the submit machine and the glidein configuration file have to use the same settings.
   For the testing phase, use CLAIMTOBE so that the schedd trusts whoever executes condor_submit.
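Because the submit machine and the glidein must agree on these settings, one option is to generate both configuration fragments from a single function so that they cannot drift apart. The macro names and values below come from the slide; the function name security_config() and the choice of listing only CLAIMTOBE for the testing phase are assumptions for illustration.

```python
from typing import Sequence


def security_config(required: bool = False,
                    methods: Sequence[str] = ("FS", "GSI", "KERBEROS", "CLAIMTOBE")) -> str:
    """Hypothetical helper: emit the SEC_DEFAULT_* lines shown on the slide."""
    level = "REQUIRED" if required else "OPTIONAL"
    return (f"SEC_DEFAULT_AUTHENTICATION = {level}\n"
            f"SEC_DEFAULT_AUTHENTICATION_METHODS = {', '.join(methods)}\n")


# Testing phase (assumption: CLAIMTOBE only). The same call produces the
# fragment for both sides, so the two configurations stay identical.
TESTING_METHODS = ("CLAIMTOBE",)
submit_side = security_config(methods=TESTING_METHODS)
glidein_side = security_config(methods=TESTING_METHODS)
```

Appending submit_side to the submit machine's configuration and glidein_side to the glidein's configuration keeps the two in step.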
Schedd-Glidein Demo (1)
Command: // schedd glidein #1
   condor_glidein -count 1 -arch i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk01.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Command: // schedd glidein #2
   condor_glidein -count 1 -arch i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk02.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Command: // schedd glidein #3, #4, #5
   condor_glidein -count 3 -arch i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork nostos.cs.wisc.edu/jobmanager-condor -type schedd -forcesetup
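The three demo submissions differ only in the gatekeeper contact string and -count. A small helper (hypothetical, not one of the utilities described in this report) can build the argument lists from the flags shown above; each list could then be passed to subprocess.run() to perform the actual submission.

```python
import shlex
from typing import List

ARCH = "i686-pc-Linux-2.4"


def glidein_command(contact: str, count: int = 1, glidein_type: str = "schedd") -> List[str]:
    """Build a condor_glidein argument list using the flags from the demo."""
    return ["condor_glidein",
            "-count", str(count),
            "-arch", ARCH,
            "-setup_jobmanager=jobmanager-fork",
            contact,
            "-type", glidein_type,
            "-forcesetup"]


demo = [
    glidein_command("gridgk01.racf.bnl.gov/jobmanager-fork"),            # glidein #1
    glidein_command("gridgk02.racf.bnl.gov/jobmanager-fork"),            # glidein #2
    glidein_command("nostos.cs.wisc.edu/jobmanager-condor", count=3),    # #3, #4, #5
]

for cmd in demo:
    print(shlex.join(cmd))  # print the shell form; subprocess.run(cmd) would execute it
```

Keeping the command as a list rather than a single string avoids shell quoting issues when it is handed to subprocess.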
Schedd-Glidein Demo (2)
Command: condor_status -schedd
Output:
   Name        Machine   TotalRunningJobs   TotalIdleJobs   TotalHeldJobs
   gridgk01.r
   gridgk02.r
   gridui01.u
   ribera.cs
   ron.cs.wis
   vail.cs.wi

                         TotalRunningJobs   TotalIdleJobs   TotalHeldJobs
                Total    0                  0               0
Demo (3)
Command: condor_status -schedd -l | grep -i Name | sed -e 's/Name[ ]*=[
Output:
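The grep/sed pipeline above extracts the schedd names from the long-format output. A Python equivalent is sketched below as an alternative, not a reconstruction of the truncated sed expression. Unlike grep -i Name, which matches any line containing "name" in any case, it keeps only the attribute called exactly Name. The function name schedd_names() is hypothetical; the output text could be obtained with subprocess.run(["condor_status", "-schedd", "-l"], capture_output=True, text=True).stdout.

```python
import re
from typing import List

# Matches lines such as:  Name = "glidein@host"   (quotes optional)
NAME_LINE = re.compile(r'^Name\s*=\s*"?(?P<value>[^"]*)"?\s*$')


def schedd_names(long_output: str) -> List[str]:
    """Return the value of every Name attribute in condor_status -l output."""
    names = []
    for line in long_output.splitlines():
        match = NAME_LINE.match(line.strip())
        if match:
            names.append(match.group("value"))
    return names
```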
Demo (4)
Command: condor_status -schedd -long -constraint "is_glidein=?=true"
   or the customized command: condor_schedd_ad [schedd_name]
Output:
   MyType = "Scheduler"
   TargetType = ""
   IS_GLIDEIN = TRUE
   CondorVersion = "$CondorVersion: Sep $"
   CondorPlatform = "$CondorPlatform: I386-LINUX_RHEL3 $"
   Machine = "ron.cs.wisc.edu"
   QuillEnabled = FALSE
   ScheddIpAddr = " "
   MyAddress = " "
   NumUsers = 0
   Name =
   VirtualMemory = 0
   TotalIdleJobs = 0
   TotalRunningJobs = 0
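One way a tool like condor_schedd_ad could work is sketched below; this is an assumption, not the utility's actual implementation. It restricts condor_status to a single schedd with a -constraint on Name and parses the long-format ad into a dictionary. Attribute names are lower-cased because ClassAd attribute names are case-insensitive (the slide's constraint writes is_glidein while the ad prints IS_GLIDEIN). All function names are hypothetical.

```python
from typing import Dict, List


def schedd_ad_command(schedd_name: str) -> List[str]:
    """Argument list that asks condor_status for one schedd's ClassAd."""
    return ["condor_status", "-schedd", "-long",
            "-constraint", f'Name == "{schedd_name}"']


def parse_classad(text: str) -> Dict[str, str]:
    """Parse 'Attr = value' lines into a dict keyed by lower-cased attribute."""
    ad = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            ad[key.strip().lower()] = value.strip().strip('"')
    return ad


def is_glidein(ad: Dict[str, str]) -> bool:
    """True when the ad advertises IS_GLIDEIN = TRUE."""
    return ad.get("is_glidein", "FALSE").upper() == "TRUE"
```

Applied to the ad shown above, parse_classad() would give ad["machine"] == "ron.cs.wisc.edu" and is_glidein(ad) would be True.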
Demo (5)
How to submit jobs?
Command:
   condor_submit cgtest1 -remote
Output:
   Submitting job(s)
   WARNING: Log file /direct/usatlas+u/pleiades/test/log/nostos_echo.1.0 is on NFS.
   This could cause log file corruption and is _not_ recommended.
   Logging submit event(s).
   1 job(s) submitted to cluster 1.
   Spooling data files for 1 jobs...
In the PilotFactory project, cgtest1 would be replaced by a wrapper of pilotScheduler.py, with its dependent programs included in transfer_input_files, so that the job containing the pilotScheduler program (i.e. the Generator) can be submitted to the glidein schedd as a Condor-C job and then run within the schedd as a scheduler universe job.
For more information, please see GPF in the Pilot Factory Proposal.
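A submit description for the pilotScheduler wrapper described above might look like the one generated by this sketch. The file names pilot_wrapper.sh and pilot_utils.py are hypothetical, as is the function name; universe, executable, transfer_input_files, output, error, log and queue are standard submit-description commands. Whether input-file transfer behaves as intended for a scheduler universe job submitted with -remote should be confirmed against the Condor manual before relying on it.

```python
from typing import Iterable


def pilot_submit_description(wrapper: str, inputs: Iterable[str], log: str) -> str:
    """Hypothetical generator for a scheduler-universe pilot submit file."""
    lines = [
        "universe = scheduler",               # run inside the glidein schedd
        f"executable = {wrapper}",            # wrapper around pilotScheduler.py
        f"transfer_input_files = {', '.join(inputs)}",
        "output = pilot.$(Cluster).$(Process).out",
        "error = pilot.$(Cluster).$(Process).err",
        f"log = {log}",                       # the demo warns against NFS here
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file and submitting it with condor_submit -remote, naming the glidein schedd, would mirror the cgtest1 demo.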
Demo (6)
Command:
   condor_q -name
Output:
   -- Schedd: :
    ID      OWNER      SUBMITTED   RUN_TIME   ST PRI SIZE CMD
    1.0     pleiades   2/26 15:    :00:00     C            ps auwfx
   0 jobs; 0 idle, 0 running, 0 held
Documentation
   Updating the TWiki page on Schedd-based Glidein
   Condor-G and Glidein Performance and Functionality Assessment
Condor Utilities (1)
For general Condor-G tests, it is inconvenient to recreate job submission files …
   condor_gen_gridjob: a program that automatically generates the submit file with a single command:
      [comm] condor_gen_gridjob --exec $HOME/myprog --out $HOME/condor_test/output --in $HOME/condor_test/input …
      [other commands] condor_gen_ccjob, condor_gen_vanilla, … etc.
Checking the individual ClassAd published by a particular *schedd*
   e.g. Use condor_status -schedd -long to check all *schedd* ClassAds; however, it is not straightforward to check the published ClassAd associated with a particular instance of *schedd*: condor_schedd_ad (done)
      [comm] condor_schedd_ad
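The idea behind condor_gen_gridjob can be sketched as a small generator for a grid-universe submit file. The --exec, --out and --in options follow the command shown above; the --resource and --log options, their defaults, and the assumption of a GT2 gatekeeper are hypothetical additions, and the sketch does not reproduce the actual utility.

```python
import argparse
from typing import List


def gen_gridjob(argv: List[str]) -> str:
    """Hypothetical sketch: build a grid-universe submit file from options."""
    p = argparse.ArgumentParser(prog="condor_gen_gridjob (sketch)")
    p.add_argument("--exec", dest="executable", required=True)
    p.add_argument("--out", required=True, help="file receiving the job's stdout")
    p.add_argument("--in", dest="input", help="file fed to the job's stdin")
    # Hypothetical options: target gatekeeper and user log location.
    p.add_argument("--resource", default="gt2 gridgk01.racf.bnl.gov/jobmanager-fork")
    p.add_argument("--log", default="gridjob.log")
    a = p.parse_args(argv)

    lines = ["universe = grid",
             f"grid_resource = {a.resource}",
             f"executable = {a.executable}",
             f"output = {a.out}"]
    if a.input:
        lines.append(f"input = {a.input}")
    lines += [f"log = {a.log}", "queue"]
    return "\n".join(lines) + "\n"
```

Writing the result to a file gives a submit description that condor_submit can take directly, which is the repetitive step the utility is meant to remove.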
Condor Utilities (2)
List the currently available *schedd*s and check some of their important properties
   [usage] condor_schedd_list [-g|-h | … ]
   [comm] condor_schedd_list -g
   [output] Listing glidein *schedd*...
Some options for checking individual properties of a *schedd* are under way …
   e.g. Machine = "tier2-02.uchicago.edu"
        ScheddIpAddr = " "
        Name = (often needs to be used in combination with other commands, e.g. to submit jobs)
        DaemonStartTime =
        …
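A listing like condor_schedd_list -g could be built by splitting condor_status -schedd -long output into individual ads (consecutive ads are separated by blank lines) and keeping those that advertise IS_GLIDEIN = TRUE. The sketch below is an assumption about how such a tool might work, not its implementation; the property names come from the slide, and the function names are hypothetical.

```python
from typing import Dict, List

IMPORTANT = ("Name", "Machine", "ScheddIpAddr", "DaemonStartTime")


def split_ads(long_output: str) -> List[Dict[str, str]]:
    """Split condor_status -long output into one dict per ClassAd."""
    ads: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in long_output.splitlines():
        if not line.strip():              # a blank line ends the current ad
            if current:
                ads.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key.strip()] = value.strip().strip('"')
    if current:                           # last ad may lack a trailing blank line
        ads.append(current)
    return ads


def list_glidein_schedds(long_output: str) -> List[Dict[str, str]]:
    """Return the important properties of every glidein schedd."""
    rows = []
    for ad in split_ads(long_output):
        if ad.get("IS_GLIDEIN", "").upper() == "TRUE":
            rows.append({k: ad.get(k, "") for k in IMPORTANT})
    return rows
```

Missing properties come back as empty strings, so a partially populated ad (such as the Name or DaemonStartTime entries left blank on the slide) still produces a row.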
Condor Utilities (3)
Other utilities for debugging
   condor_pid_lookup
      [comm] condor_pid_lookup -c gridgk01.racf.bnl.gov
      [output]
      USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
      agrd ? S Feb27 3:06 /usatlas/grid/agrd0926/Condor_glidein/6.8.1-i686-pc-Linux-2.4/condor_master -dyn -f
      Or, vice versa …
      [comm] condor_pid_lookup -c gridgk01.racf.bnl.gov condor_master
   condor_schedd_time
      [comm] condor_schedd_time
      [output] Fri 23 Feb :18:31 AM EST
      [usage] debugging; can be used in combination with the gridmanager log file to extract the desired section of information (condor_pid_lookup + condor_schedd_time)
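The lookup part of condor_pid_lookup can be sketched as parsing ps output with the column layout shown above (USER PID ... COMMAND) in both directions: from a program name to its processes, and from a PID back to its command line. How the ps output is fetched from the remote gatekeeper is not covered here, and the function names are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple


def _rows(ps_output: str) -> Iterator[List[str]]:
    """Yield the 11 columns of each ps line, skipping the header row."""
    for line in ps_output.splitlines()[1:]:
        fields = line.split(None, 10)  # COMMAND (last column) may contain spaces
        if len(fields) == 11 and fields[1].isdigit():
            yield fields


def find_processes(ps_output: str, program: str) -> List[Tuple[str, int, str]]:
    """Return (user, pid, command line) for processes whose executable is `program`."""
    matches = []
    for fields in _rows(ps_output):
        executable = fields[10].split()[0].rsplit("/", 1)[-1]  # basename of argv[0]
        if executable == program:
            matches.append((fields[0], int(fields[1]), fields[10]))
    return matches


def command_for_pid(ps_output: str, pid: int) -> Optional[str]:
    """The 'vice versa' direction: map a PID back to its command line."""
    for fields in _rows(ps_output):
        if int(fields[1]) == pid:
            return fields[10]
    return None
```

Matching on the executable's basename lets a query for condor_master find the glidein master even though it was started from a long installation path.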