Published by Ashley Taylor. Modified over 9 years ago.
1
Progress Report Barnett Chiu 02.26.07 @BNL
2
Glidein Code Updates and Tests (1)
Major modifications to the condor_glidein code are as follows:
1. Command options:
   1a. An option "type" is added to select between schedd and startd glidein, with startd as the default.
   1b. An option "tcp" is added to force a TCP connection.
   1c. Other options will be included for selecting GRAM services and supporting batch systems such as PBS and LSF.
2. DAEMON_LIST:
   2a. For startd-based glidein, have the master spawn the startd. This is done by including master and startd in DAEMON_LIST.
   2b. For schedd-based glidein, have the master spawn the schedd. Similarly, include both master and schedd in DAEMON_LIST.
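The option handling in items 1a and 1b, and the way the chosen type feeds DAEMON_LIST in item 2, can be sketched roughly as follows. This is an illustration in Python, not the condor_glidein source; the function name parse_glidein_args, the program name, and the attribute names are hypothetical.

```python
import argparse

def parse_glidein_args(argv):
    """Parse the two new options described above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="condor_glidein_sketch")
    # "-type" selects the daemon the master will spawn; startd is the default.
    parser.add_argument("-type", dest="glidein_type",
                        choices=("startd", "schedd"), default="startd")
    # "-tcp" is a plain switch that forces TCP connections.
    parser.add_argument("-tcp", dest="force_tcp", action="store_true")
    return parser.parse_args(argv)

def daemon_list_line(glidein_type):
    """Build the DAEMON_LIST line: the master plus the selected daemon."""
    return "DAEMON_LIST = MASTER, " + glidein_type.upper()

args = parse_glidein_args(["-type", "schedd", "-tcp"])
print(daemon_list_line(args.glidein_type))
```

Keeping the type as a single validated value means later steps (download URL, configuration generation) can all key off the same string.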
3
Glidein Code Updates and Tests (2)
3. Added code to adjust $SERVER_URL based on the type of glidein.
   E.g. GLIDEIN_SERVER_URL can be set to:
      http://gridui01.usatlas.bnl.gov:25880/glidein/binaries/schedd_based
      http://gridui01.usatlas.bnl.gov:25880/glidein/binaries/startd_based
   Roughly speaking, the way I distinguish between startd and schedd glidein is that the URL for schedd-based glidein should contain the schedd_based directory …
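A minimal sketch of that URL convention, assuming the only difference between the two download locations is the final directory name. The helper names server_url and type_from_url are hypothetical and are not taken from condor_glidein.

```python
# Base location taken from the example URLs on the slide.
BASE_URL = "http://gridui01.usatlas.bnl.gov:25880/glidein/binaries"

def server_url(glidein_type, base_url=BASE_URL):
    """Return the binaries URL for a glidein type (hypothetical helper)."""
    if glidein_type not in ("startd", "schedd"):
        raise ValueError(f"unknown glidein type: {glidein_type!r}")
    return f"{base_url.rstrip('/')}/{glidein_type}_based"

def type_from_url(url):
    """Infer the glidein type: a schedd_based path component means schedd."""
    return "schedd" if "schedd_based" in url.split("/") else "startd"

print(server_url("schedd"))
print(type_from_url(server_url("startd")))
```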
4
Glidein Code Updates and Tests (3)
4. Added a function named gen_main_schedd_config() that sets up schedd-related configurations.
5. In do_remote_setup(), use a function pointer to choose either gen_main_schedd_config() or gen_main_config(), i.e. the functions that generate the necessary configurations for schedd glidein and startd glidein respectively. The function pointer offers the flexibility to choose between different types of glideins.
   E.g. schedd glidein can be further categorized by the batch system it supports, such as LSF, PBS, or other types of batch systems as grid technology evolves…
   E.g. other types of glideins as Condor evolves…
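The function-pointer idea can be illustrated with a dispatch table. The sketch below is in Python rather than the language of condor_glidein, reuses the function names from the slide, and fills their bodies with placeholder output only; the table name CONFIG_GENERATORS is hypothetical.

```python
def gen_main_config():
    """Placeholder for the startd glidein configuration generator."""
    return ["DAEMON_LIST = MASTER, STARTD"]

def gen_main_schedd_config():
    """Placeholder for the schedd glidein configuration generator."""
    return ["DAEMON_LIST = MASTER, SCHEDD"]

# Dispatch table: adding a new glidein type means adding one entry here.
CONFIG_GENERATORS = {
    "startd": gen_main_config,
    "schedd": gen_main_schedd_config,
}

def do_remote_setup(glidein_type="startd"):
    """Pick the generator through the table and call it."""
    gen_config = CONFIG_GENERATORS[glidein_type]  # the "function pointer"
    return gen_config()

print(do_remote_setup("schedd"))
```

A later variant such as a schedd glidein for a specific batch system would then be one more generator function and one more table entry, with do_remote_setup() unchanged.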
5
Glidein Code Updates and Tests (3)
Authentication
   When condor_submit talks to the schedd, it needs to authenticate itself.
   Several authentication schemes can be chosen: FS, KERBEROS, GSI, CLAIMTOBE
Configuration
   SEC_DEFAULT_AUTHENTICATION = OPTIONAL (or REQUIRED)
   SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI, KERBEROS, CLAIMTOBE
   Both the submit machine and the glidein configuration file have to use the same settings.
   For the testing phase, use CLAIMTOBE so that the schedd trusts whoever executes condor_submit.
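Since both sides must carry the same two settings, a quick consistency check can catch a mismatch before a submit fails. The sketch below is a simplified assumption: it treats a configuration file as plain KEY = VALUE lines and ignores macros, includes, and line continuations that a real Condor configuration may use. Both function names are hypothetical.

```python
AUTH_KEYS = ("SEC_DEFAULT_AUTHENTICATION", "SEC_DEFAULT_AUTHENTICATION_METHODS")

def parse_simple_config(text):
    """Parse KEY = VALUE lines; comments and blank lines are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().upper()] = value.strip()
    return settings

def auth_mismatches(submit_cfg_text, glidein_cfg_text):
    """Return the authentication keys whose values differ between the two."""
    submit = parse_simple_config(submit_cfg_text)
    glidein = parse_simple_config(glidein_cfg_text)
    return [k for k in AUTH_KEYS if submit.get(k) != glidein.get(k)]

submit_side = "SEC_DEFAULT_AUTHENTICATION = OPTIONAL\nSEC_DEFAULT_AUTHENTICATION_METHODS = CLAIMTOBE\n"
glidein_side = "SEC_DEFAULT_AUTHENTICATION = OPTIONAL\nSEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI\n"
print(auth_mismatches(submit_side, glidein_side))
```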
6
Schedd-Glidein Demo (1)
Command: // schedd glidein #1
   condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk01.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Command: // schedd glidein #2
   condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk02.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Command: // schedd glidein #3, #4, #5
   condor_glidein -count 3 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork nostos.cs.wisc.edu/jobmanager-condor -type schedd -forcesetup
7
Schedd-Glidein Demo (2)
Command: condor_status -schedd

Name                  Machine     TotalRunningJobs  TotalIdleJobs  TotalHeldJobs
agrd0926@gridgk01.ra  gridgk01.r  0                 0              0
agrd0926@gridgk02.ra  gridgk02.r  0                 0              0
pleiades@gridui01.us  gridui01.u  0                 0              0
pleiades@ribera.cs.w  ribera.cs.  0                 0              0
pleiades@ron.cs.wisc  ron.cs.wis  0                 0              0
pleiades@vail.cs.wis  vail.cs.wi  0                 0              0

                      TotalRunningJobs  TotalIdleJobs  TotalHeldJobs
Total                 0                 0              0
8
Demo (3)
Command
   condor_status -schedd -l | grep -i Name | sed -e 's/Name[ ]*=[ ]*\"\(.*@.*\)\"/\1/g'
Output
   agrd0926@gridgk01.racf.bnl.gov
   agrd0926@gridgk02.racf.bnl.gov
   pleiades@gridui01.usatlas.bnl.gov
   pleiades@ribera.cs.wisc.edu
   pleiades@ron.cs.wisc.edu
   pleiades@vail.cs.wisc.edu
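The same extraction can be done on captured condor_status -schedd -l text without the grep/sed pipeline. The sketch below is an approximation: unlike the sed command, it drops lines that do not match instead of passing them through. The function name schedd_names and the sample text are made up for illustration.

```python
import re

# Matches lines of the form: Name = "user@host"
NAME_RE = re.compile(r'^Name\s*=\s*"(.*@.*)"$')

def schedd_names(long_output):
    """Return the quoted value of every Name attribute in the given text."""
    names = []
    for line in long_output.splitlines():
        match = NAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names

sample = '''MyType = "Scheduler"
Machine = "ron.cs.wisc.edu"
Name = "pleiades@ron.cs.wisc.edu"

MyType = "Scheduler"
Name = "agrd0926@gridgk01.racf.bnl.gov"
'''
for name in schedd_names(sample):
    print(name)
```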
9
Demo (4)
Command:
   condor_status -schedd -long -constraint "is_glidein=?=true"
   or the customized command condor_schedd_ad [schedd_name]
Output:
   MyType = "Scheduler"
   TargetType = ""
   IS_GLIDEIN = TRUE
   CondorVersion = "$CondorVersion: 6.8.1 Sep 6 2006 $"
   CondorPlatform = "$CondorPlatform: I386-LINUX_RHEL3 $"
   Machine = "ron.cs.wisc.edu"
   QuillEnabled = FALSE
   ScheddIpAddr = " "
   MyAddress = " "
   NumUsers = 0
   Name = "pleiades@ron.cs.wisc.edu"
   VirtualMemory = 0
   TotalIdleJobs = 0
   TotalRunningJobs = 0
10
Demo (5)
How to submit jobs?
Command:
   condor_submit cgtest1 -remote pleiades@ron.cs.wisc.edu
Output:
   condor_submit cgtest1 -remote pleiades@ron.cs.wisc.edu
   Submitting job(s)
   WARNING: Log file /direct/usatlas+u/pleiades/test/log/nostos_echo.1.0 is on NFS.
   This could cause log file corruption and is _not_ recommended.
   Logging submit event(s).
   1 job(s) submitted to cluster 1.
   Spooling data files for 1 jobs...
In the PilotFactory project, cgtest1 would be replaced by a wrapper of pilotScheduler.py, with its dependent programs included in transfer_input_files, so that the job containing the pilotScheduler program (i.e. the Generator) can be submitted to the glidein schedd as a Condor-C job and then run within the schedd as a scheduler universe job.
For more information, please check GPF in the Pilot Factory Proposal.
11
Demo (6)
Command:
   condor_q -name pleiades@ron.cs.wisc.edu
Output:
   -- Schedd: pleiades@ron.cs.wisc.edu :
    ID    OWNER     SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
    1.0   pleiades  2/26 15:31   0+00:00:00  C   0    9.8   ps auwfx
   0 jobs; 0 idle, 0 running, 0 held
12
Documentation
Updating the Twiki page on Schedd-based Glidein
   http://www.usatlas.bnl.gov/twiki/bin/view/AtlasSoftware/ScheddGlidein
Condor-G and Glidein Performance and Functionality Assessment
   http://www.usatlas.bnl.gov/twiki/bin/view/AtlasSoftware/CondorExperience
13
Condor Utilities (1)
For general Condor-G tests, it is inconvenient to recreate job submission files …
condor_gen_gridjob: a program that automatically generates the submit file with a single command:
   [comm] condor_gen_gridjob --exec $HOME/myprog
             --out $HOME/condor_test/output
             --in $HOME/condor_test/input …
   [other commands] condor_gen_ccjob, condor_gen_vanilla, … etc.
Checking the individual ClassAd published by a particular *schedd*
   E.g. use condor_status -schedd -long to check all *schedd* ClassAds; however, checking the published ClassAd associated with one particular instance of *schedd* is not straightforward.
condor_schedd_ad (done)
   [comm] condor_schedd_ad pleiades@gridgk01.racf.bnl.gov
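To illustrate the kind of file such a generator could write, here is a rough Python sketch that builds a grid-universe submit description from an executable, an input file, and an output file. It is not the condor_gen_gridjob implementation: the function name, the gatekeeper and log parameters, and the chosen set of submit commands are all assumptions.

```python
def gen_gridjob(executable, output, input_file, gatekeeper, log="job.log"):
    """Return the text of a minimal Condor-G submit description (a sketch)."""
    lines = [
        "universe = grid",
        # gt2 denotes a pre-web-services GRAM gatekeeper.
        f"grid_resource = gt2 {gatekeeper}",
        f"executable = {executable}",
        f"input = {input_file}",
        f"output = {output}",
        f"log = {log}",
        "queue",
    ]
    return "\n".join(lines) + "\n"

submit_text = gen_gridjob(
    executable="$HOME/myprog",
    output="$HOME/condor_test/output",
    input_file="$HOME/condor_test/input",
    gatekeeper="gridgk01.racf.bnl.gov/jobmanager-fork",  # host from the demo slides
)
print(submit_text)
```

Variants such as condor_gen_ccjob or condor_gen_vanilla would differ mainly in the universe line and the resource-specific commands.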
14
Condor Utilities (2)
List the currently available *schedd* instances and check some of their important properties
   [usage] condor_schedd_list [-g|-h | … ]
   [comm] condor_schedd_list -g
   [output]
      Listing glidein *schedd*...
      --------------------------------------
      agrd0926@gridgk01.racf.bnl.gov
      pleiades@nostos.cs.wisc.edu
      usatlas3@tier2-02.uchicago.edu
Some options for checking individual properties of a *schedd* are under way …
   e.g. Machine = "tier2-02.uchicago.edu"
        ScheddIpAddr = " "
        Name = "usatlas3@tier2-02.uchicago.edu" (often needs to be used in combination with other commands, e.g. to submit jobs)
        DaemonStartTime = 1172706559
        …
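DaemonStartTime is published as seconds since the Unix epoch, so it is hard to read directly. A small sketch of the conversion, using the value shown on the slide; the function name is hypothetical and UTC is chosen here only to make the result independent of the local time zone.

```python
from datetime import datetime, timezone

def daemon_start_time(epoch_seconds):
    """Convert a DaemonStartTime value to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

started = daemon_start_time(1172706559)
print(started.isoformat())
```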
15
Condor Utilities (3)
Other utilities for debugging
condor_pid_lookup
   [comm] condor_pid_lookup -c gridgk01.racf.bnl.gov 20044
   [output]
      USER      PID    %CPU  %MEM  VSZ   RSS   TTY  STAT  START  TIME  COMMAND
      agrd0926  20044  0.0   0.2   8536  4456  ?    S     Feb27  3:06  /usatlas/grid/agrd0926/Condor_glidein/6.8.1-i686-pc-Linux-2.4/condor_master -dyn -f
   Or, vice versa …
   [comm] condor_pid_lookup -c gridgk01.racf.bnl.gov condor_master
condor_schedd_time
   [comm] condor_schedd_time agrd0926@gridgk01.racf.bnl.gov
   [output] Fri 23 Feb 2007 12:18:31 AM EST
   [usage] debugging; can be used in combination with the gridmanager log file to extract the desired section of information (condor_pid_lookup + condor_schedd_time)