Published by Madlyn Brooks. Modified over 9 years ago.
1
Pilot Factory using Schedd Glidein
Barnett Chiu, BNL, 10.04.07
2
Problem to Solve (1): Pilot
- Probe the resource (HTTP, environment, interpreter, other executables, etc.)
- Pull jobs from a remote server (e.g. the Panda server)
- Matchmaking
  - Group jobs into different categories, e.g. production jobs, analysis jobs (CHARMM …), test jobs …
  - Other criteria: number of CPUs, RAM, etc.
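The matchmaking step above can be sketched in Python as follows. This is a minimal illustration, not the Panda pilot's actual logic: the `Job` and `Slot` records, their fields, and the fixed category preference order are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; the field names are illustrative only."""
    job_id: int
    category: str  # e.g. "production", "analysis", "test"
    cpus: int
    ram_mb: int


@dataclass(frozen=True)
class Slot:
    """What the pilot found when probing the worker node (hypothetical)."""
    cpus: int
    ram_mb: int


def group_jobs(jobs):
    """Group jobs by category."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job.category].append(job)
    return dict(groups)


def match(jobs, slot, preferred=("production", "analysis", "test")) -> Optional[Job]:
    """Return the first job, in category-preference order, that fits the slot."""
    groups = group_jobs(jobs)
    for category in preferred:
        for job in groups.get(category, []):
            if job.cpus <= slot.cpus and job.ram_mb <= slot.ram_mb:
                return job
    return None  # nothing fits; a real pilot would exit or retry later


jobs = [
    Job(1, "analysis", 1, 2000),
    Job(2, "production", 4, 8000),
    Job(3, "production", 1, 1500),
]
print(match(jobs, Slot(cpus=2, ram_mb=4000)))
```

With a 2-CPU, 4000 MB slot the 4-CPU production job is skipped and job 3 (the other production job) is chosen ahead of the analysis job.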
3
Problem to Solve (2)
Current approach to pilot submission:
- Local pool: vanilla universe
- Remote pool: Condor-G
A large number of user jobs (production + analysis) means a large number of Condor-G pilot jobs, which in turn means computational overhead on the gatekeepers (e.g. large memory consumption).
4
Solution
- Is there any way to bypass GRAM when submitting jobs to remote machines?
- Local submissions, but how? We need something that continuously submits local pilot jobs on the gatekeeper.
- Solution: Pilot Factory
5
Pilot Factory Overview
Pilot Factory is an application that combines the following ideas:
- schedd glidein
- a pilot submission program (or pilot generator)
What is a glidein?
- A mini Condor pool on a remote machine
- A complete Condor pool has at least 5 components: master, startd, schedd, collector, negotiator
- A glidein runs a subset of them: {master, startd}, {master, schedd}, etc.
- Properly configured Condor daemons submitted as a batch job
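The difference between a full pool and a glidein is mostly which daemons the master starts. As a hedged illustration, the Python below renders a two-line configuration fragment using the standard `DAEMON_LIST` and `COLLECTOR_HOST` configuration macros; the host name `cm.example.org` is a placeholder, and the configuration actually written by `condor_glidein` contains many more settings.

```python
# Daemons of a complete Condor pool, as listed on the slide.
FULL_POOL = ("MASTER", "STARTD", "SCHEDD", "COLLECTOR", "NEGOTIATOR")

# A glidein runs only the master plus the daemon it is meant to provide;
# the collector and negotiator stay on the central manager.
GLIDEIN_TYPES = {
    "startd": ("MASTER", "STARTD"),
    "schedd": ("MASTER", "SCHEDD"),
}


def glidein_config(glidein_type: str, collector_host: str) -> str:
    """Render a minimal condor_config fragment for one glidein type."""
    if glidein_type not in GLIDEIN_TYPES:
        raise ValueError(f"unknown glidein type: {glidein_type}")
    daemons = GLIDEIN_TYPES[glidein_type]
    lines = [
        "DAEMON_LIST = " + ", ".join(daemons),
        # The glidein reports back to the existing central manager.
        f"COLLECTOR_HOST = {collector_host}",
    ]
    return "\n".join(lines)


print(glidein_config("schedd", "cm.example.org"))
```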
6
Glidein (1)
Two major steps:
- Condor-G job #1: installation
  - glidein setup script
  - Condor configuration file
  - glidein startup script
  - download Condor binaries (http, gsiftp, etc.)
- Condor-G job #2: execution
  - exec the glidein startup script, which starts condor_master
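The two Condor-G jobs can be pictured as two grid-universe submit descriptions. The sketch below only renders them as text; the script names `glidein_setup.sh` and `glidein_startup.sh` are hypothetical, and the `grid_resource = gt2 ...` syntax is that of later Condor releases (the 6.8 series used in the talk may expect a different spelling). `condor_glidein` generates and submits these jobs itself, so this is purely illustrative.

```python
def condor_g_submit(gatekeeper: str, jobmanager: str, executable: str) -> str:
    """Render a Condor-G (grid universe, GT2/GRAM) submit description."""
    return "\n".join([
        "universe = grid",
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable = {executable}",
        "output = glidein.$(Cluster).out",
        "error = glidein.$(Cluster).err",
        "log = glidein.log",
        "queue",
    ]) + "\n"


def glidein_jobs(gatekeeper: str, run_jobmanager: str = "jobmanager-fork"):
    """Return the two submit descriptions, in the order they are submitted."""
    # Job 1 (installation): sent through the fork jobmanager so it runs on the
    # gatekeeper itself. The script name is hypothetical.
    install = condor_g_submit(gatekeeper, "jobmanager-fork", "glidein_setup.sh")
    # Job 2 (execution): runs the startup script, which starts condor_master.
    # A schedd glidein stays on the gatekeeper (fork); a startd glidein would
    # use the site's batch jobmanager instead.
    run = condor_g_submit(gatekeeper, run_jobmanager, "glidein_startup.sh")
    return install, run


install, run = glidein_jobs("gridgk01.racf.bnl.gov")
print(install)
print(run)
```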
7
Glidein (2)
[Figure: glidein architecture. Labels: central manager (collector); submit host (master, schedd); tarball server; execute hosts running glidein daemons (master + startd, or master + schedd); glidein directory ~/Condor_glidein with the startup script and a glidein config selecting the glidein type ({master, schedd …}?).]
8
Schedd Glidein
- Logic based on the startd glidein (two Condor-G jobs to set up)
- Usage: by running a glidein schedd on the gatekeeper, the schedd serves as a gateway between the submit host and grid sites
- Mini Condor pool with schedd functionality:
  - Acts as a submit host
  - Maintains a persistent queue of jobs
  - Communicates with the native batch system (Condor, PBS, LSF, etc.) and forwards user jobs
  - Manipulates job queues through the following commands: condor_submit, condor_rm, condor_q, condor_hold, condor_release, condor_prio
- Security features (GSI): who is authorized to set up a Pilot Factory?
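Because the glidein schedd is an ordinary schedd, the usual queue tools can be pointed at it by name. A minimal sketch, assuming a hypothetical schedd name and using the `-name` option of `condor_submit`, `condor_q`, `condor_rm`, `condor_hold` and `condor_release`; `condor_prio` is left out rather than guessing its option for selecting a remote schedd. By default the helper only prints the command.

```python
import shlex
import subprocess

# Queue tools from the slide that accept "-name <schedd>" to target a
# specific schedd instead of the local one.
QUEUE_TOOLS = ("condor_submit", "condor_q", "condor_rm", "condor_hold", "condor_release")


def schedd_command(tool: str, schedd_name: str, *args: str) -> list:
    """Build an argument vector aimed at the named (glidein) schedd."""
    if tool not in QUEUE_TOOLS:
        raise ValueError(f"unsupported tool: {tool}")
    return [tool, "-name", schedd_name, *args]


def run(cmd: list, dry_run: bool = True) -> str:
    """Return the shell-quoted command, or run it and return its stdout."""
    if dry_run:
        return shlex.join(cmd)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


GLIDEIN_SCHEDD = "glidein@gatekeeper.example.org"  # hypothetical schedd name
print(run(schedd_command("condor_q", GLIDEIN_SCHEDD)))
print(run(schedd_command("condor_hold", GLIDEIN_SCHEDD, "42.0")))
```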
9
Schedd Glidein Example (1)
Schedd glidein #1:
    condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk01.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Schedd glidein #2:
    condor_glidein -count 1 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork gridgk02.racf.bnl.gov/jobmanager-fork -type schedd -forcesetup
Schedd glideins #3, #4, #5:
    condor_glidein -count 3 -arch 6.8.1-i686-pc-Linux-2.4 -setup_jobmanager=jobmanager-fork nostos.cs.wisc.edu/jobmanager-fork -type schedd -forcesetup
Use fork, since we want the schedd to run on the gatekeeper!
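Since the three commands differ only in the gatekeeper and the count, a small script can generate them. The Python below is a sketch that reproduces the slide's flags verbatim and prints the commands rather than running them; the `REQUESTS` table and the helper name are illustrative, and the flag spellings are copied from the slide rather than checked against a particular `condor_glidein` release.

```python
import shlex

# (gatekeeper, number of schedd glideins) as in the three commands above.
REQUESTS = [
    ("gridgk01.racf.bnl.gov", 1),
    ("gridgk02.racf.bnl.gov", 1),
    ("nostos.cs.wisc.edu", 3),
]


def glidein_command(gatekeeper: str, count: int,
                    arch: str = "6.8.1-i686-pc-Linux-2.4",
                    jobmanager: str = "jobmanager-fork") -> list:
    """Reproduce the slide's condor_glidein invocation for one gatekeeper."""
    return [
        "condor_glidein",
        "-count", str(count),
        "-arch", arch,
        f"-setup_jobmanager={jobmanager}",
        f"{gatekeeper}/{jobmanager}",  # fork: the schedd runs on the gatekeeper
        "-type", "schedd",
        "-forcesetup",
    ]


if __name__ == "__main__":
    for host, n in REQUESTS:
        print(shlex.join(glidein_command(host, n)))
    print("total schedd glideins:", sum(n for _, n in REQUESTS))
```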
10
Schedd Glidein Example (2)
Command: condor_status -schedd

    Name                  Machine     TotalRunningJobs  TotalIdleJobs  TotalHeldJobs
    agrd0926@gridgk01.ra  gridgk01.r  0                 0              0
    agrd0926@gridgk02.ra  gridgk02.r  0                 0              0
    pleiades@gridui01.us  gridui01.u  0                 0              0
    pleiades@ribera.cs.w  ribera.cs.  0                 0              0
    pleiades@ron.cs.wisc  ron.cs.wis  0                 0              0
    pleiades@vail.cs.wis  vail.cs.wi  0                 0              0
                          Total       0                 0              0
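Before handing jobs to a glidein schedd, a script may want to confirm that the schedd has registered with the collector. A minimal sketch, assuming the plain-text layout shown above (a machine-readable output format would be more robust in practice), parses the table and sums the job counts:

```python
from typing import List, NamedTuple

# The condor_status -schedd output shown above (names are truncated by condor_status).
SAMPLE = """\
Name                 Machine    TotalRunningJobs TotalIdleJobs TotalHeldJobs

agrd0926@gridgk01.ra gridgk01.r                0             0             0
agrd0926@gridgk02.ra gridgk02.r                0             0             0
pleiades@gridui01.us gridui01.u                0             0             0
pleiades@ribera.cs.w ribera.cs.                0             0             0
pleiades@ron.cs.wisc ron.cs.wis                0             0             0
pleiades@vail.cs.wis vail.cs.wi                0             0             0

                      TotalRunningJobs      TotalIdleJobs      TotalHeldJobs

               Total                 0                  0                  0
"""


class ScheddRow(NamedTuple):
    name: str
    machine: str
    running: int
    idle: int
    held: int


def parse_schedd_status(text: str) -> List[ScheddRow]:
    """Extract per-schedd rows; header and summary lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has five columns and a user@host name in the first one.
        if len(fields) == 5 and "@" in fields[0]:
            name, machine, running, idle, held = fields
            rows.append(ScheddRow(name, machine, int(running), int(idle), int(held)))
    return rows


rows = parse_schedd_status(SAMPLE)
print(len(rows), "schedds;", sum(r.running + r.idle + r.held for r in rows), "jobs queued")
```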
11
Pilot Submission Program (Generator)
- Communicates with a DB server that maintains information about pilot jobs, e.g. pilot_type, pilot_queue
- Pulls the desired pilot script from an external server
- Periodically submits pilot jobs (with the pilot script as the executable)
  - condor_submit
  - qsub? No, not necessary, since …
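The generator's main loop can be sketched as below. Everything here is an assumption made for illustration: the `PilotSpec` fields mirror the slide's `pilot_type` and `pilot_queue`, but the DB query, script download, pilot arguments and submission interval are stand-ins passed in as callables, so the loop can be exercised without a DB server or a Condor installation.

```python
import os
import subprocess
import tempfile
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PilotSpec:
    """One row of the (hypothetical) pilot table kept on the DB server."""
    pilot_type: str   # e.g. "production"
    pilot_queue: str  # site queue the pilots should serve
    count: int        # pilots to submit per cycle


def make_submit_description(spec: PilotSpec, script_path: str) -> str:
    """Pilot job with the downloaded pilot script as its executable."""
    # Assumes the native batch system is Condor (vanilla universe); for PBS or
    # LSF the generator would submit grid-universe jobs instead.
    return "\n".join([
        "universe = vanilla",
        f"executable = {script_path}",
        # Hypothetical pilot arguments.
        f"arguments = --type {spec.pilot_type} --queue {spec.pilot_queue}",
        "log = pilots.log",
        f"queue {spec.count}",
    ]) + "\n"


def condor_submit(description: str) -> None:
    """Write the description to a temporary file and pass it to condor_submit."""
    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
        f.write(description)
        path = f.name
    try:
        subprocess.run(["condor_submit", path], check=True)
    finally:
        os.unlink(path)


def run_generator(fetch_specs: Callable[[], Iterable[PilotSpec]],
                  fetch_script: Callable[[str], str],
                  submit: Callable[[str], None] = condor_submit,
                  cycles: int = 1,
                  interval_s: float = 60.0) -> int:
    """Run `cycles` submission rounds, sleeping between them; return pilot count."""
    submitted = 0
    for cycle in range(cycles):
        for spec in fetch_specs():                   # query the DB server
            script = fetch_script(spec.pilot_type)   # pull the pilot script
            submit(make_submit_description(spec, script))
            submitted += spec.count
        if cycle < cycles - 1:
            time.sleep(interval_s)
    return submitted


# Dry run: collect the submit descriptions instead of calling condor_submit.
descriptions = []
n = run_generator(lambda: [PilotSpec("production", "short", 2)],
                  lambda t: f"/tmp/pilot_{t}.py",
                  submit=descriptions.append, cycles=3, interval_s=0)
print(n, "pilots in", len(descriptions), "submissions")
```

Injecting the DB query, download and submit steps as callables keeps the loop testable and makes it easy to swap `condor_submit` for another submission path later.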
12
Build Pilot Factory with Glidein
- Schedd glidein installed and executed on the gatekeeper
- User submits a Condor-C job with the pilot generator as the executable
- The generator runs on the gatekeeper as a local universe job supervised by the glidein schedd
- The generator submits pilots; pilot types and submission frequency are adjustable by users
- Depending on the native batch system, pilots can be submitted as grid universe jobs
- Along with GAHP and related binaries, the schedd can communicate with different batch systems
[Figure: grid resource running the glidein master and schedd with the pilot generator; the schedd submits via JobManager to LSF, PBS.]
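The two kinds of submission on this slide can be illustrated with submit descriptions. In this hedged sketch the schedd, collector and script names are placeholders; `+remote_JobUniverse = 12` asks the remote schedd to run the job in the local universe (12 is Condor's numeric code for it), and `grid_resource = batch pbs|lsf` is the batch-GAHP syntax of recent HTCondor releases, which may differ from what the 6.8-era setup in the talk used.

```python
LOCAL_UNIVERSE = 12  # JobUniverse value of Condor's local universe

BATCH_SYSTEMS = ("pbs", "lsf")


def generator_submit(glidein_schedd: str, collector: str, generator: str) -> str:
    """Condor-C job that starts the pilot generator under the glidein schedd."""
    return "\n".join([
        "universe = grid",
        # Condor-C: hand the job to a remote schedd known to the given collector.
        f"grid_resource = condor {glidein_schedd} {collector}",
        f"executable = {generator}",
        # Run it in the local universe on the gatekeeper, not in the batch system.
        f"+remote_JobUniverse = {LOCAL_UNIVERSE}",
        "log = generator.log",
        "queue",
    ]) + "\n"


def pilot_submit(batch_system: str, pilot_script: str, count: int) -> str:
    """Grid-universe pilot job handed to PBS or LSF through the batch GAHP."""
    if batch_system not in BATCH_SYSTEMS:
        raise ValueError(f"unsupported batch system: {batch_system}")
    return "\n".join([
        "universe = grid",
        f"grid_resource = batch {batch_system}",
        f"executable = {pilot_script}",
        f"queue {count}",
    ]) + "\n"


print(generator_submit("glidein@gatekeeper.example.org", "cm.example.org", "pilot_generator.py"))
print(pilot_submit("pbs", "pilot.py", 5))
```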
13
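Pilot Factory
[Figure: Pilot Factory. The submit node (collector, master, negotiator, schedd) sends a glidein request to the gatekeeper (Globus, with Condor | PBS | … as the batch system); the glidein master and schedd on the gatekeeper are connected to the collector; the Pilot Factory submits pilots to the cluster worker nodes.]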
14
Future Work
- Integrate the pilot with the Condor startd to implement a startd-based pilot
  - The startd-based pilot retrieves the payload of a user job in the same way as the generic pilot, but in addition it inherits the functionality of the Condor startd
  - The original intention was to run PFs with the startd pilots on worker nodes (too greedy, unacceptable?)
  - Using the Condor startd makes it easier to integrate with gLExec
- Transform the Generic PF (GPF) into a Startd PF (SPF)