Miron Livny Computer Sciences Department University of Wisconsin-Madison Condor and (the) Grid (one of the CS X in PPDG)
“… Since the early days of mankind the primary motivation for the establishment of communities has been the idea that by being part of an organized group the capabilities of an individual are improved. The great progress in the area of inter-computer communication led to the development of means by which stand-alone processing sub-systems can be integrated into multi-computer ‘communities’. …” Miron Livny, “Study of Load Balancing Algorithms for Decentralized Distributed Processing Systems,” Ph.D. thesis, July 1983.
Condor as a... › … Grid › … window to the Grid › … manager of Grid resources › … a source of Grid technology
Main Condor capabilities › Management of large collections of distributively owned heterogeneous resources (CPU, storage, network, software) › Management of large (10K) collections of jobs. › Remote Execution › Remote I/O › Checkpointing › Matchmaking › System administration
Condor Deployment (that we know of) › More than 4000 CPUs world-wide › More than 1200 CPUs at UW › More than 200 CPUs at INFN › More than 800 CPUs in industry.
A Simple Scenario › Study the behavior of F(x,y,z) for 20 values of x, 10 values of y and 3 values of z (20*10*3 = 600 combinations) › F takes on average 3 hours to compute on a “typical” workstation (total = 600*3 = 1800 hours) › F requires a “moderate” (128 MB) amount of memory › F performs “little” I/O: the input (x,y,z) is 15 MB and the output F(x,y,z) is 40 MB
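The arithmetic behind this scenario can be sketched in a few lines of Python. The constants come from the slide; the function names and the `ideal_wall_clock_days` model are assumptions made for illustration: jobs are spread evenly, machines are always available, and there is no scheduling or transfer overhead. Real pools will do worse.

```python
import math

# Scenario parameters taken from the slide.
N_X, N_Y, N_Z = 20, 10, 3   # number of values of x, y and z
HOURS_PER_JOB = 3           # average compute time on a "typical" workstation
INPUT_MB = 15               # size of one (x,y,z) input
OUTPUT_MB = 40              # size of one F(x,y,z) output


def total_jobs():
    """One job per (x, y, z) combination."""
    return N_X * N_Y * N_Z


def total_cpu_hours():
    """CPU time needed if every job takes the average time."""
    return total_jobs() * HOURS_PER_JOB


def ideal_wall_clock_days(n_machines):
    """Idealized elapsed time on n_machines identical, always-free machines.

    Assumption (not from the slides): jobs run in full 'rounds' of
    n_machines jobs each, with no queueing or I/O overhead.
    """
    rounds = math.ceil(total_jobs() / n_machines)
    return rounds * HOURS_PER_JOB / 24


if __name__ == "__main__":
    print(total_jobs())                # 600
    print(total_cpu_hours())           # 1800
    print(ideal_wall_clock_days(1))    # 75.0 (about 2.5 months)
    print(ideal_wall_clock_days(600))  # 0.125 (3 hours)
    print(total_jobs() * INPUT_MB)     # 9000 MB of input in total
    print(total_jobs() * OUTPUT_MB)    # 24000 MB of output in total
```

On a single workstation the run takes about 75 days. With one machine per job, the idealized lower bound is the 3 hours of a single job.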
How Can Condor Help?
Step I - get organized! › Turn your workstation into a “Personal Condor” › Write a script that creates 600 input files, one for each of the (x,y,z) combinations › Submit a cluster of 600 jobs to your personal Condor › Write a script that collects the data from the 600 output files › Go on a long vacation … (2.5 months)
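The two scripts mentioned in Step I might look like the following minimal Python sketch. Everything specific here is an assumption for illustration: the file names `in.N` and `out.N`, the one-line text format of each input, and the helper names `write_inputs` and `collect_outputs`. The real inputs in the scenario are 15 MB each, so a real generator would write whatever F actually reads. Numbering the files by job index is only one convenient choice, since a per-job index (such as Condor's `$(Process)` macro in a submit description) can then select the matching file.

```python
import itertools
from pathlib import Path


def write_inputs(directory, xs, ys, zs):
    """Create one input file per (x, y, z) combination.

    Files are named in.0, in.1, ... (a hypothetical convention) and each
    holds the three parameter values on a single line.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (x, y, z) in enumerate(itertools.product(xs, ys, zs)):
        path = directory / f"in.{i}"
        path.write_text(f"{x} {y} {z}\n")
        paths.append(path)
    return paths


def collect_outputs(directory, n_jobs):
    """Gather out.0 .. out.(n_jobs-1); report which ones are still missing."""
    directory = Path(directory)
    results, missing = {}, []
    for i in range(n_jobs):
        path = directory / f"out.{i}"
        if path.exists():
            results[i] = path.read_text()
        else:
            missing.append(i)
    return results, missing


if __name__ == "__main__":
    # Placeholder parameter values: 20 values of x, 10 of y, 3 of z.
    created = write_inputs("inputs", range(20), range(10), range(3))
    print(len(created))  # 600
```

Returning the list of missing job indices from `collect_outputs` makes it easy to see which jobs have not finished yet, or need to be resubmitted, when you come back from vacation.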
Your Personal Condor will... › … keep an eye on your jobs and keep you posted on their progress › … implement your policy on when the jobs can run on your workstation › … implement your policy on the execution order of the jobs › … add fault tolerance to your jobs › … keep a log of your job activities
[Figure: your workstation, running a personal Condor with 600 Condor jobs]
Step II - build a Grid › Install Condor on the machine next door. › Install Condor on the machines in the classroom. › Install Condor on the O2K in the basement. › Configure these machines to be part of your Condor pool/grid. › Go on a shorter vacation...
[Figure: your workstation, running a personal Condor with 600 Condor jobs, connected to a Group Condor pool]
Step III - Take advantage of your friends › Get permission from “friendly” Condor pools/grids to access their resources › Configure your personal Condor to “flock” to these pools/grids › Reconsider your vacation plans...
[Figure: your workstation, running a personal Condor with 600 Condor jobs, connected to a Group Condor pool and a friendly Condor pool]
Step IV - Think big! › Get access (account(s) + certificate(s)) to Globus-managed Grid resources › Submit 599 “To-Globus” Condor glide-in jobs to your personal Condor › When all your jobs are done, remove any pending glide-in jobs › Take the rest of the afternoon off...
A “To-Globus” glide-in job will... › … transform itself into a Globus job, › … submit itself to a Globus-managed Grid resource, › … be monitored by your personal Condor, › … once the Globus job is allocated a resource, use a GSIFTP server to fetch Condor agents, start them, and add the resource to your personal Condor, › … vacate the resource before it is revoked by the remote scheduler
[Figure: your workstation, running a personal Condor with 600 Condor jobs, connected to a Group Condor pool, a friendly Condor pool and, through 599 glide-ins, a Globus Grid with PBS, LSF and Condor resources]
VizBench - send us your data and we will send you back a movie (an SC99 demo by NCSA)
Frame Rendering Managed and Powered by a Personal Condor › A locally installed Personal Condor is used by the VizBench server to manage, monitor and control the execution of frame rendering tasks, to manage local rendering resources, and to locate remote and Grid resources that are capable of and willing to render frames.
[Figure: VizBench Web Server and VizBench Personal Condor, with jobs going to UW Condor, NCSA Condor, the UNM Supercluster and the BU O2K; two Globus Gatekeepers appear in the diagram]
Grid Obstacles › Ownership Distribution (Sociology) › Customer Awareness (Education) › Size and Uncertainties (Robustness) › Technology Evolution (Portability) › Physical Distribution (Technology)
Visit us at